Ordinal Regression with Fenton-Wilkinson Order Statistics: A Case Study of an Orienteering Race

In sports, individuals and teams are typically interested
in final rankings. Final results, such as times or distances, dictate
these rankings, also known as places. Places can be further associated
with ordered random variables, commonly referred to as order
statistics. In this work, we introduce a simple yet accurate
order-statistical ordinal regression function that predicts relay race
places from changeover-times. We call this function the Fenton-Wilkinson
Order Statistics model. This model is built on the following educated
assumption: individual leg-times follow log-normal distributions.
Our key idea is to combine Fenton-Wilkinson approximations
of changeover-times, which are sums of leg-times, with an estimator
for the total number of teams, as in the well-known German tank
problem. This original
place regression function is sigmoidal and thus correctly predicts
the existence of a small number of elite teams that significantly
outperform the rest. Our model also describes how place increases
approximately linearly with changeover-time near the inflection point of the
log-normal distribution function. With real-world data from Jukola
2019, a massive orienteering relay race, the model is shown to be
highly accurate even when the size of the training set is only 5%
of the whole data set. Numerical results also show that our model
achieves smaller place-prediction root-mean-square errors than linear
regression, ordinal regression with the mord package, and Gaussian
process regression.
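
The ingredients named above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes independent log-normal leg-times, uses the standard Fenton-Wilkinson moment matching for their sum, and uses the classical German tank estimator m + m/k - 1 for the number of teams. The place function N * F(t), where F is the fitted log-normal distribution function, is an assumed form chosen only because it is sigmoidal. The function names and the leg parameters in the usage lines are hypothetical.

```python
import math


def fenton_wilkinson(mus, sigmas):
    """Approximate a sum of independent log-normals by one log-normal.

    Matches the mean and variance of the sum (Fenton-Wilkinson moment
    matching). Returns (mu, sigma) of the approximating log-normal.
    """
    mean = sum(math.exp(m + s ** 2 / 2) for m, s in zip(mus, sigmas))
    var = sum((math.exp(s ** 2) - 1) * math.exp(2 * m + s ** 2)
              for m, s in zip(mus, sigmas))
    sigma2 = math.log(1 + var / mean ** 2)
    mu = math.log(mean) - sigma2 / 2
    return mu, math.sqrt(sigma2)


def german_tank_estimate(observed_places):
    """Estimate the total number of teams from a sample of observed places."""
    k = len(observed_places)
    m = max(observed_places)
    return m + m / k - 1


def lognormal_cdf(t, mu, sigma):
    """Distribution function of a log-normal with parameters mu and sigma."""
    z = (math.log(t) - mu) / (sigma * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))


def predict_place(t, mu, sigma, n_teams):
    """Assumed sigmoidal place function: n_teams * F(t), at least 1."""
    return max(1.0, n_teams * lognormal_cdf(t, mu, sigma))


# Hypothetical log-parameters for three legs (times in minutes).
leg_mus = [4.0, 4.1, 4.2]
leg_sigmas = [0.25, 0.30, 0.30]

# Changeover-time after the third leg, approximated as one log-normal.
mu_c, sigma_c = fenton_wilkinson(leg_mus, leg_sigmas)

# Team count estimated from a small, hypothetical sample of observed places.
n_hat = german_tank_estimate([12, 340, 871, 1203])

print(predict_place(200.0, mu_c, sigma_c, n_hat))
```

Under these assumptions, the predicted place stays close to 1 for very fast changeover-times, rises steeply around the bulk of the field, and flattens toward the estimated number of teams, which is the sigmoidal behaviour the abstract describes.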