Ranking - Convex Risk Minimization

The problem of ranking (rank regression) has become popular in the machine learning community. This theory relates to problems, in which one has to predict (guess) the order between objects on the basis of vectors describing their observed features. In many ranking algorithms a convex loss function is used instead of the 0-1 loss. It makes these procedures computationally efficient. Hence, convex risk minimizers and their statistical properties are investigated in this paper. Fast rates of convergence are obtained under conditions, that look similarly to the ones from the classification theory. Methods used in this paper come from the theory of U-processes as well as empirical processes.


Authors:



References:
[1] S. Boucheron, O. Bousquet, G. Lugosi, Theory of classification: a survey
and recent advances, Probability and Statistics, vol. 9, pp. 323-375, 2005.
[2] L. Devroye, L. Gyorfi, G. Lugosi, A probabilistic theory of pattern
recognition. Springer-Verlag, New York, 1996.
[3] S. Clemenon, G. Lugosi, N. Vayatis, Ranking and empirical minimization
of U-statistics, Ann. Statist., vol. 36, pp. 844-874, 2008.
[4] V. H. de la Pena, E. Gine, Decoupling: from dependence to independence.
Springer-Verlag, New York, 1999.
[5] R. J. Serfling, Approximation theorems of mathematical statistics. Wiley,
New York, 1980.
[6] J. Abrevaya, Computation of the maximum rank correlation estimator,
Economics Letters, vol. 62, pp. 279-285, 1999.
[7] P. L. Bartlett, S. Ben-David, Hardness results for neural network approximation
problems, Theoretical Computer Science, vol. 284, pp. 53-66,
2002.
[8] V. N. Vapnik, Statistical learning theory. John Wiley, New York, 1998.
[9] C. Cortes, V. N. Vapnik, Support vector networks, Machine Learning, vol.
20, pp. 273-297, 1995.
[10] R. E. Schapire, Y. Freund, P. L. Bartlett, W. S. Lee, Boosting the margin:
a new explanation for the effectiveness of voting methods, Ann. Statist.,
vol. 26, pp. 1651-1686, 1998.
[11] Y. Freund, R. Iyer, R. E. Schapire, Y. Singer, An efficient boosting
algorithm for combining preferences, J. Machine Learning Research, vol.
4, pp. 933-969, 2004.
[12] W. Niemiro, W. Rejchel, Rank correlation estimators and their limiting
distributions, Statistical Papers, to be published.
[13] A. B. Tsybakov, Optimal aggregation of classifiers in statistical learning,
Ann. Statist., vol. 32, pp. 135-166, 2004.
[14] M. A. Arcones, E. Gine (1993). Limit theorems for U-processes. Ann.
Probab., vol. 21, pp. 1494-1542.
[15] P. Major, An estimate of the supremum of a nice class of stochastic
integrals and U-statistics, Probab. Theory Related Fields, vol. 134, pp.
489-537, 2006.
[16] V. Koltchinskii, D. Panchenko, Empirical margin distributions and
bounding the generalization error of combined classifiers, Ann. Statist.,
vol. 30, pp. 1-50, 2002.
[17] P. L. Bartlett, O. Bousquet, S. Mendelson, Local Rademacher complexities,
Ann. Statist., vol. 33, pp. 1497-1537, 2005.
[18] P. L. Bartlett, M. I. Jordan, J. D. Mcauliffe, Convexity, Classification,
and Risk Bounds, Journal of the American Statistical Association, vol.
101, pp. 138-156, 2006.
[19] P. Massart, Some applications of concentration inequalities to statistics,
Probability theory. Ann. Fac. Sci. Toulouse Math., vol. 9 , pp. 245-303,
2000.
[20] P. Massart, Concentration Inequalities and Model Selection. Springer,
Berlin, 2007.
[21] S. Mendelson, Improving the sample complexity using global data, IEEE
Trans. Inform. Theory, vol. 48, pp. 1977-1991, 2002.
[22] A. Pakes, D. Pollard, Simulation and the asymptotics of optimization
estimators, Econometrica, vol. 57, pp. 1027-1057, 1989.
[23] S. Mendelson, A few notes on statistical learning theory. Advanced
Lectures on Machine Learning. Lecture Notes in Comput. Sci., pp. 1-
40. Springer, New York, 2003.
[24] D. Pollard, Asymptotics via Empirical Processes., Statist. Sci., vol. 4,
pp. 365-366, 1989.
[25] M. A. Arcones, E. Gine, U-processes indexed by VapnikChervonenkis
classes of functions with applications to asymptotics and bootstrap of Ustatistics
with estimated parameters, Stochastic Process. Appl., vol. 52,
pp. 17-38, 1994.
[26] D. Nolan, D. Pollard, U-processes: rates of convergence., Ann. Statist.,
vol. 15, pp. 780-799, 1987.