## RANKING AND EMPIRICAL MINIMIZATION OF U-STATISTICS (2008)

Citations: 37 (2 self)

### Citations

2216 | Probability inequalities for sums of bounded random variables.
- Hoeffding
- 1963
Citation Context ...X → R is a symmetric real-valued function. U-statistics have been studied in depth and their behavior is well understood. One of the classical inequalities concerning U-statistics is due to Hoeffding [23] which implies that, for all $t > 0$,
$$P\{|U_n - EU_n| > t\} \le 2e^{-2\lfloor n/2 \rfloor t^2} \le 2e^{-(n-1)t^2}. \tag{A.1}$$
Hoeffding also shows that, if $\sigma^2 = \operatorname{Var}(q(X_1, X_2))$, then
$$P\{|U_n - EU_n| > t\} \le 2\exp\left(-\frac{\lfloor n/2 \rfloor t^2}{2\sigma^2 + 2t/3}\right).$$
It is...

1766 | Signal detection theory and psychophysics.
- Green, Swets
- 1966
Citation Context ...pectation and c is some constant. APPENDIX B: CONNECTION WITH THE ROC CURVE AND THE AUC CRITERION In the bipartite ranking problem, the ROC curve (ROC standing for Receiver Operator Characteristic; see [20]) and the AUC criterion are popular measures for evaluating the performance of scoring functions in applications. [RANKING AND ERM OF U-STATISTICS 871] Let s : X → R be a scoring function. The ROC c...

1326 | A Probabilistic Theory of Pattern Recognition.
- Devroye, Gyorfi, et al.
- 1996

1032 | Approximation Theorems in Mathematical Statistics.
- Serfling
- 1980
Citation Context ...cribed below. The U-statistic $U_n$ is said to be degenerate if its kernel q satisfies $E(q(x, X)) = 0$ for all $x \in X$. There are two basic representations of U-statistics which we recall next (see Serfling [37] for more details). Average of “sums-of-i.i.d.” blocks. This representation is the key for obtaining the “first-order” results of Section 3 for nondegenerate U-statistics. The U-statistic $U_n$ can be ex...

727 | An efficient boosting algorithm for combining preferences.
- Freund, Iyer, et al.
- 2003
Citation Context ...ing attention both in the statistical and machine learning literature (see, e.g., Agarwal et al. [2], Cao et al. [11], Cortes and Mohri [12], Cossock and Zhang [13], Freund, Iyer, Schapire and Singer [17], Rudin [35], Usunier et al. [44] and Vittaut and Gallinari [46]). In the ranking problem, one has to compare two different observations and decide which one is “better.” For example, in an applicatio...

555 | On the method of bounded differences.
- McDiarmid
- 1989
Citation Context ...se this result to derive probabilistic performance bounds for the empirical risk minimizer. For example, by taking $\psi(x) = e^{\lambda x}$ for some $\lambda > 0$, and using the bounded differences inequality (see McDiarmid [34]), we have
$$E \exp\Big(\lambda\Big(L(r_n) - \inf_{r \in R} L(r)\Big)\Big) \le E \exp(4\lambda R_n) \le \exp\Big(4\lambda E R_n + \frac{4\lambda^2}{n-1}\Big),$$
where we used the fact that $R_n$ may be considered as a function of $\lfloor n/2 \rfloor$ independent random vectors (ɛi,Zi,⌊n/2⌋...

375 | A class of statistics with asymptotically normal distribution.
- Hoeffding
- 1948
Citation Context ...istic $W_n$ is of the order $1/n^2$. $T_n$ is thus the leading term in this orthogonal decomposition. Indeed, the limit distribution of $\sqrt{n}(U_n - EU_n)$ is the normal distribution $N(0, 4\operatorname{Var}(E(q(X_1, X) \mid X_1)))$ (see [22]). This suggests that inequality (A.1) may be quite loose. Indeed, exploiting further Hoeffding’s decomposition (combined with arguments related to decoupling, randomization and hypercontractivity of ...

330 | On the mathematical foundations of learning.
- Smale, Cucker
- 2001
Citation Context ...ightforward way if the approximation error $\inf_{f \in F_B} A(f) - A^*$ can be guaranteed to go to zero as $B \to \infty$. For the approximation properties of such kernel classes we refer the reader to Cucker and Smale [14], Scovel and Steinwart [36], Smale and Zhou [38], Steinwart [39] etc. REMARK 7 (Fast rates). A natural question is whether the arguments of Section 4 can be extended to prove fast rates of convergence ...

220 | Optimal aggregation of classifiers in statistical learning.
- Tsybakov
- 2004
Citation Context ...ry classification may be obtained if one can control the variance of the excess risk by its expected value. In classification this can be guaranteed under certain “low-noise” conditions (see Tsybakov [43], Massart and Nédélec [33], Koltchinskii [26]). Next we examine the possibilities of obtaining such improved performance bounds for empirical ranking risk minimization. The main message is that in the...

212 | On the influence of the kernel on the consistency of support vector machines.
- Steinwart
- 2001
Citation Context ...an be guaranteed to go to zero as $B \to \infty$. For the approximation properties of such kernel classes we refer the reader to Cucker and Smale [14], Scovel and Steinwart [36], Smale and Zhou [38], Steinwart [39] etc. REMARK 7 (Fast rates). A natural question is whether the arguments of Section 4 can be extended to prove fast rates of convergence for minimizers of the convex ranking risk. For ordinary binary c...

181 | New concentration inequalities in product spaces.
- Talagrand
- 1996
Citation Context ...n given $X_1, \ldots, X_n$). Then we write $EZ_\epsilon^q = E E_\epsilon Z_\epsilon^q$ and study the quantity $E_\epsilon Z_\epsilon^q$, with the $X_i$ fixed. But then $Z_\epsilon$ is a so-called Rademacher chaos whose tail behavior has been studied, see Talagrand [42], Ledoux [28], Boucheron, Bousquet, Lugosi and Massart [9]. In particular, for any $q \ge 2$,
$$(E_\epsilon Z_\epsilon^q)^{1/q} \le E_\epsilon Z_\epsilon + \big(E_\epsilon (Z_\epsilon - E_\epsilon Z_\epsilon)_+^q\big)^{1/q} \le E_\epsilon Z_\epsilon + 3\sqrt{q}\, E_\epsilon U_\epsilon + 4qB$$
with $U_\epsilon$ defined above and B = su...

158 | Empirical margin distributions and bounding the generalization error of combined classifiers.
- Koltchinskii, Panchenko
- 2002
Citation Context ...o derive an upper bound for the expected supremum appearing in the exponent. This may be done by standard symmetrization and contraction inequalities. In fact, by mimicking Koltchinskii and Panchenko [27] (see also the proof of Lemma 2 in Lugosi and Vayatis [30]), we obtain
$$E \sup_{f \in F} \Big( \frac{1}{\lfloor n/2 \rfloor} \sum_{i=1}^{\lfloor n/2 \rfloor} \phi\big(-\operatorname{sgn}(Z_{i,\lfloor n/2 \rfloor + i}) \cdot f(X_i, X_{\lfloor n/2 \rfloor + i})\big) - A(f) \Big) \le 4B\phi'(B)\, E \sup_{f \in F} \Big( \frac{1}{\lfloor n/2 \rfloor} \sum_{i=1}^{\lfloor n/2 \rfloor} \ldots$$

158 | Statistical behavior and consistency of classification methods based on convex risk minimization.
- Zhang
- 2004
Citation Context ...erstanding the statistical behavior of such methods, see, for example, Bartlett, Jordan and McAuliffe [5], Blanchard, Lugosi and Vayatis [7], Breiman [10], Jiang [25], Lugosi and Vayatis [30] and Zhang [48]. The purpose of this section is to extend the principle of convex risk minimization to the ranking problem studied in this paper. Our analysis also provides a theoretical framework for the analysis o...

148 | AUC optimization vs. error rate minimization.
- Cortes, Mohri
- 2004
Citation Context ... credit-risk screening, the ranking problem has received increasing attention both in the statistical and machine learning literature (see, e.g., Agarwal et al. [2], Cao et al. [11], Cortes and Mohri [12], Cossock and Zhang [13], Freund, Iyer, Schapire and Singer [17], Rudin [35], Usunier et al. [44] and Vittaut and Gallinari [46]). In the ranking problem, one has to compare two different observations...

137 | Some limit theorems for empirical processes.
- Giné, Zinn
- 1984
Citation Context ...decreasing function $\psi$, $$E\psi\Big(L(r_n) - \inf_{r \in R} L(r)\Big) \le E\psi(4R_n).$$ PROOF. The inequality follows immediately from (3.1), Lemma A.1, and a standard symmetrization inequality; see, for example, Giné and Zinn [19]. □ [850 S. CLÉMENÇON, G. LUGOSI AND N. VAYATIS] One may easily use this result to derive probabilistic performance bounds for the empirical risk minimizer. For example, by taking $\psi(x) = e^{\lambda x}$ for som...

126 | Local Rademacher complexities and oracle inequalities in risk minimization.
- Koltchinskii
- 2006
Citation Context ...l learning, theory of classification, VC classes, fast rates, convex risk minimization, moment inequalities, U-processes. Bousquet and Lugosi [8], Koltchinskii [26], Massart [32] for surveys and recent development. The important feature of the ranking problem is that natural estimates of the ranking risk involve U-statistics. Therefore, our methodology is based ...

125 | Theory of Pattern Recognition.
- Chervonenkis
- 1974
Citation Context ...ve universally consistent ranking rules. Our approach here is different. Instead of local averages, we consider empirical minimizers of U-statistics, more in the spirit of empirical risk minimization [45] popular in statistical learning theory, see, for example, Bartlett and Mendelson [6], Boucheron, Received March 2006; revised April 2007. 1 Supported in part by the Spanish Ministry of Science and Te...

124 | Adapting ranking SVM to document retrieval.
- Cao, Xu, et al.
- 2006
Citation Context ...o document retrieval or credit-risk screening, the ranking problem has received increasing attention both in the statistical and machine learning literature (see, e.g., Agarwal et al. [2], Cao et al. [11], Cortes and Mohri [12], Cossock and Zhang [13], Freund, Iyer, Schapire and Singer [17], Rudin [35], Usunier et al. [44] and Vittaut and Gallinari [46]). In the ranking problem, one has to compare two...

113 | Concentration inequalities and model selection.
- Massart
- 2006
Citation Context ...eory of classification, VC classes, fast rates, convex risk minimization, moment inequalities, U-processes. Bousquet and Lugosi [8], Koltchinskii [26], Massart [32] for surveys and recent development. The important feature of the ranking problem is that natural estimates of the ranking risk involve U-statistics. Therefore, our methodology is based on the theory ...

112 | Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension.
- Haussler
- 1995
Citation Context ... for $EZ_\epsilon$, $EU_\epsilon$, $EM$ and $\rho$. To bound $EZ_\epsilon$, observe that $Z_\epsilon$ is a Rademacher chaos indexed by R for which Propositions 2.2 and 2.6 of Arcones and Giné [3] may be applied. In particular, by using Haussler’s [21] metric entropy bound for VC classes, it is easy to see that there exists a constant C such that $EZ_\epsilon \le CnV$. Similarly, $E_\epsilon M$ is just an expected Rademacher average that may be bounded by $C\sqrt{Vn}$ (see, e.g...

112 | Talagrand deviation inequalities for product measures. ESAIM: Probability and Statistics.
- Ledoux
- 1996
Citation Context ....,$X_n$). Then we write $EZ_\epsilon^q = E E_\epsilon Z_\epsilon^q$ and study the quantity $E_\epsilon Z_\epsilon^q$, with the $X_i$ fixed. But then $Z_\epsilon$ is a so-called Rademacher chaos whose tail behavior has been studied, see Talagrand [42], Ledoux [28], Boucheron, Bousquet, Lugosi and Massart [9]. In particular, for any $q \ge 2$,
$$(E_\epsilon Z_\epsilon^q)^{1/q} \le E_\epsilon Z_\epsilon + \big(E_\epsilon (Z_\epsilon - E_\epsilon Z_\epsilon)_+^q\big)^{1/q} \le E_\epsilon Z_\epsilon + 3\sqrt{q}\, E_\epsilon U_\epsilon + 4qB$$
with $U_\epsilon$ defined above and B = sup f ∈F sup α,...

96 | Theory of classification: a survey of some recent advances.
- Boucheron, Bousquet, et al.
- 2005
Citation Context ...hrases. Statistical learning, theory of classification, VC classes, fast rates, convex risk minimization, moment inequalities, U-processes. Bousquet and Lugosi [8], Koltchinskii [26], Massart [32] for surveys and recent development. The important feature of the ranking problem is that natural estimates of the ranking risk involve U-statistics. Therefore, our me...

96 | Decoupling. From dependence to independence.
- de la Peña, Giné
- 1999
Citation Context ...nequalities, symmetrization tricks and a contraction principle for U-processes. For an excellent account of the theory of U-statistics and U-processes we refer to the monograph of de la Peña and Giné [15]. We also provide a theoretical analysis of certain nonparametric ranking methods that are based on an empirical minimization of convex cost functionals over convex sets of scoring functions. The meth...

73 | On the Bayes-risk consistency of regularized boosting methods.
- Lugosi, Vayatis
- 2004
Citation Context ...n made in understanding the statistical behavior of such methods, see, for example, Bartlett, Jordan and McAuliffe [5], Blanchard, Lugosi and Vayatis [7], Breiman [10], Jiang [25], Lugosi and Vayatis [30] and Zhang [48]. The purpose of this section is to extend the principle of convex risk minimization to the ranking problem studied in this paper. Our analysis also provides a theoretical framework for t...

69 | Moment inequalities for functions of independent random variables.
- Boucheron, Bousquet, et al.
- 2005

65 | Generalization bounds for the area under the ROC curve.
- Agarwal, Graepel, et al.
- 2005

59 | Risk bounds for statistical learning.
- Massart, Nédélec
- 2006

54 | On the rate of convergence of regularized boosting classifiers.
- Blanchard, Lugosi, et al.
- 2003
Citation Context .... Recently significant theoretical advance has been made in understanding the statistical behavior of such methods, see, for example, Bartlett, Jordan and McAuliffe [5], Blanchard, Lugosi and Vayatis [7], Breiman [10], Jiang [25], Lugosi and Vayatis [30] and Zhang [48]. The purpose of this section is to extend the principle of convex risk minimization to the ranking problem studied in this paper. Our a...

51 | Limit theorems for U-processes.
- Arcones, Giné
- 1993
Citation Context ...order to apply Theorem 5, we need suitable upper bounds for $EZ_\epsilon$, $EU_\epsilon$, $EM$ and $\rho$. To bound $EZ_\epsilon$, observe that $Z_\epsilon$ is a Rademacher chaos indexed by R for which Propositions 2.2 and 2.6 of Arcones and Giné [3] may be applied. In particular, by using Haussler’s [21] metric entropy bound for VC classes, it is easy to see that there exists a constant C such that $EZ_\epsilon \le CnV$. Similarly, $E_\epsilon M$ is just an expected R...

47 | Estimating the approximation error in learning theory.
- Smale, Zhou
- 2003
Citation Context ...∈FB A(f) − A∗ can be guaranteed to go to zero as $B \to \infty$. For the approximation properties of such kernel classes we refer the reader to Cucker and Smale [14], Scovel and Steinwart [36], Smale and Zhou [38], Steinwart [39] etc. REMARK 7 (Fast rates). A natural question is whether the arguments of Section 4 can be extended to prove fast rates of convergence for minimizers of the convex ranking risk. For o...

27 | Empirical minimization.
- Bartlett, Mendelson
- 2006
Citation Context ...cal averages, we consider empirical minimizers of U-statistics, more in the spirit of empirical risk minimization [45] popular in statistical learning theory, see, for example, Bartlett and Mendelson [6], Boucheron, Received March 2006; revised April 2007. 1 Supported in part by the Spanish Ministry of Science and Technology, Grant MTM 2006-05650 and by the PASCAL Network of Excellence under EC Grant...

24 | Exponential and Moment Inequalities for U-statistics.
- Giné, Latala, et al.
- 2000

24 | Ranking with a p-norm push.
- Rudin
- 2006
Citation Context ...n both in the statistical and machine learning literature (see, e.g., Agarwal et al. [2], Cao et al. [11], Cortes and Mohri [12], Cossock and Zhang [13], Freund, Iyer, Schapire and Singer [17], Rudin [35], Usunier et al. [44] and Vittaut and Gallinari [46]). In the ranking problem, one has to compare two different observations and decide which one is “better.” For example, in an application of documen...

21 | Exponential inequalities, with constants, for U-statistics of order two.
- Houdre, Reynaud-Bouret
- 2003
Citation Context ...f [4] due to Major [31] for VC and other “nice” classes. We also refer to the corresponding results obtained for U-statistics by Adamczak [1], Giné, Latala and Zinn [18] and Houdré and Reynaud-Bouret [24]. We point out here that the recent work of Adamczak [1] establishes very general moment inequalities for Banach space-valued degenerate U-statistics of arbi...

19 | Pattern classification and learning theory
- Lugosi
- 2002
Citation Context ...ility at least $1 - \delta$,
$$L(r_n) - \inf_{r \in R} L(r) \le 4ER_n + 4\sqrt{\frac{\ln(1/\delta)}{n-1}}.$$
The expected value of the Rademacher average $R_n$ may now be bounded by standard metric entropy methods, see, for example, Lugosi [29], Boucheron, Bousquet and Lugosi [8]. For example, if the class R of indicator functions has finite VC dimension V, then
$$ER_n \le c\sqrt{\frac{V}{n}}$$
for a universal constant c. This result is similar to the one proved in the bi...

14 | Population theory for boosting ensembles.
- Breiman
Citation Context ...gnificant theoretical advance has been made in understanding the statistical behavior of such methods, see, for example, Bartlett, Jordan and McAuliffe [5], Blanchard, Lugosi and Vayatis [7], Breiman [10], Jiang [25], Lugosi and Vayatis [30] and Zhang [48]. The purpose of this section is to extend the principle of convex risk minimization to the ranking problem studied in this paper. Our analysis also p...

11 | Moment Inequalities for U-statistics
- Adamczak
- 2006
Citation Context ...a permutation π of {1,...,m} and the goal is that π should coincide with (or at least resemble to) the permutation π for which $Y^{(\pi(1))} \ge \cdots \ge Y^{(\pi(m))}$. Given a loss function ℓ that assigns a number in [0, 1] to a pair of permutations, the ranking risk is defined as $L(r) = E\,\ell\big(r(X^{(1)}, \ldots, X^{(m)}), \pi\big)$. In this general case, natural estimates of L(r) involve mth order U-statistics. Some results of thi...

11 | U-processes indexed by Vapnik-Červonenkis classes of functions with application to asymptotics and bootstrap of U-statistics with estimated parameters.
- Arcones, Giné
- 1994
Citation Context ...oment inequality for U-processes that may be interesting on its own right. This inequality is presented in Section 6. We mention here that for VC classes one may use an inequality of Arcones and Giné [4] and its significant improvement due to Major [31]. It is well known from the theory of empirical risk minimization (see Tsybakov [43], Bartlett and Mendelson [6], Koltchinskii [26], Massart [32]), th...

9 | An estimate on the supremum of a nice class of stochastic integrals and U-statistics. Probab. Theory Related Fields
- Major
- 2006
Citation Context ...t additional work. The moment inequality of Theorem 11 should be possible to generalize by induction as all ingredients of the proof are available. In fact, the inequalities of Adamczak [1] and Major [31] are stated for general U-statistics of m variables. The key question is how the results of Section 4 can be generalized. In order to see this, one needs to understand what the analog of Assumption 4 ...

5 | Process consistency for AdaBoost (with discussion). The Annals of Statistics 32 13–29 (disc
- Jiang
- 2004
Citation Context ...eoretical advance has been made in understanding the statistical behavior of such methods, see, for example, Bartlett, Jordan and McAuliffe [5], Blanchard, Lugosi and Vayatis [7], Breiman [10], Jiang [25], Lugosi and Vayatis [30] and Zhang [48]. The purpose of this section is to extend the principle of convex risk minimization to the ranking problem studied in this paper. Our analysis also provides a th...

5 | Conditional U-statistics.
- Stute
- 1991
Citation Context ...ich aims at indicating reliability. In this paper we define a statistical framework for studying such ranking problems. The ranking problem defined here is closely related to the one studied by Stute [40, 41]. Indeed, Stute’s results imply that certain nonparametric estimates based on local U-statistics give universally consistent ranking rules. Our approach here is different. Instead of local averages, we c...

4 | Fast rates for support vector machines. Learning Theory
- Steinwart, Scovel
- 2005
Citation Context ...oximation error $\inf_{f \in F_B} A(f) - A^*$ can be guaranteed to go to zero as $B \to \infty$. For the approximation properties of such kernel classes we refer the reader to Cucker and Smale [14], Scovel and Steinwart [36], Smale and Zhou [38], Steinwart [39] etc. REMARK 7 (Fast rates). A natural question is whether the arguments of Section 4 can be extended to prove fast rates of convergence for minimizers of the conve...

4 | Ranking with unlabeled data: A first study
- Usunier, Truong, et al.
- 2005
Citation Context ...ical and machine learning literature (see, e.g., Agarwal et al. [2], Cao et al. [11], Cortes and Mohri [12], Cossock and Zhang [13], Freund, Iyer, Schapire and Singer [17], Rudin [35], Usunier et al. [44] and Vittaut and Gallinari [46]). In the ranking problem, one has to compare two different observations and decide which one is “better.” For example, in an application of document retrieval, one is c...

3 | Machine learning ranking for structured information retrieval.
- Vittaut, Gallinari
- 2006
Citation Context ...ature (see, e.g., Agarwal et al. [2], Cao et al. [11], Cortes and Mohri [12], Cossock and Zhang [13], Freund, Iyer, Schapire and Singer [17], Rudin [35], Usunier et al. [44] and Vittaut and Gallinari [46]). In the ranking problem, one has to compare two different observations and decide which one is “better.” For example, in an application of document retrieval, one is concerned with comparing documen...

2 | Using rankboost to compare retrieval systems
- Vu, Gallinari
- 2005
Citation Context ...may be interested in the more general problem of ranking m independent observations $X^{(1)}, \ldots, X^{(m)}$. The problem of ranking pairs is considerably simpler and has many practical applications (see, e.g., [12, 17, 46, 47], and the connection to the AUC detailed in the Appendix B) but ranking more than two items has also been considered in the literature (see Stute [40, 41], Cossock and Zhang [13]). In the general prob...

1 | Subset ranking using regression.
- Cossock, Zhang
- 2006
Citation Context ...the ranking problem has received increasing attention both in the statistical and machine learning literature (see, e.g., Agarwal et al. [2], Cao et al. [11], Cortes and Mohri [12], Cossock and Zhang [13], Freund, Iyer, Schapire and Singer [17], Rudin [35], Usunier et al. [44] and Vittaut and Gallinari [46]). In the ranking problem, one has to compare two different observations and decide which one is...

1 | Universally consistent conditional U-statistics.
- Stute
- 1994
Citation Context ...ich aims at indicating reliability. In this paper we define a statistical framework for studying such ranking problems. The ranking problem defined here is closely related to the one studied by Stute [40, 41]. Indeed, Stute’s results imply that certain nonparametric estimates based on local U-statistics give universally consistent ranking rules. Our approach here is different. Instead of local averages, we c...
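
Several of the contexts above refer to U-statistics of order two and to Hoeffding's exponential inequality for them. As a rough illustration only, the sketch below computes such a U-statistic by averaging a symmetric kernel over all unordered pairs and evaluates the first Hoeffding bound quoted in the context for Hoeffding (1963). It assumes the kernel takes values in [0, 1]; the function names `u_statistic`, `hoeffding_bound` and `abs_diff`, the choice of kernel, and the uniform sample are hypothetical and do not come from the cited paper.

```python
import itertools
import math
import random


def u_statistic(sample, kernel):
    """Average of kernel(x_i, x_j) over all unordered pairs i < j."""
    pairs = list(itertools.combinations(sample, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def hoeffding_bound(n, t):
    """Bound 2*exp(-2*floor(n/2)*t^2) on P{|U_n - E U_n| > t}.

    Assumption: the kernel takes values in an interval of length 1.
    """
    return 2.0 * math.exp(-2.0 * (n // 2) * t * t)


def abs_diff(x, y):
    """Illustrative symmetric kernel; lies in [0, 1] when x, y are in [0, 1]."""
    return abs(x - y)


# Hypothetical data: 200 i.i.d. uniform draws on [0, 1], for which E|X - Y| = 1/3.
rng = random.Random(0)
sample = [rng.random() for _ in range(200)]
u_n = u_statistic(sample, abs_diff)

print("U-statistic:", u_n)
print("Bound on P{|U_n - E U_n| > 0.1}:", hoeffding_bound(len(sample), 0.1))
```

The pairwise average costs O(n²) kernel evaluations, which is fine for a small illustration but would need subsampling or a specialised formula for large samples.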