### SGD Algorithms based on Incomplete U-statistics: Large-Scale Minimization of Empirical Risk

Abstract

In many learning problems, ranging from clustering to ranking through metric learning, empirical estimates of the risk functional consist of an average over tuples (e.g., pairs or triplets) of observations, rather than over individual observations. In this paper, we focus on how best to implement a stochastic approximation approach to solve such risk minimization problems. We argue that in the large-scale setting, gradient estimates should be obtained by sampling tuples of data points with replacement (incomplete U-statistics) instead of sampling data points without replacement (complete U-statistics based on subsamples). We develop a theoretical framework accounting for the substantial impact of this strategy on the generalization ability of the prediction model returned by the Stochastic Gradient Descent (SGD) algorithm. This framework reveals that the method we promote achieves a much better trade-off between statistical accuracy and computational cost. Beyond the rate bound analysis, experiments on AUC maximization and metric learning provide strong empirical evidence of the superiority of the proposed approach.
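To make the contrast between the two sampling schemes concrete, here is a minimal sketch of SGD for AUC maximization with a linear scorer. Everything specific in it is an assumption for illustration, not the paper's implementation: the pairwise hinge loss, the step-size schedule $\eta_t = \eta_0/\sqrt{t}$, the synthetic data, and the hypothetical helper names (`sample_incomplete`, `sample_complete_subsample`, `sgd_auc`). For a budget of 16 pairs per step, the incomplete scheme draws 16 (positive, negative) pairs with replacement from all pairs, while the complete scheme draws 4 positives and 4 negatives without replacement and uses all $4 \times 4$ pairs among them.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_incomplete(pos, neg, budget, rng):
    """Incomplete U-statistic: draw `budget` (positive, negative) pairs uniformly
    *with replacement* from all len(pos) * len(neg) pairs."""
    return [(rng.choice(pos), rng.choice(neg)) for _ in range(budget)]


def sample_complete_subsample(pos, neg, m_pos, m_neg, rng):
    """Complete U-statistic on a subsample: draw points *without replacement*,
    then keep every pair among them (m_pos * m_neg pairs)."""
    sub_pos = rng.sample(pos, m_pos)
    sub_neg = rng.sample(neg, m_neg)
    return [(p, q) for p in sub_pos for q in sub_neg]


def pairwise_hinge_grad(w, pairs):
    """Average gradient of max(0, 1 - w.(x_pos - x_neg)) over the given pairs."""
    grad = [0.0] * len(w)
    for x_pos, x_neg in pairs:
        diff = [a - b for a, b in zip(x_pos, x_neg)]
        if dot(w, diff) < 1.0:  # hinge active: gradient of this term is -diff
            for j, d in enumerate(diff):
                grad[j] -= d
    return [g / len(pairs) for g in grad]


def sgd_auc(pos, neg, steps=200, budget=16, lr0=0.5, mode="incomplete", seed=0):
    """SGD on the pairwise hinge risk; `mode` selects how pairs are sampled."""
    rng = random.Random(seed)
    side = int(budget ** 0.5)  # per-class subsample size, so side * side <= budget pairs
    w = [0.0] * len(pos[0])
    for t in range(1, steps + 1):
        if mode == "incomplete":
            pairs = sample_incomplete(pos, neg, budget, rng)
        else:
            pairs = sample_complete_subsample(pos, neg, side, side, rng)
        g = pairwise_hinge_grad(w, pairs)
        lr = lr0 / t ** 0.5  # decreasing step size (assumed schedule)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w


def empirical_auc(w, pos, neg):
    """Complete two-sample U-statistic: fraction of correctly ordered pairs."""
    hits = sum(dot(w, p) > dot(w, q) for p in pos for q in neg)
    return hits / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic, well-separated classes (illustration only)
    pos = [[rng.gauss(1, 0.5), rng.gauss(1, 0.5)] for _ in range(100)]
    neg = [[rng.gauss(-1, 0.5), rng.gauss(-1, 0.5)] for _ in range(100)]
    for mode in ("incomplete", "complete"):
        w = sgd_auc(pos, neg, mode=mode)
        print(mode, empirical_auc(w, pos, neg))
```

Both modes evaluate the same number of pair gradients per step when `budget` is a perfect square. The argument in the abstract is that, at equal cost, spreading the sampled pairs over the whole dataset (with replacement) gives a better accuracy/cost trade-off than concentrating them on a few subsampled points.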

### Extending Gossip Algorithms to Distributed Estimation of U-Statistics

Abstract

Efficient and robust algorithms for decentralized estimation in networks are essential to many distributed systems. Whereas distributed estimation of sample mean statistics has received a good deal of attention, computation of U-statistics, which relies on more expensive averaging over pairs of observations, is a less investigated area. Yet such data functionals are essential to describe global properties of a statistical population, with important examples including the Area Under the Curve (AUC), the empirical variance, the Gini mean difference and the within-cluster point scatter. This paper proposes new synchronous and asynchronous randomized gossip algorithms that simultaneously propagate data across the network and maintain local estimates of the U-statistic of interest. We establish convergence rate bounds of $O(1/t)$ and $O(\log t/t)$ for the synchronous and asynchronous cases respectively, where $t$ is the number of iterations, with explicit data- and network-dependent terms. Beyond favorable comparisons in terms of rate analysis, numerical experiments provide empirical evidence that the proposed algorithms outperform the previously introduced approach.
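The mechanism described above, with data propagating across the network while nodes maintain local estimates, can be illustrated with a simplified synchronous sketch. The update rules are assumptions in the spirit of the abstract, not a transcription of the paper's algorithm: at every iteration each node $k$ folds $h(x_k, y_k)$ into its running estimate $z_k$, then one random edge $(i, j)$ averages $z_i$ and $z_j$ and swaps the auxiliary observations $y_i$ and $y_j$. The names `gossip_u_statistic`, `pairwise_average` and `gini_mean_difference_kernel` are hypothetical. The target is the average of $h$ over all $n^2$ ordered pairs, diagonal included; for kernels with $h(x, x) = 0$, such as the Gini mean difference kernel $h(x, x') = |x - x'|$, multiplying it by $n/(n-1)$ gives the usual U-statistic $\frac{2}{n(n-1)}\sum_{i<j} h(x_i, x_j)$.

```python
import random


def gossip_u_statistic(xs, edges, h, iterations, seed=0):
    """Simplified synchronous gossip sketch estimating (1/n^2) * sum_{i,j} h(x_i, x_j).

    Node k holds its own observation xs[k], an auxiliary observation y[k]
    that travels through the network, and a local estimate z[k].
    `edges` is a list of (i, j) node pairs defining a connected graph.
    """
    rng = random.Random(seed)
    n = len(xs)
    y = list(xs)   # auxiliary observations start as each node's own data
    z = [0.0] * n  # local estimates
    for t in range(1, iterations + 1):
        # 1) every node folds the pair (own, auxiliary) into its running average
        for k in range(n):
            z[k] += (h(xs[k], y[k]) - z[k]) / t
        # 2) one random edge wakes up: its endpoints average their estimates...
        i, j = rng.choice(edges)
        z[i] = z[j] = (z[i] + z[j]) / 2
        # ...and swap auxiliary observations, so data keeps propagating
        y[i], y[j] = y[j], y[i]
    return z


def pairwise_average(xs, h):
    """Centralized reference: mean of h over all n^2 ordered pairs."""
    n = len(xs)
    return sum(h(a, b) for a in xs for b in xs) / n ** 2


def gini_mean_difference_kernel(a, b):
    return abs(a - b)


if __name__ == "__main__":
    xs = [float(v) for v in range(10)]
    complete_graph = [(i, j) for i in range(10) for j in range(i + 1, 10)]
    est = gossip_u_statistic(xs, complete_graph, gini_mean_difference_kernel, 20000)
    print(pairwise_average(xs, gini_mean_difference_kernel), min(est), max(est))
```

Edge averaging preserves the network mean of the estimates, while the swaps make each auxiliary observation wander over the whole network. On sparser graphs such as a ring, both effects propagate information more slowly, so more iterations are needed for the same accuracy.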

### Maximal Deviations of Incomplete U-statistics with Applications to Empirical Risk Sampling

2013

Abstract

The goal of this paper is to extend the Empirical Risk Minimization (ERM) paradigm, from a practical perspective, to the situation where a natural estimate of the risk takes the form of a $K$-sample U-statistic, as is the case in the $K$-partite ranking problem, for instance. Indeed, the numerical computation of the empirical risk is hardly feasible, if feasible at all, even for moderate sample sizes. Precisely, it involves averaging $O(n^{d_1+\cdots+d_K})$ terms when considering a U-statistic of degrees $(d_1, \ldots, d_K)$ based on samples of sizes proportional to $n$. We propose here to consider a drastically simpler Monte-Carlo version of the empirical risk based on only $O(n)$ terms, which can be viewed as an incomplete generalized U-statistic, and prove that, remarkably, the approximation stage does not damage the ERM procedure and yields a learning rate of order $O_{\mathbb{P}}(1/\sqrt{n})$. Beyond a theoretical analysis guaranteeing the validity of this approach, numerical experiments are displayed for illustrative purposes.
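The gap between the complete and incomplete estimators is easy to see in code. The sketch below (helper names hypothetical) computes a generalized $K$-sample U-statistic in two ways: exactly, by enumerating every choice of a $d_k$-subset from each sample $k$ (that is, $\prod_k \binom{n_k}{d_k}$ kernel evaluations), and by the Monte-Carlo route of averaging the kernel over `budget` tuples drawn with replacement. The kernel is assumed symmetric within each block. The example kernel, a volume-under-the-ROC-surface indicator for 3-partite ranking with degrees $(1, 1, 1)$, is an illustrative choice, not taken from the paper's experiments.

```python
import itertools
import random


def complete_generalized_u(samples, degrees, kernel):
    """Exact value: average kernel over every choice of a d_k-subset from each sample k."""
    pools = [itertools.combinations(s, d) for s, d in zip(samples, degrees)]
    total, count = 0.0, 0
    for blocks in itertools.product(*pools):
        total += kernel(*blocks)
        count += 1
    return total / count


def incomplete_generalized_u(samples, degrees, kernel, budget, seed=0):
    """Monte-Carlo version: average kernel over `budget` tuples drawn with replacement;
    each tuple is a uniformly random d_k-subset from every sample k."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(budget):
        blocks = [tuple(rng.sample(s, d)) for s, d in zip(samples, degrees)]
        total += kernel(*blocks)
    return total / budget


def vus_kernel(a, b, c):
    """Degree (1, 1, 1) kernel: 1 if the three scores are correctly ordered."""
    return 1.0 if a[0] < b[0] < c[0] else 0.0


if __name__ == "__main__":
    rng = random.Random(3)
    # Scores of three ordered classes (synthetic, for illustration only)
    samples = [[rng.gauss(mu, 1.0) for _ in range(30)] for mu in (0.0, 1.0, 2.0)]
    print(complete_generalized_u(samples, (1, 1, 1), vus_kernel))               # 27,000 kernel calls
    print(incomplete_generalized_u(samples, (1, 1, 1), vus_kernel, budget=90))  # 90 kernel calls
```

With `budget` proportional to $n$, the incomplete version costs $O(n)$ kernel calls instead of $O(n^{d_1+\cdots+d_K})$; conditionally on the data it is an unbiased estimate of the complete statistic, and the abstract's result is that minimizing it still yields an $O_{\mathbb{P}}(1/\sqrt{n})$ learning rate.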