Results 1–4 of 4
Multi-Class Deep Boosting
Abstract

Cited by 3 (2 self)
We present new ensemble learning algorithms for multi-class classification. Our algorithms can use as a base classifier set a family of deep decision trees or other rich or complex families and yet benefit from strong generalization guarantees. We give new data-dependent learning bounds for convex ensembles in the multi-class classification setting, expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set and the mixture weight assigned to each sub-family. These bounds are finer than existing ones both thanks to an improved dependency on the number of classes and, more crucially, by virtue of a more favorable complexity term expressed as an average of the Rademacher complexities based on the ensemble’s mixture weights. We introduce and discuss several new multi-class ensemble algorithms benefiting from these guarantees, prove positive results for the H-consistency of several of them, and report the results of experiments showing that their performance compares favorably with that of multi-class versions of AdaBoost and Logistic Regression and their L1-regularized counterparts.
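The key structural idea in the abstract — a convex combination of base classifiers drawn from sub-families of varying complexity, with the bound's complexity term averaged using the same mixture weights — can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the score matrices, mixture weights, and per-sub-family complexity values below are all made-up numbers for demonstration.

```python
import numpy as np

# Hypothetical sketch of a convex ensemble F = sum_k w_k * h_k for
# multi-class classification. Each "base classifier" h_k maps an example
# to a score vector over classes; w lies on the probability simplex.
rng = np.random.default_rng(0)
n_examples, n_classes, n_base = 5, 3, 4

# Scores from n_base hypothetical base classifiers (e.g. decision trees
# of increasing depth, one sub-family per depth).
H = rng.normal(size=(n_base, n_examples, n_classes))

# Mixture weights: nonnegative and summing to 1 (a convex combination).
w = np.array([0.4, 0.3, 0.2, 0.1])
assert w.min() >= 0 and np.isclose(w.sum(), 1.0)

# Ensemble score for each example: weighted average over base classifiers.
F = np.tensordot(w, H, axes=1)          # shape (n_examples, n_classes)
y_pred = F.argmax(axis=1)               # multi-class prediction

# The bound's complexity term averages sub-family Rademacher complexities
# with the same weights: sum_k w_k * R(H_k). Illustrative values only:
R = np.array([0.05, 0.10, 0.20, 0.40])  # deeper trees -> higher complexity
weighted_complexity = float(w @ R)
```

The point of the weighted average is that placing most of the mixture weight on low-complexity sub-families keeps the complexity term small even when very deep trees are available in the base set.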
On the Consistency of AUC Pairwise Optimization
Abstract
AUC (Area Under ROC Curve) is an important criterion widely used in diverse learning tasks. To optimize AUC, many learning approaches have been developed, most working with pairwise surrogate losses. It is therefore important to study AUC consistency based on minimizing pairwise surrogate losses. In this paper, we introduce generalized calibration for AUC optimization and prove that it is a necessary condition for AUC consistency. We then provide a sufficient condition for AUC consistency and show its usefulness in studying the consistency of various surrogate losses, as well as in the invention of new consistent losses. We further derive regret bounds for exponential and logistic losses, and present regret bounds for more general surrogate losses in the realizable setting. Finally, we prove regret bounds that disclose the equivalence between the pairwise exponential loss of AUC and the univariate exponential loss of accuracy.
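The pairwise setup the abstract relies on — empirical AUC as the fraction of correctly ranked positive/negative pairs, and a convex pairwise surrogate in place of the 0-1 pair loss — can be illustrated in a few lines. The scores and labels below are made-up numbers; the pairwise exponential loss is one of the surrogates the abstract analyzes.

```python
import numpy as np

# Illustrative scores f(x) and binary labels (1 = positive, 0 = negative).
scores = np.array([2.0, 0.5, 1.5, -1.0])
labels = np.array([1, 0, 1, 0])

pos = scores[labels == 1]
neg = scores[labels == 0]

# Empirical AUC: fraction of (positive, negative) pairs ranked correctly,
# counting ties as one half.
diffs = pos[:, None] - neg[None, :]        # f(x+) - f(x-) for every pair
auc = np.mean((diffs > 0) + 0.5 * (diffs == 0))

# Pairwise exponential surrogate: a convex upper bound on the 0-1 pair
# loss, applied to the same score differences.
pairwise_exp_loss = np.mean(np.exp(-diffs))
```

Minimizing the surrogate over all pairs is what the consistency question concerns: whether driving `pairwise_exp_loss` toward its minimum also drives the AUC toward its optimum.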
Under the supervision of , 2014
Abstract
We survey the problem of learning linear models in the binary and multiclass settings. In both cases, the goal is to find a linear model with the least probability of mistake. This problem is known to be NP-hard, and even NP-hard to learn improperly (under relevant assumptions). Nonetheless, under certain assumptions about the input, the problem admits an algorithm with worst-case polynomial time complexity. At first glance these assumptions seem to vary greatly, ranging from the realizable assumption, which entails that the labeling is deterministic and can be realized by a linear function, to simply the existence of a predictor with margin and a low error rate. However, all of these methods can be seen as generalized linear models; namely, the optimal classifier can be used to estimate the distribution of the labels given any example. On a different note, all of these methods are based on convex optimization.
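The combination the abstract highlights — the realizable assumption plus a convex-optimization method — can be sketched concretely. The example below is an illustration in that spirit, not a method from the survey: labels are generated by a hypothetical linear rule (so the data are separable), and plain gradient descent on the convex logistic loss drives the training error down; the data, step size, and iteration count are arbitrary choices.

```python
import numpy as np

# Realizable setting: labels come from a (hidden) linear rule w_true,
# so some linear classifier achieves zero error on the data.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
X = rng.normal(size=(200, 2))
y = np.sign(X @ w_true)                  # deterministic linear labeling

# Convex optimization: full-batch gradient descent on the logistic loss
# mean(log(1 + exp(-y_i * <w, x_i>))), which is convex in w.
w = np.zeros(2)
for _ in range(500):
    margins = y * (X @ w)
    # sigmoid(-margin) weights, clipped for numerical stability
    s = 1.0 / (1.0 + np.exp(np.clip(margins, -50.0, 50.0)))
    grad = -(y[:, None] * X * s[:, None]).mean(axis=0)
    w -= 0.5 * grad

# On separable data the training error should become small (often zero).
train_error = float(np.mean(np.sign(X @ w) != y))
```

The same convex-surrogate pattern underlies the other surveyed settings; what changes across them is the assumption on the input (realizability, margin, low noise) that makes the optimum of the convex problem a good classifier.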