Results 1–10 of 19
The Infinite Push: A New Support Vector Ranking Algorithm that Directly Optimizes Accuracy at the Absolute Top of the List
Abstract

Cited by 16 (1 self)
Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support vector-style algorithm, but due to the different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves l1,∞ constraints; we solve this dual problem using the recent l1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an l∞-norm extreme of the lp-norm based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to the algorithm as the ‘Infinite Push’. Experiments on real-world data sets confirm the algorithm’s focus on accuracy at the absolute top of the list.
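The contrast between the lp-norm push and its l∞ limit can be sketched numerically. The snippet below is a minimal illustration, not the paper's support-vector formulation: `lp_push_loss` aggregates per-negative hinge penalties with an lp-norm, and `infinite_push_loss` keeps only the worst-placed negative, which is what concentrates the loss on the absolute top of the list. All function names are illustrative.

```python
import numpy as np

def lp_push_loss(pos_scores, neg_scores, p):
    # Per-negative penalty: average hinge loss against all positives.
    diffs = pos_scores[:, None] - neg_scores[None, :]   # (n_pos, n_neg)
    per_neg = np.maximum(0.0, 1.0 - diffs).mean(axis=0)
    # lp-norm aggregation over negatives (the "p-norm push").
    return (per_neg ** p).mean() ** (1.0 / p)

def infinite_push_loss(pos_scores, neg_scores):
    # p -> infinity limit: only the worst-placed (highest-scoring)
    # negative contributes, focusing on the absolute top of the list.
    diffs = pos_scores[:, None] - neg_scores[None, :]
    return np.maximum(0.0, 1.0 - diffs).mean(axis=0).max()
```

As p grows, the lp loss approaches the l∞ loss from below, so tuning p trades off top-of-list emphasis against averaging over all negatives.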
On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions
Abstract

Cited by 8 (3 self)
In this paper, we study the generalization properties of online learning based stochastic methods for supervised learning problems where the loss function depends on more than one training sample (e.g., metric learning, ranking). We present a generic decoupling technique that enables us to provide Rademacher complexity-based generalization error bounds. Our bounds are in general tighter than those obtained by Wang et al. (2012) for the same problem. Using our decoupling technique, we are further able to obtain fast convergence rates for strongly convex pairwise loss functions. We are also able to analyze a class of memory-efficient online learning algorithms for pairwise learning problems that use only a bounded subset of past training samples to update the hypothesis at each step. Finally, to complement our generalization bounds, we propose a novel memory-efficient online learning algorithm for higher-order learning problems with bounded regret guarantees.
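The "bounded subset of past training samples" idea can be made concrete with a small sketch. This is an illustrative buffered pairwise learner, not the paper's algorithm: a fixed-size deque stands in for the bounded memory, and each new sample takes a hinge-subgradient step against the buffered samples of the opposite label. All names and the learning rate are assumptions.

```python
import numpy as np
from collections import deque

def online_pairwise_sgd(stream, dim, buffer_size=10, lr=0.1):
    # Online pairwise hinge-loss learner with a bounded memory buffer.
    w = np.zeros(dim)
    buf = deque(maxlen=buffer_size)      # only a bounded subset of past samples
    for x, y in stream:
        x = np.asarray(x, dtype=float)
        for xb, yb in buf:
            if y == yb:                  # pairwise loss needs opposite labels
                continue
            z = x - xb if y > yb else xb - x
            if w @ z < 1.0:              # margin violated: subgradient step
                w = w + lr * z
        buf.append((x, y))
    return w
```

The per-step cost is O(buffer_size) instead of growing with the full history, which is exactly the memory/computation trade-off the abstract analyzes.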
Two-layer Generalization Analysis for Ranking Using Rademacher Average
Abstract

Cited by 4 (2 self)
This paper is concerned with generalization analysis of learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents. Previous generalization analysis for ranking, however, has not fully considered this structure, and cannot explain how the simultaneous change of query number and document number in the training data will affect the performance of the learned ranking model. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling, i.e., the i.i.d. sampling of queries and the conditional i.i.d. sampling of documents per query. Such a sampling can better describe the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning to rank algorithms. However, it is challenging to perform such analysis, because the documents associated with different queries are not identically distributed, and the documents associated with the same query are no longer independent after being represented by features extracted from query-document matching. To tackle the challenge, we decompose the expected risk according to the two layers, and make use of the new concept of a two-layer Rademacher average. The generalization bounds we obtain are quite intuitive and are in accordance with previous empirical studies on the performance of ranking algorithms.
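The two-layer sampling assumption itself is easy to state as a generative procedure. This is a minimal simulator of that assumption (function names are illustrative): queries are drawn i.i.d., and documents are drawn conditionally i.i.d. given each query, so documents under different queries follow different distributions.

```python
import random

def two_layer_sample(sample_query, sample_doc, n_queries, m_docs, seed=0):
    # Layer 1: queries drawn i.i.d. from the query distribution.
    # Layer 2: documents drawn conditionally i.i.d. given each query.
    rng = random.Random(seed)
    data = []
    for _ in range(n_queries):
        q = sample_query(rng)
        docs = [sample_doc(q, rng) for _ in range(m_docs)]
        data.append((q, docs))
    return data
```

A bound under this model can then depend on both the query number `n_queries` and the per-query document number `m_docs`, which is the simultaneous dependence the abstract highlights.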
A Learning Theory Framework for Association Rules and Sequential Events
Abstract

Cited by 2 (2 self)
We present a framework and generalization analysis for the use of association rules in the setting of supervised learning. We are specifically interested in a sequential event prediction problem where data are revealed one by one, and the goal is to determine what will next be revealed. In the context of this problem, algorithms based on association rules have a distinct advantage over classical statistical and machine learning methods; however, to our knowledge no theoretical foundation has previously been established for using association rules in supervised learning. We present two simple algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an “adjusted confidence” measure that provides a weaker minimum support condition and has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.
An O(n log n) Cutting Plane Algorithm for Structured Output Ranking
Abstract

Cited by 1 (0 self)
In this work, we consider ranking as a training strategy for structured output prediction. Recent work has begun to explore structured output prediction in the ranking setting, but has mostly focused on the special case of bipartite preference graphs. The bipartite special case is computationally efficient, as there exists a linear-time cutting plane training strategy for the hinge-loss bound on regularized risk, but it is unclear how to feasibly extend the approach to complete preference graphs. We develop here a highly parallelizable O(n log n) algorithm for cutting plane training with complete preference graphs that is scalable to millions of samples on a single core. We explore theoretically and empirically the relationship between the slack rescaling and margin rescaling variants of the hinge-loss bound on structured losses, showing that the slack rescaling variant has better stability properties and empirical performance with no additional computational cost per cutting plane iteration. We further show generalization bounds based on uniform convergence. Finally, we demonstrate the effectiveness of the proposed family of approaches on the problem of object detection in computer vision.
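The general flavor of an O(n log n) pairwise computation, as opposed to the naive O(n²) scan over a complete preference graph, is the sorting trick. The snippet below is not the paper's cutting-plane step; it is the classic merge-sort inversion count, which evaluates a sum over all n² ordered pairs in O(n log n).

```python
def count_inversions(seq):
    # Number of out-of-order pairs (i < j with seq[i] > seq[j]),
    # computed in O(n log n) by merge sort instead of an O(n^2) scan.
    def sort(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, li = sort(a[:mid])
        right, ri = sort(a[mid:])
        merged, inv = [], li + ri
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
                inv += len(left) - i        # every remaining left item is larger
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv
    return sort(list(seq))[1]
```

Cutting-plane training over complete preference graphs relies on the same principle: sorting once lets aggregate pairwise quantities be accumulated with prefix-style updates rather than enumerating every pair.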
Stability of Multi-Task Kernel Regression Algorithms
, 2013
Abstract

Cited by 1 (1 self)
We study the stability properties of nonlinear multi-task regression in reproducing kernel Hilbert spaces with operator-valued kernels. Such kernels, a.k.a. multi-task kernels, are appropriate for learning problems with non-scalar outputs like multi-task learning and structured output prediction. We show that multi-task kernel regression algorithms are uniformly stable in the general case of infinite-dimensional output spaces. We then derive, under mild assumptions on the kernel, generalization bounds for such algorithms, and we show their consistency even with non-Hilbert-Schmidt operator-valued kernels. We demonstrate how to apply the results to various multi-task kernel regression methods such as vector-valued SVR and functional ridge regression.
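A minimal finite-dimensional sketch of operator-valued kernel regression, assuming the common separable form K(x, x') = k(x, x')·A with a Gaussian scalar kernel k and a task-similarity matrix A (the paper's setting is more general, including infinite-dimensional outputs; function names here are illustrative):

```python
import numpy as np

def multitask_krr_fit(X, Y, A, lam, gamma=1.0):
    # Separable operator-valued kernel K(x, x') = k(x, x') * A, with a
    # Gaussian scalar kernel k; A encodes output (task) similarities.
    n, T = Y.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Kx = np.exp(-gamma * sq)                       # scalar kernel matrix (n, n)
    G = np.kron(Kx, A) + lam * np.eye(n * T)       # block kernel + ridge term
    C = np.linalg.solve(G, Y.reshape(-1)).reshape(n, T)

    def predict(x):
        kvec = np.exp(-gamma * ((X - x) ** 2).sum(-1))
        return A @ (C.T @ kvec)                    # f(x) = sum_i k(x, x_i) A c_i
    return predict
```

With a small ridge term the fitted function nearly interpolates the training outputs; the stability results in the paper control how much this solution can move when one training point is replaced.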
Uniform Convergence, Stability and Learnability for Ranking Problems
 PROCEEDINGS OF THE TWENTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 2013
Abstract

Cited by 1 (1 self)
Most studies have been devoted to the design of efficient algorithms and to their evaluation and application on diverse ranking problems, whereas little work has addressed the theoretical study of ranking learnability. In this paper, we study the relation between uniform convergence, stability and learnability of ranking. In contrast to supervised learning, where learnability is equivalent to uniform convergence, we show that uniform convergence is sufficient but not necessary for ranking learnability with AERM, and we further present a sufficient condition for ranking uniform convergence with respect to the bipartite ranking loss. Given that uniform convergence is unnecessary for ranking learnability, we prove that ranking average stability is a necessary and sufficient condition for ranking learnability.
KL-based Control of the Learning Schedule for Surrogate Black-Box Optimization
Abstract
This paper investigates the control of an ML component within the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) devoted to black-box optimization. The known weakness of CMA-ES is its sample complexity, the number of evaluations of the objective function needed to approximate the global optimum. This weakness is commonly addressed through surrogate optimization: learning an estimate of the objective function, a.k.a. a surrogate model, and replacing most evaluations of the true objective function with the (inexpensive) evaluation of the surrogate model. This paper presents a principled control of the learning schedule (when to relearn the surrogate model), based on the Kullback-Leibler divergence between the current search distribution and the training distribution of the former surrogate model. The experimental validation of the proposed approach shows significant performance gains on a comprehensive set of ill-conditioned benchmark problems, compared to the best state of the art, including the quasi-Newton high-precision BFGS method. Keywords: expensive black-box optimization, evolutionary algorithms, surrogate models, Kullback-Leibler divergence, CMA-ES.
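Since the CMA-ES search distribution is a multivariate Gaussian, the drift trigger can be sketched with the closed-form Gaussian KL divergence. The helper below is illustrative, not the paper's controller, and the retraining threshold is an assumed placeholder.

```python
import numpy as np

def kl_gaussians(m0, C0, m1, C1):
    # KL( N(m0, C0) || N(m1, C1) ) in closed form for d-dimensional Gaussians.
    d = len(m0)
    C1_inv = np.linalg.inv(C1)
    diff = m1 - m0
    return 0.5 * (np.trace(C1_inv @ C0) + diff @ C1_inv @ diff - d
                  + np.log(np.linalg.det(C1) / np.linalg.det(C0)))

def should_relearn(kl, threshold=0.25):
    # Retrain the surrogate once the search distribution has drifted far
    # enough from the one it was trained under (threshold is a placeholder).
    return kl > threshold
```

The divergence is zero when the search distribution has not moved and grows with mean shift or covariance change, which is exactly the drift signal the learning schedule keys on.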
Graph-based Generalization Bounds for Learning Binary Relations
Abstract
We investigate the generalizability of learned binary relations: functions that map pairs of instances to a logical indicator. This problem has applications in numerous areas of machine learning, such as ranking, entity resolution and link prediction. Our learning framework incorporates an example labeler that, given a sequence X of n instances and a desired training size m, subsamples m pairs from X × X without replacement. The challenge in analyzing this learning scenario is that pairwise combinations of random variables are inherently dependent, which prevents us from using traditional learning-theoretic arguments. We present a unified, graph-based analysis, which allows us to analyze this dependence using well-known graph identities. We are then able to bound the generalization error of learned binary relations using Rademacher complexity and algorithmic stability. The rate of uniform convergence is partially determined by the labeler’s subsampling process. We thus examine how various assumptions about subsampling affect generalization; under a natural random subsampling process, our bounds guarantee Õ(1/√n) uniform convergence.
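The labeler's subsampling step can be sketched directly: draw m distinct pair indices out of the n² ordered pairs, so no pair is selected twice. This is an illustrative model of the uniform random subsampling case, not the authors' code.

```python
import random

def sample_pairs(X, m, seed=0):
    # The labeler model: subsample m ordered pairs from X x X
    # without replacement, by drawing m distinct pair indices.
    n = len(X)
    rng = random.Random(seed)
    picked = rng.sample(range(n * n), m)
    return [(X[k // n], X[k % n]) for k in picked]
```

The dependence the paper analyzes is visible here: the same instance of X can appear in many sampled pairs, so the m training pairs are not independent even though the pair indices are drawn uniformly.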