Results 1–10 of 95
Maximum-Margin Matrix Factorization
Advances in Neural Information Processing Systems 17, 2005
Cited by 264 (21 self)
Abstract: We present a novel approach to collaborative prediction, using low-norm instead of low-rank factorizations. The approach is inspired by, and has strong connections to, large-margin linear discrimination. We show how to learn low-norm factorizations by solving a semidefinite program, and discuss generalization error bounds for them.
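As an illustration of the low-norm idea, the factorization X = UVᵀ can be regularized through the Frobenius norms of its factors, which upper-bounds the trace norm of UVᵀ. The sketch below is not the paper's semidefinite-programming formulation: it swaps the large-margin (hinge-style) objective for a squared loss on observed entries and uses plain gradient steps, purely to make the low-norm penalty concrete (`lownorm_step` and all parameter names are hypothetical):

```python
import numpy as np

def lownorm_step(U, V, R, mask, lam, lr=0.01):
    """One gradient step on  sum_observed (UV^T - R)^2 + (lam/2)(||U||_F^2 + ||V||_F^2).

    Penalizing the factor norms (rather than the rank) upper-bounds the
    trace norm of UV^T, which is the 'low-norm' idea of the paper.
    """
    E = mask * (U @ V.T - R)          # residual on observed entries only
    gU = 2 * E @ V + lam * U          # gradient w.r.t. U
    gV = 2 * E.T @ U + lam * V        # gradient w.r.t. V
    return U - lr * gU, V - lr * gV
```

With a small step size, repeated calls shrink both the reconstruction error on observed entries and the factor norms.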
Gaussian processes for ordinal regression
Journal of Machine Learning Research, 2004
Cited by 116 (4 self)
Abstract: We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.
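A minimal sketch of the threshold likelihood described above, assuming a probit (normal-CDF) noise model with ordered thresholds `b` and noise scale `sigma`; the function names and the padding of the threshold list with ±∞ are illustrative choices, not taken from the paper:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordinal_likelihood(f, k, b, sigma=1.0):
    """P(y = k | f): probability that the noisy latent value f + noise falls
    between thresholds b[k-2] and b[k-1] (1-indexed categories).

    Padding with -inf/+inf makes the K intervals cover the real line, so the
    probabilities over k = 1..K sum to one; with K = 2 this reduces to the
    ordinary probit likelihood.
    """
    edges = [-math.inf] + list(b) + [math.inf]
    return phi((edges[k] - f) / sigma) - phi((edges[k - 1] - f) / sigma)
```

For example, with thresholds [-1, 1] and latent value f = 0, the middle category gets probability Φ(1) − Φ(−1) ≈ 0.683.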
New approaches to support vector ordinal regression
In ICML ’05: Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 73 (3 self)
Abstract: In this paper, we propose two new support vector approaches for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. The size of these optimization problems is linear in the number of training samples. The SMO algorithm is adapted for the resulting optimization problems; it is extremely easy to implement and scales efficiently as a quadratic function of the number of examples. The results of numerical experiments on benchmark datasets verify the usefulness of these approaches.
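The prediction rule shared by both approaches, a set of ordered thresholds cutting the real line into K ordinal segments, can be sketched as follows (`predict_rank` is a hypothetical helper, not the authors' code):

```python
import numpy as np

def predict_rank(scores, thresholds):
    """Map real-valued scores to ordinal ranks 1..K using K-1 ordered thresholds.

    A point's rank is one plus the number of thresholds its score exceeds,
    i.e. rank k iff thresholds[k-2] < score <= thresholds[k-1].
    """
    scores = np.asarray(scores, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    assert np.all(np.diff(thresholds) >= 0), "thresholds must be ordered"
    # compare every score against every threshold and count exceedances
    return 1 + np.sum(scores[:, None] > thresholds[None, :], axis=1)

ranks = predict_rank([-2.0, 0.1, 3.5], thresholds=[-1.0, 0.0, 2.0])
# -> array([1, 3, 4])
```

The ordering guarantee in the paper matters precisely because this rule is only well defined when the thresholds are monotone.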
Learning with Matrix Factorization
2004
Cited by 71 (6 self)
Abstract: Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent …
An efficient method for gradient-based adaptation of hyperparameters in SVM models
2007
Cited by 40 (3 self)
Abstract: We consider the task of tuning hyperparameters in SVM models based on minimizing a smooth performance validation function, e.g., smoothed k-fold cross-validation error, using nonlinear optimization techniques. The key computation in this approach is that of the gradient of the validation function with respect to the hyperparameters. We show that for large-scale problems involving a wide choice of kernel-based models and validation functions, this computation can be done very efficiently, often within just a fraction of the training time. Empirical results show that a near-optimal set of hyperparameters can be identified by our approach with very few training rounds and gradient computations.
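To make the "smooth performance validation function" concrete, here is one common smoothing: each 0/1 validation mistake is replaced by a sigmoid of the margin y·f(x), yielding a differentiable objective. The steepness `beta` is an assumed illustrative parameter, and the paper's actual contribution, computing the exact gradient of such an objective with respect to the hyperparameters cheaply via the chain rule through training, is not reproduced here:

```python
import numpy as np

def smoothed_val_error(margins, beta=5.0):
    """Differentiable surrogate for validation error.

    Each 0/1 mistake indicator [y * f(x) < 0] is replaced by the sigmoid
    1 / (1 + exp(beta * margin)), which tends to 1 for large negative
    margins and to 0 for large positive margins, so the mean approximates
    the error rate while remaining smooth in everything that shapes f.
    """
    margins = np.asarray(margins, dtype=float)
    return float(np.mean(1.0 / (1.0 + np.exp(beta * margins))))
```

A margin of exactly 0 contributes 0.5, the usual tie convention for a smoothed step function.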
Support vector ordinal regression
Neural Computation, 2007
Cited by 39 (2 self)
Abstract: In this paper, we propose two new support vector approaches for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. The size of these optimization problems is linear in the number of training samples. The SMO algorithm is adapted for the resulting optimization problems; it is extremely easy to implement and scales efficiently as a quadratic function of the number of examples. The results of numerical experiments on some benchmark and real-world data sets, including applications of ordinal regression to information retrieval and collaborative filtering, verify the usefulness of these approaches.
Ordinal regression by extended binary classification
2007
Cited by 38 (4 self)
Abstract: We present a reduction framework from ordinal regression to binary classification based on extended examples. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranking rule from the binary classifier. A weighted 0/1 loss of the binary classifier would then bound the mislabeling cost of the ranking rule. Our framework allows us not only to design good ordinal regression algorithms based on well-tuned binary classification approaches, but also to derive new generalization bounds for ordinal regression from known bounds for binary classification. In addition, our framework unifies many existing ordinal regression algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms, which demonstrates the usefulness of our framework.
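The first and third steps of the reduction can be sketched directly from the description above; `extend_examples` and `rank_from_classifier` are hypothetical names, and the unweighted ±1 labels omit the per-threshold costs that give the framework its weighted 0/1 loss bound:

```python
def extend_examples(X, y, K):
    """Turn each ordinal example (x, rank) with rank in {1..K} into K-1
    binary examples asking "is the rank of x greater than k?" for k = 1..K-1.

    The extended input is x paired with the threshold index k, so a single
    binary classifier can answer all K-1 questions.
    """
    ext = []
    for x, rank in zip(X, y):
        for k in range(1, K):
            ext.append(((x, k), 1 if rank > k else -1))
    return ext

def rank_from_classifier(h, x, K):
    """Ranking rule: one plus the number of 'greater than k' questions the
    binary classifier h answers affirmatively."""
    return 1 + sum(1 for k in range(1, K) if h(x, k) > 0)
```

When the classifier's answers are monotone in k (yes for all small k, no afterwards), this rule recovers a consistent rank; the paper's construction also handles non-monotone answer patterns gracefully.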
On the Consistency of Ranking Algorithms
Cited by 35 (1 self)
Abstract: We present a theoretical analysis of supervised ranking, providing necessary and sufficient conditions for the asymptotic consistency of algorithms based on minimizing a surrogate loss function. We show that many commonly used surrogate losses are inconsistent; surprisingly, we show inconsistency even in low-noise settings. We present a new value-regularized linear loss, establish its consistency under reasonable assumptions on noise, and show that it outperforms conventional ranking losses in a collaborative filtering experiment.

The goal in ranking is to order a set of inputs in accordance with the preferences of an individual or a population. In this paper we consider a general formulation of the supervised ranking problem in which each training example consists of a query q, a set of inputs x, sometimes called results, and a weighted graph G representing preferences over the results. The learning task is to discover a function that provides a query-specific ordering of the inputs that best respects the observed preferences. This query-indexed setting is natural for tasks like web search, in which a different ranking is needed for each query. Following existing literature, we assume the existence of a scoring function f(x, q) that gives a score to each result in x; the scores are sorted to produce a ranking (Herbrich et al., 2000; Freund et al., 2003). We assume simply that the observed preference graph G is a directed acyclic graph (DAG). Finally, we cast our work in a decision-theoretic framework in which ranking procedures are evaluated via a loss function L(f(x, q), G).
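One natural instantiation of the loss L(f(x, q), G) over a preference DAG is weighted pairwise misranking; the half-credit convention for ties below is an assumption, not taken from the paper:

```python
def misranking_loss(scores, edges):
    """Loss of a scoring over a preference DAG.

    Each directed edge (i, j) with weight w says result i is preferred to
    result j; the loss charges w whenever the scores put j above i, and
    w/2 when they tie.
    """
    loss = 0.0
    for i, j, w in edges:
        if scores[i] < scores[j]:
            loss += w
        elif scores[i] == scores[j]:
            loss += w / 2.0
    return loss
```

Surrogate losses of the kind the paper analyzes replace the discontinuous comparisons above with convex functions of the score differences, which is exactly where consistency questions arise.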
Magnitude-preserving ranking algorithms
2007
Cited by 34 (3 self)
Abstract: This paper studies the learning problem of ranking when one wishes not just to accurately predict pairwise ordering but also to preserve the magnitude of the preferences or the difference between ratings, a problem motivated by its key importance in the design of search engines, movie recommendation, and other similar ranking systems. We describe and analyze several algorithms for this problem and give stability bounds for their generalization error, extending previously known stability results to non-bipartite ranking and magnitude-preserving algorithms. We also report the results of experiments comparing these algorithms on several datasets and compare these results with those obtained using an algorithm minimizing the pairwise misranking error and standard regression.
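A sketch of what a magnitude-preserving pairwise loss looks like: the penalty depends on how far the predicted score difference is from the observed rating difference, not merely on whether the ordering is correct. The exact losses analyzed in the paper may differ; this squared form and the function name are illustrative:

```python
def magnitude_preserving_loss(scores, ratings, pairs):
    """Mean pairwise squared loss that penalizes the mismatch between
    predicted and observed rating differences, so the magnitude of each
    preference (not just its sign) is preserved."""
    if not pairs:
        return 0.0
    return sum(((scores[i] - scores[j]) - (ratings[i] - ratings[j])) ** 2
               for i, j in pairs) / len(pairs)
```

A plain pairwise misranking loss would be zero for any correctly ordered pair; here a pair ordered correctly but with the wrong gap still incurs a penalty.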
Multi-task learning via conic programming
In Advances in Neural Information Processing Systems 20, 2007
Cited by 19 (1 self)
Abstract: When we have several related tasks, solving them simultaneously has been shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as uniformly related to each other, and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems. Our method makes use of a task network, which describes the relation structure among tasks. This allows us to deal with intricate relation structures in a systematic way. Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second-order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated through simulations with protein superfamily classification and ordinal regression problems.
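The "local relatedness" idea can be sketched as a graph-based regularizer that ties together the weight vectors of tasks joined by an edge of the task network; the paper's actual formulation is an SVM problem cast as a second-order cone program, which this fragment does not reproduce, and `rho` is an assumed coupling strength:

```python
import numpy as np

def local_relatedness_penalty(W, task_edges, rho=1.0):
    """Graph-based multi-task regularizer.

    W[t] is the weight vector of task t; task_edges lists pairs (s, t) of
    related tasks. Penalizing ||w_s - w_t||^2 on each edge pushes related
    tasks, and only those, toward similar solutions, instead of shrinking
    all tasks toward a single global mean.
    """
    return rho * sum(float(np.sum((W[s] - W[t]) ** 2)) for s, t in task_edges)
```

Tasks with no path between them in the network are left entirely uncoupled, which is the property the abstract contrasts with globally controlled relatedness.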