Results 11–20 of 213
Gaussian processes for ordinal regression
Journal of Machine Learning Research, 2004
Cited by 115 (4 self)
Abstract: We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.
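The threshold likelihood this abstract describes (a generalization of the probit function) can be sketched as follows; the function names and the noise-free simplification are illustrative assumptions, not the paper's implementation:

```python
from math import erf, sqrt, inf

def Phi(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def ordinal_probit_likelihood(f, thresholds):
    """P(y = j | f) = Phi(b_j - f) - Phi(b_{j-1} - f),
    with b_0 = -inf and b_r = +inf for r ordinal levels."""
    b = [-inf] + list(thresholds) + [inf]
    return [Phi(b[j] - f) - Phi(b[j - 1] - f) for j in range(1, len(b))]

# Three ordinal levels split by thresholds b_1 = -1, b_2 = 1.
probs = ordinal_probit_likelihood(0.0, [-1.0, 1.0])
print(probs)       # one probability per ordinal level
print(sum(probs))  # the levels partition the real line, so this is 1
```

The thresholds slice the latent function value into contiguous intervals, one per ordinal level, which is what makes the model respect the ordering of the labels.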
Unifying collaborative and content-based filtering
In ICML, 2004
Cited by 107 (2 self)
Abstract: Collaborative and content-based filtering are two paradigms that have been applied in the context of recommender systems and user preference prediction. This paper proposes a novel, unified approach that systematically integrates all available training information, such as past user-item ratings as well as attributes of items or users, to learn a prediction function. The key ingredient of our method is the design of a suitable kernel or similarity function between user-item pairs that allows simultaneous generalization across the user and item dimensions. We propose an online algorithm (JRank) that generalizes perceptron learning. Experimental results on the EachMovie data set show significant improvements over standard approaches.
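A kernel between user-item pairs of the kind the abstract mentions can be sketched as a product of a user kernel and an item kernel; the RBF choice and feature representation here are assumptions for illustration, not the paper's construction:

```python
from math import exp

def rbf(x, y, gamma=1.0):
    """RBF kernel on feature vectors (lists of floats)."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def pair_kernel(user_a, item_a, user_b, item_b):
    """Similarity between user-item pairs as a product of a user kernel
    and an item kernel, so that generalization happens jointly across
    the user and item dimensions."""
    return rbf(user_a, user_b) * rbf(item_a, item_b)

# Identical pairs have kernel value 1; similarity decays as either
# the users or the items move apart.
k = pair_kernel([0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(k)  # 1.0
```

A product of positive-definite kernels is itself positive definite, which is what lets a single kernel machine rank over the joint user-item space.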
Log-Linear Models for Label Ranking
2003
Cited by 107 (5 self)
Abstract: Label ranking is the task of inferring a total order over a predefined set of labels for each given instance. We present a general framework for batch learning of label ranking functions from supervised data. We assume that each instance in the training data is associated with a list of preferences over the label set; however, we do not assume that this list is either complete or consistent. This enables us to accommodate a variety of ranking problems. In contrast to the general form of the supervision, our goal is to learn a ranking function that induces a total order over the entire set of labels. Special cases of our setting are multi-label categorization and hierarchical classification. We present a general boosting-based learning algorithm for the label ranking problem and prove a lower bound on the progress of each boosting iteration. The applicability of our approach is demonstrated with a set of experiments on a large-scale text corpus.
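The prediction side of label ranking, inducing a total order over all labels from per-label scores, can be sketched minimally; the linear scorers and label names are hypothetical (the paper learns its scoring functions by boosting):

```python
def rank_labels(x, weights):
    """Score each label with its own linear function of the instance
    and return the labels sorted into a total order, best first."""
    scores = {label: sum(w_i * x_i for w_i, x_i in zip(w, x))
              for label, w in weights.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-label weight vectors over a 2-feature instance.
weights = {"sports": [1.0, 0.0], "politics": [0.0, 1.0], "tech": [0.5, 0.5]}
print(rank_labels([2.0, 1.0], weights))  # ['sports', 'tech', 'politics']
```

Even when the training preferences are partial or inconsistent, the learned scores always yield a total order at prediction time, which is the contrast the abstract draws.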
Ranking with large margin principle: Two approaches
In Proceedings of Advances in Neural Information Processing Systems, 2002
Cited by 94 (0 self)
Abstract: We discuss the problem of ranking instances using a “large margin” principle. We introduce two main approaches: the first is the “fixed margin” policy, in which the margin of the closest neighboring classes is maximized, which turns out to be a direct generalization of SVM to ranking learning. The second approach allows for different margins, where the sum of margins is maximized. This approach is shown to reduce to SVM in the two-class case. Both approaches yield optimization problems whose size is linear in the total number of training examples. Experiments performed on visual classification and collaborative filtering show that both approaches outperform existing ordinal regression algorithms applied to ranking and multiclass SVM applied to general multiclass classification.
Multiple aspect ranking using the good grief algorithm
In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2007
Cited by 83 (9 self)
Abstract: We address the problem of analyzing multiple related opinions in a text. For instance, in a restaurant review such opinions may include food, ambience, and service. We formulate this task as a multiple aspect ranking problem, where the goal is to produce a set of numerical scores, one for each aspect. We present an algorithm that jointly learns ranking models for individual aspects by modeling the dependencies between assigned ranks. This algorithm guides the prediction of individual rankers by analyzing meta-relations between opinions, such as agreement and contrast. We prove that our agreement-based joint model is more expressive than individual ranking models. Our empirical results further confirm the strength of the model: the algorithm provides significant improvement over both individual rankers and a state-of-the-art joint ranking model.
Discriminative reranking for machine translation
In HLT-NAACL, 2004
Cited by 79 (1 self)
Abstract: This paper describes the application of discriminative reranking techniques to the problem of machine translation. For each sentence in the source language, we obtain, from a baseline statistical machine translation system, a ranked n-best list of candidate translations in the target language. We introduce two novel perceptron-inspired reranking algorithms that improve on the quality of machine translation over the baseline system, based on evaluation using the BLEU metric. We provide experimental results on the NIST 2003 Chinese-English large data track evaluation. We also provide theoretical analysis of our algorithms and experiments that verify that our algorithms provide state-of-the-art performance in machine translation.
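The general shape of perceptron-style n-best reranking can be sketched as follows; this is a generic sketch of the technique, not either of the paper's two algorithms, and the feature vectors and oracle selection (e.g. the highest-BLEU candidate) are assumptions:

```python
def rerank(nbest_feats, w):
    """Return the index of the candidate whose feature vector has the
    highest dot product with the weight vector."""
    return max(range(len(nbest_feats)),
               key=lambda i: sum(wi * fi for wi, fi in zip(w, nbest_feats[i])))

def perceptron_update(nbest_feats, oracle_idx, w, lr=1.0):
    """If the top-ranked candidate is not the oracle translation, move
    the weights toward the oracle's features and away from the wrongly
    preferred candidate's features."""
    pred = rerank(nbest_feats, w)
    if pred != oracle_idx:
        w = [wi + lr * (o - p) for wi, o, p in
             zip(w, nbest_feats[oracle_idx], nbest_feats[pred])]
    return w

# Two candidates with two features each; candidate 1 is the oracle.
feats = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 0.0]                       # initially prefers candidate 0
w = perceptron_update(feats, oracle_idx=1, w=w)
print(rerank(feats, w))              # after the update: 1
```

The baseline system's own score can simply be one more feature, so reranking can only reorder the list it is given, never invent new translations.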
New approaches to support vector ordinal regression
In ICML ’05: Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 73 (3 self)
Abstract: In this paper, we propose two new support vector approaches for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. The size of these optimization problems is linear in the number of training samples. The SMO algorithm is adapted for the resulting optimization problems; it is extremely easy to implement and scales efficiently as a quadratic function of the number of examples. The results of numerical experiments on benchmark datasets verify the usefulness of these approaches.
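The prediction rule shared by threshold-based ordinal regression models of this kind can be sketched in a few lines; the threshold values below are hypothetical, and the learning of the weight vector and thresholds is the part the paper's optimization problems solve:

```python
def predict_rank(score, thresholds):
    """With ordered thresholds b_1 <= ... <= b_{r-1} defining parallel
    decision boundaries on the latent score w.x, the predicted ordinal
    scale is one plus the number of thresholds the score exceeds."""
    return 1 + sum(score > b for b in thresholds)

thresholds = [-1.0, 0.5, 2.0]          # ordered, as the approaches guarantee
print(predict_rank(-2.0, thresholds))  # 1
print(predict_rank(1.0, thresholds))   # 3
```

The guarantee that thresholds come out properly ordered matters precisely because this rule is only well defined when the boundaries do not cross.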
Learning with Matrix Factorization
2004
Cited by 71 (6 self)
Abstract: Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent …
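The core idea, approximating a data matrix by a product of two simpler factors, can be sketched with a rank-one alternating least squares loop; this is a minimal illustration of factorization in general, not any specific model from the work:

```python
def rank_one_factor(M, iters=50):
    """Approximate M (a list of rows) by an outer product u v^T using
    alternating least squares: fix v and solve for u, then fix u and
    solve for v, repeating until the factors settle."""
    m, n = len(M), len(M[0])
    v = [1.0] * n
    for _ in range(iters):
        vv = sum(x * x for x in v)
        u = [sum(M[i][j] * v[j] for j in range(n)) / vv for i in range(m)]
        uu = sum(x * x for x in u)
        v = [sum(M[i][j] * u[i] for i in range(m)) / uu for j in range(n)]
    return u, v

# A matrix that is exactly rank one is recovered exactly.
M = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
u, v = rank_one_factor(M)
approx = [[u[i] * v[j] for j in range(2)] for i in range(3)]
print(approx)  # close to M
```

Stacking several such factors gives the low-rank models (Factor Analysis, PCA and their relatives) that the abstract refers to.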
A Family of Additive Online Algorithms for Category Ranking
Journal of Machine Learning Research, 2003
Cited by 68 (0 self)
Abstract: We describe a new family of topic-ranking algorithms for multi-labeled documents. The motivation for the algorithms stems from recent advances in online learning algorithms. The algorithms are simple to implement and are also time and memory efficient. We provide a unified analysis of the family of algorithms in the mistake bound model. We then discuss experiments with the proposed family of topic-ranking algorithms on the Reuters-21578 corpus and the new corpus released by Reuters in 2000. On both corpora, the algorithms we present achieve state-of-the-art results and outperform topic-ranking adaptations of Rocchio's algorithm and of the Perceptron algorithm.
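One simple member of such a family of additive online updates can be sketched as follows; the uniform update rule, topic names, and learning rate here are illustrative assumptions rather than the paper's exact algorithms:

```python
def scores(doc, prototypes):
    """Dot-product score of the document against each topic prototype."""
    return {t: sum(w * x for w, x in zip(p, doc)) for t, p in prototypes.items()}

def additive_update(doc, relevant, prototypes, lr=1.0):
    """For each pair (r, s) with r a relevant topic and s not, if r is
    not scored strictly above s, add the document to r's prototype and
    subtract it from s's -- a simplified uniform additive update."""
    s = scores(doc, prototypes)
    for r in relevant:
        for t in prototypes:
            if t not in relevant and s[r] <= s[t]:
                prototypes[r] = [w + lr * x for w, x in zip(prototypes[r], doc)]
                prototypes[t] = [w - lr * x for w, x in zip(prototypes[t], doc)]
    return prototypes

protos = {"grain": [0.0, 0.0], "trade": [0.0, 0.0]}
protos = additive_update([1.0, 2.0], relevant={"grain"}, prototypes=protos)
s = scores([1.0, 2.0], protos)
print(s["grain"] > s["trade"])  # True
```

Because every update only adds or subtracts the current document vector, the algorithms stay time and memory efficient, which matches the property the abstract emphasizes.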
Linear discriminant model for information retrieval
In Proceedings of SIGIR 2005
Cited by 62 (16 self)
Abstract: This paper presents a new discriminative model for information retrieval (IR), referred to as the linear discriminant model (LDM), which provides a flexible framework to incorporate arbitrary features. LDM differs from most existing models in that it takes into account a variety of linguistic features derived from the component models of the HMM widely used in language modeling approaches to IR. Therefore, LDM is a means of melding discriminative and generative models for IR. We present two parameter learning algorithms for LDM. One optimizes the average precision (AP) directly using an iterative procedure. The other is a perceptron-based algorithm that minimizes the number of discordant document pairs in a ranked list. The effectiveness of our approach has been evaluated on the task of ad hoc retrieval using six English and Chinese TREC test sets. Results show that (1) on most test sets, LDM significantly outperforms the state-of-the-art language modeling approaches and the classical probabilistic retrieval model; (2) it is more appropriate to train LDM using a measure of AP rather than likelihood if the IR system is graded on AP; and (3) linguistic features (e.g., phrases and dependencies) are effective for IR if they are incorporated properly.
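The second learning algorithm, a perceptron that minimizes discordant document pairs, can be sketched generically; the features, relevance labels, learning rate, and epoch count below are assumptions, not the paper's configuration:

```python
def discordant_pairs(docs, w):
    """Pairs (i, j) where document i should rank above document j
    (it has the higher relevance label) but does not score higher
    under weights w."""
    score = lambda f: sum(wi * fi for wi, fi in zip(w, f))
    return [(i, j) for i, (fi, yi) in enumerate(docs)
                   for j, (fj, yj) in enumerate(docs)
                   if yi > yj and score(fi) <= score(fj)]

def pairwise_perceptron(docs, w, epochs=10, lr=0.1):
    """Repeatedly push w toward the more relevant document of each
    discordant pair, reducing the number of discordant pairs."""
    for _ in range(epochs):
        for i, j in discordant_pairs(docs, w):
            fi, fj = docs[i][0], docs[j][0]
            w = [wk + lr * (a - b) for wk, a, b in zip(w, fi, fj)]
    return w

# (features, relevance label) for three documents of one query.
docs = [([1.0, 0.0], 2), ([0.0, 1.0], 1), ([0.0, 0.0], 0)]
w = pairwise_perceptron(docs, w=[0.0, 0.0])
print(len(discordant_pairs(docs, w)))  # 0
```

Counting discordant pairs is the pairwise surrogate for ranking quality here; the paper's other algorithm instead targets average precision directly, which is a listwise rather than pairwise objective.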