Results 1-10 of 30
Natural language processing (almost) from scratch
, 2011
Cited by 244 (18 self)
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.
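The abstract does not spell out the architecture. As a rough illustration of learning word representations instead of hand-crafting features, here is a minimal sketch of an untrained window-based tagger forward pass: each word is looked up in a table of vectors, the vectors in a fixed window around the word are concatenated, and a hidden layer with a hard-tanh nonlinearity feeds a linear layer that outputs one score per tag. The class name `WindowTagger`, the `<PAD>` token, the dimensions and the random initialization are all hypothetical choices for this sketch; training is omitted.

```python
import random

PAD = "<PAD>"  # hypothetical padding token for positions outside the sentence


class WindowTagger:
    """Untrained forward pass of a window-based neural tagger (illustrative sketch only)."""

    def __init__(self, vocab, tags, emb_dim=4, window=3, hidden=5, seed=0):
        rng = random.Random(seed)
        self.tags = list(tags)
        self.window = window  # assumed odd, so the window is centred on the word
        # Lookup table: one learnable vector per word (randomly initialized here)
        self.emb = {w: [rng.uniform(-1, 1) for _ in range(emb_dim)] for w in list(vocab) + [PAD]}
        in_dim = emb_dim * window
        self.W1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden)]
        self.W2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in self.tags]

    @staticmethod
    def _matvec(W, v):
        return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

    def scores(self, sentence):
        """Return one list of tag scores per word in the sentence."""
        half = self.window // 2
        padded = [PAD] * half + list(sentence) + [PAD] * half
        out = []
        for i in range(len(sentence)):
            # Concatenate the embeddings of the window centred on word i;
            # unknown words fall back to the padding vector (a simplification)
            x = [v for w in padded[i:i + self.window] for v in self.emb.get(w, self.emb[PAD])]
            # Hidden layer with hard-tanh nonlinearity: clip each unit to [-1, 1]
            h = [max(-1.0, min(1.0, z)) for z in self._matvec(self.W1, x)]
            out.append(self._matvec(self.W2, h))
        return out

    def tag(self, sentence):
        """Pick the highest-scoring tag for each word."""
        return [self.tags[max(range(len(s)), key=s.__getitem__)] for s in self.scores(sentence)]
```

For example, `WindowTagger(["the", "cat", "sat"], ["DT", "NN", "VB"]).tag(["the", "cat", "sat"])` returns three tags; since the weights are random, the tags are arbitrary until the lookup table and layers are trained.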
Generalization Bounds for Ranking Algorithms via Algorithmic Stability
 J. of Machine Learning Research
Cited by 21 (2 self)
The problem of ranking, in which the goal is to learn a real-valued ranking function that induces a ranking or ordering over an instance space, has recently gained much attention in machine learning. We study generalization properties of ranking algorithms using the notion of algorithmic stability; in particular, we derive generalization bounds for ranking algorithms that have good stability properties. We show that kernel-based ranking algorithms that perform regularization in a reproducing kernel Hilbert space have such stability properties, and therefore our bounds can be applied to these algorithms; this is in contrast with generalization bounds based on uniform convergence, which in many cases cannot be applied to these algorithms. Our results generalize earlier results that were derived in the special setting of bipartite ranking (Agarwal and Niyogi, 2005) to a more general setting of the ranking problem that arises frequently in applications.
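Generalization bounds of this kind relate the expected ranking error of a learned function to its empirical error on the training sample. A minimal sketch of one such empirical quantity, assuming a 0-1 pairwise loss where a tie in scores counts as half an error, is shown below; ranking losses can be defined in several other ways, and the function name `empirical_ranking_error` is hypothetical.

```python
from itertools import combinations


def empirical_ranking_error(f, X, y):
    """Fraction of pairs with different labels that the ranking function f orders wrongly.

    A pair (i, j) with y[i] > y[j] is misranked when f(X[i]) < f(X[j]);
    a tie f(X[i]) == f(X[j]) counts as half an error (an assumed convention).
    Pairs with equal labels carry no preference and are skipped.
    """
    errors, pairs = 0.0, 0
    for i, j in combinations(range(len(X)), 2):
        if y[i] == y[j]:
            continue
        pairs += 1
        # Orient the pair so that a is the example with the larger label
        a, b = (i, j) if y[i] > y[j] else (j, i)
        fa, fb = f(X[a]), f(X[b])
        if fa < fb:
            errors += 1.0
        elif fa == fb:
            errors += 0.5
    return errors / pairs if pairs else 0.0
```

With `X = [0.1, 0.5, 0.9]` and `y = [0, 1, 2]`, the identity function has error 0, its negation has error 1, and a constant function has error 0.5.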
The Infinite Push: A New Support Vector Ranking Algorithm that Directly Optimizes Accuracy at the Absolute Top of the List
Cited by 16 (1 self)
Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support vector style algorithm, but due to the different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves ℓ1,∞ constraints; we solve this dual problem using the recent ℓ1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an ℓ∞-norm extreme of the ℓp-norm based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to the algorithm as the ‘Infinite Push’. Experiments on real-world data sets confirm the algorithm’s focus on accuracy at the absolute top of the list.
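To make the ℓp-to-ℓ∞ relationship concrete, here is a sketch of the quantities the two objectives penalize; it is not the Infinite Push training algorithm itself, which solves a dual problem with ℓ1,∞ constraints. For each negative example it computes the fraction of positives scored at or below it (its "height"); the ℓp-norm push averages the p-th powers of the heights, and the infinite push keeps only the largest one. Normalizing by the numbers of positives and negatives and counting ties as errors are assumptions of this sketch, and the function names are hypothetical.

```python
def negative_heights(pos_scores, neg_scores):
    """For each negative, the fraction of positives scored at or below it.

    A large height means the negative is ranked above many positives,
    i.e. it sits near the top of the list, where it hurts most.
    """
    m = len(pos_scores)
    return [sum(1 for s in pos_scores if s <= t) / m for t in neg_scores]


def p_norm_push_loss(pos_scores, neg_scores, p):
    """(mean over negatives of height ** p) ** (1 / p); larger p focuses on the worst negatives."""
    h = negative_heights(pos_scores, neg_scores)
    return (sum(x ** p for x in h) / len(h)) ** (1.0 / p)


def infinite_push_loss(pos_scores, neg_scores):
    """The p -> infinity limit: the height of the single highest-ranked negative."""
    return max(negative_heights(pos_scores, neg_scores))
```

With positives scored `[3, 2, 1]` and negatives `[2.5, 0]`, the heights are 2/3 and 0, so the infinite push loss is 2/3, the p = 1 loss is 1/3, and the p-norm loss approaches 2/3 as p grows.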
Future directions in learning to rank
, 2011
Cited by 16 (1 self)
The results of the learning to rank challenge showed that the quality of the predictions from the top competitors is very close. This raises a question: is learning to rank a solved problem? On the one hand, it is likely that only small incremental progress can be made on the "core", traditional problems of learning to rank. The challenge was set in this standard learning to rank scenario: optimize a ranking measure on a test set. On the other hand, there are many related questions and settings in learning to rank that have not yet been fully explored. We review some of them in this paper and hope that researchers interested in learning to rank will try to answer these challenging and exciting research questions.
1. Learning Theory for Ranking
Many learning to rank algorithms have been shown to be effective through benchmark experiments. However, benchmark experiments are sometimes not as reliable as expected because of the small scale of the training and test data. In this situation, a theory is needed to guarantee the performance of an algorithm on unseen data.
Overlaying classifiers: A practical approach for optimal ranking
 Adv. Neural Inf. Process. Syst
, 2009
Cited by 15 (7 self)
The ROC curve is one of the most widely used visual tools for evaluating the performance of scoring functions with respect to their capacity to discriminate between two populations. The goal of this paper is to propose a statistical learning method for constructing a scoring function with a nearly optimal ROC curve. In this bipartite setup, the target is known to be the regression function up to an increasing transform, and solving the optimization problem boils down to recovering the collection of level sets of the latter, which we interpret here as a continuum of nested classification problems. We propose a discretization approach, consisting of building a finite sequence of N classifiers by constrained empirical risk minimization and then constructing a piecewise constant scoring function s_N(x) by overlaying the resulting classifiers. Given the functional nature of the ROC criterion, the accuracy of the ranking induced by s_N(x) can be conceived in a variety of ways, depending on the distance chosen for measuring closeness to the optimal curve in ROC space. By relating the ROC curve of the resulting scoring function to piecewise linear approximations of the optimal ROC curve, we establish the consistency of the method as well as rate bounds controlling its generalization ability in sup-norm. Finally, we show that, as a byproduct, the proposed algorithm provides an accurate estimate of the optimal ROC curve.
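A minimal sketch of the overlaying step, assuming the N classifiers have nested positive regions G_1 ⊂ G_2 ⊂ ... ⊂ G_N and that overlaying means counting how many of them predict "positive" at x. The constrained empirical risk minimization that would produce the classifiers is not shown; the threshold rules below are hypothetical stand-ins on a one-dimensional feature.

```python
from typing import Callable, Sequence

Classifier = Callable[[float], bool]


def overlay(classifiers: Sequence[Classifier]) -> Callable[[float], int]:
    """Piecewise constant score s_N(x): the number of classifiers predicting 'positive' at x.

    With nested positive regions G_1 ⊂ G_2 ⊂ ... ⊂ G_N, the score is highest
    on G_1 (the region judged most likely positive) and 0 outside G_N.
    """
    def s_N(x: float) -> int:
        return sum(1 for g in classifiers if g(x))
    return s_N


# Hypothetical nested threshold classifiers G_k = {x >= t_k} with decreasing thresholds
thresholds = [0.8, 0.6, 0.4, 0.2]
classifiers = [lambda x, t=t: x >= t for t in thresholds]
s_N = overlay(classifiers)
```

Here `s_N(0.9)` is 4, `s_N(0.5)` is 2 and `s_N(0.1)` is 0: the score takes N + 1 distinct values, one per "layer" between consecutive level sets, which is what makes it piecewise constant.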
An Efficient Reduction of Ranking to Classification
, 2007
Cited by 15 (2 self)
This paper describes an efficient reduction of the learning problem of ranking to binary classification. The reduction is randomized and guarantees a pairwise misranking regret bounded by that of the binary classifier, improving on a recent result of Balcan et al. (2007), which ensures only twice that upper bound. Moreover, our reduction applies to a broader class of ranking loss functions and admits a simple proof, and the expected time complexity of our algorithm, measured in the number of calls to a classifier or preference function, is also improved from Ω(n²) to O(n log n). In addition, when only the top k ranked elements are required (k ≪ n), as in many applications in information extraction or search engine design, the time complexity of our algorithm can be further reduced to O(k log k + n). Our reduction and algorithm are thus practical for realistic applications where the number of points to rank exceeds several thousand. Many of our results also extend beyond the bipartite case previously studied. To further complement them, we also derive lower bounds for any deterministic reduction of ranking to binary classification, proving that randomization is necessary to achieve our reduction guarantees.
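The abstract does not name the ranking procedure. A sketch assuming randomized QuickSort, with a learned pairwise preference function used as the comparator, is shown below: it makes an expected O(n log n) preference calls, and a partial version that only recurses into the part containing the top k makes an expected O(k log k + n) calls, matching the complexities quoted above. The preference function `prefers(a, b)` (True if a should be ranked above b) and all function names are hypothetical. A learned preference need not be transitive; the procedure still returns a permutation of the input.

```python
import random


def _partition(items, prefers, rng):
    """Pick a random pivot and split the other items by one preference call each."""
    i = rng.randrange(len(items))
    pivot, rest = items[i], items[:i] + items[i + 1:]
    above, below = [], []
    for x in rest:
        (above if prefers(x, pivot) else below).append(x)
    return above, pivot, below


def rank(items, prefers, rng=None):
    """Randomized QuickSort driven by a pairwise preference function."""
    if rng is None:
        rng = random.Random(0)
    if len(items) <= 1:
        return list(items)
    above, pivot, below = _partition(items, prefers, rng)
    return rank(above, prefers, rng) + [pivot] + rank(below, prefers, rng)


def top_k(items, k, prefers, rng=None):
    """Return only the k highest-ranked items; recurse into 'below' only when needed."""
    if rng is None:
        rng = random.Random(0)
    if k <= 0 or not items:
        return []
    above, pivot, below = _partition(items, prefers, rng)
    if len(above) >= k:
        return top_k(above, k, prefers, rng)
    return rank(above, prefers, rng) + [pivot] + top_k(below, k - len(above) - 1, prefers, rng)
```

With the consistent preference `lambda a, b: a > b`, `rank([5, 3, 9, 1, 7], ...)` returns `[9, 7, 5, 3, 1]` and `top_k` with k = 2 returns `[9, 7]`.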
Accuracy at the Top
Cited by 9 (3 self)
We introduce a new notion of classification accuracy based on the top τ-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and show how its solution can be obtained by solving a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top τ-quantile of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our algorithm and comparing it to several other algorithms seeking high precision at the top. In most examples, our algorithm achieves better precision at the top.
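As an illustration of the evaluation criterion only (not the convex-surrogate algorithm), here is a sketch of precision among the examples whose scores fall in the top τ-quantile. It assumes 0/1 labels, treats the top ⌈τn⌉ scores as predicted positives, and breaks ties at the boundary by sort order; the function name is hypothetical.

```python
import math


def precision_at_top_quantile(scores, labels, tau):
    """Precision among examples whose score lies in the top tau-quantile.

    The top ceil(tau * n) scores are treated as predicted positives;
    ties at the quantile boundary are broken by sort order (a simplification).
    """
    if not 0 < tau <= 1:
        raise ValueError("tau must lie in (0, 1]")
    n = len(scores)
    k = max(1, math.ceil(tau * n))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / k
```

For scores `[0.9, 0.8, 0.7, 0.1]` with labels `[1, 0, 1, 0]`, the precision is 1.0 at τ = 0.25 and 0.5 at τ = 0.5, showing how the criterion only looks at the head of the ranking.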
AUC optimization and the twosample problem
Cited by 6 (1 self)
The purpose of this paper is to explore the connection between multivariate homogeneity tests and AUC optimization. The latter problem has recently received much attention in the statistical learning literature. Starting from the elementary observation that, in the two-sample problem setup, the null hypothesis corresponds to the situation where the area under the optimal ROC curve is equal to 1/2, we propose a two-stage testing method based on data splitting. A nearly optimal scoring function in the AUC sense is first learnt from one of the two half-samples. Data from the remaining half-sample are then projected onto the real line and ranked according to the scoring function computed at the first stage. The last step amounts to performing a standard Mann-Whitney-Wilcoxon test in the one-dimensional framework. We show that the learning step of the procedure affects neither the consistency of the test nor its properties in terms of power, provided the ranking produced is accurate enough in the AUC sense. Finally, the results of a numerical experiment are displayed to show the efficiency of the method.
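A minimal sketch of the two-stage procedure, under stated assumptions: each sample is split in half by order (the data are assumed i.i.d.), the first-stage scoring function is a hypothetical stand-in that projects onto the difference of the half-sample means (not an AUC-optimizing learner), and the second stage uses the normal approximation to the Mann-Whitney-Wilcoxon statistic without tie correction. All function names are hypothetical.

```python
import math
import random


def auc(pos_scores, neg_scores):
    """Empirical AUC: fraction of pairs with score(X) > score(Y), ties counted as 1/2."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos_scores for b in neg_scores)
    return u / (len(pos_scores) * len(neg_scores))


def mww_pvalue(xs, ys):
    """Two-sided Mann-Whitney-Wilcoxon p-value, normal approximation, no tie correction."""
    m, n = len(xs), len(ys)
    u = auc(xs, ys) * m * n            # U statistic; its null mean is m * n / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    z = (u - m * n / 2) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def two_stage_test(X, Y):
    """X, Y: lists of equal-length feature vectors drawn from the two populations."""
    X1, X2 = X[: len(X) // 2], X[len(X) // 2:]
    Y1, Y2 = Y[: len(Y) // 2], Y[len(Y) // 2:]
    d = len(X[0])
    # Stage 1 (hypothetical scorer): direction of the difference of the first-half means
    w = [sum(x[j] for x in X1) / len(X1) - sum(y[j] for y in Y1) / len(Y1) for j in range(d)]

    def score(v):
        return sum(wj * vj for wj, vj in zip(w, v))

    # Stage 2: project the held-out halves onto the real line and run the 1-D test
    return mww_pvalue([score(x) for x in X2], [score(y) for y in Y2])
```

On two simulated Gaussian samples whose means differ, e.g. 200 points each from N((1, 1), I) and N((0, 0), I), the returned p-value is far below conventional significance levels. Keeping the two halves separate is what lets the second stage be an ordinary one-dimensional rank test.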
On partitioning rules for bipartite ranking
 In Proceedings of AISTATS, number 5
Cited by 5 (4 self)
The purpose of this paper is to investigate the properties of partitioning scoring rules in the bipartite ranking setup. We focus on ranking rules based on scoring functions. General sufficient conditions for the AUC consistency of scoring functions that are constant on cells of a partition of the feature space are provided. Rate bounds are obtained for cubic histogram scoring rules under mild smoothness assumptions on the regression function. In this setup, it is shown how to penalize the empirical AUC criterion in order to select a scoring rule nearly as good as the one that can be built when the degree of smoothness of the regression function is known.
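A minimal sketch of a cubic histogram scoring rule on [0, 1]^d, assuming each axis is cut into `bins` equal intervals and each cell's score is the fraction of positive training labels in it (a local estimate of the regression function). Giving empty cells the overall positive rate is an assumption of this sketch, and the penalized empirical AUC selection of the cell width is not shown: `bins` is fixed by hand, and the function name is hypothetical.

```python
from collections import defaultdict


def fit_histogram_scorer(X, y, bins):
    """Cubic-histogram scoring rule on [0, 1]^d with 0/1 labels y.

    The score is constant on each cell of the regular grid and equals the
    fraction of positive training points in that cell.
    """
    def cell(x):
        # min(...) keeps points with a coordinate exactly 1.0 in the last interval
        return tuple(min(int(xi * bins), bins - 1) for xi in x)

    counts = defaultdict(lambda: [0, 0])  # cell -> [number of positives, number of points]
    for xi, yi in zip(X, y):
        c = counts[cell(xi)]
        c[0] += yi
        c[1] += 1
    prior = sum(y) / len(y)  # fallback score for cells with no training points

    def score(x):
        pos, tot = counts.get(cell(x), (0, 0))
        return pos / tot if tot else prior

    return score
```

With points `(0.1,), (0.2,), (0.8,), (0.9,)` labelled `0, 0, 1, 1` and `bins=2`, the left cell scores 0.0 and the right cell 1.0; with `bins=4`, the empty cell containing 0.3 falls back to the overall rate 0.5.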
Empirical Performance Maximization for Linear Rank Statistics
 Advances in Neural Information Processing Systems
, 2008
Cited by 5 (1 self)
The ROC curve is known to be the gold standard for measuring the performance of a test/scoring statistic with respect to its capacity to discriminate between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications, such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we call a linear rank process.
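As an illustration of the "linear rank statistic" form, here is a sketch that computes the sum over positive examples of φ(R_i / (N + 1)), where R_i is the rank of positive i among all N pooled scores. With φ(u) = u this is the Wilcoxon rank sum W divided by N + 1, and the empirical AUC is the affine function (W - m(m + 1)/2) / (mn) of W, where m and n are the numbers of positives and negatives. Giving tied scores their average rank is an assumption of this sketch, and the function names are hypothetical.

```python
def ranks(values):
    """1-based ranks in increasing order; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def linear_rank_statistic(pos_scores, neg_scores, phi):
    """Sum of phi(R_i / (N + 1)) over the positives, R_i = rank in the pooled sample."""
    pooled = list(pos_scores) + list(neg_scores)
    r = ranks(pooled)
    N = len(pooled)
    return sum(phi(r[i] / (N + 1)) for i in range(len(pos_scores)))


def auc_from_rank_sum(pos_scores, neg_scores):
    """Empirical AUC recovered from the phi(u) = u statistic (the Wilcoxon rank sum)."""
    m, n = len(pos_scores), len(neg_scores)
    W = linear_rank_statistic(pos_scores, neg_scores, lambda u: u) * (m + n + 1)
    return (W - m * (m + 1) / 2) / (m * n)
```

For positives scored `[0.9, 0.7]` and negatives `[0.8, 0.1]`, the positives have ranks 4 and 2, so W = 6 and the AUC is (6 - 3) / 4 = 0.75, the same as counting the 3 correctly ordered pairs out of 4. Other choices of φ weight the ranks differently, which is how different ROC summaries fit the same form.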