Results 1 - 10 of 450
Large margin methods for structured and interdependent output variables
- Journal of Machine Learning Research, 2005
"... Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary ..."
Abstract - Cited by 624 (12 self)
Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e. exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.
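For orientation, a minimal sketch of the margin-rescaled quadratic program this abstract refers to, in the standard structural SVM notation (joint feature map Psi, task loss Delta, slack variables xi; the symbols are the conventional ones, not quoted from the entry above):

\[
\min_{w,\;\xi \ge 0}\; \tfrac{1}{2}\lVert w \rVert^2 + \frac{C}{n}\sum_{i=1}^{n} \xi_i
\quad\text{s.t.}\quad
\langle w,\, \Psi(x_i, y_i) - \Psi(x_i, y) \rangle \;\ge\; \Delta(y_i, y) - \xi_i
\qquad \forall i,\; \forall y \in \mathcal{Y}\setminus\{y_i\}.
\]

The exponentially many constraints (one per output y) are handled by the cutting-plane scheme mentioned in the abstract: repeatedly find the most violated constraint via a loss-augmented argmax and add it to a working set until no constraint is violated by more than a fixed tolerance.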
Online passive-aggressive algorithms
- JMLR, 2006
"... We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the non-realizable case. The end result is new alg ..."
Abstract - Cited by 435 (24 self)
We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the non-realizable case. The end result is new algorithms and accompanying loss bounds for hinge-loss regression and uniclass. We also get refined loss bounds for previously studied classification algorithms.
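As a concrete illustration of the closed-form updates behind these bounds, here is a minimal sketch of the classification variant (PA-I with aggressiveness parameter C); the regression and uniclass variants described in the paper use analogous closed-form steps, and this toy example is not taken from the paper itself:

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One passive-aggressive (PA-I) step for binary classification.
    w: current weights, x: feature vector, y: label in {-1, +1}.
    The step size tau is the hinge loss divided by ||x||^2, capped at C."""
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss on this example
    if loss == 0.0:
        return w                              # passive: margin already satisfied
    tau = min(C, loss / np.dot(x, x))         # aggressive: smallest update that repairs the margin
    return w + tau * y * x

# toy usage on two linearly separable points
w = np.zeros(2)
for x, y in [(np.array([1.0, 0.5]), 1), (np.array([-1.0, -0.2]), -1)] * 3:
    w = pa_update(w, x, y)
print(w)
```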
A support vector method for multivariate performance measures
- Proceedings of the 22nd International Conference on Machine Learning, 2005
"... This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially non-linear per ..."
Abstract - Cited by 305 (6 self)
This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially non-linear performance measures, in particular ROCArea and all measures that can be computed from the contingency table. The conventional classification SVM arises as a special case of our method.
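A rough sketch of how the multivariate view changes the optimization problem: the whole training sample is treated as one labeled tuple, so a single slack variable bounds the chosen multivariate loss (the notation is the standard structural SVM one and is offered here as an assumption, not a quotation of the paper):

\[
\min_{w,\;\xi \ge 0}\; \tfrac{1}{2}\lVert w \rVert^2 + C\,\xi
\quad\text{s.t.}\quad
\langle w,\, \Psi(\bar{x}, \bar{y}) - \Psi(\bar{x}, \bar{y}') \rangle \;\ge\; \Delta(\bar{y}', \bar{y}) - \xi
\qquad \forall \bar{y}' \ne \bar{y},
\]

where \(\bar{y}\) ranges over joint label assignments for all training examples and \(\Delta\) can be any loss computable from the contingency table (for example \(1 - F_1\)).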
Learning Structural SVMs with Latent Variables
"... It is well known in statistics and machine learning that the combination of latent (or hidden) variables and observed variables offer more expressive power than models with observed variables alone. Latent variables ..."
Abstract - Cited by 215 (2 self)
It is well known in statistics and machine learning that the combination of latent (or hidden) variables and observed variables offers more expressive power than models with observed variables alone. Latent variables ...
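The snippet above is cut off before the formulation; for orientation, the usual latent structural SVM training objective (with latent variables h, joint feature map Psi, and loss Delta; this is the standard form, offered as an assumption rather than a quotation of the paper) is

\[
\min_{w}\; \tfrac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{n}\Big[\, \max_{\hat{y},\hat{h}} \big( \langle w, \Psi(x_i,\hat{y},\hat{h}) \rangle + \Delta(y_i,\hat{y},\hat{h}) \big) \;-\; \max_{h} \langle w, \Psi(x_i, y_i, h) \rangle \,\Big],
\]

a difference of two convex functions, typically optimized by alternating between imputing the latent variables for the ground-truth outputs and solving a standard structural SVM with those imputations fixed.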
Discriminative models for multi-class object layout
"... Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. Such reductions allow one to leverage sophisticated classifiers for learning. These models are typically trained independently for each class using positive and negative examples cropped from ima ..."
Abstract - Cited by 197 (6 self)
Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. Such reductions allow one to leverage sophisticated classifiers for learning. These models are typically trained independently for each class using positive and negative examples cropped from images. At test-time, various post-processing heuristics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically. We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image ...
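For readers unfamiliar with the post-processing heuristic the abstract refers to, here is a minimal sketch of standard greedy non-maxima suppression; it illustrates the heuristic that the paper's structured model is designed to replace, and is not the paper's method:

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Standard greedy non-maxima suppression.
    boxes: (N, 4) array of x1, y1, x2, y2; scores: (N,) detection scores.
    Returns the indices of the detections that are kept."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    order = np.argsort(scores)[::-1]          # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the kept box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # suppress boxes that overlap too much
    return keep

print(greedy_nms([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], [0.9, 0.8, 0.7]))
```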
Learning to rank with nonsmooth cost functions
- In Advances in Neural Information Processing Systems (NIPS) 20, 2006
"... The quality measures used in information retrieval are particularly difficult to op-timize directly, since they depend on the model scores only through the sorted order of the documents returned for a given query. Thus, the derivatives of the cost with respect to the model parameters are either zero ..."
Abstract - Cited by 176 (11 self)
The quality measures used in information retrieval are particularly difficult to optimize directly, since they depend on the model scores only through the sorted order of the documents returned for a given query. Thus, the derivatives of the cost with respect to the model parameters are either zero, or are undefined. In this paper, we propose a class of simple, flexible algorithms, called LambdaRank, which avoids these difficulties by working with implicit cost functions. We describe LambdaRank using neural network models, although the idea applies to any differentiable function class. We give necessary and sufficient conditions for the resulting implicit cost function to be convex, and we show that the general method has a simple mechanical interpretation. We demonstrate significantly improved accuracy, over a state-of-the-art ranking algorithm, on several datasets. We also show that LambdaRank provides a method for significantly speeding up the training phase of that ranking algorithm. Although this paper is directed towards ranking, the proposed method can be extended to any non-smooth and multivariate cost functions.
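A rough sketch of the lambda-gradient idea for NDCG, in the form commonly used in later implementations; the 2006 paper defines the implicit cost more abstractly, so the specific |delta NDCG| weighting below should be read as an assumption, not the paper's exact rule:

```python
import numpy as np

def lambda_gradients(scores, labels, sigma=1.0):
    """Pairwise 'lambda' gradients in the LambdaRank spirit: a RankNet-style
    pairwise gradient scaled by the NDCG change from swapping two documents.
    scores: model scores for one query; labels: graded relevance labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(scores)[::-1]
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(scores) + 1)           # 1-based ranks by current score
    gains = 2.0 ** labels - 1.0
    discounts = 1.0 / np.log2(1.0 + ranks)
    ideal = np.sum(np.sort(gains)[::-1] / np.log2(2.0 + np.arange(len(gains))))
    lambdas = np.zeros_like(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] <= labels[j]:
                continue                                    # only pairs where i is more relevant
            rho = 1.0 / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
            delta = abs((gains[i] - gains[j]) * (discounts[i] - discounts[j])) / max(ideal, 1e-12)
            lambdas[i] += sigma * rho * delta               # push the more relevant document up
            lambdas[j] -= sigma * rho * delta               # push the less relevant document down
    return lambdas

print(lambda_gradients([0.2, 1.0, 0.1], [2, 0, 1]))
```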
Hierarchical document categorization with support vector machines
- In Proceedings of the 13th Conference on Information and Knowledge Management, 2004
"... Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques like Support Vector Machines and related large margin methods have been successfully applied for this task, albeit the fac ..."
Abstract - Cited by 161 (4 self)
Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques like Support Vector Machines and related large margin methods have been successfully applied for this task, despite the fact that they ignore the inter-class relationships. In this paper, we propose a novel hierarchical classification method that generalizes Support Vector Machine learning and that is based on discriminant functions that are structured in a way that mirrors the class hierarchy. Our method can work with arbitrary, not necessarily singly connected taxonomies and can deal with task-specific loss functions. All parameters are learned jointly by optimizing a common objective function corresponding to a regularized upper bound on the empirical loss. We present experimental results on the WIPO-alpha patent collection to show the competitiveness of our approach.
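To make the idea of discriminant functions structured by the class hierarchy concrete, here is a small sketch of a taxonomy-aware joint feature map: the input vector is copied into one block per ancestor of the class, so classes that share ancestors share weight blocks. The dict-based taxonomy representation and helper names below are hypothetical illustrations, not the paper's code:

```python
import numpy as np

def ancestors(node, parent):
    """Nodes on the path from `node` up to the root, given a child -> parent dict."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def joint_feature(x, label, parent, node_index):
    """Taxonomy-aware joint feature map Phi(x, y): x is stacked into the blocks
    of all nodes on the path from the root to y, so the learned weight vector
    decomposes into per-node blocks that are shared across related classes."""
    phi = np.zeros(len(node_index) * len(x))
    for node in ancestors(label, parent):
        i = node_index[node]
        phi[i * len(x):(i + 1) * len(x)] = x
    return phi

# toy taxonomy: root -> {sports, science}, sports -> {soccer}
parent = {"sports": "root", "science": "root", "soccer": "sports"}
node_index = {"root": 0, "sports": 1, "science": 2, "soccer": 3}
print(joint_feature(np.array([1.0, 2.0]), "soccer", parent, node_index))
```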
Learning to localize objects with structured output regression
- In ECCV, 2008
"... Abstract. Sliding window classifiers are among the most successful and widely applied techniques for object localization. However, training is typically done in a way that is not specific to the localization task. First a binary classifier is trained using a sample of positive and negative examples, ..."
Abstract - Cited by 125 (15 self)
Sliding window classifiers are among the most successful and widely applied techniques for object localization. However, training is typically done in a way that is not specific to the localization task. First a binary classifier is trained using a sample of positive and negative examples, and this classifier is subsequently applied to multiple regions within test images. We propose instead to treat object localization in a principled way by posing it as a problem of predicting structured data: we model the problem not as binary classification, but as the prediction of the bounding box of objects located in images. The use of a joint-kernel framework allows us to formulate the training procedure as a generalization of an SVM, which can be solved efficiently. We further improve computational efficiency by using a branch-and-bound strategy for localization during both training and testing. Experimental evaluation on the PASCAL VOC and TU Darmstadt datasets shows that the structured training procedure improves performance over binary training as well as the best previously published scores.
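In symbols, the structured view of localization sketched in this abstract takes the output y to be a bounding box and predicts by

\[
y^{*} \;=\; \arg\max_{y \in \mathcal{Y}(x)} \; \langle w,\, \Psi(x, y) \rangle,
\]

where Psi(x, y) is a feature vector computed from the image region y (for example a bag of visual words restricted to the box), and training imposes margin constraints of the form \(\langle w, \Psi(x_i, y_i) - \Psi(x_i, y) \rangle \ge \Delta(y_i, y) - \xi_i\) with a loss Delta that decreases as the boxes overlap more. The particular feature and loss choices here are assumptions for illustration; the branch-and-bound step mentioned in the abstract replaces an otherwise exhaustive argmax over candidate boxes.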
Multi-Label Prediction via Compressed Sensing
, 2009
"... We consider multi-label prediction problems with large output spaces under the assumption of output sparsity – that the target vectors have small support. We develop a general theory for a variant of the popular ECOC (error correcting output code) scheme, based on ideas from compressed sensing for e ..."
Abstract - Cited by 100 (3 self)
We consider multi-label prediction problems with large output spaces under the assumption of output sparsity – that the target vectors have small support. We develop a general theory for a variant of the popular ECOC (error correcting output code) scheme, based on ideas from compressed sensing for exploiting this sparsity. The method can be regarded as a simple reduction from multi-label regression problems to binary regression problems. It is shown that the number of subproblems need only be logarithmic in the total number of label values, making this approach radically more efficient than others. We also state and prove performance guarantees for this method, and test it empirically.
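A toy sketch of the reduction on synthetic data: compress the sparse label vectors with a random projection, fit one regressor per compressed coordinate, and decode predictions by sparse recovery. The dimensions, the Ridge regressors, and the orthogonal matching pursuit decoder below are illustrative choices, not the paper's exact instantiation:

```python
import numpy as np
from sklearn.linear_model import Ridge, OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, p, d, k, m = 200, 20, 100, 3, 30      # samples, features, labels, sparsity, measurements

# synthetic k-sparse label vectors tied to the inputs by a random linear map (toy data)
X = rng.normal(size=(n, p))
W = rng.normal(size=(p, d))
scores = X @ W
Y = np.zeros((n, d))
for i in range(n):
    Y[i, np.argsort(scores[i])[-k:]] = 1.0   # keep the k highest-scoring labels

A = rng.normal(size=(m, d)) / np.sqrt(m)     # random projection: d labels -> m measurements
Z = Y @ A.T                                   # compressed targets, m << d regressions to fit

reg = Ridge(alpha=1.0).fit(X, Z)             # one linear regressor per compressed coordinate

# decode one test point: sparse recovery of the label vector from predicted measurements
z_hat = reg.predict(X[:1])[0]
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k).fit(A, z_hat)
y_hat = omp.coef_
print("true labels:", np.flatnonzero(Y[0]), "predicted support:", np.flatnonzero(y_hat))
```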