Results 1–10 of 1,250,719
Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories
, 2004
"... Abstract — Current computational approaches to learning visual object categories require thousands of training images, are slow, cannot learn in an incremental manner and cannot incorporate prior information into the learning process. In addition, no algorithm presented in the literature has been te ..."
Cited by 782 (16 self)
Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
, 2002
"... We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modific ..."
Cited by 659 (13 self)
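The abstract's recipe (Viterbi decoding of each training example, then simple additive updates) can be sketched as a structured perceptron for a first-order tagger. This is a minimal illustration under our own assumptions, not the paper's code: the feature set (word/tag emissions and tag/tag transitions), the function names, and the toy data are hypothetical, and any parameter averaging is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Weights = Dict[tuple, float]


def features(words: Sequence[str], tags: Sequence[str]) -> Dict[tuple, int]:
    """Count emission (word, tag) and transition (prev_tag, tag) indicator features."""
    feats: Dict[tuple, int] = defaultdict(int)
    prev = "<s>"  # start-of-sentence pseudo-tag
    for word, tag in zip(words, tags):
        feats[("emit", word, tag)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats


def viterbi(words: Sequence[str], tagset: Sequence[str], w: Weights) -> List[str]:
    """Return the highest-scoring tag sequence under the current weights."""
    best = {"<s>": (0.0, [])}  # tag -> (best score so far, best path ending in tag)
    for word in words:
        layer = {}
        for tag in tagset:
            emit = w.get(("emit", word, tag), 0.0)
            layer[tag] = max(
                ((score + w.get(("trans", prev, tag), 0.0) + emit, path + [tag])
                 for prev, (score, path) in best.items()),
                key=lambda item: item[0],
            )
        best = layer
    return max(best.values(), key=lambda item: item[0])[1]


def train_perceptron(data, tagset: Sequence[str], epochs: int = 5) -> Weights:
    """Decode each example; on a mistake, add gold features and subtract predicted ones."""
    w: Weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tagset, w)
            if pred != list(gold):
                for f, c in features(words, gold).items():
                    w[f] += c
                for f, c in features(words, pred).items():
                    w[f] -= c
    return w
```

On a toy corpus such as `[(["the", "dog", "barks"], ["D", "N", "V"]), (["the", "cat", "sleeps"], ["D", "N", "V"])]`, a few epochs are enough for `viterbi` to reproduce the gold tags.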
Training Linear SVMs in Linear Time
, 2006
"... Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n ..."
Cited by 549 (6 self)
as well as a large number of features N, while each example has only s ≪ N non-zero features. This paper presents a Cutting-Plane Algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems. The algorithm
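The O(sn) bound rests on one pass over the data being enough to find the most violated constraint when each sparse dot product costs O(s). Below is a sketch of only that step, assuming the 1-slack formulation in which each constraint is indexed by a binary vector c over the training examples; solving the quadratic program over the growing working set is not shown, and all names are ours.

```python
from typing import Dict, Sequence, Tuple

SparseVec = Dict[int, float]  # feature index -> value; only non-zeros are stored


def sparse_dot(w: SparseVec, x: SparseVec) -> float:
    """Dot product that touches only the non-zeros of x: O(s) time."""
    return sum(v * w.get(j, 0.0) for j, v in x.items())


def most_violated_constraint(
    w: SparseVec, X: Sequence[SparseVec], y: Sequence[int]
) -> Tuple[SparseVec, float]:
    """Return (g, delta) for the cutting plane  w . g >= delta - xi.

    c_i = 1 exactly for examples with y_i * (w . x_i) < 1; then
    g = (1/n) * sum_i c_i * y_i * x_i  and  delta = (1/n) * sum_i c_i.
    One pass over n examples with s non-zeros each costs O(s * n).
    """
    n = len(X)
    g: SparseVec = {}
    violated = 0
    for x_i, y_i in zip(X, y):
        if y_i * sparse_dot(w, x_i) < 1.0:  # example inside the margin: c_i = 1
            violated += 1
            for j, v in x_i.items():
                g[j] = g.get(j, 0.0) + y_i * v
    return {j: v / n for j, v in g.items()}, violated / n
```

With `w = {}` every example violates the margin, so the returned plane averages all `y_i * x_i`; once all margins reach 1, it returns `({}, 0.0)`.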
Example-based learning for view-based human face detection
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
"... Abstract: We present an example-based learning approach for locating vertical frontal views of human faces in complex scenes. The technique models the distribution of human face patterns by means of a few view-based “face” and “non-face” model clusters. At each image location, a difference feature v ..."
Cited by 693 (24 self)
Combining labeled and unlabeled data with co-training
, 1998
"... We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the ta ..."
Cited by 1631 (28 self)
data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each
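The two-view strategy can be sketched as a loop in which the classifier for each view labels the unlabeled example it is most confident about, and that example joins the shared labeled set used to train both views. This is a simplified sketch under our own assumptions: the nearest-centroid base learner, the "gap between the two closest centroids" confidence score, and one new example per view per round are illustrative choices, not the paper's settings.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Predictor = Callable[[Point], Tuple[str, float]]


def centroid_classifier(points: Sequence[Point], labels: Sequence[str]) -> Predictor:
    """Nearest-centroid classifier; confidence = gap between the two closest centroids."""
    sums: dict = {}
    counts: dict = {}
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for k, v in enumerate(p):
            acc[k] += v
        counts[lab] = counts.get(lab, 0) + 1
    centroids = {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}

    def predict(x: Point) -> Tuple[str, float]:
        ranked = sorted((math.dist(x, c), lab) for lab, c in centroids.items())
        gap = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else math.inf
        return ranked[0][1], gap

    return predict


def co_train(labeled, unlabeled, rounds: int = 10):
    """labeled: [((view1, view2), label)]; unlabeled: [(view1, view2)].

    Returns one classifier per view, trained on the grown labeled set.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            h = centroid_classifier([x[view] for x, _ in labeled], [y for _, y in labeled])
            # the example this view's classifier is most confident about
            i = max(range(len(pool)), key=lambda k: h(pool[k][view])[1])
            x = pool.pop(i)
            labeled.append((x, h(x[view])[0]))
    return tuple(
        centroid_classifier([x[v] for x, _ in labeled], [y for _, y in labeled])
        for v in (0, 1)
    )
```

Because each view's confident labels become training data for the other view, the sketch relies on the two views being individually informative, which is the setting the abstract describes.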
Object Detection with Discriminatively Trained Part Based Models
"... We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their ..."
Cited by 1424 (49 self)
their value had not been demonstrated on difficult benchmarks such as the PASCAL datasets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM
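Margin-sensitive data mining of hard negatives can be sketched as a cache that is alternately shrunk (dropping negatives scoring at or below -1, which lie outside the margin) and grown (adding negatives scoring above -1, which lie inside the margin or are misclassified). A minimal sketch with hypothetical names; the trainer is a stand-in (a warm-started margin perceptron), not the paper's latent-SVM training.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def train_margin_perceptron(pos, neg, w, lr: float = 1.0, max_epochs: int = 1000) -> List[float]:
    """Stand-in trainer: perceptron updates from w until every example has y * (w . x) >= 1."""
    w = list(w)
    data = [(x, 1.0) for x in pos] + [(x, -1.0) for x in neg]
    for _ in range(max_epochs):
        updated = False
        for x, y in data:
            if y * dot(w, x) < 1.0:
                w = [wk + lr * y * xk for wk, xk in zip(w, x)]
                updated = True
        if not updated:
            break
    return w


def is_hard(w: Sequence[float], x: Vec) -> bool:
    """A negative is hard if it is inside the margin or misclassified: w . x > -1."""
    return dot(w, x) > -1.0


def mine_hard_negatives(pos, negatives, dim: int, rounds: int = 3, cache_size: int = 1000):
    w = train_margin_perceptron(pos, [], [0.0] * dim)
    cache: List[Vec] = []
    for _ in range(rounds):
        cache = [x for x in cache if is_hard(w, x)]  # shrink: drop easy negatives
        for x in negatives:  # grow: scan the negative set for hard examples
            if len(cache) >= cache_size:
                break
            if is_hard(w, x) and x not in cache:
                cache.append(x)
        w = train_margin_perceptron(pos, cache, w)  # retrain, warm-started from w
    return w, cache
```

Warm-starting matters in this sketch: because the stand-in trainer is not an exact SVM solver, retraining from zero after the cache drops easy negatives can lose constraints those negatives were enforcing. Feature vectors carry a trailing constant 1.0 so that the last weight acts as a bias.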
Boosting the margin: A new explanation for the effectiveness of voting methods
 In Proceedings of the International Conference on Machine Learning
, 1997
"... One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this ..."
Cited by 894 (52 self)
that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show
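The margin definition in this excerpt reduces to a few lines. A minimal sketch, assuming each base classifier casts one vote (optionally weighted, as in boosting) and dividing by the total vote weight so that margins fall in [-1, 1]; the function name `vote_margin` is ours.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote_margin(votes: Sequence[Hashable], true_label: Hashable,
                weights: Optional[Sequence[float]] = None) -> float:
    """(weight on the true label - largest weight on any wrong label) / total weight."""
    if weights is None:
        weights = [1.0] * len(votes)  # unweighted: plain vote counts
    tally: Counter = Counter()
    for label, alpha in zip(votes, weights):
        tally[label] += alpha
    correct = tally[true_label]  # 0 if the true label received no votes
    wrong = max((v for lab, v in tally.items() if lab != true_label), default=0.0)
    return (correct - wrong) / sum(weights)
```

For votes `["a", "a", "b", "c", "a"]` with true label `"a"`, the margin is (3 - 1) / 5 = 0.4; the margin is positive exactly when the true label strictly wins the vote.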
A discriminatively trained, multiscale, deformable part model
 In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008)
, 2008
"... This paper describes a discriminatively trained, multiscale, deformable part model for object detection. Our system achieves a twofold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge. It also outperforms the best results in the 2007 challenge ..."
Cited by 557 (11 self)
training. We combine a margin-sensitive approach for data mining hard negative examples with a formalism we call latent SVM. A latent SVM, like a hidden CRF, leads to a non-convex training problem. However, a latent SVM is semi-convex and the training problem becomes convex once latent information
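The semi-convexity claim can be checked term by term. With f(x) = max over z of β·Φ(x, z), a maximum of functions linear in β, the hinge term for a negative, max(0, 1 + f(x)), is convex in β; the term for a positive, max(0, 1 - f(x)), is not, unless that positive's latent value z is fixed, which makes its score linear. The sketch assumes the latent choices are given as an explicit list of feature vectors Φ(x, z) per example; all names are ours.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Sequence[float]
Example = Tuple[List[Vec], int]  # (phi(x, z) for every latent choice z, label y in {+1, -1})


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def latent_score(beta: Vec, phis: List[Vec]) -> float:
    """f(x) = max over z of beta . phi(x, z): a max of linear functions, hence convex in beta."""
    return max(dot(beta, phi) for phi in phis)


def best_latent(beta: Vec, phis: List[Vec]) -> int:
    """Index of the highest-scoring latent choice (used to fix z for a positive)."""
    return max(range(len(phis)), key=lambda z: dot(beta, phis[z]))


def objective(beta: Vec, examples: List[Example], C: float = 1.0,
              fixed_pos: Optional[Dict[int, int]] = None) -> float:
    """0.5 * ||beta||^2 + C * sum_i max(0, 1 - y_i * f_i).

    If fixed_pos maps every positive example index i to a latent index z_i,
    each positive's score is linear in beta, so the objective is convex.
    """
    total = 0.5 * dot(beta, beta)
    for i, (phis, y) in enumerate(examples):
        if y > 0 and fixed_pos is not None:
            f = dot(beta, phis[fixed_pos[i]])
        else:
            f = latent_score(beta, phis)
        total += C * max(0.0, 1.0 - y * f)
    return total
```

One natural way to use this structure is to alternate: fix each positive's latent value with `best_latent`, then minimize the resulting convex objective over β.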
Optimal Brain Damage
, 1990
"... We have used information-theoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved sp ..."
Cited by 509 (5 self)
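"Unimportant" is measured in OBD by a per-parameter saliency, s_k = h_kk u_k² / 2, where u_k is the weight and h_kk the corresponding diagonal entry of the Hessian of the training error. A minimal sketch of the ranking-and-pruning step, assuming the diagonal Hessian entries are already computed (that computation and the retraining between pruning passes are not shown); the function names are ours.

```python
from typing import List, Sequence


def obd_saliencies(weights: Sequence[float], hessian_diag: Sequence[float]) -> List[float]:
    """Saliency s_k = h_kk * u_k**2 / 2 for each parameter u_k."""
    return [0.5 * h * u * u for u, h in zip(weights, hessian_diag)]


def prune_lowest_saliency(weights: Sequence[float], hessian_diag: Sequence[float],
                          n_prune: int) -> List[float]:
    """Return a copy of weights with the n_prune lowest-saliency entries set to zero."""
    sal = obd_saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda k: sal[k])
    pruned = list(weights)
    for k in order[:n_prune]:
        pruned[k] = 0.0
    return pruned
```

With weights `[0.5, -2.0, 0.1, 1.0]` and diagonal entries `[4.0, 0.1, 10.0, 1.0]`, removing two parameters zeroes the second and third: the largest-magnitude weight goes first because the error surface is nearly flat in its direction, which is where saliency and plain magnitude pruning disagree.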
Training Products of Experts by Minimizing Contrastive Divergence
, 2002
"... It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual “expert” models makes it hard to generate samples from the combined model but easy to infer the values of the l ..."
Cited by 850 (75 self)
is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called “contrastive divergence” whose
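A binary restricted Boltzmann machine is the usual worked example of a PoE (each hidden unit acts as one expert), and contrastive divergence with one Gibbs step (CD-1) replaces the intractable model statistics with statistics from a one-step reconstruction. A minimal pure-Python sketch under our own assumptions: the class name, learning rate, initialization scale, and the use of hidden probabilities (rather than samples) in the update statistics are illustrative choices.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class BinaryRBM:
    """Binary RBM trained with CD-1; W[i][j] links visible unit i to hidden unit j."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v: List[float]) -> List[float]:
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h: List[float]) -> List[float]:
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs: List[float]) -> List[float]:
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0: List[float], lr: float = 0.1) -> None:
        """One CD-1 step: data statistics minus one-step-reconstruction statistics."""
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)
        v1 = self.sample(self.visible_probs(h0))  # one Gibbs step back to the visibles
        ph1 = self.hidden_probs(v1)
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])

    def reconstruction_error(self, v: List[float]) -> float:
        """Squared error of the mean-field reconstruction of v."""
        pv = self.visible_probs(self.hidden_probs(v))
        return sum((a - b) ** 2 for a, b in zip(v, pv))
```

Reconstruction error is only a rough progress monitor here; CD-1 does not minimize it directly.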