Results 1–10 of 39
Semi-Supervised Learning Literature Survey, 2006
Abstract

Cited by 757 (8 self)
We review the literature on semi-supervised learning, which is an area in machine learning and, more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Generalized expectation criteria for semi-supervised learning of conditional random fields. In Proc. ACL, pages 870–878, 2008
Abstract

Cited by 105 (11 self)
This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model’s distribution on unlabeled data matches a target distribution. We induce target conditional probability distributions of labels given features from both annotated feature occurrences in context and ad hoc feature majority label assignment. The use of generalized expectation criteria allows for a dramatic reduction in annotation time by shifting from traditional instance-labeling to feature-labeling, and the methods presented outperform traditional CRF training and other semi-supervised methods when limited human effort is available.
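A minimal sketch of the generalized-expectation idea in the abstract above, under toy assumptions: the criterion penalizes the KL divergence between an annotator-supplied target label distribution for a feature and the model's expected label distribution over the unlabeled instances where that feature fires. The function name, the two-label setup, and all numbers here are illustrative, not from the paper:

```python
import math

def ge_penalty(target, predicted_dists):
    """KL(target || mean predicted) over instances where a labeled feature fires.

    target          -- desired label distribution for the feature (made-up numbers)
    predicted_dists -- model's p(y|x) for each unlabeled instance with the feature
    Assumes every label with target mass also gets nonzero predicted mass.
    """
    n = len(predicted_dists)
    # Model's expected label distribution, averaged over instances with this feature
    mean = [sum(d[k] for d in predicted_dists) / n for k in range(len(target))]
    # KL divergence: zero exactly when the model's expectation matches the target
    return sum(t * math.log(t / m) for t, m in zip(target, mean) if t > 0)

# Model predictions on two unlabeled instances containing the feature
dists = [[0.7, 0.3], [0.5, 0.5]]
matched = ge_penalty([0.6, 0.4], dists)   # mean prediction is [0.6, 0.4] -> 0.0
skewed = ge_penalty([0.9, 0.1], dists)    # mismatch -> positive penalty
```

In the actual paper this penalty is added to the CRF training objective and differentiated with respect to the model parameters; the sketch only evaluates it.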
Semi-Supervised Learning by Entropy Minimization
Abstract

Cited by 101 (2 self)
We consider the semi-supervised learning problem, where a decision rule is to be learned from labeled and unlabeled data. In this framework, we motivate minimum entropy regularization, which makes it possible to incorporate unlabeled data into standard supervised learning. This regularizer can be applied to any model of posterior probabilities. Our approach provides a new motivation for some existing semi-supervised learning algorithms, which are particular or limiting instances of minimum entropy regularization. A series of experiments illustrates that the proposed solution benefits from unlabeled data. The method challenges mixture models when the data are sampled from the distribution class spanned by the generative model. Performance is clearly in favor of minimum entropy regularization when generative models are misspecified, and the weighting of unlabeled data provides robustness to violation of the “cluster assumption”. Finally, we also illustrate that the method can be far superior to manifold learning in high-dimensional spaces, and also when the manifolds are generated by moving examples along the discriminating directions.
Semi-supervised self-training of object detection models. In Proc. of IEEE Workshop on Applications of Computer Vision, 2005
Abstract

Cited by 93 (0 self)
The construction of appearance-based object detection systems is time-consuming and difficult because a large number of training examples must be collected and manually labeled in order to capture variations in object appearance. Semi-supervised training is a means of reducing the effort needed to prepare the training set by training the model with a small number of fully labeled examples and an additional set of unlabeled or weakly labeled examples. In this work we present a semi-supervised approach to training object detection systems based on self-training. We implement our approach as a wrapper around the training process of an existing object detector and present empirical results. The key contributions of this empirical study are to demonstrate that a model trained in this manner can achieve results comparable to a model trained in the traditional manner using a much larger set of fully labeled data, and that a training data selection metric defined independently of the detector greatly outperforms a selection metric based on the detection confidence generated by the detector.
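The self-training wrapper described above follows a generic loop: train on the labeled set, pseudo-label the unlabeled points the current model is confident about, add them to the labeled set, and retrain. A toy sketch with a 1-D threshold "detector" standing in for the real one; the confidence score, threshold value, and data are all made up for illustration:

```python
def train(points):
    """Toy 1-D 'detector': threshold halfway between the two class means."""
    xs0 = [x for x, y in points if y == 0]
    xs1 = [x for x, y in points if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def self_train(labeled, unlabeled, tau=3.0):
    """Wrapper: train, pseudo-label confident unlabeled points, retrain, repeat.

    tau is an illustrative confidence cutoff; the paper's finding is that the
    choice of selection metric matters more than the loop itself.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        t = train(labeled)
        # Confidence = distance from the decision boundary (a simple stand-in
        # for a selection metric defined independently of the detector)
        confident = [x for x in pool if abs(x - t) >= tau]
        if not confident:
            break
        labeled += [(x, int(x > t)) for x in confident]
        pool = [x for x in pool if abs(x - t) < tau]
    return train(labeled)

# Two labeled examples plus four unlabeled points near the two clusters
t = self_train([(0.0, 0), (10.0, 1)], [1.0, 2.0, 8.5, 9.0])
print(t)   # boundary refit using the pseudo-labeled points
```

A real instantiation would replace `train` with the detector's training procedure and the distance score with whatever selection metric is chosen.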
Regularization and feature selection in least-squares temporal difference learning, 2009
Abstract

Cited by 79 (1 self)
We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (LSTD) algorithm, provide a method for learning the parameters of the value function, but when the number of features is large this algorithm can overfit to the data and is computationally expensive. In this paper, we propose a regularization framework for the LSTD algorithm that overcomes these difficulties. In particular, we focus on the case of l1 regularization, which is robust to irrelevant features and also serves as a method for feature selection. Although the l1-regularized LSTD solution cannot be expressed as a convex optimization problem, we present an algorithm similar to the Least Angle Regression (LARS) algorithm that can efficiently compute the optimal solution. Finally, we demonstrate the performance of the algorithm experimentally.
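For context on what is being regularized: plain LSTD solves the normal equations A·theta = b, with A and b accumulated from sampled transitions. A minimal single-feature sketch (so A and b reduce to scalars); the paper's contribution, the l1 penalty and the LARS-style solver, sits on top of these equations and is not reproduced here:

```python
def lstd(transitions, gamma=0.9):
    """LSTD with a single feature phi(s) = 1 for every state.

    transitions -- (reward, phi_s, phi_next) triples. With one feature the
    normal equations A theta = b are scalar; in general A is d x d and b is a
    d-vector, accumulated over transitions in exactly the same pattern.
    """
    A = sum(phi * (phi - gamma * phi2) for _, phi, phi2 in transitions)
    b = sum(r * phi for r, phi, _ in transitions)
    return b / A   # theta; the value estimate is theta * phi(s)

# Sanity check: constant reward 1 under discount 0.9 has value 1 / (1 - 0.9) = 10
samples = [(1.0, 1.0, 1.0)] * 50
theta = lstd(samples)
print(theta)
```

With many features, inverting A is the expensive, overfitting-prone step the abstract refers to; l1 regularization shrinks most components of theta to exactly zero, which is why it doubles as feature selection.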
On semi-supervised classification, 2005
Abstract

Cited by 48 (10 self)
A graph-based prior is proposed for parametric semi-supervised classification. The prior utilizes both labelled and unlabelled data; it also integrates features from multiple views of a given sample (e.g., multiple sensors), thus implementing a Bayesian form of co-training. An EM algorithm for training the classifier automatically adjusts the tradeoff between the contributions of: (a) the labelled data; (b) the unlabelled data; and (c) the co-training information. Active label query selection is performed using a mutual-information-based criterion that explicitly uses the unlabelled data and the co-training information. Encouraging results are presented on public benchmarks and on measured data from single and multiple sensors.
Multi-conditional learning: generative/discriminative training for clustering and classification, 2006
Abstract

Cited by 44 (5 self)
This paper presents multi-conditional learning (MCL), a training criterion based on a product of multiple conditional likelihoods. When combining the traditional conditional probability of “label given input” with a generative probability of “input given label”, the latter acts as a surprisingly effective regularizer. When applied to models with latent variables, MCL combines the structure-discovery capabilities of generative topic models, such as latent Dirichlet allocation and the exponential family harmonium, with the accuracy and robustness of discriminative classifiers, such as logistic regression and conditional random fields. We present results on several standard text data sets showing significant reductions in classification error due to MCL regularization, and substantial gains in precision and recall due to the latent structure discovered under MCL.
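The "product of conditionals" criterion above becomes a sum of log-likelihoods: log p(y|x) plus a weighted log p(x|y). A toy sketch over a binary joint distribution makes the two terms concrete; the joint table, the weight name `alpha`, and the data are all made-up illustrations, not the paper's models:

```python
import math

# Illustrative joint distribution p(x, y) over binary x and y (made-up numbers)
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
py = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}

def mcl_objective(data, alpha=1.0):
    """Multi-conditional criterion: sum of log p(y|x) + alpha * log p(x|y).

    alpha weights the generative term, which acts as the regularizer.
    """
    total = 0.0
    for x, y in data:
        total += math.log(joint[(x, y)] / px[x])           # discriminative term
        total += alpha * math.log(joint[(x, y)] / py[y])   # generative term
    return total

obj = mcl_objective([(0, 0)])   # here both conditionals equal 0.8
```

In the paper the conditionals come from a parametric model and the objective is maximized over its parameters; the sketch only evaluates the criterion for fixed probabilities.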
Online Domain Adaptation of a Pre-Trained Cascade of Classifiers
Abstract

Cited by 43 (1 self)
Many classifiers are trained with massive training sets only to be applied at test time on data from a different distribution. How can we rapidly and simply adapt a classifier to a new test distribution, even when we do not have access to the original training data? We present an online approach for rapidly adapting a “black box” classifier to a new test data set without retraining the classifier or examining the original optimization criterion. Assuming the original classifier outputs a continuous number for which a threshold gives the class, we reclassify points near the original boundary using a Gaussian process regression scheme. We show how this general procedure can be used in the context of a classifier cascade, demonstrating performance that far exceeds state-of-the-art results in face detection on a standard data set. We also draw connections to work in semi-supervised learning, domain adaptation, and information regularization.
Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
Abstract

Cited by 30 (1 self)
We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to part-of-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar n-grams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the target domain, but no additional labeled data. The similarity graph is used during training to smooth the state posteriors on the target domain. Standard inference can be used at test time. Our approach is able to scale to very large problems and yields significantly improved target domain accuracy.
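The core graph-smoothing intuition above can be shown with the simplest possible propagation scheme: labeled nodes are clamped and each unlabeled node repeatedly takes the mean of its neighbors' scores. This is a generic label-propagation sketch on a toy chain graph, not the paper's CRF-posterior-smoothing algorithm:

```python
def propagate(neighbors, seed, iters=2000):
    """Jacobi-style label propagation on a similarity graph.

    neighbors -- adjacency list (node -> list of neighbor nodes)
    seed      -- node -> fixed score for the few labeled nodes
    Unlabeled nodes iterate toward the mean of their neighbors, so scores
    become smooth along graph edges.
    """
    scores = {n: seed.get(n, 0.5) for n in neighbors}
    for _ in range(iters):
        new = {}
        for n, nbrs in neighbors.items():
            if n in seed:
                new[n] = seed[n]   # labeled nodes stay clamped
            else:
                new[n] = sum(scores[m] for m in nbrs) / len(nbrs)
        scores = new
    return scores

# Chain graph 0-1-2-3-4 with only the endpoints labeled (toy example)
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
result = propagate(chain, {0: 0.0, 4: 1.0})
print(result)   # interior nodes converge toward 0.25, 0.5, 0.75
```

In the paper, graph nodes are n-gram types, edge weights come from distributional similarity, and the propagated quantities are tag posteriors used to regularize CRF training.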
Discriminative Clustering by Regularized Information Maximization
Abstract

Cited by 26 (1 self)
Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information-theoretic objective function which balances class separation, class balance, and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes, and incorporate partial labels for semi-supervised learning. In particular, we instantiate the framework to unsupervised, multi-class kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method.
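The balance described above can be made concrete: the mutual information between inputs and predicted labels is estimated as the entropy of the average prediction (high when clusters are balanced) minus the average entropy of individual predictions (low when predictions are confident), and a complexity penalty is subtracted. A sketch evaluating that objective for fixed predictions; the stand-in `complexity` argument is an illustrative placeholder for the paper's regularizer (e.g. a weight norm):

```python
import math

def entropy(p):
    # Shannon entropy (in nats) of a label distribution
    return -sum(q * math.log(q) for q in p if q > 0)

def rim_objective(preds, lam=0.0, complexity=0.0):
    """RIM-style objective (to be maximized): I(y; x) minus a complexity penalty.

    I(y; x) is estimated as H(mean prediction) - mean H(prediction):
    the first term rewards balanced clusters, the second rewards confident
    assignments. lam * complexity stands in for the classifier penalty.
    """
    n = len(preds)
    k = len(preds[0])
    marginal = [sum(p[c] for p in preds) / n for c in range(k)]
    mi = entropy(marginal) - sum(entropy(p) for p in preds) / n
    return mi - lam * complexity

# Confident, balanced two-cluster assignments score close to log(2)
obj = rim_objective([[0.99, 0.01], [0.01, 0.99]])
print(obj)
```

Training in the paper maximizes this quantity over classifier parameters; the sketch only shows why confident-and-balanced predictions score highest.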