Semi-Supervised Learning Literature Survey
2006
Cited by 268 (7 self)
We review the literature on semi-supervised learning, an area of machine learning and, more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Semi-supervised regression with co-training style algorithms
2007
Cited by 19 (4 self)
The traditional setting of supervised learning requires a large amount of labeled training examples in order to achieve good generalization. However, in many practical applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning has attracted much attention. Previous research on semi-supervised learning mainly focuses on semi-supervised classification. Although regression is almost as important as classification, semi-supervised regression is largely understudied. In particular, although co-training is a main paradigm in semi-supervised learning, few works have been devoted to co-training style semi-supervised regression algorithms. In this paper, a co-training style semi-supervised regression algorithm, i.e. COREG, is proposed. This algorithm uses two regressors, each of which labels the unlabeled data for the other regressor, where the confidence in labeling an unlabeled example is estimated through the amount of reduction in mean squared error over the labeled neighborhood of that example. Analysis and experiments show that COREG can effectively exploit unlabeled data to improve regression estimates.
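The confidence estimate described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a k-nearest-neighbor regressor on one-dimensional inputs, leave-one-out predictions on the labeled examples, and hypothetical function names (`knn_predict`, `neighborhood_mse`, `labeling_confidence`). It shows only how one regressor might score a single unlabeled example, not the full two-regressor loop.

```python
def knn_predict(train, x, k, skip=None):
    """Average the targets of the k training points nearest to x.

    train is a list of (input, target) pairs with scalar inputs.
    skip optionally excludes one training index (leave-one-out).
    """
    pool = [(abs(xi - x), yi) for j, (xi, yi) in enumerate(train) if j != skip]
    pool.sort(key=lambda pair: pair[0])
    nearest = pool[:k]
    return sum(y for _, y in nearest) / len(nearest)


def neighborhood_mse(train, indices, k):
    """Mean squared leave-one-out error over the given labeled indices."""
    errors = [
        (train[i][1] - knn_predict(train, train[i][0], k, skip=i)) ** 2
        for i in indices
    ]
    return sum(errors) / len(errors)


def labeling_confidence(labeled, x_u, k):
    """Pseudo-label x_u and score it by the error reduction it would bring.

    Returns (pseudo_label, confidence). A positive confidence means that
    adding (x_u, pseudo_label) lowers the error on x_u's labeled neighbors.
    """
    y_u = knn_predict(labeled, x_u, k)
    # Labeled neighborhood of x_u: indices of its k nearest labeled examples.
    order = sorted(range(len(labeled)), key=lambda i: abs(labeled[i][0] - x_u))
    neighborhood = order[:k]
    before = neighborhood_mse(labeled, neighborhood, k)
    after = neighborhood_mse(labeled + [(x_u, y_u)], neighborhood, k)
    return y_u, before - after


labeled = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (5.0, 5.0)]
pseudo_label, confidence = labeling_confidence(labeled, 1.5, k=1)
```

In a co-training style loop, each regressor would presumably score a pool of unlabeled examples this way and hand its most confident pseudo-labeled example to the other regressor; the abstract does not give those details, so they are left out here.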
Semi-supervised learning with very few labeled training examples
Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 2007
Cited by 7 (1 self)
In semi-supervised learning, a number of labeled examples are usually required for training an initial weakly useful predictor which is in turn used for exploiting the unlabeled examples. However, in many real-world applications there may exist very few labeled training examples, which makes the weakly useful predictor difficult to generate, and therefore these semi-supervised learning methods cannot be applied. This paper proposes a method working under a two-view setting. By taking advantage of the correlations between the views using canonical correlation analysis, the proposed method can perform semi-supervised learning with only one labeled training example. Experiments and an application to content-based image retrieval validate the effectiveness of the proposed method.
Semi-supervised structured output learning based on a hybrid generative and discriminative approach
In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL’07), Prague, Czech Republic, Association for Computational Linguistics (2007) 791-800
Cited by 5 (1 self)
This paper proposes a framework for semi-supervised structured output learning (SOL), specifically for sequence labeling, based on a hybrid generative and discriminative approach. We define the objective function of our hybrid model, which is written in log-linear form, by discriminatively combining discriminative structured predictor(s) with generative model(s) that incorporate unlabeled data. Then, unlabeled data is used in a generative manner to increase the sum of the discriminant functions for all outputs during the parameter estimation. Experiments on named entity recognition (CoNLL-2003) and syntactic chunking (CoNLL-2000) data show that our hybrid model significantly outperforms the state-of-the-art performance obtained with supervised SOL methods, such as conditional random fields (CRFs).
Exponential Family Hybrid Semi-Supervised Learning
Cited by 2 (1 self)
We present an approach to semi-supervised learning based on an exponential family characterization. Our approach generalizes previous work on coupled priors for hybrid generative/discriminative models. Our model is more flexible and natural than previous approaches. Experimental results on several data sets show that our approach also performs better in practice.
Multi-Label Classification Using Conditional Dependency Networks
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
In this paper, we tackle the challenges of multi-label classification by developing a general conditional dependency network model. The proposed model is a cyclic directed graphical model, which provides an intuitive representation for the dependencies among multiple label variables, and a well-integrated framework for efficient model training using binary classifiers and label predictions using Gibbs sampling inference. Our experiments show the proposed conditional model can effectively exploit the label dependency to improve multi-label classification performance.
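The inference step mentioned in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's model: each label has a binary conditional that returns the probability of the label being 1 given the input and the current values of all labels, and Gibbs sampling resamples the labels one at a time to estimate their marginals. The names `gibbs_label_marginals`, `label0`, and `label1`, and the fixed logistic weights, are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gibbs_label_marginals(conditionals, x, n_sweeps=500, burn_in=100, seed=0):
    """Estimate P(y_j = 1 | x) for each label by Gibbs sampling.

    conditionals[j](x, y) must return P(y_j = 1 | x, other labels in y).
    """
    rng = random.Random(seed)
    n_labels = len(conditionals)
    y = [0] * n_labels          # arbitrary starting assignment
    counts = [0] * n_labels
    for sweep in range(n_sweeps):
        # Resample each label in turn, conditioning on the current others.
        for j, conditional in enumerate(conditionals):
            p = conditional(x, y)
            y[j] = 1 if rng.random() < p else 0
        # Only samples drawn after the burn-in period are counted.
        if sweep >= burn_in:
            for j in range(n_labels):
                counts[j] += y[j]
    kept = n_sweeps - burn_in
    return [count / kept for count in counts]


# Hypothetical conditionals standing in for trained binary classifiers.
# Each label depends on the input feature and on the other label.
def label0(x, y):
    return sigmoid(2.0 * x[0] + 1.5 * y[1])


def label1(x, y):
    return sigmoid(-1.0 * x[0] + 1.5 * y[0])


marginals = gibbs_label_marginals([label0, label1], x=[0.3])
predicted = [1 if p >= 0.5 else 0 for p in marginals]
```

Thresholding the estimated marginals at 0.5 is one simple way to turn the samples into a label set; the abstract does not say which decision rule the authors use.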
Semi-supervised or Semi-unsupervised?
We are interested in learning something using both labeled and unlabeled data, or else we wouldn’t be at this workshop. The question I’d like to think about is: why do we want to do this? Is it: 1. because we think that adding a little labeled data to our pile of unlabeled data will help; or 2. because we think that adding a little unlabeled data to our pile of labeled data will help? Typical approaches in NLP to the unlabeled+labeled problem fall into one of these two categories. In the first case, we basically have some unsupervised learning system that we know does fairly well on its own, and we’re adding labeled data just to help it tweak things in a slightly better way. In the second case, we basically have some

