Results 1–10 of 20
Co-regularization Based Semi-supervised Domain Adaptation
Abstract
Cited by 33 (0 self)
This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of augmented space (introduced in EASYADAPT (EA) [1]) and harnesses unlabeled data in the target domain to further assist the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a preprocessing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ shows that the hypothesis class of EA++ has lower complexity than that of EA and hence yields tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method compared to EA as well as a few other representative baseline approaches.
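The abstract's augmented-space idea can be made concrete with a short sketch. This follows the feature augmentation commonly described for EA and EA++ (source points duplicated into a shared block, target points into a target block, and unlabeled target points encoded so that the two blocks are pushed to agree); the function name and exact unlabeled-point handling are our illustration, not code from the paper.

```python
import numpy as np

def augment_easyadapt(X_src, X_tgt, X_tgt_unlab=None):
    """Feature augmentation in the style of EasyAdapt (EA) and EA++.

    EA maps a d-dim source point x to [x, x, 0] and a target point to
    [x, 0, x]; a single linear learner on the 3d-dim space then learns a
    shared component plus domain-specific components.  For EA++, each
    unlabeled target point u is encoded as [0, u, -u] (paired with a
    pseudo-label of 0 downstream), which penalizes disagreement between
    the source- and target-specific components on unlabeled target data.
    """
    d = X_src.shape[1]
    zeros = lambda n: np.zeros((n, d))
    A_src = np.hstack([X_src, X_src, zeros(X_src.shape[0])])
    A_tgt = np.hstack([X_tgt, zeros(X_tgt.shape[0]), X_tgt])
    if X_tgt_unlab is None:
        return A_src, A_tgt, None
    A_unl = np.hstack([zeros(X_tgt_unlab.shape[0]), X_tgt_unlab, -X_tgt_unlab])
    return A_src, A_tgt, A_unl
```

Any off-the-shelf supervised learner can then be run on the augmented rows, which is what makes the approach usable as a preprocessing step.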
Towards Making Unlabeled Data Never Hurt
Abstract
Cited by 17 (5 self)
It is usually expected that, when labeled data are limited, learning performance can be improved by exploiting unlabeled data. In many cases, however, current semi-supervised learning approaches may perform even worse than using the limited labeled data alone. It is therefore desirable to have safe semi-supervised learning approaches that never degrade learning performance by using unlabeled data. In this paper, we focus on semi-supervised support vector machines (S3VMs) and propose S4VMs, i.e., safe S3VMs. Unlike S3VMs, which typically aim at approaching a single optimal low-density separator, S4VMs exploit multiple candidate low-density separators simultaneously to reduce the risk of identifying a poor separator with unlabeled data. We describe two implementations of S4VMs, and our comprehensive experiments show that the overall performance of S4VMs is highly competitive with that of S3VMs; moreover, in contrast to S3VMs, which degrade performance in many cases, S4VMs are never significantly inferior to inductive SVMs.
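The safety idea behind S4VMs can be illustrated with a heavily simplified selection step. This is our toy reduction of the abstract's "exploit the candidate low-density separators simultaneously" idea, not the paper's actual optimization: given several candidate labelings of the unlabeled data, pick the one whose worst-case gain over the inductive-SVM baseline is largest.

```python
import numpy as np

def s4vm_style_selection(candidates, y_baseline):
    """Toy version of the S4VM selection idea (a simplification, not the
    paper's algorithm).  `candidates` are candidate low-density labelings
    of the unlabeled data; `y_baseline` is the inductive-SVM labeling.
    We return the labeling whose worst-case gain over the baseline --
    measured against every candidate treated as the possible ground
    truth -- is largest.  The baseline itself has worst-case gain zero,
    so the selection can never do worse than the baseline in this metric,
    mirroring the "never significantly inferior" guarantee.
    """
    candidates = [np.asarray(c) for c in candidates]
    y_baseline = np.asarray(y_baseline)
    pool = [y_baseline] + candidates

    def worst_case_gain(y):
        return min(int(np.sum(y == c)) - int(np.sum(y_baseline == c))
                   for c in candidates)

    return max(pool, key=worst_case_gain)
```

When the candidates disagree wildly, the max-min criterion falls back to the baseline; when they agree, it commits to the shared low-density labeling.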
Empirical risk minimization for probabilistic grammars: Sample complexity and hardness of learning
Computational Linguistics, 2012
Abstract
Cited by 9 (7 self)
Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply to both the supervised and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. Learning from data is central to contemporary computational linguistics. It is common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotated …
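In the supervised setting the abstract alludes to, empirical risk minimization under log-loss for a probabilistic grammar has a closed form: each rule probability is the relative frequency of that rule among all rules sharing its left-hand side. A minimal sketch (the derivation representation is our toy, not the paper's):

```python
from collections import Counter

def mle_rule_probs(trees):
    """Supervised log-loss ERM for a probabilistic grammar reduces to
    relative-frequency estimation.  `trees` is a list of derivations,
    each a list of (lhs, rhs) rule applications (toy representation).
    Returns p(rule) = count(rule) / count(rules with the same lhs).
    """
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in trees:
        for lhs, rhs in tree:
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {(lhs, rhs): c / lhs_counts[lhs]
            for (lhs, rhs), c in rule_counts.items()}
```

The unsupervised case has no such closed form, which is where the NP-hardness result and the EM-like approximation come in.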
Learning large-margin halfspaces with more malicious noise
, 2011
Abstract
Cited by 7 (1 self)
We describe a simple algorithm that runs in time poly(n, 1/γ, 1/ε) and learns an unknown n-dimensional γ-margin halfspace to accuracy 1 − ε in the presence of malicious noise, when the noise rate is allowed to be as high as Θ(εγ√log(1/γ)). Previous efficient algorithms could only learn to accuracy ε in the presence of malicious noise of rate at most Θ(εγ). Our algorithm does not work by optimizing a convex loss function. We show that no algorithm for learning γ-margin halfspaces that minimizes a convex proxy for misclassification error can tolerate malicious noise at a rate greater than Θ(εγ); this may partially explain why previous algorithms could not achieve the higher noise tolerance of our new algorithm.
Asymptotic Analysis of Generative Semi-Supervised Learning
Abstract
Cited by 4 (2 self)
Semi-supervised learning has emerged as a popular framework for improving modeling accuracy while controlling labeling cost. Based on an extension of stochastic composite likelihood, we quantify the asymptotic accuracy of generative semi-supervised learning. In doing so, we complement distribution-free analysis by providing an alternative framework to measure the value associated with different labeling policies and resolve the fundamental question of how much data to label and in what manner. We demonstrate our approach with both simulation studies and real-world experiments using naive Bayes for text classification and MRFs and CRFs for structured prediction in NLP.
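To make "generative semi-supervised learning" concrete for the naive Bayes case the abstract mentions, here is a standard EM scheme for Bernoulli naive Bayes with labeled and unlabeled data (this is the generic textbook estimator, not the paper's composite-likelihood analysis; all names are ours): labeled points contribute p(x, y) with fixed responsibilities, unlabeled points contribute p(x) = Σ_y p(x, y) via soft responsibilities.

```python
import numpy as np

def ssl_naive_bayes(X_lab, y_lab, X_unl, n_iter=20, eps=1e-9):
    """EM for generative semi-supervised Bernoulli naive Bayes (sketch).
    X_lab, X_unl: binary feature matrices; y_lab: labels in {0, 1, ...}.
    Returns class priors and per-class Bernoulli feature probabilities.
    """
    classes = np.unique(y_lab)
    R_lab = (y_lab[:, None] == classes[None, :]).astype(float)  # fixed
    R_unl = np.full((len(X_unl), len(classes)), 1.0 / len(classes))
    X = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        R = np.vstack([R_lab, R_unl])
        prior = R.sum(0) / R.sum()                      # M-step: priors
        theta = (R.T @ X + eps) / (R.sum(0)[:, None] + 2 * eps)  # features
        log_p = (np.log(prior)                          # E-step on unlabeled
                 + X_unl @ np.log(theta).T
                 + (1 - X_unl) @ np.log(1 - theta).T)
        R_unl = np.exp(log_p - log_p.max(1, keepdims=True))
        R_unl /= R_unl.sum(1, keepdims=True)
    return prior, theta
```

The labeling-policy question the abstract raises is exactly the question of how to split the budget between rows of `X_lab` and `X_unl` here.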
On the complexity of trial and error
, 2013
Abstract
Cited by 4 (2 self)
Motivated by certain applications from physics, biochemistry, economics, and computer science in which the objects under investigation are unknown or not directly accessible because of various limitations, we propose a trial-and-error model to examine search problems in which inputs are unknown. More specifically, we consider constraint satisfaction problems ⋀i Ci, where the constraints Ci are hidden, and the goal is to find a solution satisfying all constraints. We can adaptively propose a candidate solution (i.e., trial), and there is a verification oracle that either confirms that it is a valid solution, or returns the index i of a violated constraint (i.e., error), with the exact content of Ci still hidden. We studied the time and trial complexities of a number of …
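The interaction model described above is easy to simulate. The sketch below is a deliberately naive strategy (flip a random variable after each reported error, nothing like the paper's algorithms) that illustrates the trial-and-error protocol: the solver sees only violated-constraint indices, never the constraints themselves.

```python
import random

def solve_hidden_csp(oracle, n_vars, max_trials=10000, seed=0):
    """Toy trial-and-error search over hidden constraints.  `oracle(x)`
    returns None if the boolean assignment x satisfies every hidden
    constraint, or the index of some violated constraint otherwise.
    The solver never sees constraint contents; on each error it makes
    a blind local move (flip a random variable) and tries again.
    """
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n_vars)]
    for _ in range(max_trials):
        err = oracle(x)                      # trial
        if err is None:                      # verified solution
            return x
        x[rng.randrange(n_vars)] ^= True     # error: blind local move
    return None
```

Smarter solvers would exploit the returned index i, e.g., by remembering which trials violated each constraint; the complexity of doing so optimally is exactly what the paper studies.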
Smart PAC-learners
 Theoretical Computer Science
Abstract
Cited by 2 (2 self)
The PAC-learning model is distribution-independent in the sense that the learner must reach a learning goal with a limited number of labeled random examples without any prior knowledge of the underlying domain distribution. In order to achieve this, one needs generalization error bounds that are valid uniformly for every domain distribution. These bounds are (almost) tight in the sense that there is a domain distribution which does not admit a generalization error significantly smaller than the general bound. Note, however, that this leaves open the possibility of achieving the learning goal faster if the underlying distribution is “simple”. Informally speaking, we say a PAC-learner L is “smart” if, for a “vast majority” of domain distributions D, L does not require significantly more examples to reach the “learning goal” than the best learner whose strategy is specialized to D. In this paper, focusing on sample complexity and ignoring computational issues, we show that smart learners do exist. This implies (at least from an information-theoretic perspective) that full prior knowledge of the domain distribution (or access to a huge collection of unlabeled examples) does not, for a vast majority of domain distributions, significantly reduce the number of labeled examples required to achieve the learning goal.
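The kind of distribution-free bound the abstract contrasts with distribution-specific ("smart") learning is easy to compute in the simplest case. For a finite hypothesis class in the realizable setting, m ≥ (1/ε)(ln|H| + ln(1/δ)) labeled examples suffice for any consistent learner to be (ε, δ)-PAC, uniformly over all domain distributions (this is the classical textbook bound, not a result specific to the paper):

```python
import math

def pac_sample_bound(h_size, eps, delta):
    """Classical distribution-free PAC sample bound for a finite
    hypothesis class of size h_size in the realizable case:
    m >= (1/eps) * (ln|H| + ln(1/delta)).  Valid uniformly over
    every domain distribution, which is exactly the worst-case
    flavor the paper's "smart" learners try to beat on most
    distributions.
    """
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)
```

For example, |H| = 2^20, ε = 0.1, δ = 0.05 gives 169 examples regardless of the distribution.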
Efficient Semi-supervised and Active Learning of Disjunctions
, 2013
Abstract
Cited by 2 (2 self)
We provide efficient algorithms for learning disjunctions in the semi-supervised setting under a natural regularity assumption introduced by Balcan & Blum (2005). We prove bounds on the sample complexity of our algorithms under a mild restriction on the data distribution. We also give an active learning algorithm with improved sample complexity and extend all our algorithms to the random classification noise setting.
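For context, the fully supervised baseline that the paper's semi-supervised and active variants improve on is the textbook elimination learner for monotone disjunctions (this sketch is that standard algorithm, not the paper's): any variable that is on in a negative example cannot appear in the target disjunction, so it is eliminated, and the surviving variables form a consistent hypothesis.

```python
def learn_disjunction(examples, n_vars):
    """Standard elimination learner for monotone disjunctions.
    `examples` is a list of (x, label) pairs where x is a 0/1 tuple of
    length n_vars.  A variable set in any negative example is removed;
    the hypothesis is the OR of the remaining variables, which is
    consistent with the sample whenever the target is a monotone
    disjunction.
    """
    alive = set(range(n_vars))
    for x, label in examples:
        if label == 0:
            alive -= {i for i in range(n_vars) if x[i]}
    return lambda x: int(any(x[i] for i in alive))
```

Each negative example can only shrink `alive`, so the learner makes at most n mistakes in the online view; the semi-supervised gains come from using unlabeled data and the regularity assumption to prune further.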
Probability-One Homotopy Maps for Tracking Constrained Clustering Solutions
Abstract
Cited by 1 (0 self)
Modern machine learning problems typically have multiple criteria, but there is currently no systematic mathematical theory to guide the design of formulations and the exploration of alternatives. Homotopy methods are a promising approach to characterizing solution spaces by smoothly tracking solutions from one formulation (typically an “easy” problem) to another (typically a “hard” problem). New results in constructing homotopy maps for constrained clustering problems are presented, combining quadratic loss functions with discrete evaluations of constraint violations. These maps balance requirements of locality in clusters with those of discrete must-link and must-not-link constraints. Experimental results demonstrate advantages in tracking solutions compared to state-of-the-art constrained clustering algorithms.
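The easy-to-hard tracking idea can be sketched with generic numerical continuation (this is the textbook convex-combination homotopy H_t = (1 − t)·easy + t·hard with warm-started gradient descent, not the paper's probability-one homotopy maps; all names are ours):

```python
import numpy as np

def track_homotopy(grad_easy, grad_hard, x0, steps=50, inner=200, lr=0.05):
    """Numerical-continuation sketch: blend an easy objective into a
    hard one via the homotopy H_t = (1 - t) * easy + t * hard.  At each
    increment of t we re-minimize by gradient descent starting from the
    previous solution, so the returned point has been tracked along a
    solution path from the easy problem (t = 0) to the hard one (t = 1).
    """
    x = np.asarray(x0, dtype=float)
    for t in np.linspace(0.0, 1.0, steps):
        for _ in range(inner):
            g = (1 - t) * grad_easy(x) + t * grad_hard(x)
            x = x - lr * g
    return x
```

For constrained clustering, "hard" would add the discrete must-link / must-not-link violation terms; the paper's contribution is constructing maps for which the tracked path exists with probability one.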
Co-Training with Insufficient Views
Abstract
Cited by 1 (1 self)
Co-training is a well-known semi-supervised learning paradigm that exploits unlabeled data with two views. Most previous theoretical analyses of co-training are based on the assumption that each of the views is sufficient to correctly predict the label. However, this assumption can hardly be met in real applications due to feature corruption or various kinds of feature noise. In this paper, we present a theoretical analysis of co-training when neither view is sufficient. We define the diversity between the two views with respect to the confidence of prediction and prove that if the two views have large diversity, co-training is able to improve learning performance by exploiting unlabeled data even with insufficient views. We also discuss the relationship between view insufficiency and diversity, and give some implications for understanding the difference between co-training and co-regularization.
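The two-view paradigm analyzed above can be sketched as the classic Blum & Mitchell-style loop (a toy sketch with a nearest-class-centroid learner per view, not this paper's analysis; all names are ours): each round, each view pseudo-labels the unlabeled point it is most confident about, and both views add that point to the labeled pool.

```python
import numpy as np

def co_train(X1, X2, y, U1, U2, rounds=3):
    """Minimal two-view co-training loop.  (X1, X2, y): labeled data in
    views 1 and 2 with binary labels; (U1, U2): the same unlabeled
    points in each view.  Per round, each view's nearest-centroid
    classifier labels its single most confident unlabeled point, and
    the pseudo-labeled point is added to both views' training sets.
    Returns the grown labeled sets and labels.
    """
    views_l = [np.asarray(X1, float), np.asarray(X2, float)]
    views_u = [np.asarray(U1, float), np.asarray(U2, float)]
    y = np.asarray(y)
    pool = list(range(len(views_u[0])))

    def conf_pred(Xl, yl, Q):
        mus = np.array([Xl[yl == c].mean(0) for c in (0, 1)])
        d = ((Q[:, None, :] - mus[None]) ** 2).sum(-1)
        return d.argmin(1), np.abs(d[:, 0] - d[:, 1])  # labels, confidences

    for _ in range(rounds):
        for v in (0, 1):
            if not pool:
                break
            labels, conf = conf_pred(views_l[v], y, views_u[v][pool])
            j = int(np.argmax(conf))        # most confident unlabeled point
            idx = pool.pop(j)
            for w in (0, 1):                # both views receive the point
                views_l[w] = np.vstack([views_l[w], views_u[w][idx:idx + 1]])
            y = np.append(y, labels[j])
    return views_l[0], views_l[1], y
```

The paper's question is when this loop helps even though neither view alone determines the label; its answer is phrased in terms of the diversity of the two views' confident predictions.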