Agnostic active learning
- In ICML, 2006
- Cited by 80 (10 self)
We state and analyze the first active learning algorithm which works in the presence of arbitrary forms of noise. The algorithm, A² (for Agnostic Active), relies only upon the assumption that the samples are drawn i.i.d. from a fixed distribution. We show that A² achieves an exponential improvement (i.e., requires only O(ln(1/ɛ)) samples to find an ɛ-optimal classifier) over the usual sample complexity of supervised learning, for several settings considered before in the realizable case. These include learning threshold classifiers and learning homogeneous linear separators with respect to an input distribution which is uniform over the unit sphere.
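To make the O(ln(1/ɛ)) figure for threshold classifiers concrete, here is a minimal sketch of the classic noise-free (realizable) binary search over a pool of unlabeled points. It is not the A² algorithm, which is designed to tolerate noise; the names `learn_threshold` and `query_label` are hypothetical and chosen only for illustration.

```python
def learn_threshold(pool, query_label):
    """Binary-search for a threshold over a pool of unlabeled points.

    Assumes the realizable case: label(x) == 1 iff x >= t for an unknown t.
    Returns (smallest pool point labeled 1, number of labels queried).
    """
    points = sorted(pool)
    # Invariant: every point in points[:lo] is labeled 0,
    # every point in points[hi:] is labeled 1.
    lo, hi = 0, len(points)
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if query_label(points[mid]) == 1:
            hi = mid          # points[mid:] are all positive
        else:
            lo = mid + 1      # points[:mid + 1] are all negative
    threshold = points[lo] if lo < len(points) else float("inf")
    return threshold, queries
```

With a pool of n points the loop asks for at most about log2(n) + 1 labels, whereas labeling the whole pool would cost n.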
Analysis of perceptron-based active learning
- In COLT, 2005
- Cited by 51 (12 self)
We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ɛ²) labels to learn linear separators within generalization error ɛ. We then present a simple selective sampling algorithm for this problem, which combines a modification of the perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ɛ after asking for just Õ(d log(1/ɛ)) labels. This exponential improvement over the usual sample complexity of supervised learning has previously been demonstrated only for the computationally more complex query-by-committee algorithm.
1 Introduction
In many machine learning applications, unlabeled data is abundant but labeling is expensive. This distinction is not captured in the standard PAC or online models of supervised learning, and has motivated the field of active learning, in which the labels of data points are initially hidden, and the learner must pay for each label it wishes revealed. If query points are chosen randomly, the number of labels needed to reach a target generalization error ɛ, at a target confidence level 1 - δ, is similar to the sample complexity of supervised learning. The hope is that there are alternative querying strategies which require significantly fewer
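The abstract above names two ingredients: a modified perceptron update and an adaptive filtering rule for choosing queries. The following is a rough sketch of how they might fit together, not the paper's exact procedure. The details are assumptions: the update is taken to be a reflection, v ← v − 2(v·x)x, and the filter is a margin threshold that is halved after a run of queried points with no mistakes. The starting threshold 1/√d and the `patience` constant are illustrative choices.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def random_unit(d, rng):
    """Draw a point uniformly from the unit sphere in d dimensions."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(g, g))
    return [a / norm for a in g]

def selective_perceptron(stream, oracle, d, patience=8):
    """Selective-sampling perceptron sketch (assumed details, see lead-in).

    stream: iterable of unit vectors; oracle(x) returns a label in {-1, +1}.
    Returns (hypothesis vector, number of labels requested).
    """
    v = None
    threshold = 1.0 / math.sqrt(d)   # assumed initial filter width
    labels_used = 0
    streak = 0                       # consecutive queried points with no mistake
    for x in stream:
        if v is None:                # use the first labeled point as the start
            labels_used += 1
            y = oracle(x)
            v = [y * a for a in x]
            continue
        margin = dot(v, x)
        if abs(margin) > threshold:
            continue                 # filtered out: far from the boundary, no query
        labels_used += 1
        y = oracle(x)
        if y * margin <= 0:          # mistake: reflect v across the hyperplane of x
            v = [a - 2.0 * margin * b for a, b in zip(v, x)]
            streak = 0
        else:
            streak += 1
            if streak >= patience:   # filter looks too wide: narrow it
                threshold /= 2.0
                streak = 0
    return v, labels_used
```

Because x has unit length, the reflection leaves the norm of v unchanged, so the hypothesis stays on the unit sphere without renormalization.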
Active learning literature survey
- 2010
- Cited by 49 (1 self)
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for active learning, a summary of several problem setting variants, and a discussion
A general agnostic active learning algorithm
- 2007
- Cited by 46 (9 self)
We present a simple, agnostic active learning algorithm that works for any hypothesis class of bounded VC dimension, and any data distribution. Our algorithm extends a scheme of Cohn, Atlas, and Ladner to the agnostic setting, by (1) reformulating it using a reduction to supervised learning and (2) showing how to apply generalization bounds even for the non-i.i.d. samples that result from selective sampling. We provide a general characterization of the label complexity of our algorithm. This quantity is never more than the usual PAC sample complexity of supervised learning, and is exponentially smaller for some hypothesis classes and distributions. We also demonstrate improvements experimentally.
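For orientation, the scheme of Cohn, Atlas, and Ladner that this abstract builds on can be illustrated in its original noise-free form: request a label only when the hypotheses still consistent with past labels disagree on the point. The sketch below specializes that idea to one-dimensional thresholds. It is not the agnostic algorithm of the paper, and `cal_thresholds` is a hypothetical name.

```python
def cal_thresholds(stream, oracle):
    """Disagreement-based selective sampling for thresholds (realizable case).

    Assumes label(x) == 1 iff x >= t for an unknown t.
    Thresholds in the half-open interval (lo, hi] stay consistent with all
    labels seen so far. Returns (lo, hi, number of labels requested).
    """
    lo, hi = float("-inf"), float("inf")
    queries = 0
    for x in stream:
        if lo < x < hi:
            # Consistent thresholds disagree on x, so its label is informative.
            queries += 1
            if oracle(x) == 1:
                hi = x        # the threshold is at most x
            else:
                lo = x        # the threshold is above x
        # Otherwise every consistent threshold agrees on x: no label needed.
    return lo, hi, queries
```

Points that fall outside the current interval cost nothing, since their label can be inferred; only points inside the shrinking disagreement region are paid for.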
The True Sample Complexity of Active Learning
- Cited by 24 (10 self)
We describe and explore a new perspective on the sample complexity of active learning. In many situations where it was generally believed that active learning does not help, we find that active learning does help in the limit, often with exponential improvements in sample complexity. This contrasts with the traditional analysis of active learning problems such as non-homogeneous linear separators or depth-limited decision trees, in which Ω(1/ɛ) lower bounds are common; we point out that such results must be interpreted carefully, and that finding an ɛ-good classifier can always be accomplished with a number of samples asymptotically smaller than any such bound. These new insights arise from a subtle variation on the traditional definition of sample complexity, not previously recognized in the active learning literature.
Importance Weighted Active Learning
- Cited by 21 (2 self)
We present a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions. Our algorithm uses importance weighting to correct sampling bias, and by controlling the variance, we are able to give rigorous label complexity bounds for the learning process.
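The bias correction mentioned here can be shown in a few lines: if a label is requested with probability p and the resulting loss is scaled by 1/p, the weighted average is an unbiased estimate of the full-data loss. The sketch below assumes a fixed, strictly positive query probability per point and 0/1 loss. The abstract does not say how the paper's algorithm chooses its probabilities, so that part is left to the caller, and the function name is hypothetical.

```python
import random

def importance_weighted_loss(examples, hypothesis, query_prob, rng):
    """Estimate the average 0/1 loss of `hypothesis` from a subset of labels.

    examples: list of (x, y) pairs; y is only looked at when x is queried.
    query_prob(x): probability of requesting the label of x (must be > 0).
    A queried example contributes loss / p; an unqueried one contributes 0,
    so the expectation matches the loss computed on all labels.
    """
    total = 0.0
    for x, y in examples:
        p = query_prob(x)
        if rng.random() < p:                       # label requested
            loss = 0.0 if hypothesis(x) == y else 1.0
            total += loss / p                      # undo the sampling bias
    return total / len(examples)
```

Small probabilities inflate the 1/p weights and hence the variance, which is why keeping p bounded away from zero matters for this kind of estimator.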
Hierarchical sampling for active learning
- Proceedings of the 25th International Conference on Machine Learning, 2008
- Cited by 20 (5 self)
We present an active learning scheme that exploits cluster structure in data.
Theoretical Foundations of Active Learning
- 2009
- Cited by 11 (6 self)
I study the informational complexity of active learning in a statistical learning theory framework. Specifically, I derive bounds on the rates of convergence achievable by active learning, under various noise models and under general conditions on the hypothesis class. I also study the theoretical advantages of active learning over passive learning, and develop procedures for transforming passive learning algorithms into active learning algorithms with asymptotically superior label complexity. Finally, I study generalizations of active learning to more general forms of interactive statistical learning.
Acknowledgments
There are so many people I am indebted to for helping to make this thesis, and indeed my entire career, possible. To begin, I am grateful to the faculty of Webster University, where my journey into science truly began. Support from the teachers I was privileged to have there, including Gary Coffman, Britt-Marie Schiller, Ed and Anna B. Sakurai, and John Aleshunas, to name a few, inspired in me a deep curiosity
Rates of Convergence in Active Learning
- Submitted to the Annals of Statistics
- Cited by 7 (3 self)
We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning.
Efficient learning with partially observed attributes
- In ICML, 2010
- Cited by 6 (2 self)
We describe and analyze efficient algorithms for learning a linear predictor from examples when the learner can only view a few attributes of each training example. This is the case, for instance, in medical research, where each patient participating in the experiment is only willing to go through a small number of tests. Our analysis bounds the number of additional examples sufficient to compensate for the lack of full information on each training example. We demonstrate the efficiency of our algorithms by showing that when running on digit recognition data, they obtain a high prediction accuracy even when the learner gets to see only four pixels of each image.
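One standard way to learn from only a few observed attributes is to replace each example with an unbiased estimate built from randomly chosen coordinates. The abstract does not state which estimator the paper uses, so the sketch below is an assumption: it samples k attribute indices uniformly with replacement and rescales by d/k. Both function names are hypothetical.

```python
import random

def estimate_from_attributes(x, indices):
    """Unbiased estimate of the vector x from the attributes at `indices`.

    `indices` is assumed to hold k positions drawn uniformly with replacement
    from range(len(x)). Each observed value is scaled by d / k, so every
    coordinate of the estimate has expectation equal to the true value.
    """
    d, k = len(x), len(indices)
    estimate = [0.0] * d
    for j in indices:
        estimate[j] += (d / k) * x[j]
    return estimate

def sample_estimate(x, k, rng):
    """Observe k randomly chosen attributes of x and return the estimate."""
    indices = [rng.randrange(len(x)) for _ in range(k)]
    return estimate_from_attributes(x, indices)
```

The estimate is noisy for small k, which matches the abstract's point that extra examples are needed to compensate for seeing only part of each one.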

