Results 1–10 of 11
A Statistical Theory of Active Learning
, 2013
Abstract

Cited by 6 (1 self)
Active learning is a protocol for supervised machine learning, in which a learning algorithm sequentially requests the labels of selected data points from a large pool of unlabeled data. This contrasts with passive learning, where the labeled data are taken at random. The objective in active learning is to produce a highly accurate classifier, ideally using fewer labels than the number of random labeled data points sufficient for passive learning to achieve the same. This article describes recent advances in our understanding of the theoretical benefits of active learning, and implications for the design of effective active learning algorithms. Much of the article focuses on a particular technique, namely disagreement-based active learning, which by now has amassed a mature and coherent literature. It also briefly surveys several alternative approaches from the literature. The emphasis is on theorems regarding the performance of a few general algorithms, including rigorous proofs where appropriate. However, the presentation is intended to be pedagogical, focusing …
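The disagreement-based strategy mentioned in this abstract can be sketched in a toy setting. The snippet below is an illustrative sketch only (the threshold class, the constant 0.5, and all variable names are assumptions, not taken from the article): the learner requests a label only when hypotheses still consistent with past labels disagree on the point.

```python
import random

random.seed(0)

# Toy realizable setting: threshold classifiers h_t(x) = +1 iff x >= t
# on [0, 1], with true threshold 0.5.  (Illustrative only; the article's
# algorithms handle general hypothesis classes and noise conditions.)
TRUE_T = 0.5
oracle = lambda x: +1 if x >= TRUE_T else -1

pool = [random.random() for _ in range(1000)]  # unlabeled pool

# Version space: thresholds t in (lo, hi] remain consistent with all
# labels observed so far.
lo, hi = 0.0, 1.0
queries = 0
for x in pool:
    if lo < x < hi:            # x lies in the region of disagreement
        queries += 1
        if oracle(x) == +1:    # consistent thresholds must satisfy t <= x
            hi = min(hi, x)
        else:                  # consistent thresholds must satisfy t > x
            lo = max(lo, x)
    # Points outside (lo, hi) are classified identically by every
    # consistent hypothesis, so their labels are never requested.

print(f"labels requested: {queries} of {len(pool)}")
```

In this toy case the number of label requests grows roughly logarithmically in the pool size, while passive learning would label the whole pool; this is the kind of label-complexity gap the article quantifies in general.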
SURROGATE LOSSES IN PASSIVE AND ACTIVE LEARNING
Abstract

Cited by 1 (0 self)
Active learning is a type of sequential design for supervised machine learning, in which the learning algorithm sequentially requests the labels of selected instances from a large pool of unlabeled data points. The objective is to produce a classifier of relatively low risk, as measured under the 0-1 loss, ideally using fewer label requests than the number of random labeled data points sufficient to achieve the same. This work investigates the potential uses of surrogate loss functions in the context of active learning. Specifically, it presents an active learning algorithm based on an arbitrary classification-calibrated surrogate loss function, along with an analysis of the number of label requests sufficient for the classifier returned by the algorithm to achieve a given risk under the 0-1 loss. Interestingly, these results cannot be obtained by simply optimizing the surrogate risk via active learning to an extent sufficient to provide a guarantee on the 0-1 loss, as is common practice in the analysis of surrogate losses for passive learning. Some of the results have additional implications for the use of surrogate losses in passive learning.
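The surrogate-loss idea can be illustrated in the familiar passive setting (a minimal sketch; the toy data, constants, and variable names below are assumptions, not the paper's construction): minimize a classification-calibrated surrogate such as the logistic loss, then evaluate the returned classifier under the 0-1 loss.

```python
import math, random

random.seed(1)

# Toy 1-D data, labels in {-1, +1} determined by a threshold at 0.3.
# (Illustrative sketch only; not taken from the paper.)
data = [(x, +1 if x >= 0.3 else -1)
        for x in (random.uniform(-1, 1) for _ in range(200))]

# Minimize the logistic (classification-calibrated) surrogate risk of
# the linear score f(x) = w*x + b by plain gradient descent.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(500):
    gw = gb = 0.0
    for x, y in data:
        # d/dm log(1 + exp(-y*m)) = -y / (1 + exp(y*m)),  with m = w*x + b
        g = -y / (1.0 + math.exp(y * (w * x + b)))
        gw += g * x
        gb += g
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

# Evaluate the returned classifier sign(f(x)) under the 0-1 loss.
zero_one = sum((+1 if w * x + b >= 0 else -1) != y for x, y in data) / len(data)
print(f"training 0-1 risk after surrogate minimization: {zero_one:.3f}")
```

Classification-calibration is what licenses the last step: driving the surrogate risk toward its optimum also drives the 0-1 risk of the sign classifier toward its optimum. The paper's point is that in the active setting this simple reduction is not enough.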
Active Learning with a Drifting Distribution
Abstract

Cited by 1 (0 self)
We study the problem of active learning in a stream-based setting, allowing the distribution of the examples to change over time. We prove upper bounds on the number of prediction mistakes and number of label requests for established disagreement-based active learning algorithms, both in the realizable case and under Tsybakov noise. We further prove minimax lower bounds for this problem.
Buy-in-Bulk Active Learning
, 2012
Abstract

Cited by 1 (0 self)
In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one at a time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning.
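The bulk-discount effect can be made concrete with a hypothetical sublinear cost model (the function `batch_cost` and every constant below are illustrative assumptions, not the paper's cost model): a fixed setup fee per batch plus a concave per-batch term.

```python
import math

# Hypothetical sublinear batch cost: fixed setup fee per batch plus a
# concave (square-root) term in the batch size.  Constants are illustrative.
def batch_cost(k, setup=5.0, rate=1.0):
    return setup + rate * math.sqrt(k)

n = 1000  # labels a one-at-a-time active learner might request

one_at_a_time = n * batch_cost(1)          # 1000 batches of size 1
in_ten_batches = 10 * batch_cost(n // 10)  # 10 batches of size 100

# Even requesting 50% more labels, bulk querying can cost far less overall:
bulk_with_extra_labels = 10 * batch_cost((3 * n // 2) // 10)

print(one_at_a_time, in_ten_batches, bulk_with_extra_labels)
```

Under this model, the batched learner pays a tiny fraction of the one-at-a-time cost even while requesting more labels, which is exactly the trade-off between total queries and number of rounds that the paper analyzes.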
Abstract
This document provides specifications of the estimators used in Subroutine 1, along with a formal proof of Lemma 1.

1. Specification of Estimators. Following (Hanneke, 2009; 2012), we specify the estimators P̂ used in the algorithm as follows. For convenience, we suppose we have access to two independent sequences W1 = {w1, w2, ...} and W2 = {w′1, w′2, ...} of independent D-distributed random variables, with (W1, W2) independent of Z. Such sequences could easily be taken from the unlabeled data sequence in a preprocessing step, in which case we interpret the {Xi}i≥1 sequence referenced in the algorithms as those points remaining in the pool after extracting the sequences W1 and W2. Fix any H ⊆ C and m ∈ N. For any k ∈ N, define S^(k−1)(H) = {S ∈ X^(k−1) : H shatters S}. For any (x, y) ∈ X × {−1, +1}, define Γ̂^(1)_m(x, y, W2, H) = 1[⋂_{h∈H} {h(x)}](y) and ∆̂^(1)_m(x, W2, H) = 1[S^1(H)](x). For any k ∈ {2, ..., d+1} and i ∈ N, let S^(k)_i = {w′_(1+(i−1)(k−1)), ..., w′_(i(k−1))}; then let M^(k)_m(H) = max …
Activized Learning with Uniform Classification Noise (JMLR: Workshop and Conference Proceedings, 2012, pp. 1–12)
Abstract
We prove that for any VC class, it is possible to transform any passive learning algorithm into an active learning algorithm with strong asymptotic improvements in label complexity for every nontrivial distribution satisfying a uniform classification noise condition. This generalizes a similar result proven by (Hanneke, 2009) for the realizable case, and is the first result establishing that such general improvement guarantees are possible in the presence of restricted types of classification noise.
Activized Learning with Uniform Classification Noise
, 2011
Abstract
We prove that for any VC class, it is possible to transform any passive learning algorithm into an active learning algorithm with strong asymptotic improvements in label complexity for every nontrivial distribution satisfying a uniform classification noise condition. This generalizes a similar result proven by [Han09] for the realizable case, and is the first result establishing that such general improvement guarantees are possible in the presence of restricted types of classification noise.
SURROGATE LOSSES IN PASSIVE AND ACTIVE LEARNING, by Steve Hanneke and Liu Yang
Abstract
Active learning is a type of sequential design for supervised machine learning, in which the learning algorithm sequentially requests the labels of selected instances from a large pool of unlabeled data points. The objective is to produce a classifier of relatively low risk, as measured under the 0-1 loss, ideally using fewer label requests than the number of random labeled data points sufficient to achieve the same. This work investigates the potential uses of surrogate loss functions in the context of active learning. Specifically, it presents an active learning algorithm based on an arbitrary classification-calibrated surrogate loss function, along with an analysis of the number of label requests sufficient for the classifier returned by the algorithm to achieve a given risk under the 0-1 loss. Interestingly, these results cannot be obtained by simply optimizing the surrogate risk via active learning to an extent sufficient to provide a guarantee on the 0-1 loss, as is common practice in the analysis of surrogate losses for passive learning. Some of the results have additional implications for the use of surrogate losses in passive learning.