Results 1-10 of 319
Active learning literature survey
, 2010
Cited by 326 (1 self)
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for active learning, a summary of several problem setting variants, and a discussion
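To illustrate how an active learner might choose which unlabeled instance to send to the oracle, here is a minimal sketch of uncertainty sampling, one commonly used query strategy. The abstract above does not name a specific strategy, so the entropy-based uncertainty measure, the function names, and the example probabilities are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_query(pool_probs):
    """Return the index of the pool instance with the most uncertain prediction.

    pool_probs is a hypothetical list of predicted class distributions,
    one per unlabeled instance, produced by whatever model is being trained.
    """
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))


# Hypothetical model predictions for three unlabeled instances.
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
query_index = select_query(pool_probs)  # the instance to hand to the oracle
```

In a full loop, the selected instance would be labeled, added to the training set, and the model retrained before the next query is chosen.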
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
Cited by 125 (13 self)
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristical GP optimization approaches.
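The upper-confidence idea can be sketched as follows: at each round, evaluate the candidate that maximizes the GP posterior mean plus a multiple of the posterior standard deviation. This is a rough illustration rather than the authors' implementation. The 1-D inputs, the RBF kernel, the noise level, the fixed `beta` (the paper analyzes a round-dependent schedule that is not reproduced here), and all function names are assumptions.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel on scalars (an assumed covariance function)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(xs, ys, x_star, noise=1e-2):
    """Posterior mean and standard deviation at x_star for a zero-mean GP."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_star) for a in xs]
    alpha = solve(K, ys)        # K^{-1} y
    v = solve(K, k_star)        # K^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))


def gp_ucb_choice(xs, ys, candidates, beta=4.0):
    """Pick the candidate maximizing mean + sqrt(beta) * std."""
    def score(x):
        mean, std = gp_posterior(xs, ys, x)
        return mean + math.sqrt(beta) * std
    return max(candidates, key=score)
```

With a single observation at 0.0, a call such as `gp_ucb_choice([0.0], [0.0], [0.0, 3.0])` favors the unexplored point 3.0, because its posterior standard deviation is much larger. Setting `beta=0.0` reduces the rule to pure exploitation of the posterior mean.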
Combining Active Learning and Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
 ICML 2003 workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining
, 2003
Cited by 121 (5 self)
Active and semi-supervised learning are important techniques when labeled data are scarce. We combine the two under a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The semi-supervised learning problem is then formulated in terms of a Gaussian random field on this graph, the mean of which is characterized in terms of harmonic functions. Active learning is performed on top of the semi-supervised learning scheme by greedily selecting queries from the unlabeled data to minimize the estimated expected classification error (risk); in the case of Gaussian fields the risk is efficiently computed using matrix methods. We present experimental results on synthetic data, handwritten digit recognition, and text classification tasks. The active learning scheme requires a much smaller number of queries to achieve high accuracy compared with random query selection.
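The harmonic property mentioned above means that the value at each unlabeled vertex equals the weighted average of its neighbors' values, with labeled vertices held fixed. The sketch below computes such a function by repeated averaging. This is a swapped-in iterative technique, not the matrix methods the abstract refers to, and it omits the risk-based query selection step. The function name, the 4-node chain graph, and the iteration count are assumptions, and the graph is assumed to have zero self-weights.

```python
def harmonic_solution(weights, labels, n_iters=500):
    """Approximate the harmonic function on a weighted graph.

    weights: symmetric n x n list of lists of non-negative edge weights,
             with zeros on the diagonal.
    labels:  dict mapping labeled vertex index -> value in [0, 1].
    Unlabeled vertices are repeatedly set to the weighted mean of their
    neighbors; labeled vertices are clamped to their given values.
    """
    n = len(weights)
    f = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(n_iters):
        for i in range(n):
            if i in labels:
                continue
            degree = sum(weights[i])
            if degree > 0:
                f[i] = sum(weights[i][j] * f[j] for j in range(n)) / degree
    return f


# Hypothetical chain graph 0 - 1 - 2 - 3 with the two endpoints labeled.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
f = harmonic_solution(chain, {0: 0.0, 3: 1.0})
```

On this chain the values interpolate linearly between the two labeled endpoints, which is what the averaging property implies for equal edge weights.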
Bayesian inference and optimal design in the sparse linear model
 Workshop on Artificial Intelligence and Statistics
Cited by 111 (13 self)
The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from microarray expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007).
Toward community sensing.
 In ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN),
, 2008
Cited by 65 (8 self)
A great opportunity exists to fuse information from populations of privately held sensors to create useful sensing applications. For example, GPS devices, embedded in cellphones and automobiles, might one day be employed as distributed networks of velocity sensors for traffic monitoring and routing. Unfortunately, privacy and resource considerations limit access to such data streams. We describe principles of community sensing that offer mechanisms for sharing data from privately held sensors. The methods take into account the likely availability of sensors, the context-sensitive value of sensor information, based on models of phenomena and demand, and sensor owners' preferences about privacy and resource usage. We present efficient and well-characterized approximations of optimal sensing policies. We provide details on key principles of community sensing and highlight their use within a case study for road traffic monitoring.
Rational explanation of the selection task
 Psychological Review
, 1996
Cited by 61 (7 self)
M. Oaksford and N. Chater (O&C; 1994) presented the first quantitative model of P. C. Wason's (1966, 1968) selection task in which performance is rational. J. St. B. T. Evans and D. E. Over (1996) reply that O&C's account is normatively incorrect and cannot model K. N. Kirby's (1994b) or P. Pollard and J. St. B. T. Evans's (1983) data. It is argued that an equivalent measure satisfies their normative concerns and that a modification of O&C's model accounts for their empirical concerns. D. Laming (1996) argues that O&C made unjustifiable psychological assumptions and that a "correct" Bayesian analysis agrees with logic. It is argued that O&C's model makes normative and psychological sense and that Laming's analysis is not Bayesian. A. Almor and S. A. Sloman (1996) argue that O&C cannot explain their data. It is argued that Almor and Sloman's data do not bear on O&C's model because they alter the nature of the task. It is concluded that O&C's model remains the most compelling and comprehensive account of the selection task. Research on Wason's (1966, 1968) selection task questions human rationality because performance is not "logically correct." Recently, Oaksford and Chater (O&C; 1994) provided a rational analysis (Anderson, 1990, 1991) of the selection task that appeared to vindicate human rationality. O&C argued that the selection task is an inductive, rather than a deductive, reasoning task: Participants must assess the truth or falsity of a general rule from specific instances. In particular, participants face a problem of optimal data selection (Lindley, 1956): They must decide which of four cards (p, not-p, q, or not-q) is likely to provide the most useful data to test a conditional rule, if p then q. The "logical" solution is to select the p and the not-q cards. O&C argued that this solution presupposes falsificationism (Popper, 1959), which argues that only data that can disconfirm, not confirm, hypotheses are of interest.
In contrast, O&C's rational analysis uses a Bayesian approach to inductive
Active Learning of Causal Bayes Net Structure
, 2001
Cited by 61 (1 self)
We propose a decision-theoretic approach for deciding which interventions to perform so as to learn the causal structure of a model as quickly as possible. Without such interventions, it is impossible to distinguish between Markov equivalent models, even given infinite data. We perform online MCMC to estimate the posterior over graph structures, and use importance sampling to find the best action to perform at each step. We assume the data is discrete-valued and fully observed.
Learning From Measurements in Exponential Families
Cited by 54 (1 self)
Given a model family and a set of unlabeled examples, one could either label specific examples or state general constraints; both provide information about the desired model. In general, what is the most cost-effective way to learn? To address this question, we introduce measurements, a general class of mechanisms for providing information about a target model. We present a Bayesian decision-theoretic framework, which allows us to both integrate diverse measurements and choose new measurements to make. We use a variational inference algorithm, which exploits exponential family duality. The merits of our approach are demonstrated on two sequence labeling tasks.
A Concept Exploration Method for Product Family Design
 in Mechanical Engineering. Atlanta, GA: Georgia Institute of Technology
, 1998
The interplay of Bayesian and frequentist analysis
 Statist. Sci
, 2004
Cited by 49 (0 self)
Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the fight has become considerably muted, with the recognition that each approach has a great deal to contribute to statistical practice and each is actually essential for full development of the other approach. In this article, we embark upon a rather idiosyncratic walk through some of these issues. Key words and phrases: Admissibility; Bayesian model checking; conditional frequentist; confidence intervals; consistency; coverage; design; hierarchical models; nonparametric