Practical Bayesian optimization of machine learning algorithms
2012
Cited by 130 (16 self)
In this section we specify additional details of our Bayesian optimization algorithm which, for brevity, were omitted from the paper. For more detail, the code used in this work is made publicly available at
A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning
2010
Cited by 91 (11 self)
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.
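The utility-based selection described above is often implemented with the expected improvement criterion. The following is a minimal sketch, assuming a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate point and a maximization objective. The function name, the candidate labels, and the numbers are illustrative assumptions, not taken from the tutorial.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a Gaussian posterior N(mu, sigma^2).

    Assumes maximization: improvement is max(f - best, 0).
    """
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a high mean (exploitation),
    # second term rewards high uncertainty (exploration).
    return (mu - best) * cdf + sigma * pdf


# Hypothetical posterior summaries (mean, std) at two candidate points.
best_so_far = 1.0
candidates = {
    "confident": (0.9, 0.05),  # close to the incumbent, little uncertainty
    "uncertain": (0.5, 1.0),   # lower mean, much wider posterior
}
choice = max(
    candidates,
    key=lambda name: expected_improvement(*candidates[name], best_so_far),
)
```

In this toy setting the wide posterior wins despite its lower mean, which is the exploration side of the trade-off the abstract mentions.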
Adaptive submodularity: Theory and applications in active learning and stochastic optimization
J. Artificial Intelligence Research, 2011
Cited by 70 (15 self)
Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse AI applications including management of sensing resources, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations.
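The lazy-evaluation idea can be illustrated in the simpler non-adaptive setting. The sketch below is not the adaptive algorithm from the paper: it is the classical lazy greedy for set coverage, shown only because it relies on the same diminishing-returns property. The function name and the sensor data are hypothetical.

```python
import heapq


def lazy_greedy_cover(sets, k):
    """Pick up to k sets greedily by marginal coverage, re-evaluating lazily.

    Because coverage is submodular, a marginal gain computed in an earlier
    round is an upper bound on the current gain, so stale heap entries only
    need refreshing when they reach the top of the heap.
    """
    covered = set()
    chosen = []
    # Entries are (-gain, name, round in which the gain was computed).
    heap = [(-len(items), name, 0) for name, items in sets.items()]
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        neg_gain, name, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # Gain is up to date and beats every (upper-bounded) rival.
            if neg_gain == 0:
                break  # nothing left to gain
            chosen.append(name)
            covered |= sets[name]
        else:
            # Stale entry: recompute its gain and push it back.
            gain = len(sets[name] - covered)
            heapq.heappush(heap, (-gain, name, len(chosen)))
    return chosen, covered


# Hypothetical sensors and the locations each one observes.
sensors = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 2}}
chosen, covered = lazy_greedy_cover(sensors, 2)
```

Only the entries that surface at the top of the heap are re-evaluated, which is where the speed-up over plain greedy comes from.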
Algorithms for hyperparameter optimization
In NIPS, 2011
Cited by 50 (10 self)
Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyperparameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyperparameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyperparameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyperparameter assignment (x) are known to be irrelevant given particular values of other elements.
Convergence rates of efficient global optimization algorithms
Journal of Machine Learning Research, 2011
Cited by 37 (0 self)
In the efficient global optimization problem, we minimize an unknown function f, using as few observations f(x) as possible. It can be considered a continuum-armed-bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f. We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior.
Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms
Cited by 32 (8 self)
Many different machine learning algorithms exist; taking into account each algorithm’s hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately. We show that this problem can be addressed by a fully automated approach, leveraging recent innovations in Bayesian optimization. Specifically, we consider a wide range of feature selection techniques (combining 3 search and 8 evaluator methods) and all classification approaches implemented in WEKA’s standard distribution, spanning 2 ensemble methods, 10 meta-methods, 27 base classifiers, and hyperparameter settings for each classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup 09, variants of the MNIST dataset and CIFAR-10, we show classification performance often much better than using standard selection and hyperparameter optimization methods. We hope that our approach will help non-expert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications, and hence to achieve improved performance.
Efficient Optimal Learning for Contextual Bandits
Cited by 30 (3 self)
We address the problem of learning in an online setting where the learner repeatedly observes features x, selects among K actions, and receives reward r for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses an oracle which returns an optimal policy given rewards for all actions for each x. The algorithm has running time polylog(N), where N is the number of policies that we compete with. This is exponentially faster than all previous algorithms that achieve optimal regret in this setting. Our formulation also enables us to create an algorithm with regret that is additive rather than multiplicative in feedback delay as in all previous work.
Entropy Search for Information-Efficient Global Optimization
Cited by 28 (0 self)
Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation.
On Bayesian upper confidence bounds for bandit problems
In AISTATS, 2012
Cited by 27 (3 self)
Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the parameter is drawn from a prior distribution. We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulated regret as a measure of performance. We give a general formulation for a class of Bayesian index policies that rely on quantiles of the posterior distribution. For binary bandits, we prove that the corresponding algorithm, termed Bayes-UCB, satisfies finite-time regret bounds that imply its asymptotic optimality. More generally, Bayes-UCB appears as a unifying framework for several variants of the UCB algorithm addressing different bandit problems (parametric multi-armed bandits, Gaussian bandits with unknown mean and variance, linear bandits). But the generality of the Bayesian approach makes it possible to address more challenging models. In particular, we show how to handle linear bandits with sparsity constraints by resorting to Gibbs sampling.
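A quantile-based index policy for binary bandits can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a uniform Beta(1, 1) prior on each arm and a quantile level of 1 - 1/t, and all function names are hypothetical. With integer Beta parameters the posterior CDF reduces to a binomial tail sum, so only the standard library is needed.

```python
import math


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(
        math.comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(a, n + 1)
    )


def beta_quantile_int(p, a, b, tol=1e-10):
    """Quantile of Beta(a, b) found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bayes_ucb_choose(successes, failures, t):
    """Return (arm, indices): the arm with the largest posterior quantile.

    Assumes a Beta(1, 1) prior per arm and quantile level 1 - 1/t (t >= 1).
    """
    level = 1.0 - 1.0 / t
    indices = [
        beta_quantile_int(level, 1 + s, 1 + f)
        for s, f in zip(successes, failures)
    ]
    arm = max(range(len(indices)), key=indices.__getitem__)
    return arm, indices


# Two arms with the same empirical mean; the second has far fewer pulls.
arm, indices = bayes_ucb_choose(successes=[5, 1], failures=[5, 1], t=12)
```

Both arms have the same empirical success rate here, but the less-sampled arm has a wider posterior and therefore a larger upper quantile, so it is the one selected.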
Information-theoretic regret bounds for Gaussian process optimization in the bandit setting
IEEE Transactions on Information Theory, 2012
Cited by 26 (3 self)
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low norm in a reproducing kernel Hilbert space. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze an intuitive Gaussian process upper confidence bound (GP-UCB) algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristical GP optimization approaches. Index Terms: Bandit problems, Bayesian prediction, experimental design, Gaussian process (GP), information gain,