Results 1-10 of 24
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
"... Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regre ..."
Abstract

Cited by 125 (13 self)
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence-based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristic GP optimization approaches.
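The GP-UCB rule described in this abstract picks the point maximizing the posterior mean plus a scaled posterior standard deviation. A minimal sketch, assuming an RBF kernel with a fixed length-scale and noise level (illustrative defaults, not the paper's experimental settings):

```python
import numpy as np

def rbf(A, B, ls=0.2):
    """Squared-exponential kernel between row-vector sets A (n,d) and B (m,d)."""
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ls ** 2)

def gp_posterior(X, y, X_star, kernel=rbf, noise=1e-2):
    """Standard GP posterior mean and std dev at candidate points X_star."""
    K = kernel(X, X) + noise * np.eye(len(X))
    K_s = kernel(X, X_star)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(kernel(X_star, X_star)) - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

def gp_ucb_select(X, y, candidates, beta):
    """GP-UCB acquisition: maximize mu(x) + sqrt(beta) * sigma(x)."""
    mu, sigma = gp_posterior(X, y, candidates)
    return candidates[np.argmax(mu + np.sqrt(beta) * sigma)]
```

The confidence parameter beta controls exploration; the regret analysis in the paper prescribes a schedule for it, while this sketch leaves it as a free argument.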
Adaptive submodularity: Theory and applications in active learning and stochastic optimization
J. Artificial Intelligence Research, 2011
"... Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive subm ..."
Abstract

Cited by 70 (15 self)
Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse AI applications, including management of sensing resources, viral marketing, and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees, and handle natural generalizations.
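The lazy-evaluation speed-up mentioned in this abstract caches marginal gains in a priority queue and only recomputes the top candidate; for the simpler non-adaptive case of a monotone submodular set function it looks roughly like this (the coverage objective in the test is a made-up example):

```python
import heapq

def lazy_greedy(ground, f, k):
    """Greedy maximization of a monotone submodular set function f over
    `ground`, selecting k elements, with lazy (cached) marginal gains."""
    selected = []
    base = f(selected)
    # Max-heap of (negated gain, element, iteration the gain was computed at).
    heap = [(-(f([e]) - base), e, 0) for e in ground]
    heapq.heapify(heap)
    while len(selected) < k and heap:
        neg_gain, e, it = heapq.heappop(heap)
        if it == len(selected):
            # Cached gain is current; by submodularity no stale entry can beat it.
            selected.append(e)
        else:
            # Stale gain: recompute against the current set and push back.
            gain = f(selected + [e]) - f(selected)
            heapq.heappush(heap, (-gain, e, len(selected)))
    return selected
```

Because marginal gains only shrink as the selected set grows, a cached gain that is still the largest never needs recomputation, which is where the drastic speed-up comes from.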
Convergence rates of efficient global optimization algorithms
Journal of Machine Learning Research, 2011
"... In the efficient global optimization problem, we minimize an unknown function f, using as few observations f(x) as possible. It can be considered a continuumarmedbandit problem, with noiseless data, and simple regret. Expectedimprovement algorithms are perhaps the most popular methods for solving ..."
Abstract

Cited by 37 (0 self)
In the efficient global optimization problem, we minimize an unknown function f, using as few observations f(x) as possible. It can be considered a continuum-armed bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f. We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior.
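The expected-improvement criterion analyzed in this abstract has a well-known closed form under a Gaussian posterior; a minimal sketch for minimization (variable names are ours):

```python
import math

def expected_improvement(mu, sigma, best):
    """EI for minimization: E[max(best - f(x), 0)] when f(x) ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (best - mu) * cdf + sigma * pdf
```

EI rewards both a low predicted mean and high posterior uncertainty, which is exactly the exploration/exploitation balance whose convergence rates the paper studies.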
Entropy Search for Information-Efficient Global Optimization
"... Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizer ..."
Abstract

Cited by 28 (0 self)
Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over the location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm that tackles each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation.
Information-theoretic regret bounds for Gaussian process optimization in the bandit setting
 IEEE Transactions on Information Theory, 2012
"... Abstract—Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low norm in a reproducing kernel Hilbert space. We resolve th ..."
Abstract

Cited by 26 (3 self)
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low norm in a reproducing kernel Hilbert space. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze an intuitive Gaussian process upper confidence bound (GP-UCB) algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristic GP optimization approaches. Index Terms—Bandit problems, Bayesian prediction, experimental design, Gaussian process (GP), information gain,
Portfolio Allocation for Bayesian Optimization
"... Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive blackbox optimization scenarios. It uses Bayesian methods t ..."
Abstract

Cited by 23 (14 self)
Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the posterior estimate of the objective. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm's performance.
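The portfolio idea in this abstract can be driven by the classic Hedge rule: sample an acquisition function with probability proportional to an exponentially weighted reward history. A minimal sketch, where the reward definition and the learning rate eta are simplified assumptions rather than the paper's exact construction:

```python
import math
import random

def hedge_probs(weights):
    """Softmax over cumulative weights (numerically stabilized)."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def hedge_select(weights, rng):
    """Sample the index of the acquisition function to use this round."""
    p = hedge_probs(weights)
    r, acc = rng.random(), 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r <= acc:
            return i
    return len(p) - 1

def hedge_update(weights, rewards, eta=0.5):
    """Each acquisition's weight grows with the reward earned by its proposal."""
    return [w + eta * r for w, r in zip(weights, rewards)]
```

In GP-Hedge-style use, each acquisition function proposes a candidate point every round, and a quantity such as the GP posterior mean at the proposal serves as its reward.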
Contextual Gaussian Process Bandit Optimization
"... How should we design experiments to maximize performance of a complex system, taking into account uncontrollable environmental conditions? How should we select relevant documents (ads) to display, given information about the user? These tasks can be formalized as contextual bandit problems, where at ..."
Abstract

Cited by 15 (2 self)
How should we design experiments to maximize performance of a complex system, taking into account uncontrollable environmental conditions? How should we select relevant documents (ads) to display, given information about the user? These tasks can be formalized as contextual bandit problems, where at each round, we receive context (about the experimental conditions, the query), and have to choose an action (parameters, documents). The key challenge is to trade off exploration, gathering data for estimating the mean payoff function over the context-action space, against exploitation, choosing an action deemed optimal based on the gathered data. We model the payoff function as a sample from a Gaussian process defined over the joint context-action space, and develop CGP-UCB, an intuitive upper-confidence-style algorithm. We show that by mixing and matching kernels for contexts and actions, CGP-UCB can handle a variety of practical applications. We further provide generic tools for deriving regret bounds when using such composite kernel functions. Lastly, we evaluate our algorithm on two case studies, in the context of automated vaccine design and sensor management. We show that context-sensitive optimization outperforms approaches that ignore or make naive use of context.
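The "mixing and matching" of kernels described above typically means taking products or sums of a context kernel and an action kernel over the joint space; a minimal sketch of the two compositions (function names and the scalar RBF are illustrative assumptions):

```python
import math

def rbf(x, y, ls=1.0):
    """Squared-exponential kernel on scalars, used here for both spaces."""
    return math.exp(-0.5 * (x - y) ** 2 / ls ** 2)

def product_kernel(k_ctx, k_act):
    """k((z, x), (z', x')) = k_ctx(z, z') * k_act(x, x'):
    payoffs correlate only when both context and action are similar."""
    return lambda p, q: k_ctx(p[0], q[0]) * k_act(p[1], q[1])

def sum_kernel(k_ctx, k_act):
    """k((z, x), (z', x')) = k_ctx(z, z') + k_act(x, x'):
    payoff decomposes additively into a context part and an action part."""
    return lambda p, q: k_ctx(p[0], q[0]) + k_act(p[1], q[1])
```

The choice between product and sum encodes an assumption about the payoff function, which is what the paper's generic regret-bound tools exploit.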
Hierarchical Knowledge Gradient for Sequential Sampling
Journal of Machine Learning Research, 2011
"... We consider the problem of selecting the best of a finite but very large set of alternatives. Each alternative may be characterized by a multidimensional vector and has independent normal rewards. This problem arises in settings such as (i) ranking and selection, (ii) simulation optimization where ..."
Abstract

Cited by 7 (5 self)
We consider the problem of selecting the best of a finite but very large set of alternatives. Each alternative may be characterized by a multidimensional vector and has independent normal rewards. This problem arises in settings such as (i) ranking and selection, (ii) simulation optimization, where the unknown mean of each alternative is estimated with stochastic simulation output, and (iii) approximate dynamic programming, where we need to estimate values based on Monte Carlo simulation. We use a Bayesian probability model for the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy. This policy myopically optimizes the expected increment in the value of sampling information in each time period. Because the number of alternatives is large, we propose a hierarchical aggregation technique that uses the common features shared by alternatives to learn about many alternatives from even a single measurement, thus greatly reducing the measurement effort required. We demonstrate how this hierarchical knowledge-gradient policy can be applied to efficiently maximize a continuous function and prove that this policy finds a globally optimal alternative in the limit.
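For independent normal beliefs, the one-step knowledge-gradient value has a closed form; a minimal sketch of the flat (non-hierarchical) version for maximization, with notation that is ours rather than the paper's:

```python
import math

def _pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def knowledge_gradient(mu, sigma, noise):
    """KG value of one measurement of each alternative, given posterior means
    mu, posterior std devs sigma, and measurement-noise std dev `noise`."""
    kg = []
    for i, (m, s) in enumerate(zip(mu, sigma)):
        # Std dev of the change in the posterior mean after one measurement.
        s_tilde = s * s / math.sqrt(s * s + noise * noise)
        if s_tilde == 0.0:
            kg.append(0.0)
            continue
        best_other = max(mu[j] for j in range(len(mu)) if j != i)
        z = -abs(m - best_other) / s_tilde
        kg.append(s_tilde * (z * _cdf(z) + _pdf(z)))
    return kg
```

The policy then measures the alternative with the largest KG value each period; note how a clear current leader drives the values toward zero, since one more sample is unlikely to change the decision.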
Robust Gaussian process-based global optimization using a fully Bayesian expected improvement criterion
"... Abstract. We consider the problem of optimizing a realvalued continuous function f, which is supposed to be expensive to evaluate and, consequently, can only be evaluated a limited number of times. This article focuses on the Bayesian approach to this problem, which consists in combining evaluati ..."
Abstract

Cited by 7 (2 self)
We consider the problem of optimizing a real-valued continuous function f, which is assumed to be expensive to evaluate and, consequently, can only be evaluated a limited number of times. This article focuses on the Bayesian approach to this problem, which consists in combining evaluation results and prior information about f in order to efficiently select new evaluation points, as long as the budget for evaluations is not exhausted. The algorithm called efficient global optimization (EGO), proposed by Jones, Schonlau and Welch (J. Global Optim., 13(4):455–492, 1998), is one of the most popular Bayesian optimization algorithms. It is based on a sampling criterion called the expected improvement (EI), which assumes a Gaussian process prior on f. In the EGO algorithm, the parameters of the covariance of the Gaussian process are estimated from the evaluation results by maximum likelihood, and these parameters are then plugged into the EI sampling criterion. However, it is well known that this plug-in strategy can lead to very disappointing results when the evaluation results do not carry enough information about f to estimate the parameters in a satisfactory manner. We advocate a fully Bayesian approach to this problem, and derive an analytical expression for the EI criterion in the case of Student predictive distributions. Numerical experiments show that the fully Bayesian approach makes EI-based optimization more robust while maintaining an average loss similar to that of the EGO algorithm.