Results 1  10
of
142
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
"... Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regre ..."
Abstract

Cited by 116 (11 self)
 Add to MetaCart
(Show Context)
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GPUCB, an intuitive upperconfidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GPUCB compares favorably with other heuristical GP optimization approaches. 1.
A Hilbert space embedding for distributions
 In Algorithmic Learning Theory: 18th International Conference
, 2007
"... Abstract. We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in twosample tests, which are used for ..."
Abstract

Cited by 110 (44 self)
 Add to MetaCart
(Show Context)
Abstract. We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in twosample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4], however they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or KullbackLeibler divergence, we require sophisticated space partitioning and/or
Adaptive submodularity: Theory and applications in active learning and stochastic optimization
 J. Artificial Intelligence Research
, 2011
"... Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive subm ..."
Abstract

Cited by 64 (15 self)
 Add to MetaCart
Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse AI applications including management of sensing resources, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations. 1.
Toward Community Sensing
, 2008
"... A great opportunity exists to fuse information from populations of privatelyheld sensors to create useful sensing applications. For example, GPS devices, embedded in cellphones and automobiles, might one day be employed as distributed networks of velocity sensors for traffic monitoring and routing. ..."
Abstract

Cited by 63 (8 self)
 Add to MetaCart
A great opportunity exists to fuse information from populations of privatelyheld sensors to create useful sensing applications. For example, GPS devices, embedded in cellphones and automobiles, might one day be employed as distributed networks of velocity sensors for traffic monitoring and routing. Unfortunately, privacy and resource considerations limit access to such data streams. We describe principles of community sensing that offer mechanisms for sharing data from privately held sensors. The methods take into account the likely availability of sensors, the contextsensitive value of sensor information, based on models of phenomena and demand, and sensor owners’ preferences about privacy and resource usage. We present efficient and wellcharacterized approximations of optimal sensing policies. We provide details on key principles of community sensing and highlight their use within a case study for road traffic monitoring.
An online algorithm for maximizing submodular functions
, 2007
"... We present an algorithm for solving a broad class of online resource allocation jobs arrive one at a time, and one can complete the jobs by investing time in a number of abstract activities, according to some schedule. We assume that the fraction of jobs completed by a schedule is a monotone, submod ..."
Abstract

Cited by 58 (12 self)
 Add to MetaCart
(Show Context)
We present an algorithm for solving a broad class of online resource allocation jobs arrive one at a time, and one can complete the jobs by investing time in a number of abstract activities, according to some schedule. We assume that the fraction of jobs completed by a schedule is a monotone, submodular function of a set of pairs (v, τ), where τ is the time invested in activity v. Under this assumption, our online algorithm performs nearoptimally according to two natural metrics: (i) the fraction of jobs completed within time T, for some fixed deadline T> 0, and (ii) the average time required to complete each job. We evaluate our algorithm experimentally by using it to learn, online, a schedule for allocating CPU time among solvers entered in the 2007 SAT solver competition. 1
Structured sparsityinducing norms through submodular functions
 IN ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 2010
"... Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turnedinto a convex optimization problem byreplacing the cardinality function by its convex en ..."
Abstract

Cited by 58 (10 self)
 Add to MetaCart
Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turnedinto a convex optimization problem byreplacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the ℓ1norm. In this paper, we investigate more general setfunctions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nonincreasing submodular setfunctions, the corresponding convex envelope can be obtained from its Lovász extension, a common tool in submodular analysis. This defines a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or highdimensional inference). By selecting specific submodular functions, we can give a new interpretation to known norms, such as those based on rankstatistics or grouped norms with potentially overlapping groups; we also define new norms, in particular ones that can be used as nonfactorial priors for supervised learning.
Efficient Principled Learning of Thin Junction Trees
"... We present the first truly polynomial algorithm for PAClearning the structure of boundedtreewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewi ..."
Abstract

Cited by 46 (3 self)
 Add to MetaCart
We present the first truly polynomial algorithm for PAClearning the structure of boundedtreewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixedsize subsets of variables, if the underlying distribution can be approximated by a boundedtreewidth junction tree. 1
Robust submodular observation selection
, 2008
"... In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to measure in order to most effectively predict spatial phenomena. Often, we want to select observations wh ..."
Abstract

Cited by 44 (4 self)
 Add to MetaCart
(Show Context)
In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to measure in order to most effectively predict spatial phenomena. Often, we want to select observations which are robust against a number of possible objective functions. Examples include minimizing the maximum posterior variance in Gaussian Process regression, robust experimental design, and sensor placement for outbreak detection. In this paper, we present the Submodular Saturation algorithm, a simple and efficient algorithm with strong theoretical approximation guarantees for cases where the possible objective functions exhibit submodularity, an intuitive diminishing returns property. Moreover, we prove that better approximation algorithms do not exist unless NPcomplete problems admit efficient algorithms. We show how our algorithm can be extended to handle complex cost functions (incorporating nonunit observation cost or communication and path costs). We also show how the algorithm can be used to nearoptimally trade off expectedcase (e.g., the Mean Square Prediction Error in Gaussian Process regression) and worstcase (e.g., maximum predictive variance) performance. We show that many important machine learning problems fit our robust submodular observation selection formalism, and provide extensive empirical evaluation on several realworld problems. For Gaussian Process regression, our algorithm compares favorably with stateoftheart heuristics described in the geostatistics literature, while being simpler, faster and providing theoretical guarantees. For robust experimental design, our algorithm performs favorably compared to SDPbased algorithms.
Voila: Efficient featurevalue acquisition for classification
 In TwentySecond Conference on Artificial Intelligence (AAAI
, 2007
"... We address the problem of efficient featurevalue acquisition for classification in domains in which there are varying costs associated with both feature acquisition and misclassification. The objective is to minimize the sum of the information acquisition cost and misclassification cost. Any decisi ..."
Abstract

Cited by 27 (3 self)
 Add to MetaCart
We address the problem of efficient featurevalue acquisition for classification in domains in which there are varying costs associated with both feature acquisition and misclassification. The objective is to minimize the sum of the information acquisition cost and misclassification cost. Any decision theoretic strategy tackling this problem needs to compute value of information for sets of features. Having calculated this information, different acquisition strategies are possible (acquiring one feature at time, acquiring features in sets, etc.). However, because the value of information calculation for arbitrary subsets of features is computationally intractable, most traditional approaches have been greedy, computing values of features one at a time. We make the problem of value of information calculation tractable in practice by introducing a novel data structure called the Value of Information Lattice (VOILA). VOILA exploits dependencies between missing features and makes sharing of information value computations between different feature subsets possible. To the best of our knowledge, performance differences between greedy acquisition, acquiring features in sets, and a mixed strategy have not been investigated empirically in the past, due to inherit intractability of the problem. With the help of VOILA, we are able to evaluate these strategies on five real world datasets under various cost assumptions. We show that VOILA reduces computation time dramatically. We also show that the mixed strategy outperforms both greedy acquisition and acquisition in sets.
Optimal Value of Information in Graphical Models
"... Many realworld decision making tasks require us to choose among several expensive observations. In a sensor network, for example, it is important to select the subset of sensors that is expected to provide the strongest reduction in uncertainty. In medical decision making tasks, one needs to select ..."
Abstract

Cited by 26 (6 self)
 Add to MetaCart
(Show Context)
Many realworld decision making tasks require us to choose among several expensive observations. In a sensor network, for example, it is important to select the subset of sensors that is expected to provide the strongest reduction in uncertainty. In medical decision making tasks, one needs to select which tests to administer before deciding on the most effective treatment. It has been general practice to use heuristicguided procedures for selecting observations. In this paper, we present the first efficient optimal algorithms for selecting observations for a class of probabilistic graphical models. For example, our algorithms allow to optimally label hidden variables in Hidden Markov Models (HMMs). We provide results for both selecting the optimal subset of observations, and for obtaining an optimal conditional observation plan. Furthermore we prove a surprising result: In most graphical models tasks, if one designs an efficient algorithm for chain graphs, such as HMMs, this procedure can be generalized to polytree graphical models. We prove that the optimizing value of information is NP PPhard even for polytrees. It also follows from our results that just computing decision theoretic value of information objective functions, which are commonly used in practice, is a #Pcomplete problem even on Naive Bayes models (a simple special case of polytrees). In addition, we consider several extensions, such as using our algorithms for scheduling observation selection for multiple sensors. We demonstrate the effectiveness of our approach on several realworld datasets, including a prototype sensor network deployment for energy conservation in buildings. 1.