Results 1  10
of
56
Practical bayesian optimization of machine learning algorithms
, 2012
"... In this section we specify additional details of our Bayesian optimization algorithm which, for brevity, were omitted from the paper. For more detail, the code used in this work is made publicly available at ..."
Abstract

Cited by 130 (16 self)
 Add to MetaCart
(Show Context)
In this section we specify additional details of our Bayesian optimization algorithm which, for brevity, were omitted from the paper. For more detail, the code used in this work is made publicly available at
Entropy Search for InformationEfficient Global Optimization
"... Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizer ..."
Abstract

Cited by 28 (0 self)
 Add to MetaCart
Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation.
Multitask Bayesian optimization
 In: Proceedings of NIPS; 2013
"... (Article begins on next page) The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. ..."
Abstract

Cited by 25 (5 self)
 Add to MetaCart
(Show Context)
(Article begins on next page) The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters.
Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks
"... Markov jump processes and continuous time Bayesian networks are important classes of continuous time dynamical systems. In this paper, we tackle the problem of inferring unobserved paths in these models by introducing a fast auxiliary variable Gibbs sampler. Our approach is based on the idea of unif ..."
Abstract

Cited by 22 (5 self)
 Add to MetaCart
Markov jump processes and continuous time Bayesian networks are important classes of continuous time dynamical systems. In this paper, we tackle the problem of inferring unobserved paths in these models by introducing a fast auxiliary variable Gibbs sampler. Our approach is based on the idea of uniformization, and sets up a Markov chain over paths by sampling a finite set of virtual jump times and then running a standard hidden Markov model forward filteringbackward sampling algorithm over states at the set of extant and virtual jump times. We demonstrate significant computational benefits over a stateoftheart Gibbs sampler on a number of continuous time Bayesian networks. 1
Gaussian Process Kernels for Pattern Discovery and Extrapolation
"... Gaussian processes are rich distributions over functions, which provide a Bayesian nonparametric approach to smoothing and interpolation. We introduce simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation. These kernels are derived by model ..."
Abstract

Cited by 15 (4 self)
 Add to MetaCart
(Show Context)
Gaussian processes are rich distributions over functions, which provide a Bayesian nonparametric approach to smoothing and interpolation. We introduce simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation. These kernels are derived by modelling a spectral density – the Fourier transform of a kernel – with a Gaussian mixture. The proposed kernels support a broad class of stationary covariances, but Gaussian process inference remains simple and analytic. We demonstrate the proposed kernels by discovering patterns and performing long range extrapolation on synthetic examples, as well as atmospheric CO2 trends and airline passenger data. We also show that it is possible to reconstruct several popular standard covariances within our framework. 1.
Bayesian Nonparametric Covariance Regression
, 1101
"... Summary. Although there is a rich literature on methods for allowing the variance in a univariate regression model to vary with predictors, time and other factors, relatively little has been done in the multivariate case. Our focus is on developing a class of nonparametric covariance regression mode ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
Summary. Although there is a rich literature on methods for allowing the variance in a univariate regression model to vary with predictors, time and other factors, relatively little has been done in the multivariate case. Our focus is on developing a class of nonparametric covariance regression models, which allow an unknown p × p covariance matrix to change flexibly with predictors. The proposed modeling framework induces a prior on a collection of covariance matrices indexed by predictors through priors for predictordependent loadings matrices in a factor model. In particular, the predictordependent loadings are characterized as a sparse combination of a collection of unknown dictionary functions (e.g, Gaussian process random functions). The induced covariance is then a regularized quadratic function of these dictionary elements. Our proposed framework leads to a highlyflexible, but computationally tractable formulation with simple conjugate posterior updates that can readily handle missing data. Theoretical properties are discussed and the methods are illustrated through simulations studies and an application to the Google Flu Trends data.
PROBABILISTIC PREDICTION OF NEUROLOGICAL DISORDERS WITH A STATISTICAL ASSESSMENT OF NEUROIMAGING DATA MODALITIES
 SUBMITTED TO THE ANNALS OF APPLIED STATISTICS
, 2012
"... For many neurological disorders, prediction of disease state is an important clinical aim. Neuroimaging provides detailed information about brain structure and function from which such predictions may be statistically derived. A multinomial logit model with Gaussian process priors is proposed to: (i ..."
Abstract

Cited by 11 (5 self)
 Add to MetaCart
For many neurological disorders, prediction of disease state is an important clinical aim. Neuroimaging provides detailed information about brain structure and function from which such predictions may be statistically derived. A multinomial logit model with Gaussian process priors is proposed to: (i) predict disease state based on wholebrain neuroimaging data and (ii) analyze the relative informativeness of different image modalities and brain regions. Advanced Markov chain Monte Carlo methods are employed to perform posterior inference over the model. This paper reports a statistical assessment of multiple neuroimaging modalities applied to the discrimination of three Parkinsonian neurological disorders from one another and healthy controls, showing promising predictive performance of disease states when compared to non probabilistic classifiers based on multiple modalities. The statistical analysis also quantifies the relative importance of different neuroimaging measures and brain regions in discriminating between these diseases and suggests that for prediction there is little benefit in acquiring multiple neuroimaging sequences. Finally, the predictive capability of different brain regions is found to be in accordance with the regional pathology of the diseases as reported in the clinical literature.
Towards an empirical foundation for assessing Bayesian optimization of hyperparameters
 In NIPS Workshop on Bayesian Optimization in Theory and Practice
, 2013
"... Progress in practical Bayesian optimization is hampered by the fact that the only available standard benchmarks are artificial test functions that are not representative of practical applications. To alleviate this problem, we introduce a library of benchmarks from the prominent application of hyper ..."
Abstract

Cited by 10 (3 self)
 Add to MetaCart
(Show Context)
Progress in practical Bayesian optimization is hampered by the fact that the only available standard benchmarks are artificial test functions that are not representative of practical applications. To alleviate this problem, we introduce a library of benchmarks from the prominent application of hyperparameter optimization and use it to compare Spearmint, TPE, and SMAC, three recent Bayesian optimization methods for hyperparameter optimization. 1
ODE parameter inference using adaptive gradient matching with Gaussian processes
 Sixteenth International Conference on Artificial Intelligence and Statistics; AISTATS
, 2013
"... Parameter inference in mechanistic models based on systems of coupled differential equations is a topical yet computationally challenging problem, due to the need to follow each parameter adaptation with a numerical integration of the differential equations. Techniques based on gradient matching, wh ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
Parameter inference in mechanistic models based on systems of coupled differential equations is a topical yet computationally challenging problem, due to the need to follow each parameter adaptation with a numerical integration of the differential equations. Techniques based on gradient matching, which aim to minimize the discrepancy between the slope of a data interpolant and the derivatives predicted from the differential equations, offer a computationally appealing shortcut to the inference problem. The present paper discusses a method based on nonparametric Bayesian statistics with Gaussian processes due to Calderhead et al. (2008), and shows how inference in this model can be substantially improved by consistently sampling from the joint distribution of the ODE parameters and GP hyperparameters. We demonstrate the efficiency of our adaptive gradient matching technique on three benchmark systems, and perform a detailed comparison with the method in Calderhead et al. (2008) and the explicit ODE integration approach, both in terms of parameter inference accuracy and in terms of computational efficiency. 1
A comparative evaluation of stochasticbased inference methods for Gaussian process models
 Machine Learning
, 2013
"... Abstract Gaussian Process (GP) models are extensively used in data analysis given their flexible modeling capabilities and interpretability. The fully Bayesian treatment of GP models is analytically intractable, and therefore it is necessary to resort to either deterministic or stochastic approxima ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
(Show Context)
Abstract Gaussian Process (GP) models are extensively used in data analysis given their flexible modeling capabilities and interpretability. The fully Bayesian treatment of GP models is analytically intractable, and therefore it is necessary to resort to either deterministic or stochastic approximations. This paper focuses on stochasticbased inference techniques. After discussing the challenges associated with the fully Bayesian treatment of GP models, a number of inference strategies based on Markov chain Monte Carlo methods are presented and rigorously assessed. In particular, strategies based on efficient parameterizations and efficient proposal mechanisms are extensively compared on simulated and real data on the basis of convergence speed, sampling efficiency, and computational cost.