Results 1–10 of 150
Prediction With Gaussian Processes: From Linear Regression To Linear Prediction And Beyond
Learning and Inference in Graphical Models, 1997
"... The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. Th ..."
Abstract

Cited by 228 (4 self)
 Add to MetaCart
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression and show how, by a change of viewpoint, one can see this method as a Gaussian process predictor based on priors over functions rather than on priors over parameters. This leads into a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models, and the use of Gaussian processes in classification problems. 1 Introduction: In the last decade neural networks have been used to tackle regression and classification problems, with some notable successes. It has also been widely recognized that they form a part of a wide variety of nonlinear statistical techniques that can be used for...
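As a concrete illustration of the change of viewpoint this abstract describes, the sketch below (an illustrative assumption, not the paper's code) carries out Bayesian linear regression purely through the covariance function it induces, k(x, x') = sigma_p^2 x.x', using the standard GP predictive equations; the toy data, prior scale, and noise level are made up.

```python
# A minimal sketch (assumed data, prior scale, and noise level; not the
# paper's code): Bayesian linear regression with weight prior
# w ~ N(0, sigma_p^2 I) induces a GP with covariance
# k(x, x') = sigma_p^2 x.x', so prediction needs only the kernel.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: y = 0.7 x + Gaussian noise
X = rng.uniform(-3, 3, size=(20, 1))
y = 0.7 * X[:, 0] + 0.1 * rng.standard_normal(20)

def linear_kernel(A, B, sigma_p=1.0):
    """Covariance induced by the Gaussian prior on the weights."""
    return sigma_p**2 * A @ B.T

noise_var = 0.1**2
K = linear_kernel(X, X) + noise_var * np.eye(len(X))

Xs = np.linspace(-3, 3, 5).reshape(-1, 1)   # test inputs
Ks = linear_kernel(Xs, X)

# Standard GP predictive equations: mean = Ks K^-1 y, cov = Kss - Ks K^-1 Ks^T
alpha = np.linalg.solve(K, y)
mean = Ks @ alpha
cov = linear_kernel(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
print(mean)          # close to 0.7 * Xs: the posterior-mean line
print(np.diag(cov))  # predictive variances
```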
Bayesian Classification with Gaussian Processes
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998
"... We consider the problem of assigning an input vector x to one of m classes by predicting P (cjx) for c = 1; : : : ; m. For a twoclass problem, the probability of class 1 given x is estimated by oe(y(x)), where oe(y) = 1=(1 + e ). A Gaussian process prior is placed on y(x), and is combined wi ..."
Abstract

Cited by 179 (1 self)
 Add to MetaCart
We consider the problem of assigning an input vector x to one of m classes by predicting P(c|x) for c = 1, ..., m. For a two-class problem, the probability of class 1 given x is estimated by σ(y(x)), where σ(y) = 1/(1 + e^(−y)). A Gaussian process prior is placed on y(x), and is combined with the training data to obtain predictions for new x points.
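The sketch below illustrates the model class in this abstract (the kernel and input grid are assumptions, and no inference is performed): a latent function drawn from a GP prior is squashed through the logistic function to give class-1 probabilities. Combining the prior with training data, as the abstract describes, requires approximate inference because the likelihood is non-Gaussian.

```python
# A minimal illustration (assumed kernel and grid; no training performed):
# a latent y(x) ~ GP is mapped through sigma(y) = 1 / (1 + exp(-y)) to give
# P(class 1 | x) at each input.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 100)

# Squared-exponential covariance for the latent process
K = np.exp(-0.5 * (x[:, None] - x[None, :])**2)
latent = rng.multivariate_normal(np.zeros(len(x)), K + 1e-6 * np.eye(len(x)))

p_class1 = 1.0 / (1.0 + np.exp(-latent))  # sigma(y(x))
print(p_class1[:5])
```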
Gaussian processes for ordinal regression
Journal of Machine Learning Research, 2004
"... We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation ..."
Abstract

Cited by 116 (4 self)
 Add to MetaCart
(Show Context)
We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.
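The following sketch shows a threshold likelihood of the kind the abstract describes (the threshold values and noise scale are illustrative assumptions, not the paper's settings): label j is observed when the noisy latent value falls between thresholds b_{j-1} and b_j.

```python
# P(y = j | f) = Phi((b_j - f)/s) - Phi((b_{j-1} - f)/s), with
# b_0 = -inf and b_r = +inf; thresholds and scale below are assumptions.
import numpy as np
from scipy.stats import norm

thresholds = np.array([-np.inf, -1.0, 1.0, np.inf])  # three ordinal levels
noise_scale = 0.5

def ordinal_likelihood(f, j):
    """P(y = j | latent value f) for j in {1, 2, 3}."""
    upper = norm.cdf((thresholds[j] - f) / noise_scale)
    lower = norm.cdf((thresholds[j - 1] - f) / noise_scale)
    return upper - lower

f = 0.3
probs = [ordinal_likelihood(f, j) for j in (1, 2, 3)]
print(probs, sum(probs))  # a proper distribution over the ordinal levels
```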
A Bayesian Committee Machine
Neural Computation, 2000
"... The Bayesian committee machine (BCM) is a novel approach to combining estimators which were trained on different data sets. Although the BCM can be applied to the combination of any kind of estimators the main foci are Gaussian process regression and related systems such as regularization networks a ..."
Abstract

Cited by 96 (7 self)
 Add to MetaCart
The Bayesian committee machine (BCM) is a novel approach to combining estimators which were trained on different data sets. Although the BCM can be applied to the combination of any kind of estimators, the main foci are Gaussian process regression and related systems such as regularization networks and smoothing splines, for which the degrees of freedom increase with the number of training data. Somewhat surprisingly, we find that the performance of the BCM improves if several test points are queried at the same time, and is optimal if the number of test points is at least as large as the degrees of freedom of the estimator. The BCM also provides a new solution for online learning with potential applications to data mining. We apply the BCM to systems with fixed basis functions and discuss its relationship to Gaussian process regression. Finally, we also show how the ideas behind the BCM can be applied in a non-Bayesian setting to extend the input-dependent combination of estimators.
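Below is a minimal sketch of the BCM combination rule, assuming each module is a full GP trained on a disjoint subset and queried at a shared block of test points; the data, kernel, and noise level are illustrative assumptions. Note how it queries several test points jointly, the regime the abstract highlights.

```python
# BCM combination: C^-1 = sum_i C_i^-1 - (M - 1) Kss^-1,
#                  mean = C sum_i C_i^-1 E_i
import numpy as np

rng = np.random.default_rng(2)

def kernel(A, B):
    return np.exp(-0.5 * (A[:, None] - B[None, :])**2)

def gp_predict(Xtr, ytr, Xte, noise=0.01):
    """Predictive mean and covariance of a standard GP regressor."""
    K = kernel(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = kernel(Xte, Xtr)
    mean = Ks @ np.linalg.solve(K, ytr)
    cov = kernel(Xte, Xte) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

X = rng.uniform(-3, 3, 60)
y = np.sin(X) + 0.1 * rng.standard_normal(60)
Xte = np.linspace(-2, 2, 5)                       # query test points jointly
modules = np.array_split(rng.permutation(60), 3)  # M = 3 disjoint training sets

Kss = kernel(Xte, Xte) + 1e-8 * np.eye(len(Xte))  # prior covariance at test points
prec = -(len(modules) - 1) * np.linalg.inv(Kss)   # the BCM prior correction term
weighted = np.zeros(len(Xte))
for idx in modules:
    m_i, C_i = gp_predict(X[idx], y[idx], Xte)
    Ci_inv = np.linalg.inv(C_i + 1e-8 * np.eye(len(Xte)))
    prec += Ci_inv
    weighted += Ci_inv @ m_i

print(np.linalg.solve(prec, weighted))  # approximates the full-GP predictive mean
```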
Gaussian Processes for Classification: Mean Field Algorithms
Neural Computation, 1999
"... We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in Statistical Physics of disordered systems. The theory also yields an approximate leaveoneout estimator for the generalization error which is computed wit ..."
Abstract

Cited by 96 (14 self)
 Add to MetaCart
We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in the statistical physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vector machines, simulation results for three small benchmark data sets are presented. They show (1) that one may get state-of-the-art performance by using the leave-one-out estimator for model selection, and (2) that the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is taken as strong support for the internal consistency of the mean field approach.
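The paper's "free" leave-one-out estimator comes from the TAP equations; as a hedged illustration of the same idea in the simpler regression setting, the sketch below uses the standard exact GP-regression leave-one-out identity (explicitly not the paper's TAP estimator): once K has been inverted, all n leave-one-out predictions follow at no extra cost.

```python
# Exact GP-regression LOO (not the TAP estimator): with alpha = K^-1 y,
#   mu_{-i} = y_i - alpha_i / [K^-1]_ii,   sigma^2_{-i} = 1 / [K^-1]_ii.
# Data, kernel, and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)

K = np.exp(-0.5 * (X[:, None] - X[None, :])**2) + 0.01 * np.eye(30)
Kinv = np.linalg.inv(K)
alpha = Kinv @ y

loo_mean = y - alpha / np.diag(Kinv)  # prediction for point i without point i
loo_var = 1.0 / np.diag(Kinv)         # its predictive variance
print(np.mean((y - loo_mean)**2))     # LOO error with no refitting
```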
Gaussian processes for machine learning
International Journal of Neural Systems, 2004
"... Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. ..."
Abstract

Cited by 92 (14 self)
 Add to MetaCart
(Show Context)
Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countable or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level, with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible nonparametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations [13, 78, 31]. The mathematical literature on GPs is large and often uses deep...
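A minimal sketch of the defining property in the abstract's first sentence: restricting a GP to any finite index set gives an ordinary multivariate Gaussian, so prior samples need only standard linear algebra. The squared-exponential covariance and its lengthscale below are assumptions for illustration.

```python
# Draw GP prior sample paths on a finite grid via a Cholesky factor.
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 200)  # a finite slice of the index set
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 1.5)**2)

# Cholesky of the (jittered) covariance turns iid normals into GP draws
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
samples = L @ rng.standard_normal((len(x), 3))  # three prior sample paths
print(samples.shape)
```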
Regression with Input-dependent Noise: A Gaussian Process Treatment
In Advances in Neural Information Processing Systems 10, 1998
"... Gaussian processes provide natural nonparametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then ..."
Abstract

Cited by 60 (1 self)
 Add to MetaCart
Gaussian processes provide natural nonparametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled, and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well approximates the true variance. 1 Background and Motivation: A very natural approach to regression problems is to place a prior on the kinds of function that we expect, and then after observing the data to obtain a posterior. The prior can be obtained by placing...
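A simplified sketch of the model structure follows. The paper infers the noise process with MCMC; here the log noise variance g is a fixed, assumed function to keep the example short. The point is where the input-dependent variance exp(g(x)) enters: only on the diagonal of the GP covariance.

```python
# Heteroscedastic GP regression with a known (assumed) log-noise function.
import numpy as np

rng = np.random.default_rng(5)
X = np.sort(rng.uniform(-3, 3, 40))
g = lambda x: -3.0 + 0.8 * x  # assumed log noise variance: noisier for large x
y = np.sin(X) + np.exp(0.5 * g(X)) * rng.standard_normal(40)

def kern(A, B):
    return np.exp(-0.5 * (A[:, None] - B[None, :])**2)

K = kern(X, X) + np.diag(np.exp(g(X)))  # input-dependent noise on the diagonal
Xs = np.linspace(-3, 3, 5)
print(kern(Xs, X) @ np.linalg.solve(K, y))  # predictive mean at the test inputs
```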
Gaussian Processes - A Replacement for Supervised Neural Networks?
"... These lecture notes are based on the work of Neal (1996), Williams and ..."
Abstract

Cited by 59 (0 self)
 Add to MetaCart
These lecture notes are based on the work of Neal (1996), Williams and...
Bayesian model selection for Support Vector machines, Gaussian processes and other kernel classifiers
"... We present a variational Bayesian method for model selection over families of kernels classifiers like Support Vector machines or Gaussian processes. The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training ca ..."
Abstract

Cited by 56 (6 self)
 Add to MetaCart
(Show Context)
We present a variational Bayesian method for model selection over families of kernel classifiers like Support Vector machines or Gaussian processes. The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training cases for validation. This opens the possibility of using sophisticated families of kernels in situations where the small "standard kernel" classes are clearly inappropriate. We relate the method to other work done on Gaussian processes and clarify the relation between Support Vector machines and certain Gaussian process models.
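The paper's variational scheme is involved; as a simpler, related illustration (not the paper's method), the sketch below scores kernel parameters by the GP log marginal likelihood, which likewise adapts hyperparameters to the data without holding out training cases for validation. The data, kernel family, and noise level are assumptions.

```python
# Rank lengthscales by the GP evidence log p(y | X, lengthscale).
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)

def log_marginal(lengthscale, noise=0.01):
    K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / lengthscale)**2)
    K += noise * np.eye(len(X))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(X) * np.log(2 * np.pi))

for ell in (0.1, 0.5, 1.0, 2.0):
    print(ell, log_marginal(ell))  # prefer the lengthscale with highest evidence
```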
Dependent Gaussian processes
In NIPS, 2004
"... Gaussian processes are usually parameterised in terms of their covariance functions. However, this makes it difficult to deal with multiple outputs, because ensuring that the covariance matrix is positive definite is problematic. An alternative formulation is to treat Gaussian processes as white no ..."
Abstract

Cited by 50 (0 self)
 Add to MetaCart
(Show Context)
Gaussian processes are usually parameterised in terms of their covariance functions. However, this makes it difficult to deal with multiple outputs, because ensuring that the covariance matrix is positive definite is problematic. An alternative formulation is to treat Gaussian processes as white noise sources convolved with smoothing kernels, and to parameterise the kernel instead. Using this, we extend Gaussian processes to handle multiple, coupled outputs.
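A minimal discrete sketch of the parameterisation the abstract proposes (the kernel widths are illustrative assumptions): two outputs become dependent because both are convolutions of the same white noise source with their own smoothing kernels.

```python
# Coupled outputs from one white noise source convolved with two kernels.
import numpy as np

rng = np.random.default_rng(7)
noise = rng.standard_normal(500)  # the shared white noise source

def smooth(width):
    t = np.arange(-50, 51)
    h = np.exp(-0.5 * (t / width)**2)  # Gaussian smoothing kernel
    return np.convolve(noise, h / h.sum(), mode="same")

y1 = smooth(5.0)   # output 1: narrow kernel
y2 = smooth(15.0)  # output 2: wide kernel, coupled to y1 through the noise
print(np.corrcoef(y1, y2)[0, 1])  # clearly nonzero cross-correlation
```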