PAC-Bayesian generalisation error bounds for Gaussian process classification (2002)

by Matthias Seeger
Journal of Machine Learning Research
http://www.dai.ed.ac.uk/homes/seeger/papers/pacbgp-tr.ps.gz

Abstract:

Approximate Bayesian Gaussian process (GP) classification techniques are powerful nonparametric learning methods, similar in appearance and performance to support vector machines. Based on simple probabilistic models, they render interpretable results and can be embedded in Bayesian frameworks for model selection, feature selection, etc. In this paper, by applying the PAC-Bayesian theorem of McAllester (1999a), we prove distribution-free generalisation error bounds for a wide range of approximate Bayesian GP classification techniques. We also provide a new and much simplified proof for this powerful theorem, making use of the concept of convex duality, which is a backbone of many machine learning techniques. We instantiate and test our bounds for two particular GPC techniques, including a recent sparse method which circumvents the unfavourable scaling of standard GP algorithms. As is shown in experiments on a real-world task, the bounds can be very tight for moderate training sample sizes. To the best of our knowledge, these results provide the tightest known distribution-free error bounds for approximate Bayesian GPC methods, giving a strong learning-theoretical justification for the use of these techniques.
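
As a rough illustration of how a PAC-Bayesian bound of this kind can be evaluated numerically, the sketch below assumes the bound takes the commonly stated form KL_Ber(emp || gen) <= (KL(Q || P) + log((n + 1) / delta)) / n, where emp and gen are the empirical and true error of the Gibbs classifier, Q is the posterior, P is the prior, n is the training sample size and delta is the confidence parameter. That form, the function names and the example numbers are assumptions made for illustration; none of them is taken from the abstract above. The bound on gen is obtained by inverting the Bernoulli relative entropy with bisection.

```python
import math


def binary_kl(q, p):
    """Relative entropy KL(Bernoulli(q) || Bernoulli(p)) in nats."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def kl_inverse_upper(emp_err, budget, tol=1e-10):
    """Upper end of {p >= emp_err : binary_kl(emp_err, p) <= budget}.

    binary_kl(emp_err, p) is increasing in p for p >= emp_err,
    so plain bisection on [emp_err, 1] is sufficient.
    """
    lo, hi = emp_err, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(emp_err, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return hi  # return the conservative end of the bracket


def pac_bayes_bound(emp_err, kl_posterior_prior, n, delta):
    """Upper bound on the Gibbs error under the assumed form of the theorem."""
    budget = (kl_posterior_prior + math.log((n + 1) / delta)) / n
    return kl_inverse_upper(emp_err, budget)


if __name__ == "__main__":
    # Hypothetical inputs: 5% empirical Gibbs error, KL(Q || P) = 50 nats,
    # 5000 training points, 99% confidence.
    print(pac_bayes_bound(emp_err=0.05, kl_posterior_prior=50.0, n=5000, delta=0.01))
```

For a GP classifier with Gaussian prior and Gaussian approximate posterior, the KL(Q || P) term would be computed from the two covariance matrices and means; here it is simply passed in as a number.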

Citations

4514 Statistical Learning Theory – Vapnik - 1998
1890 Matrix Analysis – Horn, Johnson - 1985
1410 Convex Analysis – Rockafellar - 1970
1364 A theory of the learnable – Valiant - 1984
1103 A Tutorial on Support Vector Machines for Pattern Recognition – Burges - 1998
961 Learning with kernels – Schölkopf, Smola - 2002
727 Spline Models for Observational Data – Wahba - 1990
630 An introduction to Support Vector Machines and other Kernel-based learning methods – Cristianini, Shawe-Taylor - 2000
589 Information Theory and Statistics – Kullback - 1959
428 A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations – Chernoff - 1952
389 The perceptron: A probabilistic model for information storage and organization in the brain – Rosenblatt - 1958
374 Information Theory: Coding Theorems for Discrete Memoryless Systems – Csiszár, Körner - 1982
267 Nonparametric regression and Generalized Linear Models – Green, Silverman - 1994
265 Stochastic Simulation – Ripley - 1987
208 Structural risk minimization over data-dependent hierarchies – Shawe-Taylor, Bartlett, et al. - 1996
204 Sparse bayesian learning and the relevance vector machine – Tipping
137 Prediction with Gaussian processes: From linear regression to linear prediction and beyond – Williams - 1997
123 Using the Nyström Method to Speed Up Kernel Machines – Williams, Seeger - 2001
113 A Family of Algorithms for Approximate Bayesian Inference – Minka - 2001
92 Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension – Haussler, Kearns, et al. - 1991
78 Maximum entropy discrimination – Jaakkola, Meila, et al. - 1999
74 Monte Carlo implementation of Gaussian process models for Bayesian regression and classification – Neal - 1997
70 Stability and generalization – Bousquet, Elisseeff
70 Empirical margin distributions and bounding the generalization error of combined classifiers – Koltchinskii, Panchenko
67 Some PAC-Bayesian theorems – McAllester - 1998
64 Bayesian Gaussian Processes for Regression and Classification – Gibbs - 1997
58 A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations – Chernoff - 1952
56 Bayes factors and model uncertainty – Kass, Raftery - 1995
54 Sparse greedy gaussian process regression – Smola, Bartlett - 2001
51 A bayesian committee machine – Tresp
50 Fast sparse gaussian process methods: The informative vector machine – Lawrence, Seeger, et al. - 2002
50 Directional statistics – Mardia, Jupp - 2000
47 PAC-Bayesian model averaging – McAllester - 1999
46 Sparse on-line Gaussian processes – Csató, Opper - 2002
41 Relating data compression and learnability – Littlestone, Warmuth - 1986
38 Hybrid adaptive splines – Luo, Wahba - 1997
38 Learning Kernel Classifiers – Herbrich - 2002
31 PAC-Bayesian stochastic model selection – McAllester - 2003
30 Bayesian model selection for support vector machines, Gaussian processes and other kernel classifiers – Seeger - 2000
25 Mutual information, metric entropy and cumulative relative entropy risk – Haussler, Opper - 1997
19 Algorithmic luckiness – Herbrich, Williamson - 2002
16 From margin to sparsity – Graepel, Herbrich, et al. - 2001
12 Learning curves for Gaussian processes – Sollich - 1999
10 Information Theory for Continuous Systems – Ihara - 1993
9 Learning Kernel Classifiers – Herbrich - 2001
9 Bounds for averaging classifiers – Langford, Seeger
8 Stability and generalization – Bousquet, Elisseeff - 2002
6 A PAC-Bayesian margin bound for linear classifiers: Why SVMs work – Herbrich, Graepel - 2001
6 Bounds for averaging classifiers – Langford, Seeger - 2001
6 Generalized Linear Models – McCullagh, Nelder - 1983