D. A. McAllester, PAC-Bayesian model averaging, Proceedings of the 12th annual conference on Computational Learning Theory, Morgan Kaufmann, 1999.

This paper is cited in the following contexts:
PAC-Bayes & Margins - Langford, Shawe-Taylor

....rate bound that depends on the margin distribution and is independent of the size of the base hypothesis class. 2) A new true error bound for classifiers with a margin which is simpler, functionally tighter, and more data dependent than all previous bounds. 1 Introduction PAC-Bayes bounds [8] (improved in [7] and again in [10]) are interesting for constructing bounds on future error rate in classification given only an assumption that examples are drawn independently from some (unknown) distribution. One drawback of PAC-Bayes bounds is that they only apply strongly to stochastic ....

David McAllester, "PAC-Bayesian Model Averaging" COLT 1999.


A PAC bound for mixture discriminants - Seeger (2000)

....to COLT 2000. A PAC bound for mixture discriminants Matthias Seeger Institute for Adaptive and Neural Computation University of Edinburgh 5 Forrest Hill, Edinburgh EH1 2QL seeger@dai.ed.ac.uk Abstract Recently, McAllester [13] proved a remarkable theorem which gives a PAC style bound on the generalization error of discriminants like the Gibbs classifier. We show how to combine this result with techniques proposed in [1] to arrive at a bound on the generalization error for arbitrary mixture discriminants over a ....

....applies to virtually any known mixture discriminant algorithm, such as the Bayes classifier or AdaBoost and variants. We show how our result genuinely motivates the recently proposed maximum entropy discrimination paradigm from a theoretical viewpoint. 1 Introduction Recently, McAllester [13] proved a remarkable theorem which gives a PAC style bound on the generalization error of a hypothesis randomly drawn from an arbitrary weighting distribution over hypothesis space. The bound applies in the following situation. A hypothesis space, a loss function and a prior distribution over the ....

[Article contains additional citation context not shown here]

David McAllester. PAC-Bayesian model averaging. In Conference on COLT, 1999.


Data-Dependent Bounds for Bayesian Mixture Methods - Meir, Zhang (2003)   (1 citation)

....i = 1, 2, note that i p i = 1) For each q, let i(q) be the largest index for which A i(q) D(q##) Therefore log(1 p i(q) O(log log(D(q##) e) 1) Substituting in the bound of Lemma 3.1 yields the result. # The results of Theorem 3 can be compared to those derived by McAllester [8] for the randomized Gibbs procedure. In the latter case, the first term on the r.h.s. is $E_{h \sim Q} L(h)$, namely the average empirical error of the base classifiers h. In our case the corresponding term is $L(E_{h \sim Q} h)$, namely the empirical error of the average hypothesis. Since $E_{h \sim Q} h$ is potentially ....

....error of the base classifiers h. In our case the corresponding term is $L(E_{h \sim Q} h)$, namely the empirical error of the average hypothesis. Since $E_{h \sim Q} h$ is potentially much more complex than any single $h \in H$, we expect that the empirical term in (4) is much smaller than the corresponding term in [8]. Moreover, the complexity term we obtain is in fact tighter than the corresponding term in [8] by a logarithmic factor in n (although the logarithmic factor in [8] could probably be eliminated). We thus expect that Bayesian mixture approach advocated here leads to better performance guarantees. ....
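
The distinction drawn in this excerpt, between the average empirical error of the base classifiers and the empirical error of the average hypothesis, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy labels and hypotheses, the uniform weighting Q, the use of 0-1 loss, and taking the sign of the weighted vote as the averaged prediction (the cited paper may use a different loss).

```python
def zero_one_error(predictions, labels):
    """Fraction of points where the predicted label differs from the true label."""
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


def gibbs_term(hypotheses, weights, labels):
    """E_{h~Q} L(h): Q-weighted average of the individual empirical errors."""
    return sum(w * zero_one_error(h, labels) for h, w in zip(hypotheses, weights))


def mixture_term(hypotheses, weights, labels):
    """L(E_{h~Q} h): empirical error of the sign of the Q-weighted average vote."""
    averaged = [
        sum(w * h[i] for h, w in zip(hypotheses, weights))
        for i in range(len(labels))
    ]
    votes = [1 if a > 0 else -1 for a in averaged]
    return zero_one_error(votes, labels)


# Hypothetical toy data: each hypothesis is wrong on a different single point.
labels = [1, 1, 1, -1, -1, -1]
hypotheses = [
    [-1, 1, 1, -1, -1, -1],
    [1, -1, 1, -1, -1, -1],
    [1, 1, -1, -1, -1, -1],
]
weights = [1 / 3, 1 / 3, 1 / 3]  # uniform Q over the three hypotheses

gibbs = gibbs_term(hypotheses, weights, labels)
mixture = mixture_term(hypotheses, weights, labels)
print(gibbs, mixture)
```

In this toy case every point is classified correctly by two of the three hypotheses, so the averaged vote makes no mistakes while each individual hypothesis errs on one point in six.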

[Article contains additional citation context not shown here]

D. A. McAllester. PAC-bayesian model averaging. In Proceedings of the twelfth Annual conference on Computational learning theory, New York, 1999. ACM Press.


PAC-Bayesian Theorems for Gaussian Process Classification - Seeger (2002)

....Abstract We present distribution free generalization error bounds which apply to a wide class of approximate Bayesian Gaussian process classification (GPC) techniques, powerful nonparametric learning methods similar to Support Vector machines. The bounds use the PAC-Bayesian theorem [8] for which we provide a simplified proof, leading to new insights into its relation to traditional VC type union bound techniques. Experiments on the MNIST database show that our bounds can be very tight for moderate training sample sizes. Our proofs require only elementary concepts of ....

....process to produce independently and identically distributed (i.i.d.) samples and make no assumptions about the data distribution. Traditional PAC techniques often cannot handle Bayesian methods which sample from or average over continuous function spaces, yet the recent PAC-Bayesian theorem [8] overcomes these problems and is an important modern tool for PAC analyses of Bayesian methods. The structure of the paper is as follows. In the rest of this section, we introduce the notion of PAC bounds. In section 2 we provide a simplified proof of the PAC Most advantages of the Bayesian ....

[Article contains additional citation context not shown here]

David McAllester. PAC-Bayesian model averaging. In Conference on COLT, 1999.


An Improved Predictive Accuracy Bound for Averaging Classifiers - Langford, Seeger (2001)   (3 citations)

....zero. If H is finite, we will usually work with $H = \{h_1, \ldots, h_k\}$ for simplicity. For uncountable spaces, we define Q as $\sum_j q_j \delta(h, h_j)$ where $\delta(h, h_j)$ is the delta distribution centered on $h_j$. We will improve on this bound in Section 3.1 by employing the PAC-Bayes bound from McAllester [14]. In the PAC-Bayes setting, a classifier is also defined by a distribution Q over the hypothesis space. However, each classification is carried out according to a hypothesis sampled from Q rather than by the averaging classifier c defined by Q. We are interested in the gap between the expected ....

....H is finite, we have $D(Q \| P) = \sum_{j=1}^{k} q_j \ln \frac{q_j}{p_j}$, (5) where $Q = (q_1, \ldots, q_k)$, $P = (p_1, \ldots, p_k)$. The relative entropy is an asymmetric distance measure between probability distributions, with $D(Q \| P) = 0$ if and only if Q = P almost everywhere. Theorem 2 (PAC-Bayes [14]) Let P be any prior distribution over H and $\delta \in (0, 1)$. With probability at least $1 - \delta$ over random samples S from D we have that for all distributions Q over the hypothesis space H: $\Pr_{D,Q}[h(x) \neq y] \le \Pr_{S,Q}[h(x) \neq y] + \sqrt{\frac{D(Q \| P) + \ln \frac{1}{\delta} + \ln m + 2}{2m - 1}}$ Here, $\Pr_{D,Q}[\cdot]$ is short for $E_{h \sim Q}$ ....
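
As a rough numerical illustration of the theorem quoted in this excerpt, the sketch below evaluates the right-hand side for a finite hypothesis space. It assumes the bound reads: true Gibbs error <= empirical Gibbs error + sqrt((D(Q||P) + ln(1/delta) + ln m + 2) / (2m - 1)), with m taken to be the sample size. The function names and the example numbers are hypothetical and do not come from the cited paper.

```python
import math


def kl_divergence(q, p):
    """Relative entropy D(Q||P) = sum_j q_j ln(q_j / p_j) for finite distributions.

    Terms with q_j == 0 contribute nothing (0 ln 0 is taken as 0).
    """
    return sum(qj * math.log(qj / pj) for qj, pj in zip(q, p) if qj > 0)


def pac_bayes_bound(empirical_gibbs_error, q, p, m, delta):
    """Empirical Gibbs error plus the square-root complexity term (assumed form)."""
    complexity = (
        kl_divergence(q, p) + math.log(1 / delta) + math.log(m) + 2
    ) / (2 * m - 1)
    return empirical_gibbs_error + math.sqrt(complexity)


# Hypothetical example: uniform prior over four hypotheses, posterior
# concentrated on the first two, 1000 training examples, delta = 0.05.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.5, 0.5, 0.0, 0.0]
bound = pac_bayes_bound(0.1, posterior, prior, m=1000, delta=0.05)
print(bound)
```

When the posterior equals the prior the relative entropy term vanishes, so the complexity term depends only on m and delta; moving Q away from P increases the bound through D(Q||P).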

[Article contains additional citation context not shown here]

David McAllester, "PAC-Bayesian Model Averaging" COLT 1999


Bounds for Averaging Classifiers - Langford, Seeger (2001)

....Pittsburgh, PA 15213 Abstract We present a generalized PAC bound for averaging classifiers which applies to base hypotheses with a bounded real valued output. In addition, we discuss several methods for quantitatively tightening the bound. In the process, a tightened version of the PAC-Bayes bound [5] is proved. Keywords: PAC bound, Maximum entropy discrimination, averaging hypotheses 1 Introduction This paper is the technical companion for an accompanying ICML submission. As such, we are concerned here with the details of what can and can not be proved rather than the implications of ....

....C a(m) for all m. This margin bound implies that if most training examples have a large margin (i.e. t(x; y) for most $(x, y) \in S$) and the hypothesis space is not too large, then the generalization error cannot be large. To improve on this bound, we employ a PAC-Bayes bound from McAllester [5]. In the PAC-Bayes setting, a classifier is also defined by a distribution Q over the hypothesis space. However, each classification is carried out according to a hypothesis sampled from Q rather than by the averaging classifier c defined by Q. We are interested in the gap between the expected ....

[Article contains additional citation context not shown here]

David McAllester, "PAC-Bayesian Model Averaging" COLT 1999


Formal Grammar and Information Theory: Together Again? - Pereira (2000)   (4 citations)

....a problem that has been proven to belong to one of the standard classes believed to require more than polynomial time on a deterministic sequential computer, for instance the NP-hard problems. .... statistical learning theory (McAllester, 1999) may provide new theoretical impetus to that research direction, since they show that a prior over models can play a similar regularizing role to a combinatorial complexity measure. The other role for hidden variables, capturing uncertainty in the interpretation of particular experience, becomes ....

McAllester, D. A. (1999). PAC-bayesian model averaging. In Proceedings of Twelfth Annual Conference on Computational Learning Theory (pp. 164-170). New York: ACM Press.


Analysis of Regularized Linear Functions for Classification Problems - Zhang (1999)

....naturally to general problems. The important feature of this theorem is its independence of the smoothness of the loss function itself. Note that in general, the covering numbers of the loss function depend on such smoothness characterized by the Lipschitz condition (see Theorem 2.9). Recently in [25, 39], McAllester and Zhang studied randomized algorithms that select posterior distributions inducing small average risks under certain regularization conditions. The dimensional independent covering number bounds provided in this paper explain naturally why these algorithms can give good ....

David McAllester. PAC-Bayesian model averaging. In COLT'99, pages 164-170, 1999.


Robust Bayes Point Machines - Herbrich, Graepel, Obermayer.. (2000)   (1 citation)

....to introduce simpler algorithms for approximating the Bayes Point in kernel space [12]. Another interesting avenue of research would be to devise model selection strategies for determining both the kernel parameters and the values of . One first approach is to use a recent result by McAllester [3] giving bounds on the generalisation error for algorithms defining their hypotheses in terms of a posterior distribution. In our case, this reveals that the generalisation error is controlled by the volume of version space relative to the volume of the whole of parameter space. Hence, we suggest ....

D. A. McAllester. PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, Santa Cruz, USA, 1999.


PAC-Bayesian Model Averaging - McAllester (1999)   (15 citations)  Self-citation (McAllester)

....a renormalization of the prior on a given set of concepts. Theorem 2 corresponds to a model averaging learning algorithm where one selects the distribution minimizing the bound. The construction of a distribution optimizing the bound is discussed in section 3. An earlier version of this paper [14] proved a version of theorem 2 as a corollary of a tighter bound which, for large m, has the form $l(Q, S) \le E_{c \sim Q} \sqrt{\frac{\ln(dQ/dP|_c)}{2m}}$. This bound is tighter than $l(Q, S) \le \sqrt{\frac{D(Q \| P)}{2m}}$ by the slack in Jensen's inequality. However, this tighter bound is more difficult to prove and seems ....

....that $Q_i > 0$ and $P_i > 0$ for all i. By Jensen's inequality we have $\left(\sum_{i=1}^{n} Q_i \Delta_i\right)^2 \le \sum_{i=1}^{n} Q_i \Delta_i^2$. So it now suffices to prove that $\sum_{i=1}^{n} Q_i \Delta_i^2 \le \frac{D(Q \| P) + \ln K}{\beta}$. This is a consequence of the following lemma. 2 2 The original version of this paper [14] proved a bound of approximately the form $l(Q) \le \sum_{i=1}^{n} Q_i \sqrt{\frac{\ln(Q_i / P_i)}{2m}}$ by maximizing $\sum_{i=1}^{n} Q_i \Delta_i$ subject to constraint 20. A version of theorem 2, which is of the form $l(Q, S) \le \sqrt{\frac{\sum_{i=1}^{n} Q_i \ln(Q_i / P_i)}{2m}}$, was then proved from this bound by an application ....

David McAllester. Pac-bayesian model averaging. In COLT-99, 1999.


Learning Theory and Language Modeling - McAllester, Schapire (2001)   (1 citation)  Self-citation (McAllester)

....Bayesian assumptions. PAC-Bayesian model averaging is similar to Bayesian model averaging in that it is based on a prior distribution on models but, unlike Bayesian model averaging, PAC-Bayesian model averaging can be justified independent of Bayesian assumptions about the meaning of the prior (McAllester 1999). Unfortunately, neither the Bayesian approach nor the PAC-Bayesian approach justify the particular form of the smoothing methods in language modeling that work well in practice. So the real theoretical challenge is to explain the superiority of the methods that are in fact empirically best. The ....

McAllester, D. (1999). PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory.


PAC-Bayesian Stochastic Model Selection - McAllester (2001)   (7 citations)  Self-citation (McAllester)

....that $Q_i > 0$ and $P_i > 0$ for all i. By Jensen's inequality we have $\left(\sum_{i=1}^{n} Q_i \Delta_i\right)^2 \le \sum_{i=1}^{n} Q_i \Delta_i^2$. So it now suffices to prove that $\sum_{i=1}^{n} Q_i \Delta_i^2 \le \frac{D(Q \| P) + \ln K}{\beta}$. This is a consequence of the following lemma. 1 1 The original version of this paper [16] proved a bound of approximately the form $l(Q) \le \sum_{i=1}^{n} Q_i \sqrt{\frac{\ln(Q_i / P_i)}{2m}}$ by maximizing $\sum_{i=1}^{n} Q_i \Delta_i$ subject to constraint 10. A .... Lemma 4 For $\beta > 0$, $K > 0$, and $Q, P, y \in R^n$ satisfying $P_i > 0$, $Q_i > 0$, and $\sum_{i=1}^{n} Q_i = 1$, if $\sum_{i=1}^{n} P_i e^{\beta y_i} \le K$ (11) then ....

David McAllester. Pac-bayesian model averaging. In COLT-99, 1999.


Computable Shell Decomposition Bounds - Langford, McAllester (2000)   (12 citations)  Self-citation (McAllester)

....rates far from 1/2 should be an exponentially small fraction of the class. Hence we get that $s(\lceil\lceil q \rceil\rceil, \delta)$ is significantly less than $\ln |H|$ and theorem 3.3 is tighter than (10). The remainder of this section is a proof of lemma 3.2. Our departure point for the proof is the following lemma from [6]. Lemma 3.4 (McAllester 99) For any measure on any hypothesis class we have the following where $E_h f(h)$ denotes the expectation of f(h) under the given measure on h. $\forall \delta > 0 \;\; \forall^{\delta} S: \; E_h\, e^{(2m - 1)(\hat{e}(h) - e(h))^2} \le \frac{4m}{\delta}$ Intuitively, this lemma states that with high confidence over the choice of the ....

David McAllester, "Pac-Bayesian model averaging", COLT, 1999.


Computable Shell Decomposition Bounds - Langford, McAllester (2000)   (12 citations)  Self-citation (McAllester)

....rates far from 1/2 should be an exponentially small fraction of the class. Hence we get that $s(\lceil\lceil q \rceil\rceil, \delta)$ is significantly less than $\ln |H|$ and theorem 3.3 is tighter than (10). The remainder of this section is a proof of lemma 3.2. Our departure point for the proof is the following lemma from [6]. Lemma 3.4 (McAllester 99) For any measure on any hypothesis class we have the following where $E_h f(h)$ denotes the expectation of f(h) under the given measure on h. $\forall \delta > 0 \;\; \forall^{\delta} S: \; E_h\, e^{(2m - 1)(\hat{e}(h) - e(h))^2} \le \frac{4m}{\delta}$ Intuitively, this lemma states that with high confidence ....

David McAllester, "Pac-Bayesian model averaging", COLT, 1999.


A Better Variance Control For Pac-Bayesian Classification - Audibert

No context found.

D. A. McAllester, PAC-Bayesian model averaging, Proceedings of the 12th annual conference on Computational Learning Theory, Morgan Kaufmann, 1999.


Suboptimal Behavior of Bayes and MDL in Classification.. - Grünwald, Langford

No context found.

D. McAllester. PAC-Bayesian model averaging. In Proceedings COLT '99, 1999.


Data-dependent generalization error bounds for (noisy).. - Audibert (2004)

No context found.

D. A. McAllester, PAC-Bayesian model averaging, Morgan Kaufmann Publishers, 1999.


A PAC-Bayesian approach to adaptive classification - Catoni (2003)

No context found.

D. A. McAllester, PAC-Bayesian Model Averaging, Proceedings of the Twelfth Annual Conference on Computational Learning Theory (Santa Cruz, CA), 1999.


Bayesian Gaussian Process Models: PAC-Bayesian Generalisation.. - Seeger (2003)   (3 citations)

No context found.

David McAllester. PAC-Bayesian model averaging. In Conference on Computational Learning Theory 12, pages 164--170, 1999.


PAC-Bayesian Generic Chaining - Jean-Yves Audibert (2003)

No context found.

D. A. McAllester. PAC-Bayesian model averaging. In Proceedings of the 12th Annual Conference on Computational Learning Theory. ACM Press, 1999.
