32 citations found.
M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47(2):207--217, 1993.

This paper is cited in the following contexts:

Complexity Regularization via Localized Random Penalties - Lugosi, Wegkamp (2002)   (7 citations)

....every n and for all distributions of (X; Y ) In particular, k flog S k (2n) 2 log(nk)g The proof uses Lemma 2.1 and the following uniform deviation bound due to Vapnik and Chervonenkis [27]. The slightly improved form used here is proved by Anthony and Shawe-Taylor [1]. Proposition 3.2. Let S k (X 1 ) be the random shatter coefficient of A k based on i.i.d. observations X 1 ; X 2n defined in (1.3) For all 0 and n 1, 1 ) exp( n =4) 3.1) and 1 ) exp( n =4) 3.2) Proof. Observe that for all 0 and n 1, L(f) b ....

....X 1 ; X 2n defined in (1.3) For all 0 and n 1, 1 ) exp( n =4) 3.1) and 1 ) exp( n =4) 3.2) Proof. Observe that for all 0 and n 1, L(f) b L(f) and similarly, b L(f) The proposition follows by Anthony and Shawe-Taylor [1]. 2 Proof of Theorem 3.1. We start with the proof of the first inequality of Theorem 3.1. In view of Lemma 2.1, it suffices to show that PfL( b f k ) b L( b f k ) b C k g 8= nk) Consequently, by (3.1) L( b f k ) L( b f k ) 2 b L( b f k ) 8 = so that Pf b C k e C k ....

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47:207-217, 1993.


Margins and Combined Classifiers - Mason (1999)

....rate of convergence on the probability that the expected value of any function in is much larger than twice its sample average. For any class of functions : X # Y ##0; 1# # (# # :ED (x; y) # 2E S (x; y) # 4 # (2m)e #m =# The result follows from the following result (Theorem 2. 1 in [3]) due to Vapnik and Chervonenkis [71] For any class of functions : X # Y ##0; 1# # # : ED (x; y) #E S (x; y) ED (x; y) # 4 # (2m)e Following, for example, the proof of Corollary 7 (iii) in [5] suppose that ED (x; y) # E S (x; y) # ED (x; y) and consider the separate ....

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47:207-217, 1993.


Generalization Error of Combined Classifiers - Mason, Bartlett, Golea (1997)

....hi(x NIi 1 ll Z llj(X) if hll : 1 j lljelj(X) if hl : O, 6) where ellj tplj for Xl; 1 and ellj tmij for Xl; 0. So the only thing varying over this class is the particular choices of the el;j (the classes they are chosen from k are fixed) and the l;j which are restricted so that l;j 6 [0, 1] and jl llj: 1. The probability (5) can in turn be bounded above by N=i N =1 Ii ,Ni. Xl i Pij ,nl i j ,00 B = PDm(h H ,Ol : ED(I)i 2E 2 c) s) Equation (7) follows from repeated application of the union bound over N, N , li, Nil, ll, Pij, nllj and 0o. The argument for the number of ....

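....of a result due to Vapnik and Chervonenkis [17]. Lemma 5.2. Let $\mathcal{A}$ be a set of subsets of $X \times Y$. Then for any $\epsilon > 0$, $P(\exists A \in \mathcal{A} : P_D(A) > 2P_S(A) + \epsilon) \le 4\Pi_{\mathcal{A}}(2m)e^{-\epsilon m/8}$, where $\Pi_{\mathcal{A}}(m)$ denotes the growth function of the set $\mathcal{A}$, defined by $\Pi_{\mathcal{A}}(m) = \max\{|\{S \cap A : A \in \mathcal{A}\}| : S \subseteq X \times Y, |S| = m\}$. Proof. We begin with the following version of Vapnik's result (Theorem 2.1 in [1]): $P(\exists A \in \mathcal{A} : (P_D(A) - P_S(A))/\sqrt{P_D(A)} > \epsilon) \le 4\Pi_{\mathcal{A}}(2m)e^{-\epsilon^2 m/4}$. Following, for example, the proof of Corollary 7 (iii) in [2], suppose that $(P_D(A) - P_S(A))/\sqrt{P_D(A)} \le \epsilon$ and consider the separate cases in which $P_D(A) < 4\epsilon^2$ and $P_D(A) \ge 4\epsilon^2$. In either ....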

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47:207-217, 1993.


Data-Dependent Margin-Based Generalization Bounds for.. - Kégl, Linder, Lugosi (2001)

....margin error into account. Bartlett [6] gives a similar inequality with the only difference that his result involves worst case covering numbers. We omit the proof of this lemma as its proof is almost identical to that of [6, Theorem 6]. The proof is based on Anthony and Shawe-Taylor's proof [3] of the above-mentioned inequality of Vapnik and Chervonenkis [17]. Lemma 4. Using the notation of the proof of Theorem 1, for any 0 and e O, L(f) n(f) 4EAfoo( 2, W) Xln)e ne2 a. P sup Proof of Theorem 2. First observe that for any ( 0, r 3I:L(I) l a) f) e l a r sup ....

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47:207-217, 1993.


New Zero-Error Bounds for Voting Algorithms. - Panchenko (2001)

....under permutations and increasing by inclusion. Theorem 5 If (1.6) and (1.7) hold then for any t 0; P (9C 2 C(X n ) PC; P n C) t) 4G(2n) exp nt 4 : To prove this theorem one should repeat the standard proof of Vapnik's inequality for non-random classes of sets (see [17], [1]). It is just a convenient observation that the symmetrization step of the proof can be carried out for random classes invariant under permutation, after one combines the training set with a ghost sample and utilizes (1.7). The following theorem is an analog of Theorem 3 in cases (1), (2) and (3) ....

Anthony, M. and Shawe-Taylor, J. (1993) A result of Vapnik with applications. Discrete Appl. Math. 47, no. 3, 207-217.


Generalization Error of Combined Classifiers - Mason, Bartlett, Golea (1997)

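....of $X \times Y$. Then for any $\epsilon > 0$, $P(\exists A \in \mathcal{A} : P_D(A) > 2P_S(A) + \epsilon) \le 4\Pi_{\mathcal{A}}(2m)e^{-\epsilon m/8}$, where $\Pi_{\mathcal{A}}(m)$ denotes the growth function of the set $\mathcal{A}$, defined by $\Pi_{\mathcal{A}}(m) = \max\{|\{S \cap A : A \in \mathcal{A}\}| : S \subseteq X \times Y, |S| = m\}$. Proof. We begin with the following version of Vapnik's result (Theorem 2.1 in [1]): $P(\exists A \in \mathcal{A} : (P_D(A) - P_S(A))/\sqrt{P_D(A)} > \epsilon) \le 4\Pi_{\mathcal{A}}(2m)e^{-\epsilon^2 m/4}$. Following, for example, the proof of Corollary 7 (iii) in [2], suppose that $(P_D(A) - P_S(A))/\sqrt{P_D(A)} \le \epsilon$ and consider the separate cases in which $P_D(A) < 4\epsilon^2$ and $P_D(A) \ge 4\epsilon^2$. In either case, $P_D(A) \le 2P_S(A)$ ....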
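
The case analysis quoted in the excerpt above can be spelled out as follows. This is a sketch rather than the authors' own text: it assumes the symbols lost in extraction are $\epsilon$, $P_D$ and $P_S$, writes the growth function as $\Pi_{\mathcal{A}}$ (an assumed notation), and assumes the cited Theorem 2.1 has the relative-deviation form shown in the first display.

```latex
% Assumed form of Theorem 2.1 in [1] (relative deviation bound).
\[
P\Bigl(\exists A \in \mathcal{A} :
   \frac{P_D(A) - P_S(A)}{\sqrt{P_D(A)}} > \epsilon\Bigr)
  \le 4\,\Pi_{\mathcal{A}}(2m)\, e^{-\epsilon^2 m/4}.
\]
% On the complementary event, P_D(A) - P_S(A) <= eps * sqrt(P_D(A)) for every A.
% Split on whether P_D(A) is below or above 4 eps^2.
\begin{align*}
P_D(A) < 4\epsilon^2
  &\implies P_D(A) - P_S(A) \le \epsilon\sqrt{P_D(A)} < 2\epsilon^2
   \implies P_D(A) < 2P_S(A) + 2\epsilon^2, \\
P_D(A) \ge 4\epsilon^2
  &\implies \epsilon\sqrt{P_D(A)} \le \tfrac{1}{2}P_D(A)
   \implies P_D(A) \le 2P_S(A).
\end{align*}
% In either case P_D(A) <= 2 P_S(A) + 2 eps^2.
% Renaming 2 eps^2 as eps turns the exponent eps^2 m / 4 into eps m / 8.
\[
P\bigl(\exists A \in \mathcal{A} : P_D(A) > 2P_S(A) + \epsilon\bigr)
  \le 4\,\Pi_{\mathcal{A}}(2m)\, e^{-\epsilon m/8}.
\]
```

Under these assumptions the constants agree with the $e^{-\epsilon m/8}$ bound stated in the lemma.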

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47:207-217, 1993.


Learning Nested Differences in the Presence of Malicious Noise - Peter Auer (1997)   (1 citation)

....in S 0 which are misclassified by the algorithm's hypothesis and then apply the following result from PAC learning theory. Essentially the lemma states that with high probability a hypothesis which makes few mistakes on a noise-free sample is close to the target concept. Lemma 12 (Adapted from [AST93]) Let C be any target concept and H any hypothesis class on a domain X, d = VC dim(H). Furthermore let D be any distribution on X, and choose ffl; ffi; ff 0. Then with probability at most ffi a sample of size m 0 8ffl ff 2 Gamma d ln 48ffl ff 2 ln 4 ffi Delta is drawn ....

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47:207--217, 1993.


Learning with a Slowly Changing Distribution - Peter Bartlett (1992)   (5 citations)

.... so Lemma 3 gives E P n 1 i (f) E P n 1 i 1 (f) n 1)fl: Inequality (17) follows immediately, using D(A) = E_D(1_A) for any distribution D, where 1_A is the indicator function for A (1_A(x) is 1 when x ∈ A and 0 otherwise). The following lemma is due to Anthony and Shawe-Taylor ([AST90], Proposition 3.2). It improves on a similar result presented by Blumer et al. ([BEHW89], Theorem A3.1). Lemma 22 Define BD ; H; t; fi; ffl; d; ffi as in Theorem 20. For any distribution D on S, D t (BD (H; t; fi; ffl) ffi if t 1 fi 2 ffl(1 Gamma p ffl) 4 log 4 ffi 6d log 4 ....

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Technical Report CSD-TR-628, UCL, 1990.


Analysis of Data with Threshold Decision Lists - Martin Anthony December   Self-citation (Anthony)

....restricted to domain S. Note that H (m) 2 for all m. The function H is known as the growth function of H , and it measures how expressive the hypothesis class H is. The key probability results we employ are the following bounds, due to Vapnik and Chervonenkis [28] and Vapnik [27] (see also [7, 4]): for any ; 2 (0; 1) Thus, we can obtain (probabilistic) bounds on the error er(f) of a (partial) extension from a class H when we know something about the growth function of H . 4.2 Growth function bounds We start with general threshold decision lists. We consider the set of ....

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47, 1994: 207--217.


A Framework for Structural Risk Minimisation - Shawe-Taylor, Bartlett.. (1996)   Self-citation (Anthony Shawe-taylor)

....where H(q) is the entropy of the distribution q. 3 Structural Risk Minimisation Using the framework established in the previous section we now wish to consider the possibility of errors on the training sample. We will make use of the following result of Vapnik [2] in a slightly improved version of Anthony and Shawe-Taylor [2]. Note that Π_H(m) refers to the growth function of a set of hypotheses H, that is the greatest number of dichotomies realisable on an m-sample. Note also that the result is expressed in terms of the quantity Er_z(h) which denotes ....

....3 Structural Risk Minimisation Using the framework established in the previous section we now wish to consider the possibility of errors on the training sample. We will make use of the following result of Vapnik [2] in a slightly improved version of Anthony and Shawe-Taylor [2]. Note that Π_H(m) refers to the growth function of a set of hypotheses H, that is the greatest number of dichotomies realisable on an m-sample. Note also that the result is expressed in terms of the quantity Er_z(h) which denotes the number of errors of the hypothesis h on the sample z, ....

Martin Anthony and John Shawe-Taylor, A Result of Vapnik with Applications, Discrete Applied Mathematics, 47 (1993) 207--217.


Cross-Validation for Binary Classification by Real-Valued.. - Anthony, Holden (1999)   (1 citation)  Self-citation (Anthony)

....can be made arbitrarily small by choosing a large enough n (which does not depend on P ); this will be explained below. A great deal of research in recent years has concentrated on extending results of this kind to deal with multiple class problems (see for example Anthony and Shawe-Taylor [6]) and real-valued functions. See for example Haussler [14], Pollard [22], Anthony [2], among others. However there is another potential direction for extending such results which to date has received much less emphasis, although it is no less interesting. Cheng and Titterington [11] have raised ....

....the training sequence is denoted by n throughout this paper. It should be noted that the realisable framework just described is a special case of the framework in which the labelled examples are chosen according to a joint probability distribution P on Z = X × {−1, 1}; see [10, 6]. In learning from examples we attempt to use z to select a hypothesis h : X → {−1, 1} from H. A learning algorithm or learner L is a technique that accomplishes this, such that h = L(z). Throughout the paper we denote by L(z) the hypothesis produced when L is applied to z. ....

[Article contains additional citation context not shown here]

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47:207--217, 1994.


Valid Generalisation from Approximate Interpolation - Anthony, Bartlett (1996)   (3 citations)  Self-citation (Anthony Shawe-taylor)

.... Gamma p ffl) 2 BdimH (j) ln 6 ffl ln 2 ffi : Furthermore, any sample length function must satisfy m 0 (j; ffl; ffi) max 1 Gamma ffl ffl log 1 ffi ; BdimH (j) Gamma 2 24ffl when ffi 1=6 and BdimH (j) 4. In some work on function learning, such as [17, 7, 11], a dimension known as the graph dimension has proven to be useful. The graph dimension of a class H of functions that map from X to a set Y is the VC dimension of the class { (x, y) ↦ (1 if y = h(x), 0 otherwise) : h ∈ H }. It appears that this dimension is more useful for functions ....

Anthony, M. and Shawe-Taylor, J. (1993). A result of Vapnik with applications, Discrete Applied Mathematics, 47: 207--217.


Quantifying Generalization in Linearly Weighted Neural Networks - Anthony, Holden (1994)   (1 citation)  Self-citation (Anthony)

....over all sequences T k of k training examples, obtained by choosing each of the k examples independently at random from R n according to the probability distribution P . The result quoted here is based on a slight improvement on the original result of Vapnik; see Anthony and Shawe-Taylor [4]. Now, by equation 16, if V(F) is finite then 4 F (k) is bounded above by a polynomial function of k and thus, since exp 2 k 4 decays exponentially in k, we can make the right-hand side of equation 18 arbitrarily small by choosing k large enough. Furthermore, equation 18 provides a ....
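
The argument in the excerpt above (a polynomial growth function multiplied by an exponentially decaying term eventually becomes small) can be illustrated numerically. The sketch below is not taken from the cited paper. It assumes a bound of the form 4 Π(2k) exp(−ε²k/4), and it replaces the growth function Π by the Sauer's lemma estimate (e·m/d)^d for a class of VC dimension d. All function names are hypothetical.

```python
import math


def log_sauer_bound(m: int, d: int) -> float:
    """Log of the Sauer's lemma estimate (e*m/d)**d, valid for m >= d >= 1."""
    if not (m >= d >= 1):
        raise ValueError("need m >= d >= 1")
    return d * math.log(math.e * m / d)


def log_deviation_bound(k: int, d: int, eps: float) -> float:
    """Log of the ASSUMED bound 4 * Pi(2k) * exp(-eps**2 * k / 4)."""
    return math.log(4.0) + log_sauer_bound(2 * k, d) - eps ** 2 * k / 4.0


def sufficient_sample_size(d: int, eps: float, delta: float) -> int:
    """Smallest k >= d at which the assumed bound drops to delta or below.

    Assumes 0 < eps <= 1 and 0 < delta < 1, so the bound exceeds delta at
    k = d and, being unimodal in k, stays below delta once it gets there.
    """
    log_delta = math.log(delta)
    hi = d
    # Doubling phase: find some k where the bound is already small enough.
    while log_deviation_bound(hi, d, eps) > log_delta:
        hi *= 2
    lo = max(d, hi // 2)
    # Binary search for the first k in [lo, hi] that satisfies the bound.
    while lo < hi:
        mid = (lo + hi) // 2
        if log_deviation_bound(mid, d, eps) <= log_delta:
            hi = mid
        else:
            lo = mid + 1
    return hi


if __name__ == "__main__":
    k = sufficient_sample_size(d=10, eps=0.1, delta=0.05)
    print("sample size at which the assumed bound falls below 0.05:", k)
```

Working in log space avoids floating-point overflow of (e·m/d)^d when d is large. The sample size it returns depends entirely on the assumed form of the bound.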

....by Blumer et al. [7] and based on the work of Valiant [41] to relate network size to generalization ability. This work has recently been extended to a class of networks described in section 1 by Holden and Rayner [21], to networks with more than one output node by Anthony and Shawe-Taylor [4], and to networks with real-valued outputs by Haussler [18]. In this section we give a brief introduction to the formalism. 2.3.1 Standard PAC learning Consider a neural network having a hypothesis space H. We define a concept class C in a similar manner as a set of subsets of R n . In general ....

[Article contains additional citation context not shown here]

M. Anthony and J. Shawe-Taylor, A result of Vapnik with applications, Discrete Applied Mathematics, 47, 1993: 207-217.


Cross-Validation for Binary Classification By Real-Valued.. - Anthony, Holden (1998)   (1 citation)  Self-citation (Anthony)

....bound can be made arbitrarily small by choosing a large enough n (which does not depend on P ); this will be explained below. A great deal of research in recent years has concentrated on extending results of this kind to deal with multiple class problems (see for example Anthony and Shawe-Taylor [7]) and real-valued functions. See for example Haussler [14], Pollard [20], Anthony [2], among others. However there is another potential direction for extending such results which to date has received much less emphasis, although it is no less interesting. Cheng and Titterington [11] have raised ....

....context. The length of the training sequence is denoted by n throughout this paper. It should be noted that the realisable framework just described is a special case of the framework in which the labelled examples are chosen according to a joint probability distribution P on Z = X × {−1, 1}; see [10, 7]. In learning from examples we attempt to use x to select a hypothesis h : X → {−1, 1} from H. A learning algorithm or learner L is a function L : z H H that accomplishes this. A learning algorithm is consistent if for any z 2 z H , L produces a hypothesis h for which h(x i ) c(x i ) for 1 i ....

[Article contains additional citation context not shown here]

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47:207-217, 1994.


Probabilistic `Generalization' of Functions and Dimension-based.. - Anthony (1999)   (1 citation)  Self-citation (Anthony)

....be extended by considering probability measures on X × {0, 1}, rather than probability measures on X coupled with functions from X to {0, 1}. Any probability measure on X together with a function t : X → {0, 1} can be represented in the obvious way by a probability measure P on X × {0, 1}; see [8, 5]. In this more general model (which is discussed in [8, 10] for example) the error of h ∈ H with respect to a probability measure P on X × {0, 1} is taken to be er_P(h) = P({(x, y) ∈ X × {0, 1} : h(x) ≠ y}). In this context, a learning algorithm takes as input a P^m-random sample s ....

....[8] prove a number of stronger assertions, among them that H PAC generalizes in the weaker sense of Definition 2.2 only if H has finite VC dimension. We remark that a number of different uniform convergence results along the lines of Theorem 3.1 have been obtained, some of which, such as those in [21, 10, 5, 8], provide better bounds on the rate of uniform convergence. 4 Generalization of real functions We now discuss how the previous notions of generalization have been extended to classes of real-valued functions. For simplicity, we shall assume, unless indicated otherwise, that our sets H of ....

Anthony, M. and J. Shawe-Taylor (1994). A result of Vapnik with applications. Discrete Applied Mathematics, 47:207-217, 1994.


Confidence Estimates of Classification Accuracy on New Examples - Shawe-Taylor (1996)   Self-citation (Shawe-taylor)

.... j Gamma 2fl for all i; jfi : f(y i ) jgj ffl(m; n; k; ffi ) jfi : jf(y i ) Gamma j jgj ) ffi; where ffl(m; n; k; ffi ) 2 n (k log 8em(b Gammaa) kfl log 32m(b Gammaa) 2 fl 2 log 2 ffi ) and k = AFat(fl=4) Proof : Using the standard permutation argument (as in [2]) we may fix a sequence xy and bound the probability under the uniform distribution on swapping permutations Sigma that the permuted sequence satisfies the condition stated. Let A f (fl; r) denote the event, that the function f satisfies the bad conditions on the sample xy n jfj : f(x j ) ....

Martin Anthony and John Shawe-Taylor, "A Result of Vapnik with Applications, " Discrete Applied Mathematics, 47, 207--217, (1993).


A Framework for Structural Risk Minimisation - Shawe-Taylor, Bartlett   Self-citation (Anthony Shawe-taylor)

....of the distribution q. 3 Structural Risk Minimisation Using the framework established in the previous section we now wish to consider the possibility of errors on the training sample. We will make use of the following result of Vapnik in a slightly improved version of Anthony and Shawe-Taylor [2]. Note also that the result is expressed in terms of the quantity Er_z(h) which denotes the number of errors of the hypothesis h on the sample z, rather than the usual proportion of errors. Theorem 3.1 ([2]) Let 0 ffl 1 and 0 fl 1. Suppose H is an hypothesis space of functions from an ....

....the following result of Vapnik in a slightly improved version of Anthony and Shawe-Taylor [2]. Note also that the result is expressed in terms of the quantity Er_z(h) which denotes the number of errors of the hypothesis h on the sample z, rather than the usual proportion of errors. Theorem 3.1 ([2]) Let 0 ffl 1 and 0 fl 1. Suppose H is an hypothesis space of functions from an input space X to {0, 1}, and let be any probability measure on S = X × {0, 1}. Then the probability (with respect to m ) that for z ∈ S^m, there is some h ∈ H such that er (h) ffl and Er_z(h) ....

Martin Anthony and John Shawe-Taylor, "A Result of Vapnik with Applications, " Discrete Applied Mathematics, 47, 207--217, (1993).


Generalization Performance of Support Vector Machines and .. - Bartlett, Shawe-Taylor (1998)   (43 citations)  Self-citation (Shawe-taylor)

No context found.

Martin Anthony and John Shawe-Taylor, "A Result of Vapnik with Applications", Discrete Applied Mathematics, 47 207--217 (1993).


Valid Generalisation from Approximate Interpolation - Anthony, Bartlett, al. (1996)   (3 citations)  Self-citation (Anthony)

....is 1 ffl(1 Gamma p ffl) 2 BdimH (j) ln 6 ffl ln 2 ffi : Furthermore, any sample length function must satisfy m 0 (j; ffl; ffi) max 1 Gamma ffl ffl log 1 ffi ; BdimH (j) Gamma 2 24ffl when ffi 1=6 and BdimH (j) 4. In some work on function learning, such as [17, 7, 11], a dimension known as the graph dimension has proven to be useful. It appears that this dimension is more useful for functions taking values in a finite set, rather than in the reals, and there is some further evidence of this here. For, although it might seem that the banddimension is a ....

Anthony, M. and Shawe-Taylor, J. (1993). A result of Vapnik with applications, Discrete Applied Mathematics, 47: 207--217.


A Framework for Structural Risk Minimisation - Shawe-Taylor, Bartlett, al. (1996)   Self-citation (Shawe-taylor)

....of the distribution q. 3 Structural Risk Minimisation Using the framework established in the previous section we now wish to consider the possibility of errors on the training sample. We will make use of the following result of Vapnik in a slightly improved version of Anthony and Shawe-Taylor [2]. Note also that the result is expressed in terms of the quantity Er_z(h) which denotes the number of errors of the hypothesis h on the sample z, rather than the usual proportion of errors. Theorem 3 ([2]) Let 0 ffl 1 and 0 fl 1. Suppose H is an hypothesis space of functions from an ....

....the following result of Vapnik in a slightly improved version of Anthony and Shawe-Taylor [2]. Note also that the result is expressed in terms of the quantity Er_z(h) which denotes the number of errors of the hypothesis h on the sample z, rather than the usual proportion of errors. Theorem 3 ([2]) Let 0 ffl 1 and 0 fl 1. Suppose H is an hypothesis space of functions from an input space X to {0, 1}, and let be any probability measure on S = X × {0, 1}. Then the probability (with respect to m ) that for z ∈ S^m, there is some h ∈ H such that er (h) ffl and Er z ....

Martin Anthony and John Shawe-Taylor, "A Result of Vapnik with Applications," Discrete Applied Mathematics, 47, 207--217, (1993).


Detecting Change in Data Streams - Kifer, Ben-David, Gehrke (2004)

No context found.

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47(2):207--217, 1993.


Detecting Change in Data Streams - Shai Ben-David Johannes

No context found.

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47(2):207--217, 1993.


Bayesian Gaussian Process Models: PAC-Bayesian Generalisation.. - Seeger (2003)   (3 citations)

No context found.

M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47(2):207--217, 1993.


Non-Parametric Approach to Change Detection and Estimation.. - Ben-David, He, Tong (2004)

No context found.

M. Anthony and J. Shawe-Taylor, "A result of Vapnik with applications", in Discrete and Applied Mathematics, vol. 47(2), pp. 207-217, 1993.


o-Minimal Expansions of the Real Field: A.. - Karpinski, Macintyre (1997)

No context found.

M. Anthony and J. Shawe-Taylor, A Result of Vapnik with Applications, Discrete Applied Math. 47 (1993), pp. 207--217.
