Martin Anthony and Norman L. Biggs. Computational Learning Theory: An Introduction, Cambridge University Press, Cambridge, UK, 1992.


This paper is cited in the following contexts:


Quantum versus Classical Learnability - Servedio, Gortler (2000)   (1 citation)  (Correct)

....complexity, and we say that C is efficiently exact learnable if there is a learning algorithm for C which runs in poly(n) time. 2.2 Classical PAC Learning The PAC (Probably Approximately Correct) model of concept learning was introduced by Valiant in [28] and has since been extensively studied [3, 24]. In this model the learning algorithm has access to an example oracle $EX(c, D)$ where $c \in C_n$ is the unknown target concept and $D$ is an unknown distribution over $\{0, 1\}^n$. The oracle $EX(c, D)$ takes no inputs; when invoked, in one time step it returns a labeled example $\langle x, c(x) \rangle$ where $x \in \{0, 1\}$ ....

M. Anthony and N. Biggs. Computational Learning Theory: an Introduction. Cambridge Univ. Press, 1997.


Quantum versus Classical Learnability - Servedio, Gortler (2001)   (1 citation)  (Correct)

....complexity $T(n)$ of a learning algorithm A for C is the maximum number of calls to $MQ_c$ which A ever makes for any $c \in C_n$. 2.2. Classical PAC Learning The PAC (Probably Approximately Correct) model of concept learning was introduced by Valiant in [33] and has since been extensively studied [4, 27]. In this model the learning algorithm has access to an example oracle $EX(c, D)$ where $c \in C_n$ is the unknown target concept and $D$ is an unknown distribution over $\{0, 1\}^n$. The oracle $EX(c, D)$ takes no inputs; when invoked, in one time step it returns a labeled example $\langle x, c(x) \rangle$ where $x \in \{0, 1\}$ ....

M. Anthony and N. Biggs. Computational Learning Theory: an Introduction. Cambridge Univ. Press, 1997.


Some Topics in Neural Networks and Control - Sontag (1993)   (7 citations)  (Correct)

....Vapnik and Chervonenkis see the excellent book [63] and the interactions between statistics and computer science are the subject of much current research. The next few paragraphs introduce the basic ideas, using terminology from learning theory; for more details see for instance the textbook [3]. In the PAC paradigm, a learner has access to data given by a labeled sample $S = (u_1, y_1), \ldots, (u_s, y_s)$. The inputs $u_i$ are being generated at random, independently and identically distributed according to some probability measure $P$. There is some fixed but unknown function $f$ so ....

Anthony, M., and N.L. Biggs, Computational Learning Theory: An Introduction, Cambridge U. Press, 1992.


The VC Dimension for Mixtures of Binary Classifiers - Jiang   (Correct)

....a scalar input, we show that the lower bound m is attained, in which case we obtain the exact result that the VC dimension is equal to the number of experts. 1 Introduction The Vapnik-Chervonenkis (VC) dimension is a central concept for recent developments of computational learning theory (see Anthony and Biggs 1992). The VC dimension is a combinatorial parameter defined on a set of binary functions 1 or a system of classifiers, which shows the expressive power of the system. In risk minimization (Vapnik 1982, 1992 and 1998) the VC dimension provides a bound to the rate of uniform convergence of empirical ....

....uniform convergence of empirical risk to the actual risk. In the formalism of PAC (probably approximately correct) learning (Valiant 1984) the VC dimension is used for a number of purposes, including planning for the size of the training sample, and estimating the computational efficiency. See Anthony and Biggs (1992) for a review of the concept and its applications. There is a special issue of Discrete Applied Mathematics (Volume 86, Number 1, 1998) dedicated to the study of VC dimension. As Shawe-Taylor (1998) pointed out in the preface of this special issue: The VC dimension seems to have a knack of ....

[Article contains additional citation context not shown here]

Anthony, M. and Biggs, N. 1992. Computational Learning Theory: An Introduction. Cambridge University Press, Cambridge.


On Weak Base Hypotheses And Their Implications For Boosting.. - Jiang (2000)   (Correct)

....labels, where a perfect sample fit implied by the assumption is typically good for the prediction error also. An algorithm that fits the data perfectly is said to be consistent in the PAC framework, and is an important condition to prove a good performance in the prediction error [see, e.g. Anthony and Biggs (1992), Chapter 4]. Another possible reason is that originally the inventors of AdaBoost may not have intended to let the algorithm run forever, but rather to truncate the process (which is a regularization method); see Freund and Schapire (1997). For such an approach of boosting in the process the ....

....of a large deviation of Q, we get the following bound for its expectation: E(Q) p 2m Gamma1 log(2ejH c j) where jH c j is the number of distinct vectors f(x m 1 ) when f varies in H c . Apply the VC bound to this number and we get the proof. For the concept of the VC dimension, see, e.g. Anthony and Biggs (1992, Chapter 7) 2 In Section 6, we showed that a nonzero a span of the base hypothesis space implies an exponential reduction in the (reducible) training error. In fact, we will also show that the (reducible) training error is guaranteed to become exactly zero at some finite time for any data set, ....

Anthony, M. and Biggs, N. (1992). Computational Learning Theory: An Introduction. Cambridge University Press, Cambridge.


Process Consistency for AdaBoost - Jiang (2000)   (23 citations)  (Correct)

....possible multiple solutions) F t 1 of fits from the population version of AdaBoost, one has lim t 1 jjF t 1 Gamma FB jj L 2 (PX ) 0. II) Base Hypothesis Space] The VC (Vapnik Chervonenkis) dimension of H is finite, i.e. V C(H) 1. For the concept of the VC dimension, see, e.g. Anthony and Biggs (1992, Chapter 7) III) Population Coefficients] The coefficients in the population AdaBoost are all finite, i.e. jff s 1 j 1 for all s. PROCESS CONSISTENCY FOR ADABOOST 3 (IV) t Step Consistency ] Given any t and any sample realization S such that D t n;1 = sup f2H j Delta t Gamma1 n ....

Anthony, M. and Biggs, N. (1992). Computational Learning Theory: An Introduction. Cambridge University Press, Cambridge.


On Sparse Approximations To Randomized Strategies And Convex.. - Althöfer (1994)   (7 citations)  (Correct)

....that are merely convex combinations of already existing columns. (E) The Approximation Lemma is not an isolated discovery. In several fields results of a similar flavour have been obtained, for instance: (i) the VC dimension [VC 71] and its applications in computational learning theory [LLW 91], [AB 92] and computational geometry [HW 87]; (ii) Monte Carlo approximations [KL 83], [KLM 89], [LV 91]; (iii) uniformity and irregularities (discrepancies) of set systems and matrices [BF 81], [LSV 86]; (iv) stochastic information theory [Ahl 78], [Ahl 79]. On the other hand there are some statistics ....

Anthony, M. and Biggs, N. 1992. Computational Learning Theory: An Introduction. Cambridge, UK: Cambridge University Press.


Lower Bounds for Learning Discrete Distributions - Servedio   (Correct)

....for learning an unknown distribution work by memorizing the sample they are given. 2 Preliminaries In this section we describe the discrete distribution learning model introduced by Kearns et al. in [13]. The reader who is familiar with the PAC model of concept learning as described in, e.g. [1, 14] will notice many similarities between the two models. Throughout this paper the size parameter n denotes an arbitrary positive integer. We write $\mathcal{D}_n$ to denote a class of probability distributions over $\{0, 1\}^n$. For $D \in \mathcal{D}_n$ and $x \in \{0, 1\}^n$ we write $D[x]$ to denote the probability weight ....

M. Anthony and N. Biggs. Computational Learning Theory: an Introduction. Cambridge Univ. Press, 1997.


Quantum versus Classical Learnability - Servedio, Gortler (2001)   (1 citation)  (Correct)

....complexity, and we say that C is efficiently exact learnable if there is a learning algorithm for C which runs in poly(n) time. 2.2 Classical PAC Learning The PAC (Probably Approximately Correct) model of concept learning was introduced by Valiant in [28] and has since been extensively studied [3, 24]. In this model the learning algorithm has access to an example oracle $EX(c, D)$ where $c \in C_n$ is the unknown target concept and $D$ is an unknown distribution over $\{0, 1\}^n$. The oracle $EX(c, D)$ takes no inputs; when invoked, in one time step it returns a labeled example $\langle x, c(x) \rangle$ where $x \in \{0, 1\}$ ....

M. Anthony and N. Biggs. Computational Learning Theory: an Introduction. Cambridge Univ. Press, 1997.


Neural Networks with Quadratic VC Dimension - Koiran, Sontag (1996)   (25 citations)  (Correct)

....according to the same probability distribution on X, one needs that the space of possible inputs be well sampled by the training data, so that f is an accurate fit. We omit the details of the formalization of PAC learning, since there are excellent references available, both in textbook (e.g. Anthony and Biggs (1992), Natarajan (1991)) and survey paper (e.g. Maass (1994)) form, and the concept is by now very well known. After the work of Vapnik (1982) in statistics and of Blumer et al. (1989) in computational learning theory, one knows that a certain combinatorial quantity, called the Vapnik-Chervonenkis ....

M. Anthony and N.L. Biggs (1992) Computational Learning Theory: An Introduction, Cambridge U. Press.


PAC Learning of One-Dimensional Patterns - Goldberg, Goldman, Scott (1996)   (1 citation)  (Correct)

....the robot is near the given landmark. 3. Background In this paper we work within the PAC (probably approximately correct) model of computational learning, as introduced by Valiant (1984, 85). Details of the model may be found in such textbooks as Kearns and Vazirani (1994), Natarajan (1991) or Anthony and Biggs (1992). We now review the basic definitions and results used here. 3.1. The PAC Learning Model In the PAC model, examples of a concept are made available to the learner according to an unknown probability distribution D, and the goal of a learning algorithm is to classify with high accuracy any further ....

Anthony, M., & Biggs, N. (1992). Computational Learning Theory: an Introduction, Cambridge University Press, 1992.


Decision Lists and Threshold Decision Lists - Martin Anthony December   Self-citation (Anthony)   (Correct)

....5 , 1) x 1 , 0) x 4 , 0) x 2 , 1) This is simpler, in the sense that it is a 1 decision list rather than a 2 decision list. It is easily verified that both decision lists are indeed extensions of the pdBf given by the sample. Correctness of the algorithm in general is easily established [19, 2]. Theorem 4.4 Suppose that K is a set of Boolean functions containing the identically 1 function, 1. Suppose that s is a sample of labelled elements of . If there is an extension in DL(K) of the partially defined Boolean function described by s, then the above algorithm will produce such an ....

Martin Anthony and Norman L. Biggs. Computational Learning Theory: An Introduction, Cambridge University Press, Cambridge, UK, 1992.


Valid Generalisation from Approximate Interpolation - Anthony, Bartlett (1996)   (3 citations)  Self-citation (Anthony)   (Correct)

.... and Definitions Much work has recently been carried out on probabilistic models of machine learning such as the probably approximately correct (or pac) model due to Valiant [26]. In particular, the pac learning of $\{0, 1\}$-valued functions (equivalently, sets) has been studied in great depth; see [12, 5, 18], for example. More recently, attention has been focussed on the extension of the pac model to classes of real-valued functions; see, for example, [14, 1, 9]. The problem studied in this paper is a problem in probability theory which is motivated by, and has applications to, the learnability of ....

.... all h 2 H; jh(x i ) Gammat(x i )j j; 1 i m) P(fx : jh(x) Gammat(x)j jg) ffl: Note that the sample length m 0 must be independent of t and P , depending only on j, ffl and ffi; thus the requirement is similar to that of the standard probably approximately correct (pac) learning model [12, 26, 5]. Another noticeable feature of this definition is the requirement that, with high probability, any j approximate interpolant of t on the sample is required to be a good approximation to t. Thus, the notion of valid generalisation from approximate interpolation is a generalisation of what has been ....

[Article contains additional citation context not shown here]

Anthony, M. and Biggs, N. (1992). Computational Learning Theory: An Introduction, Cambridge University Press.


PAC Learning and Artificial Neural Networks - Anthony, Biggs (1995)   Self-citation (Anthony Biggs)   (Correct)

....threshold units, each connected to all of n inputs, the outputs of these threshold networks then being combined together by a hard-wired AND gate. Thus, the network outputs 1 if and only if all k threshold units output 1. Blum and Rivest (1988) proved (essentially) the following result. See also Anthony and Biggs (1992). Theorem 4 Let P k n be as described, where k 2. If there is a PAC learning algorithm for P k n which is efficient with respect to accuracy, example size and number of inputs then the $RP \neq NP$ conjecture is false. Thus it is extremely unlikely that there is an efficient PAC learning ....

Anthony, M. and Biggs, N. , 1992, Computational Learning Theory: An Introduction, Cambridge, UK: Cambridge University Press.


A Sufficient Condition for Polynomial.. - Anthony, Shawe-Taylor (1997)   (2 citations)  Self-citation (Anthony)   (Correct)

....with respect to a probability distribution. We obtain a sufficient condition for feasible (polynomially bounded) sample size bounds for distribution-specific (solid) learnability. 1 1 Introduction There have been extensive studies of probabilistic models of machine learning; see the books [3, 11, 12], for example. In the standard PAC model of learning, the definition of successful learning is distribution free. A number of researchers have examined learning where the probability distribution generating the examples is known; see [6, 5] for example. In this paper we seek conditions under ....

Martin Anthony and Norman Biggs, Computational Learning Theory: An Introduction, Cambridge University Press: Cambridge, UK, 1992.


The Vapnik-Chervonenkis Dimension of a Random Graph - Anthony, Brightwell, Cooper (1995)   (6 citations)  Self-citation (Anthony)   (Correct)

.... notion of the Vapnik-Chervonenkis dimension of a set system, first introduced in [11]. The Vapnik-Chervonenkis dimension has proved useful in a number of areas of mathematics and computer science; in probability theory [11, 10, 8], in computational geometry [7], and in the theory of machine learning [4, 2], for example. We start by presenting the necessary definitions and making a few preliminary observations. Our main aim is to determine, for each positive integer d, the exact edge probability threshold function for a random graph $G(n, p)$ to have VC dimension at least d: for large d, this turns ....

Martin Anthony and Norman Biggs, Computational Learning Theory: An Introduction, Cambridge University Press, Cambridge, UK, 1992.


Using the Perceptron Algorithm to Find Consistent Hypotheses - Anthony, Shawe-Taylor (1993)   (1 citation)  Self-citation (Anthony)   (Correct)

.... is known as the weight vector, and is known as the threshold. This class of functions is the set of functions computable by the simple boolean perceptron (see [8, 9, 6]) and we shall denote it by $BP_n$. 1 We now give a fleeting description of the perceptron learning algorithm, and refer to [6, 1] for more details. For any learning constant 0, we have the perceptron learning algorithm L , devised by Rosenblatt [8, 9], which acts sequentially as follows. Let t be any function in $BP_n$, which may be thought of as the target. The algorithm L maintains at each stage a current ....

M. Anthony and N. Biggs, Computational Learning Theory: An Introduction, Cambridge University Press: Cambridge, UK, 1992.


Function Learning from Interpolation - Anthony, Bartlett (1995)   (1 citation)  Self-citation (Anthony)   (Correct)

No context found.

Anthony, M. and Biggs, N. (1992). Computational Learning Theory: An Introduction, Cambridge University Press.


On Specifying Boolean Functions by Labelled Examples - Anthony, Brightwell.. (1992)   (9 citations)  Self-citation (Anthony)   (Correct)

No context found.

Martin Anthony and Norman Biggs, Computational Learning Theory: An Introduction, Cambridge University Press, Cambridge, UK, 1992.


Probabilistic `Generalization' of Functions and Dimension-based.. - Anthony (1999)   (1 citation)  Self-citation (Anthony)   (Correct)

....was placed on the computational complexity of learning algorithms, which is not something we shall address here. The main probabilistic tools which have become useful for the analysis of this model and its variants have their roots in the work of Vapnik and others (see [21, 23, 22]). The books [4, 14, 16] contain general discussions of PAC learning. In its simplest form, the PAC model of learning may be described as follows. There is a set of examples X, and a target function $t : X \to \{0, 1\}$. It is known that t belongs to some set C of functions, but that is all that is known about it. There is ....

.... least 1 , a sample x 2 X m is such that the following holds: h 2 H and h(x i ) t(x i ) for i = 1; 2; m = er (h) fx 2 X : h(x) 6= t(x)g) It is clear that if H PAC generalizes, if $C \subseteq H$, and if L is a consistent learning algorithm, then L is a PAC learning algorithm; see [8, 4]. More generally, when it is not the case that $C \subseteq H$, or when C is unknown, it will be impossible to find a consistent learning algorithm. To deal with this (and with other considerations, such as there being no well-defined target function) the model can be extended by considering probability ....

Anthony, M. and N. Biggs (1992). Computational Learning Theory: An Introduction. Cambridge Tracts in Theoretical Computer Science (30). Cambridge University Press, Cambridge, UK, 1992.


Classification by Polynomial Surfaces - Anthony (1993)   (3 citations)  Self-citation (Anthony)   (Correct)

....higher degree, there is a strict divergence. Having computed the VC dimensions, we obtain an indication of how large a set of test data should be used for valid further classification of previously unseen points (within the criteria of the probably approximately correct model of machine learning [24, 6, 1]). 2 Definitions and Notation We now introduce some notation which will prove useful. Let us denote the set $\{1, 2, \ldots, n\}$ by $[n]$. We shall denote the set of all subsets of at most m objects from [n] by [n] m) and we shall denote by [n] m the set of all selections, in which repetition is ....

....dimension (or VC dimension) of H [25, 6], denoted VCdim(H), is defined to be the largest integer k such that there is some subset of X of cardinality k shattered by H. If no such largest k exists, we say that H has infinite Vapnik-Chervonenkis dimension. It is well known (see [6, 1], for example) that the Vapnik-Chervonenkis dimension of the set of homogeneous linear threshold functions is n. Indeed, we have the following useful characterisation of the shattered sets, a proof of which we include for completeness. Lemma 5 A subset $T = \{x_1, x_2, \ldots, x_k\}$ of $\mathbb{R}^n$ can ....

[Article contains additional citation context not shown here]

M. Anthony and N. Biggs, Computational Learning Theory: An Introduction, Cambridge University Press, Cambridge, UK, 1992.


Computational Learning Theory for Artificial Neural Networks - Anthony, Biggs (1993)   (2 citations)  Self-citation (Biggs Anthony)   (Correct)

....concept space C is a subset of H, then any consistent learning algorithm L for (C, H) is pac, with sample complexity mL ( K d ln 1 ln 1 ; for 0 ; 1. Values of K are easily obtained; see Blumer et al. (1989), Anthony, Biggs and Shawe-Taylor (1990) and Anthony and Biggs (1992). This result provides an extension of the bound for finite spaces, mentioned at the beginning of the section, to spaces with finite VC dimension. Example The real perceptron $P_n$ has VC dimension $n + 1$. Suppose that for any training sample for a hypothesis of $P_n$, we can find a state of the ....

....1) d(i) 1 , by Sauer s Lemma and since the VC dimension of H i is d(i) 1. It follows that H (m) 1 (m) 2 (m) z (m) z Y i=1 em d(i) 1 d(i) 1 : From this one can obtain the desired result. We omit the details here; these may be found in Baum and Haussler (1989) or Anthony and Biggs (1992). ut Theorem 12 The VC dimension of a feedforward linear threshold network with z computation nodes and a total of W variable weights and thresholds is at most 2W log (ez) Proof Let H be the hypothesis space of the network. By the above result, we have, for m W , H (m) zem=W ) W ; where W ....

Anthony and Biggs (1992): M. Anthony and N. Biggs, Computational Learning Theory: an Introduction, Cambridge University Press.


Interpolation and Learning in Artificial Neural Networks - Anthony (1996)   Self-citation (Anthony)   (Correct)

....by the network. A General Formulation 3 We shall consider neural networks N having one real output node, whose output lies in the interval $[0, 1]$. In formulating the problems considered in this paper, we take as motivation the probably approximately correct (or pac) model of learning [10, 4]. In this framework, the training sample is generated by choosing each $x_i$ independently at random from some fixed, but not necessarily known, probability distribution on the set X of all possible inputs. One is then able to formulate successful learning in a probabilistic manner, as in the ....

M. Anthony, and N. Biggs (1992). Computational Learning Theory: An Introduction, Cambridge University Press.


Function Learning from Interpolation - Anthony, Bartlett (1994)   (1 citation)  Self-citation (Anthony)   (Correct)

....2P 2m (R) To prove this claim, we first note that if h 2 B, then, for m 8=ffl, by Chebyshev s inequality (for example) the probability that y 2 X m satisfies jfi : jh(y i ) Gamma t(y i )j j flgj fflm=2 is at most 1=2. The characteristic function R of R may be expressed as follows [4]: R (xy) Q (x)OE x (y) where Q is the characteristic function of Q and the f0; 1g valued function OE x is such that OE x (y) 1 if and only if there is h 2 B such that jh(x i ) Gamma t(x i )j j for 1 i m and such that jfi : jh(y i ) Gamma t(y i )j j flgj fflm=2. Then P 2m (R) ....

Anthony, M. and Biggs, N. (1992). Computational Learning Theory: An Introduction, Cambridge University Press.


Valid Generalisation from Approximate Interpolation - Anthony, Bartlett, al. (1996)   (3 citations)  Self-citation (Anthony)   (Correct)

.... and Definitions Much work has recently been carried out on probabilistic models of machine learning such as the probably approximately correct (or pac) model due to Valiant [26]. In particular, the pac learning of $\{0, 1\}$-valued functions (equivalently, sets) has been studied in great depth; see [12, 5, 18], for example. More recently, attention has been focussed on the extension of the pac model to classes of real-valued functions; see, for example, [14, 1, 9]. The problem studied in this paper is a problem in probability theory which is motivated by, and has applications to, the learnability of ....

.... h 2 H; jh(x i ) Gamma t(x i )j j; 1 i m) P(fx : jh(x) Gamma t(x)j jg) ffl: Note that the sample length m 0 must be independent of t and P, depending only on j, ffl and ffi; thus the requirement is similar to that of the standard probably approximately correct (pac) learning model [12, 26, 5]. Another noticeable feature of this definition is the requirement that, with high probability, any j approximate interpolant of t on the sample is required to be a good approximation to t. Thus, the notion of valid generalisation from approximate interpolation is a generalisation of what has ....

[Article contains additional citation context not shown here]

Anthony, M. and Biggs, N. (1992). Computational Learning Theory: An Introduction, Cambridge University Press.


CiteSeer.IST - Copyright Penn State and NEC