M. J. Kearns and R. E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464--497, 1994.
....of at least . The quantity in inequality (3) measures the capacity of the space in which the empirical risk is minimized, called the hypothesis space. Appropriate capacity measures are defined in the theory, the most popular one being the VC dimension [10] or scale sensitive versions of it [11, 12]. For more details and examples of exact forms of , we refer the reader to [10], [4] and [12]. Intuitively, if the capacity of the hypothesis space is very large and the number of examples is small, then the distance between the empirical and expected risk can be large and overfitting is ....
M. Kearns and R. Schapire, "Efficient distribution-free learning of probabilistic concepts," J. Comput. Syst. Sci., vol. 48, pp. 464--497, 1994.
....data, we assume the following model: The learner receives a finite set of positive only examples of some concept drawn according to some probability distribution. From this data, the learner is required to come up with a procedure that, given any unclassified instance, returns a confidence value [SS98, KS90] in the range ### ## that the given instance is in the concept. To simplify our treatment we assume that the instance space is # dimensional and the domain of each attribute is the real interval ### ##. We also assume that the examples are drawn from a probability distribution function, # , which ....
Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proceedings of the Third Annual Workshop on Computational Learning Theory, page 389, 1990.
....[3] was suggested for directly minimising the VC dimension based on a training sample and an a priori structuring of the hypothesis space. In practice, for example in the case of linear classifiers, often a thresholded real valued function is used for classification. In 1993, Kearns and Schapire [5] demonstrated that considerably tighter bounds can be obtained by considering a scale Ralf Herbrich is with Microsoft Research Cambridge, 7 J J Thomson Avenue, Cambridge CB3 0FB, United Kingdom. Email: rherb microsoft.com. Thore Graepel works at the Institute of Computational Science, ....
M. J. Kearns and R. E. Schapire, "Efficient distribution-free learning of probabilistic concepts," Journal of Computer and System Sciences, vol. 48, no. 3, pp. 464--497, 1994.
....2d n is defined for any 0 by fat: x ( max m: r shatters a subsequence of length m of x . Note that for X = Xl, Xn) fat,x ( is a random quantity whose value depends on the data. The (worst case) fat shattering dimension fat: n( sup was used by Kearns and Schapire [11], Alon et al. 1] Shawe Taylor et al. 15] and Bartlett [6] to derive useful bounds. In particular, Anthony and Bartlett [2] show that if d = fat: n( 8) then for any 0 5 1 2, with probability at least 1 5, all f r satisfies Z(f) n(f) 2.829(dlog2(3)ln(128n) 2.829 ln. 2) Throughout ....
M. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48:464-497, 1994.
....domain Z = 1, 1 . The input space has an unknown underlying distribution denoted by D. The notation pD(x) is used for the probability of observing vector value x under the distribution D. The class of stochastic perceptrons can be embedded into the class of probabilistic concepts (p-concepts) [25]. A p-concept consists of a function c : Z → [0, 1] and a probabilistic device which generates an output of y = 1 with probability c(x) for input x. 7.2. PAC Learning Criterion. For each classification of an input space with underlying distribution D there exists a p-concept called target ....
M.J. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48:464-497, 1994.
....is a function class computed by a folding architecture. It has been shown that finiteness of the VC dimension if F is a concept class or finiteness of the fat shattering dimension if F is a function class, respectively, is even necessary for F to possess the distribution independent UCED property [1, 18]. In general, only the class of so called loss functions which is correlated to F has a finite fat shattering dimension if F possesses the UCED property. However, if the constant function 0 is contained in F the class of loss functions contains F itself, such that F has a finite fat shattering ....
M. J. Kearns and R. E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48, 1994.
....are equivalent. Using this observation, Valiant constructs an efficient PAC learning algorithm, with the aid of a membership query oracle, for learning monotone DNF under the universal interpretation. Some work that has similar motivations is the p concepts model of Kearns and Schapire [30]. In the p concepts model (when applied to the boolean domain) the learner is given a total example from f0; 1g , yet there is some probabilistic process (or possibly something that appears probabilistic due to the learner being unaware of some important attributes) that determines whether ....
M. Kearns and R. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464--497, 1994.
....||x||_{ℓ_p^m} := (Σ_{i=1}^m |x_i|^p)^{1/p} for 1 ≤ p < ∞ and ||x||_{ℓ_∞^m} := max_{i=1,...,m} |x_i| for p = ∞. When m = ∞, we simply write ℓ_p, and elements x of ℓ_p are infinite sequences (x_1, x_2, ...) with finite ||x||_p. We give the definition of the fat shattering dimension, which was introduced in [12], and has been used for several problems in learning since [1]. Definition 2.1 Let F be a set of real valued functions. We say that a set of points X is γ-shattered by F relative to r = (r_x)_{x∈X} if there are real numbers r_x indexed by x ∈ X such that for all binary vectors b indexed by X, ....
M. J. Kearns and R. E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proc. of the 31st Symposium on the Foundations of Comp. Sci., pages 382--391. IEEE Computer Society Press, Los Alamitos, CA, 1990.
....(see for example [14]). The most important scale sensitive dimension which has been used to date in the development of the theory of learning real valued functions is the fat shattering dimension. This is a scale sensitive version of the pseudo dimension and was introduced by Kearns and Schapire [21]. Suppose that H is a set of functions from X to [−1, 1] and that γ ∈ (0, 1). We say that a finite subset S = {x_1, x_2, ..., x_d} of X is γ-shattered if there is r = (r_1, r_2, ..., r_d) ∈ R^d such that for every b = (b_1, b_2, ..., b_d) ∈ {0, ....
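The excerpt above breaks off before stating the shattering condition. A minimal Python sketch of a check for shattering at scale gamma follows, assuming the usual form of the condition (for every bit pattern b there is a function h in the class with h(x_i) >= r_i + gamma when b_i = 1 and h(x_i) <= r_i - gamma when b_i = 0) and assuming a finite function class with the witness vector r supplied by the caller. The name is_gamma_shattered and the toy class H are hypothetical and are not taken from the cited papers.

```python
from itertools import product

def is_gamma_shattered(functions, points, witnesses, gamma):
    """Return True if every 0/1 pattern on `points` is realised with margin gamma.

    Assumed condition (the excerpt truncates it): for each pattern b there is
    some h with h(x_i) >= r_i + gamma if b_i == 1 and h(x_i) <= r_i - gamma
    if b_i == 0, where r_i are the given witnesses.
    """
    for pattern in product((0, 1), repeat=len(points)):
        realised = any(
            all(
                (h(x) >= r + gamma) if bit else (h(x) <= r - gamma)
                for x, r, bit in zip(points, witnesses, pattern)
            )
            for h in functions
        )
        if not realised:
            return False
    return True

# Toy class on the two points 0 and 1: all four functions with values in {-1, +1}.
H = [lambda x, t=t: t[x] for t in product((-1.0, 1.0), repeat=2)]

print(is_gamma_shattered(H, [0, 1], [0.0, 0.0], 0.5))
print(is_gamma_shattered(H, [0, 1], [0.0, 0.0], 1.5))
```

The check is exponential in the number of points and only tests one fixed witness vector, so it illustrates the definition and is not a way to compute the fat shattering dimension of a real function class.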
M. J. Kearns and R. E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Symposium on the Foundations of Computer Science, pages 382--391. IEEE Computer Society Press, Los Alamitos, CA, 1990.
....or coin rules. A coin rule is any function F : X → [0, 1] where F(x) is interpreted as the probability that the boolean hypothesis defined by the coin rule takes value 1 on x. Coin rules are formally equivalent to p-concepts, whose learnability has been investigated by Kearns and Schapire in [5]. However, here we focus on a completely different problem, i.e. the malicious PAC learning of boolean functions using p-concepts as hypotheses. If a learner uses coin rules as hypotheses, then the PAC learning criterion D(C ≠ H), where H is the learner's hypothesis, is replaced by E xD jF ....
Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464--497, 1994. An extended abstract appeared in the Proceedings of the 31st Annual Symposium on the Foundations of Computer Science.
....randomness (or noise phenomena) in nature to the effect of some invisible deterministic variables on the observable environment. As such, models concerning noisy information like that of Kearns and Li [18], as well as those dealing with probabilistic concepts like the Kearns-Schapire p-concepts [19], may be viewed as special cases of our scenario. Another related model is the model of switching concepts introduced by Blum and Chalasani [10]. In that model a probabilistic behavior of the unknown concept is generated by switching between several different (deterministic) concepts. By ....
....which is induced by the underlying distribution on the instance space, and by the (deterministic) target function. Obviously, in such a case, the best that the learner can do is learning this probabilistic behavior. A learning model for such scenarios was presented by Kearns and Schapire in [19]. In their model, a concept is not a binary function over the instance space X, but rather a real function f : X → [0, 1]. An example (x, y) is generated by drawing a random x ∈ X using the underlying distribution on X, and letting y = 1 with a probability of f(x). The model is known in the ....
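The example-generation process described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper draw_examples, the uniform distribution on [0, 1], and the particular monotone p-concept are hypothetical choices and do not come from the cited paper.

```python
import random

def draw_examples(f, sample_x, m, rng):
    """Draw m labelled examples (x, y) from the p-concept f.

    x is drawn from the underlying distribution via sample_x(rng);
    y is set to 1 with probability f(x) and to 0 otherwise.
    """
    examples = []
    for _ in range(m):
        x = sample_x(rng)
        y = 1 if rng.random() < f(x) else 0
        examples.append((x, y))
    return examples

def monotone_p_concept(x):
    """Hypothetical p-concept on [0, 1]: probability of label 1 grows with x."""
    return min(1.0, max(0.0, x))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = draw_examples(monotone_p_concept, lambda r: r.random(), 1000, rng)
print(sum(y for _, y in sample) / len(sample))  # empirical fraction of 1-labels
```

The learner only ever sees the 0/1 labels, never the value f(x) itself, which is what separates this setting from ordinary regression with real-valued targets.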
Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Annual IEEE Symposium on Foundations of Computer Science, pages 382--391, 1990.
....on the number of samples and the dimensionality of the data. 10, 12] have explored a new direction in which the margin of the data is used to derive bounds on the generalization error. Their bound depends on the fat shattering function, afat, a generalization of the VC dimension, introduced in [6]. For sample size m the bound is given by: 2 m f log 2 (32m) log 2 8em f log 2 8m ; 10) where f = afat( 8) for the minimum margin of data points in the sample. For the linear functions this is bounded by (BR= 2 , where B gives the norm of the classifier and R is the ....
M. Kearns and R. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48:464--497, 1994.
....subset of X) If c(x) 1 then the instance x is labeled 1 deterministically. The instances x are drawn according to some probability distribution D over X that remains unknown. P-concepts: Another special case of (ε, δ) learning is the model of p-concepts, introduced by Kearns and Schapire [KS90]. In the model of p-concepts, Y = {0, 1}, and A = [0, 1]. The collection P of distributions over X × Y is replaced by a p-concept class C. A p-concept is a function c : X → [0, 1] stating the conditional probability P(y = 1 | x) of the label y being 1 given the instance x which is drawn ....
....1 ; xm ) Thus, we have contradicted the assumption, and there is no such m. Note that Theorem 4.2 can do with the weaker BC criterion. Corollary 4.2 In the PAC model, Learning In the Limit is equivalent to the countability of the concept class. Corollary 4.3 In the model of p-concepts [KS90], Learning In the Limit is equivalent to the countability of the p-concept class. Theorem 4.2 relies on the assumption that R is an equivalence relation over H. We believe it is possible to omit this assumption. Conjecture 4.1 Let (X, Y, A, H, P, l) be a problem in which Y is a finite outcome ....
M.J. Kearns and R.E. Schapire. Efficient Distribution-free Learning of Probabilistic Concepts. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science, pp. 382-391, 1990.
....where ffl(m; i; ffi) 2 m i 1 log 4 p i ffi 4j(m; L(x; h) p i ffi=4) log 4m: To illustrate the use of this theorem we will quote a result from [26], but we first need the following further definitions. The main one is the fat shattering dimension, which was first introduced in [19], and has been used for several problems in learning since [1, 5, 2, 4]. Definition 2.3 Let F be a set of real valued functions. We say that a set of points X is γ-shattered by F relative to r = (r_x)_{x∈X} if there are real numbers r_x indexed by x ∈ X such that for all ....
Michael J. Kearns and Robert E. Schapire, "Efficient Distribution-free Learning of Probabilistic Concepts," pages 382--391 in Proceedings of the 31st Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Los Alamitos, CA, 1990.
....class of probabilistic concepts, and then applying a lower bound of Simon [S93] on the sample size of p concepts learning. For the relevant definitions of p concept learning and the related notions of fl shattering and the dimension dC (fl) we refer the reader to the papers of Kearns and Schapire [KS90] and of Simon [S93] Proof: Given a class C of sets over a domain (X ; B; and a parameter 0 fl 0:5, we define a class of distributions, D C = fD t : t 2 Cg. Each distribution D t is defined by setting its density function (w.r.t. to be d t (x) 0:5 fl for every x 2 t and d t (x) 0:5 ....
Kearns, M., and R.E. Schapire, "Efficient Distribution-Free Learning of Probabilistic Concepts" Proc. of 31st FOCS 90, pp. 382-392. Full version will appear in JCSS.
.... holds: with P m probability at least 1 Gamma ffi a sample x is such that P (fx : jh L (x) Gamma t(x)j jg) ffl, where h L = L(j; x(t) The criterion P (fx 2 X : jh(x) Gamma t(x)j jg) ffl is similar to the definition of a good model of probability introduced by Kearns and Schapire [16] in their work on p concepts, defined as functions from X to [0; 1] However, the problem they consider is quite different since, in learning a good model of probability of a p concept as discussed in their work, one is given examples which are labelled 0 or 1 with certain probabilities, rather ....
....a class H of real valued functions is a scale sensitive extension of the VC dimension. This means that the band dimension is not simply one number depending on H, but is, rather, a function depending on H. A number of such scale sensitive dimensions have proven to be useful in learning theory [16, 1, 9, 23, 24]. Let H be a set of real valued functions. Given any fl 2 , let us say that the finite subset T = f(x 1 ; y 1 ) x 2 ; y 2 ) x d ; y d )g of X Theta is fl band shattered by H if for every b = b 1 ; b 2 ; b d ) 2 f0; 1g d , there is a function h b 2 H with jh b (x i ....
Kearns, M.J. and Schapire, R. E. (1990). Efficient distribution-free learning of probabilistic concepts, in Proceedings of the 1990 IEEE Symposium on Foundations of Computer Science, IEEE Press.
....with unrestricted weights (and on unrestricted real inputs) has finite pseudo dimension. 14 Scale Sensitive Dimensions 14.1 Learnability of p-concepts An interesting variant of the PAC model which has received attention recently is that of p-concept learning, introduced by Kearns and Schapire [64]. Much attention has been on learning a good model of probability of a p-concept and it is this problem we shall discuss here. A p-concept (or probabilistic concept) is a function t from X to the interval [0, 1]. The value t(x) is meant to represent a probability. A motivating example given by ....
....the pseudo dimension is the limit of the γ-dimension: pdim(H) = lim_{γ→0} dim_γ(H). It is possible, as in the case of ND, for dim_γ(H) to be finite for all γ but for pdim(H) to be infinite. Alon et al. [1] proved the following result. The necessity was proved earlier by Kearns and Schapire [64]. Theorem 14.3 Let H be a class of p-concepts. Then H is learnable (in the p-concept model) if and only if dim_γ(H) is finite for all γ > 0. The algorithm used to prove sufficiency in the above theorem is, like that of Theorem 12.3, one which outputs a hypothesis which has near minimal ....
M. J. Kearns and R. E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proc. of the 31st Symposium on the Foundations of Comp. Sci., pages 382--391. IEEE Computer Society Press, Los Alamitos, CA, 1990.
....of a multidimensional monotone regression model is, for instance, the price of a house as an increasing function of the size and a decreasing function of the distance from downtown. The literature on regression models with monotone properties is not comprehensive. Kearns and Schapire considered in [3] a one dimensional monotone model, in which the output variables take values in {0, 1}. The regression function f was called a monotone probabilistic concept. The statistical regression methods used in [2] were based on Radial Basis Functions, Multilayer Perceptron and Projection Pursuit, and did not ....
M. Kearns and R. Schapire. Efficient distribution-free learning of probabilistic concepts. 31st Annual Symposium on Foundations of Computer Science, 382-391, 1990.
....community to study biclass discrimination. For instance, it has been involved in the analysis of the convergence of the perceptron algorithm (Novikoff's theorem, see [22] for a proof and a discussion). This notion is also related to the fat shattering dimension, which was introduced in [13] and is of central importance to the study of the statistical properties of real valued models. This dimension is defined as follows (cf. [20]). Definition 1 (Fat shattering Dimension) Let F be a set of real valued functions. A set of points X is said to be γ-shattered by F if there are real numbers r ....
M.J. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Symposium on the Foundations of Computer Science, 1990. IEEE Computer Society Press.
....of the set of functions it computes. The pseudo dimension is a well understood and useful measure of expressive power. If H is a vector space then its pseudo dimension is equal to its linear dimension; see [7]. The fat shattering function was introduced by Kearns and Schapire [8] and is related to the pseudo dimension. It has proved to be very useful in the learning theory of real functions [5] and in probability theory [1]. Suppose that H is a set of functions from X to R and that α > 0. We say that a finite subset S = {x_1, x_2, ..., x_d} of X is α-shattered if ....
M.J. Kearns and R.E. Schapire (1990). Efficient distribution-free learning of probabilistic concepts, in Proceedings of the 1990 IEEE Symposium on Foundations of Computer Science, IEEE Press.
.... from the Sauer-Shelah lemma [10] for the biclass case, • the growth function and derived notions of graph dimension or Natarajan dimension [8] for the multiclass case, • the covering numbers [9], which are very general purpose, e.g. they appear in the context of the fat shattering dimension [7] as well as in functional analysis approaches [17]. Since we are interested in the last case, we give below a definition of covering numbers and of a derived useful measure. Definition 1 (Covering number) Let (E, ρ) be a pseudo metric space, and B(x, r) the closed ball in E with radius r and ....
M.J. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464--497, 1994.
.... is a with probability O 0 (q; a) This allows us to model the situation where, for a particular set of observations, different repairs are appropriate at different times; this could happen, for example, if the correct repair depends on some unobserved variables as well as the observations; see [KS90]. Notice here that err(T; q) 1 Gamma O 0 (q; T(q) and that our deterministic oracle is a special case of this, where O 0 (q; a q ) 1 for a single a q 2 A and O 0 (q; a) 0 for all a 6= a q . To handle predicate calculus expressions, we may have to consider answers of the form fYes[ ....
Michael Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Symposium on Foundations of Computer Science, October 1990.
No context found.
Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. In 31st Annual Symposium on Foundations of Computer Science, pages 382--391, October 1990. To appear, Journal of Computer and System Sciences.
No context found.
M. J. Kearns and R. E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464--497, 1994.
No context found.
Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. In 31st Annual Symposium on Foundations of Computer Science, pages 382--391, October 1990. To appear, Journal of Computer and System Sciences.