M.J. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Symposium on the Foundations of Computer Science, pages 382-391, Los Alamitos, CA, 1990. IEEE Computer Society Press.
....of VC dimension through a generalized Sauer's lemma. Our capacity measure, the M fat shattering dimension, can be seen either as an extension of the fat shattering dimension to the multivariate case, or as a scale-sensitive version of the graph dimension. Definition 8 (Fat shattering dimension [8]) Let H be a set of real-valued functions on a set X. For $\gamma > 0$, a subset $s_m = \{x_i\}_{1 \le i \le m}$ of X is said to be $\gamma$-shattered by H if there is a vector $\vec{b} = [b_i] \in \mathbb{R}^m$ such that, for each binary vector $\vec{y} = [y_i] \in \{-1, 1\}^m$, there is a function $h_{\vec{y}} \in H$ satisfying $y_i\,(h_{\vec{y}}(x_i) - b_i)$ ....
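The definition asks for the existence of some witness vector b, which is hard to search for in general. A minimal sketch, assuming a finite class of functions on real inputs and a witness b fixed in advance (the function names and the affine example class are hypothetical, not taken from the cited papers), checks the inner condition for that one b: every sign pattern y must be realised by some h with y_i (h(x_i) - b_i) >= gamma.

```python
from itertools import product
from typing import Callable, Sequence


def is_gamma_shattered(
    functions: Sequence[Callable[[float], float]],
    points: Sequence[float],
    witness: Sequence[float],
    gamma: float,
) -> bool:
    """Check whether `points` are gamma-shattered by a finite class for a FIXED witness b.

    Every sign vector y in {-1, 1}^m must be realised by some h with
    y_i * (h(x_i) - b_i) >= gamma for all i.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if len(points) != len(witness):
        raise ValueError("need one witness value b_i per point x_i")
    # Evaluate each function on the sample once.
    table = [[h(x) for x in points] for h in functions]
    for y in product((-1, 1), repeat=len(points)):
        realised = any(
            all(yi * (v - bi) >= gamma for yi, v, bi in zip(y, row, witness))
            for row in table
        )
        if not realised:
            return False  # this sign pattern has no function with margin gamma
    return True


# Hypothetical example: affine functions h(x) = a*x + c on a small grid of (a, c).
lines = [lambda x, a=a, c=c: a * x + c for a in (-2, -1, 0, 1, 2) for c in (-1, 0, 1)]
print(is_gamma_shattered(lines, [0.0, 1.0], [0.0, 0.0], 0.5))  # True
print(is_gamma_shattered(lines, [0.0, 1.0], [0.0, 0.0], 1.5))  # False
```

Fixing b keeps the check exact and cheap (2^m patterns times |H| functions); a full fat shattering test would additionally have to search over all b.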
M.J. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.
....a normal distribution. It shows the tradeoff between the VC dimension and the random projection terms in the bound. [...] the margin of the data is used to derive bounds on the generalization error. Their bound depends on the fat shattering function, $\mathrm{fat}$, a generalization of the VC dimension, introduced in [KS94]. For a sample of size m the bound is given by: $\frac{2}{m}\left(f \log_2(32m)\log_2\frac{8em}{f} + \log_2\frac{8m}{\delta}\right)$ (8), where $f = \mathrm{fat}(\gamma/8)$ and $\gamma$ is the minimum margin of data points in the sample. For linear functions this is bounded by $(BR/\gamma)^2$, where B is the norm of the classifier and R is the maximal norm of ....
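A minimal sketch of the quantities in the linear-function estimate, assuming a classifier x -> <w, x> with labels in {-1, +1}, B = ||w||, R the largest input norm and gamma the minimum functional margin min_i y_i <w, x_i>. Under these assumptions the standard estimate is fat(gamma) <= (B R / gamma)^2, so evaluating it at the smaller scale gamma/8 multiplies it by 64. The function names and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def margin_quantities(
    w: Sequence[float], xs: Sequence[Sequence[float]], ys: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (B, R, gamma) for a linear classifier w on labelled data (ys in {-1, +1})."""
    B = math.sqrt(dot(w, w))                            # norm of the classifier
    R = max(math.sqrt(dot(x, x)) for x in xs)           # maximal norm of a data point
    gamma = min(y * dot(w, x) for x, y in zip(xs, ys))  # minimum (functional) margin
    return B, R, gamma


def linear_fat_upper_bound(B: float, R: float, scale: float) -> float:
    """(B * R / scale) ** 2, an upper bound on the fat shattering dimension at `scale`."""
    if scale <= 0:
        raise ValueError("scale must be positive (data must be separated with a positive margin)")
    return (B * R / scale) ** 2


B, R, gamma = margin_quantities([1.0, 0.0], [[2.0, 0.0], [-1.0, 0.0], [1.5, 1.0]], [1, -1, 1])
print(B, R, gamma)                              # 1.0 2.0 1.0
print(linear_fat_upper_bound(B, R, gamma))      # 4.0
print(linear_fat_upper_bound(B, R, gamma / 8))  # 256.0
```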
M. Kearns and R. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48:464-497, 1994.
....discrimination. It is, for instance, involved in the analysis of the convergence of the perceptron algorithm (Novikoff's theorem; see [40] for a proof and a discussion). A first quantity connecting the notion of margin and the rate of convergence of the empirical risk is the fat shattering dimension [23, 24]. In order to introduce this capacity measure, we must first give some preliminary definitions. Definition 1 (Growth function) Let G be a set of indicator functions (functions taking two values) on X. Let $s_m = \{x_1, \ldots, x_m\} \in X^m$ and let $\Pi_G(s_m)$ be the number of different dichotomies on $s_m$ ....
M.J. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.
....discrimination. It is, for instance, involved in the analysis of the convergence of the perceptron algorithm (Novikoff's theorem; see [40] for a proof and a discussion). A first quantity connecting the notion of margin and the rate of convergence of the empirical risk is the fat shattering dimension [23, 24]. In order to introduce this capacity measure, we must first give some preliminary definitions. Definition 1 (Growth function) Let G be a set of indicator functions (functions taking two values) on X. Let $s_m = \{x_1, \ldots, x_m\} \in X^m$ and let $\Pi_G(s_m)$ be the number of different dichotomies on $s_m$ ....
....$d_{VC}$ of vectors that can be shattered, i.e. separated into two classes in all $2^{d_{VC}}$ possible ways, using functions of H. Thus: $d_{VC} = \max\{m : \Pi_G(m) = 2^m\}$. If this maximum does not exist, the VC dimension is equal to infinity. The fat shattering dimension is defined as follows (cf. [23, 35, 3, 6, 36]). Definition 3 (Fat shattering dimension) Let G be a set of real-valued functions, and $\gamma$ a positive real. A set of points $s = \{x_i\}_{1 \le i \le m}$ is said to be $\gamma$-shattered by G if there is a vector $b = [b_i] \in \mathbb{R}^m$ such that, for all binary vectors $y = [y_i] \in \{-1, 1\}^m$, there is a function ....
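For a finite class of indicator functions on a finite domain, the growth function and the VC dimension can be computed by brute force. A minimal sketch under that assumption (all names and the threshold example are hypothetical), using the fact that every subset of a shattered set is itself shattered to stop early:

```python
from itertools import combinations
from typing import Callable, Sequence, Set, Tuple

Indicator = Callable[[float], int]


def dichotomies(functions: Sequence[Indicator], sample: Sequence[float]) -> Set[Tuple[int, ...]]:
    """Distinct labellings of `sample` produced by the class; len() of this is Pi_G(s_m)."""
    return {tuple(g(x) for x in sample) for g in functions}


def growth(functions: Sequence[Indicator], domain: Sequence[float], m: int) -> int:
    """Pi_G(m): the largest number of dichotomies over all m-point subsets of a finite domain."""
    return max(len(dichotomies(functions, s)) for s in combinations(domain, m))


def vc_dimension(functions: Sequence[Indicator], domain: Sequence[float]) -> int:
    """d_VC = max{m : Pi_G(m) = 2^m}, restricted to a finite domain so the maximum exists."""
    d = 0
    for m in range(1, len(domain) + 1):
        if growth(functions, domain, m) < 2 ** m:
            break  # subsets of shattered sets are shattered, so no larger m can work
        d = m
    return d


# Hypothetical example: thresholds g_t(x) = 1 if x >= t else 0 on four points.
domain = [0.0, 1.0, 2.0, 3.0]
thresholds = [lambda x, t=t: int(x >= t) for t in (-0.5, 0.5, 1.5, 2.5, 3.5)]
print(growth(thresholds, domain, 2))     # 3: (0,0), (0,1), (1,1) but never (1,0)
print(vc_dimension(thresholds, domain))  # 1
```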
M.J. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science, volume 1, pages 382-391. IEEE Computer Society Press, 1990.
....$\varepsilon_{\mathrm{fat}}(m, h, z, \delta) = \varepsilon_{\mathrm{fat}}(m, \gamma_z(h), \delta) = \frac{2}{m}\left(d_H\!\left(\frac{\gamma_z}{8}\right)\log_2\!\left(\frac{8em}{d_H(\gamma_z/8)}\right)\log_2(32m) + \log_2\frac{2m}{\delta}\right)$ (3), where $d_H(\gamma)$ is known as the fat shattering dimension of the hypothesis space H at the observed scale $\gamma_z$ (see Shawe-Taylor et al., 1998; Kearns & Schapire, 1993 for details). The function $d_H : \mathbb{R}^+ \to \mathbb{N}$ is always monotonically non-increasing and is a straightforward generalisation of the VC dimension to sets of real-valued functions. An immediate consequence of this result is that the bound on the generalisation error $R[h]$ depends inversely on the ....
Kearns, M. J., & Schapire, R. (1994). Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3), 464-497.
....risk minimisation [12] was suggested for directly minimising the VC dimension based on a training set and an a priori structuring of the hypothesis space. In practice, e.g. in the case of linear classifiers, a thresholded real-valued function is often used for classification. In 1993, Kearns [4] demonstrated that considerably tighter bounds can be obtained by considering a scale-sensitive complexity measure known as the fat shattering dimension. Further results [1] provided bounds on the growth function similar to those proved by Vapnik and others [14, 6]. The popularity of the theory ....
M. J. Kearns and R. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.
....from the expected one. How fast does the expected error decrease as the number of examples increases? In the first part of the thesis, chapter 2, this framework is formally presented. Moreover, a technical extension of the standard SLT of Vapnik, based on work in the PAC learning community [Kearns and Schapire, 1994; Alon et al., 1993; Valiant, 1984], is presented. This extended SLT will be used in chapter 3 to theoretically justify and analyze a class of learning machines, namely kernel learning machines. Learning using Kernel Machines A number of machines can be developed in the aforementioned ....
....convergence and the $V_\gamma$ dimension. As mentioned above, finiteness of the VC dimension is not a necessary condition for uniform convergence in the case of real-valued functions. To get a necessary condition we need a slight extension of the VC dimension that has been developed (among others) in [Kearns and Schapire, 1994; Alon et al., 1993], known as the $V_\gamma$ dimension$^4$. Here we summarize the main results of that theory that we will also use later on to design regression machines for which we will have distribution-independent uniform convergence. Definition 2.1.7 Let $A \le V(y, f(x)) \le B$, $f \in F$, ....
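The snippet breaks off before the definition itself. In the standard formulation, which is assumed here, the $V_\gamma$ dimension differs from the fat shattering dimension in that a single level s is shared by all points, instead of a separate witness $b_i$ per point. A minimal sketch for a finite class and a fixed level s; the loss V, the function class and the data below are hypothetical:

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def is_level_shattered(
    loss: Callable[[float, float], float],
    functions: Sequence[Callable[[float], float]],
    sample: Sequence[Tuple[float, float]],
    level: float,
    gamma: float,
) -> bool:
    """Check V_gamma-style shattering of (x_i, y_i) pairs at ONE common level s.

    Every labelling c in {0, 1}^m must be realised by some f with
    V(y_i, f(x_i)) >= s + gamma where c_i = 1 and V(y_i, f(x_i)) <= s - gamma where c_i = 0.
    """
    table = [[loss(y, f(x)) for x, y in sample] for f in functions]
    for c in product((0, 1), repeat=len(sample)):
        if not any(
            all(v >= level + gamma if ci else v <= level - gamma for ci, v in zip(c, row))
            for row in table
        ):
            return False
    return True


def square_loss(y: float, t: float) -> float:
    return (y - t) ** 2


affine = [lambda x, a=a, b=b: a * x + b for a in (-2, 0, 2) for b in (0, 2)]
data = [(0.0, 0.0), (1.0, 0.0)]
print(is_level_shattered(square_loss, affine, data, level=1.0, gamma=0.5))  # True
print(is_level_shattered(square_loss, affine, data, level=1.0, gamma=3.5))  # False
```

In the second call the lower side s - gamma = -2.5 is negative, so no square loss can fall below it and the all-zero labelling is never realised.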
M. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.
....Section 6.2. Section 7 looks at alternate aggregation functions. Finally, Section 8 summarizes other results and future directions of this work. 2. A Real-Valued Multiple-Instance On-line Model We apply the on-line agnostic learning model (Haussler, 1992; Kearns et al., 1994) to the multiple-instance setting. Let $S = \langle (x_1, y_1), \ldots, (x_t, y_t) \rangle$ be the sequence of trials. Each multiple-instance example $x_i$ is a set of $m_i$ elements from domain X. While our model is well defined for any domain X, throughout the rest of this paper we will assume that ....
....the minimum number of mistakes (or has the minimum loss). In the agnostic on-line learning model, the learning algorithm's performance is compared with the performance of the best hypothesis from the touchstone class. On the surface, our model has many similarities with the p-concepts model (Kearns and Schapire, 1994). A p-concept c over the domain X is a mapping $c : X \to [0, 1]$. For each $x \in X$, $c(x)$ is interpreted as the probability that x is a positive example of the p-concept c. A p-concepts algorithm (to find a good model of probability) must infer a hypothesis $h : X \to [0, 1]$ that is a good ....
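A minimal sketch of the p-concept setting, assuming a finite input distribution D given as a dictionary, and assuming "good model of probability" is measured by the expected absolute difference E_D|h(x) - c(x)| (one common formalization, used here for illustration). The learner only ever sees 0/1 labels drawn with probability c(x), never c(x) itself. All names and values are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

PConcept = Callable[[float], float]  # x -> probability that x is a positive example


def draw_examples(
    c: PConcept, dist: Dict[float, float], n: int, rng: random.Random
) -> List[Tuple[float, int]]:
    """Draw n examples: x ~ D (a finite distribution {x: prob}), then label 1 with probability c(x)."""
    xs = rng.choices(list(dist.keys()), weights=list(dist.values()), k=n)
    return [(x, 1 if rng.random() < c(x) else 0) for x in xs]


def l1_distance(h: PConcept, c: PConcept, dist: Dict[float, float]) -> float:
    """E_{x ~ D} |h(x) - c(x)|: how far the hypothesis' probabilities are from the target's."""
    return sum(p * abs(h(x) - c(x)) for x, p in dist.items())


def c(x: float) -> float:  # hypothetical p-concept on X = {0, 1}
    return 0.2 + 0.6 * x


def h(x: float) -> float:  # a constant hypothesis
    return 0.5


dist = {0.0: 0.5, 1.0: 0.5}
sample = draw_examples(c, dist, 10, random.Random(0))
print(sample)
print(l1_distance(h, c, dist))  # approximately 0.3
```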
[Article contains additional citation context not shown here]
Kearns, M. J. and R. E. Schapire: 1994, Efficient distribution-free learning of probabilistic concepts, Vol. I: Constraints and Prospects, Chapt. 10, pp. 289-329. MIT Press. Earlier version appeared in FOCS '90.
....overfitting). The bounds involve the number of examples $\ell$ and the capacity h of the function space, a quantity measuring the complexity of the space. Appropriate capacity quantities are defined in the theory, the most popular one being the VC dimension [16] or scale-sensitive versions of it [9, 1]. The bounds have the following general form: with probability at least $\eta$, $I[f] \le I_{\mathrm{emp}}[f] + \Phi\left(\sqrt{\frac{h}{\ell}}, \eta\right)$ (2), where h is the capacity, and $\Phi$ an increasing function of $h/\ell$ and $\eta$. For more information and for exact forms of the function $\Phi$ we refer the reader to [16, 15, 1] ....
M. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.
.... ERM in a hypothesis space (4), consistency is shown to be related to uniform convergence in probability [11], and necessary and sufficient conditions for uniform convergence are given in terms of the $V_\gamma$ dimension (also known as level fat shattering dimension) of the hypothesis space considered [1, 8], which is a measure of complexity of the space. In statistical learning theory the measure of complexity typically used is the VC dimension. However, as we show below, the VC dimension in the above learning setting in the case of infinite-dimensional RKHS is infinite both for $L_p$ and $L_\infty$, so it ....
M. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.
....ICA and BSS, both the matrix A and the sources x are unknown, and we assume that the $x_i(t)$ are statistically independent, while we don't have any explicit restriction on A. Various methods for ICA have been developed in recent years [3, 9, 61, 51, 63]. A review of the methods can be found in [50]. Typically the problem is solved by assuming a probability distribution model for the sources $x_i(t)$. A typical prior distribution is the Laplacian, namely $P(x(t)) \propto e^{-|x_1(t)| - \cdots - |x_n(t)|}$. Moreover, if the noise $\eta$ is Gaussian with zero mean and variance $\sigma^2$, then, for a given A, the ....
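The snippet breaks off just before the consequence of these assumptions. Assuming the usual noisy linear mixing model y(t) = A x(t) + eta(t) (an assumption; the snippet does not state it) and the unit-scale Laplacian prior above, the negative log posterior of x(t) for a given A is, up to an additive constant, ||y - A x||^2 / (2 sigma^2) + sum_i |x_i|, i.e. an L1-regularised least-squares problem. A minimal sketch that evaluates this objective and minimises it with ISTA (proximal gradient descent), a standard solver swapped in here for illustration, not necessarily the method of the citing work; all names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(A: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matTvec(A: Matrix, r: Sequence[float]) -> List[float]:
    """A^T r without building the transpose."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def neg_log_posterior(A: Matrix, y: Sequence[float], x: Sequence[float], sigma: float) -> float:
    """||y - A x||^2 / (2 sigma^2) + sum_i |x_i|, dropping additive constants."""
    residual = [yi - ai for yi, ai in zip(y, matvec(A, x))]
    return sum(r * r for r in residual) / (2 * sigma ** 2) + sum(abs(xi) for xi in x)


def soft_threshold(v: Sequence[float], t: float) -> List[float]:
    """Proximal operator of t * ||.||_1."""
    return [(1.0 if vi > 0 else -1.0) * max(abs(vi) - t, 0.0) for vi in v]


def map_sources(A: Matrix, y: Sequence[float], sigma: float, iters: int = 500) -> List[float]:
    """ISTA for the MAP estimate of x given A under the assumptions above."""
    # sigma^2 / ||A||_F^2 is a safe step: ||A||_F^2 / sigma^2 bounds the gradient's Lipschitz constant.
    step = sigma ** 2 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ai - yi for ai, yi in zip(matvec(A, x), y)]  # A x - y
        grad = [g / sigma ** 2 for g in matTvec(A, residual)]    # gradient of the data term
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)], step)
    return x


A = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
x_hat = map_sources(A, y, sigma=1.0)
print(x_hat)                                     # close to [2.0, 0.0]
print(neg_log_posterior(A, y, [2.0, 0.0], 1.0))  # 2.625
```

With A the identity the exact minimiser is the soft-thresholded observation, so the small component 0.5 is driven to zero: the sparsity effect a Laplacian prior is typically chosen for.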
M. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.
....convergence and the $V_\gamma$ dimension. As mentioned above, finiteness of the VC dimension is not a necessary condition for uniform convergence in the case of real-valued functions. To get a necessary condition we need a slight extension of the VC dimension that has been developed (among others) in [50, 2], known as the $V_\gamma$ dimension$^{11}$. Here we summarize the main results of that theory that we will also use later on to design regression machines for which we will have distribution-independent uniform convergence. Definition 2.8. Let $A \le V(y, f(x)) \le B$, $f \in F$, with A and $B < \infty$. The ....
M. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.
.... can be converted to one with random attribute noise by inserting random values for the missing components (although note that for k-RFA data, with small k, the associated noise rate would be quite high). Finally, observe that there is a similarity to the probabilistic concepts framework of [29] in that, given a stochastic missing-data mechanism, we have observations of a mapping from an input domain consisting of partially observed vectors to outputs whose values are conditional distributions over $\{0, 1\}$ conditioned on the observed inputs. The difference is that we do not just want to ....
....we do not just want to model the conditional distribution of outputs given any input; we also want an underlying deterministic function to be well approximated by our (deterministic) hypothesis. In this paper we make use of the quadratic loss function of an observation and hypothesis, as defined in [29]. 1.2 Formalization of the Learning Problem We are interested in algorithms for probably approximately correct (PAC) learning as introduced by Valiant in [38, 39]. Here we give the basic definitions and introduce some notation. An algorithm has access to a source of observations of a target ....
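Assuming the quadratic loss referred to here is the usual one for p-concepts, Q(h; x, y) = (h(x) - y)^2 with a label y in {0, 1}, its expectation over the random label splits into the squared distance from h(x) to c(x) plus a noise term c(x)(1 - c(x)) that no hypothesis can remove. A small sketch checking this identity numerically; the function names and values are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def quadratic_loss(h_x: float, y: int) -> float:
    """Loss of predicting probability h_x when the observed label is y in {0, 1}."""
    return (h_x - y) ** 2


def expected_quadratic_loss(h_x: float, c_x: float) -> float:
    """E_y[(h_x - y)^2] when y = 1 with probability c_x."""
    return c_x * quadratic_loss(h_x, 1) + (1 - c_x) * quadratic_loss(h_x, 0)


def decomposition(h_x: float, c_x: float) -> float:
    """(h_x - c_x)^2 + c_x (1 - c_x): distance to the target plus irreducible label noise."""
    return (h_x - c_x) ** 2 + c_x * (1 - c_x)


def empirical_quadratic_loss(
    h: Callable[[float], float], sample: Sequence[Tuple[float, int]]
) -> float:
    """Average quadratic loss of hypothesis h over observed (x, y) pairs."""
    return sum(quadratic_loss(h(x), y) for x, y in sample) / len(sample)


print(expected_quadratic_loss(0.5, 0.2), decomposition(0.5, 0.2))  # both approximately 0.25
print(expected_quadratic_loss(0.2, 0.2))  # approximately 0.16, the minimum over h_x when c_x = 0.2
```

Because the noise term does not depend on h, minimising the expected quadratic loss is the same as making h(x) close to c(x), which is why this loss suits learning a model of probability.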
[Article contains additional citation context not shown here]
M.J. Kearns and R.E. Schapire (1994). Efficient Distribution-free Learning of Probabilistic Concepts, Journal of Computer and System Sciences, 48(3), 464-497. (see also FOCS '90)
....some alternative techniques for controlling overfitting which, albeit conceptually similar to pruning, act on the complexity of the node classifiers rather than on the complexity of the overall tree. We begin with the definition of the fat shattering dimension, which was first introduced in [15] and has since been used for several problems in learning [1, 4, 2, 3]. Definition 3.1 Let F be a set of real-valued functions. We say that a set of points X is $\gamma$-shattered by F relative to $r = (r_x)_{x \in X}$ if there are real numbers $r_x$ indexed by $x \in X$ such that for all binary vectors b ....
Kearns, M. & Schapire, R. (1990). Efficient Distribution-free Learning of Probabilistic Concepts, pages 382-391 in Proceedings of the 31st Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Los Alamitos, CA.
No context found.
M.J. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Symposium on the Foundations of Computer Science, pages 382-391, Los Alamitos, CA, 1990. IEEE Computer Society Press.
No context found.
M.J. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.
No context found.
M.J. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science, volume 1, pages 382-391. IEEE Computer Society Press, 1990.
No context found.
M. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.
No context found.
Michael J. Kearns and Robert Schapire, Efficient distribution-free learning of probabilistic concepts, Journal of Computer and System Sciences, vol. 48, no. 3, pp. 464-497, 1994.
CiteSeer.IST - Copyright Penn State and NEC