A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Inf. and Comput, 82:246--261, 1989.
....functions. We say that a decision list defined on {0, 1}^n is a 1-decision list if the Boolean function in each test is given by a single literal. So, for each i, there is some l_i such that either f_i(y) = 1 if and only if y_{l_i} = 1, or f_i(y) = 1 if and only if y_{l_i} = 0. Then, it is known [13] (see also [6, 2]) that any 1-decision list is a threshold function. In an easy analogue of this, any threshold decision list is a threshold function of threshold functions [3]. But a threshold function of threshold functions is nothing more than a two-layer threshold network, one of the simplest ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82, 1989: 247-- 261.
....two versions. The VC dimension based version of Occam s razor theorem (Theorem 3.1. 1 of [3] gives the following upper bound on sample complexity: For a hypothesis space H with V Cdim(H) d, 1 d #, d log log ) 1) The following lower bound was proved by Ehrenfeucht et al. [6]. m(H, #, #) max( 32# ) 2) The upper bound in (1) and the lower bound in (2) di#er by a factor #(log ) It was shown in [8] that this factor is, in a sense, unavoidable. When H is finite, one can directly obtain the following bound on sample complexity for a consistent ....
A. Ehrenfeucht, D. Haussler, M. Kearns, L. Valiant. A general lower bound on the number of examples needed for learning. Inform. Computation, 82(1989), 247-261.
....[w, #] The vector w is known as the weight vector, and # is known as the threshold. We denote the class of threshold functions on by T n . Note that any t T n will satisfy t [w, #] for ranges of w and #. We have the following connection between 1 decision lists and threshold functions [7] (see also [3] Theorem 5.1 Any 1 decision list is a threshold function. Proof: We prove this by induction on the number of terms in the decision list. Since the identically one function 1 is regarded as a monomial of length 0, we may assume that decision lists output Suppose, for the base case ....
Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82, 1989: 247--261.
....which are information-theoretically necessary and sufficient to PAC learn C. For the lower bound, the following theorem is (a slight simplification of) a result due to Blumer et al. [7] (Theorem 2.1.ii.b); a proof sketch is given in Appendix A. A stronger bound was later given by Ehrenfeucht et al. [16]. Theorem 13 Let C be any concept class and d = VC-DIM(C). Then any (classical) PAC learning algorithm for C must have sample complexity d) 9 The following theorem is a quantum analogue of Theorem 13; the proof, which extends the techniques used in the proof of Theorem 10 using ideas from ....
A. Ehrenfeucht, D. Haussler, M. Kearns and L. Valiant. A general lower bound on the number of examples needed for learning, Inf. and Comput. 82 (1989), 246-261.
....concept class ; 1 e)ln(1 ) random examples are still required to pac learn , where e and are the usual pac learning parameters. Interestingly, this bound holds for any unknown probability distribution, unlike the standard proof (due to Ehrenfeucht, Haussler, Kearns, and Valiant [12]) which holds only for a particular distribution constructed by an adversary. Even if the unknown probability distribution is known to be smooth, at least (1 4e) ln(1 ) examples are required to pac learn from random examples (only). For the special case of learning half spaces of an ....
.... ( lu( In(1 3) ln( l e) ln( ln(1 ) Thus, unless rn ln(L( In( n(1 fi) in other words, unless rn = f( 1 ( ln(1 ) Pr(do, c . which proves our theorem. 2. 2 Comparison of the Lower Bound to Previous Results We already know from Ehrenfeucht et al. [12] that for 1 2, an algorithm that can only draw random examples must see at least fi examples to pac learn, where VCdim( is the Vapnik Chervonenkis dimension of the class . Since our bound is within a constant factor of their lower bound (treating VCdim( as a constant as we let and go to ....
Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247-251, September 1989.
....constructing SQ algorithms which are nearly optimal with respect to these bounds. However, the robust PAC learning algorithms obtained by simulating even optimal SQ algorithms in the presence of noise are inefficient when compared to known lower bounds for PAC learning in the presence of noise [11, 20, 30]. In fact, the PAC learning algorithms obtained by simulating optimal SQ algorithms in the absence of noise are inefficient when compared to the tight bounds known for noise free PAC learning [7, 11]. These shortcomings could be consequences of either inefficient simulations or a deficiency in the ....
.... are inefficient when compared to known lower bounds for PAC learning in the presence of noise [11, 20, 30] In fact, the PAC learning algorithms obtained by simulating optimal SQ algorithms in the absence of noise are inefficient when compared to the tight bounds known for noise free PAC learning [7, 11]. These shortcomings could be consequences of either inefficient simulations or a deficiency in the model itself. In this thesis, we show that both of these explanations are true, and we provide both new simulations and a variant of the SQ model which combat the current inefficiencies of PAC ....
Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247-251, September 1989.
....class of Boolean functions then C shatters A ⊆ {0, 1}^n if for every Boolean function g : A → {0, 1} there exists a Boolean function f ∈ C such that f|_A = g. The Vapnik-Chervonenkis dimension of C, VCdim(C), is the cardinality of the largest subset A which is shattered by C. Ehrenfeucht et al. [EHKV88] proved a sample complexity lower bound of VCdim(C) for PAC learning any class C with error ε and confidence δ. It is easy to see that the VC dimension of monotone functions is at least n. Hence we get the following easy corollary. Corollary 2 Any PAC learning algorithm for ....
Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A General Lower Bound on the Number of Examples Needed for Learning. In Proceedings of the 1988 Workshop on Computational Learning Theory, pages 139--154, 1988.
....for constructing SQ algorithms which are nearly optimal with respect to these bounds. However, the robust PAC learning algorithms obtained by simulating even optimal SQ algorithms in the presence of noise are inefficient when compared to known lower bounds for PAC learning in the presence of noise [11, 20, 30]. In fact, the PAC learning algorithms obtained by simulating optimal SQ algorithms in the absence of noise are inefficient when compared to the tight bounds known for noise free PAC learning [7, 11] These shortcomings could be consequences of either inefficient simulations or a deficiency in the ....
.... are inefficient when compared to known lower bounds for PAC learning in the presence of noise [11, 20, 30] In fact, the PAC learning algorithms obtained by simulating optimal SQ algorithms in the absence of noise are inefficient when compared to the tight bounds known for noise free PAC learning [7, 11]. These shortcomings could be consequences of either inefficient simulations or a deficiency in the model itself. In this thesis, we show that both of these explanations are true, and we provide both new simulations and a variant of the SQ model which combat the current inefficiencies of PAC ....
Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247--251, September 1989.
....a more general context) in studying the uniform convergence of relative frequencies to probabilities. The vc dimension characterizes fairly precisely the size of training sample which should be used for effective pac learning. The following result is due to Blumer et al. [8] and Ehrenfeucht et al. [10]. Theorem 6.1 If a feedforward network N has finite vc dimension d ≥ 1, then any consistent learning algorithm L for N is a pac learning algorithm. Moreover, there is a constant c 1 such that c 1 d ln 1 ln 1 is a sufficient sample length m L ( for any such algorithm. ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant, A general lower bound on the number of examples needed for learning, Information and Computation, 82 (1989), 247-261.
....is also tight in terms of the visibility size. The time and sample complexities of the learning algorithm are O( n 3 2 n = n ln(1= Notice that, since V Cdim( n 1) DL) 2 n 1, every algorithm which PAC learns (n 1) DL (let alone an RFA one) needs a sample size exponential in n ([13]). Theorem 5 (n 1) DL is properly (n 1) RFA learnable with a sample and time complexity of O n 3 2 n n ln 1 : 20 A. BIRKENDORF, E. DICHTERMAN, J. JACKSON, N. KLASNER, AND H. U. SIMON Proof: We start by showing information theoretic learnability of this class, then ....
Ehrenfeucht, A., Haussler, D., Kearns, M., and Valiant, L. G. (1989). A general lower bound on the number of examples needed for learning. Information and Computation, 82:247{
....3.4) by showing that at least order of = Delta 2 d= Delta examples are needed to PAC learn, with accuracy and tolerating a malicious noise rate j = 1 ) Gamma Delta, every class of f0; 1g valued functions of VC dimension d. Our proof combines, in an original way, techniques from [2, 4, 10] and uses some new estimates of the tails of the binomial distribution that may be of independent interest. We then prove that this lower bound cannot be improved in general. Namely, we show (Theorems 3.18 and 3.10) that there is an algorithm RMD (for Randomized Minimum Disagreement) that, for ....
....As B is optimal, it cannot be worse than a strategy which ignores sample points, thus the error probability p B (m) does not increase with m. We may therefore drop the condition m 37 j(1 Gammaj) This completes the proof. 2 The proof of our next lower bound combines the technique from [2] for showing the lower bound on the sample size in the noise free PAC learning model with the argument of statistical indistinguishability. Here the indistinguishability is used to force with probability 1=2 a mistake on a point x, for which D(x) j= 1 Gamma j) To ensure that with probability ....
Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247--261, 1989.
....Since 1= Omega Gamma1 = for all SQ algorithms [14] this simulation effectively uses Omega Gammae = 2 ) 3 examples. This is clearly suboptimal when compared to the basically tight general upper and lower bounds on the noise free sample complexity whose dependence on is Theta(1= [5, 9]. 1 Thus, while there is an incentive for developing algorithms in the statistical query model due to the noise tolerance gained, there is also a disincentive towards doing so due to the inefficiency of the simulations: Algorithm designers must choose between writing a single algorithm in the SQ ....
.... O(1= examples in the absence of noise. Furthermore, our simulation uses O(1= examples in the presence of malicious errors, while retaining the Omega Gamma ) error tolerance present in the known simulation. Note that a linear dependence on 1= is optimal for both sample complexities [5, 9], and a linear dependence on is optimal for the error tolerance in the malicious error model [15]. 2 Background Before presenting the new model, we give formal definitions of the other learning models used throughout this paper. We begin by defining the example based PAC learning model as well ....
Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247--251, September 1989.
....4 shows that ASE Progol is outperformed by the naive strategy of simply performing the cheapest trial. However Figure 5 indicates that ASE Progol 25 performs best in terms of the number of trials required. In this case, as might be expected from general results in Computational Learning Theory (eg [14]) random queries produce reasonably rapid convergence of the space of consistent hypotheses. One reasonable decision theoretic approach to the combination of time and cost would be to use maximum information gain per unit cost. 8 Conclusion Our aim is to partially automate some aspects of ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. In COLT 88: Proceedings of the Conference on Learning, pages 110-120, Los Altos, CA, 1988. Morgan-Kaufmann.
....time of a DNF learning algorithm. Viewing DNF formulae as polynomial threshold functions immediately yields a new interpretation of the DNF learning algorithms of Bshouty [11] and Tarui and Tsukiji [36] Since any r decision list is equivalent to a polynomial threshold function of degree r [16], in the language of polynomial threshold functions Bshouty s structural result implies that any s term DNF can be expressed as a polynomial threshold function of degree O( p n log n log s) In the case of Tarui Tsukiji, it can be shown as a corollary of their results that any s term DNF can be ....
A. Ehrenfeucht, D. Haussler, M. Kearns and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation 82:3 (1989), 247-251.
....and Chervonenkis [2] see [3] showed that for any set F of VC dimension d, p(F ; m) 1 Gamma 1=m) d Gamma1 2em . Haussler et al. 1] further showed that for any set F of VC dimension d, p(F ; m) d Gamma1 2em . 1 are indicator functions of root to leaf paths. As has become standard since [4], we first consider the case in which the target is chosen uniformly at random (we set the distribution over the domain to be uniform as well) and lower bound the probability that the Bayes optimal algorithm makes a mistake in this setting. This implies the same lower bound for any algorithm for ....
....theorem, which concerns the case in which the VC dimension is 1. We denote the logarithm to the base 2 by log , and the logarithm to the base e by ln . We extend this result to the case d 1 in Theorem 3. Theorem 2 p(1; m) 1 m 1 Gamma O (log log m) 2 log m : Proof: As in [4], we will fix D, and describe a distribution over the choice of f such that, for any algorithm A, the probability, with respect to the choice of f as well as the random examples, that A makes a mistake is lower bounded as in Theorem 2. This will imply the existence of f for which the probability ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. G. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247-- 251, 1989.
....1 Gamma l ln 1 p ln 1 ffi ln(jHj) 2) The probability that this reduction erroneously removes the target function f is bounded above by l m l (p q) n Gamma1 . An analogous logarithmic lower bound can be obtained using results derived by Ehrenfeucht and colleagues [18, 26]. A similar analysis can be found in [7] The key idea underlying this analysis is to make explicit two levels of learning: a meta level and a base level [48, 53, 71] The base level learning problem is the problem of learning functions, just like regular supervised learning. The meta level ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82:247--261, 1989.
....jg) P(fag) ffl. So with P m probability at least (1 Gamma ffl) m , a sample x 2 X m is not (P ; H; ffl; j) reliable for t. This probability is at least ffi for m 1 Gamma ffl ffl log 1 ffi : To prove the second term in the maximum, we use an argument similar to one used in [13]. Let X 0 = fy 0 ; y 1 ; y k g X be shattered by H [j;t] where Characterising with the Band Dimension 9 k = d Gamma 1. Choose a set F H [j;t] such that jF j = 2 d and F shatters X 0 . Let P be the probability distribution on X defined by P(fxg) 8 : 1 Gamma 2ffl if ....
Ehrenfeucht, A. , Haussler, D. , Kearns, M. and Valiant, L. (1989). A general lower bound on the number of examples needed for learning, Information and Computation 82: 247-261.
....P , as described above. The only other observation needed is that, in this case, P (E h ) is precisely er (h; t) ut The constants in the sample complexity bound of the above theorem can be improved; see [11, 95] We now present a lower bound result, part of which is due to Ehrenfeucht et al. [40] and part of which is due to Blumer et al. 35] This provides a lower bound on the sample complexity of any PAC (C; H) learning algorithm when C has finite VC dimension and is non trivial . A result of this strength is not needed simply to show that finite VC dimension of the concept space is ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82:247--261, 1989.
....dimension) as given in the following theorem. Theorem 2.2 [16] For sample size at least max((4/ε) lg(2/δ), (8 vcd(C)/ε) lg(13/ε)), any concept C ∈ C consistent with the sample will have error at most ε with probability at least 1 − δ. Furthermore, Ehrenfeucht et al. [30] prove that any concept class C must use Ω((1/ε) log(1/δ) + vcd(C)/ε) examples in the worst case. Note that the above results also hold in the case of prediction (i.e. the hypothesis comes from a class C' that is more expressive than C) by simply substituting vcd(C') ....
....comes from the fact that we draw O((1/ε) log(M/δ)) examples for hypothesis testing, we test M hypotheses, and each test takes O(N) time. Additionally [10], if M = Θ(vcd(C)) for concept class C, then this bound is asymptotically optimal (recall the result from Ehrenfeucht et al. [30] in Section 2.2.2). [9] Technically, all we require is that M is a known upper bound on the mistake bound. [10] It is known [61] that vcd(C) ≤ opt(C) ≤ lg |C|, where opt(C) is the optimal mistake bound for C. 2.3 The Class of One Dimensional Geometric Patterns For the concept class considered in ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. G. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82:247--261, 1989.
....in order to obtain valid generalization with high probability. We remark that such analysis is independent of the particular learning function or learning 10 algorithm being used; in this sense, Theorem 10 may appear to be stronger than is necessary in practice. Nonetheless, there are results [7, 17, 3] showing that, no matter what learning function is used, the required number of training samples for PAC learning must still be bounded below by a quantity depending on the VC dimension. 3 Radial basis function networks Radial basis function networks in their most general form (when used for ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82:247-261, 1989.
....1983; Kearns 1990; Valiant 1984; Pitt 1990) for formal machine learning, among others. The formal approach has been widely explored by the machine learning community (Pitt 1990) and a great deal is known about necessary conditions, limitations, and bounds (Blumer et al. 1989; Blumer et al. 1987; Ehrenfeucht et al. 1989; Kearns and Valiant 1994; Linial et al. 1991; Pitt and Valiant 1988; Shvaytser 1990) Amsterdam (Amsterdam 1988) discusses limitations of the framework. The problem of language identification in the limit differs from this work in finding an exact rather than probabilistic predicate, and in ....
Ehrenfeucht, A., Haussler, D., Kearns, M., and Valiant, L. (1989). A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247--261.
....number of computable functions can be determined. It turns out that, as far as PAC learning is concerned, it is not the size of the set of computable functions which is crucial, but the VC dimension of the network. More precisely, we have the following key result, due to Blumer et al. (1989) and Ehrenfeucht et al. (1989). Theorem 2 If a neural network N has finite VC dimension d ≥ 1, then any consistent learning algorithm L for N is a PAC learning algorithm. Moreover, there is a constant K such that a sufficient sample length m_0(δ, ε) for any such algorithm is K ffl Gamma1 Gamma d ln Gamma ffl ....
Ehrenfeucht, A. , Haussler, D. , Kearns, M. and Valiant, L. , 1989, A general lower bound on the number of examples needed for learning, Information and Computation, 82(3):247--261.
....which concerns the case in which the VC dimension is 1. We denote the logarithm to the base 2 by log , and the logarithm to the base e by ln . We extend this result to the case d 1 in Theorem 3.2. Theorem 3.1. p(1; m) 1 m 1 Gamma O (log log m) 2 log m : Proof: As in [3], we will fix D, and describe a distribution over the choice of f such that, for any algorithm A, the probability, with respect to the choice of f as well as the random examples, that A makes a mistake is lower bounded as in Theorem 3.1. This will imply the existence of f for which the probability ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. G. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247--251, 1989.
....The answer to this question depends on the complexity of the hypothesis class. For bounded degree boolean networks, Akutsu et al. showed a lower bound of k log n in order for exact inference assuming the data are uniformly distributed. For a general hypothesis class C, Ehrenfeucht et al. [8] showed a lower bound of Ω((1/ε) ln(1/δ) + VCDim(C)/ε) in the PAC sense (ε and δ are accuracy and confidence parameters respectively. The VC dimension VCDim(C) measures the complexity of a hypothesis class C). This shall serve as a general guideline when choosing ....
....in nature, relying on the bounded degree assumption to achieve polynomial time results. In an attempt to utilize existing results on learning boolean functions [19] we will first try the class of kDNF and kCNF. It was shown that both kDNF and kCNF are efficiently learnable in polynomial time [8] under the PAC learning model. Other classes such as monotonic DNF should also be investigated. Our plan is to see how complex the hypothesis class have to be in order to give plausible results. 4.4 Plans for evaluation There are two means to evaluate the quality of the inference algorithms, The ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L.G. Valiant. A general lower bound on the number of examples needed for learning. Proc. of the 1988 workshop on computational learning theory, pages 139--154, 1988.
....the value of f(xm 1 ) The heart of our analysis is the proof of the following theorem, which concerns the case in which the VC dimension is 1. We extend this result to the case d 1 in Theorem 4.2. Theorem 4.1. p(1; m) 1 m 1 Gamma O (log log m) 2 log m : Proof: As in [7], we will fix D, and describe a distribution over the choice of f such that, for any algorithm A, the probability, with respect to the choice of f as well as the random examples, that A makes a mistake is lower bounded as in Theorem 4.1. This will imply the existence of f for which the probability ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. G. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247--251, 1989.
.... log y, then (1) is obviously true. Clearly, dN = 0 only if d = 0, so assume d n 1. Then y 2d N d d N n 2d N , so we have d 1 log y log(2dn 2 ) 1 (log(2d N ) dN log(dn 2 ) log(2dn 2 ) 3d N log 2 (2dn 2 ) ut The following result follows easily from Theorem 1 in [7], which gives a lower bound on the number of examples necessary for learning f0; 1g [d] in the probably approximately correct model (see also [5] Lemma 12 Let 0 1=8, 0 1=100, and d 1. If m max d 32 ; 1 ln 1 then there is a distribution P on [d] and a function t ....
Ehrenfeucht, A. , Haussler, D. , Kearns, M. and Valiant, L. (1989). A general lower bound on the number of examples needed for learning, Information and Computation 82: 247-261.
....number of computable functions can be determined. It turns out that, as far as PAC learning is concerned, it is not the size of the set of computable functions which is crucial, but the VC dimension of the network. More precisely, we have the following key result, due to Blumer et al. (1989) and Ehrenfeucht et al. (1989). Theorem 2 If a neural network N has finite VC dimension d ≥ 1, then any consistent learning algorithm L for N is a PAC learning algorithm. Moreover, there is a constant K such that a sufficient sample length m 0 ( for any such algorithm is K 1 d ln 1 ln 1 : On the other ....
Ehrenfeucht, A. , Haussler, D. , Kearns, M. and Valiant, L. , 1989, A general lower bound on the number of examples needed for learning, Information and Computation, 82(3):247--261.
....that x leads to a good hypothesis L_x. We refer to [24, 6, 1] for further discussion of the theory of pac learning; here we simply state the following result, in which the upper bound follows from [6] in conjunction with our results on the VC dimensions, and the lower bound follows from [11] in conjunction with the VC dimensions. Theorem 10 Let L be any algorithm which takes as input a sample x of s points from R^n, together with their classifications determined by some target function t ∈ P(n, m), and which returns a function L_x in P(n, m) which correctly classifies the points ....
A. Ehrenfeucht, D. Haussler, M. Kearns and L.G. Valiant, A general lower bound on the number of examples needed for learning, Information and Computation 82, 1989: 247-261.
....sample of length K (n 1) ln 1 ln 1 ; we are guaranteed a probably approximately correct output hypothesis, regardless of both the target hypothesis and the probability distribution on the examples. □ We now present a lower bound result, part of which is due to Ehrenfeucht et al. (1989) and the other part of which is due to Blumer et al. (1989). This provides a lower bound on the sample complexity of any (C, H) pac learning algorithm when C has finite VC dimension and is non-trivial. Theorem 9 Let C be a concept space and H a hypothesis space, such that C has VC dimension at ....
Ehrenfeucht et al. (1989): A. Ehrenfeucht, D. Haussler, M. Kearns and L. Valiant, A general lower bound on the number of examples needed for learning. Information and Computation, 82 (3): 247-261.
....the best learning system is not necessary. Although, having a good control of the generalization error is important. The performance is actually less critical than the control. It is remarkable that in classification, the requirements of VC theory coincide with a good control. It is proved (see [6]) that if d is the VC dimension of a learning system, any algorithm needs Ω(d/ε) input patterns to have a good estimate of the generalization error up to ε. This result holds true for the training error. In [8] it is shown that the leave one out error of an empirical risk ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82:247--261, 1989.
....Of course, this is just a heuristic that does not always hold. The Complexity of Theory Revision 14 Lower Bounds: To obtain a lower bound on the number of samples required to be at least 1 Gamma ffi confident of finding a theory within ffl of optimal, we can use Theorem 2 (Sample Complexity [BEHW89, EH89]) Given a class of theories T and values ffl; ffi 0, let T 2 T be any theory with empirical error of Err S ( T ) 0 based on m samples drawn independently from a stationary distribution over the query class Q. To be at least 1 Gamma ffi confident that Err(T ) is at most ffl (i.e. ....
Andrzej Ehrenfeucht and David Haussler. A general lower bound on the number of examples needed for learning. Inform. Comput., 82(3):247--251, September 1989.
....possible for variation, it is problematic. Variation may be generalized to apply to concepts defined by attributes having any scale (Rendell Seshu, 1994) Theoretical bounds on the sample size required for PAC learning have been based on DNF measures such as number of terms and longest term (Ehrenfeucht, Haussler, Kearns, Valiant, 1988). These bounds are related to our notion of concept difficulty although they measure learning difficulty of concept classes, instead of individual concepts. Nevertheless, concept variation can be analytically related to DNF. For instance, consider a monomial of k literals defined in an ....
Ehrenfeucht, A., Haussler, D., Kearns, M., & Valiant, L. (1988). A general lower bound on the number of examples needed for learning. In Proc. of the Workshop. on Computational Learning Theory, pp. 139--154.
No context found.
A. Ehrenfeucht, D. Haussler, M. Kearns, L.G. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3), 1989, pp. 247-261.
No context found.
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. In First Workshop on Computational Learning Theory, pages 139--154, Cambridge, Mass. August 1988. Morgan Kaufmann.
....2 X k of the Psidimension of its x restriction, if such a maximum exists, and infinity otherwise. Define the uniform Psi dimension of F analogously. As a first step, we mention the following result showing that the finiteness of the Natarajan dimension is necessary for learning. Theorem 12 ([6, 10]) If Psi N dim(F) 1 then F is not learnable. The next theorem follows relatively straightforwardly from the results obtained in the previous section. Theorem 13 Choose distinguishers Psi and Phi. Then the following are equivalent: 1. Psi dim(F) 1. 2. Phi dim(F) 1. 3. Psi dim U ....
A. Ehrenfeucht, D. Haussler, M. Kearns, and L.G. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247--251, 1989.
No context found.
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Inf. and Comput, 82:246--261, 1989.
No context found.
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Inf. and Comput, 82:246--261, 1989.
No context found.
Ehrenfeucht, A., Haussler, D., Kearns, M., & Valiant, L. (1988). A general lower bound on the number of examples needed for learning. Proceedings of the 1988 Workshop on Computational Learning Theory (pp. 110-120). San Mateo, CA: Morgan Kaufmann.
No context found.
A Ehrenfeucht, D Haussler, M Kearns, and L Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82:247--261, 1989.
No context found.
A. Ehrenfeucht, D. Haussler, M. Kearns and L. Valiant, A General Lower Bound on the Number of Examples Needed for Learning, Information and Computation 82(3) (1989) 247-261
No context found.
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247--251, September 1989.
No context found.
Andrzej Ehrenfeucht, David Haussler, Michael J. Kearns, and Leslie G. Valiant. A General Lower Bound on the Number of Examples Needed for Learning. Information and Computation 82(3), 247--261, 1989.
No context found.
Ehrenfeucht, A., Haussler, D., Kearns, M. & Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82:3, 247-261, 1989.
No context found.
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. G. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247--251, 1989.
No context found.
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Inf. and Comput, 82:246--261, 1989.
No context found.
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82, 1989: 247-- 261.
No context found.
Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82: 247--261, 1989.
No context found.
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82, 1989: 247-- 261.
No context found.
Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82: 247--261, 1989.
No context found.
Ehrenfeucht, A., Haussler, D., Kearns, M. & Valiant, L. (1989). A general lower bound on the number of examples needed for learning. Information and Computation, 82:3, 247-261.