D. Haussler, M. Kearns, H.S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, pages 76--87, 1994.
....based on the behavior of an upper bound to the loss, instead of the loss itself, which is the quantity of interest. While the upper bound becomes tight as the sample size increases, it is not at all clear that the upper bound is indicative of the true behavior of the system for finite sample sizes [7], which is really the only region of interest for model selection. Moreover, any upper bound derived is based on a specific bounding technique. It is possible that other bounding methods may lead to different model selection choices. Continuing now to the methods based on asymptotic expansions, we ....
D. Haussler, M. Kearns, H.S. Seung and N. Tishby. "Rigorous Learning Curve Bounds from Statistical Mechanics", in Proceedings of the Seventh Workshop on Computational Learning Theory, ACM Press, 1994.
....distribution. The informal characterization can be justified using approximations expected to hold for large sample size. The precise characterization is proved in a certain large sample limit similar to the thermodynamic limits studied in the statistical physics model of machine learning [7]. Section 3 also gives two different posterior distributions which both achieve optimal behavior in the large sample limit. 2 The First Main Result In this section we state the main result of this paper. Proofs are given in later sections. We assume a prior probability measure P on a concept ....
....(1) Equation (7) is derived below in two ways. First a derivation is given using informal approximations which should hold for large m. Second, the equation is proved rigorously in a certain large m thermodynamic limit similar to limits used in the statistical physics model of machine learning [7]. Before doing either of these, however, we rigorously prove the following inequality which holds without any large m assumptions. 6 Theorem 8 B(Q ) min l l s ln(1=p( l) 2m Proof: Let Q be an arbitrary distribution on concepts. We show that B(Q) can be no smaller than the ....
D. Haussler, M. Kearns, H.S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25:195--236, 1996. Also appeared in COLT-94.
....other hand, knowing that we would like a particular quality guarantee, we can ask how large a sample we need to draw to ensure that guarantee. The former question has been addressed for predictive learning in work on self-bounding learning algorithms (Freund, 1998) and shell decomposition bounds (Haussler et al., 1996; Langford & McAllester, 2000). For our purposes here, the latter question is more interesting. We assume that samples can be requested incrementally from an oracle (incremental learning). We can then dynamically adjust the required sample size based on the characteristics of the data that have ....
Haussler, D., Kearns, M., Seung, S., & Tishby, N. (1996). Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25.
....communicate successfully leave more offspring, who in turn learn their language, which puts the problem of grammar acquisition in an evolutionary context (Hashimoto & Ikegami, 1995, 1996; Nowak & Krakauer, 1999; Nowak et al., 1999, 2000). Learning theory (Vapnik, 1995; Valiant, 1984; Niyogi, 1998; Haussler et al., 1997; Osherson et al., 1986) often asks the question of how many sample sentences are needed for an individual learner to acquire the correct rule from a single teacher with a certain probability. In contrast, we study the following question: what are the conditions for the learning process which allow a ....
HAUSSLER, D., KEARNS, M., SEUNG, H.S. & TISHBY, N. (1997). Rigorous learning curve bounds from statistical mechanics. Mach. Learning 25, 195-236.
....fact that this result holds with no assumption on the probability distribution underlying the data. Consequently, VC theory of bounds is considered as a Worst Case theory. This observation is the source of most of the criticisms addressed to VC theory. It has been argued (see e.g. [4], [5], [9], [17]) that VC bounds are loose in general. Indeed, there is an infinite number of situations in which the observed learning curves representing the generalization error of some learning structure are not well described by theoretical VC bounds. 2 In [17] D. Schuurmans criticizes the ....
Haussler, D., Kearns, M., Seung, H.S., Tishby, N.: Rigorous Learning Curve Bounds from Statistical Mechanics. Machine Learning (1996) 195-236
....amount of data. One approach to finding practical algorithms is to process a fixed amount of data but determine the possible strength of the quality guarantee dynamically, based on characteristics of the data; this is the idea of self-bounding learning algorithms [8] and shell decomposition bounds [13, 19]. Another approach (which we pursue) is to demand a certain fixed quality and determine the required sample size dynamically based on characteristics of the data that have already been seen; this idea has originally been referred to as sequential analysis [4, 28, 9]. In the machine learning ....
....(in which all hypotheses are equally good). Instead of operating with smaller samples, it is also possible to work with a fixed-size sample but guarantee a higher quality of the solution if the observed situation differs from this worst case. This is the general idea of shell decomposition bounds [13, 19] and self-bounding learning algorithms [8]. Although we have discussed our algorithm only in the context of knowledge discovery tasks, it should be noted that the problem which we address is relevant in a much wider context. A learning agent that actively collects data and searches for a ....
D. Haussler, M. Kearns, S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25, 1996.
....x l 1 . The version space is defined as the set of hypotheses which are consistent with the first l given data points. In our approach we use an alternative definition of consistency, where all hypotheses in an appropriate neighborhood of the empirical minimizer define the version space (see also [4]). Averaging over this neighborhood yields a structure with risk equivalent to the expected risk obtained by random sampling from this set of hypotheses. There also exists a tight methodological relationship to [7] and [4], where learning curves for the learning of two-class classifiers are derived ....
....neighborhood of the empirical minimizer define the version space (see also [4]). Averaging over this neighborhood yields a structure with risk equivalent to the expected risk obtained by random sampling from this set of hypotheses. There also exists a tight methodological relationship to [7] and [4], where learning curves for the learning of two-class classifiers are derived using techniques from statistical mechanics. 2 The Empirical Risk Approximation Principle The data samples Z = fz r 2 ; 1 r lg which have to be analyzed by the unsupervised learning algorithm are elements of a ....
D. Haussler, M. Kearns, H.S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25:195-236, 1997.
....[2] leads to an analysis that (besides making the additional assumption that the training set error is known) deviates from the first analysis [18] only in some technical details. The histogram of error rates has been used to improve on worst-case error bounds. The idea of a worst-case analysis of [5] is that hypotheses with an error rate of much more than the desired error bound have a much smaller chance of incurring the least empirical error than hypotheses with an error rate that lies just slightly above . In contrast to the resulting shell decomposition bounds, we obtain the exact ....
....Given the uncertainty that remains when the histogram has been estimated, it is not possible to determine the exact expected generalization error (which we are concerned about in this paper), but Langford and McAllester [8] have proven worst-case shell decomposition bounds that differ from those of [5] by taking into account that the histogram is only estimated. We have shown that the error histogram for Boolean functions is a certain binomial distribution. A fundamental question is whether there is a more general link between the error histogram and measurable properties (such as the VC ....
D. Haussler, M. Kearns, S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25, 1996.
....a tighter bound on the difference between training error and true error for hypotheses with training error near 0. The new bound is proved using a distribution-dependent application of the union bound similar in spirit to the shell decomposition introduced by Haussler, Kearns, Seung and Tishby [1]. We actually give two upper bounds on generalization error: an uncomputable bound and a computable bound. The uncomputable bound is a function of the unknown distribution of true error rates of the hypotheses in the class. The computable bound is, essentially, the uncomputable bound with the ....
....can be ruled out. But as the confidence requirement becomes more stringent, suddenly (and discontinuously) the high error concepts must be considered. A similar discontinuity can occur in sample size. Phase transitions in shell decomposition bounds are discussed in more detail by Haussler et al. [1]. Phase transitions complicate asymptotic analysis. But asymptotic analysis illuminates the nature of phase transitions. As mentioned in the introduction, in the asymptotic analysis of learning theorem bounds it is important that one not hold H fixed as the sample size increases. If we hold H ....
David Haussler, Michael Kearns, H. Sebastian Seung, and Naftali Tishby, "Rigorous learning curve bounds from statistical mechanics", Machine Learning 25, 195-236, 1996.
....space. The version space is defined as the set of hypotheses which are all consistent with the first l selected data samples. In our approach we use an alternative definition of consistency, where all hypotheses in an 2fl app ball around the empirical minimizer define the version space (see also [10]). Averaging over this ball yields a hypothesis with risk equivalent to the expected risk obtained by a random sampling over this ball. From a Bayesian point of view this is similar to averaging over a posterior distribution, where a uniform distribution over the hypothesis space is used as prior. ....
....the expected risk obtained by a random sampling over this ball. From a Bayesian point of view this is similar to averaging over a posterior distribution, where a uniform distribution over the hypothesis space is used as prior. In addition there is a tight methodological relationship to the papers [10] and [11] where learning curves for the learning of two class classifiers are derived using techniques from statistical mechanics. Especially in [11] the notion of an optimal temperature with respect to the generalization error is introduced. These works present asymptotic results in the sense, ....
D. Haussler, M. Kearns, H.S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25:195--236, 1997.
....[17] Some variants of SRM have been developed in [10] in order to understand which inductive principle explains the remarkable generalization performance of SVM. Measuring VC Dimension VC theory has been criticized several times for being a Worst Case, thus not practical, theory (see [5], [6], [8]). Hopefully, the development of SVM has shown the bearing of Statistical Learning Theory in the real world. Our point is that there is probably some space for improving the theory itself by providing an operational version of its central concepts, such as VC dimension. A common argument against VC ....
D. Haussler, M. Kearns, H.S. Seung, N. Tishby, Rigorous Learning Curve Bounds from Statistical Mechanics, Machine Learning, 195-236, 1996.
....the technique for proving the PAC-style bounds (see the proof of Theorem 2.2) is inherently suboptimal asymptotically since it makes the worst-case assumption that all points in a covering of the loss function are equally likely to become an empirical estimate of the parameter (for example, see [14] for discussions of this point). For classification problem (1) the estimation equation becomes E x;y f # (w T # xy)xy #g(w # ) 0: 13) 23 Therefore let cov denote covariance, we have (w # ;x;y) f # (w T # xy)xy #g(w # ) f # (w T # xy)xy #E x;y f # (w T # xy)xy; U(w ....
D. Haussler, M. Kearns, H.S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25(2-3):195-236, 1996.
....that the slope of a learning curve is monotonically non-increasing with n except for local variance. Locality is defined within a particular progressive sampling procedure. Not all learning curves are well behaved. For example, theoretical analyses of learning curves based on statistical mechanics [7, 19] have shown that sudden increases in accuracy are possible, particularly on small samples. However, empirical studies of the application of standard induction algorithms to large data sets (those of relevance to this paper) have shown learning curves to be well behaved [3, 4, 6, 12, 13]. In ....
Haussler, D., Kearns, M., Seung, H. S., and Tishby, N. Rigorous learning curve bounds from statistical mechanics. Machine Learning 25 (1996), 195--236.
....limiting method, the so-called thermodynamic limit, developed in the statistical physics literature [9, 12]. For MDPs, this is the combined limit that the allowed exploration time, T, and the size of the state space, N, grow to infinity at a fixed rate: $T \to \infty$, $N \to \infty$, $T/N = \alpha$ (finite). Ref. [5] gives a rigorous treatment of this method from the viewpoint of computational learning theory. Though formulated originally for problems in supervised learning, it can also be used to study problems in decision and control. Of course, important differences between these two types of problems must ....
....of $2^N$ policies, and second, to discriminate (based on imperfect statistics) which policies are best. Since it is this second goal we wish to focus on, we will study an algorithm that has unlimited resources for search, but limited resources for policy evaluation. The so-called Gibbs algorithm [5] works as follows. For each policy, it selects a random initial state, then estimates the value function v by the time-averaged return from a random walk of T steps: $v = \frac{1}{T} \sum_{t=1}^{T} R_{i_t}$ (7). This is done in parallel for each of the $2^N$ policies in $\{0,1\}^N$. Having ....
D. Haussler, M. Kearns, H. S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. In Proceedings of the 7th Annual ACM Workshop on Computational Learning Theory, pages 76--87. Morgan Kaufmann, San Mateo, CA, 1994.
....Jerusalem 91904, Israel Email: {shai,ranb,shamir,tishby}@cs.huji.ac.il Abstract Generalization in most PAC learning analysis starts around O(d) examples, where d = VCdim of the class. Nevertheless, analysis of learning curves using statistical mechanics shows much earlier generalization [7]. Here we introduce a gadget called Early Predictor, which exists if somewhat better than random prediction of the label of an arbitrary instance can be obtained from labels of O(log d) random examples. We were able to show that by taking a majority vote over a committee of Early Predictors, ....
....of the learning curve starts from O(d) examples, where d = V C d . For example, Helmbold and Warmuth [8] showed that $2d - \Omega(\sqrt{d \log d})$ examples suffice for weak learning, while $d - O(\sqrt{d})$ are essential for distribution-free learning. Haussler et al. [7] showed, based on analysis taken from the statistical mechanics of learning, that for many classes of distributions generalization is possible even with very few examples. In this paper we continue this line of study and focus our attention on the very beginning of the learning curve. In ....
D. Haussler, M. Kearns, H.S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25:195--236, 1997.
....the limit of infinite temperature. The close relationship between the VC and Gardner entropies can be seen within the replica formalism. There has been recent progress towards understanding the relationship between the statistical physics and Vapnik Chervonenkis (VC) approaches to learning theory[1, 2, 3, 4]. The two approaches can be unified in a statistical mechanics based on the VC entropy. This paper treats the case of learning randomly labeled patterns, or the capacity problem, and extends some of the results of previous work[5, 6] to finite temperature. As will be explained in a companion ....
D. Haussler, M. Kearns, H. S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, pages 76--87, New York, 1994. ACM.
D. Haussler, M. Kearns, S. Seung, N. Tishby, "Rigorous Learning Curve Bounds from Statistical Mechanics", Proceedings of COLT 1994, pp. 76-87.
Haussler, Kearns, Seung, and Tishby. Rigorous learning curve bounds from statistical mechanics. In COLT, 1994.