58 citations found.
H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8), 1992.

This paper is cited in the following contexts:

Statistical Mechanics of Neural Networks: Enhancement by.. - Dietrich

....k. (3.22) We shall see in a while how Mercer's condition reduces to a requirement on k that is very easy to check. 3.4 Eigenvalue Expansion The general aim of this part of the thesis is to establish an analysis of the above minimisation problem (3.8) by means of statistical mechanics [64, 76, 55]. The starting point for this is, as usual, the partition function (3.23). The integral is over the free variables, i.e. the student weights; the Hamiltonian is just the objective function from (3.8) and the inequality constraints are implemented via the # functions. ....

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Phys. Rev. A, 45(8):6056--6091, 1992.


The VC Dimension for Mixtures of Binary Classifiers - Jiang

....who show that the generalization error undergoes a second order phase transition related to symmetry breaking and follows asymptotically an inverse power law, as the sample size increases. They, however, consider hard boundary gating functions and only a small number of experts (e.g. m = 2). As Seung, Sompolinsky and Tishby (1992) pointed out, the two different approaches, VC theory that employs inequalities and bounds, versus statistical physics that uses approximations, provide complementary perspectives to the study of generalization. Acknowledgments The author is grateful to Martin A. Tanner for suggesting this ....

Seung, H. S., Sompolinsky, H. and Tishby, N. 1992. Statistical mechanics of learning from examples. Physical Review A. 45, 6056-6091.


Model Selection in Clustering By Uniform Convergence Bounds - Buhmann, Held (2000)

....neighborhood of the empirical minimizer define the version space (see also [4]). Averaging over this neighborhood yields a structure with risk equivalent to the expected risk obtained by random sampling from this set of hypotheses. There exists also a tight methodological relationship to [7] and [4] where learning curves for the learning of two class classifiers are derived using techniques from statistical mechanics. 2 The Empirical Risk Approximation Principle The data samples Z = fz r 2 ; 1 r lg which have to be analyzed by the unsupervised learning algorithm are elements ....

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056-6091, April 1992.


On-Line Learning Processes in Artificial Neural Networks - Heskes, Kappen (1993)   (10 citations)

....3 Intermezzo: other approaches 3.1 The Langevin approach In this section we will point out the difference between the intrinsic noise due to the random presentation of training patterns and the artificial noise in studies on the generalization capabilities of neural networks (see e.g. [57, 64]). In the latter case, the noise is added to the deterministic equation (18), i.e. the weights evolve according to the Langevin equation $\frac{dw(t)}{dt} = -\nabla E(w(t)) + \sqrt{2T}\,\xi(t)$ (23), where $\xi(t)$ is white noise obeying $\langle \xi_i(t)\,\xi_j(t') \rangle = \delta_{ij}\,\delta(t - t')$. The Langevin ....

....ff(w) P (w; t)g T r 2 P (w; t) The equilibrium distribution is [compare with equation (9)] $P_s(w) = \frac{1}{Z} \exp\left[-\frac{E(w)}{T}\right]$, (24) with Z a normalization constant. The existence of this Gibbs distribution raises the idea to put learning in the framework of statistical mechanics [45, 57, 64]. In these studies, the Langevin equation (23) is more an excuse to arrive at the Gibbs distribution (24) than an attempt to study the dynamics of learning processes in artificial neural networks. The equilibrium distribution of the master equation for on-line learning processes is not a simple ....

H. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45:6056--6091, 1992.
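
The link between the Langevin equation and the Gibbs distribution described in the context above can be checked numerically. The following is a minimal sketch, not taken from any of the cited papers: it assumes a one-dimensional quadratic energy E(w) = w^2 / 2 chosen purely for illustration, uses a simple Euler-Maruyama discretisation of dw/dt = -grad E(w) + sqrt(2T) xi(t), and the names `langevin_samples` and `variance` are hypothetical helpers. For this energy the Gibbs density exp(-E(w)/T)/Z is Gaussian with variance T, so the empirical variance of the trajectory should come out close to T, up to a small bias from the finite step size.

```python
import math
import random


def langevin_samples(grad_energy, temperature, dt=0.05, n_steps=400_000,
                     burn_in=1_000, w0=0.0, seed=0):
    """Euler-Maruyama steps for dw/dt = -grad E(w) + sqrt(2T) * xi(t)."""
    rng = random.Random(seed)
    # Discretised white noise has standard deviation sqrt(2 * T * dt) per step.
    noise_scale = math.sqrt(2.0 * temperature * dt)
    w = w0
    samples = []
    for step in range(n_steps + burn_in):
        w += -grad_energy(w) * dt + noise_scale * rng.gauss(0.0, 1.0)
        if step >= burn_in:  # discard the initial transient
            samples.append(w)
    return samples


def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


# Illustrative assumption: E(w) = w^2 / 2, hence grad E(w) = w.
T = 0.5
samples = langevin_samples(lambda w: w, temperature=T)
empirical_variance = variance(samples)
print(f"empirical variance {empirical_variance:.3f} vs Gibbs variance {T}")
```

A smaller `dt` reduces the discretisation bias at the cost of a longer run for the same amount of simulated time.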


On PAC Learning Using Winnow, Perceptron, and a Perceptron-Like.. - Servedio

....= 0 (i.e. origin centered halfspaces) under the uniform distribution on the unit sphere in $R^n$: 6.1 PREVIOUS WORK The problem of learning an unknown origin centered halfspace in $R^n$ given access to examples drawn uniformly from the unit sphere has been the subject of considerable research [7, 8, 16, 22, 28, 34, 39]. Long [28] proved that any algorithm which learns an origin centered halfspace to accuracy $\epsilon$ under the uniform distribution must use at least $\Omega(n/\epsilon)$ examples. Long also showed [29] that by applying Vaidya's linear programming algorithm [40] it is possible to learn to accuracy ....

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056-6091, 1992.


A Rigorous Investigation Of "Evidence" And "Occam Factors" In.. - Wolpert

....0 h, and q at least one h agreeing with q for which T(h) 0. This last requirement ensures that the generalizer is defined for all q. The Gibbs generalizer can be viewed as a zero temperature limit of the scenarios analyzed in the statistical mechanics supervised learning framework (see [Seung et al. 1991a, 1991b, Tishby et al. 1989]). III) The generalization error function is a mapping from $(f, h, q_X, q_Y)$ to R. It measures how good h is as a guess for f. One rather popular choice is the i.i.d. error function: $Er(f, h, q) = \sum_{x \in X} p(x)\,[1 - \delta(f(x), h(x))]$, where $p(\cdot)$ is the same distribution ....

Seung H., et al. (1991). Statistical mechanics of learning from examples I, II. Submitted.


Average Case Analysis of the Clipped Hebb Rule for.. - Mostefa Golea.. (1993)   (1 citation)

....attracted much attention recently [1, 4, 6, 7, 13]. This was motivated by both theoretical and practical reasons. First, because the number of possible states in the weight space of a binary network is finite, its properties may differ drastically from those of a network with continuous weights [4, 12]. Second, the hardware realization of binary networks may prove simpler. The generalization ability of neural networks with binary weights has been studied extensively using the statistical mechanics approach [4, 7, 12]. Although this approach has yielded some impressive results, it has its shortcomings. In particular, it neglects the computational aspect of the learning process by assuming a stochastic training algorithm, similar to a finite Monte Carlo process, that leads at long times to a Gibbs distribution ....

[Article contains additional citation context not shown here]

Seung H. S., Sompolinsky H., and Tishby N., "Statistical Mechanics of Learning from Examples", Phys. Rev. A, Vol. 45, (1992), 6056--6091.


Knowledge Acquisition in Statistical Learning Theory - Fine (1999)

....of machine learning. Many of the notions that have been used in the definition of the PAC model, as well as later studies, emerged from different (although related) research fields, such as Pattern Recognition [43], Inductive Inference [7], Information Theory [102, 33] and Statistical Mechanics [105, 67]. A notable contribution was made by the work of Vapnik [116] who addressed mainly questions related to sample complexity of learning algorithms. Since then, computational learning theory literature has made an extensive use of many of the notions and results proposed by Vapnik, although he ....

....the hypothesis h. Several papers in the field of computational learning theory have studied sequential classification problems in which {±1}-labeled instances (examples) are given online, one at a time, and for each new instance, the learning system must predict the label before it sees it (cf. [66, 105]). Such systems adapt online learning to make better predictions as they see more examples. If n is the total number of examples, then the performance of these online learning systems, as a function of n, has been measured both by the total number of mistakes (incorrect predictions) they make ....

[Article contains additional citation context not shown here]

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45:6056-6091, 1992.


On the Optimal Number of Clusters in Histogram Clustering - Buhmann, Held (1999)

....risk obtained by a random sampling over this ball. From a Bayesian point of view this is similar to averaging over a posterior distribution, where a uniform distribution over the hypothesis space is used as prior. In addition there is a tight methodological relationship to the papers [10] and [11] where learning curves for the learning of two class classifiers are derived using techniques from statistical mechanics. Especially in [11] the notion of an optimal temperature with respect to the generalization error is introduced. These works present asymptotic results in the sense that the learning behavior is studied in the limit of an infinite number of data samples l and an infinite number of the hypotheses, where the ratio ....

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056--6091, April 1992.


Data Compression and Prediction in Neural Networks - Meir, Fontanari (1993)

....there is a value of $\alpha$ below which training with zero error is possible (even in the unrealizable problem). The entropy of the system at this value of $\alpha$ vanishes; thus we term this value $\alpha_{ZE}$. ii) In the realizable case L s L t there is a sharp transition to perfect generalization at $\alpha_{ZE}$ [SST92][MF92]. iii) For the unrealizable problem above $\alpha_{ZE}$, there is a freezing of the system at a nonzero temperature $T_{ZE}(\alpha)$, i.e. the entropy of the system vanishes for all $T \le T_{ZE}(\alpha)$. The training error of the network above $\alpha_{ZE}$ is nonzero. iv) $N \log \alpha$ bits are sufficient to perfectly ....

Seung S., H. Sompolinsky and N. Tishby. Statistical mechanics of learning from examples. Phys. Rev. A45: 6056-6091.


A Bound on the Error of Cross Validation Using the Approximation.. - Kearns (1996)   (16 citations)

.... is spherically symmetric (for instance, the uniform density on the unit ball in $\mathbb{R}^N$) and the target function is a function in $H_s$ with all s nonzero weights equal to 1, then it can be shown that the approximation rate function $\epsilon_g(d)$ is $\epsilon_g(d) = (1/\pi)\cos^{-1}(\sqrt{d/N})$ for $d < s$ [6], and of course $\epsilon_g(d) = 0$ for $d \ge s$. This problem provides a nice contrast to the intervals problem, since here the behavior of the approximation rate for small d is concave down: as long as $d < s$, an incremental increase in d yields more approximative power for large d than it does for small d ....

....h d 3 . Note that the best such bound may depend in a complicated way on all of the elements of the problem: f, D and the structure. Indeed, much of the recent work on the statistical physics theory of learning curves has documented the wide variety of behaviors that such deviations may assume [6, 3]. However, for many natural problems it is both convenient and accurate to rely on a universal estimation rate bound provided by the powerful theory of uniform convergence: Namely, for any f, D and any structure the function $\rho(d, m) = \sqrt{(d/m)\log(m/d)}$ is an estimation rate bound [9] 4 . ....

[Article contains additional citation context not shown here]

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review, A45:6056--6091, 1992.


Learning Curve Bounds for Markov Decision Processes with.. - Saul, Singh (1996)

....effective number of accessible states may be much smaller than the size of the state space. These considerations do not apply to MDPs with undiscounted rewards. Our analysis employs a particular limiting method, the so-called thermodynamic limit, developed in the statistical physics literature [9, 12]. For MDPs, this is the combined limit that the allowed exploration time, T, and the size of the state space, N, grow to infinity at a fixed rate: $T \to \infty$, $N \to \infty$, $T/N = \alpha$ (finite). Ref. [5] gives a rigorous treatment of this method from the viewpoint of computational learning theory. Though ....

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A 45: 6056--6091, 1992.


The Relationship between PAC, the Statistical Physics framework.. - Wolpert (1994)   (3 citations)

.... those frameworks: PAC ([Blumer et al. 1987, Blumer et al. 1989, Haussler 1994ab, Valiant 1984, Dietterich 1990, COLT, Rivest 1989, Natarajan 1991, Anthony and Biggs 1992]), the statistical physics of supervised learning (SP) ([Hertz et al. 1991, Opper and Haussler 1991ab, Schwartz et al. 1990, Seung et al. 1991, Tishby et al. 1989, Tishby et al. 1994, Van der Broech and Kawai 1991, Wolpert 1994e]), Bayesian supervised learning ([Berger 1985, Buntine and Weigend 1991, Buntine 1990a, Duda and Hart 1973, Haussler et al. 1994, Loredo 1990, Neal 1994, Wolpert 1994bc, 1993, MacKay 1991, Wolpert and Strauss ....

....there is a good deal of current work concerned with modifying and extending the other three frameworks. For example a lot of work has been done extending SP to the case of noise, non-zero temperature generalizers, and/or assumed correspondences between the generalizer and the prior. (See [Seung et al. 1991, Tishby et al. 1989, Tishby et al. 1994].) Often this is all done with f and h parameterized as neural nets. Sometimes the distributions involving f and d are referred to as the "teacher", and P(h | d) as the "student". An example of an extension of PAC is the Probably Approximately Bayes ....

Seung H., et al. (1991). Statistical mechanics of learning from examples I, II. Physical Review A, 45, p.


Learning From a Population of Hypotheses - Michael Kearns And (1995)   (13 citations)  Self-citation (Seung)

No context found.

H.S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056--6091, April 1992.


Rigorous Learning Curve Bounds from Statistical Mechanics - Haussler, Kearns, Seung.. (1996)   (40 citations)  Self-citation (Seung Tishby)

.... Lyuu and Rivin (1991; 1992). Here we are actually giving a bound on the entire learning curve, and the behavior of our bound is very similar in shape to learning curves obtained in both simulations and non-rigorous replica calculations from statistical physics (Engel Fink, 1993; Gyorgyi, 1990; Seung et al. 1992; Sompolinsky et al. 1990) 6 . In figure 11, we graph the difference of the entropy and energy curves shown in figure 3, that is, we plot s(#) # log(1 #) for the three values of #. This plot is simply another way of visualizing the entropy-energy competition. The zero crossings of the ....

....3.5. Large # asymptotics of scaled learning curves Our formalism can be used to give a classification of the large # asymptotics of scaled learning curves 7 , thus completing a classification program that has been suggested by several researchers (Amari et al. 1992; Schwartz et al. 1990; Seung et al. 1992). From Eq. (32) and Lemma 9, the weaker form u(#) # # min ) 2 2v(#) 54) Figure 21. Phase diagram showing line of first order transitions beginning at # = 1.448 for # min (# ) 0 and ....

[Article contains additional citation context not shown here]

Seung, H.S., Sompolinsky, H., & Tishby, N. (1992). Statistical mechanics of learning from examples. Physical Review, A45:6056--6091.


Active Learning for Logistic Regression - Schein (2005)

No context found.

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8), 1992.


An Experimental and Theoretical Comparison of Model.. - Kearns, Mansour, Ng, Ron   (57 citations)

No context found.

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review, A45:6056--6091, 1992.


Active Learning for Logistic Regression - Schein (2004)

No context found.

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8), 1992.


Learning Unrealizable Tasks From Minimum Entropy - Sollich (1995)

No context found.

H S Seung, H Sompolinsky, and N Tishby. Statistical-mechanics of learning from examples. Phys. Rev. A, 45:6056--6091, 1992.


Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out .. - Kearns, Ron (1997)   (42 citations)

No context found.

H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review, A45:6056--6091, 1992.


Efficient Noise-Tolerant Learning From Statistical Queries - Kearns (1998)   (100 citations)

No context found.

H.S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056--6091, April 1992.


Optimal Nonlinear Training In The Multi-Class Proximity Problem - Bolle, Jongen (1996)

No context found.

H. S. Seung, H. Sompolinsky and N. Tishby 1992, "Statistical mechanics of learning from examples," Phys. Rev. A 45, 6056-6091.


Dynamic and Static Properties of Neural Networks with FeedBack - Priel (1999)

No context found.

Seung H.S., Sompolinsky H. and Tishby N. "Statistical mechanics of learning from examples". Physical Review A, 45:6056, 1992.

CiteSeer.IST - Copyright Penn State and NEC