21 citations found.
J. Denker et al. "Large automatic learning, rule extraction and generalization". Complex Systems, 1, pp. 877-922, 1987.


This paper is cited in the following contexts:
A Successive Overrelaxation Back Propagation.. - De Leone.. (1999)

....or levels: the input layer, the middle or hidden layer and the output layer. Connections exist only between nodes in the input and the hidden layer and between nodes in the hidden and output layer. Despite the simplicity of the model, any Boolean function can be represented in this model [8] [9]. The two most important problems that must be solved in using neural networks are the definition of the topology of the network (i.e., the number of nodes in each level and the connections) [10, Chapter 10] and the weights on the connecting arcs. For the problem of determining these weights (also ....

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel, and J. Hopfield, "Large automatic learning, rule extraction, and generalization," Complex Systems, vol. 1, pp. 877-922, 1987.


A New Higher-order Binary-input Neural Unit: Learning and.. - Sahin (1994)

....generalizations consistent with the training set reduces to 2^(2^N - p), which is still a large number. Interestingly, in many applications the network and human beings generalize similarly. Theoretical studies on generalization have been carried out by various authors, including Denker [5] and Schwartz [39]. Although no agreed-upon formal definition of generalization exists, a good generalizing system is expected to associate similar outputs with similar stimuli. A theoretical framework for generalization in [12], which draws on [39], discusses the topic extensively. Optimal ....

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel, and J. Hopfield, "Large Automatic Learning, Rule Extraction, and Generalization," Complex Systems, 1, 877 (1987).
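The count 2^(2^N - p) quoted in the context above can be checked by brute force for small N. The Python sketch below is a hypothetical illustration and is not code from the cited work. It rests on three assumptions:

- All 2^(2^N) Boolean functions of N inputs are admissible, with no restriction from the architecture.
- The p training examples have distinct inputs, so exactly p of the 2^N outputs are fixed and the rest are free.
- The helper `count_consistent` and the training set are made up for the example.

```python
from itertools import product


def count_consistent(n_inputs, training_set):
    """Brute-force count of the Boolean functions on n_inputs bits that agree
    with every (input, output) pair in training_set (hypothetical helper)."""
    inputs = list(product((0, 1), repeat=n_inputs))  # all 2^N input patterns
    count = 0
    # Each Boolean function is one assignment of an output bit to every input pattern.
    for outputs in product((0, 1), repeat=len(inputs)):
        table = dict(zip(inputs, outputs))
        if all(table[x] == y for x, y in training_set):
            count += 1
    return count


N = 3
# Hypothetical training set: p = 5 distinct patterns labelled by majority vote.
train = [((0, 0, 0), 0), ((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]
p = len(train)
print(count_consistent(N, train), 2 ** (2 ** N - p))  # prints: 8 8
```

Contradictory examples (the same input with two labels) drive the count to 0. Repeated identical examples make p overstate the number of constraints. This is why the closed form needs distinct inputs.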


The Error Surface of the simplest XOR Network has no.. - Sprinkhuizen-Kuyper And .. (1994)

....and that other minima cannot be stable. The result that on-line learning with a reasonably large learning parameter is best at avoiding such minima [6] can be explained by this fact. 1.3 Generalization. The third point is the ability to generalize. The work of Denker, Schwartz, Solla et al. [4, 10, 11] suggests investigating the a priori probability that a network represents a certain function when the weights are chosen randomly. We did some calculations for the XOR networks with threshold units and some numerical experiments for the networks with a sigmoid transfer function, to determine ....

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel and J. Hopfield; "Large Automatic Learning, Rule Extraction, and Generalization". Complex Systems 1, pp. 877-922, 1987.


A Modified Back Propagation Algorithm for Neural Network .. - De Leone, Capparuccia.. (1999)   (1 citation)

....or levels: the input layer, the middle or hidden layer and the output layer. Connections exist only between nodes in the input and the hidden layer and between nodes in the hidden and output layer. Despite the simplicity of the model, any Boolean function can be represented in this model [8] [9]. The two most important problems that must be solved in using neural networks are the definition of the topology of the network (i.e., the number of nodes in each level and the connections) [10, Chapter 10] and the weights on the connecting arcs. For the problem of determining these weights (also ....

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel, and J. Hopfield, "Large automatic learning, rule extraction, and generalization," Complex Systems, vol. 1, pp. 877-922, 1987.


The Acquisition of Lexical Semantics for Spatial Terms: A.. - Regier (1992)   (18 citations)

....on some highly regular tasks such as image and speech recognition. Tailoring the network architecture to the task can be thought of as a way of reducing the size of the space of possible functions that the network can generate, without overly reducing its computational power. Theoretical studies [Denker et al. 1987; Patarnello and Carnevali, 1987] have shown that the likelihood of correct generalization depends on the size of the hypothesis space (the total number of networks being considered), the size of the solution space (the set of networks that give good generalization), and the number of training ....

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel, and J. Hopfield, "Large Automatic Learning, Rule Extraction and Generalization," Complex Systems, 1:877-922, 1987.


Efficient Higher-order Neural Networks for Classification and.. - Ghosh, Shin (1995)   (5 citations)

....rules for the PSN more tractable and accurate. Function surfaces obtained using PSNs are quite smooth and are similar to those obtained using regularization and related techniques [32]. This is particularly satisfying since it corresponds to generalizations that are more natural or reasonable [30]. Indeed, the network-efficiently-representable functions [30] realizable through PSNs are akin to curve fitting using low-order polynomials. To use the PSN effectively, one only needs to select an appropriate order for the network. While more efficient polynomial-based networks may be obtained ....

.... obtained using PSNs are quite smooth and are similar to those obtained using regularization and related techniques [32]. This is particularly satisfying since it corresponds to generalizations that are more natural or reasonable [30]. Indeed, the network-efficiently-representable functions [30] realizable through PSNs are akin to curve fitting using low-order polynomials. To use the PSN effectively, one only needs to select an appropriate order for the network. While more efficient polynomial-based networks may be obtained through incremental growth strategies, developing such networks ....

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel and J. Hopfield, "Large Automatic Learning, Rule Extraction, and Generalization," Complex Systems 1, pp. 877-922, 1987.


Artificial Neural Networks 149 - Some Remarks

....become zero, while the other probabilities increase. In this paper we show that the entropy of a neural network does not always decrease monotonically as a function of the size of the training set. 1 Introduction. In their study of the generalization ability of layered feedforward neural networks, Denker et al. (1987) introduce the entropy S_m and the average generalization ability G_m as functions of the size m of the training set. The entropy S_m is a measure of the functional diversity of the chosen architecture, restricted so that it correctly represents the m examples of the training set. In Denker et ....

....al. (1987) introduce the entropy S_m and the average generalization ability G_m as functions of the size m of the training set. The entropy S_m is a measure of the functional diversity of the chosen architecture, restricted so that it correctly represents the m examples of the training set. In Denker et al. (1987) it is suggested that the entropy S_m decreases when m increases, while Solla (1992) and Schwartz et al. (1990) state it explicitly. While working on his master's thesis, Claas (1996) tried to prove that the entropy as defined by Denker et al. (1987) indeed has to decrease with m. He ....

[Article contains additional citation context not shown here]

Denker, J., Schwartz, D., Wittner, B., Solla, S., Howard, R., Jackel, L., & Hopfield, J. (1987). "Large Automatic Learning, Rule Extraction, and Generalisation". Complex Systems 1, 877-922.
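The entropy S_m described in these contexts can be illustrated numerically for a tiny network. The Python sketch below rests on our own assumptions and is not the procedure of Denker et al. or of the citing paper:

- The a priori probability P_0(f) of each Boolean function f is estimated by sampling Gaussian weights and biases for a 2-2-1 network of threshold units.
- S_m is taken as the entropy of P_0 restricted to the functions consistent with the first m training examples and then renormalised. Following the quoted passage, inconsistent functions get probability zero and the others increase.
- The helper names `sample_function`, `estimate_prior` and `entropy_after` are hypothetical.
- The order of the XOR examples is arbitrary.

```python
import math
import random
from collections import Counter
from itertools import product

# The four input patterns of a 2-input Boolean function, in a fixed order.
INPUTS = list(product((0, 1), repeat=2))


def step(z):
    """Threshold (Heaviside) unit: outputs 1 when its net input is positive."""
    return 1 if z > 0 else 0


def sample_function(rng):
    """Draw Gaussian weights/biases (an assumption) for a 2-2-1 threshold network
    and return the Boolean function it computes as a tuple of 4 outputs."""
    w = [rng.gauss(0.0, 1.0) for _ in range(9)]  # 6 hidden-layer + 3 output-unit parameters
    outputs = []
    for x1, x2 in INPUTS:
        h1 = step(w[0] * x1 + w[1] * x2 + w[2])
        h2 = step(w[3] * x1 + w[4] * x2 + w[5])
        outputs.append(step(w[6] * h1 + w[7] * h2 + w[8]))
    return tuple(outputs)


def estimate_prior(n_samples=50_000, seed=0):
    """Monte Carlo estimate of the a priori probability P_0(f) of each sampled function f."""
    rng = random.Random(seed)
    counts = Counter(sample_function(rng) for _ in range(n_samples))
    return {f: c / n_samples for f, c in counts.items()}


def entropy_after(prior, examples):
    """S_m: entropy (natural log) of the prior restricted to the functions that
    agree with the m given (input, output) examples, after renormalisation."""
    consistent = {f: p for f, p in prior.items()
                  if all(f[INPUTS.index(x)] == y for x, y in examples)}
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("no sampled function is consistent with the examples")
    return -sum((p / total) * math.log(p / total) for p in consistent.values())


# Usage: S_m for the first m examples of XOR (the ordering is an arbitrary choice).
prior = estimate_prior()
xor_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for m in range(4):
    print(m, round(entropy_after(prior, xor_examples[:m]), 3))
```

Because P_0 is only estimated from samples, rarely realized functions may be missing and the printed values are approximate. Different example orderings or training sets can be substituted to probe whether S_m falls monotonically with m, which is the question the citing paper addresses.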


Classification of all Stationary Points on a Neural.. - Sprinkhuizen-Kuyper.. (1994)

....and that other minima cannot be stable. The result that on-line learning with a reasonably large learning parameter is best at avoiding such minima [6] can be explained by this fact. 1.3 Generalization. The third point is the ability to generalize. The work of Denker, Schwartz, Solla et al. [4, 10, 11] suggests investigating the a priori probability that a network represents a certain function when the weights are chosen randomly. We did some computations for the XOR networks with threshold units and numerically determined the a priori probability of the network representing an approximation ....

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel and J. Hopfield; "Large Automatic Learning, Rule Extraction, and Generalization". Complex Systems 1, pp. 877-922, 1987.


Probabilities and Entropy of Some Small Neural Networks.. - Ida Sprinkhuizen-Kuyper (1996)

....of the Boolean values the probabilities of the trivial functions in the 2-2-1 network are much larger than those of the other functions, while the probabilities of the XOR-like functions are much smaller. In their study of the generalization ability of layered feedforward neural networks, Denker et al. (1987) introduce the entropy S_m and the average generalization ability G_m as functions of the size m of the training set. The entropy S_m is a measure of the functional diversity of the chosen architecture, restricted so that it correctly represents the m examples of the training set. In Denker et ....

....al. (1987) introduce the entropy S_m and the average generalization ability G_m as functions of the size m of the training set. The entropy S_m is a measure of the functional diversity of the chosen architecture, restricted so that it correctly represents the m examples of the training set. In Denker et al. (1987) it is suggested that the entropy S_m decreases when m increases, while Solla (1992) and Schwartz et al. (1990) state it explicitly. While working on his master's thesis, Claas (1996) tried to prove that the entropy as defined by Denker et al. (1987) indeed has to decrease with m. He ....

[Article contains additional citation context not shown here]

Denker, J., Schwartz, D., Wittner, B., Solla, S., Howard, R., Jackel, L., & Hopfield, J. (1987). "Large Automatic Learning, Rule Extraction, and Generalisation". Complex Systems 1, 877-922.


Strategies for Improving Neural Net Generalization - Derek Partridge (1994)   (1 citation)

....elsewhere [4]. However, the most recent statistical model to be developed as a result of several substantial empirical studies in multiversion software engineering [1] did, as already noted, provide the basis for some of our current approaches. Within the neural network domain, Denker et al. [5] provided some quantitative measures of the potential scope for generalization that a given network architecture exhibits. And from almost the opposite direction (to our empirically driven studies), Holden [6, 7] has made progress towards a theoretical quantification of the generalization to be ....

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel, and J. Hopfield. Large automatic learning, rule extraction, and generalization. Complex Systems, 1, 1987.


Designing Modular Artificial Neural Networks - Boers, Kuiper, Happel.. (1993)   (18 citations)

....determines a probability distribution over the space of possible input-output mappings that can be implemented with the fixed architecture. The entropy of this distribution is a quantitative measure of the diversity of the mappings realizable by the architecture under consideration (see, e.g., [4, 8, 24]). Learning from examples reduces the intrinsic entropy of the untrained network by excluding weight configurations that realize mappings incompatible with the training set. The residual entropy of the trained network is a measure of its generalization. The goal of this research has been to ....

...., the network is unable to learn the desired input-output mapping. Selecting a network structure defines a class of functions that are realizable with that network. A useful measure of the diversity of possible mappings that can be implemented with the chosen architecture is the a priori entropy [4], S = -Σ_f P(f) log P(f), of the a priori probability distribution P(f) of a given network. Since [...] not just for the desired mapping but also for mappings [...], the distribution of [...] is such that the entropy [...]. The definitions of the a priori probability and the entropy of neural networks give a mathematical notion of the necessity to ....

J.S. Denker, D.B. Schwartz, B.S. Wittner, S.A. Solla, R.E. Howard, L.D. Jackel and J.J. Hopfield; `Large automatic learning, rule extraction and generalization'. In: Complex systems, 1, 877-922, 1987.


On Some Factors Influencing MLP Error Surface - Kordos, Duch

No context found.

J. Denker et al. "Large automatic learning, rule extraction and generalization". Complex Systems, 1, pp. 877-922, 1987.


Some Remarks on the Entropy of a Neural Network - Ida Sprinkhuizen-Kuyper..

No context found.

Denker, J., Schwartz, D., Wittner, B., Solla, S., Howard, R., Jackel, L., & Hopfield, J. (1987). "Large Automatic Learning, Rule Extraction, and Generalisation". Complex Systems 1, 877-922.


An Incremental Learning Algorithm That Optimizes Network Size and.. - Zhang (1994)   (9 citations)

No context found.

J. Denker et al., "Large automatic learning, rule extraction, and generalization," Complex Systems, vol. 1, pp. 877-922, 1987.


Generalization Properties of Modular Networks: Implementing.. - Franco, Cannas (2001)

No context found.

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel, and J. Hopfield, "Large automatic learning, rule extraction and generalization," Complex Syst., vol. 1, pp. 877-922, 1987.


Two Constructive Methods for Designing Compact Feedforward.. - Amaldi, al. (1998)   (2 citations)

No context found.

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel and J. Hopfield 1987, "Large automatic learning, rule extraction, and generalization," Complex Syst. 1, 877-922.


Network Generalization Differences Quantified - Derek Partridge (1994)   (4 citations)

No context found.

J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel, and J. Hopfield. Large automatic learning, rule extraction, and generalization. Complex Systems, 1, 1987.


GAL: Networks that grow when they learn and shrink when they forget - Alpaydin (1991)   (18 citations)

No context found.

Denker, J., Schwartz, D., Wittner, B., Solla, S., Howard, R., Jackel, L., Hopfield, J. (1987) "Large automatic learning, rule extraction, and generalization," Complex Systems, 1, 877-922.

CiteSeer.IST - Copyright Penn State and NEC