25 citations found.
M. C. Mozer and P. Smolensky, "Using relevance to reduce network size automatically," Connection Sci., vol. 1, no. 1, pp. 3--16, 1989.


This paper is cited in the following contexts:
Guidelines for Financial Forecasting with Neural Networks - YAO, TAN

....NN. When there is not enough data available to train the NNs and the structure of the NNs is too complex, the NN tends to memorize the data rather than to generalize from it. Keeping the NN small is one way to avoid overfitting. One can prune the network to a small size using a technique such as in [12]. Step 4. Data Organization The next step is Data Organization. In the data preprocessing step, we have chosen the prediction goal and the inputs that should be used. The historical data may not necessarily contribute equally to the model building. We know that for certain periods the market is more ....

M.C. Mozer, P. Smolensky , "Using relevance to reduce network size automatically", Connection Science, 1(1), 1989, pp3-16.


Extraction of Rules from Artificial Neural Networks for.. - Setiono, Leow (2002)   (3 citations)

....been proposed in the literature. The constructive algorithms start with a few hidden units and add more units as needed to improve network accuracy [5, 6, 7]. The destructive algorithms, on the other hand, start with a large number of hidden units and remove those that are found to be redundant [8]. The number of useful input units corresponds to the number of relevant input attributes of the data. Typical algorithms usually start by assigning one input unit to each attribute, train the network with all input attributes and then remove network input units that correspond to irrelevant data ....

M. C. Mozer and P. Smolensky, "Using relevance to reduce network size automatically," Connection Science, vol. 1, no. 1, pp. 3-16, 1989.


Behavioural Aspects of BP-SOM - Weijters, van den Bosch, van den..

.... avoiding overfitting, or regularisation, can be distinguished: (i) starting with an undersized network, and gradually increasing the network's complexity [Fahlman & Lebiere, 1990] and (ii) starting with an oversized network and gradually decreasing its complexity by (a) pruning units or weights (Mozer & Smolensky, 1989; Le Cun et al. 1990; Hassibi et al. 1992; Weigend, 1994), (b) adding penalty terms (e.g. Weigend et al. 1991) or (c) using early stopping (e.g. Prechelt, 1994). BP-SOM belongs to the second group: the representations at the hidden layer of the MFN are guided by the SOM, resulting in ....

Mozer, M. C., & Smolensky, P. (1989). Using relevance to reduce network size automatically. Connection Science, 1, 3--16.


Generativity and Systematicity in Neural Network Combinatorial.. - Brousse (1993)   (8 citations)

....needed to obtain a fixed competence level scales linearly with the number of input units, and that the nature of representations used play a crucial role in generalization performance. Other experimental research activities on generalization have been targeted at improving generalization (e.g. (Mozer and Smolensky, 1989), (Chauvin, 1990), (Weigend et al. 1990), (Yu and Simmons, 1990)) by modifying learning algorithms. We will briefly discuss these in chapter 4. 2.4.5 Connectionist modeling and generalization Within the connectionist cognitive modeling paradigm, generalization has, surprisingly, received little ....

.... a weight cost term tending to penalize large weights is added in the error function on which gradient descent is performed (Hanson and Pratt, 1989), (Chauvin, 1990), (Weigend et al. 1990), unit removal, where a measure of unit relevance is assessed allowing irrelevant ones to be discarded (Mozer and Smolensky, 1989), and network construction algorithms (Mezard and Nadal, 1989), (Sirat and Nadal, 1990), (Marchand et al. 1990), (Frean, 1990), (Sankar and Mammone, 1991), where networks are constructed with optimization in mind. Since our purpose was not to study how networks with specialized architectures could ....

Michael C. Mozer and Paul Smolensky. Using relevance to reduce network size automatically. Connection Science, 1:3--16, 1989.


Pruning of Neural Networks - Thimm, Fiesler (1997)

.... with having in mind that setting an additional strength of zero is equivalent to removing it, those units are removed for which the derivative of the error function to these additional strengths are small (actually, they use an exponentially decaying average of these values) [9]. 5. W. Finnoff et al. define a test statistic based on the probability that a weight becomes zero. A connection is removed if the probability that it becomes zero is high. This sensitivity measure is integrated into a pruning method called autoprune [10]. This pruning method is extended to the ....

Michael C. Mozer and Paul Smolensky, "Using relevance to reduce network size automatically", Connection Science, vol. 1, no. 1, pp. 3--16, 1989.


Dynamic Learning Algorithms - Waugh (1994)

.... to the point of being computationally intractable but it has reasonable success in removing extra connections and retraining existing ones. Connections may also be pruned by measuring the sensitivity of the network to their removal. Skeletonization was introduced by Mozer and Smolensky [28, 29] in a technique which removes nodes by assessing their relevance during the training process. This process can be extended for the removal of connections as it estimates the error after removing a single connection, and hence removing nodes. Karnin [20] notes that Mozer and Smolensky's sensitivity ....

....and the network error. The method allows for the alteration of the weight cost function so that smaller weights or larger weights become relatively expensive. They repeat their work in Weigend, Rumelhart et al. [47, 48]. Not everyone is in favour of these penalty term methods. Mozer and Smolensky [29] state: our impression is that it is a tricky matter to balance a primary and secondary error term against one another. In our experience, it is often impossible to avoid local minima compromise solutions that partially satisfy each of the error terms. Karnin also notes that ....

[Article contains additional citation context not shown here]

Mozer, M C and Smolensky, P. (1989) Using relevance to reduce network size automatically. Connection Science 1(1): 3 - 16.


Pruned Neural Networks for Regression - Setiono, Leow (2000)

.... an appropriate number of hidden units, constructive algorithms start with a few hidden units and add more units as needed to improve network accuracy [1, 8, 14]. Destructive algorithms, on the other hand, start with a large number of hidden units and remove those that are found to be redundant [11]. The number of useful input units corresponds to the number of relevant input attributes of the data. Typical algorithms usually start by assigning one input unit to each attribute, train the network with all input attributes and then remove network input units that correspond to irrelevant data ....

Mozer, M.C. and Smolensky, P. (1989) Using relevance to reduce network size automatically. Connection Science, 1 (1), 3-16.


Speed Up Learning and Network Optimization With Extended.. - Sperduti, Starita (1992)   (26 citations)

....developed. Optimization of network architecture is very important both to reduce the computational load and to improve generalization. Two different approaches have been proposed in literature to achieve this aim. In the pruning approach, a network is trained and then analyzed by unit activity (Mozer & Smolensky, 1989) or a relevance measure over weights (Karnin, 1990). Non-relevant weights or units are subsequently removed by an iteration process which interleaves removing of weights/units with learning to recover the network damage. The other approach tries to force out pointless connections from the final ....

Mozer, M. C., & Smolensky P. (1989). Using Relevance to Reduce Network Size Automatically. Connection Science, 1, 3-16.


Partial Retraining: A New Approach to Input Relevance Determination - Laar (1999)

....section 4. In section 5, we will compare the various algorithms on a set of artificial and real world problems. Our conclusions and some discussion can be found in section 6. 2 Relevance 2.1 How can one define relevance? Many closely related definitions of relevance have been proposed (see e.g. [1, 2, 3]). Here we adopt the very general definition of the relevance $R_i$ of variable $i$ as the difference in performance on a task with and without input variable $i$, given all other input variables: $R_i = P - P_{\{-i\}}$, (1) ....

....the relevance of a single variable is thus (almost) equal to the time needed to process a dataset by the neural network. The data modification algorithms can be separated into three different groups. Constant substitution substitutes a constant value for the input variable under investigation [2, 3, 5, 6, 7, 8], translation factor modifies the data by translation [7, 9, 10] and data permutation permutes the data of input variable i across patterns [11] 3.2 Missing values The following algorithms treat the removed input variable as a missing value and approximate the performance without an input ....
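The two excerpts above can be illustrated with a minimal sketch, assuming performance is measured as negative mean squared error, so that the relevance of input i becomes the error without input i minus the error with it. The names `mse`, `relevance_constant`, `relevance_permutation` and `toy_model` are hypothetical and do not come from the cited article; only constant substitution and data permutation are shown.

```python
import random

def mse(model, X, y):
    """Mean squared error of the model's predictions over a dataset."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(X)

def relevance_constant(model, X, y, i, value=None):
    """Relevance of input i by constant substitution.

    Column i is replaced by a constant (its mean by default); the result is
    the increase in error, i.e. the performance lost without input i.
    """
    if value is None:
        value = sum(x[i] for x in X) / len(X)
    X_mod = [x[:i] + [value] + x[i + 1:] for x in X]
    return mse(model, X_mod, y) - mse(model, X, y)

def relevance_permutation(model, X, y, i, rng):
    """Relevance of input i by permuting column i across patterns."""
    column = [x[i] for x in X]
    rng.shuffle(column)
    X_mod = [x[:i] + [v] + x[i + 1:] for x, v in zip(X, column)]
    return mse(model, X_mod, y) - mse(model, X, y)

# Hypothetical toy model: uses inputs 0 and 1 and ignores input 2.
def toy_model(x):
    return 2.0 * x[0] - 1.0 * x[1]

rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [toy_model(x) for x in X]

const_scores = [relevance_constant(toy_model, X, y, i) for i in range(3)]
perm_scores = [relevance_permutation(toy_model, X, y, i, rng) for i in range(3)]
```

In this toy setting the ignored input receives a relevance of exactly zero under both schemes, while the input with the larger coefficient receives the largest score. Each score needs one extra pass over the dataset, which matches the excerpt's remark about the cost of computing the relevance of a single variable.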

[Article contains additional citation context not shown here]

Michael C. Mozer and Paul Smolensky. Using relevance to reduce network size automatically. Connection Science, 1(1):3--16, 1989.


Automated Learning for Reducing the Configuration of a.. - Teng, Wah

....the network structure once learning fails in a small network. These methods include destructive, constructive, genetic algorithm and pattern classification approaches [34] Destructive or pruning methods start from a fairly large network and dynamically remove unimportant connections or units [21, 14, 3], whereas constructive or growth methods start from a small network and dynamically grow the network. Since the latter usually require less computations, extensive research has been carried out in this area [1, 4, 8, 18, 17, 22, 12, 16, 15, 7, 13] Another class of dynamic multilayer perceptrons ....

M. C. Mozer and P. Smolensky. Using relevance to reduce network size automatically. Connection Science, 1(1):3--16, 1989.


Unifying Empirical and Explanation-Based Learning by Modeling the .. - Holder

....holds [footnote 5], then as the number of cycles increases, so does the degree (complexity) of the hypothesis learned by the network. Therefore, roughly similar behavior to that of Figure 9 will exist if the number of cycles replaces k along the amount of learned knowledge axis. [Footnote 5: Observations by Mozer and Smolensky [1989] support a similar interpretation, but more experimentation is necessary to confirm the reason for this behavior.] 7.2 ANALYTICAL LEARNING An analytical learner is similar to an empirical learner in that both seek a concept that maximizes performance. The concept sought by an analytical learner ....

M. C. Mozer and P. Smolensky. Using relevance to reduce network size automatically. Connection Science, 1(1):3--16, 1989.


Connectionist, Statistical and Symbolic Approaches to.. - Wermter, Riloff, Scheler (1996)

....learning strategies used and due to the focus on spoken language it was placed in the group of hybrid approaches for spoken language. paper about connectionism in general one could start with [25, 51] Architectural issues of connectionist and hybrid connectionist systems are discussed in [68, 22, 23, 40, 61, 3, 21, 52, 19, 56, 80]. Some representative references for semantic and syntactic analysis with connectionist networks can be found in [38, 50, 60, 75, 70, 79] For references on cognitively oriented connectionist natural language processing some references are [14, 78, 69, 42, 12] 3 Statistical Approaches 3.1 ....

M. Mozer and P. Smolensky. Using relevance to reduce network size automatically. Connection Science, 1 (1):3--16, 1989.


Input Selection with Partial Retraining - Laar, Gielen, Heskes (1997)

....irrelevant variables and the (possibly) abundant parameters tends to degrade generalization. Secondly, resources are wasted by measuring irrelevant variables. And finally, a model with irrelevant variables and parameters is more difficult to understand. Architecture selection algorithms (see e.g. [1,5,6,9]) try to remove irrelevant parameters and/or input variables and, consequently, save resources, improve generalization, and yield architectures which are easier to interpret. In this article, we will describe a new algorithm to perform input selection, a subproblem of architecture selection, which ....

M. C. Mozer and P. Smolensky. Using relevance to reduce network size automatically. Connection Science, 1(1):3--16, 1989.


Optimization of Neural Network Topology and. . . - al.

....size and the complexity of neural network applications has grown rapidly. The search for small networks with large information content and generalization capability is ongoing. Most of the optimization strategies are a trade-off between error and network complexity. The known optimization schemes [1,2,3] have used this trade-off to minimize the cost function. Among various complexity measures, Vapnik-Chervonenkis (VC) dimensionality [4] concentrates on information content and distribution of information in the network. The error term associated with increasing VC dimension can be reduced by ....

M. C. Mozer and P. Smolensky, "Using relevance to reduce network size automatically", Connection Science, 1, pp. 3-16, 1989.


On the Combination of Supervised and Unsupervised Learning - Intrator

....that are based on general principles or assumptions on the functional form of the desired estimator. They do not depend directly on the (unknown) data distribution. In the neural network framework they include methods such as weight decay and magnitude control of the weights (Plaut et al. 1986; Mozer and Smolensky, 1989), network pruning via weight elimination based on a simple threshold (Le Cun et al. 1990; Weigend et al. 1991) or based on the Hessian matrix (Hassibi and Stork, 1993). A different approach for reducing the effective number of weights is weight sharing, in which a single weight is shared among ....

Mozer, M. C. and Smolensky, P. (1989). Using relevance to reduce network size automatically. Connection Science, 1(1):3--16.


How Hybrid Should a Hybrid Model Be? - Richard Cooper (1994)   (1 citation)

....it to the connectionist component, or vice versa, independently of the task. The general implementation relation (as in Touretzky & Hinton's connectionist production system) is an example of a system dependent constraint flowing from the symbolic to the connectionist, whereas skeletonisation (Mozer & Smolensky, 1989), whereby symbolic behaviour may be derived from a connectionist network via pruning of irrelevant nodes, is an example of such a constraint flowing in the opposite direction. 3 System Dependent Constraints Consider a non physically hybrid model. What kinds of relation might there be between the ....

....system, and not simply semantic content. The force of this claim can best be seen if we examine several cases of hybrid models, including some representing different directions of flow of the system dependent constraint which justifies the hybridness. We first look at the skeletonisation technique (Mozer & Smolensky, 1989; Clark & Karmiloff-Smith, 1993). The starting point here is a fully trained multi-layer feedforward distributed network performing a given function from which characteristic connectionist properties are transferred to a symbolically perspicuous system. Mozer & Smolensky propose a technique for ....

Mozer, M. C. & Smolensky, P. (1989). Using relevance to reduce network size automatically. Connection Science, 1, 3--16.


Combining Exploratory Projection Pursuit And Projection Pursuit.. - Intrator (1992)   (8 citations)

.... 1990; Geman et al. 1992, for discussion). This can be done using some form of complexity regularization (Barron and Barron, 1988; Barron, 1989; White, 1990; Moody, 1991) or by weight elimination penalties which aim to reduce the effective number of parameters in the model (Plaut et al. 1986; Mozer and Smolensky, 1989; Le Cun et al. 1990; Weigend et al. 1991). The performance of the network is measured using a loss criterion, e.g. mean squared error between the output and the target of the network (the class label). The estimation of the weights is done by minimizing the empirical average of the error via ....

Mozer, M. C. and Smolensky, P. (1989). Using relevance to reduce network size automatically. Connection Science, 1(1):3--16.


Avoiding Overfitting with BP-SOM - Weijters, van den Herik, van den.. (1997)

.... two types of methods of avoiding overfitting (or regularisation) can be distinguished: (i) starting with an undersized network and gradually increasing the network's complexity (Fahlman and Lebiere, 1990) and (ii) starting with an oversized network and gradually decreasing its complexity (e.g. Mozer and Smolensky, 1989; Le Cun, Denker, and Solla, 1990; Weigend, Rumelhart, and Huberman, 1991; Hassibi, Stork, and Wolff, 1992; Prechelt, 1994; Weigend, 1994). In this paper we analyse the overfitting avoidance behaviour of a novel artificial neural network architecture (BP-SOM, Weijters, 1995) which belongs to the ....

M. C. Mozer and P. Smolensky. Using relevance to reduce network size automatically. Connection Science, 1:3--16, 1989.


Connectionist Theory Refinement: Genetically Searching the.. - Opitz, al. (1997)   (20 citations)

....Solla, 1989). When trying to find an appropriate topology, one approach is to construct or modify a topology in an incremental fashion. Network shrinking algorithms start with too many parameters, then remove nodes and weights during training (Hassibi & Stork, 1992; Le Cun, Denker, & Solla, 1989; Mozer & Smolensky, 1989). Network growing algorithms, on the other hand, start with too few parameters, then add more nodes and weights during training (Blanziere & Katenkamp, 1996; Fahlman & Lebiere, 1989; Frean, 1990). The most obvious difference between Regent and these algorithms is that Regent uses domain knowledge ....

Mozer, M. C., & Smolensky, P. (1989). Using relevance to reduce network size automatically.


Redescription, Information And Access - Dartnall

....be What do we need to add to a connectionist system, or how do we amplify it, to enable it to access its implicit knowledge? Clark and Karmiloff-Smith consider some possibilities. I will look at one of them, skeletonization, since it illustrates the problem. Skeletonization (Mozer and Smolensky, 1989) takes a fully trained network and deletes the least relevant units. One outcome is that this improves the network's ability to generalise, but the net loses its ability to deal with every nuance of the training set. Applying this to the RRH, a duplicate of the network would be skeletonized. This ....
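As a rough illustration of deleting the least relevant unit from a trained network, the sketch below gates each hidden unit with a 0/1 strength and scores a unit by the error without it minus the error with it. This is an assumption-laden toy: it recomputes the error exactly for each unit, whereas the excerpts elsewhere in this listing describe Mozer and Smolensky's method as using a derivative with respect to the strengths. The weights and all function names are hypothetical.

```python
import math

def forward(x, W_in, w_out, alpha):
    """One-hidden-layer net; alpha[j] in {0, 1} switches hidden unit j on or off."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_in]
    return sum(a * w * h for a, w, h in zip(alpha, w_out, hidden))

def error(X, y, W_in, w_out, alpha):
    """Mean squared error of the gated network over the dataset."""
    return sum((forward(x, W_in, w_out, alpha) - t) ** 2
               for x, t in zip(X, y)) / len(X)

def unit_relevance(X, y, W_in, w_out, alpha):
    """Relevance of each active unit: error without the unit minus error with it."""
    base = error(X, y, W_in, w_out, alpha)
    scores = {}
    for j, a in enumerate(alpha):
        if a == 0:
            continue  # unit already removed
        trial = alpha[:j] + [0] + alpha[j + 1:]
        scores[j] = error(X, y, W_in, w_out, trial) - base
    return scores

def prune_least_relevant(X, y, W_in, w_out, alpha):
    """Switch off the active unit with the smallest relevance."""
    scores = unit_relevance(X, y, W_in, w_out, alpha)
    j = min(scores, key=scores.get)
    return alpha[:j] + [0] + alpha[j + 1:], j

# Hypothetical "trained" weights: unit 2 barely contributes to the output.
W_in = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
w_out = [1.0, -1.0, 0.01]
alpha = [1, 1, 1]

X = [[a / 2, b / 2] for a in range(-2, 3) for b in range(-2, 3)]
y = [forward(x, W_in, w_out, alpha) for x in X]

scores = unit_relevance(X, y, W_in, w_out, alpha)
new_alpha, pruned = prune_least_relevant(X, y, W_in, w_out, alpha)
```

Here the unit with the near-zero outgoing weight has the smallest relevance and is the one removed. In practice the pruning step would be interleaved with retraining, which this sketch omits.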

Mozer, M. & Smolensky, P.: 1989, Using relevance to reduce network size automatically, Connection Science, 1: 1, pp. 3--17.


Evaluating Pruning Methods - Thimm, Fiesler (1995)   (2 citations)

....by monitoring the sum of all weights changes during training [Karnin 90]. 3) The weight removal method of M. C. Mozer and P. Smolensky estimates the error induced by the removal of a connection based on a manipulation of the objective function (the function to be minimized by backpropagation) [Mozer 89]. 4) W. Finnoff et al. use a test statistic for the probability that a weight becomes zero, which is used in a pruning algorithm called autoprune [Finnoff 93]. Minimal Network Size and Generalization The pruning of a neural network is usually motivated by two aims: to obtain networks of a small ....

M. C. Mozer and P. Smolensky. Using relevance to reduce network size automatically. Connection Science, vol. 1, num. 1, pp. 3--16, 1989.


An Efficient Sequential Learning Algorithm for.. - Huang.. (2004)

No context found.

M. C. Mozer and P. Smolensky, "Using relevance to reduce network size automatically," Connection Sci., vol. 1, no. 1, pp. 3--16, 1989.


Extraction of Rules from Artificial Neural Networks for.. - Setiono, Leow, Zurada (2002)   (3 citations)

No context found.

M. C. Mozer and P. Smolensky, "Using relevance to reduce network size automatically," Connection Sci., vol. 1, no. 1, pp. 3--16, 1989.


Convergence Rate of Minimization Learning for Neural Networks - Mohamed, Minamoto, Niijima (1997)

No context found.

Mozer, M.C., and Smolensky, P.: Using relevance to reduce network size automatically. Connect. Sci. 1(1) (1989) 3--16


Convergence Rate of Minimization Learning for Neural Networks - Marghny Mohamed (1998)

No context found.

Mozer, M.C., and Smolensky, P.: Using relevance to reduce network size automatically. Connect. Sci. 1(1) (1989) 3--16


CiteSeer.IST - Copyright Penn State and NEC