| M. Mozer and P. Smolensky. Skeletonization: A technique for trimming the fat from a network via relevance assessment. In D. Touretzky, editor, Advances in Neural Information Processing Systems 1, pages 107--115. Morgan Kaufmann, San Mateo, CA, 1989. |
....senses (see for example [3]). The second approach is far more widely studied and consists of finding a subset of network synaptic weights that, when set to zero, lead to the smallest increase of an error measure at the output. Several methods have been proposed in recent years, in particular for the MLP [4,5,6,7,8]. However, they can be slow, requiring periodic retraining of the network, and often require a supervisor, a nonlocal algorithm which performs the pruning task. Moreover, some of them can require difficult fine-tuning of the pruning coefficients in order to work correctly. Therefore it is ....
M.C. Mozer and P. Smolensky, "Skeletonization: A technique for trimming the fat from a network via relevance assessment", in Advances in Neural Information Proc. 1, D.S. Touretzky, Ed. Morgan Kaufmann, 1989, pp. 107-115.
....(see for example [3]). The second approach is far more widely studied and consists of finding a subset of network synaptic weights that, when set to zero, lead to the smallest increase of an error measure at the output. Several methods have been proposed in recent years, in particular for the MLP [4,5,6,7,8]. However, they can be slow, requiring periodic retraining of the network, and often require a supervisor, a nonlocal algorithm which performs the pruning task. Moreover, some of them can require difficult fine-tuning of the pruning coefficients in order to work correctly. Therefore it is ....
M.C. Mozer and P. Smolensky, "Skeletonization: A technique for trimming the fat from a network via relevance assessment", in Advances in Neural Information Proc. 1, D.S. Touretzky, Ed. Morgan Kaufmann, 1989, pp. 107-115.
....presentation of the entire training data. The idea of weight decay is explored in [6], where the synaptic weights with low influence on the output error during learning are pruned off. A sensitivity measure of the error function to the elimination of a unit or a synaptic weight is proposed in [7,8] and used a posteriori to reduce the size of the network. A similar pruning procedure, based on second-derivative information, is reported in [9]. The proposed methods can be slow, requiring periodic retraining of the network, and often require a supervisor, a nonlocal algorithm which ....
M.C. Mozer and P. Smolensky, "Skeletonization: A technique for trimming the fat from a network via relevance assessment", in Advances in Neural Information Processing 1, D.S. Touretzky, Ed. Morgan Kaufmann, 1989.
....requires determining the relative weighting of this term with respect to the primary error term. These weightings may have to be adjusted over the learning phase; moreover, local minima correspond to compromise solutions that partially satisfy each of the error terms [7]. Mozer and Smolensky [10] have proposed a different approach to pruning units in a BP network, which is not a gradient descent procedure. In particular, they introduced the idea of estimating the sensitivity of the error function to the elimination of each unit, and used a linear error function for this instead of ....
M.C. MOZER, P. SMOLENSKY, "Skeletonization: A technique for trimming the fat from a network via relevance assessment", in Advances in Neural Information Proc. 1, D.S. Touretzky, Ed. Morgan Kaufmann, 1989, pp. 107-115.
....far fewer computational resources. Recent studies on finding a network of an appropriate size can be classified into three groups. The first group deals with network pruning: an approach that trains a large, redundant network and removes (prunes) the unimportant hidden units to obtain a smaller set [2,5,9,12]. This scheme has several merits. A larger network learns in fewer training epochs (1 epoch = one sweep through the entire training set) than a smaller one, and in some problems the pruning procedure can produce a small trained network that scarcely ever acquires the key feature of the ....
....of hidden layer units. One way to obtain a network of appropriate size is to initially train a large and overspecular network, and to remove the unimportant units later. This method is called network pruning. Various pruning methods for layered networks have been reported. Mozer and Smolensky [2] and Karnin [3] have tried to detect the omittable connections by calculating the sensitivity of the error function to the connection removal, $S_{ij} = \sum_{p} \left[ E^{p}(w_{ij} = 0) - E^{p}(w_{ij} = w_{ij}^{f}) \right]$ (9), where $w_{ij}^{f}$ is the link weight in the redundant network gained by training. On pruning, the link (or the unit) with ....
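The sensitivity in (9) can be evaluated literally by brute force: zero one link, re-evaluate the error over every training pattern, and restore the link. This costs one full pass over the data per link, which is the expense that derivative-based estimates such as Mozer and Smolensky's are meant to avoid. The sketch below is only an illustration under stated assumptions: a hypothetical single-layer linear network, a per-pattern squared error, and toy data; the names `forward`, `pattern_error`, and `removal_sensitivity` are illustrative and not taken from [2] or [3].

```python
# Brute-force link sensitivity S_ij = sum_p [E^p(w_ij = 0) - E^p(w_ij = trained)].
# Assumptions (hypothetical, for illustration): single-layer linear network,
# per-pattern squared error, toy weights and patterns.

def forward(W, x):
    """Outputs y_i = sum_j W[i][j] * x[j] of a single-layer linear network."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def pattern_error(W, x, t):
    """E^p = 1/2 * sum_i (t_i - y_i)^2 for one input/target pair."""
    y = forward(W, x)
    return 0.5 * sum((t_i - y_i) ** 2 for t_i, y_i in zip(t, y))


def removal_sensitivity(W, patterns):
    """Sum over patterns of the error increase caused by zeroing each link."""
    S = [[0.0] * len(row) for row in W]
    for i, row in enumerate(W):
        for j in range(len(row)):
            pruned = [list(r) for r in W]  # copy the trained weights
            pruned[i][j] = 0.0             # cut link (i, j)
            S[i][j] = sum(pattern_error(pruned, x, t) - pattern_error(W, x, t)
                          for x, t in patterns)
    return S


if __name__ == "__main__":
    W = [[2.0, 0.01]]  # one output unit, two inputs (toy values)
    patterns = [([1.0, 1.0], [2.01]), ([1.0, 0.0], [2.0])]
    S = removal_sensitivity(W, patterns)
    links = [(i, j) for i in range(len(S)) for j in range(len(S[i]))]
    candidate = min(links, key=lambda ij: S[ij[0]][ij[1]])
    print("least sensitive link:", candidate)  # prints: least sensitive link: (0, 1)
```

The link with the smallest $S_{ij}$ is the first pruning candidate; here the nearly useless weight 0.01 is selected.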
M.C. Mozer and P. Smolensky, Skeletonization: a technique for trimming the fat from a network via relevance assessment, Advances in Neural Information Processing 1, D. S. Touretzky, Ed. Morgan Kaufmann, 1990, pp. 177-185.
....removing entire neurons from an ANN rather than just links. In the same way that links are selected for removal in the OBD approach, we can remove the neurons whose removal has the least impact on the performance of a given ANN. This approach is proposed and discussed by Mozer and Smolensky in [MS89]. 3.6 Constructive Learning Methods: So far, I have only discussed ANN learning restricted to what I have defined as training, that is, given an ANN with predetermined links, the weights on these links are determined. Numerous constructive learning methods have been suggested to construct and ....
M. Mozer and P. Smolensky. Skeletonization: a technique for trimming the fat from a network via relevance assessment. In D. Touretzky, editor, Advances in Neural Information Processing Systems 1, pages 107--115. Morgan Kaufmann, 1989.
....of the saliency, a measure of the importance of the weights or the nodes, respectively. In each iteration some of the low-saliency objects are deleted. Network pruning techniques include optimal brain damage [9], optimal brain surgeon [16], and the skeletonizing algorithm of Mozer and Smolensky [31]. Network growing algorithms adopt a bottom-up approach: starting from a small network, nodes are added until a sufficiently small training error is reached. The best known of these techniques is cascade correlation [12]. The term cascade correlation is derived from ....
M.C. Mozer and P. Smolensky. Skeletonization: A technique for trimming the fat from a network via relevance assessment. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems, volume 1, pages 107-115, San Mateo, CA, 1989. Morgan Kaufmann.
....feature set. Several pruning procedures for neural networks have been proposed [23], but most of them focus on removing hidden nodes or connections and are not directly applicable to pruning irrelevant input nodes. Pruning procedures extended to the removal of input nodes were proposed in [8,12,14,17,18,24], where the variable selection process is typically based on a measure of the relevance of an input node, so that the less relevant features are removed. However, most of these techniques evaluate the relevance of input nodes during the training process, and thus they strictly depend on the adopted ....
M.C. Mozer, P. Smolensky, Skeletonization: A technique for trimming the fat from a network via relevance assessment, in: D.S. Touretzky (Ed.), Advances in Neural Information Processing Systems I, Morgan Kaufmann, San Mateo, CA, 1990.
....pruning means choosing input weights (one per input dimension) which indicate whether input dimension $i \in \{1, \ldots, n\}$ is important for the overall function or not. There exist several pruning algorithms for supervised neural networks: weight decay, skeletonization, and optimal brain damage [9, 14]. Several ideas can be transferred to unsupervised methods used for function approximation with some additional changes, such as substituting a differentiable version for the winner-takes-all function if necessary. One adaptation of these methods is presented in [12]. Other possibilities for ....
M. Mozer and P. Smolensky, Skeletonization: a technique for trimming the fat from a network via relevance assessment, in D. Touretzky (ed.), Advances in NIPS 1, Morgan Kaufmann, pp. 107-115, 1989.
....One involves using a larger network architecture at the beginning and pruning it down to near-optimum size. Learning algorithms using this general approach are called pruning algorithms. Examples include optimal brain damage [1], optimal brain surgeon [2], interactive pruning [3], and skeletonization [4]. See [5] and [6, chapter 13] for a good review of pruning algorithms. With the other approach, training begins with a minimal network and ends with a satisfactory network size. The algorithms using this approach are referred to as growth or constructive methods. Examples include the ....
Mozer, M.C., and P. Smolensky. Skeletonization: A technique for trimming the fat from a network via relevance assessment. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems (Denver,
....the error measure e. To improve the estimate for small errors, the absolute difference between the desired output $y_d$ and the actual output $y$ is used instead of the squared error: $e = \sum |y_d - y|$ (6.6). Feature selection is based on an iterative pruning utilising the relevance of each node [Mozer and Smolensky, 1989]. The relevance measure r is based on the difference of the error when the node (i.e. the feature column) is present and when it is removed from the network. The relevance is not directly calculated; it is estimated via the gating term $\alpha$ [Mozer and Smolensky, 1989]: $y_j = \alpha_j x w^T$ (6.7). By setting $\alpha_j = 0$ the feature has no influence, whereas for $\alpha_j = 1$ it is left in place: $r_j = e_{\alpha_j = 0} - e_{\alpha_j = 1}$ (6.8). The relevance r of a feature is approximated by $r_j = -\left.\frac{\partial e}{\partial \alpha_j}\right|_{\alpha_j = 1}$ (6.9) ....
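The gap between the exact relevance (6.8) and its derivative estimate (6.9) can be seen on a toy problem. The sketch below assumes a single linear unit whose output is $\sum_j \alpha_j x_j w_j$ over gated feature columns, with the absolute error of (6.6); the weights, data, and function names are hypothetical. Under these assumptions $e$ is piecewise linear in $\alpha_j$, so $-\partial e / \partial \alpha_j$ at $\alpha = 1$ reduces to $\sum_p \operatorname{sign}(y_d - y)\, w_j x_{pj}$.

```python
# Exact relevance r_j = e(alpha_j = 0) - e(alpha_j = 1) versus the estimate
# r_j ~ -de/dalpha_j at alpha = 1. Hypothetical setup: one linear unit,
# y = sum_j alpha_j * x_j * w_j, absolute error e = sum_p |y_d - y|.

def gated_output(w, x, alpha):
    """Linear unit whose input columns are gated by alpha."""
    return sum(a * x_j * w_j for a, x_j, w_j in zip(alpha, x, w))


def abs_error(w, data, alpha):
    """e = sum over patterns of |desired - actual|."""
    return sum(abs(d - gated_output(w, x, alpha)) for x, d in data)


def exact_relevance(w, data):
    """Remove each feature in turn (alpha_j = 0) and measure the error increase."""
    ones = [1.0] * len(w)
    base = abs_error(w, data, ones)
    r = []
    for j in range(len(w)):
        alpha = ones.copy()
        alpha[j] = 0.0
        r.append(abs_error(w, data, alpha) - base)
    return r


def estimated_relevance(w, data):
    """-de/dalpha_j at alpha = 1, i.e. sum_p sign(d - y) * w_j * x_pj.
    sign(0) is taken as 0 where |.| is not differentiable (a subgradient)."""
    ones = [1.0] * len(w)
    r = [0.0] * len(w)
    for x, d in data:
        resid = d - gated_output(w, x, ones)
        s = (resid > 0) - (resid < 0)
        for j in range(len(w)):
            r[j] += s * w[j] * x[j]
    return r


if __name__ == "__main__":
    w = [1.0, 0.5]
    data = [([1.0, 0.0], 1.2), ([0.0, 1.0], 0.6), ([1.0, 1.0], 1.7)]
    print("exact:    ", exact_relevance(w, data))
    print("estimated:", estimated_relevance(w, data))
```

With these numbers no residual changes sign when a feature is removed, so both functions give relevances of about 2.0 and 1.0; when removing a feature flips the sign of some residual, the two diverge, which is why (6.9) is only an approximation.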
Mozer, M. C. and Smolensky, P. (1989). Skeletonization: A technique for trimming the fat from a network via relevance assessment. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1, pages 107--115. Morgan Kaufmann Publishers.
....approaches to sensitivity analysis, which differ in the performance function used, i.e. with respect to the objective function to be minimized and with respect to the NN output function. Objective function sensitivity analysis has been used widely in pruning of NN parameters [5], [6], [7], [8], [9], to develop more sophisticated optimization techniques [10], [11], [12], and to study the robustness and stability of NNs [13], [14]. NN output sensitivity analysis, which is a study of the influence of small parameter perturbations on the output of the NN, has been used to study the ....
MC Mozer, P Smolensky, Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment, DS Touretzky (ed), Advances in Neural Information Processing Systems, Vol 1, 1989, pp 107-115.
....of predictive features and utilize these to construct the estimator g, discarding irrelevant or redundant features. Removal of irrelevant or redundant features usually speeds up the learning process; the constructed estimator usually generalizes better and often lends itself to easier interpretation [117]. In addition, a classification function utilizing a small number of original problem features often requires fewer resources to evaluate. The feature selection problem consists of eliminating as many features from the original n-dimensional feature space as possible, while still accurately ....
M. C. Mozer and P. Smolensky. Skeletonization: A technique for trimming the fat from a network via relevance assessment. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems I, pages 107--115, San Mateo, CA, 1989. Morgan Kaufmann.
....techniques, the selection is based entirely on the significance of each parameter to the classification of the network, and so only those parameters which really do not affect the classification are discarded. Most of these methods require a considerable amount of time for the pruning process [MS89, CH96, CWC97]. Treatment of missing data values: neural networks do not allow for the possibility that some input is missing, so this situation must be handled separately, since it is neither possible nor even desirable to reject from the investigation all the cases which have some missing values. One ....
Mozer M C, Smolensky P: Skeletonization: a technique for trimming the fat from a network via relevance assessment, In: Touretzky D. S. (ed.) Advances in Neural Information Processing Systems 1, San Mateo, CA, Morgan Kaufmann, 1989, 107-115.
....we calculate the principal components of the set of residuals obtained by successively omitting one network cell at a time. The vector representing the first principal component may reveal which cell can be excluded from the network. As a cell pruning method, this approach is similar to the one proposed by Mozer and Smolensky (1989). However, our approach has the advantage that the quantities used are based on the outcome of only one optimization procedure with all variables included. A more elaborate exposition of the procedure mentioned above can be found in Kaashoek and van Dijk (1998). The particular network, which ....
Mozer, M.C. & P. Smolensky, Skeletonization: a technique for trimming the fat from a network via relevance assessment, in D.S. Touretzky (Ed.) Advances in Neural Information Processing Systems, vol. 1, San Mateo, CA., 1989.
.... are two basic learning approaches: (1) constructive learning: start with a small network and add units and/or weights if necessary during training [8]; (2) destructive learning: start with a large network and delete units and/or delete (decay) weights that contribute little to learning, e.g. [10], [29], [21]. Some of these studies were done by Marchand et al. [25] and Frean [9]. In a study by [42], the network is trimmed by removing unimportant weights and even units. One approach to get rid of unimportant weights is to add a new term representing network complexity to the cost function and let it ....
M.C. Mozer and P. Smolensky, "Skeletonization: A Technique For Trimming the Fat From a Network Via Relevance Assessment", Technical Report, CMU-CS-421-89, Univ. of Colorado, Boulder, (1989).
.... the relevance of a node is to consider how well the network would perform in the absence of that node, in particular what the change in the error following its elimination would be: $\Delta E = E_{\text{without node}} - E_{\text{with node}}$ (42). To make this idea more mathematically precise we can introduce [98] an attentional coefficient $\alpha_j$ for node j which can have a value of 0 or 1. Thus if $v_j$ is the output of node j and it can feed node i (with output $u_i$), then $u_i = f\left(\sum_j W_{ij} \alpha_j v_j\right)$ (43), and $\alpha_j$ gates the contribution of node j to node i. This arrangement is shown in ....
....input nodes in a forward direction. We can approximate this difference by the derivative of the error with respect to $\alpha_j$: $\Delta E \approx -\left.\frac{\partial E}{\partial \alpha_j}\right|_{\alpha_j = 1}$ (45), which can be computed using an error back-propagation algorithm similar to standard Back Propagation [98, 62, 61]. This approximation saves computation without much loss of accuracy in assessing the attentional strength of each node. In practice this derivative fluctuates strongly, and it is best to moderate it with a component of the previous $\Delta E$ value during pruning: $\Delta E_j^{t+1} =$ ....
[Article contains additional citation context not shown here]
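Equations (43) and (45) can be illustrated on the last layer of a network. The sketch below assumes a single output unit with $f = \tanh$, a squared error $E = \frac{1}{2}(t - u)^2$, and fixed hidden outputs $v_j$; under these assumptions one back-propagation step gives $-\partial E / \partial \alpha_j = (t - u)(1 - u^2) W_j v_j$ at $\alpha = 1$. The moderation formula is cut off in the excerpt, so the exponential moving average with constant `beta = 0.8`, the zero starting value, and all function names are hypothetical stand-ins rather than the cited method.

```python
import math

# Attentional strength of gated units feeding one output, cf. eqs. (43) and (45).
# Assumptions (hypothetical): f = tanh, E = 1/2 (t - u)^2, hidden outputs v fixed,
# and an exponential moving average with constant beta for the smoothing step.

def output(W, v, alpha):
    """u = f(sum_j W_j * alpha_j * v_j) with f = tanh."""
    return math.tanh(sum(w * a * v_j for w, a, v_j in zip(W, alpha, v)))


def sq_error(W, v, alpha, target):
    """E = 1/2 (t - u)^2 for one pattern."""
    return 0.5 * (target - output(W, v, alpha)) ** 2


def attentional_strength(W, v, target):
    """-dE/dalpha_j at alpha = 1 for every unit j, for one pattern.
    One back-propagation step: dE/dalpha_j = -(t - u) * (1 - u^2) * W_j * v_j."""
    u = output(W, v, [1.0] * len(W))
    delta = (target - u) * (1.0 - u * u)  # -dE/dnet at the output unit
    return [delta * w * v_j for w, v_j in zip(W, v)]


def smoothed_strengths(W, stream, beta=0.8):
    """Blend each new estimate with the previous value to damp fluctuations."""
    rho = [0.0] * len(W)
    for v, target in stream:
        est = attentional_strength(W, v, target)
        rho = [beta * r + (1.0 - beta) * e for r, e in zip(rho, est)]
    return rho


if __name__ == "__main__":
    W = [0.5, -0.3, 0.8]
    stream = [([1.0, 0.2, -0.5], 0.3), ([0.4, 0.9, 0.1], -0.2)]
    rho = smoothed_strengths(W, stream)
    print("smoothed strengths:", rho)
    print("pruning candidate (lowest strength):", min(range(len(W)), key=lambda j: rho[j]))
```

The analytic derivative replaces one extra forward pass per node with a single backward pass, which is the saving the excerpt refers to; a larger `beta` damps the fluctuations more but reacts more slowly to genuine changes.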
M.C. Mozer. Skeletonization: a technique for trimming the fat from a network via relevance assessment. Neural Information Processing Systems 1, pages 107--115, 1989.
No context found.
M. Mozer and P. Smolensky. Skeletonization: A technique for trimming the fat from a network via relevance assessment. In D. Touretzky, editor, Advances in Neural Information Processing Systems 1, pages 107--115. Morgan Kaufmann, San Mateo, CA, 1989.
No context found.
Mozer, M. C. and Smolensky, P.: 1989, `Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment', in D. Touretzky (ed.), Advances in Neural Information Processing Systems 1, Morgan Kaufmann, San Mateo, California, pp. 107--115.
No context found.
M. C. Mozer and P. Smolensky, "Skeletonization: A technique for trimming the fat from a network via relevance assessment," in Advances in Neural Inform. Processing Syst. (NIPS 1988).
No context found.
M. C. Mozer and P. Smolensky, "Skeletonization: A technique for trimming the fat from a network via relevance assessment," Connection Sci., vol. 1, no. 1, pp. 3--26, 1989.
No context found.
M. C. Mozer and P. Smolensky, "Skeletonization: A technique for trimming the fat from a network via relevance assessment", Advances in Neural Information Processing I, D. S. Touretzky, Ed., Morgan Kaufmann, pp. 107-115, 1989.
No context found.
Mozer, M., and Smolensky, P. (1989). Skeletonization: A Technique for Trimming the Fat from a Network Via Relevance Assessment. Advances in Neural Information Processing 1. D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann Publishers.
No context found.
M.C. Mozer and P. Smolensky, "Skeletonization: A technique for trimming the fat from a network via relevance assessment," in Advances in Neural Information Processing Systems, D. Touretzky, Ed., vol. 1. Morgan Kaufmann, San Mateo, CA, US, 1989.
No context found.
M. C. Mozer and P. Smolensky 1988, "Skeletonization: A technique for trimming the fat from a network via relevance assessment," in Advances in Neural Information Processing Systems 1 (Denver, 1988) ed. D. S. Touretzky, pp. 107--115.
CiteSeer.IST - Copyright Penn State and NEC