Results 1  10
of
1,214
Greedy Function Approximation: A Gradient Boosting Machine
 Annals of Statistics
, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed for additi ..."
Abstract

Cited by 1000 (13 self)
 Add to MetaCart
Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed for additive expansions based on any tting criterion. Specic algorithms are presented for least{squares, least{absolute{deviation, and Huber{M loss functions for regression, and multi{class logistic likelihood for classication. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such \TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classication, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Shapire 1996, and Frie...
Protein Secondary Structure Prediction . . .
 JO MOL B/O/. (1999) 292, 195202 .GG
, 1999
"... ..."
How learning can guide evolution.
 Complex Syst.
, 1987
"... Abstract. The assumption that acquired characteristics are not inherited is often taken to imply that the adaptations that an organism learns during its lifetime cannot guide the course of evolution. This inference is incorrect ..."
Abstract

Cited by 408 (1 self)
 Add to MetaCart
(Show Context)
Abstract. The assumption that acquired characteristics are not inherited is often taken to imply that the adaptations that an organism learns during its lifetime cannot guide the course of evolution. This inference is incorrect
Regularization Theory and Neural Networks Architectures
 Neural Computation
, 1995
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Ba ..."
Abstract

Cited by 395 (32 self)
 Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
Distributed representations of words and phrases and their compositionality
 IN ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 2013
"... ..."
Residual Algorithms: Reinforcement Learning with Function Approximation
 In Proceedings of the Twelfth International Conference on Machine Learning
, 1995
"... A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general functionapproximation system, such ..."
Abstract

Cited by 307 (6 self)
 Add to MetaCart
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general functionapproximation system, such as a sigmoidal multilayer perceptron, a radialbasisfunction system, a memorybased learning system, or even a linear functionapproximation system. A new class of algorithms, residual gradient algorithms, is proposed, which perform gradient descent on the mean squared Bellman residual, guaranteeing convergence. It is shown, however, that they may learn very slowly in some cases. A larger class of algorithms, residual algorithms, is proposed that has the guaranteed convergence of the residual gradient algorithms, yet can retain the fast learning speed of direct algorithms. In fact, both direct and residual gradient algorithms are shown to be special cases of residual algorithms, and it is s...
Optimal Unsupervised Learning in a SingleLayer Linear Feedforward Neural Network
, 1989
"... A new approach to unsupervised learning in a singlelayer linear feedforward neural network is discussed. An optimality principle is proposed which is based upon preserving maximal information in the output units. An algorithm for unsupervised learning based upon a Hebbian learning rule, which achie ..."
Abstract

Cited by 293 (2 self)
 Add to MetaCart
A new approach to unsupervised learning in a singlelayer linear feedforward neural network is discussed. An optimality principle is proposed which is based upon preserving maximal information in the output units. An algorithm for unsupervised learning based upon a Hebbian learning rule, which achieves the desired optimality is presented, The algorithm finds the eigenvectors of the input correlation matrix, and it is proven to converge with probability one. An implementation which can train neural networks using only local "synaptic" modification rules is described. It is shown that the algorithm is closely related to algorithms in statistics (Factor Analysis and Principal Components Analysis) and neural networks (Selfsupervised Backpropagation, or the "encoder" problem). It thus provides an explanation of certain neural network behavior in terms of classical statistical techniques. Examples of the use of a linear network for solving image coding and texture segmentation problems are presented. Also, it is shown that the algorithm can be used to find "visual receptive fields" which are qualitatively similar to those found in primate retina and visual cortex.
Deep Neural Networks for Acoustic Modeling in Speech Recognition
"... Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative ..."
Abstract

Cited by 272 (47 self)
 Add to MetaCart
(Show Context)
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feedforward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks with many hidden layers, that are trained using new methods have been shown to outperform Gaussian mixture models on a variety of speech recognition benchmarks, sometimes by a large margin. This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition. I.
ContextDependent Pretrained Deep Neural Networks for Large Vocabulary Speech Recognition
 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING
, 2012
"... We propose a novel contextdependent (CD) model for large vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pretrained deep neural network hidden Markov model (DNNHMM) hybrid architecture that trains the DNN to pr ..."
Abstract

Cited by 254 (50 self)
 Add to MetaCart
We propose a novel contextdependent (CD) model for large vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pretrained deep neural network hidden Markov model (DNNHMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pretraining algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CDDNNHMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CDDNNHMMs can significantly outperform the conventional contextdependent Gaussian mixture model (GMM)HMMs, with an absolute sentence accuracy improvement of 5.8 % and 9.2 % (or relative error reduction of 16.0 % and 23.2%) over the CDGMMHMMs trained using the minimum phone error rate (MPE) and maximum likelihood (ML) criteria, respectively.