Results 1–10 of 5,646
Learning to rank using gradient descent
In ICML, 2005
"... We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data f ..."
Cited by 534 (17 self)
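The pairwise probabilistic cost at the heart of RankNet is compact enough to sketch. Below is a minimal NumPy illustration, assuming a linear scorer in place of the paper's neural network; the function name and toy data are illustrative, not from the paper.

```python
import numpy as np

def ranknet_pair_loss(w, xi, xj, target=1.0):
    """RankNet-style probabilistic cost for one document pair.

    A linear scorer s(x) = w @ x stands in for the paper's neural net.
    `target` is the desired probability that xi ranks above xj.
    """
    diff = w @ xi - w @ xj                 # score difference s_i - s_j
    p = 1.0 / (1.0 + np.exp(-diff))        # modeled P(xi ranked above xj)
    loss = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    grad = (p - target) * (xi - xj)        # d(loss)/dw
    return loss, grad

# one gradient-descent step on a toy pair (xi should outrank xj)
rng = np.random.default_rng(0)
w = rng.normal(size=5)
xi, xj = rng.normal(size=5), rng.normal(size=5)
loss, grad = ranknet_pair_loss(w, xi, xj)
w -= 0.1 * grad                            # loss decreases along -grad
```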
Greedy Function Approximation: A Gradient Boosting Machine
Annals of Statistics, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed for additi ..."
Cited by 1000 (13 self)
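For squared-error loss the negative functional gradient is simply the current residual, which makes the stagewise paradigm easy to sketch. The NumPy toy below uses regression stumps on 1-D inputs and a shrinkage factor `nu`; the names and constants are illustrative, and the paper develops the idea for general loss functions.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump for residuals r (1-D inputs)."""
    best = (np.inf, 0.0, 0.0, 0.0)
    for s in np.unique(x)[:-1]:
        left, right = r[x <= s], r[x > s]
        cl, cr = left.mean(), right.mean()
        err = ((left - cl) ** 2).sum() + ((right - cr) ** 2).sum()
        if err < best[0]:
            best = (err, s, cl, cr)
    return best[1:]                         # split point, left/right values

def gradient_boost(x, y, n_rounds=50, nu=0.1):
    """Stagewise additive fit: each stump approximates the negative
    gradient of squared loss, i.e. the current residuals."""
    f = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        r = y - f                           # negative gradient for L2 loss
        s, cl, cr = fit_stump(x, r)
        f += nu * np.where(x <= s, cl, cr)  # shrunken steepest-descent step
    return f
```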
Pegasos: Primal Estimated sub-GrAdient SOlver for SVM
"... We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a singl ..."
Cited by 542 (20 self)
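The per-iteration update is only a few lines. The sketch below follows the step schedule η_t = 1/(λt) and the optional projection onto the ball of radius 1/√λ described in the paper, but it is a simplified single-example reading, not the authors' code.

```python
import numpy as np

def pegasos(X, y, lam=0.1, T=1000, seed=0):
    """Stochastic subgradient descent for the linear SVM objective
    (lam/2)||w||^2 + mean hinge loss, one random example per step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, T + 1):
        i = rng.integers(len(y))
        eta = 1.0 / (lam * t)               # step size 1/(lambda t)
        margin = y[i] * (w @ X[i])
        w *= 1 - eta * lam                  # subgradient of the L2 term
        if margin < 1:                      # hinge subgradient is active
            w += eta * y[i] * X[i]
        norm = np.linalg.norm(w)            # optional projection step
        if norm > 1 / np.sqrt(lam):
            w *= 1 / (np.sqrt(lam) * norm)
    return w
```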
Algorithms for Nonnegative Matrix Factorization
In NIPS, 2001
"... Nonnegative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minim ..."
Cited by 1246 (5 self)
... The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.
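The two multiplicative rules analyzed for the Frobenius-norm objective fit in a few lines; the small epsilon guarding against division by zero is our addition, not part of the paper's updates.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-10):
    """Multiplicative updates minimizing ||V - WH||_F^2. Each factor is
    multiplied elementwise by a ratio of nonnegative terms, so W and H
    remain nonnegative throughout."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H, W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W, H fixed
    return W, H
```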
Learning Long-Term Dependencies with Gradient Descent is Difficult
To appear in IEEE Transactions on Neural Networks, special issue on recurrent networks
"... Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in th ..."
Cited by 389 (37 self)
... in the input/output sequences span long intervals. We show why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching on information ...
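The mechanism is easy to see numerically: in a linear recurrence h_t = W h_{t-1}, the Jacobian of h_T with respect to h_0 is W multiplied by itself T times, so back-propagated gradients shrink (or explode) geometrically with the spectral radius of W. A toy demonstration of this, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 20))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # set spectral radius to 0.9

for t in [1, 10, 50, 100]:
    J = np.linalg.matrix_power(W, t)       # Jacobian d h_t / d h_0
    print(t, np.linalg.norm(J, 2))         # decays roughly like 0.9**t
```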
Boosting Algorithms as Gradient Descent
2000
"... Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier h ..."
Cited by 156 (1 self)
... algorithm. Then, following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on ...
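In this view, each boosting round selects the weak hypothesis best aligned with the negative functional gradient of a margin cost C(F) = Σ_i c(y_i F(x_i)). The sketch below illustrates that generic recipe; it is not the paper's DOOM II algorithm, and the fixed step size is a simplification.

```python
import numpy as np

def boost(H, y, cost_grad, n_rounds=20, step=0.1):
    """Functional gradient descent over voted combinations.

    H: (n_hyp, n_samples) array of weak-hypothesis outputs in {-1, +1}.
    cost_grad: derivative c' of the margin cost c."""
    F = np.zeros(len(y))                    # current voted combination
    for _ in range(n_rounds):
        w = -cost_grad(y * F)               # per-example weights
        j = np.argmax(H @ (w * y))          # best alignment with -grad C
        F += step * H[j]                    # small fixed step for brevity
    return F

# the exponential cost c(m) = exp(-m) recovers an AdaBoost-like scheme
exp_cost_grad = lambda m: -np.exp(-m)
```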
Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems
IEEE Journal of Selected Topics in Signal Processing, 2007
"... Many problems in signal processing and statistical inference involve finding sparse solutions to underdetermined, or illconditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared ℓ2) error term combined with a spa ..."
Cited by 539 (17 self)
... sparseness-inducing (ℓ1) regularization term. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based deconvolution, and compressed sensing are a few well-known examples of this approach. This paper proposes gradient projection (GP) algorithms for the bound ...
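The bound-constrained reformulation splits x = u − v with u, v ≥ 0, so the gradient projection is a simple clip at zero. A minimal sketch with a fixed safe step size, whereas the paper's GP variants use more sophisticated step-size rules:

```python
import numpy as np

def gp_sparse(A, y, tau=0.1, n_iter=500):
    """Projected gradient for min 0.5||y - Ax||^2 + tau*||x||_1 via the
    split x = u - v, u >= 0, v >= 0 (projection = clipping at zero)."""
    n = A.shape[1]
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2     # step <= 1/Lipschitz const
    u, v = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        g = A.T @ (y - A @ (u - v))             # -gradient of the data term
        u = np.maximum(0.0, u - alpha * (tau - g))  # grad_u = tau - g
        v = np.maximum(0.0, v - alpha * (tau + g))  # grad_v = tau + g
    return u - v                                # recovered sparse x
```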
A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm
In IEEE International Conference on Neural Networks, 1993
"... A new learning algorithm for multilayer feedforward networks, RPROP, is proposed. To overcome the inherent disadvantages of pure gradientdescent, RPROP performs a local adaptation of the weightupdates according to the behaviour of the errorfunction. In substantial difference to other adaptive tech ..."
Cited by 938 (34 self)
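RPROP's update depends only on the sign of each partial derivative: a weight's individual step size grows while the sign persists and shrinks when it flips. Below is a simplified variant without the paper's weight-backtracking, using the suggested constants η+ = 1.2 and η− = 0.5:

```python
import numpy as np

def rprop_step(grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One RPROP-style update. Returns the weight change, the adapted
    per-weight step sizes, and the gradient to carry to the next call."""
    same_sign = grad * prev_grad
    step = np.where(same_sign > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same_sign < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(same_sign < 0, 0.0, grad)  # pause after a sign flip
    delta_w = -np.sign(grad) * step            # magnitude of grad is ignored
    return delta_w, step, grad
```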
SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization
2002
"... Sequential quadratic programming (SQP) methods have proved highly effective for solving constrained optimization problems with smooth nonlinear functions in the objective and constraints. Here we consider problems with general inequality constraints (linear and nonlinear). We assume that first deriv ..."
Cited by 597 (24 self)
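SNOPT itself handles large sparse problems, inequality constraints, and limited-memory quasi-Newton Hessian approximations, none of which fits in a short sketch. The toy below shows only the basic SQP step for an equality-constrained problem, where the quadratic subproblem reduces to a single KKT linear solve.

```python
import numpy as np

def sqp_step(x, grad_f, hess_f, c, jac_c):
    """One SQP iteration for min f(x) s.t. c(x) = 0: solve the KKT
    system of the quadratic model with linearized constraints."""
    g, Hm = grad_f(x), hess_f(x)
    cv, A = c(x), jac_c(x)
    m = len(cv)
    KKT = np.block([[Hm, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(KKT, np.concatenate([-g, -cv]))
    return x + sol[:len(x)]                 # take the full primal step

# toy problem: min x0^2 + x1^2  s.t.  x0 + x1 = 1  (solution [0.5, 0.5])
x = np.array([2.0, -1.0])
for _ in range(5):
    x = sqp_step(x,
                 grad_f=lambda x: 2 * x,
                 hess_f=lambda x: 2 * np.eye(2),
                 c=lambda x: np.array([x[0] + x[1] - 1]),
                 jac_c=lambda x: np.array([[1.0, 1.0]]))
print(x)                                    # -> [0.5 0.5]
```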
Coverage Control for Mobile Sensing Networks
2002
"... This paper presents control and coordination algorithms for groups of vehicles. The focus is on autonomous vehicle networks performing distributed sensing tasks where each vehicle plays the role of a mobile tunable sensor. The paper proposes gradient descent algorithms for a class of utility functio ..."
Cited by 582 (49 self)
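For the standard coverage cost, the descent direction for each vehicle points toward the centroid of its Voronoi region, giving a Lloyd-style iteration. A discretized toy version below; the paper treats a broader class of utility functions and continuous-time vehicle dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
agents = rng.random((5, 2))                       # 5 vehicles in the unit square
gx, gy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
pts = np.column_stack([gx.ravel(), gy.ravel()])   # discretized domain

for _ in range(30):
    d = np.linalg.norm(pts[:, None, :] - agents[None, :, :], axis=2)
    owner = d.argmin(axis=1)                      # Voronoi assignment
    for i in range(len(agents)):
        cell = pts[owner == i]
        if len(cell):                             # gradient step toward centroid
            agents[i] += 0.5 * (cell.mean(axis=0) - agents[i])
```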