Results 1–10 of 171,225
Parallelized stochastic gradient descent
 Advances in Neural Information Processing Systems 23, 2010
"... With the increase in available data parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms ..."
Cited by 85 (3 self)
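The snippet above describes the partition-train-average idea at a high level. Below is a minimal sketch of that idea, not the paper's exact algorithm: each worker runs plain SGD on its own shard of the data, and the resulting parameters are averaged. The quadratic per-example loss, learning rate, and data are illustrative assumptions:

```python
import random

def sgd_shard(data, lr=0.05, epochs=50, seed=0):
    """Run plain SGD on one shard: minimize the mean of (w - x)^2 over the shard."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the shard
            w -= lr * 2.0 * (w - x)            # gradient of (w - x)^2
    return w

def parallel_sgd(data, k=4):
    """Split data into k shards, run SGD on each independently, average the results."""
    shards = [data[i::k] for i in range(k)]
    ws = [sgd_shard(s, seed=i) for i, s in enumerate(shards)]
    return sum(ws) / len(ws)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w = parallel_sgd(data)  # the averaged-loss optimum is the data mean, 4.5
```

In practice each `sgd_shard` call would run on a separate machine or core; the single communication step (averaging) at the end is what makes this scheme attractive for distributed settings.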
Stochastic Gradient Descent Tricks
"... Abstract. Chapter 1 strongly advocates the stochastic backpropagation method to train neural networks. This is in fact an instance of a more general technique called stochastic gradient descent (SGD). This chapter provides background material, explains why SGD is a good learning algorithm when the ..."
Cited by 14 (0 self)
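As background to the snippet above: the basic SGD update replaces the full gradient with the gradient of a single example's loss, w ← w − γ ∇w Q(z, w). A minimal sketch on a one-parameter least-squares problem (the data, step size, and epoch count are illustrative assumptions, not from the chapter):

```python
import random

def sgd(samples, lr=0.01, epochs=100, seed=0):
    """Plain SGD for least squares y ~ a*x: one example per parameter update."""
    rng = random.Random(seed)
    a = 0.0
    for _ in range(epochs):
        for x, y in rng.sample(samples, len(samples)):  # shuffled pass
            a -= lr * (a * x - y) * x  # gradient of (a*x - y)^2 / 2
    return a

# Noise-free synthetic data from y = 3x.
samples = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 21)]
a = sgd(samples)  # converges toward a = 3
```

Each update costs O(1) in the number of examples, which is the source of SGD's advantage on large datasets.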
Stochastic Gradient Descent on GPUs
"... Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs. However, unlike in data-parallel algorithms, synchronization patterns in SGD are quite complex. Furthermore, scheduling for scale-free graphs is challenging. This work examines ..."
Beyond stochastic gradient descent
"... Machine learning for “big data” • Large-scale machine learning: large p, large n, large k – p: dimension of each observation (input) – n: number of observations – k: number of tasks (dimension of outputs) ..."
Conjugate Directions for Stochastic Gradient Descent
 In Dorronsoro (2002), 2002
"... The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) sett ..."
Cited by 6 (2 self)
Semi-Stochastic Gradient Descent Methods
, 2013
"... In this paper we study the problem of minimizing the average of a large number (n) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs in each of which a single full gradient and a random number of stochastic gradien ..."
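The snippet describes the S2GD structure: per epoch, one full gradient at a snapshot point, then a random number of cheap variance-reduced stochastic steps. A toy one-parameter sketch in that spirit, not the paper's exact method or step-size rules (the loss, constants, and epoch-length distribution are illustrative assumptions):

```python
import random

def s2gd(samples, lr=0.05, outer=30, max_inner=20, seed=0):
    """Semi-stochastic sketch for least squares y ~ a*x: each outer epoch pays
    for one full gradient, then takes a random number of inner steps using the
    variance-reduced estimate f_i'(a) - f_i'(snap) + full_grad(snap)."""
    rng = random.Random(seed)
    n = len(samples)
    a = 0.0
    for _ in range(outer):
        snap = a  # snapshot point for this epoch
        full = sum((snap * x - y) * x for x, y in samples) / n
        for _ in range(rng.randint(1, max_inner)):  # random epoch length
            x, y = samples[rng.randrange(n)]
            v = (a * x - y) * x - (snap * x - y) * x + full
            a -= lr * v
    return a

samples = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 21)]
a = s2gd(samples)  # converges toward a = 3
```

The correction term keeps `v` an unbiased estimate of the full gradient while its variance shrinks as the iterate approaches the snapshot, which is what allows a constant step size.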
Ant colony optimization and stochastic gradient descent
 Artificial Life, 2002
"... In this paper, we study the relationship between the two techniques known as ant colony optimization (ACO) and stochastic gradient descent. More precisely, we show that some empirical ACO algorithms approximate stochastic gradient descent in the space of pheromones, and we propose an implementation ..."
Cited by 19 (6 self)
Large-scale machine learning with stochastic gradient descent
 In COMPSTAT, 2010
"... Abstract. During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods is limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems ..."
Cited by 148 (1 self)
Stochastic Gradient Descent Training of Ensembles
 In Proceedings of the ECCTD, 2003
"... We show how to train Discrete Time Cellular Neural Networks (DTCNN) successfully by backpropagation to perform pattern recognition on a data set of handwritten digits. By using concepts and techniques from Machine Learning, we can outperform Support Vector Machines (SVM) on this problem. ..."
Semi-Stochastic Gradient Descent Methods
"... Many problems in data science (e.g. machine learning, optimization and statistics) can be cast as loss minimization problems of the form min_{x ∈ R^d} ..."