Results 1-10 of 4,687
Gradient-based learning applied to document recognition
 Proceedings of the IEEE
, 1998
"... Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify hi ..."
Cited by 1533 (84 self)
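The gradient-based learning idea this entry describes can be sketched in miniature. The following is a hypothetical toy example (a single logistic unit trained by batch gradient descent on the AND function), not the paper's convolutional architecture:

```python
import math

# Toy sketch of gradient-based learning: logistic regression trained by
# batch gradient descent on the AND function (linearly separable).
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 0, 0, 1]
w = [0.0, 0.0]
b = 0.0
lr = 0.5

def predict(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))     # sigmoid output

for _ in range(2000):
    gw = [0.0, 0.0]
    gb = 0.0
    for x, t in zip(X, y):
        err = predict(x) - t              # gradient of cross-entropy loss
        gw[0] += err * x[0]
        gw[1] += err * x[1]
        gb += err
    # averaged batch gradient step
    w[0] -= lr * gw[0] / len(X)
    w[1] -= lr * gw[1] / len(X)
    b -= lr * gb / len(X)

print([round(predict(x)) for x in X])     # → [0, 0, 0, 1]
```

The same loop generalizes to multilayer networks once the error term is backpropagated through hidden layers, which is the setting the paper studies.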
Greedy layer-wise training of deep networks
, 2006
"... Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multilayer neural networks have many levels of nonlinearities allow ..."
Cited by 394 (48 self)
nonlinearities allowing them to compactly represent highly nonlinear and highly varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently
A VARIABLE GRADIENT RARE EARTH PERMANENT ALPHA-MAGNET
"... We have developed a Rare Earth Permanent Magnet (REPM) αmagnet that is installed at the exit of a 1.6 MeV RadioFrequency (RF) electron gun, which, together with the injector accelerating structure, matches the longitudinal phase space to our 35 MeV RaceTrack Microtron (RTM) acceptance. The magnet, ..."
, two movable layers of NdFeB elements, can achieve gradients of 3.5 to 4.5 T/m with good field linearity. We have installed a cooled diaphragm in the α-magnet vacuum chamber so it can be placed at the maximally dispersed beam to obtain short gun beam bunches with small energy spread. Here we describe
Variable iterative methods for nonsymmetric systems of linear equations
 SIAM J. Numer. Anal.
, 1983
"... Abstract. We consider a class of iterative algorithms for solving systems of linear equations where the coefficient matrix is nonsymmetric with positivedefinite symmetric part. The algorithms are modelled after the conjugate gradient method, and are well suited for large sparse systems. They do not ..."
Cited by 241 (5 self)
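For reference, the classical conjugate gradient method that this paper's algorithms generalize (from symmetric positive-definite matrices to nonsymmetric ones with positive-definite symmetric part) can be sketched as follows; this is the standard textbook SPD algorithm, not the paper's variable iterative scheme:

```python
# Classical conjugate gradient for a symmetric positive-definite
# system A x = b, in pure Python for clarity.
def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = b[:]                       # residual r = b - A x (with x = 0)
    p = r[:]                       # initial search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        # new direction is A-conjugate to the previous ones
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]       # symmetric positive-definite
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)                            # ≈ [1/11, 7/11]
```

In exact arithmetic CG terminates in at most n iterations; the paper's contribution is handling the nonsymmetric case, where this short recurrence no longer applies directly.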
A Comparison of Algorithms for Maximum Entropy Parameter Estimation
"... A comparison of algorithms for maximum entropy parameter estimation Conditional maximum entropy (ME) models provide a general purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of class ..."
Cited by 290 (2 self)
parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the standardly used iterative scaling algorithms perform quite poorly in comparison
Fast human detection using a cascade of histograms of oriented gradients
 In CVPR '06
, 2006
"... We integrate the cascadeofrejectors approach with the Histograms of Oriented Gradients (HoG) features to achieve a fast and accurate human detection system. The features used in our system are HoGs of variablesize blocks that capture salient features of humans automatically. Using AdaBoost for fe ..."
Cited by 226 (0 self)
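The HoG feature the snippet refers to can be illustrated at the level of a single cell. This is a minimal sketch of one orientation histogram over a tiny grayscale patch (the paper's system adds variable-size blocks, normalization, and an AdaBoost-trained cascade on top of this primitive):

```python
import math

# Minimal sketch of one HoG cell: a magnitude-weighted histogram of
# gradient orientations (unsigned, 0-180 degrees) over a small patch.
def hog_cell(patch, n_bins=9):
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            bin_idx = int(ang / (180.0 / n_bins)) % n_bins
            hist[bin_idx] += mag                      # magnitude-weighted vote
    return hist

# A vertical intensity edge produces horizontal gradients (0 degrees),
# so all votes land in bin 0.
patch = [[0, 0, 10, 10]] * 4
print(hog_cell(patch))
```

Real detectors compute these cells densely, group them into blocks, and normalize each block to gain illumination invariance.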
Adaptive Probabilistic Networks with Hidden Variables
 Machine Learning
, 1997
"... . Probabilistic networks (also known as Bayesian belief networks) allow a compact description of complex stochastic relationships among several random variables. They are rapidly becoming the tool of choice for uncertain reasoning in artificial intelligence. In this paper, we investigate the problem ..."
Cited by 176 (9 self)
the problem of learning probabilistic networks with known structure and hidden variables. This is an important problem, because structure is much easier to elicit from experts than numbers, and the world is rarely fully observable. We present a gradient-based algorithm and show that the gradient can
A Nonlinear Primal-Dual Method For Total Variation-Based Image Restoration
, 1995
"... . We present a new method for solving total variation (TV) minimization problems in image restoration. The main idea is to remove some of the singularity caused by the nondifferentiability of the quantity jruj in the definition of the TVnorm before we apply a linearization technique such as Newton ..."
Cited by 232 (22 self)
such as Newton's method. This is accomplished by introducing an additional variable for the flux quantity appearing in the gradient of the objective function. Our method can be viewed as a primal-dual method as proposed by Conn and Overton [8] and Andersen [3] for the minimization of a sum of Euclidean
Hogwild!: A lock-free approach to parallelizing stochastic gradient descent
, 2011
"... Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve stateoftheart performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performancedestroying memory locking and synchronization. This work a ..."
Cited by 161 (9 self)
the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
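The lock-free update scheme described above can be sketched as follows. This is an illustrative sketch on synthetic sparse least-squares data, not the paper's implementation, and Python's GIL serializes bytecode, so it demonstrates the scheme rather than the speedup:

```python
import random
import threading

# HOGWILD!-style lock-free parallel SGD sketch: several threads update a
# shared weight vector with NO locks; each sparse example touches only a
# few coordinates, so conflicting writes are rare.
DIM = 100
w = [0.0] * DIM
# Synthetic sparse examples: a few active indices, target sum 1.0.
examples = [([random.randrange(DIM) for _ in range(3)], 1.0)
            for _ in range(200)]

def worker(steps, lr=0.1):
    for _ in range(steps):
        idxs, target = random.choice(examples)
        pred = sum(w[i] for i in idxs)   # sparse dot product
        g = pred - target                # squared-error gradient
        for i in idxs:
            w[i] -= lr * g               # unsynchronized write to shared w

threads = [threading.Thread(target=worker, args=(2000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

loss = sum((sum(w[i] for i in idxs) - tgt) ** 2
           for idxs, tgt in examples) / len(examples)
print(f"mean squared error after training: {loss:.4f}")
```

The paper's analysis shows that when updates are sparse, the occasional overwritten coordinate barely affects the convergence rate, which is why dropping the locks is safe.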
An Accelerated Gradient Method for Trace Norm Minimization
"... We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. Such formulation finds applications in many machine learning tasks including multitask learning, matrix classification, and matrix completion. The standard semidefinite programming formulati ..."
Cited by 111 (7 self)
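The key subroutine in proximal and accelerated gradient methods for trace-norm problems is the proximal operator, which soft-thresholds the singular values of the matrix iterate. A hedged sketch, shown for a diagonal matrix where the SVD is trivial (a full implementation would compute an SVD first):

```python
# Soft-thresholding: the proximal operator of tau * |.|,
# applied to singular values in trace-norm minimization.
def soft_threshold(v, tau):
    """Shrink v toward zero by tau; values within [-tau, tau] become 0."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

def prox_trace_norm_diag(diag, tau):
    # For a diagonal matrix the singular values are |diagonal entries|,
    # so the trace-norm prox reduces to entrywise soft-thresholding.
    return [soft_threshold(d, tau) for d in diag]

result = prox_trace_norm_diag([3.0, -0.5, 1.2], 1.0)
print(result)   # small entries are zeroed, large ones shrink by tau
```

This shrinkage is what drives iterates toward low rank: singular values below the threshold are set exactly to zero at each proximal step.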