Results 1–5 of 5
ON PROVING LINEAR CONVERGENCE OF COMPARISON-BASED STEP-SIZE ADAPTIVE RANDOMIZED SEARCH ON SCALING-INVARIANT FUNCTIONS VIA STABILITY OF MARKOV CHAINS
Abstract

Cited by 2 (1 self)
In the context of numerical optimization, this paper develops a methodology to analyze the linear convergence of comparison-based step-size adaptive randomized search (CB-SARS), a class of probabilistic derivative-free optimization algorithms in which the objective function is used solely through comparisons of candidate solutions. Various algorithms are included in the class of CB-SARS algorithms. On the one hand, it includes a few methods introduced as early as the 1960s: the step-size adaptive random search by Schumer and Steiglitz, the compound random search by Devroye, and simplified versions of Matyas' random optimization algorithm and of Kjellström and Taxén's Gaussian adaptation. On the other hand, it includes simplified versions of several recent algorithms: the covariance matrix adaptation evolution strategy (CMA-ES), the exponential natural evolution strategy (xNES), and the cross-entropy method. CB-SARS algorithms typically exhibit several invariances. First, invariance to composing the objective function with a strictly monotonic transformation, a direct consequence of the fact that the algorithms only use comparisons. Second, scale invariance, which reflects the fact that the algorithm has no intrinsic absolute notion of scale. The algorithms are investigated on scaling-invariant functions, defined as functions that preserve
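The comparison-based invariance described in the abstract can be illustrated with a minimal sketch. This is an assumed, generic (1+1)-ES with a 1/5th-success-rule step-size update, not any specific algorithm analyzed in the paper: because the objective enters only through a comparison, composing it with a strictly increasing transformation leaves the entire run unchanged.

```python
import random

def one_plus_one_es(f, x, sigma, iters=200, seed=0):
    """Minimal (1+1)-ES with a 1/5th-success-rule step-size update.

    Comparison-based: f enters only through the test f(y) <= f(x),
    so its actual values are never used.
    """
    rng = random.Random(seed)
    for _ in range(iters):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        if f(y) <= f(x):                    # only a comparison
            x, sigma = y, sigma * 1.5       # success: enlarge the step size
        else:
            sigma *= 1.5 ** (-0.25)         # failure: shrink the step size
    return x, sigma

sphere = lambda v: sum(vi * vi for vi in v)
x1, s1 = one_plus_one_es(sphere, [3.0, -2.0], 1.0)
# g(t) = t**3 + t is strictly increasing, so it preserves every comparison ...
x2, s2 = one_plus_one_es(lambda v: sphere(v) ** 3 + sphere(v), [3.0, -2.0], 1.0)
assert (x1, s1) == (x2, s2)  # ... and the two runs are identical
```

The same argument applies to any strictly increasing transformation, which is exactly the first invariance the abstract mentions.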
LM-CMA: An Alternative to L-BFGS for Large Scale Black-box Optimization
Abstract

Cited by 1 (0 self)
The limited memory BFGS method (L-BFGS) of Liu and Nocedal (1989) is often considered the method of choice for continuous optimization when first- and/or second-order information is available. However, the use of L-BFGS can be complicated in a black-box scenario where gradient information is not available and must therefore be estimated numerically. The accuracy of this estimation, obtained by finite difference methods, is often problem-dependent and may lead to premature convergence of the algorithm. In this paper, we demonstrate an alternative to L-BFGS, the limited memory Covariance Matrix Adaptation Evolution Strategy (LM-CMA) proposed by Loshchilov (2014). The LM-CMA is a stochastic derivative-free algorithm for numerical optimization of non-linear, non-convex problems. Inspired by L-BFGS, the LM-CMA samples candidate solutions according to a covariance matrix reconstructed from m direction vectors selected during the optimization process. The decomposition of the covariance matrix into Cholesky factors reduces the memory complexity to O(mn), where n is the number of decision variables. The time complexity of sampling one candidate solution is also O(mn), but amounts to only about 25 scalar-vector multiplications in practice. The algorithm is invariant with respect to strictly increasing transformations of the objective function: such transformations do not compromise its ability to approach the optimum. The LM-CMA outperforms the original CMA-ES and its large scale versions on non-separable ill-conditioned problems by a factor that increases with problem dimension. Its invariance properties do not prevent it from performing comparably to L-BFGS on non-trivial large scale smooth and non-smooth optimization problems.
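The limited-memory sampling idea in the abstract can be sketched as follows. This is a hedged illustration of the general principle, not Loshchilov's exact update rule: the function name `sample_candidate`, the fixed coefficient 0.1, and the normalization of the direction vectors are all invented for the example. Representing the Cholesky factor as a product of m rank-one updates means only the m direction vectors are stored, and applying the factor to a standard normal vector costs O(mn).

```python
import numpy as np

def sample_candidate(z, directions, coeffs):
    """Apply a limited-memory Cholesky factor to a standard normal z.

    The factor is the product of m rank-one updates
    A = (I + b_1 v_1 v_1^T) ... (I + b_m v_m v_m^T),
    so only the m direction vectors are stored (O(mn) memory) and one
    application costs m dot products and axpys (O(mn) time).
    """
    x = z.copy()
    for v, b in zip(directions, coeffs):
        x = x + b * v * (v @ x)     # (I + b v v^T) x in O(n)
    return x

rng = np.random.default_rng(0)
n, m = 1000, 5                      # n decision variables, m stored directions
directions = [rng.standard_normal(n) / np.sqrt(n) for _ in range(m)]
coeffs = [0.1] * m                  # illustrative fixed coefficients
y = sample_candidate(rng.standard_normal(n), directions, coeffs)
```

Note the contrast with a full covariance matrix, which would need O(n^2) memory and an O(n^2) matrix-vector product per sample.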
How to Center Deep Boltzmann Machines
, 2016
Abstract
This work analyzes centered Restricted Boltzmann Machines (RBMs) and centered Deep Boltzmann Machines (DBMs), where centering is done by subtracting offset values from the visible and hidden variables. We show analytically that (i) centered and normal Boltzmann Machines (BMs), and thus RBMs and DBMs, are different parameterizations of the same model class, such that any normal BM/RBM/DBM can be transformed into an equivalent centered BM/RBM/DBM and vice versa, and that this equivalence generalizes to artificial neural networks in general; (ii) the expected performance of centered binary BMs/RBMs/DBMs is invariant under a simultaneous flip of data and offsets, for any offset value in the range of zero to one; (iii) centering can be reformulated as a different update rule for normal BMs/RBMs/DBMs; and (iv) using the enhanced gradient is equivalent to setting the offset values to the average of the model and data means. Furthermore, we present numerical simulations suggesting that (i) optimal generative performance is achieved by subtracting mean values from visible as well as hidden variables; (ii) centered binary RBMs/DBMs reach significantly higher log-likelihood values than normal binary RBMs/DBMs; (iii) centering variants whose offsets depend on the model mean, like the enhanced gradient, suffer from severe divergence problems; (iv) learning is stabilized if an exponentially moving average over the batch means is used for the offset values instead of the current batch mean, which also prevents the enhanced gradient from severe divergence; (v) at a similar level of log-likelihood, centered binary RBMs/DBMs have smaller weights and larger bias parameters than normal binary RBMs/DBMs; (vi) centering leads to an update direction that is closer to the natural gradient, which is extremely efficient for training, as we show for small binary RBMs; (vii) centering eliminates the need for greedy layer-wise pre-training of DBMs, which often even deteriorates the results independently of whether centering is used or not; and (ix) centering is also beneficial for autoencoders.
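The centering trick can be sketched for a binary RBM trained with one step of contrastive divergence. This is an assumed minimal implementation, not the paper's code: the function name `centered_cd1_weight_grad` is invented, and the offset choices shown (data mean for the visible offsets, 0.5 for the hidden offsets) are just one of the variants the abstract discusses.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def centered_cd1_weight_grad(v_data, W, b, c, mu, lam, rng):
    """CD-1 weight gradient for a centered binary RBM (illustrative).

    mu and lam are the visible/hidden offsets; centering subtracts them
    from the variables before forming the weight gradient.
    """
    ph_data = sigmoid((v_data - mu) @ W + c)          # positive phase
    h = (rng.random(ph_data.shape) < ph_data).astype(float)
    pv_model = sigmoid((h - lam) @ W.T + b)           # one Gibbs step back
    ph_model = sigmoid((pv_model - mu) @ W + c)
    return ((v_data - mu).T @ (ph_data - lam)
            - (pv_model - mu).T @ (ph_model - lam)) / len(v_data)

rng = np.random.default_rng(1)
nv, nh, batch = 6, 4, 32
W = 0.1 * rng.standard_normal((nv, nh))
b, c = np.zeros(nv), np.zeros(nh)
v = (rng.random((batch, nv)) < 0.5).astype(float)     # toy binary data
mu, lam = v.mean(axis=0), np.full(nh, 0.5)            # offsets: data mean, 0.5
grad_W = centered_cd1_weight_grad(v, W, b, c, mu, lam, rng)
```

Setting mu and lam to zero recovers the normal (uncentered) CD-1 weight gradient, which is the parameterization-equivalence point (i) of the abstract.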
CMA-ES: A Function Value Free Second Order Optimization Method
, 2015
Abstract

PGMO COPI'14 manuscript (deposited on the HAL open-access archive).