Results 1–10 of 32
Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search
In Proceedings of the 26th International Conference on Machine Learning (ICML), 2009
Cited by 39 (4 self)
Uncertainty arises in reinforcement learning from various sources, and it is therefore necessary to consider statistics based on several rollouts for evaluating behavioral policies. We add adaptive uncertainty handling based on Hoeffding and empirical Bernstein races to CMA-ES, a variable-metric evolution strategy proposed for direct policy search. The uncertainty handling individually adjusts the number of episodes considered for the evaluation of each policy. The performance estimation is kept just accurate enough for a sufficiently good ranking of candidate policies, which in turn suffices for CMA-ES to find better solutions. This increases both the learning speed and the robustness of the algorithm.
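As a rough, hedged illustration of the racing idea described in this abstract (not the authors' implementation; the function names, the fixed return range, and the confidence level are assumptions), a Hoeffding-based stopping rule for adaptively choosing the number of rollouts can be sketched as:

```python
import math

def hoeffding_radius(n, value_range, delta):
    """Half-width of a (1 - delta)-confidence Hoeffding interval for the
    mean of n i.i.d. samples bounded within value_range."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def needs_more_rollouts(returns_a, returns_b, value_range=1.0, delta=0.05):
    """True while the confidence intervals of two policies' mean returns
    overlap, i.e. while their ranking is still statistically uncertain."""
    mean_a = sum(returns_a) / len(returns_a)
    mean_b = sum(returns_b) / len(returns_b)
    radius = (hoeffding_radius(len(returns_a), value_range, delta)
              + hoeffding_radius(len(returns_b), value_range, delta))
    return abs(mean_a - mean_b) <= radius
```

The race spends extra episodes only on policies whose intervals still overlap, which is the "just accurate enough for ranking" behavior the abstract describes.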
Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
2011
Cited by 26 (6 self)
We present a canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space X into a continuous-time black-box optimization method on X, the information-geometric optimization (IGO) method. Invariance as a major design principle keeps the number of arbitrary choices to a minimum. The resulting method conducts a natural gradient ascent using an adaptive, time-dependent transformation of the objective function, and makes no particular assumptions about the objective function to be optimized. The IGO method produces explicit IGO algorithms through time discretization. The cross-entropy method is recovered in a particular case with a large time step, and can be extended into a smoothed, parametrization-independent maximum likelihood update. When applied to specific families of distributions on discrete or continuous spaces, the IGO framework naturally recovers versions …
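A minimal sketch of the flavor of update the abstract describes, assuming a product of independent Bernoulli distributions as the family (for which the natural-gradient step reduces to a PBIL-like update of the means); the names and the truncation weighting are illustrative assumptions, not the paper's algorithms:

```python
import random

def igo_bernoulli_step(theta, objective, sample_size, dt, rng):
    """One IGO-style natural-gradient step for a product of Bernoulli
    distributions over {0, 1}^d, parameterized by the means theta.
    Samples are ranked by the objective (maximization); theta moves
    along the weighted natural gradient, which for Bernoulli means is
    a weighted average of (x - theta) over the selected samples."""
    samples = [[1 if rng.random() < t else 0 for t in theta]
               for _ in range(sample_size)]
    ranked = sorted(samples, key=objective, reverse=True)
    k = sample_size // 2  # truncation weights: top half gets weight 1/k
    return [min(1.0, max(0.0, t + dt * sum(x[i] - t for x in ranked[:k]) / k))
            for i, t in enumerate(theta)]
```

Iterating this step on a OneMax-style objective drives the Bernoulli means toward the optimum while staying inside [0, 1].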
An efficient algorithm for computing hypervolume contributions
 Evolutionary Computation
Cited by 11 (6 self)
The hypervolume indicator serves as a sorting criterion in many recent multi-objective evolutionary algorithms (MOEAs). Typical algorithms remove the solution with the smallest loss with respect to the dominated hypervolume from the population. We present a new algorithm which determines, for a population of size n with d objectives, a solution with minimal hypervolume contribution in time O(n^(d/2) log n) for d > 2. This improves all previously published algorithms by a factor of n for all d > 3 and by a factor of √n for d = 3. We also analyze hypervolume-indicator-based optimization algorithms which remove λ > 1 solutions from a population of size n = µ + λ. We show that there are populations such that the hypervolume contribution of iteratively chosen λ solutions is much larger than the hypervolume contribution of an optimal set of λ solutions. Selecting the optimal set of λ solutions implies calculating (n choose µ) conventional hypervolume contributions, which is considered to be computationally too expensive. We present the first hypervolume algorithm which directly calculates the contribution of every set of λ solutions. This gives an additive term of (n choose µ) in the runtime of the calculation instead of a multiplicative factor of (n choose µ).
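For intuition only, the exclusive hypervolume contributions the abstract discusses can be computed for the special case d = 2 by a simple sweep in O(n log n); this sketch (minimization in both objectives, with an assumed reference point) is not the paper's O(n^(d/2) log n) algorithm:

```python
def hypervolume_contributions_2d(points, reference):
    """Exclusive hypervolume contribution of each point of a mutually
    non-dominated 2-D set (minimization), relative to the reference
    point bounding the dominated region from above."""
    rx, ry = reference
    pts = sorted(points)  # ascending x implies descending y for a non-dominated set
    contribs = {}
    for i, (x, y) in enumerate(pts):
        right = pts[i + 1][0] if i + 1 < len(pts) else rx  # next point bounds the slab in x
        above = pts[i - 1][1] if i > 0 else ry             # previous point bounds it in y
        contribs[(x, y)] = (right - x) * (above - y)
    return contribs
```

Removing the point with the smallest returned value is exactly the "remove the solution with the smallest loss" step the abstract refers to.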
Maximum Likelihood Model Selection for 1-Norm Soft Margin SVMs with Multiple Parameters
2010
Cited by 8 (1 self)
Adapting the hyperparameters of support vector machines (SVMs) is a challenging model selection problem, especially when flexible kernels are to be adapted and data are scarce. We present a coherent framework for regularized model selection of 1-norm soft margin SVMs for binary classification. It is proposed to use gradient ascent on a likelihood function of the hyperparameters. The likelihood function is based on logistic regression for robustly estimating the class conditional probabilities and can be computed efficiently. Overfitting is an important issue in SVM model selection and can be addressed in our framework by incorporating suitable prior distributions over the hyperparameters. We show empirically that gradient-based optimization of the likelihood function is able to adapt multiple kernel parameters and leads to better models than four concurrent state-of-the-art methods.
Active covariance matrix adaptation for the (1+1)-CMA-ES
2010
Cited by 7 (2 self)
Tight bounds for the approximation ratio of the hypervolume indicator
In Proc. 11th International Conference on Parallel Problem Solving from Nature (PPSN XI), volume 6238 of LNCS, 2010
Cited by 7 (5 self)
The hypervolume indicator is widely used to guide the search and to evaluate the performance of evolutionary multi-objective optimization algorithms. It measures the volume of the dominated portion of the objective space, which is considered to give a good approximation of the Pareto front. Surprisingly little is known theoretically about the quality of this approximation. We examine the multiplicative approximation ratio achieved by two-dimensional sets maximizing the hypervolume indicator and prove that it deviates significantly from the optimal approximation ratio. This provable gap is even exponential in the ratio between the largest and the smallest value of the front. We also examine the additive approximation ratio of the hypervolume indicator and prove that it achieves the optimal additive approximation ratio apart from a small factor √(n/(n − 2)), where n is the size of the population. Hence the hypervolume indicator can be used to achieve a very good additive but not a good multiplicative approximation of a Pareto front.
Approximation Quality of the Hypervolume Indicator
2012
Cited by 5 (5 self)
In order to allow a comparison of (otherwise incomparable) sets, many evolutionary multi-objective optimizers use indicator functions to guide the search and to evaluate the performance of search algorithms. The most widely used indicator is the hypervolume indicator. It measures the volume of the dominated portion of the objective space bounded from below by a reference point. Though the hypervolume indicator is very popular, it has not been shown that maximizing the hypervolume indicator of sets of bounded size is indeed equivalent to the overall objective of finding a good approximation of the Pareto front. To address this question, we compare the optimal approximation ratio with the approximation ratio achieved by two-dimensional sets maximizing the hypervolume indicator. We bound the optimal multiplicative approximation ratio of n points by 1 + Θ(1/n) for arbitrary Pareto fronts. Furthermore, we prove that the same asymptotic approximation ratio is achieved by sets of n points that maximize the hypervolume indicator. However, there is a provable gap between the two approximation ratios which is even exponential in the ratio between the largest and the smallest value of the front. We also examine the additive approximation ratio of the hypervolume indicator in two dimensions and prove that it achieves the optimal additive approximation ratio apart from a small ratio √(n/(n − 2)), where n is the size of the population. Hence the hypervolume indicator can be used to achieve a good additive but not a good multiplicative approximation of a Pareto front. This motivates the introduction of a “logarithmic hypervolume indicator” which provably achieves a good multiplicative approximation ratio.
Convergence of Hypervolume-Based Archiving Algorithms II: Competitiveness
Cited by 4 (1 self)
We study the convergence behavior of (µ + λ)-archiving algorithms. A (µ + λ)-archiving algorithm defines how to choose, in each generation, µ children from the µ parents and λ offspring together. Archiving algorithms have to choose individuals online, without knowing future offspring. Previous studies assumed the offspring generation to be best-case. We assume the initial population and the offspring generation to be worst-case and use the competitive ratio to measure how much smaller a hypervolume an archiving algorithm finds due to not knowing the future in advance. We prove that all archiving algorithms which increase the hypervolume in each step (if they can) are only µ-competitive. We also present a new archiving algorithm which is (4 + 2/µ)-competitive. This algorithm not only achieves a constant competitive ratio, but is also efficiently computable. Both properties provably do not hold for the commonly used greedy archiving algorithms, for example those used in SIBEA, SMS-EMOA, or the generational MO-CMA-ES.
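The commonly used greedy archiving step that the abstract contrasts against can be sketched as follows for two objectives (minimization, with an assumed reference point); this is an illustrative sketch, not the paper's (4 + 2/µ)-competitive algorithm:

```python
def hv_2d(points, reference):
    """Hypervolume (2-D minimization) dominated by `points` and bounded
    above by `reference`, via a left-to-right sweep."""
    rx, ry = reference
    hv, prev_y = 0.0, ry
    for x, y in sorted(points):
        if y < prev_y:  # points dominated within this sweep add nothing
            hv += (rx - x) * (prev_y - y)
            prev_y = y
    return hv

def greedy_archive_step(parents, offspring, mu, reference):
    """One greedy (mu + lambda) archiving step: pool parents and
    offspring, then repeatedly discard the point whose removal costs the
    least hypervolume, until mu points remain."""
    pool = list(parents) + list(offspring)
    while len(pool) > mu:
        total = hv_2d(pool, reference)
        worst = min(range(len(pool)),
                    key=lambda i: total - hv_2d(pool[:i] + pool[i + 1:], reference))
        pool.pop(worst)
    return pool
```

Because each discard is decided myopically against the current pool, an adversarial offspring sequence can lead this greedy rule far from the best achievable hypervolume, which is the competitiveness gap the paper analyzes.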
Characterizing Reinforcement Learning Methods through Parameterized Learning Problems
2011
Cited by 4 (0 self)
The field of reinforcement learning (RL) has been energized in the past few decades by elegant theoretical results indicating under what conditions, and how quickly, certain algorithms are guaranteed to converge to optimal policies. However, in practical problems, these conditions are seldom met. When we cannot achieve optimality, the performance of RL algorithms must be measured empirically. Consequently, in order to meaningfully differentiate learning methods, it becomes necessary to characterize their performance on different problems, taking into account factors such as state estimation, exploration, function approximation, and constraints on computation and memory. To this end, we propose parameterized learning problems, in which such factors can be controlled systematically and their effects on learning methods characterized through targeted studies. Apart from providing very precise control of the parameters that affect learning, our parameterized learning problems enable benchmarking against optimal behavior; their relatively small sizes facilitate extensive experimentation. Based on a survey of existing RL applications, in this article we focus our attention on two predominant, “first-order” factors: partial observability and function approximation. We design …