Results 1  10
of
12
Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
"... We consider a global optimization problem of a deterministic functionf in a semimetric space, given a finite budget ofnevaluations. The functionf is assumed to be locally smooth (around one of its global maxima) with respect to a semimetric ℓ. We describe two algorithms based on optimistic explorat ..."
Abstract

Cited by 20 (4 self)
 Add to MetaCart
We consider a global optimization problem of a deterministic functionf in a semimetric space, given a finite budget ofnevaluations. The functionf is assumed to be locally smooth (around one of its global maxima) with respect to a semimetric ℓ. We describe two algorithms based on optimistic exploration that use a hierarchical partitioning of the space at all scales. A first contribution is an algorithm, DOO, that requires the knowledge of ℓ. We report a finitesample performance bound in terms of a measure of the quantity of nearoptimal states. We then define a second algorithm, SOO, which does not require the knowledge of the semimetric ℓ under which f is smooth, and whose performance is almost as good as DOO optimallyfitted. 1
PAC Optimal Exploration in Continuous Space Markov Decision Processes
"... Current exploration algorithms can be classified in two broad categories: Heuristic, and PAC optimal. While numerous researchers have used heuristic approaches such as ɛgreedy exploration successfully, such approaches lack formal, finite sample guarantees and may need a significant amount of finetu ..."
Abstract

Cited by 6 (1 self)
 Add to MetaCart
(Show Context)
Current exploration algorithms can be classified in two broad categories: Heuristic, and PAC optimal. While numerous researchers have used heuristic approaches such as ɛgreedy exploration successfully, such approaches lack formal, finite sample guarantees and may need a significant amount of finetuning to produce good results. PAC optimal exploration algorithms, on the other hand, offer strong theoretical guarantees but are inapplicable in domains of realistic size. The goal of this paper is to bridge the gap between theory and practice, by introducing CPACE, an algorithm which offers strong theoretical guarantees and can be applied to interesting, continuous space problems. 1 Introduction and
Optimistic planning for beliefaugmented Markov decision processes
 In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL
, 2013
"... Abstract—This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel modelbased Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Processes (OPMDP) algorithm [10], [9] to contexts where the transition mode ..."
Abstract

Cited by 3 (2 self)
 Add to MetaCart
(Show Context)
Abstract—This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel modelbased Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Processes (OPMDP) algorithm [10], [9] to contexts where the transition model of the MDP is initially unknown and progressively learned through interactions within the environment. The knowledge about the unknown MDP is represented with a probability distribution over all possible transition models using Dirichlet distributions, and the BOP algorithm plans in the beliefaugmented state space constructed by concatenating the original state vector with the current posterior distribution over transition models. We show that BOP becomes Bayesian optimal when the budget parameter increases to infinity. Preliminary empirical validations show promising performance. I.
F.: Optimized lookahead tree policies: a bridge between lookahead tree policies and direct policy search
 International Journal of Adaptive Control and Signal Processing
, 2013
"... ar ..."
(Show Context)
Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
"... We consider a global optimization problem of a deterministic function f in a semimetric space, given a finite budget of n evaluations. The function f is assumed to be locally smooth (around one of its global maxima) with respect to a semimetric ℓ. We describe two algorithms based on optimistic exp ..."
Abstract
 Add to MetaCart
We consider a global optimization problem of a deterministic function f in a semimetric space, given a finite budget of n evaluations. The function f is assumed to be locally smooth (around one of its global maxima) with respect to a semimetric ℓ. We describe two algorithms based on optimistic exploration that use a hierarchical partitioning of the space at all scales. A first contribution is an algorithm, DOO, that requires the knowledge of ℓ. We report a finitesample performance bound in terms of a measure of the quantity of nearoptimal states. We then define a second algorithm, SOO, which does not require the knowledge of the semimetric ℓ under which f is smooth, and whose performance is almost as good as DOO optimallyfitted. 1
Team SequeL
"... We describe an online planning algorithm for finiteaction, sparsely stochastic Markov decision processes, in which the random state transitions can only end up in a small number of possible next states. The algorithm builds a planning tree by iteratively expanding states, where the most promising st ..."
Abstract
 Add to MetaCart
We describe an online planning algorithm for finiteaction, sparsely stochastic Markov decision processes, in which the random state transitions can only end up in a small number of possible next states. The algorithm builds a planning tree by iteratively expanding states, where the most promising states are expanded first, in an optimistic procedure aiming to return a good action after a strictly limited number of expansions. The novel algorithm is called optimistic planning for sparsely stochastic systems.
CHAPTER 22 OPTIMISTIC PLANNING IN MARKOV DECISION PROCESSES
"... We review a class of online planning algorithms for deterministic and stochastic optimal control problems, modeled as Markov decision processes. At each discrete time step, these algorithms maximize the predicted value of planning policies from the current state, and apply the first action of the be ..."
Abstract
 Add to MetaCart
(Show Context)
We review a class of online planning algorithms for deterministic and stochastic optimal control problems, modeled as Markov decision processes. At each discrete time step, these algorithms maximize the predicted value of planning policies from the current state, and apply the first action of the best policy found. An overall recedinghorizon algorithm results, which can also be seen as a type of modelpredictive control. The space of planning policies is explored optimistically, focusing on areas with largest upper bounds on the value – or upper confidence bounds, in the stochastic case. The resulting optimistic planning framework integrates several types of optimism previously used in planning, optimization, and reinforcement learning, in order to obtain several intuitive algorithms with good performance guarantees. We describe in detail three recent such algorithms, outline the theoretical guarantees on their performance, and illustrate their behavior in a numerical example. Optimistic planning in Markov decision processes. By Buşoniu, Munos, and Babuška
An analysis of optimistic, bestfirst search for minimax sequential decision making
"... Abstract—We consider problems in which a maximizer and a minimizer agent take actions in turn, such as games or optimal control with uncertainty modeled as an opponent. We extend the ideas of optimistic optimization to this setting, obtaining a search algorithm that has been previously considered as ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract—We consider problems in which a maximizer and a minimizer agent take actions in turn, such as games or optimal control with uncertainty modeled as an opponent. We extend the ideas of optimistic optimization to this setting, obtaining a search algorithm that has been previously considered as the bestfirst search variant of the B * method. We provide a novel analysis of the algorithm relying on a certain structure for the values of action sequences, under which earlier actions are more important than later ones. An asymptotic branching factor is defined as a measure of problem complexity, and it is used to characterize the relationship between computation invested and nearoptimality. In particular, when action importance decreases exponentially, convergence rates are obtained. Throughout, examples illustrate analytical concepts such as the branching factor. In an empirical study, we compare the optimistic bestfirst algorithm with two classical game tree search methods, and apply it to a challenging HIV infection control problem. I.
Team SequeL
, 2012
"... HAL is a multidisciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract
 Add to MetaCart
HAL is a multidisciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et a ̀ la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés. Optimistic planning for Markov decision processes
Optimistic Optimization of a Deterministic Function
, 2013
"... optimization of deterministic functions ..."