Results 1–10 of 15
Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories
ANN OPER RES, 2012
Multistage stochastic programming: A scenario tree based approach to planning under uncertainty
APPLICATIONS IN ARTIFICIAL INTELLIGENCE: CONCEPTS AND SOLUTIONS, CHAPTER 6, 2011
Abstract

Cited by 8 (6 self)
In this chapter, we present the multistage stochastic programming framework for sequential decision making under uncertainty and stress its differences with Markov Decision Processes. We describe the main approximation technique used for solving problems formulated in the multistage stochastic programming framework, which is based on a discretization of the disturbance space. We explain that one issue of the approach is that the discretization scheme leads in practice to ill-posed problems: because the complexity of the numerical optimization algorithms used for computing the decisions restricts the number of samples and optimization variables that one can use for approximating expectations, the numerical solutions are very sensitive to the parameters of the discretization. As the framework is weak in the absence of efficient tools for evaluating and selecting among competing approximate solutions, we show how one can extend it by using machine-learning-based techniques, so as to yield a sound and generic method for approximately solving a large class of multistage decision problems under uncertainty. The framework and solution techniques presented in the chapter are explained and illustrated on several examples. Along the way, we describe notions from decision theory that are relevant to sequential decision making under uncertainty in general.
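The discretization issue described in this abstract can be sketched with a toy two-stage problem (a newsvendor: order now, observe demand later). The expectation over the disturbance is replaced by an average over sampled scenarios; all names and numbers below are illustrative, not taken from the chapter.

```python
import random

def expected_cost(order, scenarios, price=3.0, unit_cost=1.0):
    # first-stage decision: order quantity; second stage: sell min(order, w)
    return sum(unit_cost * order - price * min(order, w)
               for w in scenarios) / len(scenarios)

def solve_saa(n_scenarios, seed=0, candidates=range(21)):
    rng = random.Random(seed)
    # discretization: a finite sample standing in for the disturbance space
    scenarios = [rng.uniform(0, 20) for _ in range(n_scenarios)]
    return min(candidates, key=lambda x: expected_cost(x, scenarios))

# With few scenarios the chosen decision is sensitive to the draw, which is
# the ill-posedness the chapter warns about; with many scenarios it
# stabilizes near the true optimum (13.33 for these numbers).
small = [solve_saa(5, seed=s) for s in range(3)]
large = [solve_saa(2000, seed=s) for s in range(3)]
```

Here the "number of samples" is the single knob of the discretization; in a genuine multistage tree the scenario count multiplies across stages, which is why the sensitivity becomes severe.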
A CAUTIOUS APPROACH TO GENERALIZATION IN REINFORCEMENT LEARNING
Abstract

Cited by 7 (4 self)
In the context of a deterministic Lipschitz-continuous environment over continuous state spaces, finite action spaces, and a finite optimization horizon, we propose an algorithm of polynomial complexity which exploits weak prior knowledge about its environment for computing, from a given sample of trajectories and for a given initial state, a sequence of actions. The proposed Viterbi-like algorithm maximizes a recently proposed lower bound on the return depending on the initial state, and uses to this end prior knowledge about the environment provided in the form of upper bounds on its Lipschitz constants. It thereby avoids, in a way depending on the initial state and on the prior knowledge, those regions of the state space where the sample is too sparse to make safe generalizations. Our experiments show that it can lead to more cautious policies than algorithms combining dynamic programming with function approximators. We also give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open-loop.
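The bound-maximization idea can be sketched on a 1-D state space with a toy sample of one-step transitions (x, u, r, y). The constants below follow the standard form L_{Q_N} = L_rho · (1 + L_f + … + L_f^{N−1}); everything else (names, problem data) is illustrative, not taken from the paper.

```python
import itertools
import math

def lq_constants(T, L_f, L_rho):
    # Lipschitz constants of the N-step value bound, N = 0 .. T
    return [L_rho * sum(L_f ** k for k in range(N)) for N in range(T + 1)]

def max_lower_bound(x0, sample, T, L_f=1.0, L_rho=1.0):
    """Return (bound, actions): the largest Lipschitz lower bound on the
    T-step return from x0 over all length-T chains of sample transitions."""
    L = lq_constants(T, L_f, L_rho)
    best = (-math.inf, None)
    # exhaustive search for clarity; the paper's Viterbi-like dynamic program
    # computes the same maximum in polynomial time by keeping, for each
    # transition that ends a prefix, only the best-scoring prefix
    for seq in itertools.product(range(len(sample)), repeat=T):
        prev_end, bound = x0, 0.0
        for t, l in enumerate(seq):
            x, u, r, y = sample[l]
            bound += r - L[T - t] * abs(x - prev_end)  # penalize sparse regions
            prev_end = y
        if bound > best[0]:
            best = (bound, [sample[l][1] for l in seq])
    return best
```

A transition lying far from the current state incurs a large penalty, so the maximizer stays where the sample is dense; this is the cautious behavior the abstract describes.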
Lazy planning under uncertainty by optimizing decisions on an ensemble of incomplete disturbance trees
In Recent Advances in Reinforcement Learning, 8th European Workshop, EWRL’08, LNCS (LNAI) 5323, 2008
Abstract

Cited by 6 (2 self)
This paper addresses the problem of solving discrete-time optimal sequential decision making problems having a disturbance space W composed of a finite number of elements. In this context, the problem of finding from an initial state x0 an optimal decision strategy can be stated as an optimization problem which aims at finding an optimal combination of decisions attached to the nodes of a disturbance tree modeling all possible sequences of disturbances w0, w1, ..., wT−1 ∈ W^T over the optimization horizon T. A significant drawback of this approach is that the resulting optimization problem has a search space which is the Cartesian product of O(|W|^(T−1)) decision spaces U, which makes the approach computationally impractical as soon as the optimization horizon grows, even if W has just a handful of elements. To circumvent this difficulty, we propose to exploit an ensemble of randomly generated incomplete disturbance trees of controlled complexity, to solve their induced optimization problems in parallel, and to combine their predictions at time t = 0 to obtain a (near-)optimal first-stage decision. Because this approach postpones the determination of the decisions for subsequent stages until additional information about the realization of the uncertain process becomes available, we call it lazy. Simulations carried out on a robot corridor navigation problem show that even for small incomplete trees, this approach can lead to near-optimal decisions.
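The lazy ensemble idea can be illustrated on a toy corridor problem: state x on the integer line, decisions u ∈ {−1, +1}, disturbances w ∈ {−1, 0, +1}, dynamics x' = x + u + w, stage cost |x'|. The problem data are made up for the example; only the vote over the first-stage decisions of random incomplete trees mirrors the approach described above.

```python
import random
from collections import Counter

U, W, T = (-1, 1), (-1, 0, 1), 3

def tree_value(x, t, rng, branch=2):
    """Optimize decisions on one incomplete tree by backward recursion,
    keeping only `branch` randomly drawn disturbances per node.
    Returns (expected cost over the kept branches, first decision)."""
    if t == T:
        return 0.0, None
    best = (float("inf"), None)
    for u in U:
        kept = rng.sample(W, branch)  # incomplete: a random subset of branches
        cost = sum(abs(x + u + w) + tree_value(x + u + w, t + 1, rng)[0]
                   for w in kept) / branch
        if cost < best[0]:
            best = (cost, u)
    return best

def lazy_first_decision(x0, n_trees=21, seed=0):
    # solve an ensemble of incomplete trees and vote on the decision at t = 0;
    # later decisions are left to be recomputed once disturbances are observed
    rng = random.Random(seed)
    votes = Counter(tree_value(x0, 0, rng)[1] for _ in range(n_trees))
    return votes.most_common(1)[0][0]

u0 = lazy_first_decision(12)  # first decision for a state right of the goal
```

Each incomplete tree explores only `branch` of the |W| disturbances per node, so its size is controlled even when the full tree has |W|^T leaves; the ensemble vote compensates for the information each single tree discards.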
Min max generalization for deterministic batch mode reinforcement learning: Relaxation schemes
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2013
Abstract

Cited by 3 (2 self)
We study the min-max optimization problem introduced in Fonteneau et al. [Towards min max reinforcement learning, ICAART 2010, Springer, Heidelberg, 2011, pp. 61–77] for computing policies for batch mode reinforcement learning in a deterministic setting with fixed, finite time horizon. First, we show that the min part of this problem is NP-hard. We then provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, can also be solved in polynomial time. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [Fonteneau et al., 2011, as cited above].
Reinforcement learning for closed-loop propofol anesthesia: A study in human volunteers
2014
Abstract

Cited by 1 (0 self)
Clinical research has demonstrated the efficacy of closed-loop control of anesthesia using the bispectral index of the electroencephalogram as the controlled variable. These controllers have evolved to yield patient-specific anesthesia, which is associated with improved patient outcomes. Despite progress, the problem of patient-specific anesthesia remains unsolved. A variety of factors confound good control, including variations in human physiology, imperfect measures of drug effect, and delayed, hysteretic response to drug delivery. Reinforcement learning (RL) appears to be uniquely equipped to overcome these challenges; however, the literature offers no precedent for RL in anesthesia. To begin exploring the role RL might play in improving anesthetic care, we investigated the method’s application in the delivery of patient-specific, propofol-induced hypnosis in human volunteers. When compared to performance metrics reported in the anesthesia literature, RL demonstrated patient-specific control marked by improved accuracy and stability. Furthermore, these results suggest that RL may be considered a viable alternative for solving other difficult closed-loop control problems in medicine. More rigorous clinical study, beyond the confines of controlled human volunteer studies, is needed to substantiate these findings.
Optimized lookahead tree policies: a bridge between lookahead tree policies and direct policy search
International Journal of Adaptive Control and Signal Processing, 2013
Stability Enhancement of a Power System Containing High-Penetration Intermittent Renewable Generation
Abstract
This paper considers the transient stability enhancement of a power system containing large amounts of solar and wind generation in Japan. Following the Fukushima Daiichi nuclear disaster there has been an increasing awareness of the importance of a distributed architecture, based mainly on renewable generation, for the Japanese power system. Also, the targets for CO2 emissions can now be approached without depending heavily on nuclear generation. Large amounts of renewable generation lead to a reduction in the total inertia of the system, because renewable generators are connected to the grid by power converters, and transient stability becomes a significant issue. Simulation results show that sodium-sulfur batteries can keep the system in operation and stable after strong transient disturbances, especially for an isolated system. The results also show how the reduction of inertia in the system can be mitigated by exploiting the kinetic energy of wind turbines.
Lipschitz Robust Control from Off-Policy Trajectories
Abstract
We study the min-max optimization problem introduced in [Fonteneau et al. (2011), “Towards min max reinforcement learning”, Springer CCIS, vol. 129, pp. 61–77] for computing control policies for batch mode reinforcement learning in a deterministic setting with fixed, finite optimization horizon. First, we state that the min part of this problem is NP-hard. We then provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, can also be solved in polynomial time. We theoretically show that both relaxation schemes provide better results than those given in [Fonteneau et al. (2011)].