Results 1–10 of 24
Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
Cited by 27 (5 self)
This paper highlights the crucial role that modern machine learning techniques can play in the optimization of treatment strategies for patients with chronic disorders. In particular, we focus on the task of optimizing a deep-brain stimulation strategy for the treatment of epilepsy. The challenge is to choose which stimulation action to apply, as a function of the observed EEG signal, so as to minimize the frequency and duration of seizures. We apply recent techniques from the reinforcement learning literature—namely fitted Q-iteration and extremely randomized trees—to learn an optimal stimulation policy using labeled training data from animal brain tissues. Our results show that these methods are an effective means of reducing the incidence of seizures, while also minimizing the amount of stimulation applied. If these results carry over to the human model of epilepsy, the impact for patients will be substantial.
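The fitted Q-iteration scheme named in this abstract is simple to state: given a fixed batch of transitions (s, a, r, s'), repeatedly regress Q(s, a) onto the bootstrapped targets r + γ max_a' Q(s', a'). A minimal sketch, assuming a tabular averaging "regressor" on a toy two-state problem in place of the extremely randomized trees the paper actually fits:

```python
from collections import defaultdict

def fitted_q_iteration(transitions, actions, gamma=0.9, n_iters=50):
    """Batch-mode FQI: regress Q(s, a) onto bootstrapped targets, repeatedly.

    transitions: list of (s, a, r, s_next); s_next is None at terminal states.
    The 'regressor' here is an exact per-(s, a) average -- a tabular stand-in
    for the tree-based regressor used in the paper (illustrative assumption).
    """
    q = defaultdict(float)
    for _ in range(n_iters):
        targets = defaultdict(list)
        for s, a, r, s_next in transitions:
            boot = 0.0 if s_next is None else max(q[(s_next, b)] for b in actions)
            targets[(s, a)].append(r + gamma * boot)
        # "Fit" the next Q-function on the regression targets.
        q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
    return q

# Toy 2-state chain: in state 0, action 1 moves to state 1; in state 1,
# action 0 pays reward 1 and terminates.
batch = [(0, 0, 0.0, 0), (0, 1, 0.0, 1), (1, 0, 1.0, None), (1, 1, 0.0, None)]
q = fitted_q_iteration(batch, actions=[0, 1])
policy = {s: max([0, 1], key=lambda a: q[(s, a)]) for s in [0, 1]}
```

With any regressor plugged in place of the table, the outer loop is unchanged, which is what makes the method "batch-mode": it never interacts with the system while learning.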
Reinforcement learning versus model predictive control: a comparison on a power system problem
 IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2009
Cited by 15 (8 self)
Abstract—This paper compares reinforcement learning (RL) with model predictive control (MPC) in a unified framework and reports experimental results of their application to the synthesis of a controller for a nonlinear and deterministic electrical power oscillations damping problem. Both families of methods are based on the formulation of the control problem as a discrete-time optimal control problem. The considered MPC approach exploits an analytical model of the system dynamics and cost function and computes open-loop policies by applying an interior-point solver to a minimization problem in which the system dynamics are represented by equality constraints. The considered RL approach infers in a model-free way closed-loop policies from a set of system trajectories and instantaneous cost values by solving a sequence of batch-mode supervised learning problems. The results obtained provide insight into the pros and cons of the two approaches and show that RL may certainly be competitive with MPC even in contexts where a good deterministic system model is available. Index Terms—Approximate dynamic programming (ADP), electric power oscillations damping, fitted Q-iteration, interior-point method (IPM), model predictive control (MPC), reinforcement learning (RL), tree-based supervised learning (SL).
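The MPC side of this comparison can be illustrated with a receding-horizon loop: at each step, optimize an open-loop action sequence over a short horizon against the known model, apply only the first action, and re-plan. A minimal sketch, assuming a discrete action set and exhaustive search in place of the paper's interior-point solver:

```python
from itertools import product

def mpc_step(x, actions, f, cost, horizon):
    """One receding-horizon MPC step for a deterministic model.

    f(x, u) is the (assumed known) dynamics; cost(x, u) is the stage cost.
    Exhaustive enumeration over a small discrete action set stands in for
    the continuous interior-point optimization used in the paper.
    """
    best_u, best_c = None, float("inf")
    for seq in product(actions, repeat=horizon):
        xs, c = x, 0.0
        for u in seq:                 # roll the model forward, accumulate cost
            c += cost(xs, u)
            xs = f(xs, u)
        if c < best_c:
            best_c, best_u = c, seq[0]
    return best_u                     # apply only the first action, then re-plan

# Toy integrator: drive the scalar state toward 0 with bounded moves.
f = lambda x, u: x + u
cost = lambda x, u: x * x + 0.1 * u * u
x = 4.0
for _ in range(6):
    u = mpc_step(x, actions=[-1.0, 0.0, 1.0], f=f, cost=cost, horizon=3)
    x = f(x, u)
```

The contrast with the RL approach in the abstract is that the loop above needs f and cost in closed form at decision time, whereas fitted Q-iteration needs only logged trajectories.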
Informing sequential clinical decision-making through reinforcement learning: an empirical study
 Machine Learning, 2011
Cited by 13 (5 self)
Abstract: This paper highlights the role that reinforcement learning can play in the optimization of treatment policies for chronic illnesses. Before applying any off-the-shelf reinforcement learning methods in this setting, we must first tackle a number of challenges. We outline some of these challenges and present methods for overcoming them. First, we describe a multiple imputation approach to overcome the problem of missing data. Second, we discuss the use of function approximation in the context of a highly variable observation set. Finally, we discuss approaches to summarizing the evidence in the data for recommending a particular action and quantifying the uncertainty around the Q-function of the recommended policy. We present the results of applying these methods to real clinical trial data of patients with schizophrenia.
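The multiple-imputation step described in this abstract follows a generic recipe: create m completed copies of the data by sampling each missing entry, run the analysis on each copy, and pool the per-copy results. The sampling model (draw from observed values) and the per-copy analysis (a simple mean) below are illustrative assumptions, not the paper's clinical procedure:

```python
import random

def multiply_impute(rows, m=20, seed=0):
    """Multiple imputation sketch: fill each missing value (None) m times
    by sampling from the observed values, analyze each completed dataset,
    then pool the analyses (Rubin-style pooling of the point estimate).
    """
    rng = random.Random(seed)
    observed = [v for v in rows if v is not None]
    estimates = []
    for _ in range(m):
        filled = [v if v is not None else rng.choice(observed) for v in rows]
        estimates.append(sum(filled) / len(filled))   # per-imputation analysis
    pooled = sum(estimates) / len(estimates)          # pooled point estimate
    spread = max(estimates) - min(estimates)          # crude between-imputation spread
    return pooled, spread

pooled, spread = multiply_impute([1.0, None, 3.0, None, 5.0])
```

The between-imputation spread is what lets the downstream analysis (here, the learned Q-function) carry the uncertainty induced by the missing data rather than hide it.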
Optimistic planning for sparsely stochastic systems
 In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2011
Cited by 12 (4 self)
Abstract—We propose an online planning algorithm for finite-action, sparsely stochastic Markov decision processes, in which the random state transitions can only end up in a small number of possible next states. The algorithm builds a planning tree by iteratively expanding states, where each expansion exploits sparsity to add all possible successor states. Each state to expand is actively chosen to improve the knowledge about action quality, and this allows the algorithm to return a good action after a strictly limited number of expansions. More specifically, the active selection method is optimistic in that it chooses the most promising states first, so the novel algorithm is called optimistic planning for sparsely stochastic systems. We note that the new algorithm can also be seen as model-predictive (receding-horizon) control. The algorithm obtains promising numerical results, including the successful online control of a simulated HIV infection with stochastic drug effectiveness.
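The expand-the-most-promising-leaf idea can be sketched for the deterministic special case (exactly one successor per state-action pair; the paper handles the sparse stochastic generalization). With rewards in [0, 1], a leaf reached with discounted value v at depth d has optimistic bound v + γ^(d+1)/(1-γ); expanding the highest-bound leaf first and returning the root action behind the best value found gives a budget-limited planner:

```python
import heapq

def optimistic_plan(root, model, actions, gamma=0.5, budget=20):
    """Optimistic planning sketch, deterministic special case.

    model(s, a) returns the single successor (s2, r), with r in [0, 1].
    Leaves are expanded most-promising-first by the optimistic bound
    v + gamma**(d+1) / (1 - gamma).  The bookkeeping here is a simplified
    illustration of the paper's algorithm, not a faithful reimplementation.
    """
    tick = 0  # unique tie-breaker so the heap never compares states
    # heap entries: (-bound, tick, state, depth, value, root_action)
    heap = [(-(1.0 / (1 - gamma)), tick, root, 0, 0.0, None)]
    best = {a: 0.0 for a in actions}  # best discounted value seen per root action
    for _ in range(budget):
        _, _, s, d, v, a0 = heapq.heappop(heap)
        for a in actions:
            s2, r = model(s, a)                  # the sparse (here: unique) successor
            ra = a if a0 is None else a0         # root action this branch descends from
            v2 = v + (gamma ** d) * r
            best[ra] = max(best[ra], v2)
            tick += 1
            bound = v2 + (gamma ** (d + 1)) / (1 - gamma)
            heapq.heappush(heap, (-bound, tick, s2, d + 1, v2, ra))
    return max(actions, key=lambda k: best[k])

# Toy chain: repeated action 1 reaches the rewarding region (state >= 2);
# action 0 alone never does.
model = lambda s, a: (s + a, 1.0 if s + a >= 2 else 0.0)
act = optimistic_plan(0, model, actions=[0, 1])
```

Because the planner is re-run from every new current state, this is exactly the receding-horizon view the abstract mentions.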
Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories
 Annals of Operations Research, 2012
Treating epilepsy via adaptive neurostimulation: a reinforcement learning approach
 International Journal of Neural Systems, 2009
Cited by 8 (3 self)
This paper presents a new methodology for automatically learning an optimal neurostimulation strategy for the treatment of epilepsy. The technical challenge is to automatically modulate neurostimulation parameters, as a function of the observed EEG signal, so as to minimize the frequency and duration of seizures. The methodology leverages recent techniques from the machine learning literature, in particular the reinforcement learning paradigm, to formalize this optimization problem. We present an algorithm which is able to automatically learn an adaptive neurostimulation strategy directly from labeled training data acquired from animal brain tissues. Our results suggest that this methodology can be used to automatically find a stimulation strategy which effectively reduces the incidence of seizures, while also minimizing the amount of stimulation applied. This work highlights the crucial role that modern machine learning techniques can play in the optimization of treatment strategies for patients with chronic disorders such as epilepsy.
PAC Optimal Exploration in Continuous Space Markov Decision Processes
Cited by 6 (1 self)
Current exploration algorithms can be classified into two broad categories: heuristic and PAC-optimal. While numerous researchers have successfully used heuristic approaches such as ε-greedy exploration, such approaches lack formal finite-sample guarantees and may need a significant amount of fine-tuning to produce good results. PAC-optimal exploration algorithms, on the other hand, offer strong theoretical guarantees but are inapplicable in domains of realistic size. The goal of this paper is to bridge the gap between theory and practice by introducing C-PACE, an algorithm which offers strong theoretical guarantees and can be applied to interesting, continuous-space problems.
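The ε-greedy heuristic used as the foil in this abstract is easy to state: with probability ε pick a uniformly random action, otherwise pick the action with the best current value estimate. A minimal bandit-style sketch (the deterministic two-arm payoff is an illustrative assumption):

```python
import random

def eps_greedy_bandit(rewards, eps=0.1, steps=2000, seed=0):
    """ε-greedy exploration on a toy bandit.

    rewards[a] is a deterministic payoff per arm, purely for illustration.
    This is the heuristic baseline: no finite-sample guarantee, and the
    choice of eps is exactly the kind of tuning knob the paper criticizes.
    """
    rng = random.Random(seed)
    n = len(rewards)
    est, counts = [0.0] * n, [0] * n
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(n)                       # explore uniformly
        else:
            a = max(range(n), key=lambda i: est[i])    # exploit current estimate
        counts[a] += 1
        est[a] += (rewards[a] - est[a]) / counts[a]    # incremental mean update
    return est, counts

est, counts = eps_greedy_bandit([0.2, 0.8])
```

Note the failure mode the abstract alludes to: until random exploration happens to hit the better arm, greedy steps keep reinforcing the inferior one, and nothing bounds how long that takes.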
Adaptive Bandits: Towards the best history-dependent strategy
Cited by 5 (0 self)
We consider multi-armed bandit games with possibly adaptive opponents. We introduce models Θ of constraints based on equivalence classes on the common history (information shared by the player and the opponent), which define two learning scenarios: (1) The opponent is constrained, i.e., he provides rewards that are stochastic functions of equivalence classes defined by some model θ* ∈ Θ. The regret is measured with respect to (w.r.t.) the best history-dependent strategy. (2) The opponent is arbitrary and we measure the regret w.r.t. the best strategy among all mappings from classes to actions (i.e., the best history-class-based strategy) for the best model in Θ. This allows us to model opponents (case 1) or strategies (case 2) that handle finite memory, periodicity, standard stochastic bandits, and other situations. When Θ = {θ}, i.e., only one model is considered, we derive tractable algorithms achieving a tight regret (at time T) bounded by Õ(√(TAC)), where C is the number of classes of θ. When many models are available, all known algorithms achieving a nice O(√T) regret are unfortunately not tractable and scale poorly with the number of models |Θ|. Our contribution here is to provide tractable algorithms with regret bounded by T^(2/3) C^(1/3) log(|Θ|)^(1/2).
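The equivalence-class idea in scenario (1) can be sketched for a single memory-one model θ: treat the player's previous action as the class of the current history and run an independent UCB1 learner inside each class. The opponent below (rewarding alternation, punishing repetition) and the class definition are toy assumptions:

```python
import math, random

def per_class_ucb(play_rounds, reward_fn, n_actions=2):
    """History-class bandit sketch: one UCB1 learner per equivalence class.

    The class of the current history is just the player's previous action
    (a memory-one model θ; an illustrative assumption).  reward_fn(cls, a)
    is the opponent's reward, a function of the class only, as in case (1).
    """
    stats = {}  # (cls, a) -> (count, mean reward)
    cls, total = 0, 0.0
    for t in range(1, play_rounds + 1):
        def score(a):
            c, m = stats.get((cls, a), (0, 0.0))
            # UCB1 index within the current class; unplayed arms first.
            return float("inf") if c == 0 else m + math.sqrt(2 * math.log(t) / c)
        a = max(range(n_actions), key=score)
        r = reward_fn(cls, a)
        c, m = stats.get((cls, a), (0, 0.0))
        stats[(cls, a)] = (c + 1, m + (r - m) / (c + 1))
        total += r
        cls = a  # the next round's history class is the action just played
    return total / play_rounds

# Opponent with memory one: alternation pays 1.0, repetition pays 0.2.
avg = per_class_ucb(3000, lambda cls, a: 1.0 if a != cls else 0.2)
```

A classless learner would see the two arms as equally good on average; conditioning on the history class recovers the alternating (history-dependent) optimal strategy.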
Optimized lookahead tree search policies
Cited by 3 (3 self)
Abstract. We consider in this paper lookahead tree techniques for the discrete-time control of a deterministic dynamical system so as to maximize a sum of discounted rewards over an infinite time horizon. Given the current system state x_t at time t, these techniques explore the lookahead tree representing possible evolutions of the system states and rewards conditioned on subsequent actions u_t, u_{t+1}, .... When the computing budget is exhausted, they output the action u_t that led to the best found sequence of discounted rewards. In this context, we are interested in computing good strategies for exploring the lookahead tree. We propose a generic approach that looks for such strategies by solving an optimization problem whose objective is to compute a (budget-compliant) tree-exploration strategy yielding a control policy that maximizes the average return over a postulated set of initial states. This generic approach is fully specified for the case where the space of candidate tree-exploration strategies consists of "best-first" strategies parameterized by a linear combination of lookahead path features, some of which have been advocated in the literature before, and where the optimization problem is solved by an EDA algorithm based on Gaussian distributions. Numerical experiments carried out on a model of the treatment of HIV infection show that the optimized tree-exploration strategy is orders of magnitude better than the previously advocated ones.
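The parameterized best-first strategies described above amount to scored leaf expansion: each path gets a feature vector φ(path), leaves are expanded in decreasing order of w · φ, and when the budget is spent the root action behind the best discounted return found is applied. The two features and the weights below are illustrative assumptions; the paper optimizes w with an EDA over a set of initial states:

```python
import heapq

def lookahead_best_first(x0, f, rew, actions, weights, budget=30, gamma=0.9):
    """Best-first lookahead sketch with a linear path-scoring rule.

    f(x, u) is the deterministic dynamics, rew(x, u) the reward.  Leaves are
    scored by weights . phi(path), with phi = (discounted return so far,
    depth discount) -- two hypothetical features chosen for illustration.
    """
    tick = 0
    # heap entries: (-score, tick, state, depth, return, first_action)
    heap = [(0.0, tick, x0, 0, 0.0, None)]
    best_ret, best_a = -1.0, actions[0]
    for _ in range(budget):
        _, _, x, d, ret, a0 = heapq.heappop(heap)
        for u in actions:
            x2 = f(x, u)
            ret2 = ret + (gamma ** d) * rew(x, u)
            ra = u if a0 is None else a0
            if ret2 > best_ret:                 # track best found return per root action
                best_ret, best_a = ret2, ra
            phi = (ret2, gamma ** (d + 1))      # path features
            score = weights[0] * phi[0] + weights[1] * phi[1]
            tick += 1
            heapq.heappush(heap, (-score, tick, x2, d + 1, ret2, ra))
    return best_a

# Toy chain: repeated action 1 reaches the rewarding region (state >= 2).
f = lambda x, u: x + u
rew = lambda x, u: 1.0 if x + u >= 2 else 0.0
a = lookahead_best_first(0, f, rew, actions=[0, 1], weights=(1.0, 10.0))
```

With weights (1, 1/(1-γ)) the score reduces to the optimistic upper bound, so uniform, depth-first, and optimistic explorations are all points in the searched strategy space.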