Optimization via simulation: a review
- Annals of Operations Research
, 1994
Cited by 52 (16 self)
We review techniques for optimizing stochastic discrete-event systems via simulation. We discuss both the discrete parameter case and the continuous parameter case, but concentrate on the latter, which has dominated most of the recent research in the area. For the discrete parameter case, we focus on the techniques for optimization from a finite set: multiple-comparison procedures and ranking-and-selection procedures. For the continuous parameter case, we focus on gradient-based methods, including perturbation analysis, the likelihood ratio method, and frequency domain experimentation. For illustrative purposes, we compare and contrast the implementation of the techniques for some simple discrete-event systems such as the (s, S) inventory system and the GI/G/1 queue. Finally, we speculate on future directions for the field, particularly in the context of the rapid advances being made in parallel computing.
Likelihood Ratio Gradient Estimation For Stochastic Recursions
- Communications of the ACM
, 1995
Cited by 49 (7 self)
In this paper, we develop mathematical machinery for verifying that a broad class of general state space Markov chains reacts smoothly to certain types of perturbations in the underlying transition structure. Our main result provides conditions under which the stationary probability measure of an ergodic Harris recurrent Markov chain is differentiable in a certain strong sense. The approach is based on likelihood ratio "change-of-measure" arguments, and leads directly to a "likelihood ratio gradient estimator" that can be computed numerically.
Keywords: Harris recurrent Markov chain, likelihood ratio, gradient estimation, regeneration.
Footnote 1: The research of this author was supported by the U. S. Army Research Office under Contract No. DAAL03-91G-0101 and by the National Science Foundation under Contract No. DDM-9101580.
Footnote 2: This author's research was supported by NSERC-Canada grant No. OGP0110050 and FCAR-Québec grant No. 93ER1654.
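As a rough illustration of what a likelihood ratio gradient estimator computes, the sketch below handles a much simpler setting than the Harris recurrent chains of the abstract: independent samples from an exponential distribution with rate theta. The function name, the choice of distribution, and the performance measure E[X] are all assumptions made for this example, not taken from the paper.

```python
import random


def lr_gradient_estimate(theta, n_samples, seed=0):
    """Estimate d/dtheta E[X] for X ~ Exponential(rate=theta).

    Hypothetical helper: uses the identity
        d/dtheta E[X] = E[X * d/dtheta log f(X; theta)],
    where f(x; theta) = theta * exp(-theta * x).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.expovariate(theta)   # sample under the nominal parameter
        score = 1.0 / theta - x      # derivative of log f with respect to theta
        total += x * score           # performance times score function
    return total / n_samples


if __name__ == "__main__":
    theta = 2.0
    print("estimate:", lr_gradient_estimate(theta, 200_000))
    print("exact:   ", -1.0 / theta ** 2)  # since E[X] = 1 / theta
```

Because E[X] = 1/theta here, the exact derivative is -1/theta^2, which gives a direct check on the estimate. No second simulation at a perturbed parameter value is needed, which is the practical appeal of the method.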
Reinforcement Learning by Policy Search
, 2000
Cited by 25 (2 self)
One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations could be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. Reinforcement learning means learning a policy (a mapping of observations into actions) based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies being searched is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate various architectures for controllers with memory, including controllers with external memory, finite state controllers and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms that learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience re-use. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
Ordinal Optimization of DEDS
, 1996
Cited by 19 (3 self)
In this paper we argue that ordinal rather than cardinal optimization, i.e., concentrating on finding good, better, or best designs rather than on estimating accurately the performance value of these designs, offers a new, efficient, and complementary approach to the performance optimization of systems. Some experimental and analytical evidence is offered to substantiate this claim. The main purpose of the paper is to call attention to a novel and promising approach to system optimization.
Original: Aug. 20, 1991. First revision: Nov. 22, 1991. Second revision: Jan. 29, 1992. Version: 11/12/96.
Acknowledgement: This work is supported by NSF grants CDR-88-03012, DDM-89-14277, ONR contracts N00014-90-J-1093, N00014-89-J-1023, and Army contracts DAAL-0383-K-0171, DAAL-91-G-0194.
1. Introduction and Rationale. The problem of stochastic optimization of a multivariable function $J(\theta) \equiv E[L(x(t; \theta, \xi))]$ (1), where $\theta$ is the design parameter, L, some performance fun...
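The ordinal idea can be tried on a toy problem: with only a few noisy replications per design, the estimated values are still imprecise, yet the design that looks best is very often among the truly good ones. The sketch below is an assumed setup (20 hypothetical designs with true costs 0, 0.5, 1.0, ..., Gaussian noise, 10 replications each), not an experiment from the paper.

```python
import random


def ordinal_selection_rate(n_designs=20, n_reps=10, noise_sd=2.0,
                           top_k=3, n_trials=300, seed=0):
    """Fraction of trials in which the observed-best design is truly in the top_k.

    All parameters are illustrative assumptions. Design i has true cost 0.5 * i,
    so design 0 is the true minimizer.
    """
    rng = random.Random(seed)
    true_cost = [0.5 * i for i in range(n_designs)]
    hits = 0
    for _ in range(n_trials):
        # Crude estimate of each design's cost from a handful of noisy runs.
        estimates = []
        for cost in true_cost:
            runs = [cost + rng.gauss(0.0, noise_sd) for _ in range(n_reps)]
            estimates.append(sum(runs) / n_reps)
        # Ordinal step: only the ranking matters, not the estimated values.
        picked = min(range(n_designs), key=lambda i: estimates[i])
        if picked < top_k:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print("P(picked design is in true top 3) ~", ordinal_selection_rate())
```

With these settings each cost estimate has a standard error of roughly 0.63, larger than the 0.5 spacing between adjacent designs, so the values themselves are poorly resolved while the selection remains reliable.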
Learning from scarce experience
- Proceedings of the Nineteenth International Conference on Machine Learning
, 2002
Cited by 18 (0 self)
Searching the space of policies directly for the optimal policy has been one popular method for solving partially observable reinforcement learning problems. Typically, with each change of the target policy, its value is estimated from the results of following that very policy. This requires a large number of interactions with the environment as different policies are considered. We present a family of algorithms based on likelihood ratio estimation that use data gathered when executing one policy (or collection of policies) to estimate the value of a different policy. The algorithms combine estimation and optimization stages. The former utilizes experience to build a non-parametric representation of an optimized function. The latter performs optimization on this estimate. We show positive empirical results and provide the sample complexity bound.
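The core idea of reusing data from one policy to evaluate another can be sketched in a one-step setting, a deliberate simplification of the partially observable problems the abstract addresses. The function name, the two-action example, and the deterministic rewards below are assumptions for illustration only.

```python
import random


def off_policy_value(target, behavior, rewards, n_samples, seed=0):
    """Estimate the target policy's expected reward from behavior-policy data.

    Hypothetical one-step example: `target` and `behavior` are action
    probability lists, `rewards[a]` is the (deterministic) reward of action a.
    Each sample is reweighted by the likelihood ratio target[a] / behavior[a].
    """
    rng = random.Random(seed)
    actions = list(range(len(rewards)))
    total = 0.0
    for _ in range(n_samples):
        a = rng.choices(actions, weights=behavior)[0]  # act with the behavior policy
        ratio = target[a] / behavior[a]                # likelihood ratio weight
        total += ratio * rewards[a]
    return total / n_samples


if __name__ == "__main__":
    target = [0.2, 0.8]
    behavior = [0.5, 0.5]
    rewards = [1.0, 3.0]
    print("estimate:", off_policy_value(target, behavior, rewards, 100_000))
    print("exact:   ", sum(p * r for p, r in zip(target, rewards)))
```

The behavior policy must give positive probability to every action the target policy can take; otherwise the ratio is undefined and the estimate is biased.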
On Sampling-controlled Stochastic Approximation
- IEEE Transactions on Automatic Control
, 1991
Cited by 1 (0 self)
In the general area of optimization under uncertainty, there are a large number of applications which require finding the "best" values for a set of control variables or parameters and for which the only data available consist of measurements prone to random errors. Stochastic approximation provides a method of handling such noise or randomness in data; it has been widely studied in the literature and used in several applications. In this paper, we examine a new class of stochastic approximation procedures which are based on carefully controlling the number of observations or measurements taken before each computational iteration. This method, which we refer to as Sampling-controlled Stochastic Approximation, has advantages over standard stochastic approximation such as requiring less computation and the ability to handle bias in estimation. We address the growth rate required of the number of samples and prove a general convergence theorem for this new stochastic approximation method....
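A minimal sketch of the general idea, under assumptions of my own choosing: a quadratic objective with minimizer 3, unit-variance Gaussian noise on each gradient measurement, a constant step size, and a sample count that grows linearly with the iteration index. The growth rate and convergence conditions actually required by the paper are not reproduced here.

```python
import random


def sampling_controlled_sa(x0=0.0, n_iters=200, step=0.25, seed=0):
    """Minimize f(x) = (x - 3)^2 from noisy gradient measurements.

    Illustrative only: at iteration k the noisy gradient is averaged over
    n_k = k measurements before a single update is taken, so later
    iterations see progressively less noise.
    """
    rng = random.Random(seed)
    x = x0
    for k in range(1, n_iters + 1):
        n_k = k  # assumed growth rule for the number of observations
        # Average n_k noisy measurements of the gradient 2 * (x - 3).
        grad_avg = sum(2.0 * (x - 3.0) + rng.gauss(0.0, 1.0)
                       for _ in range(n_k)) / n_k
        x -= step * grad_avg  # one computational iteration per batch
    return x


if __name__ == "__main__":
    print("final iterate:", sampling_controlled_sa())  # true minimizer is 3.0
```

The design choice on display is the trade between many cheap, noisy updates and fewer updates on averaged measurements; here the noise in the update shrinks like 1/sqrt(k) without any decrease in the step size.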
Approximating an Optimal Production Policy in a Continuous Flow Line: Recurrence and Asymptotic Properties
Cited by 1 (1 self)
This work is concerned with manufacturing systems with two failure-prone tandem machines. The production is regulated by a continuous version of buffer control. Our goal is to obtain an optimal buffer-control policy to minimize a long-run average cost function. Concentrating on threshold-type control policies, we devote our effort to parameter optimization problems for the continuous material produce-to-stock models. We estimate the gradients of the cost function with respect to the parameter using perturbation analysis techniques, and approximate the optimal value of the parameter via a constant step-size stochastic approximation algorithm. An analysis of error accumulation in perturbation propagation is undertaken, and a sufficient condition for breaking the propagation chain is derived. In addition, we show that the event of breaking the perturbation propagation chain is recurrent if the system has sufficient capacity, derive the consistency of the gradient estimators, and esta...
Synchronous Constrained Fluid Systems
- IBM Research Division
, 1995
Cited by 1 (1 self)
This paper introduces the framework of synchronous constrained fluid systems (SCFS) to model
Proceedings of the 1992 Winter Simulation Conference
This tutorial discusses the issues and procedures for using simulation as a tool for optimization of stochastic complex systems that are modeled by computer simulation. It is intended to be a tutorial rather than an exhaustive literature search. Its emphasis is mostly on issues that are specific to simulation optimization instead of concentrating on the general optimization and mathematical programming techniques. Even though a lot of effort has been spent to provide a comprehensive overview of the field, there are still methods and techniques that have not been covered and valuable works that may not have been mentioned.
Sensitivity Analysis and the “What If” Problem in Simulation Analysis
- Pergamon Press plc, printed in Great Britain, 0895-7177/89
(Received and accepted for publication March 1988)

