Results 1 - 5 of 5
Simulation-Based Optimization of Markov Reward Processes
- IEEE Transactions on Automatic Control
, 1998
Abstract - Cited by 103 (1 self)
We propose a simulation-based algorithm for optimizing the average reward in a Markov Reward Process that depends on a set of parameters. As a special case, the method applies to Markov Decision Processes where optimization takes place within a parametrized set of policies. The algorithm involves the simulation of a single sample path, and can be implemented on-line. A convergence result (with probability 1) is provided.
Likelihood Ratio Derivative Estimation for Finite-Time Performance Measures in Generalized Semi-Markov Processes
- Management Science
, 1997
Abstract - Cited by 1 (0 self)
This paper investigates the likelihood ratio method for estimating derivatives of finite-time performance measures in generalized semi-Markov processes (GSMPs). We develop readily verifiable conditions for the applicability of this method. Our conditions mainly place restrictions on the basic building blocks of the GSMP (i.e., the transition probabilities, the distribution and density functions of the event lifetimes, and the initial distribution), in contrast to the structural conditions needed for infinitesimal perturbation analysis. We explicitly show that our conditions hold in many practical settings, and in particular for large classes of queueing and reliability models. One intermediate result of independent value is a formal proof that the random variable representing the number of events occurring in a GSMP over a finite time horizon has finite exponential moments in a neighborhood of zero.
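As a rough illustration of the likelihood ratio method named in this abstract, the sketch below uses a far simpler model than a GSMP: a Poisson process with rate `theta` observed on `[0, horizon]`, with the event count as the performance measure. The model, the function names, and the parameter values are assumptions made for illustration and do not come from the paper. The path likelihood is proportional to `theta**N * exp(-theta * horizon)`, so the score is `N / theta - horizon`, and the estimator averages the performance times the score.

```python
import random


def simulate_event_count(theta, horizon, rng):
    """Count the events of a rate-theta Poisson process on [0, horizon]."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(theta)  # exponential inter-event time
        if t > horizon:
            return count
        count += 1


def lr_derivative_estimate(theta, horizon, n_paths, seed=0):
    """Likelihood ratio estimate of d/dtheta E[N(horizon)].

    Illustrative assumption: the performance measure is the event count N,
    so the exact derivative is simply `horizon`.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        n = simulate_event_count(theta, horizon, rng)
        score = n / theta - horizon  # d/dtheta of the log path likelihood
        total += n * score           # performance times score
    return total / n_paths


if __name__ == "__main__":
    estimate = lr_derivative_estimate(theta=2.0, horizon=3.0, n_paths=50_000)
    print("LR estimate:", estimate, "exact derivative:", 3.0)
```

Here the expected count is `theta * horizon`, so the estimate can be checked against the exact derivative `horizon`. The estimator's variance grows with the horizon, which is one reason finite-time conditions matter for this method.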
Simulation-Based Optimization of Markov Reward Processes
Abstract
This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters. As a special case, the method applies to Markov decision processes where optimization takes place within a parametrized set of policies. The algorithm relies on the regenerative structure of finite-state Markov processes, involves the simulation of a single sample path, and can be implemented online. A convergence result (with probability 1) is provided. Index Terms: Markov reward processes, simulation-based optimization, stochastic approximation.
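To make the role of regenerative structure concrete, the sketch below estimates the gradient of the average reward for a hypothetical two-state chain. It is not the algorithm from the paper: it uses the classical regenerative likelihood ratio estimator, based on the ratio formula (average reward) = E[cycle reward] / E[cycle length]. The chain, the parameter names `theta` and `q`, and the reward structure are all assumptions made for illustration. State 0 is the regeneration state and earns reward 0, state 1 earns reward 1, the chain moves 0 to 1 with probability `theta` and 1 to 0 with probability `q`.

```python
import random


def simulate_cycle(theta, q, rng):
    """Simulate one regenerative cycle that starts and ends in state 0.

    Returns (reward, length, score), where score is the derivative with
    respect to theta of the log-probability of the cycle's transitions.
    """
    length, reward = 1, 0.0  # one step in state 0, which earns reward 0
    if rng.random() >= theta:
        # Stayed in state 0: d/dtheta log(1 - theta)
        return reward, length, -1.0 / (1.0 - theta)
    score = 1.0 / theta  # moved 0 -> 1: d/dtheta log(theta)
    while True:
        length += 1
        reward += 1.0  # each step in state 1 earns reward 1
        if rng.random() < q:  # return to state 0 (no theta dependence)
            return reward, length, score


def regenerative_lr_gradient(theta, q, n_cycles, seed=0):
    """Estimate the average reward and its derivative with respect to theta."""
    rng = random.Random(seed)
    sum_r = sum_t = sum_rs = sum_ts = 0.0
    for _ in range(n_cycles):
        r, t, s = simulate_cycle(theta, q, rng)
        sum_r += r
        sum_t += t
        sum_rs += r * s
        sum_ts += t * s
    avg_reward = sum_r / sum_t
    # Quotient rule applied to E[R] / E[T], with E[R S] and E[T S]
    # standing in for the derivatives of E[R] and E[T].
    gradient = (sum_rs - avg_reward * sum_ts) / sum_t
    return avg_reward, gradient


if __name__ == "__main__":
    lam, grad = regenerative_lr_gradient(theta=0.3, q=0.5, n_cycles=50_000)
    print("average reward:", lam, "gradient estimate:", grad)
```

For this toy chain the average reward is `theta / (theta + q)` and its derivative is `q / (theta + q)**2`, which gives a direct check on the estimate. A stochastic approximation scheme would feed such gradient estimates into a parameter update with decreasing step sizes.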
Estimation of Derivatives of Nonsmooth Performance Measures in Regenerative Systems
Abstract
We investigate the problem of estimating derivatives of expected steady-state performance measures in parametric systems. Unlike most of the existing work in the area, we allow those functions to be nonsmooth and study the estimation of directional derivatives. For the class of regenerative Markovian systems we provide conditions under which we can obtain consistent estimators of those directional derivatives. An example illustrates that the conditions imposed must be different from those in the differentiable case. The result also allows us to derive necessary and sufficient conditions for differentiability of the expected steady-state function. We then analyze the process formed by the subdifferentials of the original process, and show that the subdifferential set of the expected steady-state function can be expressed as an average of integrals of multifunctions, which is the approach commonly found in the literature for integrals of sets. The latter result can also be viewed as a limit theorem for more general compact-convex multivalued processes.
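For readers unfamiliar with the terms, the standard definitions behind "directional derivative" and "subdifferential" can be written as follows. The notation is an assumption and is not taken from the paper: f is the expected steady-state function, θ the parameter, and d a direction. The second identity is the textbook relation for a convex f that is finite near θ, and the paper's setting may be more general.

```latex
f'(\theta; d) = \lim_{t \downarrow 0} \frac{f(\theta + t d) - f(\theta)}{t},
\qquad
f'(\theta; d) = \max_{\xi \in \partial f(\theta)} \langle \xi, d \rangle
\quad \text{(convex } f\text{)}.
```

In the convex case, f is differentiable at θ exactly when the subdifferential ∂f(θ) is a single point, which is the kind of differentiability condition the abstract refers to.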
Estimation of Derivatives of Nonsmooth Performance Measures in Regenerative Systems
, 1998
Abstract
Estimation of derivatives and consequent optimization of stochastic systems are fields that have been growing considerably in recent years. Most of the work, however, has been done for differentiable systems. In this paper we investigate the problem of estimating directional derivatives of expected steady-state performance measures in parametric systems where those functions are not smooth. For the class of regenerative Markovian systems we provide conditions under which we can obtain consistent estimators of those directional derivatives. An example illustrates that the conditions imposed must be stricter than in the differentiable case. Besides yielding an estimation procedure for directional derivatives and subgradients of equilibrium quantities, the result allows us to derive necessary and sufficient conditions for differentiability of the expected steady-state function. We then analyze the process formed by the subdifferentials of the original process, and show that the subdiff...