Results 1-5 of 5
Simulation-Based Optimization of Markov Reward Processes
IEEE Transactions on Automatic Control, 1998
Cited by 103 (1 self)
We propose a simulation-based algorithm for optimizing the average reward in a Markov Reward Process that depends on a set of parameters. As a special case, the method applies to Markov Decision Processes where optimization takes place within a parametrized set of policies. The algorithm involves the simulation of a single sample path, and can be implemented online. A convergence result (with probability 1) is provided.
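To make the objective in this abstract concrete, the following is a minimal sketch of the quantity being optimized: the average reward of a Markov reward process as a function of a parameter, estimated from a single simulated sample path and compared with the exact value. It is not the paper's algorithm (no gradient step is taken). The two-state chain, the reward values, the meaning of `theta`, and all function names are assumptions chosen for illustration.

```python
import random


def stationary_two_state(a, b):
    """Stationary distribution of a two-state chain with P(0->1)=a, P(1->0)=b."""
    return (b / (a + b), a / (a + b))


def average_reward_exact(theta, rewards=(1.0, 0.0), b=0.5):
    # Assumed parametrization: theta is the probability of leaving state 0.
    pi = stationary_two_state(theta, b)
    return pi[0] * rewards[0] + pi[1] * rewards[1]


def average_reward_simulated(theta, steps, rewards=(1.0, 0.0), b=0.5, seed=0):
    """Time-average of the reward along one simulated sample path."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(steps):
        total += rewards[state]
        leave = theta if state == 0 else b
        if rng.random() < leave:
            state = 1 - state
    return total / steps


if __name__ == "__main__":
    for theta in (0.1, 0.25, 0.5):
        exact = average_reward_exact(theta)
        approx = average_reward_simulated(theta, steps=200_000)
        print(f"theta={theta}: exact={exact:.4f}, simulated={approx:.4f}")
```

In this toy chain the average reward decreases as `theta` grows, so an optimizer over `theta` would push it toward the lower end of its allowed range; a simulation-based method would have to discover this from the sample path alone.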
Likelihood Ratio Derivative Estimation for Finite-Time Performance Measures in Generalized Semi-Markov Processes
Management Science, 1997
Cited by 1 (0 self)
This paper investigates the likelihood ratio method for estimating derivatives of finite-time performance measures in generalized semi-Markov processes (GSMPs). We develop readily verifiable conditions for the applicability of this method. Our conditions mainly place restrictions on the basic building blocks (i.e., the transition probabilities, the distribution and density functions of the event lifetimes, and the initial distribution) of the GSMP, which is in contrast to the structural conditions needed for infinitesimal perturbation analysis. We explicitly show that our conditions hold in many practical settings, and in particular, for large classes of queueing and reliability models. One intermediate result which we obtain in this study, which is of independent value, is to formally show that the random variable representing the number of occurring events in a GSMP in a finite time horizon has finite exponential moments in a neighborhood of zero.
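As a rough illustration of the likelihood ratio idea named in this abstract, the sketch below estimates a derivative for the simplest possible case: a single exponential lifetime X with rate `theta`, where the performance measure is E[X] = 1/theta. It is a toy example under stated assumptions, not the GSMP setting of the paper; the function name and the choice of distribution are hypothetical. The estimator averages X times the score function d/dtheta log f(X; theta) = 1/theta - X, whose expectation is the true derivative -1/theta^2.

```python
import random


def lr_derivative_estimate(theta, n, seed=0):
    """Likelihood ratio (score function) estimate of d/dtheta E[X], X ~ Exp(rate=theta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(theta)   # sample a lifetime with rate theta
        score = 1.0 / theta - x      # d/dtheta of log(theta * exp(-theta * x))
        total += x * score           # performance times score
    return total / n


if __name__ == "__main__":
    theta = 2.0
    print("estimate:", lr_derivative_estimate(theta, 200_000))
    print("exact:   ", -1.0 / theta**2)
```

The estimate needs only samples drawn at the nominal parameter value, which is why conditions on the densities of the event lifetimes matter for the method to apply.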
Simulation-Based Optimization of Markov Reward Processes
This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters. As a special case, the method applies to Markov decision processes where optimization takes place within a parametrized set of policies. The algorithm relies on the regenerative structure of finite-state Markov processes, involves the simulation of a single sample path, and can be implemented online. A convergence result (with probability 1) is provided. Index Terms: Markov reward processes, simulation-based optimization, stochastic approximation.
Estimation of Derivatives of Nonsmooth Performance Measures in Regenerative Systems
We investigate the problem of estimating derivatives of expected steady-state performance measures in parametric systems. Unlike most of the existing work in the area, we allow those functions to be nonsmooth and study the estimation of directional derivatives. For the class of regenerative Markovian systems we provide conditions under which we can obtain consistent estimators of those directional derivatives. An example illustrates that the conditions imposed must be different from those in the differentiable case. The result also allows us to derive necessary and sufficient conditions for differentiability of the expected steady-state function. We then analyze the process formed by the subdifferentials of the original process, and show that the subdifferential set of the expected steady-state function can be expressed as an average of integrals of multifunctions, which is the approach commonly found in the literature for integrals of sets. The latter result can also be viewed as a limit theorem for more general compact-convex multivalued processes.
Estimation of Derivatives of Nonsmooth Performance Measures in Regenerative Systems
1998
Estimation of derivatives and consequent optimization of stochastic systems are fields that have been growing considerably in recent years. Most of the work, however, has been done for differentiable systems. In this paper we investigate the problem of estimating directional derivatives of expected steady-state performance measures in parametric systems where those functions are not smooth. For the class of regenerative Markovian systems we provide conditions under which we can obtain consistent estimators of those directional derivatives. An example illustrates that the conditions imposed must be more strict than in the differentiable case. Besides yielding an estimation procedure for directional derivatives and subgradients of equilibrium quantities, the result allows us to derive necessary and sufficient conditions for differentiability of the expected steady-state function. We then analyze the process formed by the subdifferentials of the original process, and show that the subdiff...