Results 1–8 of 8
"Parallel rollout for online solution of partially observable Markov decision processes"
Discrete Event Dynamic Systems: Theory and Applications, 2004
"... Abstract We propose a novel approach, called parallel rollout, to solving (partially observable) Markov decision processes. Our approach generalizes the rollout algorithm of Bertsekas and Castanon (1999) by rolling out a set of multiple heuristic policies rather than a single policy. In particula ..."
Abstract

Cited by 23 (8 self)
We propose a novel approach, called parallel rollout, to solving (partially observable) Markov decision processes. Our approach generalizes the rollout algorithm of Bertsekas and Castanon (1999) by rolling out a set of multiple heuristic policies rather than a single policy. In particular, the parallel rollout approach aims at the class of problems where we have multiple heuristic policies available such that each policy performs near-optimally for a different set of system paths. Parallel rollout automatically combines the given multiple policies to create a new policy that adapts to the different system paths and improves the performance of each policy in the set. We formally prove this claim for two criteria: total expected reward and infinite-horizon discounted reward. The parallel rollout approach also resolves the key issue of selecting which policy to roll out among multiple heuristic policies whose performances cannot be predicted in advance. We present two example problems to illustrate the effectiveness of the parallel rollout approach: a buffer management problem and a multiclass scheduling problem.
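The combining rule the abstract describes (a one-step lookahead followed by the best of the base policies, estimated by simulation) can be sketched on a toy MDP. Everything below — the dynamics in `step`, the two heuristic `policies`, the horizon and simulation counts — is an illustrative assumption, not taken from the paper:

```python
# Minimal sketch of parallel rollout on a toy finite MDP, assuming a simulator
# step(state, action) -> (next_state, reward) and a set of heuristic policies.
import random

random.seed(0)

N_STATES, ACTIONS = 5, [0, 1]

def step(state, action):
    # Toy dynamics: action 1 tends to drift toward the rewarding state N_STATES-1.
    drift = 1 if action == 1 else -1
    nxt = min(max(state + drift + random.choice([-1, 0, 1]), 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

# Two heuristic base policies, each intended to do well on different paths.
policies = [lambda s: 1, lambda s: 0 if s < 2 else 1]

def rollout_value(state, policy, horizon, n_sims=20):
    """Monte Carlo estimate of the total reward of following `policy`."""
    total = 0.0
    for _ in range(n_sims):
        s, acc = state, 0.0
        for _ in range(horizon):
            s, r = step(s, policy(s))
            acc += r
        total += acc
    return total / n_sims

def parallel_rollout_action(state, horizon=10):
    """Pick the action whose one-step lookahead, continued with the BEST base
    policy (max over the policy set), has the highest estimated value."""
    best_a, best_v = None, float("-inf")
    for a in ACTIONS:
        nxt, r = step(state, a)  # one sampled transition; could be averaged
        v = r + max(rollout_value(nxt, pi, horizon - 1) for pi in policies)
        if v > best_v:
            best_a, best_v = a, v
    return best_a
```

The `max` over the policy set is what distinguishes parallel rollout from ordinary rollout of a single base policy.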
"Approximate receding horizon approach for Markov decision processes: average reward case"
Journal of Mathematical Analysis and Applications, 2003
"... We consider an approximation scheme for solving Markov Decision Processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finitehorizon subMDP of a given infinitehorizon MDP to create a stationary policy, which we call “appr ..."
Abstract

Cited by 6 (2 self)
We consider an approximation scheme for solving Markov Decision Processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call “approximate receding horizon control”. We first analyze the performance of the approximate receding horizon control for infinite-horizon average reward under an ergodicity assumption, which also generalizes the result obtained by White [36]. We then study two examples of the approximate receding horizon control via lower bounds to the exact solution to the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard’s policy improvement of a single policy, and the second policy is based on a generalization of the single-policy improvement to multiple policies. Along the way, we also provide a simple alternative proof of policy improvement for countable state spaces. We finally discuss practical implementations of these schemes via simulation.
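The core construction — solve a fixed finite-horizon sub-MDP, then use its first-stage decisions as a stationary policy — can be sketched on a small finite MDP by backward induction. The transition model `P`, reward `R`, and horizon `H` below are illustrative placeholders, not the paper's examples:

```python
# Toy sketch of approximate receding horizon control on a finite MDP:
# solve an H-step finite-horizon sub-MDP by backward induction, then act
# with the first-stage decision at every state as a stationary policy.

N, ACTIONS, H = 4, [0, 1], 6

# P[a][s] -> list of (next_state, prob); R[s][a] -> bounded reward.
P = {a: {s: [((s + a) % N, 0.7), ((s + 1) % N, 0.3)] for s in range(N)}
     for a in ACTIONS}
R = {s: {a: float(s == N - 1) + 0.1 * a for a in ACTIONS} for s in range(N)}

def receding_horizon_policy():
    V = [0.0] * N                      # terminal value of the sub-MDP
    policy = [0] * N
    for _ in range(H):                 # backward induction over H stages
        newV = [0.0] * N
        for s in range(N):
            q = {a: R[s][a] + sum(p * V[t] for t, p in P[a][s])
                 for a in ACTIONS}
            best = max(q, key=q.get)
            newV[s], policy[s] = q[best], best
        V = newV
    return policy                      # first-stage decisions, used stationarily
```

In the countable-state setting of the paper the sub-MDP itself is only solved approximately; here exact backward induction stands in for that step.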
Online Pricing for Bandwidth Provisioning in Multiclass Networks
"... We consider the problem of pricing for bandwidth provisioning over a single link, where users arrive according to a known stochastic tra#c model. The network administrator controls the resource allocation by setting a price at every epoch, and each user's response to the price is governed by a ..."
Abstract

Cited by 5 (0 self)
We consider the problem of pricing for bandwidth provisioning over a single link, where users arrive according to a known stochastic traffic model. The network administrator controls the resource allocation by setting a price at every epoch, and each user's response to the price is governed by a demand function. We formulate this problem as a partially observable Markov decision process (POMDP), and explore two novel pricing schemes, reactive pricing and spot pricing, comparing their performance to appropriately tuned flat pricing. We use a gradient-ascent approach in all three pricing schemes and provide methods for computing unbiased estimates of the gradient in an online (incremental) fashion. Our simulation results show that our novel schemes take advantage of the known underlying traffic model and significantly outperform the model-free scheme of flat pricing.
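The flat-pricing baseline amounts to tuning a single scalar price by stochastic gradient ascent on revenue. The sketch below illustrates that loop with a made-up demand function; note the paper derives unbiased online gradient estimates, whereas here a simple two-sided finite-difference surrogate stands in:

```python
# Hedged sketch: tuning a flat price by stochastic gradient ascent on revenue
# against a noisy demand response. Demand model and step sizes are illustrative.
import random

random.seed(1)

def demand(price):
    # Illustrative demand function: higher price, lower (noisy) demand.
    return max(0.0, 10.0 - 2.0 * price + random.gauss(0, 0.5))

def revenue(price):
    return price * demand(price)

def tune_flat_price(price=1.0, lr=0.01, delta=0.1, epochs=2000):
    for _ in range(epochs):
        # Two-sided finite-difference estimate of the revenue gradient.
        g = (revenue(price + delta) - revenue(price - delta)) / (2 * delta)
        price = max(0.0, price + lr * g)
    return price
```

For this demand curve, expected revenue is p(10 − 2p), so the iterates hover near the maximizer p = 2.5.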
Approximation Results on Sampling Techniques for Zero-Sum Discounted Markov Games
"... In this paper, we first present a key approximation result for zerosum, discounted Markov games, providing bounds on the statewise loss and the loss in the sup norm resulting from using approximate Qfunctions (e.g., Qfunctions estimated by sampling). Then, we extend the policy rollout sampling t ..."
Abstract
In this paper, we first present a key approximation result for zero-sum, discounted Markov games, providing bounds on the state-wise loss and the loss in the sup norm resulting from using approximate Q-functions (e.g., Q-functions estimated by sampling). Then, we extend the policy rollout sampling technique for MDPs to Markov games. Using our key approximation result, we prove that under certain conditions, the resulting rollout technique for games gives rise to a policy that is closer to the Nash equilibrium than the base policy, using an amount of sampling completely independent of the state-space size. We also use our key result to provide an alternative to a published analysis of a second sampling approach to Markov games known as "sparse sampling". Thus, our theorem implies the (already known) result that, under certain conditions, the policy generated by the sparse-sampling algorithm is close to the Nash equilibrium. Again, the amount of sampling that guarantees the result is independent of the size of the state space of the Markov game. We also provide simulation results to demonstrate the practicality of our extension of the rollout technique.
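The game-rollout idea — estimate a one-step Q matrix by simulating the base policies, then act on that matrix — can be sketched as follows. For brevity this sketch has the maximizer play a pure security (maximin) action over the sampled Q values; the technique analyzed in the paper solves the matrix game for a mixed equilibrium. All dynamics and base policies are toy assumptions:

```python
# Hedged sketch of policy rollout in a zero-sum Markov game: Monte Carlo
# Q estimates under the base policies, then a pure maximin action choice
# (a simplification of the mixed matrix-game solve used in the paper).
import random

random.seed(2)

STATES, A1, A2, GAMMA, H = 3, [0, 1], [0, 1], 0.9, 8

def step(s, a1, a2):
    # Toy zero-sum dynamics: the maximizer gains when the actions match.
    r = 1.0 if a1 == a2 else -1.0
    return (s + a1 + a2 + random.choice([0, 1])) % STATES, r

base1 = lambda s: s % 2          # maximizer's base policy
base2 = lambda s: (s + 1) % 2    # minimizer's base policy

def q_estimate(s, a1, a2, n_sims=30):
    """Sampled discounted value of playing (a1, a2) once, then the bases."""
    total = 0.0
    for _ in range(n_sims):
        st, r0 = step(s, a1, a2)
        acc, disc = r0, GAMMA
        for _ in range(H):
            st, r = step(st, base1(st), base2(st))
            acc += disc * r
            disc *= GAMMA
        total += acc
    return total / n_sims

def rollout_maximin_action(s):
    # For each of our actions, assume the opponent replies worst-case.
    worst = {a1: min(q_estimate(s, a1, a2) for a2 in A2) for a1 in A1}
    return max(worst, key=worst.get)
```

The number of simulations per decision depends only on `n_sims`, `H`, and the action sets, not on the number of states — the property the paper's bounds formalize.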
Sampling Techniques for Zero-Sum, Discounted Markov Games
"... In this paper, we first present a key approximation result for zerosum, discounted Markov games, providing bounds on the statewise loss and the loss in the sup norm resulting from using approximate Qfunctions. Then we extend the policy rollout technique for MDPs to Markov games. Using our key app ..."
Abstract
In this paper, we first present a key approximation result for zero-sum, discounted Markov games, providing bounds on the state-wise loss and the loss in the sup norm resulting from using approximate Q-functions. Then we extend the policy rollout technique for MDPs to Markov games. Using our key approximation result, we prove that, under certain conditions, the rollout technique gives rise to a policy that is closer to the Nash equilibrium than the base policy. We also use our key result to provide an alternative analysis of a second sampling approach to Markov games known as sparse sampling. Our analysis implies the (already known) result that, under certain conditions, the policy generated by the sparse-sampling algorithm is close to the Nash equilibrium. We prove that the amount of sampling that guarantees these results is independent of the state-space size of the Markov game.
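The sparse-sampling estimator mentioned in both abstracts recursively averages sampled successors to a fixed depth, so its cost depends on the sampling width and depth but never on the number of states. A sketch of its single-agent MDP form (the form the papers extend to zero-sum games) with toy generative dynamics:

```python
# Hedged sketch of sparse sampling in its single-agent MDP form; the papers
# analyze its extension to zero-sum games. Dynamics and parameters are toy
# assumptions. Per-call cost is about (width * |ACTIONS|)**depth.
import random

random.seed(3)

ACTIONS, GAMMA = [0, 1], 0.9

def sample(s, a):
    # Generative model: toy random walk with a reward for reaching state 0.
    nxt = s + (1 if a == 1 else -1) + random.choice([-1, 0, 1])
    return nxt, (1.0 if nxt == 0 else 0.0)

def sparse_q(s, a, depth, width=4):
    """Recursive sparse-sampling estimate of the depth-limited Q-value."""
    if depth == 0:
        return 0.0
    total = 0.0
    for _ in range(width):                  # C sampled successors per (s, a)
        nxt, r = sample(s, a)
        total += r + GAMMA * max(sparse_q(nxt, b, depth - 1, width)
                                 for b in ACTIONS)
    return total / width

def sparse_sampling_action(s, depth=3):
    return max(ACTIONS, key=lambda a: sparse_q(s, a, depth))
```

Note the state-space size never appears: only `width`, `depth`, and the action set bound the work per decision.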
DISTRIBUTION STATEMENT A. APPROVED FOR PUBLIC RELEASE; DISTRIBUTION IS UNLIMITED
, 2014
"... United States Department of Defense or the United States Government. This is an academic work and should not be used to imply or infer actual mission capability or limitations. AFIT–DS–ENS–14–S–18 ..."
Abstract
United States Department of Defense or the United States Government. This is an academic work and should not be used to imply or infer actual mission capability or limitations. AFIT–DS–ENS–14–S–18
MARKOV GAMES: RECEDING HORIZON APPROACH
, 2001
"... ISR develops, applies and teaches advanced methodologies of design and analysis to solve complex, hierarchical, ..."
Abstract
ISR develops, applies and teaches advanced methodologies of design and analysis to solve complex, hierarchical,