A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events (2006)
Shalabh Bhatnagar, Vivek S. Borkar, et al.



View or download:
jmlr.org/papers/volu...bhatnagar06a.pdf

From: mit.edu/papers/v7/

Abstract: We study the problem of long-run average cost control of Markov chains conditioned on a rare event. In a related recent work, a simulation-based algorithm for estimating performance measures associated with a Markov chain conditioned on a rare event has been developed. We extend ideas from this work and develop an adaptive algorithm for obtaining, online, optimal control policies conditioned on a rare event. Our algorithm uses three timescales or step-size schedules. On the slowest...

Active bibliography (related documents):
1.1:   Randomized Difference Two-Timescale Simultaneous.. - Bhatnagar, Fu, et al. (2000)
0.5:   Convergence of Simulation-Based Policy Iteration - Cooper, Henderson, Lewis (2003)
0.3:   Optimal Multilevel Feedback Policies for ABR Flow Control.. - Bhatnagar, Fu, et al. (1999)

BibTeX entry:

@misc{ bhatnagar-simulationbased,
  author = "Shalabh Bhatnagar and Vivek S. Borkar and others",
  title = "A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned
    on Rare Events",
  url = "citeseer.ist.psu.edu/bhatnagar06simulationbased.html" }

Citations (may not include all citations):
614   Reinforcement Learning: An Introduction - Sutton, Barto - 1998
413   Neuro-dynamic Programming - Bertsekas, Tsitsiklis - 1996
291   Simulation Modeling and Analysis - Law, Kelton - 2000
281   Q-learning - Watkins, Dayan - 1992
278   Dynamic Programming and Optimal Control - Bertsekas - 2001
120   Large Deviations Techniques in Decision - Bucklew - 1990
50   Simulation-based optimization of Markov reward processes - Marbach, Tsitsiklis - 2001
27   Multivariate stochastic approximation using a simultaneous p.. - Spall - 1992
26   Stochastic approximation with two timescales - Borkar - 1997
21   Call admission control and routing in integrated services ne.. - Marbach, Mihatsch et al. - 2000
19   Distributed asynchronous optimal routing in data networks - Tsitsiklis, Bertsekas - 1986
17   Stochastic optimization of regenerative systems using infini.. - Chong, Ramadge - 1994
16   Asynchronous stochastic approximations - Borkar - 1998
15   Infinite-horizon policy-gradient estimation - Baxter, Bartlett - 2001
12   Spectral theory and limit theorems for geometrically ergodic.. - Kontoyiannis, Meyn - 2003
10   Multiplicative ergodicity and large deviations for an irredu.. - Balaji, Meyn - 2000
8   Actor-critic like learning algorithms for Markov decision pr.. - Konda, Borkar - 1999
7   Experiments with infinite-horizon - Baxter, Bartlett et al. - 2001
6   Optimization of computer simulation models with rare events - Rubinstein - 1997
4   Multiscale stochastic approximation for parametric optimizat.. - Bhatnagar, Borkar - 1997
4   A one-measurement form of simultaneous perturbation stochast.. - Spall - 1997
4   Perturbation Analysis of Discrete Event Dynamical Systems - Ho, Cao - 1991
3   Risk sensitive control of Markov processes in countable stat.. - Hernández-Hernández, Marcus - 1996
2   A two time scale stochastic approximation scheme for simulat.. - Bhatnagar, Borkar - 1998
2   A unified approach to Markov decision problems and performan.. - Cao, Guo - 2004
1   Adaptive importance sampling technique for Markov chains usi.. - Ahamed, Borkar et al. - 2006
1   Two timescale algorithms for simulation optimization of hidd.. - Bhatnagar, Fu et al. - 2001
1   A sensitivity formula for risk-sensitive cost and the actor-.. - Borkar - 2001
1   Two-timescale simultaneous perturbation stochastic approxima.. - Bhatnagar, Fu et al. - 2003
1   A simultaneous perturbation stochastic approximation based a.. - Bhatnagar, Kumar - 2004
1   Q-learning for risk-sensitive control - Borkar - 2002
1   Q-learning for adaptive load based routing - Nowe, Steenhaut et al. - 1994
1   Avoidance of traps in stochastic approximation - Borkar - 2003
1   Reinforcing reachable routes - Varadarajan, Ramakrishnan et al. - 2003
1   Performance analysis conditioned on rare events: an adaptive.. - Borkar, Juneja et al. - 2004
1   Some pathological traps for stochastic approximation - Brandiere - 1998
1   Risk-sensitive optimal control for Markov decision processes.. - Borkar, Meyn - 2002
1   Adaptive multivariate three-timescale stochastic approximati.. - Bhatnagar - 2005
1   Discrete Event Dynamic Systems - Cao et al. - 1998

Documents on the same site (http://jmlr.csail.mit.edu/papers/v7/):
Estimating the "Wrong" Graphical Model: Benefits in the.. - Wainwright (2006)
A Hierarchy of Support Vector Machines for Pattern Detection - Sahbi, Geman (2006)
Efficient Learning of Label Ranking by Soft Projections.. - Shalev-Shwartz, Singer (2006)

CiteSeer.IST - Copyright Penn State and NEC