
Reinforcement Learning by Policy Search (2001)  (9 citations)
Leonid Peshkin



View or download:
mit.edu/~pesha/Public/aitr.ps

From:  mit.edu/~pesha/disser


Abstract: This dissertation presents work started at Brown University and completed at the Artificial Intelligence Laboratory, MIT, over the course of several years. A significant part of the research presented in this dissertation has been previously published elsewhere.

Context of citations to this paper:

.... [10] We apply a more recent technique from reinforcement learning called GAPS (which stands for Gradient Ascent for Policy Search) [11]. In GAPS, the learner plays a parameterised strategy represented, e.g. by a finite state automaton, where parameters are probabilities...

.... APPROACH We propose to use a recent reinforcement learning algorithm called GAPS (which stands for Gradient Ascent for Policy Search) [10]. In GAPS, the learner plays a parameterised strategy represented by a non deterministic Moore automaton, where the parameters are the...

Cited by:
Performance Management in Competitive Distributed Web Search - Khoussainov, Kushmerick (2003)
Automated Index Management for Distributed Web Search - Khoussainov, Kushmerick (2003)
Reinforcement Learning for Adaptive Routing - Peshkin, Savova (2002)

Similar documents (at the sentence level):
18.2%:   Reinforcement Learning by Policy Search - Peshkin (2000)
5.1%:   Bounds on Sample Size for Policy Evaluation in Markov.. - Peshkin, Mukherjee (2001)

Active bibliography (related documents):
1.3:   Importance Sampling For Markov Chains: Asymptotics For The Variance - Glynn (1994)
1.0:   Context-Based Policy Search: Transfer of Experience Across.. - Peshkin, de Jong (2002)
0.8:   Learning POMDP Policies with Internal State using Gradient.. - Aberdeen, Baxter (2001)

Similar documents based on text:
0.1:   Automatic Acquisition of Language Models for Speech Recognition - McCandless (1994)
0.1:   Building an Active Node on the Internet - Murphy (1997)
0.1:   A Learning Approach to Personalized Information Filtering - Sheth (1994)

Related documents from co-citation:
7:   A Course in Game Theory - Osborne, Rubinstein - 1994
7:   Complexity results about Nash equilibria - Conitzer, Sandholm - 2002
7:   Modelling Bounded Rationality - Rubinstein - 1997

BibTeX entry:

Leonid Peshkin. Reinforcement Learning by Policy Search. PhD thesis, Brown University, Providence, RI, 2001. http://citeseer.ist.psu.edu/peshkin01reinforcement.html

@misc{ peshkin01reinforcement,
  author = "L. Peshkin",
  title = "Reinforcement Learning by Policy Search",
  text = "Leonid Peshkin. Reinforcement Learning by Policy Search. PhD thesis, Brown
    University, Providence, RI, 2001.",
  year = "2001",
  url = "citeseer.ist.psu.edu/peshkin01reinforcement.html" }
Citations (may not include all citations):
1491   Learning internal representations by error propagation - Rumelhart, Hinton et al. - 1986
1362   A tutorial on hidden Markov models and selected applications.. - Rabiner - 1989
947   Statistical learning theory - Vapnik - 1998
658   Learning from delayed rewards - Watkins - 1989
614   Reinforcement learning: An introduction - Sutton, Barto - 1998
493   Communications of the ACM - Valiant, of et al. - 1984
413   Neuro-dynamic programming - Bertsekas, Tsitsiklis - 1996
362   An introduction to hidden Markov models - Rabiner, Juang - 1986
318   Convergence of stochastic processes - Pollard - 1984
317   Learning quickly when irrelevant attributes abound: A new li.. - Littlestone - 1988
310   A note on two problems in connection with graphs - Dijkstra - 1959
282   A course in game theory - Osborne, Rubinstein - 1994
268   Dynamic programming and Markov processes - Howard - 1960
246   Markov decision processes - Puterman - 1994
221   Society of mind - Minsky - 1996
216   The optimal control of partially observable Markov processes - Sondik - 1971
216   The optimal control of partially observable Markov processes.. - Smallwood, Sondik - 1973
199   Markov games as a framework for multi-agent reinforcement le.. - Littman - 1994
157   Probability inequalities for sums of bounded random variable.. - Hoeffding - 1963
148   Acting optimally in partially observable stochastic domains - Cassandra, Kaelbling et al. - 1994
126   Simulation and the Monte Carlo method - Rubinstein - 1981
124   Improving elevator performance using reinforcement learning - Crites, Barto - 1997
120   Large deviation techniques in decision - Bucklew - 1990
99   Multiagent reinforcement learning: Theoretical framework and.. - Hu, Wellman - 1998
96   International Journal on Computer Vision - Aloimonos, Weiss et al. - 1987
88   Neural network learning: Theoretical foundations - Anthony, Bartlett - 1999
85   The dynamics of reinforcement learning in cooperative multia.. - Claus, Boutilier - 1998
84   Princeton University Press - Bellman, programming - 1957
83   From local actions to global tasks: Stigmergy and collective.. - Beckers, Holland et al. - 1994
81   Reinforcement learning algorithm for partially observable Ma.. - Jaakkola, Singh et al. - 1994
70   Policy gradient methods for reinforcement learning with func.. - Sutton, McAllester et al. - 1999
60   Purposive behavior acquisition for a real robot by vision-ba.. - Asada, Nakamura et al. - 1996
60   A representation for the adaptive generation of simple seque.. - Cramer - 1985
56   Algorithms for non-negative matrix factorization - Lee, Seung - 2000
55   Reinforcement learning for dynamic channel allocation in cel.. - Singh, Bertsekas - 1997
54   High-level vision - Ullman - 1996
53   Gradient descent for general reinforcement learning - Baird, Moore - 1999
52   Packet routing in dynamically changing networks: A reinforce.. - Boyan, Littman - 1994
50   Simulation-based optimization of Markov reward processes - Marbach, Tsitsiklis - 1998
47   Numerical recipes in C: The art of scientific computing - Press, Teukolsky et al. - 1993
44   PEGASUS: A policy search method for large MDPs and POMDPs - Ng, Jordan - 2000
40   Learning factorial codes by predictability minimization - Schmidhuber - 1992
38   Convergence results for single-step on-policy reinforcement-.. - Singh, Jaakkola et al. - 2000
37   Natural gradient works efficiently in learning - Amari - 1998
36   Approximate planning in large POMDPs via reusable trajectori.. - Kearns, Mansour et al. - 1999
35   Planning and acting in partially observable stochastic domai.. - Kaelbling, Littman et al. - 1998
34   Solving very large weakly coupled Markov decision processes - Meuleau, Hauskrecht et al. - 1998
34   Ants and reinforcement learning: A case study in routing in .. - Subramanian, Druschel et al. - 1997
33   Shifting inductive bias with success-story algorithm - Schmidhuber, Zhao et al. - 1997
33   the method of bounded differences - McDiarmid, combinatorics et al. - 1989
33   Robust Monte Carlo methods for light transport simulation - Veach - 1997
31   Reinforcement learning in POMDP's via direct gradient ascent - Baxter, Bartlett - 2000
29   The complexity of Markov decision processes - Papadimitriou, Tsitsiklis - 1987
29   Planning and control in stochastic domains with imperfect in.. - Hauskrecht - 1998
29   Measuring the VC-dimension of a learning machine - Vapnik, Levin et al. - 1994
28   Finite-memory control of partially observable systems - Hansen - 1998
27   Using collective intelligence to route internet traffic - Wolpert, Tumer et al. - 1998
26   IEEE Transactions on Information Theory - Merhav, Feder et al. - 1998
26   Learning policies with external memory - Peshkin, Meuleau et al. - 1999
26   Uniform central limit theorems - Dudley - 1999
25   Reinforcement learning with selective perception and hidden .. - McCallum - 1996
24   Model selection and error estimation - Bartlett, Boucheron et al. - 2000
24   Dynamic programming and optimal control - Bertsekas - 1995
23   Direct search algorithms for optimization calculations - Powell - 1998
22   Neuronlike adaptive elements that can solve difficult learning .. - Barto, Sutton et al. - 1983
22   Sample complexity for learning recurrent perceptron mappings - Dasgupta, Sontag - 1996
21   Theoretical neuroscience - Dayan, Abbott - 2001
21   Adaptive Behavior - Wiering, Schmidhuber - 1998
21   Learning and value function approximation in complex decisio.. - Van Roy - 1998
21   Call admission control and routing in integrated service net.. - Marbach, Mihatsch et al. - 2000
20   Oxford English dictionary - Simpson, Weiner - 1989
19   Exploration of multi-state environments: Local measures and .. - Meuleau, Bourgine - 1999
18   Distributed value functions - Schneider, Wong et al. - 1999
18   Reinforcement learning in connectionist networks: A mathemat.. - Williams - 1986
17   Predictive reward signal of dopamine neurons - Schultz - 1998
17   Evolutionary Algorithms for Reinforcement Learning - Moriarty, Schultz et al. - 1999
16   Practical issues in temporal difference learning - Tesauro - 1992
16   Manifesto for an Evolutionary Economics of Intelligence - Baum, networks et al. - 1998
15   Worst case prediction over sequences under log loss - Opper, Haussler - 1997
14   The sciences of the artificial - Simon - 1996
13   Methods of reducing sample size in Monte Carlo computations - Kahn, Marshall - 1953
13   Classifier systems and genetic algorithms - Booker, Goldberg et al. - 1989
12   Reinforcement learning for call admission control and routin.. - Marbach, Mihatsch et al. - 1998
12   Ant colonies for adaptive routing in packet-switched communic.. - Di Caro, Dorigo - 1998
11   Greedy packet scheduling - Cidon, Kutten et al. - 1995
11   A framework for mesencephalic dopamine systems based on pred.. - Montague, Dayan et al. - 1996
11   Simulation-based methods for Markov decision processes - Marbach - 1998
11   Learning finite-state controllers for partially observable env.. - Meuleau, Peshkin et al. - 1999
10   Localizing policy gradient estimates to action transitions - Grudic, Ungar - 2000
10   Optimizing admission control while ensuring quality of servi.. - Brown, Tong et al. - 1999
9   Sequential optimality and coordination in multiagent systems - Boutilier - 1999
9   Continual learning in reinforcement environments - Ring - 1994
9   Exploration in gradient-based reinforcement learning - Meuleau, Peshkin et al. - 2000
8   Self calibration of motion and stereo vision for mobile robo.. - Brooks, Flynn et al. - 1987
8   Operations Research - control, observable et al. - 1978
8   Adaptive importance sampling in Monte Carlo integration - Oh, Berger - 1992
8   Exact and approximate algorithms for partially observable Ma.. - Cassandra - 1998
7   Optimization of stochastic systems - Glynn - 1986
7   Reinforcement learning through gradient descent - Baird - 1999
7   Function optimization using connectionist reinforcement lear.. - Williams, Peng - 1991
7   Learning without state-estimation in partially observable Ma.. - Singh, Jaakkola et al. - 1994
7   Weighted uniform sampling - a Monte Carlo technique for redu.. - Powell, Swann - 1966
7   Open theoretical questions in reinforcement learning - Sutton - 1999
7   Solving combinatorial optimization tasks by reinforcement le.. - Zhang, Dietterich - 2000
7   capacity of sets in function spaces - Kolmogorov, Tikhomirov - 1961
6   Adaptive importance sampling for estimation in structured do.. - Ortiz, Kaelbling - 2000
6   Fast reinforcement learning through eugenic neuro-evolution - Polani, Miikkulainen - 1999
5   Stability of the gradient process in n-person games - Arrow, Hurwicz - 1960
5   Direct policy search using paired statistical tests - Strens, Moore - 2001
5   analysi actorcritic algorithm using eligibility trace Reinfo.. - Kimura, An et al. - 1998
5   Adaptive importance sampling for integration - Zhou - 1998
5   A framework for control of a camera head - Andersen - 1996
5   Importance sampling for reinforcement learning with multiple.. - Shelton - 2001
5   Using eligibility traces to find the best memoryless policy in.. - Loch, Singh - 1998
5   Artificial Intelligence - Ballard, vision - 1991
5   Decision theoretic generalizations of the PAC model - Haussler - 1992
5   Asynchronous stochastic approximation and Q-learning - Tsitsiklis - 1994
4   Universal sequential coding of single messages - Shtarkov - 1987
4   The effect of representation and knowledge on goal-directed explora.. - Koenig, Simmons - 1996
4   Machine Learning - of, an et al. - 1999
4   Efficient exploration in reinforcement learning - Thrun - 1992
4   A neural substrate of prediction and reward - Schultz, Dayan et al. - 1997
4   Advances in Neural Information Processing Systems - Konda, Tsitsiklis et al. - 1999
4   Direct policy search and uncertain policy evaluation - Schmidhuber, Zhao - 1998
4   Kernel-based reinforcement learning in average-cost problems.. - Ormoneit, Glynn - 2000
4   Risk-sensitive and minimax control of discrete-time - Coraluppi, Marcus - 1999
4   Hoeffding's inequality for uniformly ergodic Markov chains - Glynn, Ormoneit - 2001
3   A new family of estimators for random walk problems - Spanier - 1979
3   policy gradient approach to network routing - Tao, Baxter et al. - 2001
3   Low power wireless communication via reinforcement learning - Brown - 1999
3   policy policy evaluation - Precup, Sutton et al. - 2000
3   The Monte Carlo method and related problems - Ermakov - 1971
3   The bias-variance dilemma of the Monte Carlo method - Zlochin, Baram - 2000
3   Reinforcement learning for embedded agents facing complex ta.. - Mart - 1998
3   Reinforcement learning and mistake bounded algorithms - Mansour - 1999
3   Feature discovery by competitive learning - Rumelhart, Zipser - 1986
3   Advances in Neural Information Processing Systems - Kakade, policy - 2001
3   Selecting approximately-optimal actions in complex structura.. - Ortiz - 2002
3   The complexity of decentralized control of Markov decision .. - Bernstein, Zilberstein et al. - 2000
2   On biological plausibility of policy search - Peshkin, Savova - 2002
2   Design and analysis of cognitive packet networks - Gelenbe, Lent et al. - 2001
2   Statisticheskoje modelirovanie - Ermakov, Mikailov - 1982
2   Reinforcement learning for admission control and routing - Carlstrom - 2000
2   Central limit theorems for Markov random walks - Niemi, Nummelin - 1982
2   An integrated connectionist approach to reinforcement learni.. - Hougen, Gini et al. - 2000
2   Learning to control an inverted pendulum using neural networ.. - Anderson - 1989
2   Communications of the ACM - gradient, stochastic - 1990
2   Hebbian synaptic modifications in spiking neurons that learn - Bartlett, Baxter - 1999
2   Learning to generate artificial fovea trajectories for target.. - Schmidhuber, Huber - 1991
2   Narendra and Mandayam - Kumpati - 1989
2   Machine Learning - gradient-following, connectionist et al. - 1992
2   Worst-case bounds for the logarithmic loss of predictors - Cesa-Bianchi, Lugosi - 2001
2   Dopamine bonuses - Kakade, Dayan - 2001
2   Bounds on sample size for policy evaluation in Markov enviro.. - Peshkin, Mukherjee - 2001
1   The theory of probability - Bernstein - 1946
1   pole balancing with recurrent evolutionary networks - Gomez, Miikkulainen - 1998
1   SIAM Journal on Control and Optimization - algorithms - 2001
1   Stochastic Models - for, chains et al. - 1994
1   Efficient reinforcement learning through symbiotic evolution - Moriarty, Miikkulainen - 1996
1   Machine Learning - of, processes et al. - 2000
1   Evolutionary Computation - networks, ecient et al. - 1997
1   Brown University - by, search et al. - 2001
1   Management Science - for, simulations - 1989
1   Bounds for the minimal number of elements of an ε-net in var.. - Kolmogorov - 1955
1   The Transaction of the Institute of Electrical Engineers of .. - using, algorithm et al. - 1999
1   Synchronizing to the environment: Information theoretic cons.. - Crutchfield, Feldman - 2001
1   Machine Learning - control, reinforcement et al. - 1998
1   Artificial Intelligence at MIT: Expanding Frontiers - Katz, for et al. - 1990
1   Confidence-based Q-routing: An on-line adaptive network routi.. - Kumar, Miikkulainen - 1998
1   From Animals to Animats - policies, limitations et al. - 1994
1   Learning from scarce experience - Peshkin, Shelton - 2002
1   Semilinear predictability minimization produces well-known fe.. - Schmidhuber, Eldracher et al. - 1996
1   Brown University - sequential, making et al. - 1996
1   Report FKI - how, learning et al. - 1989
1   Handbook of Brain Theory and Neural Networks - Loeb - 2001
1   How to compare common and rare words - Savova, Peshkin - 2001
1   Statistical analysis of brain imaging data - Peshkin - 1995
1   Optimal risk sensitive control of semi-Markov decision proce.. - Chawla - 2000
1   Sufficient statistics in the optimal control of stochastic syst.. - Stribel - 1965
1   personal communication - Szepesvari - 1999




CiteSeer.IST - Copyright Penn State and NEC