Abstract: This dissertation presents work started at Brown University and completed at the Artificial Intelligence Laboratory, MIT, over the course of several years. A significant part of the research presented in this dissertation has been previously published elsewhere.
Context of citations to this paper:
.... [10] We apply a more recent technique from reinforcement learning called GAPS (which stands for Gradient Ascent for Policy Search) [11]. In GAPS, the learner plays a parameterised strategy represented, e.g. by a finite state automaton, where parameters are probabilities...
.... APPROACH We propose to use a recent reinforcement learning algorithm called GAPS (which stands for Gradient Ascent for Policy Search) [10]. In GAPS, the learner plays a parameterised strategy represented by a non deterministic Moore automaton, where the parameters are the...
Cited by:
Performance Management in Competitive Distributed Web Search - Khoussainov, Kushmerick (2003)
Automated Index Management for Distributed Web Search - Khoussainov, Kushmerick (2003)
Reinforcement Learning for Adaptive Routing - Peshkin, Savova (2002)
Similar documents (at the sentence level):
18.2%: Reinforcement Learning by Policy Search - Peshkin (2000)
5.1%: Bounds on Sample Size for Policy Evaluation in Markov.. - Peshkin, Mukherjee (2001)
Active bibliography (related documents):
1.3: Importance Sampling For Markov Chains: Asymptotics For The Variance - Glynn (1994)
1.0: Context-Based Policy Search: Transfer of Experience Across.. - Peshkin, de Jong (2002)
0.8: Learning POMDP Policies with Internal State using Gradient.. - Aberdeen, Baxter (2001)
Similar documents based on text:
0.1: Automatic Acquisition of Language Models for Speech recognition - McCandless (1994)
0.1: Building an Active Node on the Internet - Murphy (1997)
0.1: A Learning Approach to Personalized Information Filtering - Sheth (1994)
Related documents from co-citation:
7: A Course in Game Theory (context) - Osborne, Rubinstein - 1994
7: Complexity results about Nash equilibria
- Conitzer, Sandholm - 2002
7: Modelling Bounded Rationality (context) - Rubinstein - 1997
BibTeX entry:
Leonid Peshkin. Reinforcement Learning by Policy Search. PhD thesis, Brown University, Providence, RI, 2001. http://citeseer.ist.psu.edu/peshkin01reinforcement.html
@misc{ peshkin01reinforcement,
author = "L. Peshkin",
title = "Reinforcement Learning by Policy Search",
text = "Leonid Peshkin. Reinforcement Learning by Policy Search. PhD thesis, Brown
University, Providence, RI, 2001.",
year = "2001",
url = "citeseer.ist.psu.edu/peshkin01reinforcement.html" }
Citations (may not include all citations):
1491
Learning internal representations by error propagation (context) - Rumelhart, Hinton et al. - 1986
1362
A tutorial on hidden Markov models and selected applications.. (context) - Rabiner - 1989
947
Statistical learning theory (context) - Vapnik - 1998
658
Learning from delayed rewards (context) - Watkins - 1989
614
Reinforcement learning: An introduction
- Sutton, Barto - 1998
493
Communications of the ACM (context) - Valiant, of et al. - 1984
413
Neuro-dynamic programming (context) - Bertsekas, Tsitsiklis - 1996
362
An introduction to hidden Markov models (context) - Rabiner, Juang - 1986
318
Convergence of stochastic processes (context) - Pollard - 1984
317
Learning quickly when irrelevant attributes abound: A new li.. (context) - Littlestone - 1988
310
A note on two problems in connection with graphs (context) - Dijkstra - 1959
282
A course in game theory (context) - Osborne, Rubinstein - 1994
268
Dynamic programming and Markov processes (context) - Howard - 1960
246
Markov decision processes (context) - Puterman - 1994
221
Society of mind (context) - Minsky - 1996
216
The optimal control of partially observable Markov processes (context) - Sondik - 1971
216
The optimal control of partially observable Markov processes.. (context) - Smallwood, Sondik - 1973
199
Markov games as a framework for multi-agent reinforcement le..
- Littman - 1994
157
Probability inequalities for sums of bounded random variable.. (context) - Hoeffding - 1963
148
Acting optimally in partially observable stochastic domains
- Cassandra, Kaelbling et al. - 1994
126
Simulation and the Monte Carlo method (context) - Rubinstein - 1981
124
Improving elevator performance using reinforcement learning
- Crites, Barto - 1997
120
Large deviation techniques in decision (context) - Bucklew - 1990
99
Multiagent reinforcement learning: Theoretical framework and..
- Hu, Wellman - 1998
96
International Journal on Computer Vision (context) - Aloimonos, Weiss et al. - 1987
88
Neural networks learning: Theoretical foundations (context) - Anthony, Bartlett - 1999
85
The dynamics of reinforcement learning in cooperative multia..
- Claus, Boutilier - 1998
84
Princeton University Press (context) - Bellman, programming - 1957
83
From local actions to global tasks: Stigmergy and collective.. (context) - Beckers, Holland et al. - 1994
81
Reinforcement learning algorithm for partially observable Ma..
- Jaakkola, Singh et al. - 1994
70
Policy gradient methods for reinforcement learning with func..
- Sutton, McAllester et al. - 1999
60
Purposive behavior acquisition for a real robot by vision-ba..
- Asada, Nakamura et al. - 1996
60
A representation for the adaptive generation of simple seque.. (context) - Cramer - 1985
56
Algorithms for non-negative matrix factorization
- Lee, Seung - 2000
55
Reinforcement learning for dynamic channel allocation in cel..
- Singh, Bertsekas - 1997
54
High-level vision (context) - Ullman - 1996
53
Gradient descent for general reinforcement learning
- Baird, Moore - 1999
52
Packet routing in dynamically changing networks: A reinforce..
- Boyan, Littman - 1994
50
Simulation-based optimization of Markov reward processes
- Marbach, Tsitsiklis - 1998
47
Numerical recipes in C: The art of scientific computing (context) - Press, Teukolsky et al. - 1993
44
PEGASUS: A policy search method for large MDPs and POMDPs
- Ng, Jordan - 2000
40
Learning factorial codes by predictability minimization
- Schmidhuber - 1992
38
Convergence results for single-step on-policy reinforcement-..
- Singh, Jaakkola et al. - 2000
37
Natural gradient works efficiently in learning (context) - Amari - 1998
36
Approximate planning in large POMDPs via reusable trajectori..
- Kearns, Mansour et al. - 1999
35
Planning and acting in partially observable stochastic domai..
- Kaelbling, Littman et al. - 1998
34
Solving very large weakly coupled Markov decision processes
- Meuleau, Hauskrecht et al. - 1998
34
Ants and reinforcement learning: A case study in routing in ..
- Subramanian, Druschel et al. - 1997
33
Shifting inductive bias with success-story algorithm
- Schmidhuber, Zhao et al. - 1997
33
the method of bounded differences (context) - McDiarmid, combinatorics et al. - 1989
33
Robust monte carlo methods for light transport simulation (context) - Veach - 1997
31
Reinforcement learning in POMDP's via direct gradient ascent
- Baxter, Bartlett - 2000
29
The complexity of Markov decision processes (context) - Papadimitriou, Tsitsiklis - 1987
29
Planning and control in stochastic domains with imperfect in..
- Hauskrecht - 1998
29
Measuring the VC-dimension of a learning machine
- Vapnik, Levin et al. - 1994
28
Finite-memory control of partially observable systems (context) - Hansen - 1998
27
Using collective intelligence to route internet traffic
- Wolpert, Tumer et al. - 1998
26
IEEE Transactions on Information Theory (context) - Merhav, Feder et al. - 1998
26
Learning policies with external memory
- Peshkin, Meuleau et al. - 1999
26
Uniform central limit theorems (context) - Dudley - 1999
25
Reinforcement learning with selective perception and hidden .. (context) - McCallum - 1996
24
Model selection and error estimation
- Bartlett, Boucheron et al. - 2000
24
Dynamic programming and optimal control (context) - Bertsekas - 1995
23
Direct search algorithms for optimization calculations
- Powell - 1998
22
Neuronlike adaptive elements that can solve difficult learning .. (context) - Barto, Sutton et al. - 1983
22
Sample complexity for learning recurrent perceptron mappings
- Dasgupta, Sontag - 1996
21
Theoretical neuroscience (context) - Dayan, Abbott - 2001
21
Adaptive Behavior (context) - Wiering, Schmidhuber - 1998
21
Learning and value function approximation in complex decisio..
- Van Roy - 1998
21
Call admission control and routing in integrated service net..
- Marbach, Mihatsch et al. - 2000
20
Oxford English dictionary (context) - Simpson, Weiner - 1989
19
Exploration of multi-state environments: Local measures and ..
- Meuleau, Bourgine - 1999
18
Distributed value functions
- Schneider, Wong et al. - 1999
18
Reinforcement learning in connectionist networks: A mathemat.. (context) - Williams - 1986
17
Predictive reward signal of dopamine neurons (context) - Schultz - 1998
17
Evolutionary Algorithms for Reinforcement Learning
- Moriarty, Schultz et al. - 1999
16
Practical issues in temporal difference learning (context) - Tesauro - 1992
16
Manifesto for an Evolutionary Economics of Intelligence (context) - Baum, networks et al. - 1998
15
Worst case prediction over sequences under log loss
- Opper, Haussler - 1997
14
The sciences of the artificial (context) - Simon - 1996
13
Methods of reducing sample size in Monte Carlo computations (context) - Kahn, Marshall - 1953
13
Classifier systems and genetic algorithms (context) - Booker, Goldberg et al. - 1989
12
Reinforcement learning for call admission control and routin..
- Marbach, Mihatsch et al. - 1998
12
Ant colonies for adaptive routing in packet-switched communic..
- Di Caro, Dorigo - 1998
11
Greedy packet scheduling
- Cidon, Kutten et al. - 1995
11
A framework for mesencephalic dopamine systems based on pred.. (context) - Montague, Dayan et al. - 1996
11
Simulation-based methods for Markov decision processes
- Marbach - 1998
11
Learning finite-state controllers for partially observable env.. (context) - Meuleau, Peshkin et al. - 1999
10
Localizing policy gradient estimates to action transitions
- Grudic, Ungar - 2000
10
Optimizing admission control while ensuring quality of servi..
- Brown, Tong et al. - 1999
9
Sequential optimality and coordination in multiagent systems
- Boutilier - 1999
9
Continual learning in reinforcement environments (context) - Ring - 1994
9
Exploration in gradient-based reinforcement learning (context) - Meuleau, Peshkin et al. - 2000
8
Self calibration of motion and stereo vision for mobile robo..
- Brooks, Flynn et al. - 1987
8
Operations Research (context) - control, observable et al. - 1978
8
Adaptive importance sampling in Monte Carlo integration (context) - Oh, Berger - 1992
8
Exact and approximate algorithms for partially observable Ma.. (context) - Cassandra - 1998
7
Optimization of stochastic systems (context) - Glynn - 1986
7
Reinforcement learning through gradient descent
- Baird - 1999
7
Function optimization using connectionist reinforcement lear..
- Williams, Peng - 1991
7
Learning without state-estimation in partially observable Ma..
- Singh, Jaakkola et al. - 1994
7
Weighted uniform sampling - a Monte Carlo technique for redu.. (context) - Powell, Swann - 1966
7
Open theoretical questions in reinforcement learning
- Sutton - 1999
7
Solving combinatorial optimization tasks by reinforcement le..
- Zhang, Dietterich - 2000
7
capacity of sets in function spaces (context) - Kolmogorov, Tikhomirov - 1961
6
Adaptive importance sampling for estimation in structured do..
- Ortiz, Kaelbling - 2000
6
Fast reinforcement learning through eugenic neuro-evolution
- Polani, Miikkulainen - 1999
5
Stability of the gradient process in n- person games (context) - Arrow, Hurwicz - 1960
5
Direct policy search using paired statistical tests
- Strens, Moore - 2001
5
analysi actorcritic algorithm using eligibility trace Reinfo.. (context) - Kimura, An et al. - 1998
5
Adaptive importance sampling for integration (context) - Zhou - 1998
5
A framework for control of a camera head
- Andersen - 1996
5
Importance sampling for reinforcement learning with multiple.. (context) - Shelton - 2001
5
Using eligibility traces to find the best memoryless policy in.. (context) - Loch, Singh - 1998
5
Artificial Intelligence (context) - Ballard, vision - 1991
5
Decision theoretic generalizations of the PAC model (context) - Haussler - 1992
5
Asynchronous stochastic approximation and Q-learning (context) - Tsitsiklis - 1994
4
Universal sequential coding of single measures (context) - Shtarkov - 1987
4
The effect of representation and knowledge on goal-directed explora.. (context) - Koenig, Simmons - 1996
4
Machine Learning (context) - of, an et al. - 1999
4
Efficient exploration in reinforcement learning (context) - Thrun - 1992
4
A neural substrate of prediction and reward (context) - Schultz, Dayan et al. - 1997
4
Advances in Neural Information Processing Systems (context) - Konda, Tsitsiklis et al. - 1999
4
Direct policy search and uncertain policy evaluation
- Schmidhuber, Zhao - 1998
4
Kernel-based reinforcement learning in average-cost problems..
- Ormoneit, Glynn - 2000
4
Risk-sensitive and minimax control of discrete-time
- Coraluppi, Marcus - 1999
4
Hoeffding's inequality for uniformly ergodic Markov chains (context) - Glynn, Ormoneit - 2001
3
A new family of estimators for random walk problems (context) - Spanier - 1979
3
policy gradient approach to network routing (context) - Tao, Baxter et al. - 2001
3
Low power wireless communication via reinforcement learning
- Brown - 1999
3
policy policy evaluation (context) - Precup, Sutton et al. - 2000
3
The Monte Carlo method and related problems (context) - Ermakov - 1971
3
The bias-variance dilemma of the Monte Carlo method
- Zlochin, Baram - 2000
3
Reinforcement learning for embedded agents facing complex ta..
- Mart - 1998
3
Reinforcement learning and mistake bounded algorithms
- Mansour - 1999
3
Feature discovery by competitive learning (context) - Rumelhart, Zipser - 1986
3
Advances in Neural Information Processing Systems (context) - Kakade, policy - 2001
3
Selecting approximately-optimal actions in complex structura.. (context) - Ortiz - 2002
3
The complexity of decentralized control of Markov decision .. (context) - Bernstein, Zilberstein et al. - 2000
2
On biological plausibility of policy search (context) - Peshkin, Savova - 2002
2
Design and analysis of cognitive packet networks (context) - Gelenbe, Lent et al. - 2001
2
Statisticheskoje modelirovanie (context) - Ermakov, Mikailov - 1982
2
Reinforcement learning for admission control and routing (context) - Carlstrom - 2000
2
Central limit theorems for Markov random walks (context) - Niemi, Nummelin - 1982
2
An integrated connectionist approach to reinforcement learni..
- Hougen, Gini et al. - 2000
2
Learning to control an inverted pendulum using neural networ.. (context) - Anderson - 1989
2
Communications of the ACM (context) - gradient, stochastic - 1990
2
Hebbian synaptic modifications in spiking neurons that learn (context) - Bartlett, Baxter - 1999
2
Learning to generate artificial fovea trajectories for target.. (context) - Schmidhuber, Huber - 1991
2
Narendra and Mandayam (context) - Kumpati - 1989
2
Machine Learning (context) - gradient-following, connectionist et al. - 1992
2
Worst-case bounds for the logarithmic loss of predictors
- Cesa-Bianchi, Lugosi - 2001
2
Dopamine bonuses
- Kakade, Dayan - 2001
2
Bounds on sample size for policy evaluation in Markov enviro..
- Peshkin, Mukherjee - 2001
1
The theory of probability (context) - Bernstein - 1946
1
pole balancing with recurrent evolutionary networks (context) - Gomez, Miikkulainen - 1998
1
SIAM Journal on Control and Optimization (context) - algorithms - 2001
1
Stochastic Models (context) - for, chains et al. - 1994
1
Efficient reinforcement learning through symbolic evolution (context) - Moriarty, Miikkulainen - 1996
1
Machine Learning (context) - of, processes et al. - 2000
1
Evolutionary Computation (context) - networks, ecient et al. - 1997
1
Brown University (context) - by, search et al. - 2001
1
Management Science (context) - for, simulations - 1989
1
Bounds for the minimal number of elements of an ε-net in var.. (context) - Kolmogorov - 1955
1
The Transaction of the Institute of Electrical Engineers of .. (context) - using, algorithm et al. - 1999
1
Synchronizing to the environment: Information theoretic cons.. (context) - Crutchfield, Feldman - 2001
1
Machine Learning (context) - control, reinforcement et al. - 1998
1
Artificial Intelligence at MIT: Expanding Frontiers (context) - Katz, for et al. - 1990
1
Confidence-based Q-routing: An on-line adaptive network routi.. (context) - Kumar, Miikkulainen - 1998
1
From Animals to Animats (context) - policies, limitations et al. - 1994
1
Learning from scarce experience (context) - Peshkin, Shelton - 2002
1
Semilinear predictability minimization produces well-known fe.. (context) - Schmidhuber, Eldracher et al. - 1996
1
Brown University (context) - sequential, making et al. - 1996
1
Report FKI (context) - how, learning et al. - 1989
1
Handbook of Brain Theory and Neural Networks (context) - Loeb - 2001
1
How to compare common and rare words (context) - Savova, Peshkin - 2001
1
Statistical analysis of brain imaging data (context) - Peshkin - 1995
1
Optimal risk sensitive control of semi-Markov decision proce.. (context) - Chawla - 2000
1
Sufficient statistics in the optimal control of stochastic syst.. (context) - Striebel - 1965
1
personal communication (context) - Szepesvari - 1999