658 citations found.
Watkins, C. (1989). Learning From Delayed Rewards. Ph.D. thesis, Cambridge University.


This paper is cited in the following contexts:


Journal of Machine Learning Research 1 (????) ??--??.. - Improve Local Search

Watkins, C. (1989). Learning From Delayed Rewards. Ph.D. thesis, Cambridge University.


Probabilistic Policy Reuse in a Reinforcement Learning Agent - Fernando Fernández

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.


Probabilistic Reuse of Past Policies - Fernando Fernández

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.


Reusing and Building a Policy Library - Fernando Fernández

Watkins, C. J. C. H. 1989. Learning from Delayed Rewards. Ph.D. Dissertation, King's College, Cambridge, UK.


Journal of Artificial Intelligence Research 22 (2004).. - Michael Bowling

Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph.D. thesis, King's College, Cambridge, UK.


Building a Library of Policies through Policy Reuse - Fernando Fernández

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.


Existence of Multiagent Equilibria with Limited Agents - Michael Bowling

Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph.D. thesis, King's College, Cambridge, UK.


Exploration and Policy Reuse - Fernando Fernández

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.


Convergence Problems of General-Sum Multiagent.. - Michael Bowling

Watkins, C. J. C. H. (1989). Learning from delayed rewards.


in Continuous - State And Action

Christopher J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, University of Cambridge, 1989.


Journal of Machine Learning Research 7 (2006) 1079--1105.. - Multi-Armed Bandit And

C. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.


Reinforcement Learning applied to the control of an Autonomous Underwater Vehicle

Christopher J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, University of Cambridge, 1989.


Algorithms for Planning under Uncertainty in Prediction .. - O'Kane, Tovar, Cheng.. (2005)

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England, 1989.


Boundedness of Iterates in Q-Learning - Abhijit Gosavi

C. Watkins. Learning from Delayed Rewards. Ph.D. thesis, King's College, Cambridge, England, 1989.


Reinforcement Learning for Long-Run Average Cost - Abhijit Gosavi

C.J. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, England, May 1989.


Reinforcement Learning Algorithm for - Partially Observable Markov

Watkins, C.J.C.H. (1989). Learning from delayed rewards. PhD Thesis, University of Cambridge, England.


On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Watkins, C.J.C.H. (1989). Learning from delayed rewards. PhD Thesis, University of Cambridge, England.


Map Learning with Uninterpreted Sensors and Effectors - David Pierce And (1997)   (14 citations)

Watkins, C. (1989). Learning from Delayed Rewards. PhD thesis, King's College, Cambridge.


Intensive Reinforcement Learning - Wawrzyński (2005)

C. Watkins, "Learning From Delayed Rewards," Ph.D. dissertation, Cambridge University Press, Cambridge, England, 1989.


A New Approach in Agent Path-Finding using State Mark Gradients - Florin Leon

Watkins, C. J. C. H., Learning from Delayed Rewards, PhD Thesis, King's College, Cambridge University, 1989.


Adaptive Robotic Communication Using Coordination Costs.. - Rosenfeld, Kaminka.. (2006)

Watkins, C. J. C. H. 1989. Learning from delayed rewards. Ph.D. Dissertation, King's College.


Adaptive Robotic Communication Using Coordination Costs - Rosenfeld, Kaminka, Kraus (2006)

C. J. C. H. Watkins. Learning from delayed rewards. Ph.D. Dissertation, King's College, 1989.


Spoken Dialogue Management Using Hierarchical Reinforcement.. - Cuayáhuitl (2005)

Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. PhD Thesis, King's College.


Building a Library of Policies through Policy Reuse - Fernando Fernández (2005)

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.


Probabilistic Reuse of Past Policies - Fernando Fernández (2005)

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.


Exploration and Policy Reuse - Fernando Fernández (2005)

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.


Impact of Imitation on the Dynamics of Animat.. - Laroque, Cuperlier.. (2004)

C.J.C.H. Watkins, Learning from delayed rewards, PhD Thesis, Psychology Dept, Cambridge University, England, 1989.


Towards a Unified Theory of State Abstraction for MDPs - Li, Walsh, Littman (2006)

Christopher J.C.H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, University of Cambridge, UK, 1989.


Improving Coordination with Communication in Multi-agent.. - Learning Daniel Szer (2004)

C. J. C. H. Watkins. Learning from delayed rewards. PhD thesis, King's College of Cambridge, UK., 1989.


On Utilizing Stochastic Learning Weak Estimators for Training.. - Oommen, Rueda

C. Watkins. Learning from Delayed Rewards. PhD thesis, University of Cambridge, England, 1989.


Heuristically Accelerated Q-Learning: a new approach to.. - Bianchi, Ribeiro, Costa

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, University of Cambridge, 1989.


An Evaluation of TRACA's Generalisation Performance - Matthew Mitchell (2002)

Watkins, C. (1989). Learning From Delayed Rewards.


A Reinforcement Learning Approach for Supply Chain Management - Stockheim, Schwind, Koenig

Watkins, C. J. (1989), Learning from Delayed Rewards, PhD thesis, Cambridge University, Cambridge, UK.


Reinforcement Learning for Parameter Control of Text Detection .. - Taylor, Wolf (2004)

C. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.


Reinforcement Learning for Factored Markov Decision Processes - Sallans (2002)

Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Cambridge, UK: Cambridge University. Ph.D. thesis.


epsilon-MDPs: Learning in Varying Environments - Szita, Takacs, Lorincz (2002)

C. J. C. H. Watkins. Learning from Delayed Rewards. Ph.D. thesis, King's College, Cambridge, UK, 1989.


High Quality Thermostat Control by Reinforcement Learning - A.. - Riedmiller (1998)

C. J. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.


Using Machine Learning Techniques in Complex Multi-Agent Domains - Riedmiller, Merke (2002)

C. J. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.


Reinforcement Learning for Cooperating and.. - Riedmiller, Moore.. (2001)   (1 citation)

C. J. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.


A Characterization of Sapient Agents - van Otterlo, Wiering, Dastani..

C. J. C. H. Watkins, "Learning from delayed rewards," Ph.D. dissertation, King's College, Cambridge, England, 1989.


To be published in: IEEE International Conference on - Systems Man And

C. J. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.


Improving the Learning Rate by Inducing a Transition Model - Department (2004)

C. Watkins. Learning from delayed rewards. PhD thesis, University of Cambridge, 1989.


Explanation-Based Neural Network Learning for Robot Control - Mitchell, Thrun (1992)   (22 citations)

Chris J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, England, 1989.


Complementing Search Engines with Online Web Mining Agents - Menczer (2002)   (2 citations)

C. Watkins, Learning from delayed rewards, PhD thesis, King's College, Cambridge, UK, 1989.


Folk Psychology for Human Modelling: Extending the BDI.. - Emma Norling Department (2004)   (3 citations)

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.


Reinforcement Learning for Stochastic Cooperative.. - Lauer, Riedmiller (2004)

C. Watkins. Learning from delayed rewards. PhD thesis. Cambridge, UK, 1989.


Reinforcement Learning for Stochastic Cooperative.. - Lauer, Riedmiller

C. Watkins. Learning from delayed rewards. PhD thesis. Cambridge, UK, 1989.


Intelligent Traffic Light Control - Wiering, van Veenen, Vreeken.. (2004)

Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, England.


An Adaptive Network Routing Strategy with Temporal Differences - Yván Túpac Valdivia

Watkins, C. J. C. H.: Learning from Delayed Rewards, Ph.D. thesis, University of Cambridge, England (1989).


Folk Psychology for Human Modelling: Extending the BDI Paradigm - Norling (2004)   (3 citations)

C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.


CiteSeer.IST - Copyright Penn State and NEC