189 citations found.
R. Sutton and A. Barto. Reinforcement Learning. 1998.

This paper is cited in the following contexts:

A New Reinforcement Learning Algorithm - Serban (2003)

....URU Algorithm. The algorithm is shown in Figure 1: 1) Initialize the state utilities with some initial values; 2) Initialize the current state with the initial state (s_c ← s_i); 3) Choose a state s neighboring s_c (s ∈ h(s_c)) using a known action-selection mechanism (ε-greedy or SoftMax [2]), following these steps: a) determine the set of successors of the current state (m = h(s_c)); b) if the current state has no successors (m is empty), return to the previous state (s ← s_c); otherwise go to step (c); c) select from m a subset m1 containing the states that have not been visited yet ....

Sutton, R., Barto, A. G.: Reinforcement Learning. The MIT Press, Cambridge, England, 1998
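
The URU excerpt above names two standard action-selection mechanisms, ε-greedy and SoftMax. Below is a minimal Python sketch of both. The function names are illustrative assumptions and do not come from Serban (2003). So is the idea of passing a plain list of utilities for the candidate successor states.

```python
import math
import random

def epsilon_greedy(values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random index; otherwise pick
    the index of the largest value (the first one on ties)."""
    if not values:
        raise ValueError("values must be non-empty")
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])

def softmax_probs(values, temperature):
    """Boltzmann probabilities exp(v / T) / sum_j exp(v_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(values)  # subtracting the max avoids overflow and leaves the result unchanged
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def softmax_select(values, temperature, rng=random):
    """Sample an index with the probabilities given by softmax_probs."""
    probs = softmax_probs(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]

# Hypothetical utilities of the unvisited successors of the current state.
utilities = [0.1, 0.5, 0.2]
rng = random.Random(42)
print(epsilon_greedy(utilities, 0.1, rng), softmax_select(utilities, 0.5, rng))
```

With epsilon = 0, the first function is purely greedy. For SoftMax, a lower temperature pushes selection toward the greedy choice, and a higher temperature pushes it toward uniform random choice.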


Learning Probabilistic Models for Optimal Visual Servo.. - Nikovski, Nourbakhsh

....criteria usually result in intractable systems of differential equations. Learning the equations of motion further complicates the solution. The field of reinforcement learning is specifically concerned with optimal sequential decision making when the dynamics of the controlled system are unknown [10]. Hard control problems such as balancing an inverted pendulum and the Acrobot task have been studied extensively and also implemented on real setups [4, 9]. The state of these systems, however, is assumed to be completely known, or (in the case of real robots) precisely measurable from joint ....

R. S. Sutton and A. G. Barto. Reinforcement learning. The MIT Press, Cambridge, Massachusetts, 1998.


Viewpoint Selection - Planning Optimal Sequences of.. - Deinzer, Denzler..

....to compute classification results, we have to be able to perform a fusion of several views. A way to solve this problem using particle filters is given in section 2.1. Second, the main task, the planning of view sequences, must be properly formulated. An approach based on reinforcement learning [14] is presented in section 2.2. 2.1 Fusion of Multiple Views by Density Propagation: In active object recognition, a series of observed images f_t, f_{t-1}, ..., f_0 of an object is given together with the camera movements a_{t-1}, ..., a_0 between these images. Based on these observations of ....

....for the decision process in (8). One of the demands defined in section 1 is that the selection of the most promising view should be learned without user interaction. Reinforcement learning provides many different algorithms to estimate the action-value function based on a trial-and-error method [14]. Trial and error means that the system itself is responsible for trying certain actions in a certain state. The result of such a trial is then used to update Q(·) and to improve its policy π. In reinforcement learning, a series of episodes is performed: each episode k consists of a sequence of ....

[Article contains additional citation context not shown here]

R.S. Sutton and A.G. Barto. Reinforcement Learning. A Bradford Book, Cambridge, London, 1998.
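
The Deinzer et al. excerpt describes trial-and-error estimation of the action-value function over a series of episodes. One-step tabular Q-learning, one of the many algorithms covered in [14], illustrates that scheme. This is not the authors' viewpoint-selection method. The corridor task, the reward of 1 at the goal, and all parameter values are hypothetical choices made for illustration.

```python
import random

def corridor_step(state, action, n):
    """Hypothetical toy task: states 0..n-1 on a line, goal at n-1.
    Action 0 moves left, action 1 moves right; reward 1 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(n - 1, state + 1)
    done = nxt == n - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(n=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n)]  # q[state][action], all estimates start at 0
    for _ in range(episodes):            # each episode is one trial from the start state
        s, done = 0, False
        while not done:
            # Try an action: explore with probability epsilon (or on ties), else exploit.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = corridor_step(s, a, n)
            # Use the outcome to update Q(s, a); no bootstrapping from the terminal state.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = q_learning()
greedy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
print(greedy)
```

The policy improves implicitly here: each exploitation step acts greedily on the current Q estimates, so better estimates immediately change behaviour.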


For a Formal Foundation of the Ant Programming Approach to.. - Birattari, al. (2000)

....the function Q which indeed informs on the long-term cost of a given action, provided that future actions are selected optimally. In ant programming, as generally in reinforcement learning, the search in the space of the policies is performed through some form of generalized policy iteration [19]. Starting from some arbitrary initial policy, ant programming iteratively generates a number of paths in order to evaluate the current policy and then improves it on the basis of the result of the evaluation. At each iteration, therefore, a cohort of ants is considered, each generating a ....

....are to be related to the different update strategies in reinforcement learning. In particular, for an ant to propose values of T only for the visited transitions and on the basis of the cost of the associated solution is equivalent to what in reinforcement learning is called a Monte Carlo update [19]. On the other hand, it is equivalent to a Q-learning update [20] to propose a value of T for a visited transition on the basis of the experienced cost for the transition itself and of the minimum of the current values that T assumes on the edges departing from the node to which the considered ....

[Article contains additional citation context not shown here]

R. S. Sutton and A. G. Barto. Reinforcement Learning. An Introduction. MIT Press, Cambridge, MA, USA, 1998.
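
The two update rules contrasted in the Birattari et al. (2000) excerpt can be written as operations on a table T of edge values that estimate cost-to-go. In this minimal sketch, the names and the toy graph are hypothetical. The Monte Carlo target for a visited edge is assumed to be the cost incurred from that edge to the end of the solution, which is one reading of "the cost of the associated solution". The Q-learning-style target is the edge's own cost plus the minimum of T over the edges leaving its destination node.

```python
# T[(u, w)] estimates the cost-to-go of taking edge u -> w (costs are minimized).
# Hypothetical toy graph: OUT_EDGES lists the edges leaving each node.
OUT_EDGES = {"s": [("s", "a"), ("s", "g")], "a": [("a", "g")], "g": []}

def monte_carlo_update(T, path, costs, alpha):
    """Update only the visited edges, each toward the cost actually incurred
    from that edge to the end of the completed solution."""
    remaining = sum(costs)
    for edge, c in zip(path, costs):
        T[edge] += alpha * (remaining - T[edge])
        remaining -= c

def q_learning_update(T, edge, cost, out_edges, alpha):
    """Update one visited edge from its own cost plus the minimum current T
    on the edges leaving the node it leads to (0 if that node is final)."""
    _, dest = edge
    successors = out_edges.get(dest, [])
    bootstrap = min(T[e] for e in successors) if successors else 0.0
    T[edge] += alpha * (cost + bootstrap - T[edge])

T = {e: 0.0 for edges in OUT_EDGES.values() for e in edges}
monte_carlo_update(T, [("s", "a"), ("a", "g")], [2.0, 3.0], alpha=1.0)
print(T)
```

Starting from a fresh table, applying the Q-learning-style rule to ("s", "a"), then ("a", "g"), then ("s", "a") again brings ("s", "a") to the same value the Monte Carlo update gives in one pass. The second update bootstraps from the now-updated estimate on ("a", "g"), so it does not need the whole path's cost.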


Toward the formal foundation of Ant Programming - Birattari, Di Caro, Dorigo (2002)   (1 citation)

....of ant colony optimization but which is more amenable to theoretical analysis as far as the concepts of representation and state are concerned. In particular, ant programming bridges the terminological gap between ant colony optimization and the fields of optimal control [3] and reinforcement learning [17]. Accordingly, the name ant programming was chosen for its assonance with dynamic programming, with which ant programming has in common the stress on the concept of state and the related idea of reformulating an optimization problem as a multi-stage decision problem and then searching for a good ....

....with the function Q which indeed informs on the long-term cost of a given action, provided that future actions are selected optimally. In ant programming, as generally in reinforcement learning, the search in the space of the policies is performed through some form of generalized policy iteration [17]. Starting from some arbitrary initial policy, ant programming iteratively generates a number of paths in order to evaluate the current policy and then improves it on the basis of the result of the evaluation. At each iteration, therefore, a cohort of ants is considered, each generating a solution ....

[Article contains additional citation context not shown here]

R. S. Sutton and A. G. Barto. Reinforcement Learning. An Introduction. MIT Press, Cambridge, MA, USA, 1998.
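
Generalized policy iteration, as described in the Birattari, Di Caro, Dorigo excerpts, alternates between evaluating the current policy and improving it. Its simplest instance is exact policy iteration on a known model. The sketch below applies it to a hypothetical multi-stage (acyclic) decision problem with costs to minimize. The graph, its costs, and the function names are illustrative assumptions. Ant programming itself would evaluate the policy from sampled paths rather than from a known model.

```python
# Hypothetical multi-stage problem: transitions[state][action] = (next_state, cost).
# Terminal states map to {}. The graph must be acyclic so every policy terminates.
GRAPH = {
    "A": {"x": ("B", 1.0), "y": ("C", 4.0)},
    "B": {"x": ("D", 5.0), "y": ("C", 1.0)},
    "C": {"x": ("D", 1.0)},
    "D": {},
}

def evaluate(policy, transitions):
    """Cost-to-go of a fixed policy; |S| sweeps are enough on an acyclic graph."""
    v = {s: 0.0 for s in transitions}
    for _ in range(len(transitions)):
        for s, actions in transitions.items():
            if actions:
                nxt, cost = actions[policy[s]]
                v[s] = cost + v[nxt]
    return v

def policy_iteration(transitions):
    # Arbitrary initial policy: the first listed action in every non-terminal state.
    policy = {s: next(iter(acts)) for s, acts in transitions.items() if acts}
    while True:
        v = evaluate(policy, transitions)            # evaluation step
        stable = True
        for s, actions in transitions.items():        # improvement step
            if not actions:
                continue
            q = {a: cost + v[nxt] for a, (nxt, cost) in actions.items()}
            best = min(q, key=q.get)
            if q[best] < q[policy[s]]:  # switch only on strict improvement, so the loop terminates
                policy[s] = best
                stable = False
        if stable:
            return policy, v

policy, v = policy_iteration(GRAPH)
print(policy, v["A"])
```

On this toy graph the initial policy costs 6 from "A". Two improvement rounds then reach the path A → B → C → D with total cost 3.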


Overcoming Non-stationarity in Uncommunicative Learning - February Un Vers

.... always defect or tit-for-tat, is considered (evolutionarily) stable [Axelrod, 1984; Fudenberg and Kreps, 1993]. To build a multi-agent learning algorithm, we resort to reinforcement learning, which is an elegant mathematical framework for studying such tasks because it requires few assumptions [Sutton and Barto, 1998]. The only crucial one is associating a utility with each state that .... [Footnote: In zero-sum games the payoff matrices are implied by the agents' own payoffs.] [Figure: payoff matrices for (a) Work 'n Shirk, Boss (A: Inspect / No Inspect) vs. Monkey (B: Work / Shirk); (b) Alice (A) vs. Bob (B), Cooperate / Defect, with R=3, T=5 ....]

R. S. Sutton and A. G. Barto, Reinforcement Learning, MIT Press, 1998.


Journal of Machine Learning Research 7 (2006) 1079--1105.. - Multi-Armed Bandit And

No context found.

R. Sutton and A. Barto. Reinforcement Learning. 1998.


Overcoming Non-stationarity in Uncommunicative Learning - February Un Vers

No context found.

R. S. Sutton and A. G. Barto, Reinforcement Learning, MIT Press, 1998.


Journal of Machine Learning Research 7 (2006) 493--518.. - Pat Langley Langley

No context found.

Sutton, R. S. and Barto, A. G. Reinforcement learning. Cambridge, MA: MIT Press, 1998.


Reinforcement Learning for Long-Run Average Cost - Abhijit Gosavi Assistant

No context found.

R. Sutton and A. G. Barto. Reinforcement Learning. The MIT Press, Cambridge, Massachusetts, 1998.


Unknown

No context found.

Sutton, R. and Barto, A. G. (1998). Reinforcement learning. MIT Press, Cambridge, MA.


PEGASUS: A policy search method for large MDPs and POMDPs - Andrew Ng Uc (2000)   (35 citations)

No context found.

R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, 1998.


Probabilistic Inference for Solving Discrete and.. - Markov Decision Processes

No context found.

Sutton, R., & Barto, A. (1998). Reinforcement learning. MIT Press, Cambridge.


An Analytic Solution to Discrete . . . - Poupart, Vlassis, Hoey, Regan

No context found.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning. Cambridge, MA: MIT Press.


How Do Computational Models of the Role of Dopamine as a.. - Error Map On

No context found.

Sutton, R.S., & Barto, A.G. (1998) Reinforcement learning, MIT Press.


Reinforcement Learning for Humanoid Robotics - Peters, Vijayakumar, Schaal (2003)

No context found.

Sutton, R.S., and Barto, A.G. Reinforcement Learning, The MIT Press, 1998


The Exploration-Exploitation Dilemma for Adaptive Agents - Rejeb, Guessoum, M'Hallah (2005)

No context found.

Sutton, R. S. and Barto, A.G.: Reinforcement learning, an introduction. The MIT Press, (1998).


Stochasticsand - Statistics Reinforcement Learning

No context found.

R. Sutton, A.G. Barto, Reinforcement Learning, The MIT Press, Cambridge, MA, 1998.


A Model For Prejudiced Learning In Noisy - Environments Andreas Schmidt

No context found.

R. S. Sutton and A. G. Barto, Reinforcement Learning, MIT press, Cambridge, Massachusetts, London, England, 1998.


Learning a World Model and Planning With a - Self-Organizing Dynamic Neural

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning. MIT Press, Cambridge, 1998.


The Evolution Of Genetic Representations And Modular Adaptation - Toussaint (2003)   (4 citations)

No context found.

Sutton, R. S. & A. G. Barto (1998). Reinforcement Learning. MIT Press, Cambridge.


Hybrid Reinforcement/Supervised Learning for Dialogue.. - Henderson, Lemon.. (2005)

No context found.

Richard Sutton and Andrew Barto. Reinforcement Learning. MIT Press, 1998.


A Reinforcement Learning Algorithm based on Policy Iteration for.. - Gosavi (2004)

No context found.

R.S. Sutton and Barto, A. (1996). Reinforcement Learning. Cambridge, MA: MIT Press.


Multi-Objective Hyper-Heuristic Approaches For Space.. - Burke, Silva, Soubeiga (2003)

No context found.

Sutton R.S., Barto A.G. (1998). Reinforcement Learning, MIT Press.


Reinforcing Reachable Routes - Varadarajan, Ramakrishnan.. (2003)   (1 citation)

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning. MIT Press, Cambridge, MA, 1998.


Using Machine Learning Techniques in Complex Multi-Agent Domains - Riedmiller, Merke (2002)

No context found.

R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, Cambridge, MA, 1998.


Using Machine Learning Techniques in Complex Multi-Agent Domains - Riedmiller, Merke (2002)

No context found.

A. Barto, R. Sutton. Reinforcement Learning. MIT Press, Cambridge, Massachusetts, 1998.


Scheduling Nurses Using a Tabu-Search Hyperheuristic - Burke, Soubeiga (2003)   (2 citations)

No context found.

R. S. Sutton and A. G. Barto. Reinforcement Learning. The MIT Press, 1998.


Creating a Robot Soccer Team from Scratch: the.. - Arbatzat, Freitag, .. (2003)

No context found.

R. S. Sutton and A. G. Barto. Reinforcement learning. MIT Press, 1998.


Hierarchical Reinforcement Learning in.. - Fischer, Rovatsos, Weiss (2004)

No context found.

R. S. Sutton and A. G. Barto. Reinforcement Learning. An Introduction. MIT Press/A Bradford Book, Cambridge, MA, 1998.


Optimal Admission Control Using Handover Prediction in Mobile - Cellular Networks Vicent

No context found.

R. Sutton and A. G. Barto, Reinforcement Learning. Cambridge, Massachusetts: The MIT press, 1998.


Markov Decision Processes Based Optimal Control Policies.. - Abul, Alhajj, Polat

No context found.

R. Sutton and A. Barto. Reinforcement Learning. MIT Press, 1998.


On the Empirical State-Action Frequencies in Markov.. - Mannor, Tsitsiklis (2004)

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning. MIT Press, 1998.


Reinforcement Learning for Average Reward Zero-Sum Games - Mannor

No context found.

A.G. Barto and R.S. Sutton. Reinforcement Learning. MIT Press, 1998.


Asynchronous Neurocomputing for optimal - Control And Reinforcement (2004)

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning, An Introduction. Bradford Book. The MIT Press, 1998.


Finite-time Analysis of the Multi-armed Bandit Problem - Auer, Cesa-Bianchi, Fischer (2000)   (3 citations)

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning, an Introduction. MIT Press / Bradford Books, Cambridge, 1998.


An Architectural Framework for Integrated Multiagent Planning.. - Weiß

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning. An Introduction. MIT Press/A Bradford Book, Cambridge, MA, 1998.


Modular Self-Organization for a Long-Living Autonomous Agent - Bruno Scherrer Scherrer (2003)

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning, An Introduction. Bradford Book. The MIT Press, 1998.


A Multiagent Variant of Dyna-Q - Weiß

No context found.

R. Sutton and A. Barto. Reinforcement Learning. An Introduction. MIT Press/A Bradford Book, Cambridge, MA, 1998.


Reinforcement Learning of a Simple Visual Task - Ryan Adams September

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning. MIT Press, Cambridge, MA, 1998.


The Domestic Robot - A Friendly Cognitive System Takes Care Of.. - Robot (2003)

No context found.

R. S. Sutton, A. G. B. (1998). Reinforcement Learning. A Bradford book. MIT press.


Parallel Asynchronous Distributed Computations - Of Optimal Control (2003)

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning, An Introduction. Bradford Book. The MIT Press, 1998.


Navigating the World-Wide-Web - Levene, Wheeldon (2003)

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning. MIT Press, Cambridge, Ma., 1998.


Adaptive Building Automation - A multi-Agent approach - Rutishauser, Schäfer (2002)

No context found.

Andrew G. Barto, Richard S. Sutton. Reinforcement Learning, An Introduction. MIT Press, 2000. ISBN: 0-262-19398-1.


Hierarchical Reinforcement Learning in.. - Fischer, Rovatsos, Weiss

No context found.

R. S. Sutton and A. G. Barto. Reinforcement Learning. An Introduction. MIT Press/A Bradford Book, Cambridge, MA, 1998.


Viewpoint Selection - Planning Optimal Sequences of.. - Deinzer, Denzler.. (2003)

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning. A Bradford Book, Cambridge, London, 1998.


Cognitive Navigation Based on Nonuniform Gabor Space.. - Arleo, Smeraldi.. (2004)

No context found.

R. S. Sutton and A. G. Barto, Reinforcement Learning, an Introduction. Cambridge, MA: MIT Press-Bradford, 1998.


Learning a World Model and Planning With a Self-Organizing.. - Toussaint (2004)   (1 citation)

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning. MIT Press, Cambridge, 1998.


Sampling-Based Planning and Control - Michael Branicky Michael (2003)

No context found.

R.S. Sutton and A.G. Barto. Reinforcement Learning. MIT Press, Cambridge, MA, 1998.

CiteSeer.IST - Copyright Penn State and NEC