Machine Learning: Reinforcement Learning

Ordered by number of citations (the number before each title is its citation count).

This directory was generated automatically, and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories; for example, papers in a sub-category may not appear in higher-level categories.

265   Reinforcement Learning I: Introduction - Sutton, Barto (1998)
Introduction. Richard S. Sutton and Andrew G. Barto. © All rights reserved. [In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to oth...

236   Reinforcement Learning: A Survey - Leslie Pack Kaelbling, Michael L.. (1996)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of t...

179   Learning to Act using Real-Time Dynamic Programming - Barto, Bradtke, Singh (1995)
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning ...

154   Automatic Programming of Behavior-based Robots using Reinforcement.. - Mahadevan, Connell (1991)
This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two a...

104   Prioritized Sweeping: Reinforcement Learning with Less Data and Less.. - Moore, Atkeson (1993)
We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast ...

104   Markov games as a framework for multi-agent reinforcement learning - Littman (1994)
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic ...

96   Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents - Tan (1993)
Intelligent human agents exist in a cooperative social environment that facilitates learning. They learn not only by trial-and-error, but also through cooperation by sharing instantaneous information,...

90   Improving Elevator Performance Using Reinforcement Learning - Crites, Barto (1996)
This paper describes the application of reinforcement learning (RL) to the difficult real world problem of elevator dispatching. The elevator domain poses a combination of challenges not seen in most ...

88   Reinforcement Learning with Perceptual Aliasing: The Perceptual.. - Chrisman (1992)
It is known that Perceptual Aliasing may significantly diminish the effectiveness of reinforcement learning algorithms [Whitehead and Ballard, 1991]. Perceptual aliasing occurs when multiple situat...

87   The Parti-game Algorithm for Variable Resolution Reinforcement.. - Moore, Atkeson (1995)
Parti-game is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous state-spaces. In high dimensions it is essential that learning does not plan uniformly...

85   Machine Learning Research: Four Current Directions - Dietterich (1997)
Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) impr...

84   Feudal Reinforcement Learning - Dayan, Hinton (1993)
One way to speed up reinforcement learning is to enable learning to happen simultaneously at multiple resolutions in space and time. This paper shows how to create a Q-learning managerial hierarchy...

83   Simple Statistical Gradient-Following Algorithms for Connectionist.. - Williams (1992)
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are show...

77   Generalization in Reinforcement Learning: Successful Examples Using.. - Sutton (1996)
On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases th...

76   On the Convergence of Stochastic Iterative Dynamic Programming.. - Jaakkola, Jordan, Singh (1993)
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorit...

75   Artificial Life and Real Robots - Brooks (1992)
The first part of this paper explores the general issues in using Artificial Life techniques to program actual mobile robots. In particular it explores the difficulties inherent in transferring progra...

72   Generalization in Reinforcement Learning: Safely Approximating the.. - Boyan, Moore (1995)
To appear in: G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, MIT Press, Cambridge MA, 1995. A straightforward approach to the curse of dimension...

69   Learning policies for partially observable environments: Scaling up - Littman, Cassandra, Kaelbling (1995)
Partially observable Markov decision processes (POMDPs) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of...

65   Cooperative Mobile Robotics: Antecedents and Directions - Cao, Fukunaga, Kahng, Meng (1995)
There has been increased research interest in systems composed of multiple autonomous mobile robots exhibiting collective behavior. Groups of mobile robots are constructed, with an aim to studying suc...

62   Residual Algorithms: Reinforcement Learning with Function.. - Leemon Baird (1995)
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can ...

62   Transfer of Learning by Composing Solutions of Elemental Sequential.. - Singh (1992)
Although building sophisticated learning agents that operate in complex environments will require learning to perform multiple tasks, most applications of reinforcement learning have focussed on singl...

61   Approximating Optimal Policies for Partially Observable Stochastic.. - Parr, Russell (1995)
The problem of making optimal decisions in uncertain conditions is central to Artificial Intelligence. If the state of the world is known at all times, the world can be modeled as a Markov Decision Pr...

58   Reinforcement Learning Algorithm for Partially Observable Markov.. - Tommi Jaakkola (1995)
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov ass...

58   Learning to coordinate without sharing information - Sen, Sekaran, Hale (1994)
Researchers in the field of Distributed Artificial Intelligence (DAI) have been developing efficient mechanisms to coordinate the activities of multiple autonomous agents. The need for coordination ar...

56   The Role Of Exploration In Learning Control - Thrun (1992)
Introduction. Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to be combined. On the one hand, the environment must be sufficiently explored in ord...

55   Reinforcement Learning with Replacing Eligibility Traces - Singh (1996)
The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze i...

54   A Reinforcement Learning Approach to Job-shop Scheduling - Zhang (1995)
We apply reinforcement learning methods to learn domain-specific heuristics for job shop scheduling. A repair-based scheduler starts with a critical-path schedule and incrementally repairs constraint ...

48   Classifier Fitness Based on Accuracy - Wilson (1995)
In many classifier systems, the classifier strength parameter serves as a predictor of future payoff and as the classifier's fitness for the genetic algorithm. We investigate a classifier system, XCS,...

47   Reinforcement Learning with Hierarchies of Machines - Ron Parr (1997)
We present a new approach to reinforcement learning in which the policies considered by the learning process are constrained by hierarchies of partially specified machines. This allows for the use of ...

47   Stable Function Approximation in Dynamic Programming - Gordon (1995)
The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal difference methods such as value iteration. Experiments in this area ...

46   Efficient Algorithms for Minimizing Cross Validation Error - Moore, Lee (1994)
Model selection is important in many areas of supervised learning. Given a dataset and a set of models for predicting with that dataset, we must choose the model which is expected to best predict futu...

45   Incremental Multi-Step Q-Learning - Peng, Williams (1996)
This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic programming-based reinforcement learning method, with the TD(λ) return estimation process, which is ty...

45   Efficient Exploration In Reinforcement Learning - Sebastian B. Thrun (1992)
Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration in active learning and describes several local techniques for exploration in finite, di...

44   An Adaptive Communication Protocol for Cooperating Mobile Robots - Yanco, Stein (1993)
We describe mobile robots engaged in a cooperative task that requires communication. The robots are initially given a fixed but uninterpreted vocabulary for communication. In attempting to perform the...

44   Algorithms for Sequential Decision Making - Littman (1996)
of "Algorithms for Sequential Decision Making" by Michael Lederman Littman, Ph.D., Brown University, May 1996. Ph.D. Dissertation, Department of Computer Science, Br...

43   Overcoming Incomplete Perception with Utile Distinction Memory - McCallum (1993)
This paper presents a method by which a reinforcement learning agent can solve the incomplete perception problem using memory. The agent uses a hidden Markov model (HMM) to represent its internal stat...

42   Efficient Memory-based Learning for Robot Control - Moore (1990)
This dissertation is about the application of machine learning to robot control. A system which has no initial model of the robot/world dynamics should be able to construct such a model using data rec...

40   Efficient Learning and Planning Within the Dyna Framework - Peng, Williams (1993)
Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhanc...

40   Packet Routing in Dynamically Changing Networks: A Reinforcement.. - Boyan, Littman (1994)
This paper describes the Q-routing algorithm for packet routing, in which a reinforcement learning module is embedded into each node of a switching network. Only local communication is used by each no...

39   Tight Performance Bounds on Greedy Policies Based on Imperfect Value.. - Williams (1993)
Consider a given value function on states of a Markov decision problem, as might result from applying a reinforcement learning algorithm. Unless this value function equals the corresponding optimal va...

39   Reinforcement Learning for Dynamic Channel Allocation in Cellular.. - Satinder Singh (1997)
In cellular telephone systems, an important problem is to dynamically allocate the communication resource (channels) so as to maximize service in a stochastic caller environment. This problem is natur...

38   Soccer Server: a tool for research on multi-agent systems - Noda, Matsubara, Hiraki, Frank (1997)
This paper describes Soccer Server, a simulator of the game of soccer designed as a test-bench for evaluating multi-agent systems and cooperative algorithms. In real life, successful soccer teams requ...

37   Memoryless Policies: Theoretical Limitations and Practical Results - Michael L. Littman (1994)
One form of adaptive behavior is "goal-seeking" in which an agent acts so as to minimize the time it takes to reach a goal state. This paper presents some theoretical and empirical findings on algorit...

36   Purposive Behavior Acquisition on a Real Robot by a Vision-Based.. - Asada, Noda, Tawaratsumida, Hosoda (1994)
In [1], we have presented the soccer robot which had learned to shoot a ball into the goal using the Q-learning. In this paper, we discuss several issues in applying the Q-learning method to a real rob...

36   Purposive Behavior Acquisition for a Real Robot by Vision-Based.. - Asada, Noda, Tawaratsumida, Hosoda (1996)
This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying the reinforcement learning method to a ...

36   Hierarchical Reinforcement Learning with the MAXQ Value Function.. - Thomas Dietterich (1998)
This paper describes the MAXQ method for hierarchical reinforcement learning based on a hierarchical decomposition of the value function and derives conditions under which the MAXQ decomposition can r...

36   Multiagent Reinforcement Learning: Theoretical Framework and an.. - Hu, Wellman (1998)
In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework...

35   Learning Without State-Estimation in Partially Observable Markovian.. - Singh, Jaakkola, Jordan (1994)
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately all of the theory and much of the practice (see ...

34   Reinforcement Learning in the Multi-Robot Domain - Mataric (1997)
This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involve...

34   Reinforcement Learning with Soft State Aggregation - Singh, Jaakkola, Jordan (1995)
It is widely accepted that the use of more compact representations than lookup tables is crucial to scaling reinforcement learning (RL) algorithms to real-world problems. Unfortunately almost all of t...

34   Adding learning to the cellular development of neural networks.. - Gruau, Whitley, Pyeatt (1993)
A grammar tree is used to encode a cellular developmental process that can generate whole families of Boolean neural networks for computing parity and symmetry. The development process resembles bio...

33   Gradient Descent for General Reinforcement Learning - Baird, Moore (1998)
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, def...

33   Continual Learning In Reinforcement Environments - Ring (1994)
Continual learning is the constant development of complex behaviors with no final end in mind. It is the process of learning ever more complicated skills by building on those skills already developed....

32   Finding Structure in Reinforcement Learning - Thrun, Schwartz (1995)
Reinforcement learning addresses the problem of learning to select actions in order to maximize one's performance in unknown environments. To scale reinforcement learning to complex real-world tasks, ...

32   Learning to Use Selective Attention and Short-Term Memory in.. - McCallum (1996)
This paper presents U-Tree, a reinforcement learning algorithm that uses selective attention and short-term memory to simultaneously address the intertwined problems of large perceptual state spaces an...

32   Instance-Based Utile Distinctions for Reinforcement Learning with.. - Andrew McCallum (1995)
We present Utile Suffix Memory, a reinforcement learning algorithm that uses short-term memory to overcome the state aliasing that results from hidden state. By combining the advantages of previous wo...

32   Hierarchical Learning in Stochastic Domains: Preliminary Results - Kaelbling (1993)
This paper presents the HDG learning algorithm, which uses a hierarchical decomposition of the state space to make learning to achieve goals more efficient with a small penalty in path quality. Sp...

31   On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach - Salzberg (1997)
An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, co...

31   Coordination Of Multiple Behaviors Acquired By A Vision-Based.. - Asada, Uchibe, Noda, Tawaratsumida.. (1994)
A method is proposed which accomplishes a whole task consisting of plural subtasks by coordinating multiple behaviors acquired by a vision-based reinforcement learning. First, individual behaviors whi...

31   ZCS: A Zeroth Level Classifier System - Wilson (1994)
A basic classifier system, ZCS, is presented which keeps much of Holland's original framework but simplifies it to increase understandability and performance. ZCS's relation to Q-learning is brought o...

29   Connectionist Learning for Control: An Overview - Barto (1989)
to appear. [91] C. Stanfill and D. Waltz. Toward memory-based reasoning. Communications of the ACM, 29:1213--1228, December 1986. [92] R. S. Sutton. Temporal Credit Assignment in Reinforcement Learn...

28   The MAXQ Method for Hierarchical Reinforcement Learning - Dietterich (1998)
This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decomposition of the value function. The MAXQ decomposition has both a procedural semantics: as a subroutin...

28   Temporal Difference Learning of Position Evaluation in the Game of Go - Schraudolph, Dayan, Sejnowski (1994)
The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. D...

28   The Dynamics of Reinforcement Learning in Cooperative Multiagent.. - Claus, Boutilier (1998)
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dy...

28   Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in.. - Sutton, Precup, Singh (1999)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed wit...

27   A Robot Controller Using Learning by Imitation - Hayes, Demiris (1994)
Roboticists have already invested considerable energy in building robot controllers which model the learning capacities of single animals. In this paper we present a new type of controller which dra...

27   On the Complexity of Solving Markov Decision Problems - Littman, Dean, Kaelbling (1995)
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize resul...

27   Multiagent coordination with learning classifier systems - Sen, Sekaran (1996)
this paper, we evaluate a particular reinforcement learning methodology, a genetic algorithm based machine learning mechanism known as classifier systems [Holland, 1986] for developing action polici...

27   Adaptive Load Balancing: A Study in Multi-Agent Learning - Schaerf, Shoham, Tennenholtz (1995)
We study the process of multi-agent reinforcement learning in the context of load balancing in a distributed system, without use of either central coordination or explicit communication. We first defi...

27   Discovery of Subroutines in Genetic Programming - Rosca, Ballard (1996)
Introduction. Hierarchical Genetic Programming (HGP) extensions discover, modify, and exploit subroutines to accelerate the evolution of programs [Koza 1992, Rosca and Ballard 1994a]. The use of subr...

26   Average Reward Reinforcement Learning: Foundations, Algorithms, and.. - Mahadevan (1996)
This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounte...

26   MIMIC: Finding Optima by Estimating Probability Densities - De Bonet, Isbell, Jr., Viola (1996)
In many optimization problems, the structure of solutions reflects complex relationships between the different input parameters. For example, experience may tell us that certain parameters are closely...

25   Automatic Programming of Robots using Genetic Programming - Koza (1992)
The goal in automatic programming is to get a computer to perform a task by telling it what needs to be done, rather than by explicitly programming it. This paper considers the task of automatically g... / to the reported ability of reinforcement learning techniques such as Q ... requirements of reinforcement learning necessitates considerable

25   Learning To Solve Markovian Decision Processes - Singh (1994)
Learning to Solve Markovian Decision Processes, February 1994, Satinder P. Singh; B.Tech., Indian Institute of Technology New Delhi; M.S., University of Massachusetts Amherst; Ph.D., University of Massachu... / researchers have developed reinforcement learning RL algorithms based on ... Why Reinforcement Learning

25   Solving POMDPs by Searching in Policy Space - Hansen (1998)
Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solvi... / using value iteration or reinforcement learning. Because the policy is

25   Reinforcement Learning Applied to Linear Quadratic Regulation - Bradtke (1993)
Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the c... / Reinforcement Learning Applied to Linear ... Recent research on reinforcement learning has focused on algorithms

24   High-Performance Job-Shop Scheduling With A Time-Delay TD(lambda).. - Zhang, Dietterich (1995)
Job-shop scheduling is an important task for manufacturing industries. We are interested in the particular task of scheduling payload processing for NASA's space shuttle program. This paper summarizes... / task for solution by the reinforcement learning algorithm TD A ... Navigation and Planning Reinforcement Learning Presentation

24   ARACHNID: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods.. - Menczer (1997)
ARACHNID is a distributed algorithm for information discovery in large, dynamic, distributed environments such as the World Wide Web. The approach is based on a distributed, adaptive population of int... / the user on the basis of reinforcement learning Armstrong et al. ... agent's reproductive cycle. Reinforcement learning is the natural extension

24   Robot Shaping: Experiment In Behavior Engineering - Dorigo, Colombetti (1997)
its performance. In fact, we use the expression robot shaping to denote the use of learning as a means to translate suggestions coming from an external trainer into an effective control strategy that... / is an approach based on reinforcement learning with reinforcements ... we have experimented with reinforcement learning RL. RL can be seen as a

24   Evolving Artificial Neural Networks - Yao (1999)
Learning and evolution are two fundamental forms of adaptation. There has been a great interest in combining learning and evolution with artificial neural networks (ANNs) in recent years. This paper (... / unsupervised and reinforcement learning. Supervised learning is ... to minimize the error. Reinforcement learning is a special case of

24   Policy Gradient Methods for Reinforcement Learning with Function.. - Sutton, McAllester, Singh, Mansour (2000)
Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable.... / Gradient Methods for Reinforcement Learning with Function ... is essential to reinforcement learning but the standard

24   Self-Learning Fuzzy Controllers Based on Temporal Back Propagation - Jyh-Shing Jang (1992)
This paper presents a generalized control strategy that enhances fuzzy controllers with self-learning capability for achieving prescribed control objectives in a near-optimal manner. This methodology,... / are mostly based on reinforcement learning. Our learning ... controllers by reinforcement learning. In Proc. of the Eighth

23   A Machine Learning Architecture for Optimizing Web Search Engines - Boyan, Freitag, Joachims (1996)
Indexing systems for the World Wide Web, such as Lycos and Alta Vista, play an essential role in making the Web useful and usable. These systems are based on Information Retrieval methods for indexing... / a novel one inspired by reinforcement learning techniques for propagating ... motivated by an analogy to reinforcement learning as studied in artificial

23   Converges with Probability - Peter Dayan Terrence (1994)
The methods of temporal differences (Samuel, 1959; Sutton 1984, 1988) allow agents to learn accurate predictions about stationary stochastic future outcomes. The learning is effectively stochastic a... / Probability Keywords reinforcement learning temporal differences ... well as other classes of reinforcement learning algorithm.

23   Hierarchical Control and Learning for Markov Decision Processes - Parr (1998)
Hierarchical Control and Learning for Markov Decision Processes by Ronald Edward Parr, Doctor of Philosophy in Computer Science, University of California at Berkeley, Professor Stuart Russell, Cha... / Reinforcement Learning Methods ... Reinforcement learning with HAMs

22   Sequential Behavior and Learning in Evolved Dynamical Neural Networks - Yamauchi, Beer (1994)
This paper explores the use of a real-valued modular genetic algorithm to evolve continuous-time recurrent neural networks capable of sequential behavior and learning. We evolve networks that can gene... / approach to related work on reinforcement learning the induction of regular ... large body of research on reinforcement learning algorithms. Kaelbling's

22   Emergent Adaptive Lexicons - Steels (1996)
The paper reports experiments to test the hypothesis that language is an autonomous evolving adaptive system maintained by a group of distributed agents without central control. The experiments show h... / small group of robots using reinforcement learning. Again the size of the

22   Explanation-Based Neural Network Learning for Robot Control - Mitchell, Thrun (1993)
How can artificial neural nets generalize better from fewer examples? In order to generalize successfully, neural network learning methods typically require large training data sets. We introduce a ne... / robot task based on reinforcement learning. Introduction ... from recent research on reinforcement learning Barto et al.

22   ALECSYS and the AutonoMouse: Learning to Control a Real Robot by.. - Dorigo (1995)
In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Their use on real robots imposes efficiency constra... / Classifier Systems Reinforcement Learning Genetic Algorithms ... this article belongs to the reinforcement learning research field. Holland's

22   Action Selection methods using Reinforcement Learning - Mark Humphrys (1996)
Action Selection schemes, when translated into precise algorithms, typically involve considerable design effort and tuning of parameters. Little work has been done on solving the problem using lea... / Selection methods using Reinforcement Learning Mark Humphrys ... selection problem using Reinforcement Learning learning from rewards

22   Training Agents To Perform Sequential Behavior - Colombetti, Dorigo (1993)
This paper is concerned with training an agent to perform sequential behavior. In previous work we have been applying reinforcement learning techniques to control a reactive robot. Obviously, a pure r... / work we have been applying reinforcement learning techniques to control a ... application of evolutionary reinforcement learning to the development of

21   Issues in Using Function Approximation for Reinforcement Learning - Thrun, Schwartz (1993)
this paper we identify a prime source of such failures, namely, a systematic overestimation of utility values. Using Watkins' Q-Learning [18] as an example, we give a theoretical account of the pheno... / Function Approximation for Reinforcement Learning Sebastian Thrun Anton ... schwartz@cs.stanford.edu Reinforcement learning techniques address the

21   Reinforcement Learning And Its Application To Control - Gullapalli (1992)
Reinforcement Learning and Its Application to Control, February 1992, Vijaykumar Gullapalli; B.S., Birla Institute of Technology and Science, India; M.S., University of Massachusetts; Ph.D., University of... / Reinforcement Learning And Its Application To ... All Rights Reserved Reinforcement Learning And Its Application To

21   Memory Approaches To Reinforcement Learning In Non-Markovian Domains - Long-Ji Lin (1992)
Reinforcement learning is a type of unsupervised learning for sequential decision making. Q-learning is probably the best-understood reinforcement learning algorithm. In Q-learning, the agent learns a... / Memory Approaches To Reinforcement Learning In Non-Markovian Domains ... PA Abstract Reinforcement learning is a type of unsupervised

21   Interaction and Intelligent Behavior - Mataric (1994)
This thesis addresses situated, embodied agents interacting in complex domains. It focuses on two problems: 1) synthesis and analysis of intelligent group behavior, and 2) learning in complex group en... / A novel formulation of reinforcement learning is proposed that makes ... with the existing reinforcement learning algorithms allowing it to

21   Simulation-Based Optimization of Markov Reward Processes - Marbach, Tsitsiklis (1998)
We propose a simulation-based algorithm for optimizing the average reward in a Markov Reward Process that depends on a set of parameters. As a special case, the method applies to Markov Decision Proce... / go under the names of reinforcement learning or neuro-dynamic ... Singh and M. I. Jordan Reinforcement Learning Algorithm for Partially

20   Using Reinforcement Learning to Spider the Web Efficiently - Rennie, McCallum (1999)
Consider the task of exploring the Web in order to find pages of a particular kind or on a particular topic. This task arises in the construction of search engines and Web knowledge bases. This paper ... / Using Reinforcement Learning to Spider the Web ... best framed and solved by reinforcement learning a branch of machine

20   Convergence Results for Single-Step On-Policy Reinforcement-Learning.. - Singh, Jaakkola, et al. (1998)
An important application of reinforcement learning (RL) is to finite-state control problems and one of the most difficult problems in learning for control is balancing the exploration/exploitation ... / for Single-Step On-Policy Reinforcement-Learning Algorithms SATINDER ... An important application of reinforcement learning RL is to finite-state

20   A Distributed Reinforcement Learning Scheme for Network Routing - Littman, Boyan (1993)
In this paper we describe a self-adjusting algorithm for packet routing in which a reinforcement learning method is embedded into each node of a network. Only local information is used at each node to... / A Distributed Reinforcement Learning Scheme for Network ... packet routing in which a reinforcement learning method is embedded into

20   Reinforcement Learning Methods for Continuous-Time Markov Decision.. - Steven Bradtke (1995)
Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution... / Reinforcement Learning Methods for ... Problems. A number of reinforcement learning algorithms have been

20   Coevolution of a Backgammon Player - Pollack, Blair, Land (1996)
One of the persistent themes in Artificial Life research is the use of co-evolutionary arms races in the development of specific and complex behaviors. However, other than Sims's work on artificial ro... / good news about the reinforcement learning method For the idea of ... framework for multi-agent reinforcement learning. In Machine Learning

20   Reinforcement Learning with a Hierarchy of Abstract Models - Singh (1992)
Models Satinder P. Singh Department of Computer Science University of Massachusetts Amherst, MA 01003 singh@cs.umass.edu Abstract Reinforcement learning (RL) algorithms have traditionally been thou... / Reinforcement Learning with a Hierarchy of ... Abstract Reinforcement learning RL algorithms have

20   Genetics-based Machine Learning and Behaviour Based Robotics: A New.. - Dorigo, Schnepf (1993)
Intelligent robots should be able to use sensor information to learn how to behave in a changing environment. As environmental complexity grows, the learning task becomes more and more difficult. We f... / belong to the class of reinforcement learning systems Fig. ... rewards Fig. - A general reinforcement learning model. The name

19   Exploration and Model Building in Mobile Robot Domains - Thrun (1993)
I present first results on COLUMBUS, an autonomous mobile robot. COLUMBUS operates in initially unknown, structured environments. Its task is to explore and model the environment efficiently while avo... / in the context of reinforcement learning Thrun b ... Programming robots using reinforcement learning and teaching. In

19   An Approach to Anytime Learning - Grefenstette, Ramsey (1992)
Anytime learning is a general approach to continuous learning in a changing environment. The agent's learning module continuously tests new strategies against a simulation model of the task environmen... / methods especially other reinforcement learning methods Barto Sutton

18   An Improved Algorithm for Incremental Induction of Decision Trees - Utgoff (1994)
This paper presents an algorithm for incremental induction of decision trees that is able to handle both numeric and symbolic variables. In order to handle numeric variables, a new tree revision opera... / of recent work in reinforcement learning e.g. Sutton's

18   Symbiotic Evolution of Neural Networks in Sequential Decision Tasks - Moriarty (1997)
Chapter 1 Introduction; 1.1 Sequential Decision Tasks; 1.1.1 Examples of Sequential Decision Tasks ... / Reinforcements ... Reinforcement Learning vs. Supervised Learning ... Temporal Difference Reinforcement Learning

18   TD Models: Modeling the World at a Mixture of Time Scales - Sutton (1995)
Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in reinforcement learning, but also to predict states, i.e., to learn a model of the world's dynamics. We... / as is commonly done in reinforcement learning but also to predict ... can be used in model-based reinforcement-learning architectures and dynamic

18   The Efficient Learning of Multiple Task Sequences - Singh (1992)
I present a modular network architecture and a learning algorithm based on incremental dynamic programming that allows a single learning agent to learn to solve multiple Markovian decision tasks (MDTs... / model of the environment. Reinforcement learning algorithms such as

18   Instance-Based State Identification for Reinforcement Learning - Andrew McCallum (1994)
This paper presents instance-based state identification, an approach to reinforcement learning and hidden state that builds disambiguating amounts of short-term memory on-line, and also learns with an... / State Identification for Reinforcement Learning R. Andrew McCallum ... an approach to reinforcement learning and hidden state that

18   A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov .. - Kearns, Mansour, Ng (1999)
An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments wi... / traditional planning and reinforcement learning algorithms are often ... processes MDPs and reinforcement learning have become a standard

18   Simulation-Based Optimization of Markov Reward Processes.. - Marbach, Tsitsiklis (1999)
We consider discrete time, finite state space Markov reward processes which depend on a set of parameters. Previously, we proposed a simulation-based methodology to tune the parameters to optimize the... / neuro-dynamic programming reinforcement learning in JSJ we refer to ... P. Singh and M. I. Jordan. Reinforcement Learning Algorithm for Partially

17   Moving Furniture with Teams of Autonomous Robots - Rus, Donald, Jennings (1995)
We wish to organize furniture in a room with a team of robots that can push objects. We show how coordinated pushing by robots can change the pose (position and orientation) of objects and then we ask... / robots. In Par a reinforcement learning strategy that focuses on

17   MINERVA: A Second-Generation Museum Tour-Guide Robot - Thrun, Bennewitz, Burgard, Cremers.. (1999)
This paper describes an interactive tour-guide robot, which was successfully exhibited in a Smithsonian museum. During its two weeks of operation, the robot interacted with thousands of people, traver... / intents and employs reinforcement learning for tailoring its ... Minerva used a memory-based reinforcement learning approach no delayed

17   Evolutionary Artificial Neural Networks - Yao (1993)
Evolutionary Artificial Neural Networks (EANNs) can be considered as a combination of artificial neural networks (ANNs) and evolutionary search procedures, such as genetic algorithms (GAs). This paper... / feature selection genetic reinforcement learning initial weight ... unsupervised and reinforcement learning. Supervised learning is

17   An incremental approach to developing intelligent neural network.. - Meeden (1995)
By beginning with simple reactive behaviors and gradually building up to more memory-dependent behaviors, it may be possible for connectionist systems to eventually achieve the level of planning. This ... / and a global method of reinforcement learning are contrasted, a special ... And Compared. III. Reinforcement Learning Methods Robotics

17   Multiagent Reinforcement Learning in the Iterated Prisoner's Dilemma - Sandholm, Crites (1995)
Reinforcement learning (RL) is based on the idea that the tendency to produce an action should be strengthened (reinforced) if it produces favorable results, and weakened if it produces unfavorable ... / Multiagent Reinforcement Learning in the Iterated ... Abstract Reinforcement learning RL is based on the idea

17   Using Randomization to Break the Curse of Dimensionality - Rust (1996)
This paper introduces random versions of successive approximations and multigrid algorithms for computing approximate solutions to a class of finite and infinite horizon Markovian decision problems ... / convergence of stochastic reinforcement learning algorithms such as real

17   Near-Optimal Reinforcement Learning in Polynomial Time - Kearns, Singh (1998)
We present new algorithms for reinforcement learning and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After o... / Near-Optimal Reinforcement Learning in Polynomial Time ... present new algorithms for reinforcement learning and prove that they have

16   Using Marker-Based Genetic Encoding Of Neural Networks To Evolve.. - Fullmer (1991)
A new mechanism for genetic encoding of neural networks is proposed, which is loosely based on the marker structure of biological DNA. The mechanism allows all aspects of the network structure, includ... / requirement is relaxed in reinforcement learning where only an estimate

16   Towards Collaborative and Adversarial Learning: A Case Study in.. - Stone, Veloso (1997)
Soccer is a rich domain for the study of multiagent learning issues. Not only must the players learn low-level skills, but they must also learn to work together and to adapt to the behaviors of differ... / papers describes a reinforcement learning agent which incorporates ... Ford et al. used a Reinforcement Learning RL approach with sensory

16   Modular Neural Networks for Learning Context-Dependent Game Strategies - Boyan (1992)
The method of temporal differences (TD) is a learning technique which specialises in predicting the likely outcome of a sequence over time. Examples of such sequences include speech frame vectors, who... / Reinforcement Learning ... paradigm. TS ... Reinforcement Learning Tesauro was right.

16   Incorporating Advice into Agents that Learn from Reinforcements - Maclin (1994)
Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training episodes. We present an approach that ... / agents. However reinforcement learning usually requires a large ... function. Subsequent reinforcement learning further integrates and

16   Online Learning with Random Representations - Sutton, Whitehead (1993)
We consider the requirements of online learning: learning which must be done incrementally and in real time, with the results of learning available soon after each new example is acquired. Despite the... / e.g., as components of reinforcement learning systems. Most of these ... needed as components of reinforcement learning systems for example to

16   Generalization and Scaling in Reinforcement Learning - David Ackley (1990)
In associative reinforcement learning, an environment generates input vectors, a learning system generates possible output vectors, and a reinforcement function computes feedback signals from the inpu... / and scaling in reinforcement learning David H. Ackley ... ABSTRACT In associative reinforcement learning an environment generates

16   Layered Learning in Multi-Agent Systems - Stone (1998)
Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously and as part of a team. This dissertation addresses multi-agent systems consisting of teams of auton... / hierarchical learning reinforcement learning decision tree learning ... a new multi-agent reinforcement learning algorithm namely

16   Problem Solving With Reinforcement Learning - Rummery (1995)
This dissertation is submitted for consideration for the degree of Doctor of Philosophy at the University of Cambridge. Summary: This thesis is concerned with practical issues surrounding the appli... / Problem Solving With Reinforcement Learning Gavin Adrian Rummery ... problem. The resulting reinforcement learning system has the properties

15   Self-fulfilling Bias in Multiagent Learning - Hu, Wellman (1996)
Learning in a multiagent environment is complicated by the fact that as other agents learn, the environment effectively changes. Moreover, other agents' actions are often not directly observable, and ... / Russell Norvig In reinforcement learning the initial hypothesis ... investigated some form of reinforcement learning Tan Wei

15   Integrated Architectures for Learning, Planning, and Reacting Based.. - Sutton (1990)
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforce... / integrate trial-and-error reinforcement learning and execution-time ... Q-learning a new kind of reinforcement learning. Dyna-Q uses a less

15   Reinforcement Learning with High-Dimensional, Continuous Actions - III, Klopf (1993)
Many reinforcement learning systems, such as Q-learning (Watkins, 1989), or advantage updating (Baird, 1993), require that a function f(x,u) be learned, and that the value of argmax_u f(x,u) b... / Reinforcement Learning With ... ABSTRACT Many reinforcement learning systems such as

15   Training Second-Order Recurrent Neural Networks using Hints - Omlin, Giles (1992)
We investigate a method for inserting rules into discrete-time second-order recurrent neural networks which are trained to recognize regular languages. The rules defining regular languages can be expr... / task. Berenji uses reinforcement learning to refine reasoning-based ... Controllers By Reinforcement Learning Proceedings of the

15   Learning Roles: Behavioral Diversity in Robot Teams - Tucker Balch (1997)
This paper describes research investigating behavioral specialization in learning robot teams. Each agent is provided a common set of skills (motor schema-based behavioral assemblages) from which it b... / strategy using reinforcement learning. The agents learn ... in a task is available reinforcement learning can shift the burden of

15   Model-Based Learning for Mobile Robot Navigation from the Dynamical.. - Tani (1996)
This paper discusses how a behavior-based robot can construct a "symbolic process" that accounts for its deliberative thinking processes using models of the environment. The paper focuses on two esse... / genetic programming reinforcement learning and others. These ... space quantisation for reinforcement learning of collision-free

15   Learning Optimal Dialogue Strategies: A Case Study of a Spoken.. - Walker, Fromer, Narayanan (1998)
This paper describes a novel method by which a dialogue agent can learn to choose an optimal dialogue strategy. While it is widely agreed that dialogue strategies should be formulated in terms of comm... / is based on algorithms for reinforcement learning such as dynamic ... S i derived Several reinforcement learning algorithms based on

15   Explanation-Based Learning and Reinforcement Learning: A Unified View - Dietterich, Flann (1995)
In speedup-learning problems, where full descriptions of operators are always known, both explanation-based learning (EBL) and reinforcement learning (RL) can be applied. This paper shows that both me... / Learning and Reinforcement Learning A Unified View Thomas ... learning EBL and reinforcement learning RL can be applied. This

15   Truncating Temporal Differences: On the Efficient Implementation of.. - Cichosz (1995)
Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor λ. Currently the most important application ... / of TD for Reinforcement Learning Paweł Cichosz ... credit assignment in reinforcement learning. Well known reinforcement

15   Imitative Learning Mechanisms in Robots and Humans - Demiris, Hayes (1996)
We do not exist alone. Humans and most other animal species live in societies where the behaviour of an individual influences and is influenced by other members of the society. Within societies, an ... / learn any task through reinforcement learning Sutton but the ... to learn these skills using reinforcement learning considerably. For

15   Reinforcement Learning Algorithms for Average-Payoff Markovian.. - Singh (1994)
Reinforcement learning (RL) has become a central paradigm for solving learning-control problems in robotics and artificial intelligence. RL researchers have focussed almost exclusively on problems whe... / pp. - . Reinforcement Learning Algorithms for ... Abstract Reinforcement learning RL has become a central

15   A Comparison between Cellular Encoding and Direct Encoding for.. - Gruau, Whitley, Pyeatt (1996)
This paper compares the efficiency of two encoding schemes for Artificial Neural Networks optimized by evolutionary algorithms. Direct Encoding encodes the weights for an a priori fixed neural ne... / Introduction For reinforcement learning problems a training set of ... Particularly difficult reinforcement learning problems are those that

15   Learning Policies with External Memory - Peshkin, Meuleau, Kaelbling (1999)
In order for an agent to perform well in partially observable domains, it is usually necessary for actions to depend on the history of observations. In this paper, we explore a stigmergic approach, in... / Introduction A reinforcement-learning agent must learn a mapping ... perform fairly well. Basic reinforcement-learning techniques such as

15   A Multistrategy Learning Scheme For Agent Knowledge Acquisition - Gordon, Subramanian (1993)
this paper). Although room does not permit listing them all, some examples are: IF bearing(X,Y) = right AND turn(X) = left THEN heading(Y,X) headon unknown Informatica 17 331 A MULTISTRATEGY LEARNI... / learning and reinforcement learning The second more ... then refine them with reinforcement learning and

15   Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent.. - Baxter, Weaver (1999)
In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs). unkn... / Direct Gradient-Based Reinforcement Learning II. Gradient Ascent ... -based approaches to reinforcement learning is that it guarantees

15   Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation.. - Baxter, Bartlett (1999)
Despite their many empirical successes, approximate value-function based approaches to reinforcement learning suffer from a paucity of theoretical guarantees on the performance of the policy generat... / Direct Gradient-Based Reinforcement Learning I. Gradient Estimation ... based approaches to reinforcement learning suffer from a paucity of

14   Complexity Analysis of Real-Time Reinforcement Learning Applied to.. - Koenig, Simmons (1997)
This report analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value-iteration, applied to the problems of reaching any goal... / Analysis of Real-Time Reinforcement Learning Applied to Finding ... Machine Learning Reinforcement Learning Learning Adaptation

14   Evolving Optimal Populations with XCS Classifier Systems - Kovacs (1996)
This work investigates some uses of self-monitoring in classifier systems (CS) using Wilson's recent XCS system as a framework. XCS is a significant advance in classifier systems technology which shif... / payoff environment in the reinforcement learning tradition in contrast to ... Reinforcement Learning Problems. Payoff

14   Learning from Demonstration - Schaal (1997)
By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initia... / applied in the context of reinforcement learning. We consider priming the ... problems only model-based reinforcement learning shows significant speed-up

14   Hidden State and Reinforcement Learning with Instance-Based State.. - Andrew McCallum
Real robots with real sensors are not omniscient. When a robot's next course of action depends on information that is hidden from the sensors because of problems such as occlusion, restricted range, b... / Hidden State and Reinforcement Learning with Instance-Based State ... a new approach to reinforcement learning with state identification

14   Some Studies in Distributed Machine Learning and Organizational Design - Weiss (1994)   (Correct)
This article focusses on the intersection of distributed machine learning and organizational design in the context of multi-agent systems. A computational approach to distributed reinforcement learn... / approach to distributed reinforcement learning from experience and ... structuring and distributed reinforcement learning from experience and

14   Scaling Up Average Reward Reinforcement Learning by Approximating the .. - Prasad Tadepalli (1996)
Almost all the work in Average-reward Reinforcement Learning (ARL) so far has focused on table-based methods which do not scale to domains with large state spaces. In this paper, we propose two extens... / Scaling Up Average Reward Reinforcement Learning by Approximating the ... the work in Average-reward Reinforcement Learning ARL so far has focused

14   Reinforcement Learning in Markovian and Non-Markovian Environments - Jürgen Schmidhuber
This work addresses three problems with reinforcement learning and adaptive neuro-control: 1. Non-Markovian interfaces between learner and environment. 2. On-line learning based on system realization.... / Reinforcement Learning in Markovian and ... three problems with reinforcement learning and adaptive

14   Density-Adaptive Learning and Forgetting - Marcos Salganicoff (1993)
We describe a density-adaptive reinforcement learning and a density-adaptive forgetting algorithm. This learning algorithm uses hybrid k-D/2^k-trees to allow for a variable resolution partitioning... / describe a density-adaptive reinforcement learning and a density-adaptive ... Density-Adaptive Reinforcement Learning DARLING for

14   Complexity Analysis of Real-Time Reinforcement Learning - Koenig, Simmons (1997)
This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous realtime versions of Q-learning and value-iteration, applied to the problem of reaching a goal stat... / Analysis of Real-Time Reinforcement Learning Sven Koenig and Reid ... the complexity of on-line reinforcement learning algorithms namely

14   Consideration of Risk in Reinforcement Learning - Heger (1994)
Most Reinforcement Learning (RL) work supposes policies for sequential decision tasks to be optimal that minimize the expected total discounted cost (e.g., Q-learning [Wat 89], AHC [Bar Sut And 83])... / Consideration of Risk in Reinforcement Learning Matthias Heger ... Abstract Most Reinforcement Learning RL work supposes

14   Cellular Encoding Applied to Neurocontrol - Whitley, Gruau, Pyeatt (1995)
Neural networks are trained for balancing 1 and 2 poles attached to a cart on a fixed track. For one variant of the single pole system, only pole angle and cart position variables are supplied as ... / training neural networks is reinforcement learning. For these types of ... C. Genetic Reinforcement Learning for Neurocontrol

14   Hierarchical Learning with Procedural Abstraction Mechanisms - Rosca (1997)
Evolutionary computation (EC) consists of the design and analysis of probabilistic algorithms inspired by the principles of natural selection and variation. Genetic Programming (GP) is one subfield of... / ... Reinforcement learning offers insights to ... Bibliography A Reinforcement Learning B Minimum Description

CiteSeer - citeseer.org - Terms of Service - Privacy Policy - Copyright © 1997-2002 NEC Research Institute