This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.
265 Reinforcement Learning I: Introduction - Sutton, Barto (1998)
Introduction
Richard S. Sutton and Andrew G. Barto
© All rights reserved
[In which we try to give a basic intuitive sense of what reinforcement
learning is and how it differs and relates to oth...
236 Reinforcement Learning: A Survey - Leslie Pack Kaelbling, Michael L.. (1996)
This paper surveys the field of reinforcement learning from a computer-science perspective.
It is written to be accessible to researchers familiar with machine learning. Both
the historical basis of t...
179 Learning to Act using Real-Time Dynamic Programming - Barto, Bradtke, Singh (1995)
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning ...
154 Automatic Programming of Behavior-based Robots using Reinforcement.. - Mahadevan, Connell (1991)
This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two a...
104 Prioritized Sweeping: Reinforcement Learning with Less Data and Less.. - Moore, Atkeson (1993)
We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic
Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning
have fast ...
104 Markov games as a framework for multi-agent reinforcement learning - Littman (1994)
In the Markov decision process (MDP) formalization
of reinforcement learning, a single adaptive
agent interacts with an environment defined by a
probabilistic transition function. In this solipsistic
...
96 Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents - Tan (1993)
Intelligent human agents exist in a cooperative
social environment that facilitates
learning. They learn not only by trial-and-error,
but also through cooperation by
sharing instantaneous information,...
90 Improving Elevator Performance Using Reinforcement Learning - Crites, Barto (1996)
This paper describes the application of reinforcement learning (RL)
to the difficult real world problem of elevator dispatching. The elevator
domain poses a combination of challenges not seen in most
...
88 Reinforcement Learning with Perceptual Aliasing: The Perceptual.. - Chrisman (1992)
It is known that Perceptual Aliasing may significantly
diminish the effectiveness of reinforcement
learning algorithms [Whitehead and Ballard,
1991]. Perceptual aliasing occurs when multiple
situat...
87 The Parti-game Algorithm for Variable Resolution Reinforcement.. - Moore, Atkeson (1995)
Parti-game is a new algorithm for learning feasible trajectories to goal regions in
high dimensional continuous state-spaces. In high dimensions it is essential that learning does not
plan uniformly...
85 Machine Learning Research: Four Current Directions - Dietterich (1997)
Machine Learning research has been making great progress in many directions. This article summarizes four of
these directions and discusses some current open problems. The four directions are (a) impr...
84 Feudal Reinforcement Learning - Dayan, Hinton (1993)
One way to speed up reinforcement learning is to enable learning
to happen simultaneously at multiple resolutions in space and
time. This paper shows how to create a Q-learning managerial
hierarchy...
83 Simple Statistical Gradient-Following Algorithms for Connectionist.. - Williams (1992)
This article presents a general class of associative reinforcement learning algorithms for
connectionist networks containing stochastic units. These algorithms, called REINFORCE
algorithms, are show...
77 Generalization in Reinforcement Learning: Successful Examples Using.. - Sutton (1996)
On large problems, reinforcement learning systems must use parameterized
function approximators such as neural networks in order to generalize
between similar situations and actions. In these cases th...
75 Artificial Life and Real Robots - Brooks (1992)
The first part of this paper explores the general issues in using Artificial Life techniques to program actual mobile robots. In particular it explores the difficulties inherent in transferring progra...
72 Generalization in Reinforcement Learning: Safely Approximating the.. - Boyan, Moore (1995)
To appear in: G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural
Information Processing Systems 7, MIT Press, Cambridge MA, 1995.
A straightforward approach to the curse of dimension...
65 Cooperative Mobile Robotics: Antecedents and Directions - Cao, Fukunaga, Kahng, Meng (1995)
There has been increased research interest in systems composed of multiple
autonomous mobile robots exhibiting collective behavior. Groups of
mobile robots are constructed, with an aim to studying suc...
62 Residual Algorithms: Reinforcement Learning with Function.. - Leemon Baird (1995)
A number of reinforcement learning algorithms have
been developed that are guaranteed to converge to the
optimal solution when used with lookup tables. It is
shown, however, that these algorithms can ...
62 Transfer of Learning by Composing Solutions of Elemental Sequential.. - Singh (1992)
Although building sophisticated learning agents that operate in complex environments will require learning to perform multiple tasks, most applications of reinforcement learning have focussed on singl...
61 Approximating Optimal Policies for Partially Observable Stochastic.. - Parr, Russell (1995)
The problem of making optimal decisions in uncertain
conditions is central to Artificial Intelligence.
If the state of the world is known at all times, the
world can be modeled as a Markov Decision Pr...
58 Reinforcement Learning Algorithm for Partially Observable Markov.. - Tommi Jaakkola (1995)
Increasing attention has been paid to reinforcement learning algorithms
in recent years, partly due to successes in the theoretical
analysis of their behavior in Markov environments. If the Markov
ass...
58 Learning to coordinate without sharing information - Sen, Sekaran, Hale (1994)
Researchers in the field of Distributed Artificial Intelligence (DAI) have been developing efficient mechanisms to coordinate the activities of multiple autonomous agents. The need for coordination ar...
56 The Role Of Exploration In Learning Control - Thrun (1992)
Introduction
Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to
be combined. On the one hand, the environment must be sufficiently explored in ord...
55 Reinforcement Learning with Replacing Eligibility Traces - Singh (1996)
The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze i...
54 A Reinforcement Learning Approach to Job-shop Scheduling - Zhang (1995)
We apply reinforcement learning methods to
learn domain-specific heuristics for job shop
scheduling. A repair-based scheduler starts
with a critical-path schedule and incrementally
repairs constraint ...
48 Classifier Fitness Based on Accuracy - Wilson (1995)
In many classifier systems, the classifier strength parameter serves as a predictor of
future payoff and as the classifier's fitness for the genetic algorithm. We investigate
a classifier system, XCS,...
47 Reinforcement Learning with Hierarchies of Machines - Ron Parr (1997)
We present a new approach to reinforcement learning in which the policies considered by the
learning process are constrained by hierarchies of partially specified machines. This allows for the
use of ...
47 Stable Function Approximation in Dynamic Programming - Gordon (1995)
The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal difference methods such as value iteration. Experiments in this area ...
46 Efficient Algorithms for Minimizing Cross Validation Error - Moore, Lee (1994)
Model selection is important in many areas of
supervised learning. Given a dataset and a set
of models for predicting with that dataset, we
must choose the model which is expected to best
predict futu...
45 Incremental Multi-Step Q-Learning - Peng, Williams (1996)
This paper presents a novel incremental algorithm that combines Q-learning, a
well-known dynamic programming-based reinforcement learning method, with the TD(λ) return
estimation process, which is ty...
45 Efficient Exploration In Reinforcement Learning - Sebastian B. Thrun (1992)
Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration
in active learning and describes several local techniques for exploration in finite, di...
44 An Adaptive Communication Protocol for Cooperating Mobile Robots - Yanco, Stein (1993)
We describe mobile robots engaged in a cooperative task that requires communication. The robots are initially given a fixed but uninterpreted vocabulary for communication. In attempting to perform the...
44 Algorithms for Sequential Decision Making - Littman (1996)
Abstract of "Algorithms for Sequential Decision Making"
by Michael Lederman Littman, Ph.D., Brown University, May 1996.
Ph.D. Dissertation
Department of Computer Science
Br...
43 Overcoming Incomplete Perception with Utile Distinction Memory - McCallum (1993)
This paper presents a method by which a
reinforcement learning agent can solve the
incomplete perception problem using memory.
The agent uses a hidden Markov model
(HMM) to represent its internal stat...
42 Efficient Memory-based Learning for Robot Control - Moore (1990)
This dissertation is about the application of machine learning to robot control. A system which has no initial model of the robot/world dynamics should be able to construct such a model using data rec...
40 Efficient Learning and Planning Within the Dyna Framework - Peng, Williams (1993)
Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhanc...
40 Packet Routing in Dynamically Changing Networks: A Reinforcement.. - Boyan, Littman (1994)
This paper describes the Q-routing algorithm for packet routing,
in which a reinforcement learning module is embedded into each
node of a switching network. Only local communication is used
by each no...
39 Tight Performance Bounds on Greedy Policies Based on Imperfect Value.. - Williams (1993)
Consider a given value function on states of a Markov decision problem, as might result from applying a reinforcement learning algorithm. Unless this value function equals the corresponding optimal va...
39 Reinforcement Learning for Dynamic Channel Allocation in Cellular.. - Satinder Singh (1997)
In cellular telephone systems, an important problem is to dynamically
allocate the communication resource (channels) so as to maximize
service in a stochastic caller environment. This problem is
natur...
38 Soccer Server: a tool for research on multi-agent systems - Noda, Matsubara, Hiraki, Frank (1997)
This paper describes Soccer Server, a simulator of the game of soccer designed as a test-bench for evaluating multi-agent systems and cooperative algorithms. In real life, successful soccer teams requ...
37 Memoryless Policies: Theoretical Limitations and Practical Results - Michael L. Littman (1994)
One form of adaptive behavior is "goal-seeking"
in which an agent acts so as to minimize the time
it takes to reach a goal state. This paper presents
some theoretical and empirical findings on algorit...
36 Purposive Behavior Acquisition for a Real Robot by Vision-Based.. - Asada, Noda, Tawaratsumida, Hosoda (1996)
This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying the reinforcement learning method to a ...
36 Hierarchical Reinforcement Learning with the MAXQ Value Function.. - Thomas Dietterich (1998)
This paper describes the MAXQ method for hierarchical reinforcement learning based on a
hierarchical decomposition of the value function and derives conditions under which the MAXQ
decomposition can r...
36 Multiagent Reinforcement Learning: Theoretical Framework and an.. - Hu, Wellman (1998)
In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework...
35 Learning Without State-Estimation in Partially Observable Markovian.. - Singh, Jaakkola, Jordan (1994)
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately all of the theory and much of the practice (see ...
34 Reinforcement Learning in the Multi-Robot Domain - Mataric (1997)
This paper describes a formulation of reinforcement learning that
enables learning in noisy, dynamic environments such as in the complex
concurrent multi-robot learning domain. The methodology involve...
34 Reinforcement Learning with Soft State Aggregation - Singh, Jaakkola, Jordan (1995)
It is widely accepted that the use of more compact representations
than lookup tables is crucial to scaling reinforcement learning (RL)
algorithms to real-world problems. Unfortunately almost all of t...
33 Gradient Descent for General Reinforcement Learning - Baird, Moore (1998)
A simple learning rule is derived, the VAPS algorithm, which can
be instantiated to generate a wide range of new reinforcement-learning
algorithms. These algorithms solve a number of open
problems, def...
33 Continual Learning In Reinforcement Environments - Ring (1994)
Continual learning is the constant development of complex behaviors with no final end in
mind. It is the process of learning ever more complicated skills by building on those skills already
developed....
32 Finding Structure in Reinforcement Learning - Thrun, Schwartz (1995)
Reinforcement learning addresses the problem of learning to select actions in order to
maximize one's performance in unknown environments. To scale reinforcement learning
to complex real-world tasks, ...
32 Learning to Use Selective Attention and Short-Term Memory in.. - McCallum (1996)
This paper presents U-Tree, a reinforcement learning
algorithm that uses selective attention and short-term
memory to simultaneously address the intertwined
problems of large perceptual state spaces an...
32 Instance-Based Utile Distinctions for Reinforcement Learning with.. - Andrew Mccallum (1995)
We present Utile Suffix Memory, a reinforcement
learning algorithm that uses short-term memory
to overcome the state aliasing that results from
hidden state. By combining the advantages of
previous wo...
32 Hierarchical Learning in Stochastic Domains: Preliminary Results - Kaelbling (1993)
This paper presents the HDG learning algorithm,
which uses a hierarchical decomposition of the
state space to make learning to achieve goals
more efficient with a small penalty in path quality.
Sp...
31 ZCS: A Zeroth Level Classifier System - Wilson (1994)
A basic classifier system, ZCS, is presented which keeps much of Holland's original
framework but simplifies it to increase understandability and performance.
ZCS's relation to Q-learning is brought o...
29 Connectionist Learning for Control: An Overview - Barto (1989)
to appear.
[91] C. Stanfill and D. Waltz. Toward memory-based reasoning. Communications of the ACM,
29:1213--1228, December 1986.
[92] R. S. Sutton. Temporal Credit Assignment in Reinforcement Learn...
28 The MAXQ Method for Hierarchical Reinforcement Learning - Dietterich (1998)
This paper presents a new approach to hierarchical
reinforcement learning based on the
MAXQ decomposition of the value function.
The MAXQ decomposition has both a procedural
semantics---as a subroutin...
28 Temporal Difference Learning of Position Evaluation in the Game of Go - Schraudolph, Dayan, Sejnowski (1994)
The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. D...
28 The Dynamics of Reinforcement Learning in Cooperative Multiagent.. - Claus, Boutilier (1998)
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dy...
28 Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in.. - Sutton, Precup, Singh (1999)
Learning, planning, and representing knowledge at multiple levels
of temporal abstraction are key, longstanding challenges for AI. In
this paper we consider how these challenges can be addressed wit...
27 A Robot Controller Using Learning by Imitation - Hayes, Demiris (1994)
Roboticists have already invested considerable energy in building robot controllers which model the
learning capacities of single animals. In this paper we present a new type of controller which dra...
27 On the Complexity of Solving Markov Decision Problems - Littman, Dean, Kaelbling (1995)
Markov decision problems (MDPs) provide
the foundations for a number of problems
of interest to AI researchers studying automated
planning and reinforcement learning.
In this paper, we summarize resul...
27 Multiagent coordination with learning classifier systems - Sen, Sekaran (1996)
In this paper, we evaluate a particular reinforcement
learning methodology, a genetic algorithm based
machine learning mechanism known as classifier systems
[Holland, 1986] for developing action polici...
27 Adaptive Load Balancing: A Study in Multi-Agent Learning - Schaerf, Shoham, Tennenholtz (1995)
We study the process of multi-agent reinforcement learning in the context of load balancing in a distributed system, without use of either central coordination or explicit communication. We first defi...
27 Discovery of Subroutines in Genetic Programming - Rosca, Ballard (1996)
Introduction
Hierarchical Genetic Programming (HGP) extensions discover, modify, and exploit subroutines
to accelerate the evolution of programs [Koza 1992, Rosca and Ballard 1994a].
The use of subr...
26 Average Reward Reinforcement Learning: Foundations, Algorithms, and.. - Mahadevan (1996)
This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounte...
26 MIMIC: Finding Optima by Estimating Probability Densities - De Bonet, Isbell, Jr., Viola (1996)
In many optimization problems, the structure of solutions reflects complex relationships between the different input parameters. For example, experience may tell us that certain parameters are closely...
25 Automatic Programming of Robots using Genetic Programming - Koza (1992)
The goal in automatic programming is to get a computer to perform a task by telling it what needs to be done, rather than by explicitly programming it. This paper considers the task of automatically g...
25 Learning To Solve Markovian Decision Processes - Singh (1994)
LEARNING TO SOLVE MARKOVIAN DECISION PROCESSES February 1994 Satinder P. Singh B.Tech., INDIAN INSTITUTE OF TECHNOLOGY NEW DELHI M.S., UNIVERSITY OF MASSACHUSETTS AMHERST Ph.D., UNIVERSITY OF MASSACHU...
25 Solving POMDPs by Searching in Policy Space - Hansen (1998)
Most algorithms for solving POMDPs iteratively
improve a value function that implicitly
represents a policy and are said to search
in value function space. This paper presents
an approach to solvi...
25 Reinforcement Learning Applied to Linear Quadratic Regulation - Bradtke (1993)
Recent research on reinforcement learning has focused on algorithms
based on the principles of Dynamic Programming (DP).
One of the most promising areas of application for these algorithms
is the c...
24 High-Performance Job-Shop Scheduling With A Time-Delay TD(lambda).. - Zhang, Dietterich (1995)
Job-shop scheduling is an important task for manufacturing industries.
We are interested in the particular task of scheduling payload
processing for NASA's space shuttle program. This paper summarizes...
24 ARACHNID: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods.. - Menczer (1997)
ARACHNID is a distributed algorithm for information discovery in large, dynamic, distributed environments such as the World Wide Web. The approach is based on a distributed, adaptive population of int...
24 Robot Shaping: Experiment In Behavior Engineering - Dorigo, Colombetti (1997)
its performance. In
fact, we use the expression robot shaping to denote the use of learning as a means to translate
suggestions coming from an external trainer into an effective control strategy that...
24 Evolving Artificial Neural Networks - Yao (1999)
Learning and evolution are two fundamental forms of adaptation. There has been a great
interest in combining learning and evolution with artificial neural networks (ANNs) in recent
years. This paper (...
24 Self-Learning Fuzzy Controllers Based on Temporal Back Propagation - Jyh-Shing Jang Department (1992)(Correct)
This paper presents a generalized control strategy that enhances fuzzy controllers with self-learning capability
for achieving prescribed control objectives in a near-optimal manner. This methodology,... / are mostly based on reinforcement learning Our learning br controllers by reinforcement learning. In Proc. of the Eighth
23 A Machine Learning Architecture for Optimizing Web Search Engines - Boyan, Freitag, Joachims (1996)(Correct)
Indexing systems for the World Wide Web, such as Lycos and Alta Vista, play an essential role in making the Web useful and usable. These systems are based on Information Retrieval methods for indexing... / a novel one inspired by reinforcement learning techniques for propagating br motivated by an analogy to reinforcement learning as studied in artificial
23 Converges with Probability - Peter Dayan Terrence (1994)(Correct)
The methods of temporal differences (Samuel, 1959; Sutton 1984, 1988)
allow agents to learn accurate predictions about stationary stochastic future
outcomes. The learning is effectively stochastic a... / Probability Keywords reinforcement learning temporal differences br well as other classes of reinforcement learning algorithm.
23 Hierarchical Control and Learning for Markov Decision Processes - Parr (1998)(Correct)
Hierarchical Control and Learning
for
Markov Decision Processes
by
Ronald Edward Parr
Doctor of Philosophy in Computer Science
University of California at Berkeley
Professor Stuart Russell, Cha... / . Reinforcement Learning Methods br . Reinforcement learning with HAMs
22 Sequential Behavior and Learning in Evolved Dynamical Neural Networks - Yamauchi, Beer (1994)(Correct)
This paper explores the use of a real-valued modular genetic algorithm to evolve continuous-time recurrent neural networks capable of sequential behavior and learning. We evolve networks that can gene... / approach to related work on reinforcement learning the induction of regular br large body of research on reinforcement learning algorithms. Kaelbling's
22 Emergent Adaptive Lexicons - Steels (1996)(Correct)
The paper reports experiments to test the hypothesis
that language is an autonomous evolving
adaptive system maintained by a group of distributed
agents without central control. The experiments
show h... / small group of robots using reinforcement learning. Again the size of the
22 Explanation-Based Neural Network Learning for Robot Control - Mitchell, Thrun (1993)(Correct)
How can artificial neural nets generalize better from fewer examples? In order to generalize successfully, neural network learning methods typically require large training data sets. We introduce a ne... / robot task based on reinforcement learning. Introduction br from recent research on reinforcement learning Barto et al.
22 ALECSYS and the AutonoMouse: Learning to Control a Real Robot by.. - Dorigo (1995)(Correct)
In this article we investigate the feasibility of using learning classifier systems as a tool for building
adaptive control systems for real robots. Their use on real robots imposes efficiency constra... / Classifier Systems Reinforcement Learning Genetic Algorithms br this article belongs to the reinforcement learning research field. Holland's
22 Action Selection methods using Reinforcement Learning - Mark Humphrys University (1996)(Correct)
Action Selection schemes, when translated into
precise algorithms, typically involve considerable
design effort and tuning of parameters. Little
work has been done on solving the problem using
lea... / Selection methods using Reinforcement Learning Mark Humphrys br selection problem using Reinforcement Learning learning from rewards
22 Training Agents To Perform Sequential Behavior - Colombetti, Dorigo (1993)(Correct)
This paper is concerned with training an agent to perform sequential behavior. In previous work we have been applying reinforcement learning techniques to control a reactive robot. Obviously, a pure r... / work we have been applying reinforcement learning techniques to control a br application of evolutionary reinforcement learning to the development of
21 Issues in Using Function Approximation for Reinforcement Learning - Thrun, Schwartz (1993)(Correct)
this paper we identify a prime source of such
failures---namely, a systematic overestimation of utility values. Using Watkins' Q-Learning [18] as an
example, we give a theoretical account of the pheno... / Function Approximation for Reinforcement Learning Sebastian Thrun Anton br schwartz cs.stanford.edu Reinforcement learning techniques address the
21 Reinforcement Learning And Its Application To Control - Gullapalli (1992)(Correct)
REINFORCEMENT LEARNING AND ITS APPLICATION TO CONTROL February 1992 Vijaykumar Gullapalli, B.S., Birla Institute of Technology and Science, India M.S., University of Massachusetts Ph.D., University of... / Reinforcement Learning And Its Application To br All Rights Reserved Reinforcement Learning And Its Application To
21 Memory Approaches To Reinforcement Learning In Non-Markovian Domains - Long-Ji Lin (1992)(Correct)
Reinforcement learning is a type of unsupervised learning for sequential decision making. Q-learning
is probably the best-understood reinforcement learning algorithm. In Q-learning, the
agent learns a... / Memory Approaches To Reinforcement Learning In Non-Markovian Domains br PA Abstract Reinforcement learning is a type of unsupervised
21 Interaction and Intelligent Behavior - Mataric (1994)(Correct)
This thesis addresses situated, embodied agents interacting in complex domains. It
focuses on two problems: 1) synthesis and analysis of intelligent group behavior, and
2) learning in complex group en... / A novel formulation of reinforcement learning is proposed that makes br with the existing reinforcement learning algorithms allowing it to
21 Simulation-Based Optimization of Markov Reward Processes - Marbach, Tsitsiklis (1998)(Correct)
We propose a simulation-based algorithm for optimizing the average reward in a Markov Reward Process that depends on a set of parameters. As a special case, the method applies to Markov Decision Proce... / go under the names of reinforcement learning or neuro-dynamic br Singh and M. I. Jordan Reinforcement Learning Algorithm for Partially
20 Using Reinforcement Learning to Spider the Web Efficiently - Rennie, McCallum (1999)(Correct)
Consider the task of exploring the Web in order to find pages of a
particular kind or on a particular topic. This task arises in the construction
of search engines and Web knowledge bases. This paper ... / Using Reinforcement Learning to Spider the Web br best framed and solved by reinforcement learning a branch of machine
20 Convergence Results for Single-Step On-Policy Reinforcement-Learning.. - Singh, Jaakkola, al. (1998)(Correct)
An important application of reinforcement learning (RL) is to finite-state control
problems and one of the most difficult problems in learning for control is balancing the exploration/exploitation
... / for Single-Step On-Policy Reinforcement-Learning Algorithms SATINDER br An important application of reinforcement learning RL is to finite-state
20 A Distributed Reinforcement Learning Scheme for Network Routing - Littman, Boyan (1993)(Correct)
In this paper we describe a self-adjusting algorithm for packet
routing in which a reinforcement learning method is embedded into
each node of a network. Only local information is used at each node
to... / A Distributed Reinforcement Learning Scheme for Network br packet routing in which a reinforcement learning method is embedded into
20 Reinforcement Learning Methods for Continuous-Time Markov Decision.. - Steven Bradtke (1995)(Correct)
Semi-Markov Decision Problems are continuous time generalizations
of discrete time Markov Decision Problems. A number of
reinforcement learning algorithms have been developed recently
for the solution... / Reinforcement Learning Methods for br Problems. A number of reinforcement learning algorithms have been
20 Coevolution of a Backgammon Player - Pollack, Blair, Land (1996)(Correct)
One of the persistent themes in Artificial Life research is the use of co-evolutionary arms races in the development of specific and complex behaviors. However, other than Sims's work on artificial ro... / good news about the reinforcement learning method For the idea of br framework for multi-agent reinforcement learning. In Machine Learning
20 Reinforcement Learning with a Hierarchy of Abstract Models - Singh (1992)(Correct)
Models
Satinder P. Singh
Department of Computer Science
University of Massachusetts
Amherst, MA 01003
singh@cs.umass.edu
Abstract
Reinforcement learning (RL) algorithms have traditionally
been thou... / Reinforcement Learning with a Hierarchy of br Abstract Reinforcement learning RL algorithms have
20 Genetics-based Machine Learning and Behaviour Based Robotics: A New.. - Dorigo, Schnepf (1993)(Correct)
Intelligent robots should be able to use sensor information to learn how to behave in a changing
environment. As environmental complexity grows, the learning task becomes more and more
difficult. We f... / belong to the class of reinforcement learning systems Fig. br rewards Fig. -A general reinforcement learning model. The name
19 Exploration and Model Building in Mobile Robot Domains - Thrun (1993)(Correct)
I present first results on COLUMBUS, an autonomous mobile robot. COLUMBUS
operates in initially unknown, structured environments. Its task is to explore and model
the environment efficiently while avo... / in the context of reinforcement learning Thrun b br Programming robots using reinforcement learning and teaching. In
19 An Approach to Anytime Learning - Grefenstette, Ramsey (1992)(Correct)
Anytime learning is a general approach to continuous
learning in a changing environment.
The agent's learning module continuously tests
new strategies against a simulation model of the
task environmen... / methods especially other reinforcement learning methods Barto Sutton
18 TD Models: Modeling the World at a Mixture of Time Scales - Sutton (1995)(Correct)
Temporal-difference (TD) learning can be
used not just to predict rewards, as is commonly
done in reinforcement learning, but
also to predict states, i.e., to learn a model
of the world's dynamics. We... / as is commonly done in reinforcement learning but also to predict br can be used in model-based reinforcement-learning architectures and dynamic
18 The Efficient Learning of Multiple Task Sequences - Singh (1992)(Correct)
I present a modular network architecture and a learning algorithm based
on incremental dynamic programming that allows a single learning agent
to learn to solve multiple Markovian decision tasks (MDTs... / model of the environment. Reinforcement learning algorithms such as
18 Instance-Based State Identification for Reinforcement Learning - Andrew McCallum (1994)(Correct)
This paper presents instance-based state identification, an approach
to reinforcement learning and hidden state that builds disambiguating
amounts of short-term memory on-line, and also learns with an... / State Identification for Reinforcement Learning R. Andrew McCallum br an approach to reinforcement learning and hidden state that
18 A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov .. - Kearns, Mansour, Ng (1999)(Correct)
An issue that is critical for the application of
Markov decision processes (MDPs) to realistic
problems is how the complexity of planning
scales with the size of the MDP. In stochastic
environments wi... / traditional planning and reinforcement learning algorithms are often br processes MDPs and reinforcement learning have become a standard
18 Simulation-Based Optimization of Markov Reward Processes.. - Marbach, Tsitsiklis (1999)(Correct)
We consider discrete time, finite state space Markov reward processes which depend on a set of parameters. Previously, we proposed a simulation-based methodology to tune the parameters to optimize the... / neuro-dynamic programming reinforcement learning in JSJ we refer to br P. Singh and M. I. Jordan. Reinforcement Learning Algorithm for Partially
17 Moving Furniture with Teams of Autonomous Robots - Rus, Donald, Jennings (1995)(Correct)
We wish to organize furniture in a room with a team
of robots that can push objects. We show how coordinated
pushing by robots can change the pose (position
and orientation) of objects and then we ask... / robots. In Par a reinforcement learning strategy that focuses on
17 MINERVA: A Second-Generation Museum Tour-Guide Robot - Thrun, Bennewitz, Burgard, Cremers.. (1999)(Correct)
This paper describes an interactive tour-guide robot, which
was successfully exhibited in a Smithsonian museum. During
its two weeks of operation, the robot interacted with
thousands of people, traver... / intents and employs reinforcement learning for tailoring its br Minerva used a memory-based reinforcement learning approach no delayed
17 Evolutionary Artificial Neural Networks - Yao (1993)(Correct)
Evolutionary Artificial Neural Networks (EANNs) can be considered as a combination of artificial neural networks (ANNs) and evolutionary search procedures, such as genetic algorithms (GAs). This paper... / feature selection genetic reinforcement learning initial weight br unsupervised and reinforcement learning. Supervised learning is
17 An incremental approach to developing intelligent neural network.. - Meeden (1995)(Correct)
By beginning with simple reactive behaviors and gradually building up to more memory-dependent
behaviors, it may be possible for connectionist systems to eventually achieve the
level of planning. This ... / and a global method of reinforcement learning are contrasted-a special br And Compared. Iii. Reinforcement Learning Methods Robotics
17 Multiagent Reinforcement Learning in the Iterated Prisoner's Dilemma - Sandholm, Crites (1995)(Correct)
Reinforcement learning (RL) is based on the idea that the tendency to produce
an action should be strengthened (reinforced) if it produces favorable results, and
weakened if it produces unfavorable ... / Multiagent Reinforcement Learning in the Iterated br - - Abstract Reinforcement learning RL is based on the idea
17 Using Randomization to Break the Curse of Dimensionality - Rust (1996)(Correct)
This paper introduces random versions of successive approximations and multigrid algorithms for computing
approximate solutions to a class of finite and infinite horizon Markovian decision problems ... / convergence of stochastic reinforcement learning algorithms such as real
17 Near-Optimal Reinforcement Learning in Polynomial Time - Kearns, Singh (1998)(Correct)
We present new algorithms for reinforcement learning and prove
that they have polynomial bounds on the resources required to achieve
near-optimal return in general Markov decision processes. After o... / Near-Optimal Reinforcement Learning in Polynomial Time br present new algorithms for reinforcement learning and prove that they have
16 Using Marker-Based Genetic Encoding Of Neural Networks To Evolve.. - Fullmer (1991)(Correct)
A new mechanism for genetic encoding of neural
networks is proposed, which is loosely based on the
marker structure of biological DNA. The mechanism
allows all aspects of the network structure, includ... / requirement is relaxed in reinforcement learning where only an estimate
16 Towards Collaborative and Adversarial Learning: A Case Study in.. - Stone, Veloso (1997)(Correct)
Soccer is a rich domain for the study of multiagent learning issues. Not only must the players learn low-level skills, but they must also learn to work together and to adapt to the behaviors of differ... / papers describes a reinforcement learning agent which incorporates br Ford et al. used a Reinforcement Learning RL approach with sensory
16 Modular Neural Networks for Learning Context-Dependent Game Strategies - Boyan (1992)(Correct)
The method of temporal differences (TD) is a learning technique which specialises in predicting
the likely outcome of a sequence over time. Examples of such sequences include speech frame
vectors, who... / . . Reinforcement Learning br paradigm. TS . . Reinforcement Learning Tesauro was right.
16 Incorporating Advice into Agents that Learn from Reinforcements - Maclin (1994)(Correct)
Learning from reinforcements is a promising approach
for creating intelligent agents. However, reinforcement
learning usually requires a large number of training
episodes. We present an approach that ... / agents. However reinforcement learning usually requires a large br function. Subsequent reinforcement learning further integrates and
16 Online Learning with Random Representations - Sutton, Whitehead (1993)(Correct)
We consider the requirements of online learning---learning which must be done incrementally and in realtime, with the results of learning available soon after each new example is acquired. Despite the... / e.g. as components of reinforcement learning systems. Most of these br needed as components of reinforcement learning systems for example to
16 Generalization and Scaling in Reinforcement Learning - David Ackley (1990)(Correct)
In associative reinforcement learning, an environment generates input
vectors, a learning system generates possible output vectors, and a reinforcement
function computes feedback signals from the inpu... / and scaling in reinforcement learning David H. Ackley br ABSTRACT In associative reinforcement learning an environment generates
16 Layered Learning in Multi-Agent Systems - Stone (1998)(Correct)
Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously
and as part of a team. This dissertation addresses multi-agent systems consisting
of teams of auton... / hierarchical learning reinforcement learning decision tree learning br a new multi-agent reinforcement learning algorithm namely
16 Problem Solving With Reinforcement Learning - Rummery (1995)(Correct)
This dissertation is submitted for consideration for the degree
of Doctor of Philosophy at the University of Cambridge
Summary
This thesis is concerned with practical issues surrounding the appli... / Problem Solving With Reinforcement Learning Gavin Adrian Rummery br problem. The resulting reinforcement learning system has the properties
15 Self-fulfilling Bias in Multiagent Learning - Hu, Wellman (1996)(Correct)
Learning in a multiagent environment is complicated by the fact that as other agents learn, the environment effectively changes. Moreover, other agents' actions are often not directly observable, and ... / Russell Norvig In reinforcement learning the initial hypothesis br investigated some form of reinforcement learning Tan Wei
15 Integrated Architectures for Learning, Planning, and Reacting Based.. - Sutton (1990)(Correct)
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforce... / integrate trial-and-error reinforcement learning and execution-time br Q-learning a new kind of reinforcement learning. Dyna-Q uses a less
15 Reinforcement Learning With High-Dimensional, Continuous Actions - Baird III, Klopf (1993)(Correct)
Many reinforcement learning systems, such as Q-learning (Watkins, 1989), or advantage updating (Baird,
1993), require that a function f(x,u) be learned, and that the value of arg max_u f(x,u) b... / Reinforcement Learning With br ABSTRACT Many reinforcement learning systems such as
15 Training Second-Order Recurrent Neural Networks using Hints - Omlin, Giles (1992)(Correct)
We investigate a method for inserting rules into discrete-time second-order recurrent neural networks which are trained to recognize regular languages. The rules defining regular languages can be expr... / task. Berenji uses reinforcement learning to refine reasoning-based br Controllers By Reinforcement Learning Proceedings of the
15 Learning Roles: Behavioral Diversity in Robot Teams - Tucker Balch (1997)(Correct)
This paper describes research investigating behavioral
specialization in learning robot teams. Each agent is
provided a common set of skills (motor schema-based
behavioral assemblages) from which it b... / strategy using reinforcement learning. The agents learn br in a task is available reinforcement learning can shift the burden of
15 Model-Based Learning for Mobile Robot Navigation from the Dynamical.. - Tani (1996)(Correct)
This paper discusses how a behavior-based robot can construct a "symbolic process"
that accounts for its deliberative thinking processes using models of the environment.
The paper focuses on two esse... / genetic programming reinforcement learning and others. These br space quantisation for reinforcement learning of collision-free
15 Learning Optimal Dialogue Strategies: A Case Study of a Spoken.. - Walker, Fromer, Narayanan (1998)(Correct)
This paper describes a novel method by which a dialogue
agent can learn to choose an optimal dialogue
strategy. While it is widely agreed that dialogue
strategies should be formulated in terms of comm... / is based on algorithms for reinforcement learning such as dynamic br S i derived Several reinforcement learning algorithms based on
15 Explanation-Based Learning and Reinforcement Learning: A Unified View - Dietterich, Flann (1995)(Correct)
In speedup-learning problems, where full descriptions
of operators are always known,
both explanation-based learning (EBL) and
reinforcement learning (RL) can be applied.
This paper shows that both me... / Learning and Reinforcement Learning A Unified View Thomas br learning EBL and reinforcement learning RL can be applied. This
15 Truncating Temporal Differences: On the Efficient Implementation of.. - Cichosz (1995)(Correct)
Temporal difference (TD) methods constitute a class of methods for learning predictions
in multi-step prediction problems, parameterized by a recency factor λ. Currently the most
important application ... / of TD for Reinforcement Learning Paweł Cichosz br credit assignment in reinforcement learning. Well known reinforcement
15 Imitative Learning Mechanisms in Robots and Humans - Demiris, Hayes (1996)(Correct)
We do not exist alone. Humans and most other animal species live in societies
where the behaviour of an individual influences and is influenced by other members of the
society. Within societies, an ... / learn any task through reinforcement learning Sutton but the br to learn these skills using reinforcement learning considerably. For
15 Reinforcement Learning Algorithms for Average-Payoff Markovian.. - Singh (1994)(Correct)
Reinforcement learning (RL) has become a central
paradigm for solving learning-control problems in
robotics and artificial intelligence. RL researchers
have focussed almost exclusively on problems whe... / pp. - . Reinforcement Learning Algorithms for br Abstract Reinforcement learning RL has become a central
15 A Comparison between Cellular Encoding and Direct Encoding for.. - Gruau, Whitley, Pyeatt (1996)(Correct)
This paper compares the efficiency of
two encoding schemes for Artificial Neural
Networks optimized by evolutionary
algorithms. Direct Encoding encodes
the weights for an a priori fixed neural
ne... / Introduction For reinforcement learning problems a training set of br Particularly difficult reinforcement learning problems are those that
15 Learning Policies with External Memory - Peshkin, Meuleau, Kaelbling (1999)(Correct)
In order for an agent to perform well in partially observable domains, it is usually necessary for actions to depend on the history of observations. In this paper, we explore a stigmergic approach, in... / Introduction A reinforcement-learning agent must learn a mapping br perform fairly well. Basic reinforcement-learning techniques such as
15 A Multistrategy Learning Scheme For Agent Knowledge Acquisition - Gordon, Subramanian (1993)(Correct)
this paper). Although
room does not permit listing them all, some examples
are:
IF bearing(X,Y) = right AND
turn(X) = left
THEN heading(Y,X) headon unknown Informatica 17 331
A MULTISTRATEGY LEARNI... / learning and reinforcement learning The second more br then refine them with reinforcement learning and
15 Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent.. - Baxter, Weaver (1999)(Correct)
In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate
approximations to the performance gradient of parameterized partially observable
Markov decision processes (POMDPs). unkn... / Direct Gradient-Based Reinforcement Learning II. Gradient Ascent br -based approaches to reinforcement learning is that it guarantees
15 Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation.. - Baxter, Bartlett (1999)(Correct)
Despite their many empirical successes, approximate value-function based approaches
to reinforcement learning suffer from a paucity of theoretical guarantees
on the performance of the policy generat... / Direct Gradient-Based Reinforcement Learning I. Gradient Estimation br based approaches to reinforcement learning suffer from a paucity of
14 Complexity Analysis of Real-Time Reinforcement Learning Applied to.. - Koenig, Simmons (1997)(Correct)
This report analyzes the complexity of on-line reinforcement learning algorithms,
namely asynchronous real-time versions of Q-learning and value-iteration, applied to
the problems of reaching any goal... / Analysis of Real-Time Reinforcement Learning Applied to Finding br Machine Learning Reinforcement Learning Learning Adaptation
14 Evolving Optimal Populations with XCS Classifier Systems - Kovacs (1996)(Correct)
This work investigates some uses of self-monitoring in classifier systems (CS) using Wilson's recent XCS system as a framework. XCS is a significant advance in classifier systems technology which shif... / payoff environment in the reinforcement learning tradition in contrast to br . Reinforcement Learning Problems . Payoff
14 Learning from Demonstration - Schaal (1997)(Correct)
By now it is widely accepted that learning a task from scratch, i.e., without
any prior knowledge, is a daunting undertaking. Humans, however, rarely
attempt to learn from scratch. They extract initia... / applied in the context of reinforcement learning. We consider priming the br problems only model-based reinforcement learning shows significant speed-up
14 Hidden State and Reinforcement Learning with Instance-Based State.. - Andrew McCallum(Correct)
Real robots with real sensors are not omniscient. When a robot's
next course of action depends on information that is hidden from
the sensors because of problems such as occlusion, restricted range,
b... / Hidden State and Reinforcement Learning with Instance-Based State br a new approach to reinforcement learning with state identification
14 Some Studies in Distributed Machine Learning and Organizational Design - Weiss (1994)(Correct)
This article focusses on the intersection of distributed machine learning and organizational design in the context of multi-agent systems. A computational approach to distributed reinforcement learn... / approach to distributed reinforcement learning from experience and br structuring and distributed reinforcement learning from experience and
14 Scaling Up Average Reward Reinforcement Learning by Approximating the .. - Prasad Tadepalli (1996)(Correct)
Almost all the work in Average-reward Reinforcement
Learning (ARL) so far has focused
on table-based methods which do not
scale to domains with large state spaces. In
this paper, we propose two extens... / Scaling Up Average Reward Reinforcement Learning by Approximating the br the work in Average-reward Reinforcement Learning ARL so far has focused
14 Reinforcement Learning in Markovian and Non-Markovian Environments - Jürgen Schmidhuber(Correct)
This work addresses three problems with reinforcement learning and adaptive
neuro-control: 1. Non-Markovian interfaces between learner and environment.
2. On-line learning based on system realization.... / Reinforcement Learning in Markovian and br three problems with reinforcement learning and adaptive
14 Density-Adaptive Learning and Forgetting - Marcos Salganicoff (1993)(Correct)
We describe a density-adaptive reinforcement
learning and a density-adaptive forgetting algorithm.
This learning algorithm uses hybrid
k-D/2^k-trees to allow for a variable resolution
partitioning... / describe a density-adaptive reinforcement learning and a density-adaptive br Density-Adaptive Reinforcement Learning DARLING for
14 Complexity Analysis of Real-Time Reinforcement Learning - Koenig, Simmons (1997)(Correct)
This paper analyzes the complexity of on-line reinforcement
learning algorithms, namely asynchronous real-time
versions of Q-learning and value-iteration, applied
to the problem of reaching a goal stat... / Analysis of Real-Time Reinforcement Learning Sven Koenig and Reid br the complexity of on-line reinforcement learning algorithms namely
14 Consideration of Risk in Reinforcement Learning - Heger (1994)(Correct)
Most Reinforcement Learning (RL) work supposes policies for sequential decision
tasks to be optimal that minimize the expected total discounted cost (e.g. Q-Learning
[Wat 89], AHC [Bar Sut And 83])... / Consideration of Risk in Reinforcement Learning Matthias Heger br Abstract Most Reinforcement Learning RL work supposes
14 Cellular Encoding Applied to Neurocontrol - Whitley, Gruau, Pyeatt (1995)(Correct)
Neural networks are trained for balancing 1
and 2 poles attached to a cart on a fixed
track. For one variant of the single pole system,
only pole angle and cart position variables
are supplied as ... / training neural networks is reinforcement learning. For these types of br C. Genetic Reinforcement Learning for Neurocontrol
14 Hierarchical Learning with Procedural Abstraction Mechanisms - Rosca (1997)(Correct)
Evolutionary computation (EC) consists of the design and analysis of probabilistic
algorithms inspired by the principles of natural selection and variation. Genetic Programming
(GP) is one subfield of... / . Reinforcement learning offers insights to br Bibliography A Reinforcement Learning B Minimum Description