This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.
1514.2 Reinforcement Learning I: Introduction - Sutton, Barto (1998)
Introduction
Richard S. Sutton and Andrew G. Barto
© All rights reserved
[In which we try to give a basic intuitive sense of what reinforcement
learning is and how it differs and relates to oth... / Course Notes Reinforcement Learning I Introduction br intuitive sense of what reinforcement learning is and how it differs and
684.0 Reinforcement Learning: A Survey - Kaelbling, Littman, Moore (1996)
This paper surveys the field of reinforcement learning from a computer-science perspective. It
is written to be accessible to researchers familiar with machine learning. Both the historical basis
of t... / published Reinforcement Learning A Survey Leslie Pack br paper surveys the field of reinforcement learning from a computer-science
518.8 Learning to Act using Real-Time Dynamic Programming - Barto, Bradtke, Singh (1995)
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning ... / aspects of other DP-based reinforcement learning methods such as Watkins' br algorithms are examples of reinforcement learning methods by which
514.2 Hierarchical Reinforcement Learning with the MAXQ Value Function.. - Dietterich (2000)
This paper presents a new approach to hierarchical reinforcement learning based on decomposing
the target Markov decision process (MDP) into a hierarchy of smaller MDPs
and decomposing the value fun... / Hierarchical Reinforcement Learning with the MAXQ Value br approach to hierarchical reinforcement learning based on decomposing the
361.7 Machine Learning Research: Four Current Directions - Dietterich (1997)
Machine Learning research has been making great progress in many directions. This article summarizes four of
these directions and discusses some current open problems. The four directions are (a) impr... / learning algorithms c reinforcement learning and d learning complex
342.8 Actor-Critic Algorithms - Konda, Tsitsiklis (2001)
In this paper, we propose and analyze a class of actor-critic algorithms. These are
two-time-scale algorithms in which the critic uses temporal difference (TD) learning with a linearly
parameterized ... / difficult to identify. Reinforcement Learning RL and Neuro-Dynamic br approximation and reinforcement learning. SIAM Journal on Control
260.8 Improving Elevator Performance Using Reinforcement Learning - Crites, Barto (1996)
This paper describes the application of reinforcement learning (RL)
to the difficult real world problem of elevator dispatching. The elevator
domain poses a combination of challenges not seen in most
... / Elevator Performance Using Reinforcement Learning Robert H. Crites br the application of reinforcement learning RL to the difficult
256.7 Markov games as a framework for multi-agent reinforcement learning - Littman (1994)
In the Markov decision process (MDP) formalization
of reinforcement learning, a single adaptive
agent interacts with an environment defined by a
probabilistic transition function. In this solipsistic
... / a framework for multi-agent reinforcement learning Michael L. Littman br MDP formalization of reinforcement learning a single adaptive agent
254.5 Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in.. - Sutton, Precup, Singh (1999)
Learning, planning, and representing knowledge at multiple levels
of temporal abstraction are key, longstanding challenges for AI. In
this paper we consider how these challenges can be addressed wit... / Temporal Abstraction in Reinforcement Learning Richard S. Sutton br mathematical framework of reinforcement learning and Markov decision
252.1 The Parti-game Algorithm for Variable Resolution Reinforcement.. - Moore, Atkeson (1995)
Parti-game is a new algorithm for learning feasible trajectories to goal regions in
high dimensional continuous state-spaces. In high dimensions it is essential that learning does not
plan uniformly... / for Variable Resolution Reinforcement Learning in Multidimensional br few minutes. Keywords Reinforcement Learning Curse of Dimensionality
226.4 Automatic Programming of Behavior-based Robots using Reinforcement.. - Mahadevan, Connell (1991)
This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two a... / credit assignment in reinforcement learning. PhD thesis University br cooperative mechanisms in reinforcement learning. In Proceedings of the
223.1 Generalization in Reinforcement Learning: Successful Examples Using.. - Sutton (1996)
On large problems, reinforcement learning systems must use parameterized
function approximators such as neural networks in order to generalize
between similar situations and actions. In these cases th... / . Generalization in Reinforcement Learning Successful Examples br On large problems reinforcement learning systems must use
218.1 Evolving Artificial Neural Networks - Yao (1999)
Learning and evolution are two fundamental forms of adaptation. There has been a great
interest in combining learning and evolution with artificial neural networks (ANNs) in recent
years. This paper (... / unsupervised and reinforcement learning. Supervised learning is br to minimize the error. Reinforcement learning is a special case of
214.4 Prioritized Sweeping: Reinforcement Learning with Less Data and Less.. - Moore, Atkeson (1993)
We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic
Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning
have fast ... / Prioritized Sweeping Reinforcement Learning with Less Data and Less br Sweeping with other reinforcement learning schemes for a number of
208.6 Generalization in Reinforcement Learning: Safely Approximating the.. - Boyan, Moore (1995)
To appear in: G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural
Information Processing Systems 7, MIT Press, Cambridge MA, 1995.
A straightforward approach to the curse of dimension... / Generalization in Reinforcement Learning Safely Approximating the br curse of dimensionality in reinforcement learning and dynamic programming
205.7 Multiagent Reinforcement Learning: Theoretical Framework and an.. - Hu, Wellman (1998)
In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework... / Multiagent Reinforcement Learning Theoretical Framework and br a framework for multiagent reinforcement learning. Our work extends
200.0 Reinforcement Learning with Hierarchies of Machines - Parr, Russell (1997)
We present a new approach to reinforcement learning in which the policies considered by the learning process are constrained by hierarchies of partially specified machines. This allows for the use of ... / Reinforcement Learning with Hierarchies of br present a new approach to reinforcement learning in which the policies
200.0 Learning policies for partially observable environments: Scaling up - Littman, Cassandra, Kaelbling (1995)
Partially observable Markov decision processes (pomdp's) model
decision problems in which an agent tries to maximize its reward in
the face of limited and/or noisy sensor feedback. While the study of
... / promise and practice. Using reinforcement-learning techniques and insights br problems addressed in the reinforcement-learning literature Moore
197.9 Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents - Tan (1993)
Intelligent human agents exist in a cooperative
social environment that facilitates
learning. They learn not only by trial-and-error,
but also through cooperation by
sharing instantaneous information,... / Multi-Agent Reinforcement Learning Independent vs. br Given the same number of reinforcement learning agents will cooperative
188.5 Gradient Descent for General Reinforcement Learning - Baird, Moore (1998)
A simple learning rule is derived, the VAPS algorithm, which can
be instantiated to generate a wide range of new reinforcement-learning
algorithms. These algorithms solve a number of open
problems, def... / Descent for General Reinforcement Learning Leemon Baird Andrew br generate a wide range of new reinforcement-learning algorithms. These
188.4 Cooperative Mobile Robotics: Antecedents and Directions - Cao, Fukunaga, Kahng, Meng (1995)
There has been increased research interest in systems composed of multiple
autonomous mobile robots exhibiting collective behavior. Groups of
mobile robots are constructed, with an aim to studying suc... / fault-tolerance and reinforcement learning. By contrast DJR br the architecture that uses reinforcement learning to adjust the parameters
181.8 Using Reinforcement Learning to Spider the Web Efficiently - Rennie, McCallum (1999)
Consider the task of exploring the Web in order to find pages of a
particular kind or on a particular topic. This task arises in the construction
of search engines and Web knowledge bases. This paper ... / Using Reinforcement Learning to Spider the Web br best framed and solved by reinforcement learning a branch of machine
179.7 Residual Algorithms: Reinforcement Learning with Function.. - Leemon Baird (1995)
A number of reinforcement learning algorithms have
been developed that are guaranteed to converge to the
optimal solution when used with lookup tables. It is
shown, however, that these algorithms can ... / Residual Algorithms Reinforcement Learning with Function br ABSTRACT A number of reinforcement learning algorithms have been
176.8 Approximating Optimal Policies for Partially Observable Stochastic.. - Parr, Russell (1995)
The problem of making optimal decisions in uncertain
conditions is central to Artificial Intelligence.
If the state of the world is known at all times, the
world can be modeled as a Markov Decision Pr... / can be combined with reinforcement learning methods a combination br rule that is amenable to reinforcement learning methods and will permit
173.1 Feudal Reinforcement Learning - Dayan, Hinton (1993)
One way to speed up reinforcement learning is to enable learning
to happen simultaneously at multiple resolutions in space and
time. This paper shows how to create a Q-learning managerial
hierarchy... / San Mateo CA Feudal Reinforcement Learning Peter Dayan CNL The br One way to speed up reinforcement learning is to enable learning to
171.4 Approximate Planning in Large POMDPs via Reusable Trajectories - Kearns, Mansour, Ng (2000)
We consider the problem of reliably choosing a near-best strategy from a restricted class of strategies in a partially observable Markov decision process (POMDP). We assume we are given the ability ... / to the settings of reinforcement learning and planning. br learning to the settings of reinforcement learning and planning and we give
168.1 Reinforcement Learning Algorithm for Partially Observable Markov.. - Tommi Jaakkola (1995)
Increasing attention has been paid to reinforcement learning algorithms
in recent years, partly due to successes in the theoretical
analysis of their behavior in Markov environments. If the Markov
ass... / Reinforcement Learning Algorithm for Partially br attention has been paid to reinforcement learning algorithms in recent
165.9 Reinforcement Learning for Dynamic Channel Allocation in Cellular.. - Satinder Singh (1997)
In cellular telephone systems, an important problem is to dynamically
allocate the communication resource (channels) so as to maximize
service in a stochastic caller environment. This problem is
natur... / Reinforcement Learning for Dynamic Channel br problem and we use a reinforcement learning RL method to find
163.6 A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov.. - Kearns, Mansour, Ng (1999)
An issue that is critical for the application of
Markov decision processes (MDPs) to realistic
problems is how the complexity of planning
scales with the size of the MDP. In stochastic
environments wi... / traditional planning and reinforcement learning algorithms are often br processes MDPs and reinforcement learning have become a standard
163.6 Simulation-Based Optimization of Markov Reward Processes.. - Marbach, Tsitsiklis (1999)
We consider discrete time, finite state space Markov reward processes which depend on a set of parameters. Previously, we proposed a simulation-based methodology to tune the parameters to optimize the... / neuro-dynamic programming reinforcement learning in JSJ we refer to br P. Singh and M. I. Jordan. Reinforcement Learning Algorithm for Partially
161.7 Soccer Server: a tool for research on multi-agent systems - Noda, Matsubara, Hiraki, Frank (1997)
This paper describes Soccer Server, a simulator of the game of soccer designed as a test-bench for evaluating multi-agent systems and cooperative algorithms. In real life, successful soccer teams requ... / colleagues have been using reinforcement learning to develop the skills of a br and K. Hosoda. Vision-based reinforcement learning for purposive behavior
159.9 The Dynamics of Reinforcement Learning in Cooperative Multiagent.. - Claus, Boutilier (1998)
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dy... / The Dynamics of Reinforcement Learning in Cooperative Multiagent br Abstract Reinforcement learning can provide a robust and
159.9 The MAXQ Method for Hierarchical Reinforcement Learning - Dietterich (1998)
This paper presents a new approach to hierarchical
reinforcement learning based on the
MAXQ decomposition of the value function.
The MAXQ decomposition has both a procedural
semantics---as a subroutin... / Method for Hierarchical Reinforcement Learning Thomas G. Dietterich br approach to hierarchical reinforcement learning based on the MAXQ
159.4 Reinforcement Learning with Replacing Eligibility Traces - Singh (1996)
The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze i... / in The Netherlands. Reinforcement Learning with Replacing br basic mechanisms used in reinforcement learning to handle delayed reward.
157.1 Automating the Construction of Internet Portals with Machine Learning - McCallum, Nigam, Rennie, al. (2000)
Domain-specific internet portals are growing in popularity because
they gather content from the Web and organize it for easy access, retrieval and
search. For example, www.campsearch.com allows compl... / We describe new research in reinforcement learning information extraction br spidering crawling reinforcement learning information extraction
156.5 A Reinforcement Learning Approach to Job-shop Scheduling - Zhang (1995)
We apply reinforcement learning methods to
learn domain-specific heuristics for job shop
scheduling. A repair-based scheduler starts
with a critical-path schedule and incrementally
repairs constraint ... / A Reinforcement Learning Approach to Job-shop br A. Abstract We apply reinforcement learning methods to learn
154.5 MINERVA: A Second-Generation Museum Tour-Guide Robot - Thrun, Bennewitz, Burgard, Cremers.. (1999)
This paper describes an interactive tour-guide robot, which
was successfully exhibited in a Smithsonian museum. During
its two weeks of operation, the robot interacted with
thousands of people, traver... / intents and employs reinforcement learning for tailoring its br Minerva used a memory-based reinforcement learning approach no delayed
151.7 Reinforcement Learning with Perceptual Aliasing: The Perceptual.. - Chrisman (1992)
It is known that Perceptual Aliasing may significantly
diminish the effectiveness of reinforcement
learning algorithms [Whitehead and Ballard,
1991]. Perceptual aliasing occurs when multiple
situat... / Reinforcement Learning with Perceptual Aliasing br the effectiveness of reinforcement learning algorithms Whitehead
144.6 Reinforcement Learning in the Multi-Robot Domain - Mataric (1997)
This paper describes a formulation of reinforcement learning that
enables learning in noisy, dynamic environments such as in the complex
concurrent multi-robot learning domain. The methodology involve... / Reinforcement Learning in the Multi-Robot Domain br describes a formulation of reinforcement learning that enables learning in
143.2 Learning to coordinate without sharing information - Sen, Sekaran, Hale (1994)
Researchers in the field of Distributed Artificial Intelligence (DAI) have been developing efficient mechanisms to coordinate the activities of multiple autonomous agents. The need for coordination ar... / coordination. We use reinforcement learning techniques on a block br on similar problems. Reinforcement learning based coordination can be
143.1 Simple Statistical Gradient-Following Algorithms for Connectionist.. - Williams (1992)
This article presents a general class of associative reinforcement learning algorithms for
connectionist networks containing stochastic units. These algorithms, called REINFORCE
algorithms, are show... / for Connectionist Reinforcement Learning Ronald J. Williams br class of associative reinforcement learning algorithms for
142.8 Reinforcement Learning in POMDP's via Direct Gradient Ascent - Baxter, Bartlett (2000)
This paper discusses theoretical and experimental
aspects of gradient-based approaches to the
direct optimization of policy performance in controlled
POMDPs. We introduce GPOMDP, a
REINFORCE-like... / Reinforcement Learning in POMDP's via Direct br . Introduction Reinforcement learning is used to describe the
142.8 Monte Carlo POMDPs - Thrun (2000)
We present a Monte Carlo algorithm for learning to act in partially observable Markov decision processes (POMDPs) with real-valued state and action spaces. Our approach uses importance sampling for re... / for belief propagation. A reinforcement learning algorithm value br and propagation. Reinforcement learning in belief space is
142.8 Solving POMDPs by Searching in Policy Space - Hansen (1998)
Most algorithms for solving POMDPs iteratively
improve a value function that implicitly
represents a policy and are said to search
in value function space. This paper presents
an approach to solvi... / using value iteration or reinforcement learning. Because the policy is
139.1 Classifier Fitness Based on Accuracy - Wilson (1995)
In many classifier systems, the classifier strength parameter serves as a predictor of
future payoff and as the classifier's fitness for the genetic algorithm. We investigate
a classifier system, XCS,... / for a wide range of reinforcement learning situations where
136.3 Learning Policies with External Memory - Peshkin, Meuleau, Kaelbling (1999)
In order for an agent to perform well in partially observable domains, it is usually necessary for actions to depend on the history of observations. In this paper, we explore a stigmergic approach, in... / Introduction A reinforcement-learning agent must learn a mapping br perform fairly well. Basic reinforcement-learning techniques such as
136.3 Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation.. - Baxter, Bartlett (1999)
Despite their many empirical successes, approximate value-function based approaches
to reinforcement learning suffer from a paucity of theoretical guarantees
on the performance of the policy generat... / Direct Gradient-Based Reinforcement Learning I. Gradient Estimation br based approaches to reinforcement learning suffer from a paucity of
136.3 Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent.. - Baxter, Weaver (1999)
In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate
approximations to the performance gradient of parameterized partially observable
Markov decision processes (POMDPs). unkn... / Direct Gradient-Based Reinforcement Learning II. Gradient Ascent br -based approaches to reinforcement learning is that it guarantees
136.2 Stable Function Approximation in Dynamic Programming - Gordon (1995)
The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal difference methods such as value iteration. Experiments in this area ... / Abstract The success of reinforcement learning in practical problems br W. Moore. Generalization in reinforcement learning safely approximating the
131.4 Hierarchical Control and Learning for Markov Decision Processes - Parr (1998)
Hierarchical Control and Learning
for
Markov Decision Processes
by
Ronald Edward Parr
Doctor of Philosophy in Computer Science
University of California at Berkeley
Professor Stuart Russell, Cha... / . Reinforcement Learning Methods br . Reinforcement learning with HAMs
130.4 Incremental Multi-Step Q-Learning - Peng, Williams (1996)
This paper presents a novel incremental algorithm
that combines Q-learning, a well-known
dynamic programming-based reinforcement
learning method, with the TD(λ)
return estimation process, which is typic... / dynamic programming-based reinforcement learning method with the TD br dynamic programming-based reinforcement learning method. The parameter
129.3 Artificial Life and Real Robots - Brooks (1992)
The first part of this paper explores the general issues in using Artificial Life techniques to program actual mobile robots. In particular it explores the difficulties inherent in transferring progra... / new behaviors using reinforcement learning e.g.Kaelbling and br Behavior-based Robots using Reinforcement Learning Sridhar Mahadevan and
128.5 Learning to Cooperate via Policy Search - Peshkin, Meuleau, Kaelbling (2000)
Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperativ... / structure. Value-based reinforcement-learning algorithms such as br known to the agents. In reinforcement learning no explicit model
127.5 Algorithms for Sequential Decision Making - Littman (1996)
of "Algorithms for Sequential Decision Making"
by Michael Lederman Littman, Ph.D., Brown University, May 1996. Michael Lederman Littman
Ph.D. Dissertation
Department of Computer Science
Br... / anyone makes the field of reinforcement learning a nice place to work. br Justin Boyan games and reinforcement learning Anne Condon solving
119.9 Simulation-Based Optimization of Markov Reward Processes - Marbach, Tsitsiklis (1998)
We propose a simulation-based algorithm for optimizing the average reward in a Markov Reward Process that depends on a set of parameters. As a special case, the method applies to Markov Decision Proce... / go under the names of reinforcement learning or neuro-dynamic br Singh and M. I. Jordan Reinforcement Learning Algorithm for Partially
114.2 Autonomous Helicopter Control using Reinforcement Learning Policy.. - Bagnell, Schneider (2001)
Many control problems in the robotics field
can be cast as Partially Observed Markovian Decision
Problems (POMDPs), an optimal control formalism.
Finding optimal solutions to such problems in general,... / Helicopter Control using Reinforcement Learning Policy Search Methods br Traditional model-based reinforcement learning algorithms make a
114.2 Stochastic Search for Signal Processing Algorithm Optimization - Singer, Veloso (2001)
Many difficult problems can be viewed as search problems. However, given a new task with an embedded
search problem, it is challenging to state and find a truly effective search approach. In this pape... / and Littman use reinforcement learning to learn to select br Algorithm selection using reinforcement learning. In Proceedings of
114.2 Convergence Results for Single-Step On-Policy Reinforcement-Learning.. - Singh, Jaakkola, al. (1998)
An important application of reinforcement learning (RL) is to finite-state control
problems and one of the most difficult problems in learning for control is balancing the exploration
/exploitation ... / for Single-Step On-Policy Reinforcement-Learning Algorithms SATINDER br An important application of reinforcement learning RL is to finite-state
113.5 Efficient Algorithms for Minimizing Cross Validation Error - Moore, Lee (1994)
Model selection is important in many areas of
supervised learning. Given a dataset and a set
of models for predicting with that dataset, we
must choose the model which is expected to best
predict futu... / exploitation dilemma in reinforcement learning. Greiner and Jurisica
109.0 Approximate Solutions to Markov Decision Processes - Gordon (1999)
One of the basic problems of machine learning is deciding how to act in an uncertain world. For example, if I want my robot to bring me a cup of coffee, it must be able to compute the correct sequence... / machine learning reinforcement learning dynamic programming br Baird. Residual algorithms Reinforcement learning with function
109.0 Building Domain-Specific Search Engines with Machine Learning.. - McCallum, Nigam, Rennie, Seymore (1999)
Domain-specific search engines are becoming increasingly
popular because they offer increased accuracy
and extra features not possible with the
general, Web-wide search engines. For example,
www.camps... / describe new research in reinforcement learning text classification and br engine creation using reinforcement learning text classification and
109.0 General Principles Of Learning-Based Multi-Agent Systems - Wolpert, Wheeler, al. (1999)
We consider the problem of how to design large decentralized
multi-agent systems (MAS's) in an automated fashion, with
little or no hand-tuning. Our approach has each agent run
a reinforcement learnin... / has each agent run a reinforcement learning algorithm. This converts br the agents each run reinforcement learning RL algorithms
106.8 Transfer of Learning by Composing Solutions of Elemental Sequential.. - Singh (1992)
Although building sophisticated learning agents that operate in complex environments will require learning to perform multiple tasks, most applications of reinforcement learning have focussed on singl... / tasks most applications of reinforcement learning have focussed on single br application of reinforcement learning to multiple tasks requires
104.3 Purposive Behavior Acquisition for a Real Robot by Vision-Based.. - Asada, Noda, Tawaratsumida, Hosoda (1996)
This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying the reinforcement learning method to a ... / Real Robot By Vision-Based Reinforcement Learning Minoru Asada Shoichi br a method of vision-based reinforcement learning by which a robot learns to
102.1 Robot Shaping: Experiment In Behavior Engineering - Dorigo, Colombetti (1997)
its performance. In
fact, we use the expression robot shaping to denote the use of learning as a means to translate
suggestions coming from an external trainer into an effective control strategy that... / is an approach based on reinforcement learning with reinforcements br we have experimented with reinforcement learning RL RL can be seen as a
102.1 ARACHNID: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods.. - Menczer (1997)
ARACHNID is a distributed algorithm for information discovery in large, dynamic, distributed environments such as the World Wide Web. The approach is based on a distributed, adaptive population of int... / the user on the basis of reinforcement learning Armstrong et al. br agent's reproductive cycle. Reinforcement learning is the natural extension
100.0 Hierarchic Social Entropy: An Information Theoretic Measure of Robot.. - Balch (2000)
As research expands in multiagent intelligent systems, investigators need new tools for evaluating
the artificial societies they study. It is impossible, for example, to correlate heterogeneity
with... / similar agents that use reinforcement learning to develop behavioral br policies developed using reinforcement learning techniques since once
98.7 Packet Routing in Dynamically Changing Networks: A Reinforcement.. - Boyan, Littman (1994)
This paper describes the Q-routing algorithm for packet routing,
in which a reinforcement learning module is embedded into each
node of a switching network. Only local communication is used
by each no... / Changing Networks A Reinforcement Learning Approach Justin A. Boyan br packet routing in which a reinforcement learning module is embedded into
98.5 Reinforcement Learning with Soft State Aggregation - Singh, Jaakkola, Jordan (1995)
It is widely accepted that the use of more compact representations
than lookup tables is crucial to scaling reinforcement learning (RL)
algorithms to real-world problems. Unfortunately almost all of t... / Reinforcement Learning with Soft State br is crucial to scaling reinforcement learning RL algorithms to
97.1 Near-Optimal Reinforcement Learning in Polynomial Time - Kearns, Singh (1998)
We present new algorithms for reinforcement learning and prove
that they have polynomial bounds on the resources required to achieve
near-optimal return in general Markov decision processes. After o... / Near-Optimal Reinforcement Learning in Polynomial Time br present new algorithms for reinforcement learning and prove that they have
96.5 The Role Of Exploration In Learning Control - Thrun (1992)
Introduction
Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to
be combined. On the one hand, the environment must be sufficiently explored in ord... / adaptive neurocontrol and reinforcement learning. In Section we discuss br trade-off Kaelbling reinforcement learning Watkins
92.7 Learning to Use Selective Attention and Short-Term Memory in.. - McCallum (1996)
This paper presents U-Tree, a reinforcement learning
algorithm that uses selective attention and short-term
memory to simultaneously address the intertwined
problems of large perceptual state spaces an... / paper presents U-Tree a reinforcement learning algorithm that uses br question How can a reinforcement learning agent successfully learn
92.7 Instance-Based Utile Distinctions for Reinforcement Learning with.. - Andrew McCallum (1995)
We present Utile Suffix Memory, a reinforcement
learning algorithm that uses short-term memory
to overcome the state aliasing that results from
hidden state. By combining the advantages of
previous wo... / Utile Distinctions for Reinforcement Learning with Hidden State R. br Utile Suffix Memory a reinforcement learning algorithm that uses
92.7 Finding Structure in Reinforcement Learning - Thrun, Schwartz (1995)
Reinforcement learning addresses the problem of learning to select actions in order to
maximize one's performance in unknown environments. To scale reinforcement learning
to complex real-world tasks, ... / Finding Structure in Reinforcement Learning Sebastian Thrun br eds. Abstract Reinforcement learning addresses the problem of
91.4 Layered Learning in Multi-Agent Systems - Stone (1998)(Correct)
Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously
and as part of a team. This dissertation addresses multi-agent systems consisting
of teams of auton... / hierarchical learning reinforcement learning decision tree learning br a new multi-agent reinforcement learning algorithm namely
91.3 Memoryless Policies: Theoretical Limitations and Practical Results - Michael L. Littman (1994)(Correct)
One form of adaptive behavior is "goal-seeking"
in which an agent acts so as to minimize the time
it takes to reach a goal state. This paper presents
some theoretical and empirical findings on algorit... / and more recently by reinforcement learning researchers e.g. br A classic example from the reinforcement learning literature is Sutton's
90.9 Learning with Mixtures of Trees - Meila-Predoviciu (1999)(Correct)
One of the challenges of density estimation as it is used in machine learning is that usually
the data are multivariate and often the dimensionality is large. Operating with joint
distributions over m... / fostered my interest in reinforcement learning statistics graphical
90.9 Multiagent Reinforcement Learning in Stochastic Games - Hu, Wellman (1999)(Correct)
We adopt stochastic games as a general framework for dynamic
noncooperative systems. This framework provides a way of describing
the dynamic interactions of agents in terms of individuals' Markov
deci... / Multiagent Reinforcement Learning in Stochastic Games br we design a multiagent reinforcement learning method which allows
90.9 Experience-weighted Attraction Learning in Normal Form Games - Camerer, Ho (1999)(Correct)
We describe a general model, `experience-weighted attraction' (EWA) learning, which
includes reinforcement learning and a class of weighted fictitious play belief models as
special cases. In EWA, strat... / learning which includes reinforcement learning and a class of weighted br behavioral game theory reinforcement learning fictitious play.
90.7 An Adaptive Communication Protocol for Cooperating Mobile Robots - Yanco, Stein (1993)(Correct)
We describe mobile robots engaged in a cooperative task that requires communication. The robots are initially given a fixed but uninterpreted vocabulary for communication. In attempting to perform the... / the design of appropriate reinforcement learning algorithms to learn br his symbolic test suite for reinforcement learning algorithms. Work on the
88.6 Overcoming Incomplete Perception with Utile Distinction Memory - McCallum (1993)(Correct)
This paper presents a method by which a
reinforcement learning agent can solve the
incomplete perception problem using memory.
The agent uses a hidden Markov model
(HMM) to represent its internal stat... / a method by which a reinforcement learning agent can solve the br will build a In reinforcement learning good task performance is
86.4 Learning Without State-Estimation in Partially Observable Markovian.. - Singh, Jaakkola, Jordan (1994)(Correct)
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately all of the theory and much of the practice (see ... / Abstract Reinforcement learning RL algorithms provide a br state of the environment. Reinforcement learning RL techniques provide a
85.7 Infinite-Horizon Policy-Gradient Estimation - Baxter, Bartlett (2001)(Correct)
Gradient-based approaches to direct policy search in reinforcement learning have received
much recent attention as a means to solve problems of partial observability and to avoid some of
the problem... / to direct policy search in reinforcement learning have received much recent br tend to go by the name Reinforcement Learning and have been
85.7 An Algorithmic Description of XCS - Butz, Wilson (2001)(Correct)
A concise description of the XCS classifier system's parameters, structures, and algorithms is presented as an aid to research. The algorithms are written in modularly structured pseudo code with acco... / Due to the Q-learning-like reinforcement learning in XCS payoff does not br selection in the reinforcement learning literature Sutton
85.7 Temporal Abstraction in Reinforcement Learning - Precup (2000)(Correct)
Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions r... / Temporal Abstraction In Reinforcement Learning A Dissertation Presented
85.7 Learning Optimal Dialogue Strategies: A Case Study of a Spoken.. - Walker, Fromer, Narayanan (1998)(Correct)
This paper describes a novel method by which a dialogue
agent can learn to choose an optimal dialogue
strategy. While it is widely agreed that dialogue
strategies should be formulated in terms of comm... / is based on algorithms for reinforcement learning such as dynamic br S i derived Several reinforcement learning algorithms based on
82.4 Efficient Learning and Planning Within the Dyna Framework - Peng, Williams (1993)(Correct)
Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhanc... / be cast in the form of reinforcement learning tasks. Recent work in br and the creation of new reinforcement learning algorithms such as
81.8 Distributed Value Functions - Schneider, Wong, Moore, Riedmiller (1999)(Correct)
Many interesting problems, such as power
grids, network switches, and traffic flow, that
are candidates for solving with reinforcement
learning (RL), also have properties that make
distributed solutio... / candidates for solving with reinforcement learning RL also have br algorithm for distributed reinforcement learning based on distributing the
81.4 Continual Learning In Reinforcement Environments - Ring (1994)(Correct)
Continual learning is the constant development of complex behaviors with no final end in
mind. It is the process of learning ever more complicated skills by building on those skills already
developed.... / on the sections involving reinforcement learning. Thanks also to Risto br complicated non-Markovian reinforcement-learning tasks and can then
80.4 Tight Performance Bounds on Greedy Policies Based on Imperfect Value.. - Williams (1993)(Correct)
Consider a given value function on states of a Markov decision problem, as might result from applying a reinforcement learning algorithm. Unless this value function equals the corresponding optimal va... / result from applying a reinforcement learning algorithm. Unless this br error typically used in reinforcement learning applications. The
80.0 Accelerated Focused Crawling through Online Relevance Feedback - Chakrabarti, Punera, Subramanyam (2002)(Correct)
The organization of HTML into a tag tree structure, which is rendered by browsers as roughly rectangular regions with embedded text and HREF links, greatly helps surfers locate and click on links that... / Document object model Reinforcement learning. Introduction br paradigm is related to reinforcement learning and AI programs that
78.2 Discovery of Subroutines in Genetic Programming - Rosca, Ballard (1996)(Correct)
Introduction
Hierarchical Genetic Programming (HGP) extensions discover, modify, and exploit subroutines
to accelerate the evolution of programs [Koza 1992, Rosca and Ballard 1994a] .
The use of subr... / in the larger context of reinforcement learning problems. Finally br the fitness of subroutines. Reinforcement learning RL algorithms such as
78.2 Multiagent coordination with learning classifier systems - Sen, Sekaran (1996)(Correct)
this paper, we evaluate a particular reinforcement
learning methodology, a genetic algorithm based
machine learning mechanism known as classifier systems
[ Holland, 1986 ] for developing action polici... / agents. We have used reinforcement learning Barto et al. br we evaluate a particular reinforcement learning methodology a genetic
78.2 Adaptive Load Balancing: A Study in Multi-Agent Learning - Schaerf, Shoham, Tennenholtz (1995)(Correct)
We study the process of multi-agent reinforcement learning in the context of load balancing in a distributed system, without use of either central coordination or explicit communication. We first defi... / the process of multi-agent reinforcement learning in the context of load br investigates multi-agent reinforcement learning in the context of a
78.2 On the Complexity of Solving Markov Decision Problems - Littman, Dean, Kaelbling (1995)(Correct)
Markov decision problems (MDPs) provide
the foundations for a number of problems
of interest to AI researchers studying automated
planning and reinforcement learning.
In this paper, we summarize resul... / automated planning and reinforcement learning. In this paper we br planning reinforcement learning and other sequential
77.5 Efficient Exploration In Reinforcement Learning - Sebastian B. Thrun (1992)(Correct)
Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration
in active learning and describes several local techniques for exploration in finite, di... / Efficient Exploration In Reinforcement Learning Sebastian B. Thrun br domains embedded in a reinforcement learning framework delayed
76.5 ZCS: A Zeroth Level Classifier System - Wilson (1994)(Correct)
A basic classifier system, ZCS, is presented which keeps much of Holland's original
framework but simplifies it to increase understandability and performance.
ZCS's relation to Q-learning is brought o... / on the related field of reinforcement learning Barto efforts to br under the heading of reinforcement learning and appears to provide a
75.3 Average Reward Reinforcement Learning: Foundations, Algorithms, and.. - Mahadevan (1996)(Correct)
This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounte... / Average Reward Reinforcement Learning Foundations Algorithms br study of average reward reinforcement learning an undiscounted
75.3 MIMIC: Finding Optima by Estimating Probability Densities - De Bonet, Isbell, Jr., Viola (1996)(Correct)
In many optimization problems, the structure of solutions reflects complex relationships between the different input parameters. For example, experience may tell us that certain parameters are closely... / by Sabes and Jordan for a reinforcement learning task Sabes and Jordan br and Jordan M. I. Reinforcement learning by probability matching.
74.2 Using Decision Tree Confidence Factors for Multiagent Control - Stone, Veloso (1998)(Correct)
Although Decision Trees are widely used for classification
tasks, they are typically not used for agent control. This paper presents
a novel technique for agent control in a complex multiagent domai... / module for instance a reinforcement learning module to learn whether br acquired by vision-based reinforcement learning. In Proc. of IEEE RSJ GI
74.2 Adaptive Agent-Driven Routing and Load Balancing in Communication.. - Heusse, Snyers, Guérin, Kuntz (1998)(Correct)
This paper presents a unified overview of a new family of distributed algorithms
for routing and load balancing in dynamic communication networks.
These new algorithms are described as an extension t... / vector algorithm based on reinforcement learning The routing policy br its destination. Following reinforcement learning the estimates can be
72.7 Adaptive Retrieval Agents: Internalizing Local Context and Scaling up .. - Menczer, Belew (1999)(Correct)
This paper focuses on two machine learning abstractions springing from ecological models: (i) evolutionary adaptation by local selection, and (ii) selective query expansion by internalization of env... / selection internalization reinforcement learning Q-learning neural br learned solutions e.g. by reinforcement learning cannot capture global
72.7 Learning Finite-State Controllers for Partially Observable.. - Meuleau, Peshkin, Kim, Kaelbling (1999)(Correct)
Reactive (memoryless) policies are sufficient
in completely observable Markov decision processes
(MDPs), but some kind of memory is
usually necessary for optimal control of a partially
observable... / successful application of reinforcement learning RL to real world br a direct model-free reinforcement learning RL algorithm learn a
72.7 Neuro-Fuzzy Systems for Function Approximation - Nauck, Kruse (1999)(Correct)
We propose a neuro-fuzzy architecture for function approximation based on supervised
learning. The learning algorithm is able to determine the structure and the
parameters of a fuzzy system. The appr... / and is trained by reinforcement learning based on a fuzzy error br supervised learning i.e. reinforcement learning. On the other hand if
71.4 Practical Reinforcement Learning in Continuous Spaces - Smart, Kaelbling (2000)(Correct)
Dynamic control tasks are good candidates
for the application of reinforcement learning
techniques. However, many of these
tasks inherently have continuous state or action
variables. This can caus... / Practical Reinforcement Learning in Continuous Spaces br for the application of reinforcement learning techniques. However many
69.5 High-Performance Job-Shop Scheduling With A Time-Delay TD(lambda).. - Zhang, Dietterich (1995)(Correct)
Job-shop scheduling is an important task for manufacturing industries.
We are interested in the particular task of scheduling payload
processing for NASA's space shuttle program. This paper summarizes... / task for solution by the reinforcement learning algorithm TD A br Navigation and Planning Reinforcement Learning Presentation
69.1 Temporal Difference Learning of Position Evaluation in the Game of Go - Schraudolph, Dayan, Sejnowski (1994)(Correct)
The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. D... / In order to demonstrate reinforcement learning as a viable alternative to br Credit Assignment in Reinforcement Learning. PhD thesis University
68.5 Relational Reinforcement Learning - Dzeroski, De Raedt, Blockeel (1998)(Correct)
Relational reinforcement learning is presented,
a learning technique that combines
reinforcement learning with relational learning
or inductive logic programming. Due to
the use of a more expressi... / Relational Reinforcement Learning Saso Dzeroski br Abstract Relational reinforcement learning is presented a learning
68.0 Towards Collaborative and Adversarial Learning: A Case Study in.. - Stone, Veloso (1997)(Correct)
Soccer is a rich domain for the study of multiagent learning issues. Not only must the players learn low-level skills, but they must also learn to work together and to adapt to the behaviors of differ... / papers describes a reinforcement learning agent which incorporates br Ford et al. used a Reinforcement Learning RL approach with sensory
66.6 A Machine Learning Architecture for Optimizing Web Search Engines - Boyan, Freitag, Joachims (1996)(Correct)
Indexing systems for the World Wide Web, such as Lycos and Alta Vista, play an essential role in making the Web useful and usable. These systems are based on Information Retrieval methods for indexing... / a novel one inspired by reinforcement learning techniques for propagating br motivated by an analogy to reinforcement learning as studied in artificial
66.6 A Robot Controller Using Learning by Imitation - Hayes, Demiris (1994)(Correct)
Roboticists have already invested considerable energy in building robot controllers which model the
learning capacities of single animals. In this paper we present a new type of controller which dra... / computationally expensive reinforcement learning stage is permissible it br a negotiation strategy. A reinforcement learning module could be useful
65.9 Hierarchical Learning in Stochastic Domains: Preliminary Results - Kaelbling (1993)(Correct)
This paper presents the HDG learning algorithm,
which uses a hierarchical decomposition of the
state space to make learning to achieve goals
more efficient with a small penalty in path quality.
Sp... / INTRODUCTION Reinforcement learning is a general tool for br A crucial problem in reinforcement learning is temporal credit
63.8 Explanation-Based Learning and Reinforcement Learning: A Unified View - Dietterich, al. (1997)(Correct)
In speedup-learning problems, where full descriptions of operators are known, both
explanation-based learning (EBL) and reinforcement learning (RL) methods can be applied. This
paper shows that both... / Learning and Reinforcement Learning A Unified View THOMAS br learning EBL and reinforcement learning RL methods can be
63.8 Learning Roles: Behavioral Diversity in Robot Teams - Tucker Balch (1997)(Correct)
This paper describes research investigating behavioral
specialization in learning robot teams. Each agent is
provided a common set of skills (motor schema-based
behavioral assemblages) from which it b... / strategy using reinforcement learning. The agents learn br in a task is available reinforcement learning can shift the burden of
63.7 Action Selection methods using Reinforcement Learning - Mark Humphrys University (1996)(Correct)
Action Selection schemes, when translated into
precise algorithms, typically involve considerable
design effort and tuning of parameters. Little
work has been done on solving the problem using
lea... / Selection methods using Reinforcement Learning Mark Humphrys br selection problem using Reinforcement Learning learning from rewards
63.7 Emergent Adaptive Lexicons - Steels (1996)(Correct)
The paper reports experiments to test the hypothesis
that language is an autonomous evolving
adaptive system maintained by a group of distributed
agents without central control. The experiments
show h... / small group of robots using reinforcement learning. Again the size of the
63.7 ALECSYS and the AutonoMouse: Learning to Control a Real Robot by.. - Dorigo (1995)(Correct)
In this article we investigate the feasibility of using learning classifier systems as a tool for building
adaptive control systems for real robots. Their use on real robots imposes efficiency constra... / Classifier Systems Reinforcement Learning Genetic Algorithms br this article belongs to the reinforcement learning research field. Holland's
63.6 Stochastic Dynamic Programming with Factored Representations - Boutilier, Dearden, al. (1999)(Correct)
Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning,
but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specificati... / for most work in reinforcement learning MDPs br problem in the context of reinforcement learning in addition Their
63.6 Least-Squares Temporal Difference Learning - Boyan (1999)(Correct)
TD(λ) is a popular family of algorithms
for approximate policy evaluation in large
MDPs. TD(λ) works by incrementally updating
the value function after each observed
transition. It has two major drawbac... / of LSTD as a model-based reinforcement learning technique. BACKGROUND br model-free to a model-based reinforcement learning algorithm. This
63.6 OBDD-based Universal Planning: Specifying and Solving Planning.. - Jensen, Veloso (1999)(Correct)
Recently model checking representation and search techniques
were shown to be efficiently applicable to planning, in particular
to non-deterministic planning. Such planning approaches use Ordered
B... / resembles the outcome of reinforcement learning in that the br valid sequence of actions. Reinforcement Learning RL can also be
63.6 Reinforcement Learning for Spoken Dialogue Systems - Singh, Kearns, Litman, Walker (1999)(Correct)
Recently, a number of authors have proposed treating dialogue systems as Markov
decision processes (MDPs). However, the practical application of MDP algorithms
to dialogue systems faces a number of se... / Preference ORAL Reinforcement Learning for Spoken Dialogue br software tool RLDS for Reinforcement Learning for Dialogue Systems
61.7 Learning To Solve Markovian Decision Processes - Singh (1994)(Correct)
LEARNING TO SOLVE MARKOVIAN DECISION PROCESSES February 1994 Satinder P. Singh B.Tech., INDIAN INSTITUTE OF TECHNOLOGY NEW DELHI M.S., UNIVERSITY OF MASSACHUSETTS AMHERST Ph.D., UNIVERSITY OF MASSACHU... / researchers have developed reinforcement learning RL algorithms based on br . Why Reinforcement Learning
59.5 Hierarchical Learning with Procedural Abstraction Mechanisms - Rosca (1997)(Correct)
Evolutionary computation (EC) consists of the design and analysis of probabilistic
algorithms inspired by the principles of natural selection and variation. Genetic Programming
(GP) is one subfield of... / . Reinforcement learning offers insights to br Bibliography A Reinforcement Learning B Minimum Description
59.5 Complexity Analysis of Real-Time Reinforcement Learning Applied to.. - Koenig, Simmons (1997)(Correct)
This report analyzes the complexity of on-line reinforcement learning algorithms,
namely asynchronous real-time versions of Q-learning and value-iteration, applied to
the problems of reaching any goal... / Analysis of Real-Time Reinforcement Learning Applied to Finding br Machine Learning Reinforcement Learning Learning Adaptation
59.5 Complexity Analysis of Real-Time Reinforcement Learning - Koenig, Simmons (1997)(Correct)
This paper analyzes the complexity of on-line reinforcement
learning algorithms, namely asynchronous real-time
versions of Q-learning and value-iteration, applied
to the problem of reaching a goal stat... / Analysis of Real-Time Reinforcement Learning Sven Koenig and Reid br the complexity of on-line reinforcement learning algorithms namely
59.5 Learning from Demonstration - Schaal (1997)(Correct)
By now it is widely accepted that learning a task from scratch, i.e., without
any prior knowledge, is a daunting undertaking. Humans, however, rarely
attempt to learn from scratch. They extract initia... / applied in the context of reinforcement learning. We consider priming the br problems only model-based reinforcement learning shows significant speed-up
57.9 Coevolution of a Backgammon Player - Pollack, Blair, Land (1996)(Correct)
One of the persistent themes in Artificial Life research is the use of co-evolutionary arms races in the development of specific and complex behaviors. However, other than Sims's work on artificial ro... / good news about the reinforcement learning method For the idea of br framework for multi-agent reinforcement learning. In Machine Learning
57.9 Reinforcement Learning Methods for Continuous-Time Markov Decision.. - Steven Bradtke (1995)(Correct)
Semi-Markov Decision Problems are continuous time generalizations
of discrete time Markov Decision Problems. A number of
reinforcement learning algorithms have been developed recently
for the solution... / Reinforcement Learning Methods for br Problems. A number of reinforcement learning algorithms have been
57.1 Hierarchical Memory-Based Reinforcement Learning - Hernandez-Gardiol, Mahadevan (2001)(Correct)
A key challenge for reinforcement learning is how to scale up to
large partially observable domains. In this paper, we show how
a hierarchy of behaviors can be used to create and select among
varia... / Hierarchical Memory-Based Reinforcement Learning Natalia br A key challenge for reinforcement learning is how to scale up to
57.1 An Architecture for Action Selection in Robotic Soccer - Stone, McAllester (2001)(Correct)
CMUnited-99 was the 1999 RoboCup robotic soccer simulator
league champion. In the RoboCup-2000 competition,
CMUnited-99 was entered again and despite being publicly
available for the entire year, it s... / rewards in the sense of reinforcement learning One might say for br successfully learned via reinforcement learning. . CONCLUSION
57.1 Speeding up Relational Reinforcement Learning Through the Use of an.. - Driessens, Ramon, Blockeel (2001)(Correct)
Relational reinforcement learning (RRL) is a learning technique
that combines standard reinforcement learning with inductive logic
programming to enable the learning system to exploit structural kno... / Speeding up Relational Reinforcement Learning Through the Use of an br Abstract. Relational reinforcement learning RRL is a learning
57.1 Learning Evaluation Functions to Improve Optimization by Local Search - Boyan, Moore (2000)(Correct)
This paper describes algorithms that learn to improve search performance on large-scale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome... / its foundations in reinforcement learning and illustrate its br of adaptive local search reinforcement learning and genetic algorithms.
57.1 Introducing a Genetic Generalization Pressure to the Anticipatory.. - Butz, Goldberg, Stolzmann (2000)(Correct)
The Anticipatory Classifier System (ACS) is able to form a complete internal representation
of an environment. Unlike most other classifier system and reinforcement learning approaches,
it is able t... / other classifier system and reinforcement learning approaches it is able to br the external environment. Reinforcement learning approaches like Dyna
57.1 Adaptive Agents with Reinforcement Learning and Internal Memory - Lanzi (2000)(Correct)
Perceptual aliasing is a serious problem for
adaptive agents. Internal memory is a promising
approach to extend reinforcement learning algorithms
to problems involving perceptual aliasing.
In this... / Adaptive Agents with Reinforcement Learning and Internal Memory br approach to extend reinforcement learning algorithms to problems
57.1 Introducing a Genetic Generalization Pressure to the Anticipatory.. - Butz, Goldberg, Stolzmann (2000)(Correct)
The Anticipatory Classifier System is a learning classifier system that is based on the cognitive
mechanism of anticipatory behavioral control. Besides the common reward learning, the
ACS is able to... / which is not possible with reinforcement learning techniques. Furthermore br Sutton R. S. Reinforcement learning architectures for animats.
57.1 Elevator Group Control Using Multiple Reinforcement Learning Agents - Crites, Barto (1998)(Correct)
Recent algorithmic and theoretical advances in reinforcement learning (RL) have
attracted widespread interest. RL algorithms have appeared that approximate dynamic programming
on an incremental basi... / Control Using Multiple Reinforcement Learning Agents ROBERT H. CRITES br and theoretical advances in reinforcement learning RL have attracted
57.1 Programmable Pattern Generators - Schaal, Sternad (1998)(Correct)
This paper explores the idea to create complex
human-like arm movements from movement primitives
based on nonlinear attractor dynamics. Each degree-of-freedom
of an arm is assumed to have two indepen... / its modern relative reinforcement learning provide a well founded br recent developments in reinforcement learning increased the range of
56.7 Converges with Probability - Peter Dayan Terrence (1994)(Correct)
The methods of temporal differences (Samuel, 1959; Sutton 1984, 1988)
allow agents to learn accurate predictions about stationary stochastic future
outcomes. The learning is effectively stochastic a... / Probability Keywords reinforcement learning temporal differences br well as other classes of reinforcement learning algorithm.
55.3 Ants and Reinforcement Learning: A Case Study in Routing in Dynamic.. - Subramanian, Druschel, Chen (1997)(Correct)
We investigate two new distributed routing algorithms
for data networks based on simple biological
"ants" that explore the network and
rapidly learn good routes, using a novel variation
of reinforceme... / Ants and Reinforcement Learning A Case Study in Routing br using a novel variation of reinforcement learning. These two algorithms are
54.5 Evolutionary Algorithms for Reinforcement Learning - Moriarty, Schultz, Grefenstette (1999)(Correct)
There are two distinct approaches to solving reinforcement learning problems, namely,
searching in value function space and searching in policy space. Temporal difference methods
and evolutionary algo... / Evolutionary Algorithms for Reinforcement Learning David E. Moriarty br approaches to solving reinforcement learning problems namely