This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher level categories.
Multiagent Learning Using a Variable Learning Rate - Bowling, Veloso (2002)
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the o... / We then contribute a new reinforcement learning technique using a br Multiagent learning reinforcement learning game theory
Pixel-based Behavior Learning - Hugues, Drogoul (2002)
In this paper we address the problem of learning behaviors for autonomous mobile robots. We particularly focus on methods which enable a human user to train a robot in its real destination environment... / methods such as reinforcement learning or genetic br behavior-based robots using reinforcement learning. Artificial
Existence of Multiagent Equilibria with Limited Agents - Bowling, Veloso (2002)
Multiagent learning is a necessary yet challenging problem as multiagent systems become more prevalent and environments become more dynamic. Much of the groundbreaking work in this area draws on nota... / multiagent learning reinforcement learning multiagent systems br of attention as a way for reinforcement learning to scale to large
Coordinated Reinforcement Learning - Guestrin, Lagoudakis, Parr (2002)
We present several new algorithms for multiagent reinforcement
learning. A common feature of these algorithms
is a parameterized, structured representation
of a policy or value function. This struc... / Coordinated Reinforcement Learning Carlos Guestrin br algorithms for multiagent reinforcement learning. A common feature of
MySpiders : Evolve your own intelligent Web crawlers - Pant, Menczer (2002)
The dynamic nature of the World Wide Web makes it a challenge to find information that is both relevant and recent. Intelligent agents can complement the power of search engines to meet this challenge... / by evolutionary and reinforcement learning. The goal is to maintain
Differential Join Prices for Parallel Queues: Social Optimality.. - Parijat Dube Vivek (2002)
We consider a system of identical parallel queues
served by a single server and distinguished only by the price
charged at entry. A Poisson stream of customers joins the queue
by a greedy policy that ... / programming equation and a reinforcement learning based online pricing br methods based on reinforcement learning see e.g.
Two Views of Classifier Systems - Kovacs (2002)
This work suggests two ways of looking at Michigan classifier systems; as Genetic Algorithm-based systems, and as Reinforcement Learning-based systems, and argues that the former is more suitable for ... /
Adaptive Combination of Behaviors in an Agent - Buffet, Dutech, Charpillet (2002)
Agents are of interest mainly when confronted with complex tasks. We propose a methodology for the automated design of such agents (in the framework of Markov Decision Processes) in the case where the... / basic behaviors using Reinforcement Learning methods. The main idea is br only local perceptions. Reinforcement Learning RL can be applied in
Connectionist Learning Classifier System - Vasilyev (2002)
Impetuous development of artificial neural networks makes it possible to transfer many ideas from this area into adjacent areas. This work investigates an opportunity of mapping learning classifier sy... /
Multi-Robot Task-Allocation through Vacancy Chains - Dahl, Mataric, Sukhatme (2002)
This paper presents an algorithm for task allocation
in groups of homogeneous robots. The algorithm is
based on vacancy chains, a resource distribution strategy
common in human and animal societies. W... / and demonstrate how Reinforcement Learning can be used to make br local task selection and Reinforcement Learning RL for estimation of
Decision-Theoretic Robotic Surveillance - Massios (2002)
Acknowledgments
First and foremost I would like to express my gratitude to Leo Dorst and Frans
Voorbraak. This thesis is mainly the result of the weekly interaction I had with
them and the expert ... /
Asr System Modeling For Automatic Evaluation And Optimization Of.. - Pietquin, Renals (2002)
Though the field of spoken dialogue systems has developed quickly in the last decade, rapid design of dialogue strategies remains uneasy. Several approaches to the problem of automatic strategy learni... / proposed and the use of Reinforcement Learning introduced by Levin and br Processes MDPs and Reinforcement Learning RL was proposed by
Optimizing Dialogue Management with Reinforcement Learning.. - Satinder Singh Diane (2002)
Designing the dialogue policy of a spoken dialogue system involves many nontrivial
choices. This paper presents a reinforcement learning approach for automatically optimizing
a dialogue policy, whic... / Dialogue Management with Reinforcement Learning Experiments with the br This paper presents a reinforcement learning approach for automatically
Reinforcement Learning with Long Short-Term Memory - Bakker (2002)
This paper presents reinforcement learning with a Long Short-Term
Memory recurrent neural network: RL-LSTM. Model-free
RL-LSTM using Advantage(λ) learning and directed exploration
can solve non-Mark... / Reinforcement Learning with Long Short-Term br This paper presents reinforcement learning with a Long Short-Term
Numerical Optimization with Neuroevolution - Greer (2002)
Neuroevolution techniques have been successful in
many sequential decision tasks such as robot control and game
playing. This paper aims at establishing whether they can be
useful in numerical optimiz... / more powerful than standard reinforcement learning techniques in many br tasks faster than standard reinforcement learning methods and other
Synthesis of Robot's Behaviors from few Examples - Hugues, Drogoul (2002)
This paper addresses the problem of acquiring robot's behaviors for real environments. It insists on the interest of learning behaviors during robot's interaction with the environment under the contro... / methods such as reinforcement learning Mahadevan and Connell br behavior-based robots using reinforcement learning. Artificial
Sparse Coding In The Primate Cortex - Földiak (2002)
INTRODUCTION
Brain function can be seen as computation, i.e. the manipulation of information necessary for survival.
Computation itself is an abstract process but it must be performed or implemented ... /
Tournament Selection in XCS - Butz, Sastry, Goldberg (2002)
Selection in the accuracy-based learning classifier system XCS, introduced by Wilson in 1995, has always been done by the means of proportionate selection. Although it is known from GA literature that... /
Bornholm Web Mining Techniques - Nielsen (2002)
the classifier for the document
relevance with downloaded documents
Some interesting documents might be separated by
non-relevant documents, e.g.,
- Reinforcement learning (Rennie and McCallum,
1999... / documents e.g. Reinforcement learning Rennie and McCallum br McCallum A. Using reinforcement learning to spider the web
Model-Based Reinforcement Learning in Dynamic Environments - Wiering (2002)
We study using reinforcement learning in particular dynamic environments. Our environments can contain... unknown Model-Based Reinforcement Learning
in Dynamic Environments
Marco A. Wiering
marco@c... / Model-Based Reinforcement Learning in Dynamic Environments br Abstract We study using reinforcement learning in particular dynamic
Adaptive Dialogue Systems - Interaction with Interact - Jokinen, Kerminen, Kaipainen.. (2002)
Technological development has made computer interaction more common and also commercially feasible, and the number of interactive systems has grown rapidly. At the same time, the systems should be abl... / evaluators are based on reinforcement learning. Simple examples of br management comes from the reinforcement learning algorithm of this
Reinforcement Learning Using Neural Networks, with Applications to.. - Coulom (2002)
This thesis is a study of practical methods to estimate value functions with feedforward neural networks in model-based reinforcement learning. Focus is placed on problems in continuous time and space... / . Reinforcement Learning using Neural Networks . br neural networks and reinforcement learning-can help to solve such
Towards a Game Agent - Niederberger, Gross (2002)
The objective of this report is to give the reader a survey on state-of-the-art techniques and academic research in the field of artificial life where the simulation of complex and emergent behavior i... /
Event-Learning And Robust Policy Heuristics - Lörincz, Pólik, Szita (2001)
In this paper we introduce a novel form of reinforcement learning called event-learning or E-learning. Events are ordered pairs of consecutive states. We define the corresponding event-value functio... / introduce a novel form of reinforcement learning called event-learning or br Key words and phrases. reinforcement learning robust control event
Web Interaction and the Navigation Problem in Hypertext written for.. - Levene, Loizou (2001)
The web has become a ubiquitous tool, used in day-to-day work, to find information
and conduct business, and it is revolutionising the role and availability of information. One
of the problems encou... / method we describe is a reinforcement learning algorithm that attaches br a web view is a reinforcement learning algorithm that attaches
Scaling Reinforcement Learning toward RoboCup Soccer - Stone, Sutton (2001)
RoboCup simulated soccer presents many
challenges to reinforcement learning methods,
including a large state space, hidden
and uncertain state, multiple agents, and
long and variable delays in the... / Scaling Reinforcement Learning toward RoboCup Soccer br many challenges to reinforcement learning methods including a
Ant Colony Control for Autonomous Decentralized Shop Floor Routing - Cicirello, Smith (2001)
In this paper, we introduce a new approach to autonomous
decentralized shop floor routing. Our system,
which we call Ant Colony Control (AC²), applies the analogy
of a colony of ants foraging for ... / of a dispatch policy using reinforcement learning techniques incorporating br and M. Riedmiller. A neural reinforcement learning approach to learn local
Hierarchical Multi Agent Reinforcement Learning - Makar, Mahadevan (2001)
Hierarchical reinforcement learning methods have previously been
shown to speed up learning primarily in single-agent domains. In
this paper we explore the use of this spatio-temporal abstraction
m... / Hierarchical Multi Agent Reinforcement Learning Rajbala Makar br Abstract Hierarchical reinforcement learning methods have previously
Evolving Neural Networks through Augmenting Topologies - Stanley, Miikkulainen (2001)
An important question in neuroevolution is how to gain an advantage from evolving neural network topologies along with weights. We present a method, NeuroEvolution of Augmenting Topologies (NEAT) that... / on a challenging benchmark reinforcement learning task. We claim that the br great promise in complex reinforcement learning tasks Gomez and
Variable Resolution Discretization in Optimal Control - Munos, Moore (2001)
The problem of state abstraction is of central importance in optimal control, reinforcement learning and Markov decision processes. This paper studies the case of variable resolution state abstraction... / in optimal control reinforcement learning and Markov decision br Keywords Optimal control reinforcement learning variable resolution
Actor-Critic Algorithms - Konda, Tsitsiklis (2001)
In this paper, we propose and analyze a class of actor-critic algorithms. These are
two-time-scale algorithms in which the critic uses temporal difference (TD) learning with a linearly
parameterized ... / difficult to identify. Reinforcement Learning RL and Neuro-Dynamic br approximation and reinforcement learning. SIAM Journal on Control
DEDUCTIVE VERSUS INDUCTIVE EQUILIBRIUM Selection: Experimental Results - Haruvy, Stahl (2001)
The debate in equilibrium selection appears to have culminated in the formation of two
schools of thought: those that favor equilibrium selection based on rational coordination and
those that favor ze... / race among seven action-reinforcement learning models and found that a br A's Roth-Erev reinforcement learning predicts A's and
Economic Value of EWA Lite: A Functional Theory of Learning in Games - Ho, Camerer, Chong (2001)
This paper describes a theory
of learning in decisions and games called EWA Lite, with only one parameter. EWA
Lite predicts the time path of individual behavior in any normal-form game (given initial... / best but one kind of reinforcement learning predicts well in games br versions of belief and reinforcement learning and quantal response
Hierarchical Multi-Agent Reinforcement Learning - Makar, Mahadevan, al. (2001)
In this paper we investigate the use of hierarchical reinforcement
learning to speed up the acquisition of cooperative
multi-agent tasks. We extend the MAXQ framework
to the multi-agent case. Each age... / Hierarchical Multi-Agent Reinforcement Learning Rajbala Makar br the use of hierarchical reinforcement learning to speed up the
Pigs and People - Pauls (2001)
'Pigs and people' is a simulated environment, in which action selection mechanisms can be evaluated and compared. Action selection mechanisms attempt to solve the action selection problem faced by b... / . . Reinforcement br efforts aiming to introduce reinforcement learning see section into
Towards Bounded-Rationality in Multi-Agent Systems: A.. - Raja, Lesser (2001)
Sophisticated agents operating in open environments must make complex real-time
control decisions on scheduling and coordination of domain activities. These decisions
are made in the context of limi... / in Multi-Agent Systems A Reinforcement-Learning Based Approach Anita br made by these agents using reinforcement learning methods. Our approach is
Learning in Worlds with Objects - Kaelbling, Oates, Hernandez, Finney (2001)
Introduction
We are interested in building systems that learn to interact
with complex real world environments, by representing
the dynamics of the world with models that
allow strong generalization th... / Most work on reinforcement learning assumes that the agent br expected value. Basic reinforcement-learning methods such as
Cooperative Coevolution of Multi-Agent Systems - Yong, Miikkulainen (2001)
In certain tasks such as pursuit and evasion, multiple agents need to coordinate their behavior to achieve
a common goal. An interesting question is, how can such behavior best be evolved? When the ag... / efficient in single-agent reinforcement learning tasks is first extended br by robot teams using reinforcement learning. He found that when the
Genetic Algorithms And Reinforcement Learning For The Tactical Fixed.. - Santos, Jr., ZHONG (2001)
this paper, we explore unknown International Journal on Artificial Intelligence Tools
Vol. 10, No. 1-2 (2001) 000-000 © World Scientific Publishing Company
GENETIC ALGORITHMS AND REINFORCEMENT LEARNING
... / Genetic Algorithms And Reinforcement Learning For The Tactical Fixed br kinds of problems. We use a reinforcement learning system to adaptively
Reinforcement Learning with Function Approximation Converges to a.. - Gordon (2001)
Many algorithms for approximate reinforcement learning are not
known to converge. In fact, there are counterexamples showing
that the adjustable weights in some algorithms may oscillate within
a re... / Reinforcement Learning with Function br algorithms for approximate reinforcement learning are not known to
Population rule learning in symmetric normal-form games: theory and.. - Stahl (2001)
A model of population rule learning is formulated and estimated using experimental data. When
predicting the population distribution of choices and accounting for the number of parameters, the
populat... / and Ho consider both reinforcement learning and belief learning they br how people play games reinforcement learning in experimental games with
Goal Directed Adaptive Behavior in Second-Order Neural Networks: The.. - Crabbe, Dyer (2001)
The paper presents a neural network architecture (MAXSON) based on second-order connections that can
learn a multiple goal approach/avoid task using reinforcement from the environment. It also enables... / autonomous agents reinforcement learning vicarious learning. br faster than traditional reinforcement learning approaches generates and
Autonomous Helicopter Control using Reinforcement Learning Policy.. - Bagnell, Schneider (2001)
Many control problems in the robotics field
can be cast as Partially Observed Markovian Decision
Problems (POMDPs), an optimal control formalism.
Finding optimal solutions to such problems in general,... / Helicopter Control using Reinforcement Learning Policy Search Methods br Traditional model-based reinforcement learning algorithms make a
Solving Hidden-Mode Markov Decision Problems - Choi, Zhang, al. (2001)
Hidden-Mode Markov decision processes
(HM-MDPs) are a novel mathematical framework
for a subclass of nonstationary reinforcement
learning problems where environment
dynamics change over time accor... / a subclass of nonstationary reinforcement learning problems where br subclass of nonstationary reinforcement learning problems. Unlike
A Multi-Agent, Policy-Gradient approach to Network Routing - Tao, Baxter, Weaver (2001)
Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. Olpomdp, a policy-g... / Olpomdp a policy-gradient reinforcement learning algorithm was br is treated as a multi-agent reinforcement learning problem. Each router is
Emotion-triggered Learning in Autonomous Robot Control - Gadanho, Hallam (2001)
The fact that emotions are considered to be essential to human reasoning suggests that
they might play an important role in autonomous robots as well. In particular, the decision
of when to interrup... / and integrated in a reinforcement learning framework. Robot br to its environment using reinforcement learning. The work was done under
Learning Markov Processes - Murphy (2001)
this article, we restrict our attention to discrete time dynamical
systems.) Typically we do not know the exact dynamics of the system, so instead we consider a probabilistic
state transition function... / Of Our Actions as In Reinforcement Learning We Represent Our br mdp Widely Used In Reinforcement Learning. In An Mdp The State
Bounds on sample size for policy evaluation in Markov environments - Peshkin, Mukherjee (2001)
Reinforcement learning means finding the optimal course of
action in Markovian environments without knowledge of the environment's
dynamics. Stochastic optimization algorithms used in the field
re... / Abstract. Reinforcement learning means finding the optimal br Introduction Research in reinforcement learning focuses on designing
Market-Based Reinforcement Learning in Partially Observable Worlds - Kwee, Hutter, Schmidhuber (2001)
Unlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov Decision Processes (POMDPs), where an agent needs to lea... / Market-Based Reinforcement Learning in Partially Observable br Unlike traditional reinforcement learning RL market-based RL is
Automated State Abstraction for Options using the U-Tree Algorithm - Jonsson, Barto (2001)
Learning a complex task can be significantly facilitated by defining a
hierarchy of subtasks. An agent can learn to choose between various
temporally abstract actions, each solving an assigned subta... / Researchers in the field of reinforcement learning have recently focused br which extends the theory of reinforcement learning to include temporally
Learning State Grounding for Optimal Visual Servo Control of Dynamic.. - Nikovski, Nourbakhsh (2001)
We present an experiment in sequential visual servo control
of a dynamic manipulation task with unknown equations of
motion and feedback from an uncalibrated camera. Our algorithm
constructs a mode... / the solution. The field of reinforcement learning is specifically concerned br on a real rig using a reinforcement learning controller. Technical
Using Background Knowledge to Speed Reinforcement Learning in.. - Shapiro, Langley, Shachter (2001)
This paper describes Icarus, an agent architecture that embeds a hierarchical reinforcement learning algorithm within a language for specifying agent behavior. An Icarus program expresses an approxima... / Knowledge to Speed Reinforcement Learning in Physical Agents br that embeds a hierarchical reinforcement learning algorithm within a
Using the Web to Create Minority Language Corpora - Ghani, Jones, Mladenic (2001)
The Web is a valuable source of language specific resources
but the process of collecting, organizing and utilizing these
resources is difficult. We describe CorpusBuilder, an approach
for automatical... / and McCallum use reinforcement learning to help a crawler br et al.s WebSail uses reinforcement learning based on feedback from
Hierarchical Memory-Based Reinforcement Learning - Hernandez-Gardiol, Mahadevan (2001)
A key challenge for reinforcement learning is how to scale up to
large partially observable domains. In this paper, we show how
a hierarchy of behaviors can be used to create and select among
varia... / Hierarchical Memory-Based Reinforcement Learning Natalia br A key challenge for reinforcement learning is how to scale up to
An Architecture for Action Selection in Robotic Soccer - Stone, McAllester (2001)
CMUnited-99 was the 1999 RoboCup robotic soccer simulator
league champion. In the RoboCup-2000 competition,
CMUnited-99 was entered again and despite being publicly
available for the entire year, it s... / rewards in the sense of reinforcement learning One might say for br successfully learned via reinforcement learning. . CONCLUSION
Learning rates for Q-Learning - Even-Dar, Mansour (2001)
In this paper we derive convergence rates for Q-learning. We show an interesting
relationship between the convergence rate and the learning rate used in the Q-learning.
For a polynomial learning rat... / Introduction In Reinforcement Learning an agent wanders in an br the dominating approach in Reinforcement Learning SB BT An MDP
Grounding the Unobservable in the Observable: The Role and.. - Morrison, Oates, al. (2001)
Introduction
One of the great mysteries of human cognition is how we
learn to discover meaningful and useful categories and concepts
about the world based on the data flowing from our
sensors. Why do... / in predicted reward in a reinforcement learning setting to refine action br McCallum A. K. . Reinforcement Learning with Selective Perception
Goal Directed Adaptive Behavior in Second-Order Neural Networks.. - Crabbe, Dyer (2001)
The paper presents a neural network architecture (MAXSON) based on second-order connections unknown Goal Directed Adaptive Behavior in Second-Order Neural
Networks: Learning and Evolving in the MAXSON... / faster than traditional reinforcement learning approaches generates and br and perform cross-modal reinforcement learning. Both these thresholds
Layered Learning in Genetic Programming for a Cooperative Robot.. - Gustafson, Hsu (2001)
We present an alternative to standard genetic programming (GP) that applies layered learning techniques to decompose a problem. GP is applied to subproblems sequentially, where the population in the l... / It was applied with reinforcement learning for robotic soccer and the br challenges for researchers. Reinforcement learning hierarchical sensing
Multi-Layer Methods and the Optimal Optimizer - de Jong (2001)
Multi-Layer Methods are methods that act on several layers simultaneously.
Examples of multi-layer methods are found in multi-agent systems
(global and per-agent behavior), in learning (e.g. boostin... / that if the agents all use reinforcement learning to optimize their own br the world utility. The reinforcement learning agents by themselves
Personalized Web-Document Filtering Using Reinforcement Learning - Byoung-Tak Zhang And (2001)
Document filtering is increasingly deployed in Web environments to reduce information overload
of users. We formulate online information filtering as a reinforcement learning problem, i.e.
TD(0). The ... / Filtering Using Reinforcement Learning Byoung-Tak Zhang and br information filtering as a reinforcement learning problem i.e. TD The
Multi-Agent Systems by Incremental Gradient Reinforcement Learning - Dutech, Buffet, Charpillet (2001)
A new reinforcement learning (RL) methodology is proposed to design multi-agent systems. In the realistic setting of situated agents with local perception, the task of automatically building a coordin... / by Incremental Gradient Reinforcement Learning Alain Dutech Olivier br France Abstract A new reinforcement learning RL methodology is
Continuous-Time Hierarchical Reinforcement Learning - Ghavamzadeh, Mahadevan (2001)
Hierarchical reinforcement learning (RL) is
a general framework which studies how to
exploit the structure of actions and tasks
to accelerate policy learning in large domains.
Prior work in hierarchic... / Hierarchical Reinforcement Learning Mohammad Ghavamzadeh br Abstract Hierarchical reinforcement learning RL is a general
Direct Policy Search using Paired Statistical Tests - Strens, Moore (2001)
Direct policy search is a practical way to solve reinforcement learning problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objectiv... / a practical way to solve reinforcement learning problems involving br . Introduction Reinforcement Learning is a problem description
Convergent Reinforcement Learning with Value Function Interpolation - Szepesvári (2001)
We consider the convergence of a class of
reinforcement learning algorithms combined
with value function interpolation methods using
the methods developed in (Littman &
Szepesvari, 1996). As a sp... / Convergent Reinforcement Learning with Value Function br convergence of a class of reinforcement learning algorithms combined with
Towards Automatic Shaping in Robot Navigation - Peterson, Owens, Carroll (2001)
Shaping is a potentially powerful tool in reinforcement learning applications. Shaping often fails to function effectively because of a lack of understanding about its effects when applied in reinforcem... / powerful tool in reinforcement learning applications. Shaping br its effects when applied in reinforcement learning settings and the use of
Mining the Web to Create Minority Language Corpora - Ghani, Jones, Mladenic (2001)
The Web is a valuable source of language specific resources
but the process of collecting, organizing and utilizing these
resources is difficult. We describe CorpusBuilder, an approach
for automatical... / and McCallum use reinforcement learning to help a crawler br et al.s WebSail uses reinforcement learning based on feedback from
An Improved Grid-Based Approximation Algorithm for POMDPs - Zhou, Hansen (2001)
Although a partially observable Markov decision
process (POMDP) provides an appealing model for
problems of planning under uncertainty, exact algorithms
for POMDPs are intractable. This motivates
... / and related problems of reinforcement learning. A standard approach to
A Social Reinforcement Learning Agent - Charles Lee Isbell (2001)
We report on our reinforcement learning work on Cobot, a
software agent that resides in the well-known online chat
community LambdaMOO. Our initial work on Cobot (Isbell
et al., 2000) provided him wit... / A Social Reinforcement Learning Agent Charles Lee br We report on our reinforcement learning work on Cobot a software
Off-Policy Temporal-Difference Learning with Function Approximation - Precup, Sutton, Dasgupta (2001)
We introduce the first algorithm for off-policy
temporal-difference learning that is stable
with linear function approximation. Off-policy
learning is of interest because it forms
the basis for popul... / the basis for popular reinforcement learning methods such as br the most popular of all reinforcement learning algorithms it has been
Individual Action and Collective Function: from Sociology to.. - Sun (2001)
How do we characterize the process and the dynamics of co-learning, conceptually, mathematically,
or computationally?
How do social structures and relations interact with co-learning of multiple ... / deals with value function reinforcement learning in certain types of br The basic approach is reinforcement learning through estimating
A New Control Scheme For Combustion Processes Using Reinforcement.. - Stephan, Debes, Gross (2001)
Introduction
Since the immediate objective of a power plant is the production of energy, the
plant operator is trying to maximize the efficiency factor. Simultaneously, both
the system-constraints and g... / Combustion Processes Using Reinforcement Learning Based On Neural Networks br in a power plant based on reinforcement-learning in combination with neural
Automatic Discovery of Subgoals in Reinforcement Learning using.. - McGovern, Barto (2001)
This paper presents a method by which a reinforcement
learning agent can automatically discover
certain types of subgoals online. By creating
useful new subgoals while learning, the agent
is able ... / Discovery of Subgoals in Reinforcement Learning using Diverse Density br a method by which a reinforcement learning agent can automatically
Direct value-approximation for factored MDPs - Schuurmans, Patrascu (2001)
We present a simple approach for computing near-optimal policies unknown Direct value-approximation for factored MDPs
Dale Schuurmans and Relu Patrascu
Department of Computer Science
University of ... / stochastic environments and reinforcement learning. Standard methods such as br T. Dietterich. Hierarchical reinforcement learning with the MAXQ value
Policy Improvement for POMDPs using Normalized Importance Sampling - Christian Shelton Artificial (2001)
We present a new method for estimating the unknown Policy Improvement for POMDPs
using Normalized Importance Sampling
Christian R. Shelton
Artificial Intelligence Lab
Massachusetts Institute of Te... / We assume a standard reinforcement learning setup an agent interacts br before in conjunction with reinforcement learning. In particular Precup
Focused Web Crawling: A Generic Framework for Specifying the User.. - Ester, Gross, Kriegel (2001)
Compared to the standard web search engines, focused
crawlers yield good recall as well as good
precision by restricting themselves to a limited domain.
In this paper, we do not introduce another f... / of tunneling. RC uses reinforcement learning to train a crawler how to br J.McCallum A.Using Reinforcement Learning to Spider the Web
Lyapunov-Constrained Action Sets for Reinforcement Learning - Perkins, Barto (2001)
Lyapunov analysis is a standard approach to
studying the stability of dynamical systems and
to designing controllers. We propose to design
the actions of a reinforcement learning (RL)
agent to be ... / Action Sets for Reinforcement Learning Theodore J. Perkins br to design the actions of a reinforcement learning RL agent to be
Learning Preconditions for Control Policies in Reinforcement Learning - Matsui (2001)(Correct)
This paper describes a method which senses changing
environment by collecting failed instances, uses concept
learning for acquiring a precondition for a control policy,
and modifies the policy partial... / for Control Policies in Reinforcement Learning Tohgoroh Matsui br the policy partially in reinforcement learning. The precondition of a
Rational and Convergent Learning in Stochastic Games - Bowling, Veloso (2001)(Correct)
This paper investigates the problem of policy learning
in multiagent environments using the stochastic
game framework, which we briefly overview. We
introduce two properties as desirable for a lear... / We examine existing reinforcement learning algorithms according to br of single agent learning. Reinforcement learning Sutton and Barto
On Verifying Game Designs and Playing Strategies using Reinforcement.. - Kalles, Kanellopoulos (2001)(Correct)
In this paper we elaborate on the application of reinforcement learning to the details of the design and the verification of a new strategy game. We deal with playability and learning issues, using a ... / Playing Strategies using Reinforcement Learning Dimitrios Kalles br Playing Strategies using Reinforcement Learning - Abstract In
Decision-Theoretic Planning with Concurrent Temporally Extended.. - Rohanimanesh, Mahadevan (2001)(Correct)
We investigate a model for planning under ... / action in the context of reinforcement learning Sutton et al. br A s in the standard reinforcement learning framework in which A s
Personalized Webdocument Filtering Using Reinforcement Learning - Zhang, Seo (2001)(Correct)
... such as AltaVista,
Yahoo, and Excite. The other is to manually follow or browse the hyperlinks
of the documents by a user himself. However, these methods have some
drawbacks. Since Web-index services a... / Filtering Using Reinforcement Learning Byoung-Tak Zhang And br information filtering as a reinforcement learning problem i.e.TD The
Continuous State Space Q-Learning for Control of Nonlinear Systems - Hagen (2001)(Correct)
Contents: 1 Introduction; 1.1 Control; 1.1.1 Designing the state feedback controller; 1.1.2 ... / Reinforcement Learning br Reinforcement Learning. A Discrete
Specifying Rational Agents with Statecharts and Utility Functions - Obst (2001)(Correct)
To aid the development of the robotic soccer simulation league team RoboLog-2000, a method for the specification of multi-agent teams by statecharts has been introduced. The results in the last years... / written or are subject to reinforcement learning. The option evaluation
Creating Melodies with Evolving Recurrent Neural Networks - Chen, Miikkulainen (2001)(Correct)
Music composition is a domain well-suited for evolutionary
reinforcement learning. Instead of applying explicit composition
rules, a neural network is used to generate melodies.
An evolutionary algori... / for evolutionary reinforcement learning. Instead of applying br R. Efficient reinforcement learning through symbiotic
Accelerating Reinforcement Learning through the Discovery of Useful.. - McGovern, Barto (2001)(Correct)
An ability to adjust to changing environments and unforeseen
circumstances is likely to be an important component
of a successful autonomous space robot. This paper
shows how to augment reinforcement ... / Accelerating Reinforcement Learning through the Discovery of br Keywords Abstraction Reinforcement Learning RL Subgoals Mobile
Parameterized Logic Programs - Where Computing Meets (2001)(Correct)
In this paper, we describe recent attempts to incorporate
learning into logic programs as a step toward adaptive software that can
learn from an environment. Although there are a variety of types of... / algorithm and the other for reinforcement learning by learning automatons. br learning by incorporating reinforcement learning. Reinforcement learning is
Relational Reinforcement Learning - Driessens (2001)(Correct)
This paper presents an introduction to reinforcement learning
and relational reinforcement learning at a level to be understood by
students and researchers with different backgrounds. ... / Relational Reinforcement Learning Kurt Driessens br presents an introduction to reinforcement learning and relational
A Neuroevolution Method for Dynamic Resource Allocation on a Chip.. - Gomez, Burger, Miikkulainen (2001)(Correct)
Technology-driven limitations will soon force microprocessor chips to contain multiple processing cores, as the scalability of individual cores peaks but transistor counts continue to increase. To obt... / characteristics of difficult reinforcement learning tasks a sequence of br for the application of reinforcement learning techniques such as arti
Multiple Goal Q-Learning: Issues and Functions - Crabbe (2001)(Correct)
This paper addresses the concerns of agents using reinforcement learning to learn to achieve
multiple simultaneous goals. It proves that an algorithm based on acting upon the maximal
goal at any one t... / concerns of agents using reinforcement learning to learn to achieve br necessary for the agent's reinforcement learning system and concludes that
Model-Free Least-Squares Policy Iteration - Lagoudakis, Parr (2001)(Correct)
We propose a new approach to reinforcement learning which combines
least squares function approximation with policy iteration. Our
method is model-free and completely off policy. We are motivated
by t... / propose a new approach to reinforcement learning which combines least br in the context of reinforcement learning. While their ability to
Experience Stack Reinforcement Learning - Reynolds (2001)(Correct)
Experience Stack Reinforcement Learning is a novel, off-policy,
online algorithm for learning optimal policies with respect to a
reward signal. From existing methods it combines TD(λ) style return
... / Experience Stack Reinforcement Learning Stuart I. Reynolds br Abstract. Experience Stack Reinforcement Learning is a novel off-policy
Learning Agents in a Homo Egualis Society - Nowé, Verbeeck, Lenaerts (2001)(Correct)
Coordination is an important issue in multi-agent systems.
A possible approach to tackle coordination, that recently received
quite a lot of attention, is to learn the effects of interaction
in the joi... / - homo egualis -reinforcement learning -periodic policies. br be achieved by a classical reinforcement learning approach provided
Incentives for Sharing in Peer-to-Peer Networks - Golle, Leyton-Brown, Mironov.. (2001)(Correct)
We consider the free-rider problem in peer-to-peer file sharing
networks such as Napster: that individual users are provided with
no incentive for adding value to the network. We examine the design
... / with a multi-agent reinforcement learning model. Introduction br we use a multi-agent reinforcement learning model to validate our
Inference Using Formal Logics - McAllester (2001)(Correct)
Introduction
Logic is fundamental to a variety of disciplines. Logic provides insight into
the nature of mathematics and human mathematical reasoning. Logic provides
insight into the syntax and seman... / genetic algorithms and reinforcement learning have also failed to
Reinforcement Learning for Weakly-Coupled MDPs and an Application to.. - Bernstein, .. (2001)(Correct)
Weakly-coupled Markov decision processes can be decomposed into
subprocesses that interact only through a small set of bottleneck states. We study
a hierarchical reinforcement learning algorithm des... / Reinforcement Learning for Weakly-Coupled MDPs br We study a hierarchical reinforcement learning algorithm designed to take
Switch Packet Arbitration via Queue-Learning - Brown (2001)(Correct)
In packet switches, packets queue at switch inputs and contend for outputs. ... / of the switch. We present a reinforcement learning formulation of the br Introduction Reinforcement learning RL has been applied to
Adaptive Representation Methods for Reinforcement Learning - Reynolds (2001)(Correct)
... between Q-function and policy representations and the RL algorithms used
to generate them.
Recursive Partitioning
One method to improve the Q-function representation is to begin learning with a
coarse,... / Representation Methods for Reinforcement Learning Stuart I. Reynolds br representation methods in reinforcement learning RL Reinforcement
A Reinforcement Learning Model of Selective Visual Attention - Minut, .. (2001)(Correct)
This paper proposes a model of selective attention for visual
search tasks, based on a framework for sequential decision-making.
The model is implemented using a fixed pan-tilt-zoom
camera in a visually ... / A Reinforcement Learning Model of Selective Visual br two interacting modules. A reinforcement learning module learns a policy on
Cellular Channel Assignment: a New Localized and Distributed Strategy - Battiti, Bertossi, Brunato (2001)(Correct)
As the use of mobile communications systems grows, the need arises for new and more efficient channel
allocation techniques. The total number of available channels on a real-world network is in fact a sc... / simulated annealing or reinforcement learning. These strategies usually br and Dimitri Bertsekas. Reinforcement learning for dynamic channel
Planetary Rover Control as a Markov Decision Process - Bernstein, Zilberstein, Washington.. (2001)(Correct)
Planetary rovers must be effective in gathering scientific
data despite uncertainty and limited resources.
One step toward achieving this goal is to construct a
high-level mathematical model of the prob... / Markov decision process reinforcement learning Abstract Planetary br problem. We use Monte Carlo reinforcement learning techniques to obtain a
Corpus-based dialogue simulation for automatic strategy learning and.. - Scheffler, Young (2001)(Correct)
This paper describes a method for simulating mixed
initiative human-machine dialogues using data collected
by a prototype dialogue system. The behaviour
of the user population is modelled probabilisti... / of dialogue strategy by reinforcement learning has been proposed by br used with a model-based reinforcement learning algorithm such as dynamic
Active Learning with Adaptive Grids - Milano, Schmidhuber, Koumoutsakos (2001)(Correct)
Given some optimization problem and a series of typically
expensive trials of solution candidates taken from a search space, how
can we efficiently select the next candidate? We address this fundamenta... / evolution strategies reinforcement learning algorithms tabu search
Gradient-based Reinforcement Planning in Policy-Search Methods - Kwee, Hutter, Schmidhuber (2001)(Correct)
We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that pl... / improve convergence in reinforcement learning RL Sutton Barto br reward function In reinforcement learning RL the objective is to
AUVs' Dynamics Modeling, Position Control, And Path Planning Using.. - Sayyaadi, Ura (2001)(Correct)
Accurate identification of nonlinear time variant MIMO systems, especially in
case of AUVs is essential for implementation of control algorithms and navigation
purposes. Control problems of AUVs h... / control scheme and Reinforcement Learning is used for adjusting the br problems based on the Reinforcement Learning method. This algorithm was
Parallel Cortico-Basal Ganglia Mechanisms for Acquisition and.. - Nakahara, .. (2001)(Correct)
Experimental studies have suggested that many brain areas, including the basal ganglia,
contribute to procedural learning. Focusing on the basal ganglia-thalamocortical (BG-TC)
system, we propose a ... / the BG-TC loops work as a reinforcement learning system for learning br functional components. Reinforcement learning actor-critic architecture
ISocRob 2001 Team Description - Lima, Custódio, Damas, Lopes, .. (2001)(Correct)
This paper describes the ISocRob team's current status, new
features planned to be demonstrated in RoboCup 2001, and the project's
long-term scientific goals, as of March 2001. An evolution of the team... / and Object Location Reinforcement Learning and Stochastic Games. br left free based on reinforcement learning applied to stochastic
How XCS Evolves Accurate Classifiers - Butz, Kovacs, Lanzi, Wilson (2001)(Correct)
Due to the accuracy-based fitness approach, the ultimate goal for XCS is the evolution of ... / As in all LCSs and reinforcement learning methods the XCS acts as a br S.Barto A. G. Reinforcement Learning An Introduction.
Stochastic Search for Signal Processing Algorithm Optimization - Singer, Veloso (2001)(Correct)
Many difficult problems can be viewed as search problems. However, given a new task with an embedded
search problem, it is challenging to state and find a truly effective search approach. In this pape... / and Littman use reinforcement learning to learn to select br Algorithm selection using reinforcement learning. In Proceedings of
An Analysis of the Dynamics of Adaptive Multiagent Systems, with an.. - Vidal (2001)(Correct)
Introduction
In the past twenty years we have seen an increasing emphasis on the study of the dynamics of
complex systems such as the human immune system and the economy. Some of this work is
reflect... / to be given to individual reinforcement learning agents in order to speed br Smith protocol and reinforcement learning with learning rate of
Adaptive Behavior Navigation of a Mobile Robot - Zalama, Gomez, Paul, Peran (2001)(Correct)
This paper describes a neural network model for the reactive behavioral navigation of a
mobile robot. From the information received through the sensors the robot can elicit one of
several behaviors ... / and learning operation. Reinforcement learning improves the navigation of br introduces new knowledge. Reinforcement learning has been one of the
CoPS-Team Description - Lafrenz, Becht, Buchheim, Burger.. (2001)(Correct)
The control software of the robot soccer team CoPS is designed as a multi-agent-system. The basis for a cooperation between the robots is a suitable environment model based on uncertain sensory data a... / and we plan to include reinforcement learning in our system.
Self-Organization of Place Cells and Reward-Based Navigation for a.. - Takahashi, Tanaka, Kurita (2001)(Correct)
We investigate a method to navigate a mobile robot by using self-organizing map and reinforcement learning. Modeling hippocampal place cells, the map consists of units activated at specified locations... / self-organizing map and reinforcement learning. Modeling hippocampal br to a specific goal by using reinforcement learning on actor-critic model.Th
A Reinforcement Learning Intelligent Agent - Serban (2001)(Correct)
The field of Reinforcement Learning, a subfield of machine learning, represents an important direction for research in Artificial Intelligence, the way for improving an agent's behavior, given a certa... / Xlvi Number A Reinforcement Learning Intelligent Agent br Abstract. The field of Reinforcement Learning a sub-field of machine
Infinite-Horizon Policy-Gradient Estimation - Baxter, Bartlett (2001)(Correct)
Gradient-based approaches to direct policy search in reinforcement learning have received
much recent attention as a means to solve problems of partial observability and to avoid some of
the problem... / to direct policy search in reinforcement learning have received much recent br tend to go by the name Reinforcement Learning and have been
Universal Sequential Decisions in Unknown Environments - Hutter (2001)(Correct)
Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental probability distribution is known. Solomonoff's theory of universal induction formally solv... / Y and n are finite. Reinforcement learning for unknown environment br change if is unknown. Reinforcement learning algorithms are commonly