Machine Learning: Reinforcement Learning

Ordered by the expected number of citations based on the year of publication

This directory is created automatically, and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.

1514.2   Reinforcement Learning I: Introduction - Sutton, Barto (1998)
Introduction. Richard S. Sutton and Andrew G. Barto. © All rights reserved. [In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to oth...

684.0   Reinforcement Learning: A Survey - Kaelbling, Littman, Moore (1996)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of t...

518.8   Learning to Act using Real-Time Dynamic Programming - Barto, Bradtke, Singh (1995)
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning ...

514.2   Hierarchical Reinforcement Learning with the MAXQ Value Function.. - Dietterich (2000)
This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value fun...

361.7   Machine Learning Research: Four Current Directions - Dietterich (1997)
Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) impr...

342.8   Actor-Critic Algorithms - Konda, Tsitsiklis (2001)
In this paper, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference (TD) learning with a linearly parameterized ...
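The actor/critic split named in this abstract can be illustrated with a minimal sketch. It assumes a discounted setting, a linear critic V(s) = w · φ(s) trained by TD(0), and an actor that steps along the gradient of log π scaled by the TD error. This is a simplified textbook-style variant, not the exact Konda-Tsitsiklis algorithm, whose actor uses the critic's action-value estimate. The function name and parameters are hypothetical. The two time scales appear only as a critic step size `beta` that is larger than the actor step size `alpha`.

```python
import numpy as np


def actor_critic_step(w, theta, phi_s, phi_next, grad_log_pi, reward,
                      gamma=0.99, beta=0.1, alpha=0.01):
    """One update of a simplified actor-critic (illustrative sketch).

    w: critic weights; the value estimate is V(s) = w . phi(s)
    theta: actor (policy) parameters
    phi_s, phi_next: feature vectors of the current and next state
    grad_log_pi: gradient of log pi(a | s; theta) for the action taken
    beta: critic step size (fast time scale)
    alpha: actor step size (slow time scale, alpha < beta)
    """
    # TD error computed from the linear critic
    delta = reward + gamma * (w @ phi_next) - (w @ phi_s)
    # Critic: linear TD(0) update
    w = w + beta * delta * phi_s
    # Actor: step along grad log pi, scaled by the TD error
    theta = theta + alpha * delta * grad_log_pi
    return w, theta, delta
```

In a complete learner, `grad_log_pi` would come from whatever policy parameterization is chosen. The sketch leaves that choice open.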

342.8   Policy Gradient Methods for Reinforcement Learning with Function.. - Sutton, McAllester, Singh, Mansour (2000)
Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable....

260.8   Improving Elevator Performance Using Reinforcement Learning - Crites, Barto (1996)
This paper describes the application of reinforcement learning (RL) to the difficult real world problem of elevator dispatching. The elevator domain poses a combination of challenges not seen in most ...

256.7   Markov games as a framework for multi-agent reinforcement learning - Littman (1994)
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic ...

254.5   Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in.. - Sutton, Precup, Singh (1999)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed wit...

252.1   The Parti-game Algorithm for Variable Resolution Reinforcement.. - Moore, Atkeson (1995)
Parti-game is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous state-spaces. In high dimensions it is essential that learning does not plan uniformly...

226.4   Automatic Programming of Behavior-based Robots using Reinforcement.. - Mahadevan, Connell (1991)
This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two a...

223.1   Generalization in Reinforcement Learning: Successful Examples Using.. - Sutton (1996)
On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases th...

218.1   Evolving Artificial Neural Networks - Yao (1999)
Learning and evolution are two fundamental forms of adaptation. There has been a great interest in combining learning and evolution with artificial neural networks (ANNs) in recent years. This paper (...

214.4   Prioritized Sweeping: Reinforcement Learning with Less Data and Less.. - Moore, Atkeson (1993)
We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast ...
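Below is a minimal sketch of the prioritized-sweeping idea. It assumes a deterministic learned model stored as a dict and tabular action values. After a model entry changes, state-action pairs are backed up in order of the size of their pending change. Predecessors of each updated state are then re-queued. This is the simplified deterministic variant popularized in Sutton and Barto's textbook, not Moore and Atkeson's original stochastic formulation. Names such as `sweep` and `theta` are illustrative.

```python
import heapq
from collections import defaultdict


def sweep(model, Q, actions, start, gamma=0.9, theta=1e-6, max_updates=100):
    """Prioritized sweeping over a deterministic learned model (sketch).

    model: dict mapping (state, action) -> (reward, next_state)
    Q: defaultdict(float) of action values keyed by (state, action)
    actions: function returning the actions available in a state
             (an empty list marks a terminal state)
    start: the (state, action) pair whose model entry just changed
    States and actions must be orderable, since heap ties compare them.
    """
    # Predecessor table: which (state, action) pairs lead into each state
    preds = defaultdict(set)
    for (s, a), (_, s2) in model.items():
        preds[s2].add((s, a))

    def error(s, a):
        # Change a full backup of (s, a) would make to Q[(s, a)]
        r, s2 = model[(s, a)]
        best = max((Q[(s2, b)] for b in actions(s2)), default=0.0)
        return r + gamma * best - Q[(s, a)]

    queue = []  # heapq is a min-heap, so priorities are negated
    if abs(error(*start)) > theta:
        heapq.heappush(queue, (-abs(error(*start)), start))
    for _ in range(max_updates):
        if not queue:
            break
        _, (s, a) = heapq.heappop(queue)
        Q[(s, a)] += error(s, a)
        # Re-queue predecessors whose backups now change by more than theta
        for pred in preds[s]:
            p = abs(error(*pred))
            if p > theta:
                heapq.heappush(queue, (-p, pred))
    return Q
```

Consider a two-step chain 0 → 1 → terminal, with reward 1 on the last step and gamma 0.9. A single call starting from `(1, "go")` propagates the reward backward. `Q[(1, "go")]` becomes 1.0 and `Q[(0, "go")]` becomes 0.9.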

208.6   Generalization in Reinforcement Learning: Safely Approximating the.. - Boyan, Moore (1995)
To appear in: G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, MIT Press, Cambridge MA, 1995. A straightforward approach to the curse of dimension...

205.7   Multiagent Reinforcement Learning: Theoretical Framework and an.. - Hu, Wellman (1998)
In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework...

200.0   Reinforcement Learning with Hierarchies of Machines - Parr, Russell (1997)
We present a new approach to reinforcement learning in which the policies considered by the learning process are constrained by hierarchies of partially specified machines. This allows for the use of ...

200.0   Learning policies for partially observable environments: Scaling up - Littman, Cassandra, Kaelbling (1995)
Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of ...

197.9   Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents - Tan (1993)
Intelligent human agents exist in a cooperative social environment that facilitates learning. They learn not only by trial-and-error, but also through cooperation by sharing instantaneous information,...

188.5   Gradient Descent for General Reinforcement Learning - Baird, Moore (1998)
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, def...

188.4   Cooperative Mobile Robotics: Antecedents and Directions - Cao, Fukunaga, Kahng, Meng (1995)
There has been increased research interest in systems composed of multiple autonomous mobile robots exhibiting collective behavior. Groups of mobile robots are constructed, with an aim to studying suc...

187.6   On the Convergence of Stochastic Iterative Dynamic Programming.. - Jaakkola, Jordan, Singh (1994)
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorit...

181.8   Using Reinforcement Learning to Spider the Web Efficiently - Rennie, McCallum (1999)
Consider the task of exploring the Web in order to find pages of a particular kind or on a particular topic. This task arises in the construction of search engines and Web knowledge bases. This paper ...

179.7   Residual Algorithms: Reinforcement Learning with Function.. - Leemon Baird (1995)
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can ...

176.8   Approximating Optimal Policies for Partially Observable Stochastic.. - Parr, Russell (1995)
The problem of making optimal decisions in uncertain conditions is central to Artificial Intelligence. If the state of the world is known at all times, the world can be modeled as a Markov Decision Pr...

173.1   Feudal Reinforcement Learning - Dayan, Hinton (1993)
One way to speed up reinforcement learning is to enable learning to happen simultaneously at multiple resolutions in space and time. This paper shows how to create a Q-learning managerial hierarchy...

171.4   Approximate Planning in Large POMDPs via Reusable Trajectories - Kearns, Mansour, Ng (2000)
We consider the problem of reliably choosing a near-best strategy from a restricted class of strategies in a partially observable Markov decision process (POMDP). We assume we are given the ability ...

168.1   Reinforcement Learning Algorithm for Partially Observable Markov.. - Tommi Jaakkola (1995)
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov ass...

165.9   Reinforcement Learning for Dynamic Channel Allocation in Cellular.. - Satinder Singh (1997)
In cellular telephone systems, an important problem is to dynamically allocate the communication resource (channels) so as to maximize service in a stochastic caller environment. This problem is natur...

163.6   A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov .. - Kearns, Mansour, Ng (1999)
An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments wi...

163.6   Simulation-Based Optimization of Markov Reward Processes.. - Marbach, Tsitsiklis (1999)
We consider discrete time, finite state space Markov reward processes which depend on a set of parameters. Previously, we proposed a simulation-based methodology to tune the parameters to optimize the...

161.7   Soccer Server: a tool for research on multi-agent systems - Noda, Matsubara, Hiraki, Frank (1997)
This paper describes Soccer Server, a simulator of the game of soccer designed as a test-bench for evaluating multi-agent systems and cooperative algorithms. In real life, successful soccer teams requ...

159.9   The Dynamics of Reinforcement Learning in Cooperative Multiagent.. - Claus, Boutilier (1998)
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dy...

159.9   The MAXQ Method for Hierarchical Reinforcement Learning - Dietterich (1998)
This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decomposition of the value function. The MAXQ decomposition has both a procedural semantics---as a subroutin...

159.4   Reinforcement Learning with Replacing Eligibility Traces - Singh (1996)
The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze i...
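The difference between the conventional accumulating trace and the replacing trace named in this abstract shows up in a short tabular TD(λ) sketch. The sketch assumes episodes are given as a list of (state, reward, next_state) steps and that values live in a plain dict. The function name and defaults are illustrative. The only change between the two trace kinds is the single line that marks the visited state.

```python
def td_lambda_episode(V, transitions, alpha=0.1, gamma=1.0, lam=0.9,
                      replacing=True):
    """Tabular TD(lambda) over one episode (sketch).

    V: dict of state values (missing states count as 0)
    transitions: list of (state, reward, next_state); next_state is None
                 on the final step
    replacing: True for replacing traces (e(s) reset to 1 on each visit),
               False for accumulating traces (e(s) incremented by 1)
    Returns the updated values and the final traces.
    """
    e = {}
    for s, r, s2 in transitions:
        v_next = 0.0 if s2 is None else V.get(s2, 0.0)
        delta = r + gamma * v_next - V.get(s, 0.0)
        # Decay every trace, then mark the state just visited
        for k in e:
            e[k] *= gamma * lam
        e[s] = 1.0 if replacing else e.get(s, 0.0) + 1.0
        # Each state shares the TD error in proportion to its trace
        for k, trace in e.items():
            V[k] = V.get(k, 0.0) + alpha * delta * trace
    return V, e
```

Take gamma = 1 and lambda = 0.9, and a state visited on three consecutive steps. Its trace ends at 1.0 under replacing traces. Under accumulating traces it ends at 1 + 0.9 + 0.81 = 2.71.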

157.1   Automating the Construction of Internet Portals with Machine Learning - McCallum, Nigam, Rennie, et al. (2000)
Domain-specific internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, www.campsearch.com allows compl...

156.5   A Reinforcement Learning Approach to Job-shop Scheduling - Zhang (1995)
We apply reinforcement learning methods to learn domain-specific heuristics for job shop scheduling. A repair-based scheduler starts with a critical-path schedule and incrementally repairs constraint ...

154.5   MINERVA: A Second-Generation Museum Tour-Guide Robot - Thrun, Bennewitz, Burgard, Cremers.. (1999)
This paper describes an interactive tour-guide robot, which was successfully exhibited in a Smithsonian museum. During its two weeks of operation, the robot interacted with thousands of people, traver...

151.7   Reinforcement Learning with Perceptual Aliasing: The Perceptual.. - Chrisman (1992)
It is known that Perceptual Aliasing may significantly diminish the effectiveness of reinforcement learning algorithms [Whitehead and Ballard, 1991]. Perceptual aliasing occurs when multiple situat...

144.6   Reinforcement Learning in the Multi-Robot Domain - Mataric (1997)
This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involve...

143.2   Learning to coordinate without sharing information - Sen, Sekaran, Hale (1994)
Researchers in the field of Distributed Artificial Intelligence (DAI) have been developing efficient mechanisms to coordinate the activities of multiple autonomous agents. The need for coordination ar...

143.1   Simple Statistical Gradient-Following Algorithms for Connectionist.. - Williams (1992)
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are show...
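A minimal sketch of a REINFORCE update for the simplest stochastic unit, a single Bernoulli-logistic unit, follows. It uses the familiar form Δw_i = α (r − b)(y − p) x_i, where p is the unit's firing probability, y its sampled output, and b a reinforcement baseline. The reward callback, function name, and defaults are hypothetical. Networks of many units and other unit types are not covered.

```python
import math
import random


def reinforce_bernoulli_step(w, x, reward_fn, alpha=0.1, baseline=0.0,
                             rng=random):
    """One REINFORCE update for a single Bernoulli-logistic unit (sketch).

    w: list of weights
    x: list of inputs, same length as w
    reward_fn: hypothetical environment callback mapping the unit's
               output y (0 or 1) to a scalar reward
    baseline: reinforcement baseline b subtracted from the reward
    rng: source of randomness (anything with a .random() method)
    """
    # Firing probability p = sigmoid(w . x)
    p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
    # Sample the stochastic unit's output
    y = 1 if rng.random() < p else 0
    r = reward_fn(y)
    # Characteristic eligibility of a Bernoulli-logistic unit: (y - p) * x_i
    return [wi + alpha * (r - baseline) * (y - p) * xi
            for wi, xi in zip(w, x)]
```

Suppose the reward is 1 for y = 1 and 0 otherwise. Then the weight on a positive input can only grow, so repeated calls push the firing probability toward 1.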

142.8   Reinforcement Learning in POMDP's via Direct Gradient Ascent - Baxter, Bartlett (2000)
This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce GPOMDP, a REINFORCE-like...

142.8   Monte Carlo POMDPs - Thrun (2000)
We present a Monte Carlo algorithm for learning to act in partially observable Markov decision processes (POMDPs) with real-valued state and action spaces. Our approach uses importance sampling for re...

142.8   Solving POMDPs by Searching in Policy Space - Hansen (1998)
Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solvi...

139.1   Classifier Fitness Based on Accuracy - Wilson (1995)
In many classifier systems, the classifier strength parameter serves as a predictor of future payoff and as the classifier's fitness for the genetic algorithm. We investigate a classifier system, XCS,...

136.3   Learning Policies with External Memory - Peshkin, Meuleau, Kaelbling (1999)
In order for an agent to perform well in partially observable domains, it is usually necessary for actions to depend on the history of observations. In this paper, we explore a stigmergic approach, in...

136.3   Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation.. - Baxter, Bartlett (1999)
Despite their many empirical successes, approximate value-function based approaches to reinforcement learning suffer from a paucity of theoretical guarantees on the performance of the policy generat...

136.3   Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent.. - Baxter, Weaver (1999)
In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs)....

136.2   Stable Function Approximation in Dynamic Programming - Gordon (1995)
The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal difference methods such as value iteration. Experiments in this area ...

131.9   On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach - Salzberg (1997)
An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, co...

131.4   Hierarchical Control and Learning for Markov Decision Processes - Parr (1998)
Hierarchical Control and Learning for Markov Decision Processes, by Ronald Edward Parr. Doctor of Philosophy in Computer Science, University of California at Berkeley. Professor Stuart Russell, Cha...

130.4   Incremental Multi-Step Q-Learning - Peng, Williams (1996)
This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic programming-based reinforcement learning method, with the TD(λ) return estimation process, which is typic...

129.3   Artificial Life and Real Robots - Brooks (1992)
The first part of this paper explores the general issues in using Artificial Life techniques to program actual mobile robots. In particular it explores the difficulties inherent in transferring progra...

128.5   Learning to Cooperate via Policy Search - Peshkin, Meuleau, Kaelbling (2000)
Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperativ...

127.5   Algorithms for Sequential Decision Making - Littman (1996)
of "Algorithms for Sequential Decision Making" by Michael Lederman Littman, Ph.D., Brown University, May 1996. Ph.D. Dissertation, Department of Computer Science, Br...

119.9   Simulation-Based Optimization of Markov Reward Processes - Marbach, Tsitsiklis (1998)
We propose a simulation-based algorithm for optimizing the average reward in a Markov Reward Process that depends on a set of parameters. As a special case, the method applies to Markov Decision Proce...

114.2   Autonomous Helicopter Control using Reinforcement Learning Policy.. - Bagnell, Schneider (2001)
Many control problems in the robotics field can be cast as Partially Observed Markovian Decision Problems (POMDPs), an optimal control formalism. Finding optimal solutions to such problems in general,...

114.2   Stochastic Search for Signal Processing Algorithm Optimization - Singer, Veloso (2001)
Many difficult problems can be viewed as search problems. However, given a new task with an embedded search problem, it is challenging to state and find a truly effective search approach. In this pape...

114.2   Scaling Reinforcement Learning toward RoboCup Soccer - Stone (2001)
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the e...

114.2   Convergence Results for Single-Step On-Policy Reinforcement-Learning.. - Singh, Jaakkola, et al. (1998)
An important application of reinforcement learning (RL) is to finite-state control problems and one of the most difficult problems in learning for control is balancing the exploration/exploitation ...

113.5   Efficient Algorithms for Minimizing Cross Validation Error - Moore, Lee (1994)
Model selection is important in many areas of supervised learning. Given a dataset and a set of models for predicting with that dataset, we must choose the model which is expected to best predict futu...

109.0   Approximate Solutions to Markov Decision Processes - Gordon (1999)
One of the basic problems of machine learning is deciding how to act in an uncertain world. For example, if I want my robot to bring me a cup of coffee, it must be able to compute the correct sequence...

109.0   Building Domain-Specific Search Engines with Machine Learning.. - McCallum, Nigam, Rennie, Seymore (1999)
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with the general, Web-wide search engines. For example, www.camps...

109.0   General Principles Of Learning-Based Multi-Agent Systems - Wolpert, Wheeler, et al. (1999)
We consider the problem of how to design large decentralized multi-agent systems (MAS's) in an automated fashion, with little or no hand-tuning. Our approach has each agent run a reinforcement learnin...

106.8   Transfer of Learning by Composing Solutions of Elemental Sequential.. - Singh (1992)
Although building sophisticated learning agents that operate in complex environments will require learning to perform multiple tasks, most applications of reinforcement learning have focussed on singl...

104.3   Purposive Behavior Acquisition for a Real Robot by Vision-Based.. - Asada, Noda, Tawaratsumida, Hosoda (1996)
This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying the reinforcement learning method to a ...

102.1   Robot Shaping: Experiment In Behavior Engineering - Dorigo, Colombetti (1997)
its performance. In fact, we use the expression robot shaping to denote the use of learning as a means to translate suggestions coming from an external trainer into an effective control strategy that...

102.1   ARACHNID: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods.. - Menczer (1997)
ARACHNID is a distributed algorithm for information discovery in large, dynamic, distributed environments such as the World Wide Web. The approach is based on a distributed, adaptive population of int...

100.0   Hierarchic Social Entropy: An Information Theoretic Measure of Robot.. - Balch (2000)
As research expands in multiagent intelligent systems, investigators need new tools for evaluating the artificial societies they study. It is impossible, for example, to correlate heterogeneity with...

99.9   A Machine Learning Approach to Building Domain-Specific Search Engines - McCallum, Nigam, Rennie, Seymore (1999)
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are al...

98.7   Packet Routing in Dynamically Changing Networks: A Reinforcement.. - Boyan, Littman (1994)
This paper describes the Q-routing algorithm for packet routing, in which a reinforcement learning module is embedded into each node of a switching network. Only local communication is used by each no... / Changing Networks A Reinforcement Learning Approach Justin A. Boyan ... packet routing in which a reinforcement learning module is embedded into

98.5   Reinforcement Learning with Soft State Aggregation - Singh, Jaakkola, Jordan (1995)
It is widely accepted that the use of more compact representations than lookup tables is crucial to scaling reinforcement learning (RL) algorithms to real-world problems. Unfortunately almost all of t... / Reinforcement Learning with Soft State ... is crucial to scaling reinforcement learning RL algorithms to

97.1   Near-Optimal Reinforcement Learning in Polynomial Time - Kearns, Singh (1998)
We present new algorithms for reinforcement learning and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After o... / Near-Optimal Reinforcement Learning in Polynomial Time ... present new algorithms for reinforcement learning and prove that they have

96.5   The Role Of Exploration In Learning Control - Thrun (1992)
Introduction Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to be combined. On the one hand, the environment must be sufficiently explored in ord... / adaptive neurocontrol and reinforcement learning. In Section we discuss ... trade-off Kaelbling reinforcement learning Watkins

92.7   Learning to Use Selective Attention and Short-Term Memory in.. - McCallum (1996)
This paper presents U-Tree, a reinforcement learning algorithm that uses selective attention and short-term memory to simultaneously address the intertwined problems of large perceptual state spaces an... / paper presents U-Tree a reinforcement learning algorithm that uses ... question How can a reinforcement learning agent successfully learn

92.7   Instance-Based Utile Distinctions for Reinforcement Learning with.. - Andrew McCallum (1995)
We present Utile Suffix Memory, a reinforcement learning algorithm that uses short-term memory to overcome the state aliasing that results from hidden state. By combining the advantages of previous wo... / Utile Distinctions for Reinforcement Learning with Hidden State R. ... Utile Suffix Memory a reinforcement learning algorithm that uses

92.7   Finding Structure in Reinforcement Learning - Thrun, Schwartz (1995)
Reinforcement learning addresses the problem of learning to select actions in order to maximize one's performance in unknown environments. To scale reinforcement learning to complex real-world tasks, ... / Finding Structure in Reinforcement Learning Sebastian Thrun ... eds. Abstract Reinforcement learning addresses the problem of

91.4   Layered Learning in Multi-Agent Systems - Stone (1998)
Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously and as part of a team. This dissertation addresses multi-agent systems consisting of teams of auton... / hierarchical learning reinforcement learning decision tree learning ... a new multi-agent reinforcement learning algorithm namely

91.3   Memoryless Policies: Theoretical Limitations and Practical Results - Michael L. Littman (1994)
One form of adaptive behavior is "goal-seeking" in which an agent acts so as to minimize the time it takes to reach a goal state. This paper presents some theoretical and empirical findings on algorit... / and more recently by reinforcement learning researchers e.g. ... A classic example from the reinforcement learning literature is Sutton's

90.9   Learning with Mixtures of Trees - Meila-Predoviciu (1999)
One of the challenges of density estimation as it is used in machine learning is that usually the data are multivariate and often the dimensionality is large. Operating with joint distributions over m... / fostered my interest in reinforcement learning statistics graphical

90.9   Multiagent Reinforcement Learning in Stochastic Games - Hu, Wellman (1999)
We adopt stochastic games as a general framework for dynamic noncooperative systems. This framework provides a way of describing the dynamic interactions of agents in terms of individuals' Markov deci... / Multiagent Reinforcement Learning in Stochastic Games ... we design a multiagent reinforcement learning method which allows

90.9   Experience-weighted Attraction Learning in Normal Form Games - Camerer, Ho (1999)
We describe a general model, 'experience-weighted attraction' (EWA) learning, which includes reinforcement learning and a class of weighted fictitious play belief models as special cases. In EWA, strat... / learning which includes reinforcement learning and a class of weighted ... behavioral game theory reinforcement learning fictitious play.

90.7   An Adaptive Communication Protocol for Cooperating Mobile Robots - Yanco, Stein (1993)
We describe mobile robots engaged in a cooperative task that requires communication. The robots are initially given a fixed but uninterpreted vocabulary for communication. In attempting to perform the... / the design of appropriate reinforcement learning algorithms to learn ... his symbolic test suite for reinforcement learning algorithms. Work on the

88.8   Purposive Behavior Acquisition on a Real Robot by a Vision-Based.. - Asada, Noda, Tawaratsumida, Hosoda (1994)
In [1], we have presented the soccer robot which had learned to shoot a ball into the goal using Q-learning. In this paper, we discuss several issues in applying the Q-learning method to a real rob... / Robot By A Vision-Based Reinforcement Learning Minoru Asada Shoichi ... method for robot learning reinforcement learning has recently been

88.6   Overcoming Incomplete Perception with Utile Distinction Memory - McCallum (1993)
This paper presents a method by which a reinforcement learning agent can solve the incomplete perception problem using memory. The agent uses a hidden Markov model (HMM) to represent its internal stat... / a method by which a reinforcement learning agent can solve the ... will build a In reinforcement learning good task performance is

86.4   Learning Without State-Estimation in Partially Observable Markovian.. - Singh, Jaakkola, Jordan (1994)
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately all of the theory and much of the practice (see ... / Abstract Reinforcement learning RL algorithms provide a ... state of the environment. Reinforcement learning RL techniques provide a

85.7   Infinite-Horizon Policy-Gradient Estimation - Baxter, Bartlett (2001)
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problem... / to direct policy search in reinforcement learning have received much recent ... tend to go by the name Reinforcement Learning and have been

85.7   An Algorithmic Description of XCS - Butz, Wilson (2001)
A concise description of the XCS classifier system's parameters, structures, and algorithms is presented as an aid to research. The algorithms are written in modularly structured pseudo code with acco... / Due to the Q-learning-like reinforcement learning in XCS payoff does not ... selection in the reinforcement learning literature Sutton

85.7   Temporal Abstraction in Reinforcement Learning - Precup (2000)
Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions r... / Temporal Abstraction In Reinforcement Learning A Dissertation Presented

85.7   Learning Optimal Dialogue Strategies: A Case Study of a Spoken.. - Walker, Fromer, Narayanan (1998)
This paper describes a novel method by which a dialogue agent can learn to choose an optimal dialogue strategy. While it is widely agreed that dialogue strategies should be formulated in terms of comm... / is based on algorithms for reinforcement learning such as dynamic ... S i derived Several reinforcement learning algorithms based on

82.4   Efficient Learning and Planning Within the Dyna Framework - Peng, Williams (1993)
Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhanc... / be cast in the form of reinforcement learning tasks. Recent work in ... and the creation of new reinforcement learning algorithms such as

81.8   Distributed Value Functions - Schneider, Wong, Moore, Riedmiller (1999)
Many interesting problems, such as power grids, network switches, and traffic flow, that are candidates for solving with reinforcement learning (RL), also have properties that make distributed solutio... / candidates for solving with reinforcement learning RL also have ... algorithm for distributed reinforcement learning based on distributing the

81.4   Continual Learning In Reinforcement Environments - Ring (1994)
Continual learning is the constant development of complex behaviors with no final end in mind. It is the process of learning ever more complicated skills by building on those skills already developed.... / on the sections involving reinforcement learning. Thanks also to Risto ... complicated non-Markovian reinforcement-learning tasks and can then

80.4   Tight Performance Bounds on Greedy Policies Based on Imperfect Value.. - Williams (1993)
Consider a given value function on states of a Markov decision problem, as might result from applying a reinforcement learning algorithm. Unless this value function equals the corresponding optimal va... / result from applying a reinforcement learning algorithm. Unless this ... error typically used in reinforcement learning applications. The

80.0   Accelerated Focused Crawling through Online Relevance Feedback - Chakrabarti, Punera, Subramanyam (2002)
The organization of HTML into a tag tree structure, which is rendered by browsers as roughly rectangular regions with embedded text and HREF links, greatly helps surfers locate and click on links that... / Document object model Reinforcement learning. Introduction ... paradigm is related to reinforcement learning and AI programs that

78.2   Discovery of Subroutines in Genetic Programming - Rosca, Ballard (1996)
Introduction Hierarchical Genetic Programming (HGP) extensions discover, modify, and exploit subroutines to accelerate the evolution of programs [Koza 1992, Rosca and Ballard 1994a]. The use of subr... / in the larger context of reinforcement learning problems. Finally ... the fitness of subroutines. Reinforcement learning RL algorithms such as

78.2   Multiagent coordination with learning classifier systems - Sen, Sekaran (1996)
this paper, we evaluate a particular reinforcement learning methodology, a genetic algorithm based machine learning mechanism known as classifier systems [Holland, 1986] for developing action polici... / agents. We have used reinforcement learning Barto et al. ... we evaluate a particular reinforcement learning methodology a genetic

78.2   Adaptive Load Balancing: A Study in Multi-Agent Learning - Schaerf, Shoham, Tennenholtz (1995)
We study the process of multi-agent reinforcement learning in the context of load balancing in a distributed system, without use of either central coordination or explicit communication. We first defi... / the process of multi-agent reinforcement learning in the context of load ... investigates multi-agent reinforcement learning in the context of a

78.2   On the Complexity of Solving Markov Decision Problems - Littman, Dean, Kaelbling (1995)
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize resul... / automated planning and reinforcement learning. In this paper we ... planning reinforcement learning and other sequential

77.5   Efficient Exploration In Reinforcement Learning - Sebastian B. Thrun (1992)
Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration in active learning and describes several local techniques for exploration in finite, di... / Efficient Exploration In Reinforcement Learning Sebastian B. Thrun ... domains embedded in a reinforcement learning framework delayed

76.5   Symbiotic Evolution of Neural Networks in Sequential Decision Tasks - Moriarty (1997)
Chapter 1 Introduction; 1.1 Sequential Decision Tasks; 1.1.1 Examples of Sequential Decision Tasks ... / Reinforcements. Reinforcement Learning vs. Supervised Learning ... Temporal Difference Reinforcement Learning.

76.5   Coordination Of Multiple Behaviors Acquired By A Vision-Based.. - Asada, Uchibe, Noda, Tawaratsumida.. (1994)
A method is proposed which accomplishes a whole task consisting of plural subtasks by coordinating multiple behaviors acquired by a vision-based reinforcement learning. First, individual behaviors whi... / Acquired By A Vision-Based Reinforcement Learning Minoru Asada Eiji ... acquired by a vision-based reinforcement learning. First individual

76.5   ZCS: A Zeroth Level Classifier System - Wilson (1994)
A basic classifier system, ZCS, is presented which keeps much of Holland's original framework but simplifies it to increase understandability and performance. ZCS's relation to Q-learning is brought o... / on the related field of reinforcement learning Barto efforts to ... under the heading of reinforcement learning and appears to provide a

75.3   Average Reward Reinforcement Learning: Foundations, Algorithms, and.. - Mahadevan (1996)
This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounte... / Average Reward Reinforcement Learning Foundations Algorithms ... study of average reward reinforcement learning an undiscounted

75.3   MIMIC: Finding Optima by Estimating Probability Densities - De Bonet, Isbell, Jr., Viola (1996)
In many optimization problems, the structure of solutions reflects complex relationships between the different input parameters. For example, experience may tell us that certain parameters are closely... / by Sabes and Jordan for a reinforcement learning task Sabes and Jordan ... and Jordan M. I. Reinforcement learning by probability matching.

74.2   Building Agent Teams Using an Explicit Teamwork Model and Learning - Tambe, Adibi, Al-Onaizan, Kaminka.. (1998)
Multi-agent collaboration or teamwork and learning are two critical research challenges in a large number of multi-agent applications. These research challenges are highlighted in RoboCup, an internat... / off-line and on-line reinforcement learning. One of the key surprises ... to intercept a ball using reinforcement learning. Learning to Shoot

74.2   Using Decision Tree Confidence Factors for Multiagent Control - Stone, Veloso (1998)
Although Decision Trees are widely used for classification tasks, they are typically not used for agent control. This paper presents a novel technique for agent control in a complex multiagent domai... / module for instance a reinforcement learning module to learn whether ... acquired by vision-based reinforcement learning. In Proc. of IEEE RSJ GI

74.2   Adaptive Agent-Driven Routing and Load Balancing in Communication.. - Heusse, Snyers, Guérin, Kuntz (1998)
This paper presents a unified overview of a new family of distributed algorithms for routing and load balancing in dynamic communication networks. These new algorithms are described as an extension t... / vector algorithm based on reinforcement learning The routing policy ... its destination. Following reinforcement learning the estimates can be

72.7   Adaptive Retrieval Agents: Internalizing Local Context and Scaling up .. - Menczer, Belew (1999)
This paper focuses on two machine learning abstractions springing from ecological models: (i) evolutionary adaptation by local selection, and (ii) selective query expansion by internalization of env... / selection internalization reinforcement learning Q-learning neural ... learned solutions e.g. by reinforcement learning cannot capture global

72.7   Learning Finite-State Controllers for Partially Observable.. - Meuleau, Peshkin, Kim, Kaelbling (1999)
Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable... / successful application of reinforcement learning RL to real world ... a direct model-free reinforcement learning RL algorithm learn a

72.7   Call Admission Control and Routing in Integrated Services Networks.. - Marbach, Mihatsch, Tsitsiklis (1999)
We consider the problem of call admission control and routing in an integrated services network that handles several classes of calls of different value and with different resource requirements. The p... / neuro-dynamic programming reinforcement learning together with a ... programming also called reinforcement learning RL or neuro-dynamic

72.7   Neuro-Fuzzy Systems for Function Approximation - Nauck, Kruse (1999)
We propose a neuro-fuzzy architecture for function approximation based on supervised learning. The learning algorithm is able to determine the structure and the parameters of a fuzzy system. The appr... / and is trained by reinforcement learning based on a fuzzy error ... supervised learning i.e. reinforcement learning. On the other hand if

71.4   Practical Reinforcement Learning in Continuous Spaces - Smart, Kaelbling (2000)
Dynamic control tasks are good candidates for the application of reinforcement learning techniques. However, many of these tasks inherently have continuous state or action variables. This can caus... / Practical Reinforcement Learning in Continuous Spaces ... for the application of reinforcement learning techniques. However many

70.1   Adding learning to the cellular development of neural networks.. - Gruau, Whitley, Pyeatt (1993)
A grammar tree is used to encode a cellular developmental process that can generate whole families of Boolean neural networks for computing parity and symmetry. The development process resembles bio... / supervised learning and for reinforcement learning applications. Genetic

69.5   High-Performance Job-Shop Scheduling With A Time-Delay TD(lambda).. - Zhang, Dietterich (1995)
Job-shop scheduling is an important task for manufacturing industries. We are interested in the particular task of scheduling payload processing for NASA's space shuttle program. This paper summarizes... / task for solution by the reinforcement learning algorithm TD(λ). A ... Navigation and Planning Reinforcement Learning Presentation

69.1   Temporal Difference Learning of Position Evaluation in the Game of Go - Schraudolph, Dayan, Sejnowski (1994)
The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. D... / In order to demonstrate reinforcement learning as a viable alternative to ... Credit Assignment in Reinforcement Learning. PhD thesis University

68.5   Relational Reinforcement Learning - Dzeroski, De Raedt, Blockeel (1998)
Relational reinforcement learning is presented, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Due to the use of a more expressi... / Relational Reinforcement Learning Saso Dzeroski ... Abstract Relational reinforcement learning is presented a learning

68.0   Towards Collaborative and Adversarial Learning: A Case Study in.. - Stone, Veloso (1997)
Soccer is a rich domain for the study of multiagent learning issues. Not only must the players learn low-level skills, but they must also learn to work together and to adapt to the behaviors of differ... / papers describes a reinforcement learning agent which incorporates ... Ford et al. used a Reinforcement Learning RL approach with sensory

66.6   A Machine Learning Architecture for Optimizing Web Search Engines - Boyan, Freitag, Joachims (1996)
Indexing systems for the World Wide Web, such as Lycos and Alta Vista, play an essential role in making the Web useful and usable. These systems are based on Information Retrieval methods for indexing... / a novel one inspired by reinforcement learning techniques for propagating ... motivated by an analogy to reinforcement learning as studied in artificial

66.6   A Robot Controller Using Learning by Imitation - Hayes, Demiris (1994)
Roboticists have already invested considerable energy in building robot controllers which model the learning capacities of single animals. In this paper we present a new type of controller which dra... / computationally expensive reinforcement learning stage is permissible it ... a negotiation strategy. A reinforcement learning module could be useful

65.9   Hierarchical Learning in Stochastic Domains: Preliminary Results - Kaelbling (1993)
This paper presents the HDG learning algorithm, which uses a hierarchical decomposition of the state space to make learning to achieve goals more efficient with a small penalty in path quality. Sp... / INTRODUCTION Reinforcement learning is a general tool for ... A crucial problem in reinforcement learning is temporal credit

63.8   Explanation-Based Learning and Reinforcement Learning: A Unified View - Dietterich et al. (1997)
In speedup-learning problems, where full descriptions of operators are known, both explanation-based learning (EBL) and reinforcement learning (RL) methods can be applied. This paper shows that both... / Learning and Reinforcement Learning A Unified View THOMAS ... learning EBL and reinforcement learning RL methods can be

63.8   Learning Roles: Behavioral Diversity in Robot Teams - Tucker Balch (1997)
This paper describes research investigating behavioral specialization in learning robot teams. Each agent is provided a common set of skills (motor schema-based behavioral assemblages) from which it b... / strategy using reinforcement learning. The agents learn ... in a task is available reinforcement learning can shift the burden of

63.7   Action Selection methods using Reinforcement Learning - Mark Humphrys (1996)
Action Selection schemes, when translated into precise algorithms, typically involve considerable design effort and tuning of parameters. Little work has been done on solving the problem using lea... / Selection methods using Reinforcement Learning Mark Humphrys ... selection problem using Reinforcement Learning learning from rewards

63.7   Emergent Adaptive Lexicons - Steels (1996)
The paper reports experiments to test the hypothesis that language is an autonomous evolving adaptive system maintained by a group of distributed agents without central control. The experiments show h... / small group of robots using reinforcement learning. Again the size of the

63.7   ALECSYS and the AutonoMouse: Learning to Control a Real Robot by.. - Dorigo (1995)
In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Their use on real robots imposes efficiency constra... / Classifier Systems Reinforcement Learning Genetic Algorithms ... this article belongs to the reinforcement learning research field. Holland's

63.6   Stochastic Dynamic Programming with Factored Representations - Boutilier, Dearden, et al. (1999)
Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specificati... / for most work in reinforcement learning MDPs ... problem in the context of reinforcement learning in addition Their

63.6   Least-Squares Temporal Difference Learning - Boyan (1999)
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbac... / of LSTD as a model-based reinforcement learning technique. BACKGROUND ... model-free to a model-based reinforcement learning algorithm. This

63.6   Coordinating Mobile Robot Group Behavior Using a Model of Interaction .. - Dani Goldberg (1999)
In this paper we show how various levels of coordinated behavior may be achieved in a group of mobile robots by using a model of the interaction dynamics between a robot and its environment. We presen... / Markov decision processes reinforcement learning intractable on mobile

63.6   OBDD-based Universal Planning: Specifying and Solving Planning.. - Jensen, Veloso (1999)
Recently model checking representation and search techniques were shown to be efficiently applicable to planning, in particular to non-deterministic planning. Such planning approaches use Ordered B... / resembles the outcome of reinforcement learning in that the ... valid sequence of actions. Reinforcement Learning RL can also be

63.6   Reinforcement Learning for Spoken Dialogue Systems - Singh, Kearns, Litman, Walker (1999)
Recently, a number of authors have proposed treating dialogue systems as Markov decision processes (MDPs). However, the practical application of MDP algorithms to dialogue systems faces a number of se... / Preference ORAL Reinforcement Learning for Spoken Dialogue ... software tool RLDS for Reinforcement Learning for Dialogue Systems

61.7   Learning To Solve Markovian Decision Processes - Singh (1994)
Learning to Solve Markovian Decision Processes. February 1994. Satinder P. Singh. B.Tech., Indian Institute of Technology New Delhi; M.S., University of Massachusetts Amherst; Ph.D., University of Massachu... / researchers have developed reinforcement learning RL algorithms based on ... Why Reinforcement Learning

59.5   Hierarchical Learning with Procedural Abstraction Mechanisms - Rosca (1997)
Evolutionary computation (EC) consists of the design and analysis of probabilistic algorithms inspired by the principles of natural selection and variation. Genetic Programming (GP) is one subfield of... / Reinforcement learning offers insights to ... Bibliography A Reinforcement Learning B Minimum Description

59.5   Complexity Analysis of Real-Time Reinforcement Learning Applied to.. - Koenig, Simmons (1997)
This report analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value-iteration, applied to the problems of reaching any goal... / Analysis of Real-Time Reinforcement Learning Applied to Finding ... Machine Learning Reinforcement Learning Learning Adaptation

59.5   Complexity Analysis of Real-Time Reinforcement Learning - Koenig, Simmons (1997)
This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value-iteration, applied to the problem of reaching a goal stat... / Analysis of Real-Time Reinforcement Learning Sven Koenig and Reid ... the complexity of on-line reinforcement learning algorithms namely

59.5   Learning from Demonstration - Schaal (1997)
By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initia... / applied in the context of reinforcement learning. We consider priming the ... problems only model-based reinforcement learning shows significant speed-up

57.9   Coevolution of a Backgammon Player - Pollack, Blair, Land (1996)
One of the persistent themes in Artificial Life research is the use of co-evolutionary arms races in the development of specific and complex behaviors. However, other than Sims's work on artificial ro... / good news about the reinforcement learning method For the idea of ... framework for multi-agent reinforcement learning. In Machine Learning

57.9   Reinforcement Learning Methods for Continuous-Time Markov Decision.. - Steven Bradtke (1995)
Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution... / Reinforcement Learning Methods for ... Problems. A number of reinforcement learning algorithms have been

57.1   Hierarchical Memory-Based Reinforcement Learning - Hernandez-Gardiol, Mahadevan (2001)
A key challenge for reinforcement learning is how to scale up to large partially observable domains. In this paper, we show how a hierarchy of behaviors can be used to create and select among varia... / Hierarchical Memory-Based Reinforcement Learning Natalia ... A key challenge for reinforcement learning is how to scale up to

57.1   An Architecture for Action Selection in Robotic Soccer - Stone, McAllester (2001)
CMUnited-99 was the 1999 RoboCup robotic soccer simulator league champion. In the RoboCup-2000 competition, CMUnited-99 was entered again and despite being publicly available for the entire year, it s... / rewards in the sense of reinforcement learning One might say for ... successfully learned via reinforcement learning. CONCLUSION

57.1   Speeding up Relational Reinforcement Learning Through the Use of an.. - Driessens, Ramon, Blockeel (2001)
Relational reinforcement learning (RRL) is a learning technique that combines standard reinforcement learning with inductive logic programming to enable the learning system to exploit structural kno... / Speeding up Relational Reinforcement Learning Through the Use of an ... Abstract. Relational reinforcement learning RRL is a learning

57.1   Determination Of Sensory Motor Coordination Parameters For A Robot.. - Mark Edward Cambron (2001)
This paper proposes a method for the determination of Sensory-Motor Coordination (SMC) parameters through the teleoperation of a humanoid robot designed for human-robot interaction. It is argued that ... / One is to set up a reinforcement learning scheme which allows the

57.1   Learning Evaluation Functions to Improve Optimization by Local Search - Boyan, Moore (2000)
This paper describes algorithms that learn to improve search performance on large-scale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome... / its foundations in reinforcement learning and illustrate its ... of adaptive local search reinforcement learning and genetic algorithms.

57.1   Introducing a Genetic Generalization Pressure to the Anticipatory.. - Butz, Goldberg, Stolzmann (2000)
The Anticipatory Classifier System (ACS) is able to form a complete internal representation of an environment. Unlike most other classifier system and reinforcement learning approaches, it is able t... / other classifier system and reinforcement learning approaches, it is able to ... the external environment. Reinforcement learning approaches like Dyna

57.1   Adaptive Agents with Reinforcement Learning and Internal Memory - Lanzi (2000)
Perceptual aliasing is a serious problem for adaptive agents. Internal memory is a promising approach to extend reinforcement learning algorithms to problems involving perceptual aliasing. In this... / Adaptive Agents with Reinforcement Learning and Internal Memory ... approach to extend reinforcement learning algorithms to problems

57.1   Introducing a Genetic Generalization Pressure to the Anticipatory.. - Butz, Goldberg, Stolzmann (2000)
The Anticipatory Classifier System is a learning classifier system that is based on the cognitive mechanism of anticipatory behavioral control. Besides the common reward learning, the ACS is able to... / which is not possible with reinforcement learning techniques. Furthermore ... Sutton R. S. Reinforcement learning architectures for animats.

57.1   Incorporating Prior Knowledge and Previously Learned Information into .. - Dixon, Malak, Khosla (2000)
Reinforcement learning has received much attention in the past decade. The primary thrust of this research has focused on tabula rasa learning methods. That is, the learning agent is initially unawar...

57.1   Elevator Group Control Using Multiple Reinforcement Learning Agents - Crites, Barto (1998)
Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an incremental basi... / Control Using Multiple Reinforcement Learning Agents ROBERT H. CRITES ... and theoretical advances in reinforcement learning (RL) have attracted

57.1   Programmable Pattern Generators - Schaal, Sternad (1998)
This paper explores the idea to create complex human-like arm movements from movement primitives based on nonlinear attractor dynamics. Each degree-of-freedom of an arm is assumed to have two indepen... / its modern relative reinforcement learning provide a well founded ... recent developments in reinforcement learning increased the range of

56.7   TD(λ) Converges with Probability 1 - Dayan, Sejnowski (1994)
The methods of temporal differences (Samuel, 1959; Sutton 1984, 1988) allow agents to learn accurate predictions about stationary stochastic future outcomes. The learning is effectively stochastic a... / Probability Keywords: reinforcement learning, temporal differences ... well as other classes of reinforcement learning algorithm.

55.3   Ants and Reinforcement Learning: A Case Study in Routing in Dynamic.. - Subramanian, Druschel, Chen (1997)
We investigate two new distributed routing algorithms for data networks based on simple biological "ants" that explore the network and rapidly learn good routes, using a novel variation of reinforceme... / Ants and Reinforcement Learning: A Case Study in Routing ... using a novel variation of reinforcement learning. These two algorithms are

54.5   Evolutionary Algorithms for Reinforcement Learning - Moriarty, Schultz, Grefenstette (1999)
There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algo... / Evolutionary Algorithms for Reinforcement Learning David E. Moriarty ... approaches to solving reinforcement learning problems, namely

CiteSeer - citeseer.org - Terms of Service - Privacy Policy - Copyright © 1997-2002 NEC Research Institute