Results 11–20 of 569
Scaling Reinforcement Learning toward RoboCup Soccer
, 2001
Cited by 120 (23 self)
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, "the keepers," tries to keep control of the ball for as long as possible despite the efforts of "the takers." The keepers learn individually when to hold the ball and when to pass to a teammate, while the takers learn when to charge the ball holder and when to cover possible passing lanes. Our agents learned policies that significantly outperformed a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations, including different field sizes and different numbers of players on each team.
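The core mechanism this abstract names, linear Sarsa(λ) over tile-coded features, can be sketched in a few lines. This is a toy single-dimension illustration, not the authors' keepaway implementation: the scalar state, tiling layout, and step-size values are all assumptions.

```python
import numpy as np

def tile_features(x, n_tilings=4, n_tiles=8, lo=0.0, hi=1.0):
    """Binary tile-coding features for a scalar state x in [lo, hi].

    Each tiling is offset slightly, so every state activates exactly
    one tile per tiling (n_tilings active features in total).
    """
    feats = np.zeros(n_tilings * n_tiles)
    for t in range(n_tilings):
        offset = t / (n_tilings * n_tiles)           # stagger the tilings
        idx = int((x - lo) / (hi - lo) * n_tiles + offset) % n_tiles
        feats[t * n_tiles + idx] = 1.0
    return feats

def sarsa_lambda_update(w, z, phi, phi_next, r, alpha=0.1, gamma=0.9, lam=0.8):
    """One linear Sarsa(lambda) step: TD error, accumulating trace, weight update."""
    delta = r + gamma * w @ phi_next - w @ phi       # TD error
    z = gamma * lam * z + phi                        # accumulating eligibility trace
    w = w + alpha * delta * z
    return w, z

# toy usage: a single transition with reward 1
w = np.zeros(32)
z = np.zeros(32)
phi = tile_features(0.2)
phi_next = tile_features(0.9)
w, z = sarsa_lambda_update(w, z, phi, phi_next, r=1.0)
```

After this single update the value estimate at the visited state has moved toward the reward, spread across the four active tiles.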
Programmable reinforcement learning agents
, 2001
Cited by 116 (1 self)
We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process. The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn't specified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills.
Linear Time Inference in Hierarchical HMMs
In Proceedings of Neural Information Processing Systems, 2001
Cited by 108 (3 self)
The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes O(T³) time, where T is the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes O(T) time. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
Learning movement primitives
In International Symposium on Robotics Research (ISRR 2003), 2004
Cited by 102 (14 self)
This paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic control policies. Model-based control theory is used to convert the outputs of these policies into motor commands. By means of coupling terms, online modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria. We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB, and learning biped walking from demonstration in simulation, including self-improvement of the movement patterns towards energy efficiency through resonance tuning.
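The DMP structure described above, a point-attractor transformation system driven by a phase ("canonical") system and a learnable forcing term, can be sketched by simple Euler integration. This is a minimal one-dimensional illustration with assumed gain values, not the paper's full framework; with zero basis-function weights the forcing term vanishes and the trajectory simply converges to the goal g.

```python
import numpy as np

def dmp_rollout(y0, g, w, centers, widths, tau=1.0, alpha_z=25.0, beta_z=6.25,
                alpha_x=8.0, dt=0.001, T=1.0):
    """Euler-integrate one DMP toward goal g.

    Transformation system: tau*v' = alpha_z*(beta_z*(g - y) - v) + f(x)
    Canonical system:      tau*x' = -alpha_x * x
    Forcing term f is a normalized RBF mixture, gated by x and scaled by (g - y0).
    """
    y, v, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)            # RBF basis activations
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)    # forcing term
        v += dt / tau * (alpha_z * (beta_z * (g - y) - v) + f)
        y += dt / tau * v
        x += dt / tau * (-alpha_x * x)                        # phase decays to 0
        traj.append(y)
    return np.array(traj)

# zero weights: the spring-damper alone pulls y from 0 to the goal 1
traj = dmp_rollout(y0=0.0, g=1.0, w=np.zeros(10),
                   centers=np.linspace(0, 1, 10), widths=np.full(10, 50.0))
```

Learning from demonstration then amounts to regressing the weights w so that f reproduces a demonstrated trajectory, which is linear in w.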
State Abstraction for Programmable Reinforcement Learning Agents
In Proceedings of the Eighteenth National Conference on Artificial Intelligence, 2002
Cited by 101 (3 self)
Safe state abstraction in reinforcement learning allows an agent to ignore aspects of its current state that are irrelevant to its current decision, and therefore speeds up dynamic programming and learning. This paper explores safe state abstraction in hierarchical reinforcement learning, where learned behaviors must conform to a given partial, hierarchical program. Unlike previous approaches to this problem, our methods yield significant state abstraction while maintaining hierarchical optimality, i.e., optimality among all policies consistent with the partial program. We show how to achieve this for a partial programming language that is essentially Lisp augmented with nondeterministic constructs. We demonstrate our methods on two variants of Dietterich's taxi domain, showing how state abstraction and hierarchical optimality result in faster learning of better policies and enable the transfer of learned skills from one problem to another.
A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning
, 2010
Cited by 91 (11 self)
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments on active user modelling with preferences and on hierarchical reinforcement learning, and a discussion of the pros and cons of Bayesian optimization based on our experiences.
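The loop the abstract describes can be sketched as a Gaussian-process posterior plus an acquisition function; expected improvement is used here as one common utility (the tutorial covers several). The RBF kernel, its length-scale, and the toy sine objective are all assumptions of this sketch.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean/std at query points Xs (zero prior mean)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI = (mu - best) * Phi(z) + sigma * phi(z): exploitation plus exploration."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)
    return (mu - best) * cdf + sigma * pdf

# toy objective: three expensive evaluations so far, pick the next query point
X = np.array([0.1, 0.5, 0.9])
y = np.sin(3 * X)
Xs = np.linspace(0, 1, 101)
mu, sigma = gp_posterior(X, y, Xs)
next_x = Xs[np.argmax(expected_improvement(mu, sigma, y.max()))]
```

Each iteration of the outer loop would evaluate the objective at `next_x`, append the result to (X, y), and refit.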
Multiple model-based reinforcement learning
Neural Computation, 2002
Cited by 85 (5 self)
We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The "responsibility signal," which is given by the softmax function of the prediction errors, is used to weight the outputs of multiple modules as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both the discrete-time, finite-state case and the continuous-time, continuous-state case. The performance of MMRL was demonstrated for the discrete case in a nonstationary hunting task in a grid world and for the continuous case in a nonlinear, nonstationary control task of swinging up a pendulum with variable physical parameters.
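The "responsibility signal" described above is essentially a softmax over scaled, negated prediction errors; a minimal sketch, where the scale parameter sigma is an assumption:

```python
import numpy as np

def responsibilities(pred_errors, sigma=1.0):
    """Softmax over negative scaled squared prediction errors.

    Modules whose prediction model best matches the current dynamics
    receive the largest responsibility; the result both weights module
    outputs and gates module learning.
    """
    scores = -np.asarray(pred_errors) ** 2 / (2 * sigma ** 2)
    e = np.exp(scores - scores.max())      # numerically stable softmax
    return e / e.sum()

# module 0 predicts the next state best, so it dominates the gating
lam = responsibilities([0.1, 1.0, 2.0])
```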
Using relative novelty to identify useful temporal abstractions in reinforcement learning
In Proceedings of the Twenty-First International Conference on Machine Learning, 2004
Cited by 78 (11 self)
We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporally extended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks.
Eligibility Traces for Off-Policy Policy Evaluation
In Proceedings of the Seventeenth International Conference on Machine Learning, 2000
Cited by 73 (10 self)
Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference methods. Here we generalize eligibility traces to off-policy learning, in which one learns about a policy different from the policy that generates the data. Off-policy methods can greatly multiply learning, as many policies can be learned about from the same data stream, and have been identified as particularly useful for learning about subgoals and temporally extended macro-actions. In this paper we consider the off-policy version of the policy evaluation problem, for which only one eligibility trace algorithm is known, a Monte Carlo method. We analyze and compare this and four new eligibility trace algorithms, emphasizing their relationships to the classical statistical technique known as importance sampling. Our main results are 1) to establish the consistency and bias properties of the new methods and 2) to empirically rank the new methods, showing improvement over one-step and Monte Carlo methods. Our results are restricted to model-free, table-lookup methods and to offline updating (at the end of each episode) although several of the algorithms could be applied more generally.
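One way to combine eligibility traces with importance sampling, in the spirit of (though not necessarily identical to) the algorithms the paper compares: a tabular off-policy TD(λ) evaluation step in which the per-decision ratio ρ = π(a|s)/b(a|s) reweights the trace. The placement of ρ here is one of several possible variants, chosen for simplicity.

```python
import numpy as np

def offpolicy_trace_update(V, z, s, s_next, r, rho,
                           alpha=0.1, gamma=0.9, lam=0.9):
    """Tabular off-policy TD(lambda) step with per-decision importance sampling.

    rho = pi(a|s) / b(a|s) reweights credit so that expectations taken
    under the behaviour policy b match the target policy pi.
    """
    delta = r + gamma * V[s_next] - V[s]   # TD error under current estimates
    z = rho * (gamma * lam * z)            # decay and reweight old credit
    z[s] += rho                            # add (reweighted) credit for s
    V += alpha * delta * z
    return V, z

# toy usage: a single transition, target policy twice as likely as behaviour
V = np.zeros(3)
z = np.zeros(3)
V, z = offpolicy_trace_update(V, z, s=0, s_next=1, r=1.0, rho=2.0)
```

With ρ = 1 everywhere this reduces to ordinary on-policy TD(λ) with accumulating traces.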
Learning Predictive State Representations
In The Twentieth International Conference on Machine Learning (ICML 2003), 2003
Cited by 71 (11 self)
We introduce the first algorithm for learning predictive state representations (PSRs), which are a way of representing the state of a controlled dynamical system. The state representation in a PSR is a vector of predictions of tests, where tests are sequences of actions and observations said to be true if and only if all the observations occur given that all the actions are taken. The problem of finding a good PSR, one that is a sufficient statistic for the dynamical system, can be divided into two parts: 1) discovery of a good set of tests, and 2) learning to make accurate predictions for those tests. In this paper, we present detailed empirical results using a gradient-based algorithm for addressing the second problem. Our results demonstrate several sample systems in which the algorithm learns to make correct predictions and several situations in which the algorithm is less successful. Our analysis reveals challenges that will need to be addressed in future PSR learning algorithms.