Results 1-6 of 6
Competition-Based Learning
1992
Cited by 39 (5 self)
This paper summarizes recent research on competition-based learning procedures performed by the Navy Center for Applied Research in Artificial Intelligence at the Naval Research Laboratory. We have focused on a particularly interesting class of competition-based techniques called genetic algorithms. Genetic algorithms are adaptive search algorithms based on principles derived from the mechanisms of biological evolution. Recent results on the analysis of the implicit parallelism of alternative selection algorithms are summarized, along with an analysis of alternative crossover operators. Applications of these results in practical learning systems for sequential decision problems and for concept classification are also presented.
INTRODUCTION
One approach to the design of more flexible computer systems is to extract heuristics from existing adaptive systems. We have focused on a class of learning systems that use competition-based procedures, called genetic algorithms (GAs). GAs are ba...
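To make "selection algorithm" and "crossover operator" concrete, here is a minimal generic genetic algorithm. It is a textbook-style sketch, not the NRL code: the OneMax fitness function, population size, mutation rate, and the choice of roulette-wheel (fitness-proportional) selection are all assumptions for illustration. Swapping the `crossover` argument between one-point and uniform crossover is the kind of comparison the abstract refers to, though a single seeded run says nothing general about either operator.

```python
import random

def onemax(bits):
    # Hypothetical fitness: number of 1 bits (maximum = string length).
    return sum(bits)

def roulette_select(pop, fits, rng):
    # Fitness-proportional selection: individual i is chosen with probability f_i / sum(f).
    total = sum(fits)
    if total == 0:
        return rng.choice(pop)
    pick = rng.uniform(0, total)
    acc = 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= pick:
            return ind
    return pop[-1]  # guard against floating-point round-off

def one_point_crossover(a, b, rng):
    # Child takes a prefix from parent a and the remaining suffix from parent b.
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def uniform_crossover(a, b, rng):
    # Each bit is copied from either parent with probability 1/2.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(bits, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]

def run_ga(crossover, n_bits=32, pop_size=40, generations=60, p_mut=0.01, seed=0):
    """Generational GA; returns the best fitness seen in any generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(onemax(ind) for ind in pop)
    for _ in range(generations):
        fits = [onemax(ind) for ind in pop]
        pop = [
            mutate(crossover(roulette_select(pop, fits, rng),
                             roulette_select(pop, fits, rng), rng), p_mut, rng)
            for _ in range(pop_size)
        ]
        best = max(best, max(onemax(ind) for ind in pop))
    return best

for name, op in [("one-point", one_point_crossover), ("uniform", uniform_crossover)]:
    print(name, run_ga(op))
```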
Giving advice about preferred actions to reinforcement learners via knowledge-based kernel regression
In Proceedings of the 20th National Conference on Artificial Intelligence, 2005
Cited by 37 (13 self)
We present a novel formulation for providing advice to a reinforcement learner that employs support-vector regression as its function approximator. Our new method extends a recent advice-giving technique, called Knowledge-Based Kernel Regression (KBKR), that accepts advice concerning a single action of a reinforcement learner. In KBKR, users can say that in some set of states, an action’s value should be greater than some linear expression of the current state. In our new technique, which we call Preference KBKR (Pref-KBKR), the user can provide advice in a more natural manner by recommending that some action is preferred over another in the specified set of states. Specifying preferences essentially means that users are giving advice about policies rather than Q values, which is a more natural way for humans to present advice. We present the motivation for preference advice and a proof of the correctness of our extension to KBKR. In addition, we present empirical results showing that our method can make effective use of advice on a novel reinforcement-learning task, based on the RoboCup simulator, which we call Breakaway. Our work demonstrates the significant potential of advice-giving techniques for addressing complex reinforcement learning problems, while further demonstrating the use of support-vector regression for reinforcement learning.
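The preference advice described above has the form "in states x with Bx <= d, Q_preferred(x) - Q_other(x) should be at least some margin". The sketch below only measures how badly a set of linear Q models violates such advice, as a hinge penalty summed over sampled states; it does not reproduce the paper's optimization-based formulation, which works with the advice region as a whole rather than with samples. The constant margin, the action names `shoot` and `pass`, and the helper names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple

LinearQ = Tuple[List[float], float]  # (weights, bias): Q(x) = w . x + b

class PreferenceAdvice(NamedTuple):
    B: List[List[float]]  # advice region is {x : B x <= d}
    d: List[float]
    preferred: str
    other: str
    margin: float  # require Q_preferred(x) - Q_other(x) >= margin inside the region

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def q_value(model: LinearQ, x: Sequence[float]) -> float:
    w, b = model
    return dot(w, x) + b

def in_region(advice: PreferenceAdvice, x: Sequence[float]) -> bool:
    # True when every row of B x <= d holds.
    return all(dot(row, x) <= di for row, di in zip(advice.B, advice.d))

def preference_penalty(models: Dict[str, LinearQ], advice: PreferenceAdvice,
                       samples: List[Sequence[float]]) -> float:
    """Total hinge violation of the preference over sampled states inside the region."""
    total = 0.0
    for x in samples:
        if in_region(advice, x):
            gap = q_value(models[advice.preferred], x) - q_value(models[advice.other], x)
            total += max(0.0, advice.margin - gap)
    return total

models = {"shoot": ([1.0, 0.0], 0.0), "pass": ([0.0, 0.0], 0.0)}
# Region x[0] >= 1, written as -x[0] <= -1.
advice = PreferenceAdvice(B=[[-1.0, 0.0]], d=[-1.0], preferred="shoot", other="pass", margin=0.5)
samples = [[2.0, 0.0], [0.5, 0.0], [1.0, 3.0]]
print(preference_penalty(models, advice, samples))
```

In a learner, a penalty like this could be added to the regression loss so that fitting the data and following the advice trade off against each other; how strongly to weight it is a design choice the sketch leaves open.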
Using advice to transfer knowledge acquired in one reinforcement learning task to another
In Proceedings of the Sixteenth European Conference on Machine Learning, 2005
Cited by 34 (11 self)
We present a method for transferring knowledge learned in one task to a related task. Our problem solvers employ reinforcement learning to acquire a model for one task. We then transform that learned model into advice for a new task. A human teacher provides a mapping from the old task to the new task to guide this knowledge transfer. Advice is incorporated into our problem solver using a knowledge-based support vector regression method that we previously developed. This advice-taking approach allows the problem solver to refine or even discard the transferred knowledge based on its subsequent experiences. We empirically demonstrate the effectiveness of our approach with two games from the RoboCup soccer simulator: KeepAway and BreakAway. Our results demonstrate that a problem solver learning to play BreakAway using advice extracted from KeepAway outperforms a problem solver learning without the benefit of such advice.
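One simple way to picture the "human-provided mapping" step: if the old task's learned model is a set of linear Q functions, a teacher-supplied map from old features and actions to new ones lets those functions be re-expressed in the new task's terms, after which they can suggest actions as soft advice. This is a hypothetical sketch, not the paper's transformation into KBKR advice rules; the feature indices and the action names (`hold`, `pass`, `move_ahead`, `pass_forward`) are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple

LinearQ = Tuple[List[float], float]  # (weights, bias): Q(x) = w . x + b

def map_model(old_models: Dict[str, LinearQ],
              feature_map: Dict[int, int],
              action_map: Dict[str, str],
              n_new_features: int) -> Dict[str, LinearQ]:
    """Re-express old-task linear Q models in the new task's feature and action space.

    feature_map: old feature index -> new feature index (supplied by a human teacher).
    action_map:  old action name  -> new action name.
    New features with no old counterpart get weight 0.
    """
    new_models = {}
    for old_a, new_a in action_map.items():
        w_old, b = old_models[old_a]
        w_new = [0.0] * n_new_features
        for old_i, new_i in feature_map.items():
            w_new[new_i] = w_old[old_i]
        new_models[new_a] = (w_new, b)
    return new_models

def advised_action(models: Dict[str, LinearQ], x: Sequence[float]) -> str:
    # Action the transferred model rates highest in state x; the new learner is
    # free to override this as its own experience accumulates.
    return max(models, key=lambda a: sum(w * xi for w, xi in zip(models[a][0], x)) + models[a][1])

old = {"hold": ([0.5, -1.0], 0.2), "pass": ([-0.2, 1.0], 0.0)}
new = map_model(old, feature_map={0: 2, 1: 0},
                action_map={"hold": "move_ahead", "pass": "pass_forward"}, n_new_features=3)
print(new)
print(advised_action(new, [1.0, 0.0, 0.0]))
```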
A corpus collection and annotation framework for learning multimodal clarification strategies
In Proc. of the 6th SIGdial Workshop, 2005
Cited by 9 (6 self)
Current dialogue systems are fairly poor at generating the wide range of clarification strategies found in human-human dialogue. The overall aim of this work is to learn when and how best to employ different types of clarification strategies in multimodal dialogue systems. This paper describes a framework for learning multimodal clarification strategies for an in-car MP3 music player dialogue system. The framework consists of three major parts. First, we collect data on multimodal clarification strategies in a Wizard-of-Oz study. Second, we extract features in the state-action space to learn an initial policy from this data. Third, we specify a reward function to refine that policy using extensions of existing evaluation schemes.
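The first two steps of the framework (learn an initial policy from wizard data, then define a reward for refining it) can be sketched as follows. The majority-vote policy and the linear reward with its weights are assumptions made for illustration; the state features and action names are hypothetical, and the actual reinforcement-learning refinement step is not shown.

```python
from collections import Counter, defaultdict

def initial_policy(logs):
    """Map each observed dialogue state to the wizard's most frequent clarification action.

    logs: iterable of (state, action) pairs, where state is a hashable feature tuple.
    """
    counts = defaultdict(Counter)
    for state, action in logs:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

def dialogue_reward(task_success, n_turns, w_success=20.0, w_turn=1.0):
    # Hypothetical linear reward: credit for completing the task, a cost for each turn.
    return w_success * task_success - w_turn * n_turns

logs = [
    (("asr_conf=low", "screen=list"), "show_options"),
    (("asr_conf=low", "screen=list"), "show_options"),
    (("asr_conf=low", "screen=list"), "ask_repeat"),
    (("asr_conf=high", "screen=none"), "implicit_confirm"),
]
policy = initial_policy(logs)
print(policy)
print(dialogue_reward(task_success=1, n_turns=5))
```

A policy learned this way can only imitate the wizards in states they actually visited; the reward function is what lets a later learning stage improve on, or generalize beyond, that data.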
The influence of reward on the speed of reinforcement learning: An analysis of shaping
In Proc. 20th Int’l Conf. on Machine Learning, 2003
Cited by 7 (1 self)
Shaping can be an effective method for improving the learning rate in reinforcement learning systems. Previously, shaping has been heuristically motivated and implemented. We provide a formal structure with which to interpret the improvement afforded by shaping rewards. Central to our model is the idea of a reward horizon, which focuses exploration on an MDP's critical region, a subset of states with the property that any policy that performs well on the critical region also performs well on the MDP. We provide a simple algorithm and prove that its learning time is polynomial in the size of the critical region and, crucially, independent of the size of the MDP. This identifies low reward horizons with easy-to-learn MDPs. Shaping rewards, which encode our prior knowledge about the relative merits of decisions, can be seen as artificially reducing the MDP's natural reward horizon. We demonstrate empirically the effects of using shaping to reduce the reward horizon.
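The intuition that shaping "artificially reduces the reward horizon" can be illustrated on a deterministic chain MDP. The sketch uses a crude proxy, the fewest steps from the start state until any transition yields nonzero reward, which is an assumption and not the paper's formal definition. For the shaping reward it uses potential-based shaping, F(s, s') = gamma * phi(s') - phi(s), a standard form the abstract does not name; the potential phi(s) = s and gamma = 0.9 are hypothetical choices.

```python
from collections import deque

def chain_successors(n):
    """Deterministic chain: states 0..n-1, actions step left (-1) or right (+1).

    Moves are clipped at the ends; state n-1 is the terminal goal and has no successors.
    """
    return {s: [min(max(s + a, 0), n - 1) for a in (-1, 1)] for s in range(n - 1)}

def sparse_reward(n):
    # Reward 1 only on the transition that enters the goal state.
    return lambda s, s2: 1.0 if s2 == n - 1 else 0.0

def potential_shaping(base, phi, gamma):
    # Adds the potential-based shaping term gamma * phi(s') - phi(s) to the base reward.
    return lambda s, s2: base(s, s2) + gamma * phi(s2) - phi(s)

def horizon_proxy(n, reward, start=0):
    """Fewest steps from `start` until some reachable transition gives nonzero reward."""
    succ = chain_successors(n)
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:  # breadth-first search over states
        s, depth = frontier.popleft()
        for s2 in succ.get(s, []):
            if reward(s, s2) != 0.0:
                return depth + 1
            if s2 not in seen:
                seen.add(s2)
                frontier.append((s2, depth + 1))
    return None  # no nonzero reward reachable

n = 10
base = sparse_reward(n)
shaped = potential_shaping(base, phi=lambda s: float(s), gamma=0.9)
print(horizon_proxy(n, base))    # 9: the only reward is nine steps away
print(horizon_proxy(n, shaped))  # 1: the first rightward step already earns 0.9
```

With the sparse reward, an agent must stumble nine steps before seeing any signal; with shaping, every decision is informative immediately, which is the effect the abstract attributes to lowering the reward horizon.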
Automatic shaping and decomposition of reward functions
2007
Cited by 2 (0 self)
This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multi-effector problems, in which the overall agent can be decomposed into multiple units that are concurrently carrying out various tasks. We show by example that to find a good reward decomposition, it is often necessary to first shape the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that in each case learning is significantly faster.
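To show what "decomposition of the per-timestep reward in multi-effector problems" means operationally, here is a sketch in which each unit runs its own tabular Q-learning update on its share of the team reward. The paper learns the decomposition; here the per-unit credit weights are simply given, and the proportional split, the unit names, and the learning parameters are all hypothetical.

```python
def decompose_reward(global_reward, credit):
    """Split a team reward across units in proportion to nonnegative credit weights."""
    total = sum(credit.values())
    if total == 0:
        share = global_reward / len(credit)
        return {u: share for u in credit}
    return {u: global_reward * c / total for u, c in credit.items()}

def q_update(q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update on a dict keyed by (state, action)."""
    best_next = max(q.get((s2, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q[(s, a)]

def multi_effector_step(tables, transitions, global_reward, credit, actions):
    """Each unit updates its own Q-table using only its component of the reward."""
    shares = decompose_reward(global_reward, credit)
    for unit, (s, a, s2) in transitions.items():
        q_update(tables[unit], s, a, shares[unit], s2, actions)
    return shares

actions = ["gather", "deliver"]
tables = {"unit1": {}, "unit2": {}}
transitions = {"unit1": ("near_mine", "gather", "loaded"),
               "unit2": ("at_base", "deliver", "empty")}
shares = multi_effector_step(tables, transitions, global_reward=4.0,
                             credit={"unit1": 3.0, "unit2": 1.0}, actions=actions)
print(shares)
print(tables)
```

Because each unit learns from its own reward component rather than the noisy team total, its updates are less confounded by what other units happened to do at the same time; the abstract's point is that finding a good split often requires shaping the rewards first.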

