Results 11–20 of 111
Value Functions for RL-Based Behavior Transfer: A Comparative Study
In Proceedings of the 20th National Conference on Artificial Intelligence, 2005
Abstract
Cited by 36 (8 self)
Temporal difference (TD) learning methods (Sutton & Barto 1998) have become popular reinforcement learning techniques in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found slow in practice. This paper presents methods for further generalizing across tasks, thereby speeding up learning, via a novel form of behavior transfer. We compare learning on a complex task with three function approximators, a CMAC, a neural network, and an RBF, and demonstrate that behavior transfer works well with all three. Using behavior transfer, agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup soccer Keepaway domain.
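The TD(0) weight update underlying these function-approximation methods can be sketched in a few lines. This is a minimal illustration, not the paper's Keepaway implementation; the feature vectors and step sizes are illustrative assumptions.

```python
import numpy as np

def td0_update(w, phi_s, reward, phi_s_next, alpha=0.1, gamma=0.99):
    """One TD(0) step for a linear value estimate V(s) = w . phi(s):
    move the weights toward the bootstrapped target r + gamma * V(s')."""
    td_error = reward + gamma * np.dot(w, phi_s_next) - np.dot(w, phi_s)
    return w + alpha * td_error * phi_s

# Toy transition with 3-dimensional features (illustrative values)
w = np.zeros(3)
w = td0_update(w, np.array([1.0, 0.0, 0.0]), 1.0, np.array([0.0, 1.0, 0.0]))
```

A CMAC, an RBF network, and a neural network differ only in how the features phi(s) are computed (and, for the neural network, in the gradient that replaces phi_s in the update).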
RTS games and real-time AI research
In Proceedings of the Behavior Representation in Modeling and Simulation Conference (BRIMS), 2004
Abstract
Cited by 33 (4 self)
This article motivates AI research in the area of real-time strategy (RTS) games and describes the current status of the ORTS project, whose goals are to implement an RTS game programming environment and to build AI systems that eventually can outperform human experts in this popular and challenging domain. Keywords: real-time AI, simulation, multi-player games
Concurrent hierarchical reinforcement learning
2005
Abstract
Cited by 32 (0 self)
We describe a language for partially specifying policies in domains consisting of multiple subagents working together to maximize a common reward function. The language extends ALisp with constructs for concurrency and dynamic assignment of subagents to tasks. During learning, the subagents learn a distributed representation of the Q-function for this partial policy. They then coordinate at runtime to find the best joint action at each step. We give examples showing that programs in this language are natural and concise. We also describe online and batch learning algorithms for learning a linear approximation to the Q-function, which make use of the coordination structure of the problem.
Improving action selection in MDP’s via knowledge transfer
In Proceedings of the Twentieth National Conference on Artificial Intelligence, 2005
Abstract
Cited by 30 (2 self)
Temporal-difference reinforcement learning (RL) has been successfully applied in several domains with large state sets. Large action sets, however, have received considerably less attention. This paper demonstrates the use of knowledge transfer between related tasks to accelerate learning with large action sets. We introduce action transfer, a technique that extracts the actions from the (near-)optimal solution to the first task and uses them in place of the full action set when learning any subsequent tasks. When optimal actions make up a small fraction of the domain’s action set, action transfer can substantially reduce the number of actions and thus the complexity of the problem. However, action transfer between dissimilar tasks can be detrimental. To address this difficulty, we contribute randomized task perturbation (RTP), an enhancement to action transfer that makes it robust to unrepresentative source tasks. We motivate RTP action transfer with a detailed theoretical analysis featuring a formalism of related tasks and a bound on the suboptimality of action transfer. The empirical results in this paper show the potential of RTP action transfer to substantially expand the applicability of RL to problems with large action sets.
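The core of action transfer, keeping only the actions a (near-)optimal source policy actually uses, can be sketched as follows. The policy dictionary and action names are hypothetical, and this omits the paper's RTP safeguard against unrepresentative source tasks.

```python
def transfer_actions(source_policy, full_action_set):
    """Action transfer: restrict the action set for subsequent tasks to
    the actions used by the (near-)optimal policy of the source task."""
    return set(source_policy.values()) & set(full_action_set)

# Hypothetical source policy over four states, using 2 of 5 actions
policy = {"s0": "left", "s1": "left", "s2": "up", "s3": "left"}
reduced = transfer_actions(policy, {"left", "right", "up", "down", "stay"})
# reduced == {"left", "up"}
```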
Solving factored MDPs with hybrid state and action variables
Journal of Artificial Intelligence Research (JAIR)
Abstract
Cited by 29 (4 self)
Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solution. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems.
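The ALP construction at the heart of HALP, constraining V(s) >= R(s,a) + gamma * E[V(s')] for every state-action pair and minimizing a weighted sum of values, can be illustrated on a tiny discrete MDP. The two-state chain, rewards, and uniform state weights are illustrative assumptions; with an identity basis the LP recovers the exact value function.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
# Bellman constraints V >= R + gamma * P V for each (state, action),
# rewritten as (gamma * P - I) V <= -R for linprog's A_ub x <= b_ub form.
# State 0 can stay (reward 0) or move to state 1 (reward 0);
# state 1 is absorbing with reward 1 per step.
A_ub = np.array([
    [gamma - 1.0, 0.0],   # V0 >= 0 + gamma * V0   (stay)
    [-1.0, gamma],        # V0 >= 0 + gamma * V1   (move)
    [0.0, gamma - 1.0],   # V1 >= 1 + gamma * V1   (absorb)
])
b_ub = np.array([0.0, 0.0, -1.0])
c = np.ones(2)            # minimize V0 + V1 (uniform state-relevance weights)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
V = res.x                 # approximately [9.0, 10.0]
```

HALP extends this same construction to hybrid (continuous and discrete) state and action variables, where evaluating the expectations in the constraints requires the paper's basis-function machinery.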
Practical linear value-approximation techniques for first-order MDPs
In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2006
Abstract
Cited by 27 (0 self)
Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to determine suitable weights. This approach offers the advantage that it does not require simplification of the first-order value function, and allows one to solve FOMDPs independent of a specific domain instantiation. In this paper, we address several questions to enhance the applicability of this work: (1) Can we extend the first-order ALP framework to approximate policy iteration and, if so, how do these two algorithms compare? (2) Can we automatically generate basis functions and evaluate their impact on value function quality? (3) How can we decompose intractable problems with universally quantified rewards into tractable subproblems? We propose answers to these questions along with a number of novel optimizations and provide a comparative empirical evaluation on problems from the ICAPS 2004 Probabilistic Planning Competition.
First order decision diagrams for relational MDPs
In Proceedings of the International Joint Conference on Artificial Intelligence, 2007
Abstract
Cited by 26 (10 self)
Markov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long-term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDPs), where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODDs), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy.
Practical solution techniques for first-order MDPs
 Artificial Intelligence
Abstract
Cited by 25 (1 self)
Many traditional solution approaches to relationally specified decision-theoretic planning problems (e.g., those stated in the probabilistic planning domain description language, or PPDDL) ground the specification with respect to a specific instantiation of domain objects and apply a solution approach directly to the resulting ground Markov decision process (MDP). Unfortunately, the space and time complexity of these grounded solution approaches are polynomial in the number of domain objects and exponential in the predicate arity and the number of nested quantifiers in the relational problem specification. An alternative to grounding a relational planning problem is to tackle the problem directly at the relational level. In this article, we propose one such approach that translates an expressive subset of the PPDDL representation to a first-order MDP (FOMDP) specification and then derives a domain-independent policy without grounding at any intermediate step. However, such generality does not come without its own set of challenges; the purpose of this article is to explore practical solution techniques for solving FOMDPs. To demonstrate the applicability of our techniques, we present proof-of-concept results of our first-order approximate linear programming (FOALP) planner on problems from the probabilistic track
Temporal-relational classifiers for prediction in evolving domains
In Proceedings of the IEEE International Conference on Data Mining, 2008
Abstract
Cited by 24 (7 self)
Many relational domains contain temporal information and dynamics that are important to model (e.g., social networks, protein networks). However, past work in relational learning has focused primarily on modeling static “snapshots” of the data and has largely ignored the temporal dimension of these data. In this work, we extend relational techniques to temporally evolving domains and outline a representational framework that is capable of modeling both temporal and relational dependencies in the data. We develop efficient learning and inference techniques within the framework by considering a restricted set of temporal-relational dependencies and using parameter-tying methods to generalize across relationships and entities. More specifically, we model dynamic relational data with a two-phase process, first summarizing the temporal-relational information with kernel smoothing, and then moderating attribute dependencies with the summarized relational information. We develop a number of novel temporal-relational models using the framework and then show that the current approaches to modeling static relational data are special cases within the framework. We compare the new models to the competing static relational methods on three real-world datasets and show that the temporal-relational models consistently outperform the relational models that ignore temporal information, achieving significant reductions in error ranging from 15% to 70%.
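The first phase of the two-phase process, summarizing a time-stamped relationship with kernel smoothing, can be sketched with an exponential kernel; the decay rate and timestamps below are illustrative assumptions, not the paper's choices.

```python
import math

def smoothed_link_weight(timestamps, t_now, decay=0.5):
    """Exponential-kernel summary of a relationship observed at the given
    times: recent interactions contribute more than older ones."""
    return sum(math.exp(-decay * (t_now - t)) for t in timestamps)

# A link observed at times 1, 3, and 4, summarized at time 5
weight = smoothed_link_weight([1, 3, 4], t_now=5)
```

In the second phase, such summarized weights would moderate the attribute dependencies that the relational classifier learns.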
ReTrASE: Integrating Paradigms for Approximate Probabilistic Planning
Abstract
Cited by 19 (11 self)
Past approaches for solving MDPs have several weaknesses: 1) Decision-theoretic computation over the state space can yield optimal results but scales poorly. 2) Value-function approximation typically requires human-specified basis functions and has not been shown successful on nominal (“discrete”) domains such as those in the ICAPS planning competitions. 3) Replanning by applying a classical planner to a determinized domain model can generate approximate policies for very large problems but has trouble handling probabilistic subtlety [Little and Thiebaux, 2007]. This paper presents ReTrASE, a novel MDP solver, which combines decision theory, function approximation, and classical planning in a new way. ReTrASE uses classical planning to create basis functions for value-function approximation and applies expected-utility analysis to this compact space. Our algorithm is memory-efficient and fast (due to its compact, approximate representation), returns high-quality solutions (due to the decision-theoretic framework), and does not require additional knowledge from domain engineers (since we apply classical planning to automatically construct the basis functions). Experiments demonstrate that ReTrASE outperforms winners from the past three probabilistic-planning competitions on many hard problems.