Results 1–10 of 18
Automatic basis function construction for approximate dynamic programming and reinforcement learning
 In Cohen and Moore (2006)
, 2006
Abstract

Cited by 75 (3 self)
We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castañon (1989), who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, to map a high-dimensional state space to a low-dimensional space, based on the Bellman error or on the temporal difference (TD) error. We then place basis functions in the lower-dimensional space; these are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.
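The error-driven feature-addition loop this abstract describes can be sketched minimally. All names, the least-squares fit, and the Gaussian-basis placement are illustrative assumptions, not the paper's method (which uses neighborhood component analysis for the dimensionality reduction):

```python
import numpy as np

# Sketch: fit linear weights to sampled transitions, find the state with the
# largest Bellman residual, and add a Gaussian basis function centered there.

def gaussian_basis(center, width):
    return lambda s: np.exp(-np.sum((s - center) ** 2) / (2 * width ** 2))

def features(s, basis):
    return np.array([phi(s) for phi in basis])

def add_basis_at_worst_state(states, rewards, next_states, gamma, basis, width=0.5):
    # Fit weights by least squares on V(s) ~ r + gamma * V(s').
    Phi = np.array([features(s, basis) for s in states])
    PhiNext = np.array([features(s, basis) for s in next_states])
    A = Phi - gamma * PhiNext
    w, *_ = np.linalg.lstsq(A, rewards, rcond=None)
    # Bellman residuals under the fitted weights.
    residuals = rewards + gamma * PhiNext @ w - Phi @ w
    worst = states[np.argmax(np.abs(residuals))]
    return basis + [gaussian_basis(worst, width)], w
```

Each call enlarges the feature set where the current linear approximation is worst, which is the general shape of error-driven basis construction.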
Solving factored MDPs with hybrid state and action variables
 J. Artif. Intell. Res. (JAIR)
Abstract

Cited by 29 (4 self)
Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems.
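The central linear program of (H)ALP can be illustrated on a small, purely discrete MDP; this is a sketch with assumed names, and the paper's hybrid setting with continuous variables is considerably more involved:

```python
import numpy as np
from scipy.optimize import linprog

# Approximate linear programming sketch: minimize sum_s V(s) subject to the
# Bellman inequalities V(s) >= r(s,a) + gamma * sum_s' P(s'|s,a) V(s'),
# where V(s) = Phi(s) . w is linear in the basis weights w.

def alp_weights(Phi, P, R, gamma):
    """Phi: (S, k) basis matrix; P: (A, S, S) transitions; R: (A, S) rewards."""
    c = Phi.sum(axis=0)  # objective: minimize sum over states of Phi(s) . w
    A_ub, b_ub = [], []
    for a in range(P.shape[0]):
        # (Phi - gamma * P[a] @ Phi) w >= R[a]  rewritten as  -(...) w <= -R[a]
        A_ub.append(-(Phi - gamma * P[a] @ Phi))
        b_ub.append(-R[a])
    res = linprog(c, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                  bounds=[(None, None)] * Phi.shape[1])
    return res.x
```

With identity features (one indicator per state) this reduces to the exact LP formulation of the MDP; HALP's contribution is making the objective and constraints tractable when states are continuous or hybrid.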
Practical solution techniques for first-order MDPs
 Artificial Intelligence
Abstract

Cited by 24 (1 self)
Many traditional solution approaches to relationally specified decision-theoretic planning problems (e.g., those stated in the probabilistic planning domain description language, or PPDDL) ground the specification with respect to a specific instantiation of domain objects and apply a solution approach directly to the resulting ground Markov decision process (MDP). Unfortunately, the space and time complexity of these grounded solution approaches are polynomial in the number of domain objects and exponential in the predicate arity and the number of nested quantifiers in the relational problem specification. An alternative to grounding a relational planning problem is to tackle the problem directly at the relational level. In this article, we propose one such approach that translates an expressive subset of the PPDDL representation to a first-order MDP (FOMDP) specification and then derives a domain-independent policy without grounding at any intermediate step. However, such generality does not come without its own set of challenges: the purpose of this article is to explore practical solution techniques for solving FOMDPs. To demonstrate the applicability of our techniques, we present proof-of-concept results of our first-order approximate linear programming (FOALP) planner on problems from the probabilistic track
Adaptive Tile Coding for Value Function Approximation
Abstract

Cited by 19 (0 self)
Reinforcement learning problems are commonly tackled by estimating the optimal value function. In many real-world problems, learning this value function requires a function approximator, which maps states to values via a parameterized function. In practice, the success of function approximators depends on the ability of the human designer to select an appropriate representation for the value function. This paper presents adaptive tile coding, a novel method that automates this design process for tile coding, a popular function approximator, by beginning with a simple representation with few tiles and refining it during learning by splitting existing tiles into smaller ones. In addition to automatically discovering effective representations, this approach provides a natural way to reduce the function approximator’s level of generalization over time. Empirical results in multiple domains compare two different criteria for deciding which tiles to split and verify that adaptive tile coding can automatically discover effective representations and that its speed of learning is competitive with the best fixed representations.
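The refine-by-splitting idea can be sketched in one dimension. The splitting criterion below (disagreement between the two sub-tile value estimates) is illustrative; the paper compares two specific criteria of its own:

```python
# Sketch: each tile holds a value estimate plus estimates for its two halves;
# the tile whose half-estimates disagree most is split, refining the
# representation where a finer value distinction seems needed.

class Tile:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.value = 0.0
        self.sub = [0.0, 0.0]  # value estimates for the two halves

    def update(self, x, target, alpha=0.1):
        self.value += alpha * (target - self.value)
        half = 0 if x < (self.lo + self.hi) / 2 else 1
        self.sub[half] += alpha * (target - self.sub[half])

class AdaptiveTiling:
    def __init__(self, lo, hi):
        self.tiles = [Tile(lo, hi)]

    def tile_for(self, x):
        return next(t for t in self.tiles if t.lo <= x < t.hi)

    def value(self, x):
        return self.tile_for(x).value

    def update(self, x, target):
        self.tile_for(x).update(x, target)

    def split_worst(self):
        # Split the tile whose sub-tile estimates diverge most.
        t = max(self.tiles, key=lambda t: abs(t.sub[0] - t.sub[1]))
        mid = (t.lo + t.hi) / 2
        self.tiles.remove(t)
        self.tiles += [Tile(t.lo, mid), Tile(mid, t.hi)]
```

Starting coarse and splitting on demand gives broad generalization early and finer resolution later, which is the trade-off the abstract describes.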
Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning
 AUTON AGENT MULTIAGENT SYST
, 2009
Solving factored MDPs with exponential-family transition models
 In Proceedings of the 16th International Conference on Automated Planning and Scheduling (ICAPS)
, 2006
Abstract

Cited by 12 (9 self)
Markov decision processes (MDPs) with discrete and continuous state and action components can be solved efficiently by hybrid approximate linear programming (HALP). The main idea of the approach is to approximate the optimal value function by a linear combination of basis functions and optimize it by linear programming. In this paper, we extend the existing HALP paradigm beyond the mixture-of-beta transition model. As a result, we permit modeling of other transition functions, such as normal and gamma densities, without approximating them. To allow for efficient solutions to the expectation terms in HALP, we identify a rich class of conjugate basis functions. Finally, we demonstrate the generalized HALP framework on a rover planning problem, which exhibits continuous time and resource uncertainty.
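The conjugacy idea can be illustrated with a normal transition density and a Gaussian basis function (my own example, not the paper's derivation): the expectation term E[φ(s′)] that HALP needs has a closed form, so no numerical integration is required.

```python
import numpy as np

# With s' ~ N(mu, tau^2) and basis phi(s) = exp(-(s - c)^2 / (2 sigma^2)),
# a Gaussian-product identity gives
#   E[phi(s')] = sigma / sqrt(sigma^2 + tau^2)
#                * exp(-(mu - c)^2 / (2 (sigma^2 + tau^2))).

def gaussian_basis_expectation(mu, tau, c, sigma):
    var = sigma ** 2 + tau ** 2
    return sigma / np.sqrt(var) * np.exp(-(mu - c) ** 2 / (2 * var))

def monte_carlo_check(mu, tau, c, sigma, n=200_000, seed=0):
    # Numerical sanity check of the closed form by sampling the transition.
    rng = np.random.default_rng(seed)
    s = rng.normal(mu, tau, n)
    return np.mean(np.exp(-(s - c) ** 2 / (2 * sigma ** 2)))
```

Pairs like this (basis family matched to transition family) are what make the HALP expectation terms tractable without approximating the transition model.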
Learning Basis Functions in Hybrid Domains
 In Proceedings of the 2006 National Conference on Artificial Intelligence (AAAI)
, 2006
Abstract

Cited by 6 (2 self)
Markov decision processes (MDPs) with discrete and continuous state and action components can be solved efficiently by hybrid approximate linear programming (HALP). The main idea of the approach is to approximate the optimal value function by a set of basis functions and optimize their weights by linear programming. The quality of this approximation naturally depends on its basis functions. However, basis functions leading to good approximations are rarely known in advance. In this paper, we propose a new approach that discovers these functions automatically. The method relies on a class of parametric basis function models, which are optimized using the dual formulation of a relaxed HALP. We demonstrate the performance of our method on two hybrid optimization problems and compare it to manually selected basis functions.
Empirical Studies in Action Selection with Reinforcement Learning
 ADAPTIVE BEHAVIOR 2007; 15; 33
, 2007
Fuzzy Partition Optimization for Approximate Fuzzy Q-iteration
, 2008
Abstract

Cited by 3 (3 self)
Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Because exact RL can only be applied to very simple problems, approximate algorithms are usually necessary in practice. Many algorithms for approximate RL rely on basis-function representations of the value function (or of the Q-function). Designing a good set of basis functions without any prior knowledge of the value function (or of the Q-function) can be a difficult task. In this paper, we propose instead a technique to optimize the shape of a constant number of basis functions for the approximate, fuzzy Q-iteration algorithm. In contrast to other approaches to adapt basis functions for RL, our optimization criterion measures the actual performance of the computed policies in the task, using simulation from a representative set of initial states. A complete algorithm, using cross-entropy optimization of triangular fuzzy membership functions, is given and applied to the car-on-the-hill example.
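The cross-entropy optimization loop at the core of the method can be sketched generically. The objective here is a stand-in scalar score rather than the paper's simulated policy returns, and all names are assumptions:

```python
import numpy as np

# Generic cross-entropy method: sample candidate parameter vectors (e.g.
# membership-function parameters) from a Gaussian, keep the elite fraction
# by score, and refit the sampling distribution to the elites.

def cross_entropy_maximize(score, dim, iters=30, pop=50, elite_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        scores = np.array([score(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # highest-scoring candidates
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # keep a little exploration noise
    return mean
```

Because the score is obtained by black-box evaluation, the same loop works when evaluating a candidate means running fuzzy Q-iteration and simulating the resulting policy from a set of initial states, as the abstract describes.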