Results 1  10
of
117
Approximate Policy Iteration with a Policy Language Bias
 Journal of Artificial Intelligence Research
, 2003
"... We explore approximate policy iteration (API), replacing the usual costfunction learning step with a learning step in policy space. We give policylanguage biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. ..."
Abstract

Cited by 140 (18 self)
 Add to MetaCart
(Show Context)
We explore approximate policy iteration (API), replacing the usual costfunction learning step with a learning step in policy space. We give policylanguage biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve.
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
, 2005
"... Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in realworld problems has been limited by the poor scalability of existing solution algorithm ..."
Abstract

Cited by 91 (6 self)
 Add to MetaCart
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in realworld problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finitehorizon discrete POMDP is PSPACEcomplete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
Valuedirected Compression of POMDPs
 In NIPS 15
, 2002
"... We examine the problem of generating statespace compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not af ..."
Abstract

Cited by 72 (4 self)
 Add to MetaCart
(Show Context)
We examine the problem of generating statespace compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality.
Towards a Unified Theory of State Abstraction for MDPs
 In Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics
, 2006
"... extensively studied in the fields of artificial intelligence and operations research. Instead of working in the ground state space, the decision maker usually finds solutions in the abstract state space much faster by treating groups of states as a unit by ignoring irrelevant state information. A nu ..."
Abstract

Cited by 61 (5 self)
 Add to MetaCart
(Show Context)
extensively studied in the fields of artificial intelligence and operations research. Instead of working in the ground state space, the decision maker usually finds solutions in the abstract state space much faster by treating groups of states as a unit by ignoring irrelevant state information. A number of abstractions have been proposed and studied in the reinforcementlearning and planning literatures, and positive and negative results are known. We provide a unified treatment of state abstraction for Markov decision processes. We study five particular abstraction schemes, some of which have been proposed in the past in di#erent forms, and analyze their usability for planning and learning.
Grasping POMDPs
 in Proc. IEEE Int. Conf. on Robotics and Automation (ICRA
, 2007
"... Abstract — We provide a method for planning under uncertainty for robotic manipulation by partitioning the configuration space into a set of regions that are closed under compliant motions. These regions can be treated as states in a partially observable Markov decision process (POMDP), which can be ..."
Abstract

Cited by 59 (3 self)
 Add to MetaCart
(Show Context)
Abstract — We provide a method for planning under uncertainty for robotic manipulation by partitioning the configuration space into a set of regions that are closed under compliant motions. These regions can be treated as states in a partially observable Markov decision process (POMDP), which can be solved to yield optimal control policies under uncertainty. We demonstrate the approach on simple grasping problems, showing that it can construct highly robust, efficiently executable solutions. I.
SMDP Homomorphisms: An Algebraic Approach to Abstraction in SemiMarkov Decision Processes
, 2003
"... To operate effectively in complex environments learning agents require the ability to selectively ignore irrelevant details and form useful abstractions. ..."
Abstract

Cited by 52 (9 self)
 Add to MetaCart
To operate effectively in complex environments learning agents require the ability to selectively ignore irrelevant details and form useful abstractions.
Relational reinforcement learning: An overview
 IN: PROCEEDINGS OF THE ICML’04 WORKSHOP ON RELATIONAL REINFORCEMENT LEARNING
, 2004
"... Relational Reinforcement Learning (RRL) is both a young and an old eld. In this paper, we trace the history of the eld to related disciplines, outline some current work and promising new directions, and survey the research issues and opportunities that lie ahead. ..."
Abstract

Cited by 40 (3 self)
 Add to MetaCart
(Show Context)
Relational Reinforcement Learning (RRL) is both a young and an old eld. In this paper, we trace the history of the eld to related disciplines, outline some current work and promising new directions, and survey the research issues and opportunities that lie ahead.
Feature reinforcement learning: Part I. Unstructured MDPs
 Journal of General Artificial Intelligence
, 2009
"... www.hutter1.net Generalpurpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and nonMarkovian. On the other hand, reinforcement learning is welldeveloped for small finite state Markov decision processes (MDPs). Up ..."
Abstract

Cited by 23 (9 self)
 Add to MetaCart
(Show Context)
www.hutter1.net Generalpurpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and nonMarkovian. On the other hand, reinforcement learning is welldeveloped for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part
A unifying framework for computational reinforcement learning theory
, 2009
"... Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervisedlearning algorithms such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understand ..."
Abstract

Cited by 23 (7 self)
 Add to MetaCart
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervisedlearning algorithms such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize longterm utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploring the problem that may reduce shortterm utility. The need for active exploration is common in many problems in daily life, engineering, and sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts in the hope of discovering a better gameplaying strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach nearoptimal strategies