Results 1–10 of 15
Compiling Uncertainty Away in Conformant Planning Problems with Bounded Width
Cited by 51 (16 self)
Conformant planning is the problem of finding a sequence of actions for achieving a goal in the presence of uncertainty in the initial state or action effects. The problem has been approached as a path-finding problem in belief space, where good belief representations and heuristics are critical for scaling up. In this work, a different formulation is introduced for conformant problems with deterministic actions, where they are automatically converted into classical ones and solved by an off-the-shelf classical planner. The translation maps literals L and sets of assumptions t about the initial situation into new literals KL/t that represent that L must be true if t is initially true. We lay out a general translation scheme that is sound and establish the conditions under which the translation is also complete. We show that the complexity of the complete translation is exponential in a parameter of the problem called the conformant width, which for most benchmarks is bounded. The planner based on this translation exhibits good performance in comparison with existing planners, and is the basis for T0, the best-performing planner in the Conformant Track of the 2006 International Planning Competition.
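The naming scheme behind the KL/t literals can be sketched as follows. This is a schematic illustration only, not the paper's actual compilation: the literal names, the tag representation, and the `k_literals` helper are all hypothetical.

```python
from itertools import combinations

def k_literals(literals, assumptions, width):
    """Enumerate conditional atoms KL/t for every assumption set t
    of size at most `width` (the conformant width)."""
    atoms = []
    for L in literals:
        for w in range(width + 1):
            for t in combinations(sorted(assumptions), w):
                atoms.append(f"K{L}/{{{','.join(t)}}}")
    return atoms

# Toy instance: one literal, two candidate initial assumptions, width 1.
print(k_literals(["at_A"], ["p", "q"], 1))
# ['Kat_A/{}', 'Kat_A/{p}', 'Kat_A/{q}']
```

The exponential dependence on width is visible here: the number of assumption sets t, and hence of KL/t atoms, grows with the number of combinations up to the width bound.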
SixthSense: Fast and reliable recognition of dead ends in MDPs
In submission, 2010
Cited by 15 (9 self)
The results of the latest International Probabilistic Planning Competition (IPPC-2008) indicate that the presence of dead ends, states with no trajectory to the goal, makes MDPs hard for modern probabilistic planners. Implicit dead ends, states with executable actions but no path to the goal, are particularly challenging; existing MDP solvers spend much time and memory identifying these states. As a first attempt to address this issue, we propose a machine learning algorithm called SIXTHSENSE. SIXTHSENSE helps existing MDP solvers by finding nogoods, conjunctions of literals whose truth in a state implies that the state is a dead end. Importantly, our learned nogoods are sound, and hence the states they identify are true dead ends. SIXTHSENSE is very fast, needs little training data, and takes only a small fraction of total planning time. While IPPC problems may have millions of dead ends, they may typically be represented with only a dozen or two nogoods. Thus, nogood learning efficiently produces a quick and reliable means for dead-end recognition. Our experiments show that the nogoods found by SIXTHSENSE routinely reduce planning space and time on IPPC domains, enabling some planners to solve problems they could not previously handle.
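The payoff of nogoods at lookup time is simple to sketch: once learned, flagging a dead end is just a subset test. The state representation, literal names, and the rover scenario below are hypothetical, assuming states and nogoods are both sets of ground literals.

```python
def is_dead_end(state, nogoods):
    """A state is flagged as a dead end if all literals of any nogood hold in it."""
    return any(nogood <= state for nogood in nogoods)

# Hypothetical nogood: a rover with a broken wheel and no spare can never reach the goal.
nogoods = [frozenset({"wheel_broken", "no_spare"})]

stuck = frozenset({"wheel_broken", "no_spare", "at_base"})
fine = frozenset({"wheel_broken", "at_base"})

print(is_dead_end(stuck, nogoods))  # True
print(is_dead_end(fine, nogoods))   # False
```

Because the learned nogoods are sound, a `True` answer here is definitive, while a `False` answer only means no learned nogood fired; the state may still be an unrecognized dead end.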
Heuristic search for generalized stochastic shortest path MDPs
In ICAPS’11, 2011
Cited by 14 (7 self)
Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V* that is the unique solution of the Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V*. This paper’s main contribution is the description of a new class of MDPs that have well-defined optimal solutions complying with neither 1 nor 2 above. We call our new class Generalized Stochastic Shortest Path (GSSP) problems. GSSP allows a more general reward structure than SSP and subsumes several established MDP types, including SSP, positive-bounded, negative, and discounted-reward models. While existing efficient heuristic search algorithms like LAO* and LRTDP are not guaranteed to converge to the optimal value function for GSSPs, we present a new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps). A preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration.
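For reference, the Value Iteration baseline that FRET is measured against can be sketched on a plain goal-oriented SSP, where property 1 above holds. The explicit state/action encoding and the toy two-state chain are hypothetical.

```python
def value_iteration(states, goal, actions, eps=1e-6):
    """actions[s] -> list of (cost, [(prob, next_state), ...]).
    Repeats Bellman backups until the largest change is below eps."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == goal:
                continue
            q = min(c + sum(p * V[t] for p, t in outcomes)
                    for c, outcomes in actions[s])
            delta = max(delta, abs(q - V[s]))
            V[s] = q
        if delta < eps:
            return V

# Toy chain: from s0, the single action costs 1 and reaches the goal w.p. 0.5.
states = ["s0", "g"]
actions = {"s0": [(1.0, [(0.5, "g"), (0.5, "s0")])]}
V = value_iteration(states, "g", actions)
print(round(V["s0"], 3))  # expected cost 2.0
```

On GSSPs this kind of sweep over the whole state space is exactly what heuristic search avoids, but the fixed point V* it computes here satisfies the Bellman equation the abstract refers to (V(s0) = 1 + 0.5·V(g) + 0.5·V(s0) = 2).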
Simple and fast strong cyclic planning for fully-observable nondeterministic planning problems
In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Three, 2011
Cited by 10 (2 self)
We address a difficult, yet under-investigated class of planning problems: fully-observable nondeterministic (FOND) planning problems with strong cyclic solutions. The difficulty of these strong cyclic FOND planning problems stems from the large size of the state space. Hence, to achieve efficient planning, a planner has to cope with the explosion in the size of the state space by planning along the directions that allow the goal to be reached quickly. A major challenge is: how would one know which states and search directions are relevant before the search for a solution has even begun? We first describe an NDP-motivated strong cyclic algorithm that, without addressing the above challenge, can already outperform state-of-the-art FOND planners, and then extend this NDP-motivated planner with a novel heuristic that addresses the challenge.
A Theory of Goal-Oriented MDPs with Dead Ends
Cited by 7 (1 self)
Stochastic Shortest Path (SSP) MDPs are a problem class widely studied in AI, especially in probabilistic planning. They describe a wide range of scenarios but make the restrictive assumption that the goal is reachable from any state, i.e., that dead-end states do not exist. Because of this, SSPs are unable to model various scenarios that may have catastrophic events (e.g., an airplane possibly crashing if it flies into a storm). Even though MDP algorithms have been used for solving problems with dead ends, a principled theory of SSP extensions that would allow dead ends, including theoretically sound algorithms for solving such MDPs, has been lacking. In this paper, we propose three new MDP classes that admit dead ends under increasingly weaker assumptions. We present Value Iteration-based as well as more efficient heuristic search algorithms for optimally solving each class, and explore theoretical relationships between these classes. We also conduct a preliminary empirical study comparing the performance of our algorithms on different MDP classes, especially on scenarios with unavoidable dead ends.
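One way dead ends can be admitted, under the assumption (made by one of the weaker classes above) that hitting a dead end incurs a finite penalty D, is to cap each state's cost at D during Bellman backups. The three-state MDP and the representation below are hypothetical.

```python
D = 10.0  # assumed finite penalty for ending up in a dead end

def backup(V, actions, goal):
    """One sweep of penalty-capped Bellman backups.
    actions[s] -> list of (cost, [(prob, next_state), ...])."""
    newV = {}
    for s, acts in actions.items():
        if s == goal:
            newV[s] = 0.0
        elif not acts:  # explicit dead end: no executable actions
            newV[s] = D
        else:
            q = min(c + sum(p * V[t] for p, t in out) for c, out in acts)
            newV[s] = min(q, D)  # never worse than paying the penalty
    return newV

# From s, the only action costs 1 and risks a dead end with probability 0.1.
actions = {"g": [], "dead": [],
           "s": [(1.0, [(0.9, "g"), (0.1, "dead")])]}
V = {s: 0.0 for s in actions}
for _ in range(50):
    V = backup(V, actions, "g")
print(round(V["s"], 2))  # 1 + 0.1 * 10 = 2.0
```

The cap is what keeps values finite despite unavoidable dead ends; without it, V("s") would diverge as the dead-end cost compounds.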
Reverse iterative deepening for finite-horizon MDPs with large branching factors
In ICAPS’12, 2012
Cited by 6 (2 self)
In contrast to previous competitions, where the problems were goal-based, the 2011 International Probabilistic Planning Competition (IPPC-2011) emphasized finite-horizon reward-maximization problems with large branching factors. These MDPs modeled more realistic planning scenarios and presented challenges to the previous state-of-the-art planners (e.g., those from IPPC-2008), which were primarily based on domain determinization — a technique more suited to goal-oriented MDPs with small branching factors. Moreover, large branching factors render the existing implementations of RTDP- and LAO*-style algorithms inefficient as well. In this paper we present GLUTTON, our planner at IPPC-2011 that performed well on these challenging MDPs. The main algorithm used by GLUTTON is LR²TDP, an LRTDP-based optimal algorithm for finite-horizon problems centered around the novel idea of reverse iterative deepening. We detail LR²TDP itself as well as a series of optimizations included in GLUTTON that help LR²TDP achieve competitive performance on difficult problems with large branching factors — subsampling the transition function, separating out natural dynamics, caching transition function samples, and others. Experiments show that GLUTTON and PROST, the IPPC-2011 winner, have complementary strengths, with GLUTTON demonstrating superior performance on problems with few high-reward terminal states.
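The reverse-iterative-deepening idea can be sketched in isolation: solve the horizon-1 problem first, then reuse those values when solving horizon 2, and so on up to the full horizon. Here plain backward induction stands in for the LRTDP runs that LR²TDP actually performs, and the two-state reward MDP is hypothetical.

```python
def solve_increasing_horizons(states, actions, H):
    """actions[s] -> list of (reward, [(prob, next_state), ...]).
    Returns the value function for each horizon 1..H."""
    V = {s: 0.0 for s in states}  # horizon-0 values
    layers = []
    for h in range(1, H + 1):     # deepen one step at a time
        V = {s: max(r + sum(p * V[t] for p, t in out)
                    for r, out in actions[s])
             for s in states}
        layers.append(V)          # horizon-h values reuse horizon-(h-1) values
    return layers

# Toy MDP: from "a", either collect reward 1 and move to "b", or idle in "a".
states = ["a", "b"]
actions = {
    "a": [(1.0, [(1.0, "b")]), (0.0, [(1.0, "a")])],
    "b": [(0.0, [(1.0, "a")])],
}
layers = solve_increasing_horizons(states, actions, 3)
print(layers[-1]["a"])  # best 3-step reward from "a": 2.0
```

The key property LR²TDP exploits is visible in the loop: the horizon-h solve starts from the horizon-(h-1) values instead of from scratch, so early, easy horizons bootstrap the expensive later ones.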
Stochastic enforced hill-climbing
2008
Cited by 3 (1 self)
Enforced hill-climbing is an effective deterministic hill-climbing technique that deals with local optima using breadth-first search (a process called "basin flooding"). We propose and evaluate a stochastic generalization of enforced hill-climbing for online use in goal-oriented probabilistic planning problems. We assume a provided heuristic function estimating expected cost to the goal with flaws such as local optima and plateaus that thwart straightforward greedy action choice. While breadth-first search is effective in exploring basins around local optima in deterministic problems, for stochastic problems we dynamically build and solve a heuristic-based Markov decision process (MDP) model of the basin in order to find a good escape policy exiting the local optimum. We note that building this model involves integrating the heuristic into the MDP problem because the local goal is to improve the heuristic. We evaluate our proposal in twenty-four recent probabilistic planning-competition benchmark domains and twelve probabilistically interesting problems from recent literature. For evaluation, we show that stochastic enforced hill-climbing (SEH) produces better policies than greedy heuristic following for value/cost functions derived in two very different ways: one type derived by using deterministic heuristics on a deterministic relaxation and a second type derived by automatic learning of Bellman-error features from domain-specific experience. Using the first type of heuristic, SEH is shown to generally outperform all planners from the first three international probabilistic planning competitions.
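The deterministic technique SEH generalizes can be sketched directly: from the current state, breadth-first search outward until a state with strictly better heuristic value is found (the "basin flooding" step), then repeat from there. The 1-D toy problem and function names are hypothetical.

```python
from collections import deque

def enforced_hill_climb(start, successors, h, is_goal):
    """Deterministic enforced hill-climbing: BFS escapes each local basin."""
    s = start
    while not is_goal(s):
        frontier, seen = deque([s]), {s}
        while frontier:
            u = frontier.popleft()
            if h(u) < h(s):  # escaped the basin: strictly better heuristic
                s = u
                break
            for v in successors(u):
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
        else:
            return None  # basin exhausted without improvement: failure
    return s

# 1-D walk toward 0 with h(x) = |x|; moves stay within [-5, 5].
succ = lambda x: [y for y in (x - 1, x + 1) if -5 <= y <= 5]
print(enforced_hill_climb(4, succ, abs, lambda x: x == 0))  # 0
```

The stochastic generalization replaces the inner BFS with building and solving a small MDP over the basin, since under stochastic actions a single improving successor no longer guarantees reliable escape.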
Discovering Hidden Structure in Factored MDPs
Cited by 1 (1 self)
Markov Decision Processes (MDPs) describe a wide variety of planning scenarios ranging from military operations planning to controlling a Mars rover. However, today’s solution techniques scale poorly, limiting MDPs’ practical applicability. In this work, we propose algorithms that automatically discover and exploit the hidden structure of factored MDPs. Doing so helps solve MDPs faster and with less memory than state-of-the-art techniques. Our algorithms discover two complementary state abstractions — basis functions and nogoods. A basis function is a conjunction of literals; if the conjunction holds true in a state, this guarantees the existence of at least one trajectory to the goal. Conversely, a nogood is a conjunction whose presence implies the nonexistence of any such trajectory, meaning the state is a dead end. We compute basis functions by regressing goal descriptions through a determinized version of the MDP. Nogoods are constructed with a novel machine learning algorithm that uses basis functions as training data. Our state abstractions can be leveraged in several ways. We describe three diverse approaches — GOTH, a heuristic function for use in heuristic search algorithms such as RTDP; RETRASE, an MDP solver that performs modified Bellman backups on basis functions instead of states; and SIXTHSENSE, a method to quickly detect dead-end states. In essence, our work integrates ideas from deterministic planning and basis-function-based approximation, leading to methods that outperform existing approaches by a wide margin.
Scalable Methods and Expressive Models for Planning Under Uncertainty
2013
Cited by 1 (0 self)
The ability to plan in the presence of uncertainty about the effects of one’s own actions and the events of the environment is a core skill of a truly intelligent agent. This type of sequential decision-making has been modeled by Markov Decision Processes (MDPs), a framework known since at least the 1950s [45, 3]. The importance of MDPs is not merely philosophic — they have been applied to several impactful real-world scenarios, from inventory management to military operations planning [80, 1]. Nonetheless, the adoption of MDPs in practice is greatly hampered by two aspects. First, modern algorithms for solving them are still not scalable enough to handle many realistically-sized problems. Second, the MDP classes we know how to solve tend to be restrictive, often failing to model significant aspects of the planning task at hand. As a result, many probabilistic scenarios fall outside of MDPs’ scope. The research presented in this dissertation addresses both of these challenges. Its first contribution is several highly scalable approximation algorithms for existing MDP classes that combine two major planning paradigms, dimensionality reduction and deterministic relaxation. These approaches automatically extract human-understandable causal structure from an MDP and use this structure to efficiently compute a good MDP policy. Besides enabling us to handle larger planning …
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10): SixthSense: Fast and Reliable Recognition of Dead Ends in MDPs