Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback
- Proc. 18th International Joint Conf. on Artificial Intelligence, 2003
Cited by 33 (5 self)
Recent algorithms like RTDP and LAO* combine the strength of Heuristic Search (HS) and Dynamic Programming (DP) methods by exploiting knowledge of the initial state and an admissible heuristic function for producing optimal policies without evaluating the entire space. In this paper, we introduce and analyze three new HS/DP algorithms.
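The idea of combining heuristic search with dynamic programming can be illustrated with a minimal RTDP-style sketch. It is not one of the three algorithms the paper introduces; the toy chain problem, the function names, and the parameters (`trials`, `max_steps`, `seed`) are assumptions made here for illustration. Values start at an admissible heuristic and Bellman backups are applied only to states visited by greedy trials from the initial state.

```python
import random

def rtdp(start, goal, actions, outcomes, cost, h, trials=200, max_steps=1000, seed=0):
    """Trial-based RTDP sketch for a stochastic shortest-path problem.

    outcomes(s, a) returns a list of (probability, next_state) pairs.
    h is assumed to be an admissible (lower-bound) heuristic.
    """
    rng = random.Random(seed)
    V = {}  # value estimates, filled in lazily from the heuristic

    def value(s):
        return 0.0 if s == goal else V.get(s, h(s))

    def q(s, a):
        return cost(s, a) + sum(p * value(t) for p, t in outcomes(s, a))

    for _ in range(trials):
        s = start
        for _ in range(max_steps):
            if s == goal:
                break
            a = min(actions(s), key=lambda act: q(s, act))  # greedy action
            V[s] = q(s, a)  # Bellman backup on the visited state only
            succ = outcomes(s, a)
            s = rng.choices([t for _, t in succ], weights=[p for p, _ in succ])[0]
    return V

# Hypothetical toy problem: states 0..3 on a chain, goal state 3.
# "step" costs 1 and advances with probability 0.5; "jump" costs 5 and always reaches the goal.
GOAL = 3

def toy_actions(s):
    return ["step", "jump"]

def toy_outcomes(s, a):
    if a == "jump":
        return [(1.0, GOAL)]
    return [(0.5, s + 1), (0.5, s)]

def toy_cost(s, a):
    return 5.0 if a == "jump" else 1.0

def toy_h(s):
    return float(GOAL - s)  # admissible: at least one unit-cost step per remaining state

V = rtdp(0, GOAL, toy_actions, toy_outcomes, toy_cost, toy_h)
```

In this toy problem the optimal expected cost from state 0 is 5 (jumping beats the expected cost of 6 for stepping), and the estimate `V[0]` rises from the heuristic value 3 towards it. Only states reachable under greedy trials are ever stored in `V`.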
Exploiting First-Order Regression in Inductive Policy Selection
- Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI’04), 2004
Cited by 32 (1 self)
We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function using first-order decision-theoretic regression and formula rewriting, while the former, when provided with a suitable hypotheses language, are capable of generalising value functions or policies for small instances. Our idea is to use reasoning, and in particular classical first-order regression, to automatically generate a hypotheses language dedicated to the domain at hand, which is then used as input by an inductive solver. This approach avoids the more complex reasoning of symbolic dynamic programming while focusing the inductive solver’s attention on concepts that are specifically relevant to the optimal value function for the domain considered.
Decision-Theoretic Military Operations Planning
- 2004
Cited by 26 (5 self)
Military operations planning involves concurrent actions, resource assignment, and conflicting costs. Individual tasks sometimes fail with a known probability, promoting a decision-theoretic approach. The planner must choose between multiple tasks that achieve similar outcomes but have different costs. The military domain is particularly suited to automated methods because hundreds of tasks, specified by many planning staff, need to be quickly and robustly coordinated. The authors
mGPT: A probabilistic planner based on heuristic search
- Journal of Artificial Intelligence Research, 2005
Cited by 24 (0 self)
We describe the version of the GPT planner to be used in the planning competition. This version, called mGPT, solves MDPs specified in the PPDDL language by extracting and using different classes of lower bounds, along with various heuristic-search algorithms. The lower bounds are extracted from deterministic relaxations of the MDP where alternative probabilistic effects of an action are mapped into different, independent, deterministic actions. The heuristic-search algorithms, on the other hand, use these lower bounds for focusing the updates and delivering a consistent value function over all states reachable from the initial state with the greedy policy.
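The deterministic relaxation described in this abstract can be sketched as follows. This is a minimal illustration, not mGPT's implementation: the dictionary-based model format and the names `determinize` and `lower_bound` are assumptions made here. Each probabilistic outcome becomes its own deterministic action, and the cheapest path in the resulting deterministic problem is a lower bound on the optimal expected cost.

```python
import heapq

INF = float("inf")

def determinize(prob_actions):
    """Map every probabilistic outcome of an action to a separate deterministic action.

    prob_actions: {state: [(name, cost, [(prob, next_state), ...]), ...]}
    returns:      {state: [(name_with_outcome_index, cost, next_state), ...]}
    """
    det = {}
    for s, acts in prob_actions.items():
        det[s] = [(f"{name}#{i}", cost, t)
                  for name, cost, outcomes in acts
                  for i, (p, t) in enumerate(outcomes) if p > 0]
    return det

def lower_bound(det, start, goal):
    """Dijkstra over the determinized model: cost of the cheapest outcome sequence."""
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, s = heapq.heappop(queue)
        if s == goal:
            return d
        if d > dist.get(s, INF):
            continue  # stale queue entry
        for _, c, t in det.get(s, []):
            nd = d + c
            if nd < dist.get(t, INF):
                dist[t] = nd
                heapq.heappush(queue, (nd, t))
    return INF

# Hypothetical model: "step" advances with probability 0.5, "jump" always reaches goal state 3.
model = {s: [("step", 1.0, [(0.5, s + 1), (0.5, s)]),
             ("jump", 5.0, [(1.0, 3)])]
         for s in range(3)}
det_model = determinize(model)
bound = lower_bound(det_model, 0, 3)
```

Here the relaxation assumes every `step` succeeds, so the bound from state 0 is 3, below the optimal expected cost of 5 in the probabilistic problem. A bound of this kind is admissible because the relaxed planner may pick whichever outcome it prefers.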
Prottle: A probabilistic temporal planner
- In AAAI’05, 2005
Cited by 21 (4 self)
Planning with concurrent durative actions and probabilistic effects, or probabilistic temporal planning, is a relatively new area of research. The challenge is to replicate the success of modern temporal and probabilistic planners with domains that exhibit an interaction between time and uncertainty. We present a general framework for probabilistic temporal planning in which effects, the time at which they occur, and action durations are all probabilistic. This framework includes a search space that is designed for solving probabilistic temporal planning problems via heuristic search, an algorithm that has been tailored to work with it, and an effective heuristic based on an extension of the planning graph data structure. Prottle is a planner that implements this framework, and can solve problems expressed in an extension of PDDL.
Learning Depth-First Search: A Unified Approach to Heuristic Search in Deterministic and Non-Deterministic Settings, and its application to MDPs
- In Proceedings of ICAPS’06, 2006
Cited by 16 (0 self)
Dynamic Programming provides a convenient and unified framework for studying many state models used in AI but no algorithms for handling large spaces. Heuristic-search methods, on the other hand, can handle large spaces but lack a common foundation. In this work, we combine the benefits of a general dynamic programming formulation with the power of heuristic-search techniques for developing an algorithmic framework, which we call Learning in Depth-First Search, that aims to be both general and effective. The basic LDFS algorithm searches for solutions by combining iterative, bounded depth-first searches with learning in the sense of Korf’s LRTA* and Barto et al.’s RTDP. In each iteration, if there is a solution with cost not exceeding a lower bound, then the solution is found; otherwise the process restarts with the lower bound and the value function updated. LDFS reduces to IDA* with Transposition Tables over deterministic models, but also solves non-deterministic, probabilistic, and game tree models, over which a slight variation reduces to the state-of-the-art MTD algorithm. Over Max AND/OR graphs, on the other hand, LDFS is a new algorithm which appears to be quite competitive with AO*.
Solving Concurrent Markov Decision Processes
- 2004
Cited by 14 (1 self)
Typically, Markov decision problems (MDPs) assume a single action is executed per decision epoch, but in the real world one may frequently execute certain actions in parallel. This paper explores concurrent MDPs, MDPs which allow multiple non-conflicting actions to be executed simultaneously, and presents two new algorithms. Our first approach exploits two provably sound pruning rules, and thus guarantees solution optimality. Our second technique is a fast, sampling-based algorithm, which produces close-to-optimal solutions extremely quickly. Experiments show that our approaches outperform existing algorithms, producing speedups of up to two orders of magnitude.
The Joy of Forgetting: Faster Anytime Search via Restarting
Cited by 13 (6 self)
Anytime search algorithms solve optimisation problems by quickly finding a usually suboptimal solution and then finding improved solutions when given additional time. To deliver a solution quickly, they are typically greedy with respect to the heuristic cost-to-go estimate h. In this paper, we first show that this low-h bias can cause poor performance if the heuristic is inaccurate. Building on this observation, we then present a new anytime approach that restarts the search from the initial state every time a new solution is found. We demonstrate the utility of our method via experiments in PDDL planning as well as other domains. We show that it is particularly useful for hard optimisation problems like planning where heuristics may be quite inaccurate and inadmissible, and where the greedy solution makes early mistakes.
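The restarting idea can be sketched with weighted A*. This is a simplified illustration under stated assumptions, not the authors' algorithm: the weight schedule `(5, 2, 1)`, the function names, and the pruning rule are choices made here, and the pruning assumes an admissible heuristic even though the abstract targets inadmissible ones as well. Each time a solution is found, the search begins again from the initial state with a lower weight, keeping only the incumbent cost as a bound.

```python
import heapq

INF = float("inf")

def weighted_astar(start, goal, neighbors, h, w, bound=INF):
    """Weighted A* from scratch; returns a solution cost strictly below bound, or None."""
    g = {start: 0}
    open_list = [(w * h(start), start)]
    while open_list:
        _, s = heapq.heappop(open_list)
        if s == goal:
            return g[s]
        for t, c in neighbors(s):
            new_g = g[s] + c
            # Keep t only if it improves on its best known cost and can still beat the incumbent.
            if new_g < g.get(t, INF) and new_g + h(t) < bound:
                g[t] = new_g
                heapq.heappush(open_list, (new_g + w * h(t), t))
    return None

def restarting_anytime(start, goal, neighbors, h, weights=(5, 2, 1)):
    """Yield (weight, cost) for each improved solution, restarting after every one."""
    best = INF
    for w in weights:
        cost = weighted_astar(start, goal, neighbors, h, w, bound=best)
        if cost is not None:
            best = cost
            yield w, best

# Hypothetical graph in which the greedy first search commits to the costly route via B.
graph = {"A": [("B", 1), ("C", 4)], "B": [("G", 6)], "C": [("G", 1)], "G": []}
h_values = {"A": 2, "B": 1, "C": 1, "G": 0}

solutions = list(restarting_anytime("A", "G", graph.__getitem__, h_values.__getitem__))
```

With this graph the first search (weight 5) returns the path through B at cost 7, and the restart with weight 2 finds the path through C at cost 5. The final search with weight 1 finds nothing cheaper, which confirms 5 as optimal under the admissibility assumption.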
ReTrASE: Integrating Paradigms for Approximate Probabilistic Planning
Cited by 10 (8 self)
Past approaches for solving MDPs have several weaknesses: 1) Decision-theoretic computation over the state space can yield optimal results but scales poorly. 2) Value-function approximation typically requires human-specified basis functions and has not been shown to be successful on nominal (“discrete”) domains such as those in the ICAPS planning competitions. 3) Replanning by applying a classical planner to a determinized domain model can generate approximate policies for very large problems but has trouble handling probabilistic subtlety [Little and Thiebaux, 2007]. This paper presents ReTrASE, a novel MDP solver, which combines decision theory, function approximation and classical planning in a new way. ReTrASE uses classical planning to create basis functions for value-function approximation and applies expected-utility analysis to this compact space. Our algorithm is memory-efficient and fast (due to its compact, approximate representation), returns high-quality solutions (due to the decision-theoretic framework) and does not require additional knowledge from domain engineers (since we apply classical planning to automatically construct the basis functions). Experiments demonstrate that ReTrASE outperforms winners from the past three probabilistic-planning competitions on many hard problems.
Properties of Planning with Non-Markovian Rewards
- Journal of Artificial Intelligence Research, 2006
Cited by 8 (1 self)
We examine technologies designed to solve decision processes with non-Markovian rewards (NMRDPs). More specifically, target decision processes exhibit Markovian dynamics, called grounded dynamics, and desirable behaviours are modelled as state trajectories specified in a temporal logic.

