Deadline-aware search using online measures of behavior
 In SOCS, 2011
Cited by 8 (6 self)
In many applications of heuristic search, insufficient time is available to find provably optimal solutions. We consider the contract search problem: finding the best solution possible within a given time limit. The conventional approach to this problem is to use an interruptible anytime algorithm. Such algorithms return a sequence of improving solutions until interrupted and do not consider the approaching deadline during the course of the search. We propose a new approach, Deadline Aware Search, that explicitly takes the deadline into account and attempts to use all available time to find a single high-quality solution. This algorithm is simple and fully general: it modifies best-first search with online pruning. Empirical results on variants of gridworld navigation, the sliding tile puzzle, and dynamic robot navigation show that our method can surpass the leading anytime algorithms across a wide variety of deadlines.
Action selection for MDPs: Anytime AO* vs. UCT
 In AAAI
Cited by 8 (3 self)
In the presence of non-admissible heuristics, A* and other best-first algorithms can be converted into anytime optimal algorithms over OR graphs by simply continuing the search after the first solution is found. The same trick, however, does not work for best-first algorithms over AND/OR graphs, which must be able to expand leaf nodes of the explicit graph that are not necessarily part of the best partial solution. Anytime optimal variants of AO* must thus address an exploration-exploitation tradeoff: they cannot just “exploit”, they must keep exploring as well. In this work, we develop one such variant of AO* and apply it to finite-horizon MDPs. This Anytime AO* algorithm eventually delivers an optimal policy while using non-admissible random heuristics that can be sampled, as when the heuristic is the cost of a base policy that can be sampled with rollouts. We then test Anytime AO* for action selection over large infinite-horizon MDPs that cannot be solved with existing offline heuristic search and dynamic programming algorithms, and compare it with UCT.
ANA*: Anytime Nonparametric A*
Cited by 8 (0 self)
Anytime variants of Dijkstra’s and A* shortest path algorithms quickly produce a suboptimal solution and then improve it over time. For example, ARA* introduces a weighting value (ε) to rapidly find an initial suboptimal path and then reduces ε to improve path quality over time. In ARA*, ε is based on a linear trajectory with ad-hoc parameters chosen by each user. We propose a new Anytime A* algorithm, Anytime Nonparametric A* (ANA*), that does not require ad-hoc parameters, and adaptively reduces ε to expand the most promising node per iteration, adapting the greediness of the search as path quality improves. We prove that each node expanded by ANA* provides an upper bound on the suboptimality of the current-best solution. We evaluate the performance of ANA* with experiments in the domains of robot motion planning, gridworld planning, and multiple sequence alignment. The results suggest that ANA* is as efficient as ARA* and in most cases: (1) ANA* finds an initial solution faster, (2) ANA* spends less time between solution improvements, (3) ANA* decreases the suboptimality bound of the current-best solution more gradually, and (4) ANA* finds the optimal solution faster. ANA* is freely available from Maxim Likhachev’s Search-based Planning Library (SBPL).
Action Selection for MDPs: Anytime AO* Versus UCT
 PROCEEDINGS OF THE TWENTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2012
Cited by 7 (0 self)
In the presence of non-admissible heuristics, A* and other best-first algorithms can be converted into anytime optimal algorithms over OR graphs by simply continuing the search after the first solution is found. The same trick, however, does not work for best-first algorithms over AND/OR graphs, which must be able to expand leaf nodes of the explicit graph that are not necessarily part of the best partial solution. Anytime optimal variants of AO* must thus address an exploration-exploitation tradeoff: they cannot just “exploit”, they must keep exploring as well. In this work, we develop one such variant of AO* and apply it to finite-horizon MDPs. This Anytime AO* algorithm eventually delivers an optimal policy while using non-admissible random heuristics that can be sampled, as when the heuristic is the cost of a base policy that can be sampled with rollouts. We then test Anytime AO* for action selection over large infinite-horizon MDPs that cannot be solved with existing offline heuristic search and dynamic programming algorithms, and compare it with UCT.
Heuristic Search When Time Matters
Cited by 4 (2 self)
eaburns at cs.unh.edu, ruml at cs.unh.edu, minh.b.do at nasa.gov
In many applications of shortest-path algorithms, it is impractical to find a provably optimal solution; one can only hope to achieve an appropriate balance between search time and solution cost that respects the user’s preferences. Preferences come in many forms; we consider utility functions that linearly trade off search time and solution cost. Many natural utility functions can be expressed in this form. For example, when solution cost represents the makespan of a plan, equally weighting search time and plan makespan minimizes the time from the arrival of a goal until it is achieved. Current state-of-the-art approaches to optimizing utility functions rely on anytime algorithms and on the use of extensive training data to compute a termination policy. We propose a more direct approach, called Bugsy, that incorporates the utility function directly into the search, obviating the need for a separate termination policy. We describe a new method based on offline parameter tuning and a novel benchmark domain for planning under time pressure based on platform-style video games. We then present what we believe to be the first empirical study of applying anytime monitoring to heuristic search, and we compare it with our proposals. Our results suggest that the parameter tuning technique can give the best performance if a representative set of training instances is available. If not, then Bugsy is the algorithm of choice, as it performs well and does not require any offline training. This work extends the tradition of research on metareasoning for search by illustrating the benefits of embedding lightweight reasoning about time into the search algorithm itself.
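The linear trade-off between search time and solution cost mentioned in this abstract can be illustrated with a small sketch. This is not Bugsy itself, only an evaluation of a linear utility over candidate outcomes. The function names, the weight parameters `w_cost` and `w_time`, and the sign convention (higher utility is better) are assumptions made for illustration.

```python
def utility(solution_cost, search_time, w_cost=1.0, w_time=1.0):
    """Linear utility (assumed convention: higher is better).

    Both solution cost and search time are penalized, each scaled
    by its own weight.
    """
    return -(w_cost * solution_cost + w_time * search_time)


def best_outcome(outcomes, w_cost=1.0, w_time=1.0):
    """Pick the (solution_cost, search_time) pair with the highest utility."""
    return max(outcomes, key=lambda o: utility(o[0], o[1], w_cost, w_time))


# Hypothetical outcomes: (plan makespan, search time), in the same time unit.
outcomes = [(10.0, 1.0), (7.0, 5.0), (6.0, 12.0)]

# Equal weights minimize makespan + search time, i.e. the total time
# from goal arrival until the goal is achieved.
quick = best_outcome(outcomes)

# Discounting search time heavily favors the cheaper but slower-to-find plan.
patient = best_outcome(outcomes, w_time=0.1)
```

With equal weights the quickly found plan wins because its total of search time plus makespan is smallest. When search time is weighted lightly, the plan with the lowest makespan is preferred instead.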
Bounded Suboptimal Search: A Direct Approach Using Inadmissible Estimates
 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
Bounded suboptimal search algorithms offer shorter solving times by sacrificing optimality and instead guaranteeing solution costs within a desired factor of optimal. Typically these algorithms use a single admissible heuristic both for guiding search and bounding solution cost. In this paper, we present a new approach to bounded suboptimal search, Explicit Estimation Search, that separates these roles, consulting potentially inadmissible information to determine search order and using admissible information to guarantee the cost bound. Unlike previous proposals, it successfully combines estimates of solution length and solution cost to predict which node will lead most quickly to a solution within the suboptimality bound. An empirical evaluation across six diverse benchmark domains shows that Explicit Estimation Search is competitive with the previous state of the art in domains with unit-cost actions and substantially outperforms previously proposed techniques for domains in which solution cost and length can differ.
HEURISTIC SEARCH UNDER A DEADLINE
I would like to sincerely thank Professor Wheeler Ruml for all of his patience, teaching, and help throughout all of my work from my undergraduate research up to this thesis. I would also like to thank Jordan Thayer for his thoughtful ideas and his effort in assisting with much of this research. Some parts of this thesis are taken from our combined work [15]. We gratefully acknowledge support from NSF (grant IIS-0812141) and the DARPA CSSG program (grant N10AP20029). I would also like to thank my wife for her support and for not divorcing me while I spent all my free time working on schoolwork.
Classifications of Different Trends
contains only the start state and CLOSED is empty. At every iteration, the algorithm chooses the state in OPEN that minimizes the cost function f(n) = g(n) + W · h(n), where g(n) is the cost of the lowest-cost path found so far from the start state to n, h(n) is an admissible heuristic estimate of the cost of reaching the goal from n, and W is a parameter. It is well known that A* with a consistent heuristic (and thus a monotonically increasing f-cost) expands a state only after the lowest-cost path to it has been found. This is not the case for non-monotonic f-functions such as the one used by WA*. In this case, a state n may be generated with a smaller g-value after it has already been expanded. At this point, it can be removed from CLOSED and put back in OPEN, an action called reopening. Reopening is optional; for every closed state that is seen with a smaller g-value, a decision needs to be made whether to reopen it or not. In the case of not reopening, the newly seen state is not placed back in OPEN but remains in CLOSED. However, it is common practice to update its g-value as well as its parent pointer. In this paper we focus on the two extreme reopening policies: always reopen (AR) and never reopen (NR).
Related work
The notion of reopening was first introduced by Pohl (1970) and discussed in a variety of papers there
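The weighted A* loop and the two reopening policies described in this excerpt can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function name `weighted_astar`, the `neighbors` and `h` callables, the `reopen` flag, and the lazy-deletion handling of the OPEN heap are all assumptions made for the example.

```python
import heapq


def weighted_astar(start, goal, neighbors, h, w=2.0, reopen=True):
    """Weighted A* ordering OPEN by f(n) = g(n) + w * h(n).

    reopen=True  : always reopen (AR) a closed state seen with a smaller g.
    reopen=False : never reopen (NR); the closed state's g-value and parent
                   pointer are still updated, but its descendants are not
                   re-evaluated.
    Returns (g[goal], path, expansions), or (None, None, expansions).
    """
    g = {start: 0.0}
    parent = {start: None}
    # Heap entries: (f, tie_breaker, g_at_push, state).
    open_heap = [(w * h(start), 0, 0.0, start)]
    closed = set()
    expansions = 0
    counter = 1
    while open_heap:
        _, _, entry_g, n = heapq.heappop(open_heap)
        if n in closed or entry_g > g[n]:
            continue  # stale entry left behind by lazy deletion
        if n == goal:
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return g[goal], path[::-1], expansions
        closed.add(n)
        expansions += 1
        for m, cost in neighbors(n):
            new_g = g[n] + cost
            if new_g >= g.get(m, float("inf")):
                continue  # no improvement over the best known path to m
            g[m] = new_g
            parent[m] = n
            if m in closed:
                if not reopen:
                    continue  # NR: m stays in CLOSED
                closed.remove(m)  # AR: move m back to OPEN
            heapq.heappush(open_heap, (new_g + w * h(m), counter, new_g, m))
            counter += 1
    return None, None, expansions
```

On a graph where a state is first reached by a costlier path, AR re-expands it and can return a cheaper solution at the price of extra expansions. Under NR the g-value returned for the goal may overestimate the cost of the path recovered through the updated parent pointers, because the improvement is not propagated to descendants.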