Results 1-10 of 99
Algorithms for Sequential Decision Making
1996
Cited by 212 (8 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,
Mapping Abstract Complex Workflows onto Grid Environments
Cited by 200 (18 self)
In this paper we address the problem of automatically generating job workflows for the Grid. These workflows describe the execution of a complex application built from individual application components. In our work we have developed two workflow generators: the first (the Concrete Workflow Generator, CWG) maps an abstract workflow defined in terms of application-level components to the set of available Grid resources. The second generator (Abstract and Concrete Workflow Generator, ACWG) takes a wider perspective and not only performs the abstract-to-concrete mapping but also enables the construction of the abstract workflow based on the available components. This system operates in the application domain and chooses application components based on the application metadata attributes. We describe our current ACWG based on AI planning technologies and outline how these technologies can play a crucial role in developing complex application workflows in Grid environments. Although our work is preliminary, CWG has already been used to map high-energy physics applications onto the Grid. In one particular experiment, a set of production runs lasted 7 days and resulted in the generation of 167,500 events by 678 jobs. Additionally, ACWG was used to map gravitational physics workflows with hundreds of nodes onto the available resources, resulting in 975 tasks, 1365 data transfers and 975 output files produced.
On the complexity of solving Markov decision problems
In Proc. of the Eleventh International Conference on Uncertainty in Artificial Intelligence, 1995
Cited by 159 (12 self)
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms. We argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly. To encourage future research, we sketch some alternative methods of analysis that rely on the structure of MDPs.
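As a point of reference for the "MDP solution algorithms" this abstract mentions, here is a minimal sketch of value iteration, one standard such algorithm. The dictionary encoding of the MDP, the names `q_value` and `value_iteration`, the discount factor, and the toy problem are all assumptions made for illustration; none of them come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (not from the paper):
# mdp[state][action] is a list of (probability, next_state, reward) triples.
Outcome = Tuple[float, int, float]
MDP = Dict[int, Dict[str, List[Outcome]]]


def q_value(outcomes: List[Outcome], V: Dict[int, float], gamma: float) -> float:
    """Expected one-step reward plus discounted value of the successor."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-8,
                    max_sweeps: int = 10_000):
    """Sweep Bellman backups until the largest change falls below tol."""
    V = {s: 0.0 for s in mdp}
    for _ in range(max_sweeps):
        delta = 0.0
        for s, actions in mdp.items():
            best = max(q_value(outs, V, gamma) for outs in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update, so later states see the new value
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimates.
    policy = {s: max(actions, key=lambda a: q_value(actions[a], V, gamma))
              for s, actions in mdp.items()}
    return V, policy


# Toy two-state problem (illustrative only).
toy = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)]},
}
V, policy = value_iteration(toy, gamma=0.5)
```

On this toy problem the fixed point is V(1) = 2 / (1 - 0.5) = 4 and V(0) = 1 + 0.5 * 4 = 3, so the greedy policy in state 0 is "go". Each sweep touches every state and action once, so the practical cost on large problems is dominated by the size of the explicit state space.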
Computing optimal policies for partially observable decision processes using compact representations
In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1996
Cited by 130 (15 self)
Partially observable Markov decision processes provide a very general model for decision-theoretic planning problems, allowing the tradeoffs between various courses of actions to be determined under conditions of uncertainty, and incorporating partial observations made by an agent. Dynamic programming algorithms based on the information or belief state of an agent can be used to construct optimal policies without explicit consideration of past history, but at high computational cost. In this paper, we discuss how structured representations of the system dynamics can be incorporated in classic POMDP solution algorithms. We use Bayesian networks with structured conditional probability matrices to represent POMDPs, and use this representation to structure the belief space for POMDP algorithms. This allows irrelevant distinctions to be ignored. Apart from speeding up optimal policy construction, we suggest that such representations can be exploited to great extent in the development of useful approximation methods. We also briefly discuss the difference in perspective adopted by influence diagram solution methods vis à vis POMDP techniques.
Labeled RTDP: Improving the convergence of real-time dynamic programming
In ICAPS’03, 12–21
Cited by 130 (10 self)
RTDP is a recent heuristic-search DP algorithm for solving nondeterministic planning problems with full observability. In relation to other dynamic programming methods, RTDP has two benefits: first, it does not have to evaluate the entire state space in order to deliver an optimal policy, and second, it can often deliver good policies pretty fast. On the other hand, RTDP final convergence is slow. In this paper we introduce a labeling scheme into RTDP that speeds up its convergence while retaining its good anytime behavior. The idea is to label a state s as solved when the heuristic values, and thus, the greedy policy defined by them, have converged over s and the states that can be reached from s with the greedy policy. While due to the presence of cycles, these labels cannot be computed in a recursive, bottom-up fashion in general, we show nonetheless that they can be computed quite fast, and that the overhead is compensated by the recomputations avoided. In addition, when the labeling procedure cannot label a state as solved, it improves the heuristic value of a relevant state. This results in the number of Labeled RTDP trials needed for convergence, unlike the number of RTDP trials, to be bounded. From a practical point of view, Labeled RTDP (LRTDP) converges orders of magnitude faster than RTDP, and faster also than another recent heuristic-search DP algorithm, LAO*. Moreover, LRTDP often converges faster than value iteration, even with the heuristic h = 0, thus suggesting that LRTDP has a quite general scope.
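The labeling idea described in this abstract can be sketched roughly as follows. This is one possible reading of the description, not the authors' code: the cost-based MDP encoding, the function names, the tolerance `eps`, and the toy chain are all assumptions. A state is labeled solved only when the residuals of every state reachable from it under the greedy policy are at most `eps`; otherwise the visited states are backed up, which is the "improves the heuristic value" step.

```python
# Hypothetical encoding (not from the paper): a shortest-path style problem
# where mdp[state][action] = (cost, {next_state: probability}).

def q_value(mdp, V, s, a):
    cost, succ = mdp[s][a]
    return cost + sum(p * V[t] for t, p in succ.items())


def greedy_action(mdp, V, s):
    return min(mdp[s], key=lambda a: q_value(mdp, V, s, a))


def residual(mdp, V, s):
    return abs(V[s] - q_value(mdp, V, s, greedy_action(mdp, V, s)))


def check_solved(mdp, V, solved, s, eps=1e-6):
    """Label s and its greedy-reachable states solved if all residuals <= eps."""
    converged = True
    open_, closed = [], []
    if s not in solved:
        open_.append(s)
    while open_:
        u = open_.pop()
        closed.append(u)
        if residual(mdp, V, u) > eps:
            converged = False
            continue  # do not expand below a state that has not converged
        _, succ = mdp[u][greedy_action(mdp, V, u)]
        for t, p in succ.items():
            if p > 0 and t not in solved and t not in open_ and t not in closed:
                open_.append(t)
    if converged:
        solved.update(closed)  # every visited state gets the label
    else:
        # Back up the visited states, most recently visited first.
        while closed:
            u = closed.pop()
            V[u] = q_value(mdp, V, u, greedy_action(mdp, V, u))
    return converged


# Toy chain 0 -> 1 -> 2 with unit costs; state 2 is the goal (illustrative).
chain = {0: {"a": (1.0, {1: 1.0})}, 1: {"a": (1.0, {2: 1.0})}}
V = {0: 0.0, 1: 0.0, 2: 0.0}  # heuristic h = 0
solved = {2}                   # assumption: goal states start out labeled
calls = 0
while not check_solved(chain, V, solved, 0):
    calls += 1
```

In a full LRTDP loop this check would be run on the states visited by each trial rather than called repeatedly on the start state as done here; the loop above is only a way to exercise the function on a small example. The explicit open and closed lists are what let the check cope with cycles, which a plain recursive bottom-up labeling could not.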
Recent Advances in AI Planning
AI Magazine, 1999
Cited by 127 (0 self)
The past five years have seen dramatic advances in planning algorithms, with an emphasis on propositional methods such as Graphplan and compilers that convert planning problems into propositional CNF formulae for solution via systematic or stochastic SAT methods. Related work on the Deep Space One spacecraft control algorithms advances our understanding of interleaved planning and execution. In this survey, we explain the latest techniques and suggest areas for future research.
Model Minimization in Markov Decision Processes
In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997
Cited by 121 (8 self)
We use the notion of stochastic bisimulation homogeneity to analyze planning problems represented as Markov decision processes (MDPs). Informally, a partition of the state space for an MDP is said to be homogeneous if for each action, states in the same block have the same probability of being carried to each other block. We provide an algorithm for finding the coarsest homogeneous refinement of any partition of the state space of an MDP. The resulting partition can be used to construct a reduced MDP which is minimal in a well defined sense and can be used to solve the original MDP. Our algorithm is an adaptation of known automata minimization algorithms, and is designed to operate naturally on factored or implicit representations in which the full state space is never explicitly enumerated. We show that simple variations on this algorithm are equivalent or closely similar to several different recently published algorithms for finding optimal solutions to (partially ...
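The homogeneity condition stated in this abstract can be checked directly on a small explicit MDP, as in the sketch below. It is a naive fixed-point loop in the spirit of automata minimization, not the paper's algorithm, which is designed for factored representations. The encoding `P[state][action] = {next_state: probability}`, the function names, and the toy model are assumptions; any reward distinctions are assumed to be captured by the initial partition.

```python
def block_probs(P, s, a, partition):
    """Probability of moving from s under a into each block of the partition."""
    succ = P[s].get(a, {})
    # Rounding guards against floating-point noise when comparing signatures.
    return tuple(round(sum(succ.get(t, 0.0) for t in block), 12)
                 for block in partition)


def signature(P, s, actions, partition):
    return tuple(block_probs(P, s, a, partition) for a in actions)


def is_homogeneous(P, partition):
    """True if states in each (non-empty) block agree on all block probabilities."""
    actions = sorted({a for s in P for a in P[s]})
    return all(len({signature(P, s, actions, partition) for s in block}) == 1
               for block in partition)


def refine(P, partition):
    """Split blocks by signature until no block splits any further."""
    actions = sorted({a for s in P for a in P[s]})
    partition = [frozenset(b) for b in partition]
    while True:
        new_partition = []
        for block in partition:
            groups = {}
            for s in sorted(block):
                sig = signature(P, s, actions, partition)
                groups.setdefault(sig, set()).add(s)
            new_partition.extend(frozenset(g) for g in groups.values())
        if len(new_partition) == len(partition):
            return new_partition
        partition = new_partition


# Toy model (illustrative): states 0 and 1 behave alike, state 2 does not.
P = {
    0: {"a": {3: 0.5, 0: 0.5}},
    1: {"a": {3: 0.5, 1: 0.5}},
    2: {"a": {2: 1.0}},
    3: {"a": {3: 1.0}},
}
initial = [{0, 1, 2}, {3}]
result = refine(P, initial)
```

Here the initial partition is not homogeneous because state 2 never reaches the block containing state 3, while states 0 and 1 reach it with probability 0.5. One round of splitting separates state 2, after which states 0 and 1 remain together and can be treated as a single state of a reduced model.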
Planning, learning and coordination in multiagent decision processes
In Proceedings of the Sixth Conference on Theoretical Aspects of Rationality and Knowledge (TARK96), 1996
Cited by 120 (1 self)
There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interesting issues that arise with regard to coordinating the policies of individual agents. To this end, we describe multiagent Markov decision processes as a general model in which to frame this discussion. These are special n-person cooperative games in which agents share the same utility function. We discuss coordination mechanisms based on imposed conventions (or social laws) as well as learning methods for coordination. Our focus is on the decomposition of sequential decision processes so that coordination can be learned (or imposed) locally, at the level of individual states. We also discuss the use of structured problem representations and their role in the generalization of learned conventions and in approximation.
Probabilistic Propositional Planning: Representations and Complexity
In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997
Cited by 88 (11 self)
Many representations for probabilistic propositional planning problems have been studied. This paper reviews several such representations and shows that, in spite of superficial differences between the representations, they are "expressively equivalent," meaning that planning problems specified in one representation can be converted to equivalent planning problems in any of the other representations with at most a polynomial increase in the resulting representation and the number of steps needed to reach the goal with sufficient probability. The paper proves that the computational complexity of determining whether a successful plan exists for planning problems expressed in any of these representations is EXPTIME-complete and PSPACE-complete when plans are restricted to take a polynomial number of steps. Introduction In recent years, there has been an interest in solving planning problems that contain some degree of uncertainty. One form that this uncertainty has taken ...
Bounded-Parameter Markov Decision Processes
Artificial Intelligence, 2000
Cited by 88 (0 self)
In this paper, we introduce the notion of a bounded-parameter Markov decision process (BMDP) as a generalization of the familiar exact MDP. A bounded-parameter MDP is a set of exact MDPs