Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 106--111, 1997.
....discussion. state action space. These implicit models can be pieced together and used for planning on a subset of the global space. One natural approach in the large state space setting is aggregate state methods, which group states together and assume Markov dynamics on these aggregate states [12, 13]. Clearly, this approach is useful only if a compatible set of aggregate states can be found which preserves the Markov dynamics on these aggregate states and where the size of the aggregate state space is considerably smaller than that of the underlying state space. A benefit of this approach is that ....
....dynamics could be different in different parts of the state space. For instance, a skier moving down a hill has dynamics dependent on the terrain conditions, such as slope, snow type, and other factors. Incidentally, Figure 2 illustrates the reason why standard state space aggregation techniques [12] do not work here. In particular, for a partitioning induced by a cover on a Euclidean space there exist corners where 3 (or more) sets meet. When taking actions toward this corner from within one of the sets, the distribution over the next aggregate state set is inherently unstable. The ....
T. Dean and R. Givan. "Model Minimization in Markov Decision Processes". In AAAI, 1997.
....a set of sub MDPs over the x label region classes so that each sub MDP is allowed to reach only one possible exit state. We have implicitly introduced state abstraction across the region class by ignoring the y labels. This first type of state abstraction is similar to model minimisation [8] for each region. The decomposition of the value function in HEXQ was inspired by MAXQ. For HEXQ the local reward functions for sub MDPs do not include the primitive reward received on exiting the sub MDP. The HEXQ partitioning can be applied recursively resulting in a hierarchy of sub MDPs not ....
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In AAAI/IAAI, pages 106--111, 1997.
....eliminates state distinctions while preserving the system's dynamics. We present a definition of homomorphism that is appropriate for SMDPs. In ref. [2] we developed an MDP abstraction framework based on MDP homomorphisms. This extended the MDP minimization framework proposed by Dean and Givan [11] and enabled the accommodation of redundancies arising from symmetric equivalence of the kind illustrated in Figure 1. We then extend the notion of SMDP homomorphism to hierarchical systems. In particular, we apply homomorphisms in the options framework introduced by Sutton, Precup and Singh [4] ....
....by a suitably defined homomorphism [1]. Therefore we can model symmetric equivalence as a special case of homomorphic equivalence. The notion of homomorphic equivalence immediately gives us an SMDP minimization framework. In ref. [1] we extended the minimization framework of Dean and Givan [11, 13] to include state-dependent action recoding and showed that if two state action pairs have the same image under a homomorphism, then they have the same optimal value. We also showed that when # is a homomorphic image of an MDP a policy in # can induce a policy in that is closely ....
T. Dean and R. Givan. Model minimization in Markov decision processes. In Proceedings of AAAI-97, pages 106--111. AAAI, 1997.
....of interest. For example, the state transition function in an MDP is a good model of the environment if it accurately reflects the probabilistic behaviour of the environment. Additional homomorphic reductions may be possible to further simplify these models. Dean and Givan's model minimisation [3] is such a homomorphism (stochastic bisimulation homogeneity). This type of homomorphism is related to algorithms by Boutilier et al. [4] that use structure in factored MDPs represented as two stage temporal Bayes nets to effect the reduction. Additional abstraction opportunities may be available ....
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In AAAI/IAAI, pages 106--111, 1997.
....there are many possible ways of clustering states. Therefore the real challenge lies in grouping states in such a way that we can a) plan over clusters with minimal loss of performance, compared to planning over the entire state space, and b) significantly reduce planning time. Dean et al. [3] proposed a simple three step model minimization algorithm for standard (nonhierarchical) MDP state abstraction. To infer $z(s) \to C_j$, a function mapping states $s \in S$ to the (expanding) set of clusters $C$: Step I. Initialize state clustering: let $z(s_i) = z(s_j)$ if $R(s_i, a) = R(s_j, a)$ ....
....Steps II and III are then applied iteratively, gradually splitting clusters according to salient differences in model parameters, until intra cluster differences are sufficiently small to ignore. This algorithm exhibits the following desirable properties (which we state without proof, see [3, 4] for details): 1. Planning over clusters converges to the optimal solution. 2. The algorithm can be relaxed to allow approximate ( stable) state abstraction. 3. Assuming a factorized MDP, all steps can be implemented such that we can avoid fully enumerating the state space. This last point is ....
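The splitting procedure described in the two excerpts above can be illustrated with a minimal Python sketch. It is not the cited authors' implementation: the function name `minimize_partition`, the dictionary layout of `P` and `R`, the rounding parameter `digits`, and the four-state example are all assumptions made for illustration, and the sketch enumerates the state space explicitly rather than exploiting a factored representation.

```python
from collections import defaultdict


def minimize_partition(states, actions, P, R, digits=9):
    """Group states into blocks that agree on rewards and block transitions.

    P[(s, a)] is a dict {next_state: probability}; R[(s, a)] is a float.
    Both layouts are assumptions made for this sketch.
    """
    # Step I: states start in the same block if their rewards match
    # for every action (compared after rounding to `digits` decimals).
    by_reward = defaultdict(list)
    for s in states:
        by_reward[tuple(round(R[(s, a)], digits) for a in actions)].append(s)
    blocks = list(by_reward.values())

    # Steps II and III: split blocks whose members disagree on the
    # probability of reaching some block, and repeat until nothing splits.
    while True:
        block_of = {s: i for i, block in enumerate(blocks) for s in block}
        refined = []
        for block in blocks:
            by_signature = defaultdict(list)
            for s in block:
                signature = []
                for a in actions:
                    mass = defaultdict(float)
                    for t, p in P[(s, a)].items():
                        mass[block_of[t]] += p
                    signature.append(tuple(sorted(
                        (j, round(m, digits)) for j, m in mass.items()
                        if round(m, digits) != 0)))
                by_signature[tuple(signature)].append(s)
            refined.extend(by_signature.values())
        if len(refined) == len(blocks):  # no block was split: stable
            return refined
        blocks = refined


# Hypothetical four-state example: L and R are symmetric, so they merge.
states = ["S", "L", "R", "G"]
actions = ["go"]
P = {
    ("S", "go"): {"L": 0.5, "R": 0.5},
    ("L", "go"): {"G": 1.0},
    ("R", "go"): {"G": 1.0},
    ("G", "go"): {"G": 1.0},
}
R = {("S", "go"): 0.0, ("L", "go"): 0.0, ("R", "go"): 0.0, ("G", "go"): 1.0}
blocks = minimize_partition(states, actions, P, R)
print(sorted(sorted(b) for b in blocks))
```

In this example the initial reward-based partition is {S, L, R}, {G}. One refinement pass separates S, because only L and R reach G directly, and the result stabilises at three blocks: {G}, {L, R}, {S}.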
T. Dean and R. Givan. Model minimization in Markov decision processes. In Proceedings of the 1997.
....the system's dynamics. We present a definition of homomorphisms that is appropriate for SMDPs. In earlier work [Ravindran and Barto, 2002] we developed an MDP abstraction framework based on our notion of an MDP homomorphism. This framework extended the MDP minimization framework proposed by [Dean and Givan, 1997] and enabled the accommodation of redundancies arising from symmetric equivalence of the kind illustrated in Figure 1. While we can derive reduced models with a smaller state set by applying minimization ideas, we do not necessarily simplify the description of the problem in terms of the number of ....
....2001]. Therefore we can model symmetric equivalence as a special case of homomorphic equivalence. 4 Minimization The notion of homomorphic equivalence immediately gives us an SMDP minimization framework. In [Ravindran and Barto, 2001] we extended the minimization framework of Dean and Givan [Dean and Givan, 1997; Givan et al., 2003] to include state-dependent action recoding and showed that if two state action pairs have the same image under a homomorphism, then they have the same optimal value. We also showed that when # is a homomorphic image of an MDP a policy in # can induce a policy in ....
T. Dean and R. Givan. Model minimization in Markov decision processes. In Proceedings of AAAI-97, pages 106--111. AAAI, 1997.
....it allows one to easily define control parameters for obtaining different behaviors of the reinforcement learning technique. The two main parameters that have to be defined are the number of quantization levels and the average distortion (similarity metrics). Other approaches to this problem were proposed in [3] and [1], where Bayesian networks are used. 8 Conclusions and Future Work We have shown how vector quantization and the generalized Lloyd algorithm allow us to dramatically reduce the number of states needed to represent a continuous environment. Furthermore, this technique gives us more quality ....
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the American Association of Artificial Intelligence (AAAI97). AAAI Press, 1997.
....processes (MDP) minimization framework we developed earlier [15] that allows us to abstract away redundancy in the problem definition. We then apply these ideas to hierarchical Reinforcement Learning (RL). Our framework is an extension of an MDP minimization framework developed by Dean and Givan [4, 6]. Model minimization methods attempt to abstract away redundancy in an MDP model and derive an equivalent smaller model. To illustrate model minimization, consider the simple gridworld shown in Figure 1(a). The goal state is labelled G. The gridworld is symmetric about the NE-SW diagonal. ....
....model minimization problem can now be stated as: find a minimal image of the given MDP. Since this can be computationally prohibitive, we frequently settle for a reasonably reduced model, even if it is not a minimal MDP. This minimization framework extends the approach proposed by Dean and Givan [4, 6]. They employ stochastic bisimulations [12] on state sets of MDPs and do not consider state action equivalence. If we restrict homomorphisms to only the state set, our approach is equivalent to theirs in terms of the reductions achieved. The theoretical results established in their framework hold, ....
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of AAAI-97, pages 106--111. AAAI, 1997.
....Markov processes, and present two main strategies for removing irrelevant detail: state aggregation decomposition and temporal abstraction. State decomposition methods typically represent states as collections of factored variables [1] or simplify the automaton by eliminating useless states [3]. Temporal abstraction mechanisms, for example in hierarchical reinforcement learning [34, 5, 23], encapsulate lower level observation or action sequences into a single unit at more abstract levels. For a unified algebraic treatment of abstraction of Markov decision processes that covers both ....
....Fig. 5. State and action based decomposition of Markov processes. Other related techniques for decomposition of large MDPs have been explored, and some of these are illustrated in Figure 5. A simple decomposition strategy is to split a large MDP into sub MDPs, which interact weakly [23, 34, 3]. An example of weak interaction is navigation, where the only interaction among sub MDPs is the states that connect different rooms together. Another strategy is to decompose a large MDP using the set of available actions, such as in air campaign planning problem [19] or in conversational ....
T. Dean and R. Givan. Model minimization in Markov decision processes. Proceedings of AAAI, 1997.
.... methods do this by aggregating a set of states and treating the states within any aggregate state as if they were identical [3]. Within AI, abstraction techniques have been widely studied as a form of aggregation, where states are (implicitly) grouped by ignoring certain problem variables [14, 7, 12]. These methods automatically generate abstract MDPs by exploiting structured representations, such as probabilistic STRIPS rules [16] or dynamic Bayesian network (DBN) representations of actions [13, 7]. In this paper, we describe a dynamic abstraction method for solving MDPs using algebraic ....
T. Dean and R. Givan. Model minimization in Markov decision processes. Proc. AAAI-97, pp.106--111, Providence, 1997.
....we address in our work. There has been extensive research in AI in recent years on solving stochastic planning problems with large action and state spaces, and a variety of techniques for reducing the complexity (typically exponential in the number of components) of these problems have been proposed [3, 7, 6, 5, 9, 12, 2, 11]. However, all these works assume a fixed structure and a fixed parameterization of the planning problem. The unique aspect of our planning problem is that the underlying topology characterizing the problem can vary and it is itself subject to random changes and fluctuations (due to failures). The ....
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 106--111, 1997.
....which shows that planning with concurrent options is more effective than when only one option is executed at a time. There is a clear connection between the model of concurrent options proposed in this paper and work on factored MDPs (Boutilier & Goldszmidt, 1995; Boutilier et al., to appear; Dean & Givan, 1997). Our approach relies on factoring the set of state variables using sets of behaviors that do not conflict. However, we do not use compact models of actions, such as dynamic Bayesian nets. One immediate problem for future research is to investigate how to represent options using DBNs, which would ....
Dean, T., & Givan, R. (1997). Model minimization in Markov decision processes. Proceedings of AAAI.
....process leads to a large state space, and a large action space. There has been extensive research in AI in recent years on solving MDPs with large state spaces exploiting, e.g., the structure of a specific problem through factoring and decompositions [20, 7, 11, 10, 22, 6] or various abstractions [9, 14]. However, all these works assume a finite or at least discrete state space, and the optimal solution or its close approximations still may not be efficiently computable. Nevertheless, we show in this work that under several commonly used utility functions the optimal trading strategy for the multi site ....
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 106--111, Providence, 1997.
.... methods do this by aggregating a set of states and treating the states within any aggregate state as if they were identical [3]. Within AI, abstraction techniques have been widely studied as a form of aggregation, where states are (implicitly) grouped by ignoring certain problem variables [14, 8, 12]. These methods automatically generate abstract MDPs by exploiting structured representations, such as probabilistic STRIPS rules [16] or dynamic Bayesian network (DBN) representations of actions [13, 8]. In this paper, we describe a dynamic abstraction method for solving MDPs using algebraic ....
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 106--111, Providence, 1997.
....reward function, again much as an SNLP based planner such as UCPOP might merge subplans for individual goals. In (Boutilier, Brafman, & Geib 1998) a reachability analysis inspired by Graphplan is used to restrict the states considered for policy creation given an initial state. Givan and Dean (Givan & Dean 1997) show that STRIPS style goal regression computes an approximate minimized form of the finite state automaton corresponding to the problem. In (Dean & Givan 1997) the authors show how to use model minimization techniques to solve MDPs. Partial observability Similar work has been done with ....
....analysis inspired by Graphplan is used to restrict the states considered for policy creation given an initial state. Givan and Dean (Givan & Dean 1997) show that STRIPS style goal regression computes an approximate minimized form of the finite state automaton corresponding to the problem. In (Dean & Givan 1997) the authors show how to use model minimization techniques to solve MDPs. Partial observability Similar work has been done with partially observable Markov decision processes or POMDPs, in which the assumption of complete observability is relaxed. In a POMDP there is a set of observation labels O ....
Dean, T., and Givan, R. 1997. Model minimization in Markov decision processes. In Proc. Fourteenth National Conference on Artificial Intelligence, 106--111. AAAI Press.
....to avoid an explicit representation of the state space while maintaining desirable convergence and optimality properties. Boutilier, Dean, and Goldszmidt (1995) show that it is possible, in some cases, to replace a tabular representation with a compact representation such as a decision tree. Dean and Givan (1997) provide further insight into when a compact representation of a value function can be achieved by assigning the same utility to similar states. In theory, this makes it possible to manipulate vast numbers of states that have the same utility as if they were a single state. These are promising and ....
....induce an optimal value function that assigns different values to every state and an optimal policy that implies some form of utility relationship between any two states. In some cases, it is possible to show that MDP states can be aggregated without any effect on solution quality (Lin, 1997; Dean & Givan, 1997), but state abstraction generally involves a tradeoff between optimality and compactness. Difficult issues that must be resolved are 1. The manner in which a transition model and reward function for the abstract model are derived from the original model. 2. The relationship between the solution ....
Dean, T., & Givan, R. (1997). Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pp. 106--111, Providence, Rhode Island. MIT Press.
....underlying MDP to states connected by macro actions. This model is called an abstract MDP model (after [9]) and was studied in [7, 9, 8, 14]. The abstract model is suitable for hierarchical methods (see e.g. [9]) and provides an alternative to various approximation methods for solving large MDPs [5, 2, 4, 3]. In our work we focus our attention solely on the augmented model that works with both primitive actions and macro actions. Precup, Sutton and Singh [15, 16] demonstrated empirically the advantage of adding macro actions to the original model by speeding up the convergence rate of value iteration ....
T. Dean and R. Givan. Model minimization in Markov decision processes. AAAI-97, pp.106-111, Providence, 1997.
....such as linear programming. In a very different direction, we believe that we can extend this approach to exploit various other types of structure in the model, including structured action spaces, where at each stage several actions are taken in parallel, and the context sensitivity utilized by [3, 6]. As a more ambitious goal, we would also like to extend it to deal with the much harder problem of planning in Partially Observable MDPs. Acknowledgments We thank Xavier Boyen for an insightful discussion that led to Theorem 4.1 and Carlos Guestrin, Uri Lerner and Simon Tong for useful ....
T. Dean and R. Givan. Model minimization in Markov decision processes. In Proc. AAAI-97. MIT Press, 1997.
.... methods do this by aggregating a set of states and treating the states within any aggregate state as if they were identical [3]. Within AI, abstraction techniques have been widely studied as a form of aggregation, where states are (implicitly) grouped by ignoring certain problem variables [14, 7, 12]. These methods automatically generate abstract MDPs by exploiting structured representations, such as probabilistic STRIPS rules [16] or dynamic Bayesian network (DBN) representations of actions [13, 7]. In this paper, we describe a dynamic abstraction method for solving MDPs using algebraic ....
T. Dean and R. Givan. Model minimization in Markov decision processes. Proc. AAAI-97, pp.106--111, Providence, 1997.
....a succinctly represented MDP is to group states together into meta states, where all elements of a meta state behave exactly or approximately the same with respect to the reward function and all actions. These can be described as aggregate approximate models. We use the terminology of Dean, et al. (Dean, Givan, & Leach 1997; Dean & Givan 1997; Givan, Leach, & Dean 1997; Givan & Dean 1997) of Bounded Interval MDPs (BMDPs). These are MDP-like models where the transition probabilities and rewards are replaced by intervals. Work on planning algorithms for such systems by Givan et al. is reported in (Givan, Leach, & Dean ....
....is to group states together into meta states, where all elements of a meta state behave exactly or approximately the same with respect to the reward function and all actions. These can be described as aggregate approximate models. We use the terminology of Dean, et al. (Dean, Givan, & Leach 1997; Dean & Givan 1997; Givan, Leach, & Dean 1997; Givan & Dean 1997) of Bounded Interval MDPs (BMDPs). These are MDP-like models where the transition probabilities and rewards are replaced by intervals. Work on planning algorithms for such systems by Givan et al. is reported in (Givan, Leach, & Dean 1997) and by ....
Dean, T., and Givan, R. 1997. Model minimization in Markov decision processes. In Proceedings of the 14th National Conference on Artificial Intelligence and 9th Innovative Applications of Artificial Intelligence Conference (AAAI-97/IAAI-97), 106--111. Menlo Park: AAAI Press.
....research on the use of MDPs for DTP has focussed on methods for solving MDPs that avoid explicit state space enumeration while constructing optimal or approximately optimal policies. These include the use of function approximators for value functions [2], aggregation and abstraction techniques [12, 6, 5, 8], reachability analysis [9], and decomposition techniques [11]. This work was supported by NSERC Research Grant OGP0121843 and IRIS II Project IC 7. y Part of this work was undertaken at U.B.C. and was supported by NSERC. z Part of this work was undertaken at U.B.C. and was supported by ....
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 106--111, Providence, 1997.
....research on the use of MDPs for DTP has focussed on methods for solving MDPs that avoid explicit state space enumeration while constructing optimal or approximately optimal policies. These include the use of function approximators for value functions [2], aggregation and abstraction techniques [5, 6, 12, 8], reachability analysis [9], and decomposition techniques [11]. This work was supported by NSERC Research Grant OGP0121843 and IRIS II Project IC 7. y Part of this work was undertaken at U.B.C. and was supported by NSERC. z Part of this work was undertaken at U.B.C. and was supported by ....
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 106--111, Providence, 1997.
....$P(s \mid s_{Agg_2}) \sum_{s' \in s_{Agg_1}} P(s' \mid s, a)$. Alternatively one might construct a simpler MDP model with aggregate states that uses upper and lower bounds on transition probabilities and that does not require priors on states $P(s \mid s_{Agg})$ to be defined. Such an approach was pursued by [Dean, Givan 97], [Dean et al. 97], who also devised techniques to extract simpler models for factored MDPs. Note that the computation of the new simpler model from the old one may require a significant amount of time. If the model reduction is performed during problem solving, the overhead time spent on the ....
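The prior-weighted aggregation in the excerpt above can be sketched in a few lines of Python. The excerpt is truncated, so the full form assumed here, a sum over the states s of the source aggregate of P(s | source aggregate) times the probability mass that s sends into the target aggregate, is a reading of the visible fragment rather than a quotation. The function name `aggregate_transition`, the data layout, and the numbers are hypothetical.

```python
def aggregate_transition(source_block, target_block, prior, P, action):
    """Prior-weighted probability of moving from one aggregate to another.

    prior[s] is the assumed weight P(s | source aggregate);
    P[(s, a)] is a dict {next_state: probability}.
    """
    total = 0.0
    for s in source_block:
        # Probability mass that state s sends into the target aggregate.
        into_target = sum(p for t, p in P[(s, action)].items()
                          if t in target_block)
        total += prior[s] * into_target
    return total


# Hypothetical two-state aggregate {L, R} with a uniform prior.
P = {
    ("L", "go"): {"G": 0.8, "L": 0.2},
    ("R", "go"): {"G": 0.4, "R": 0.6},
}
prior = {"L": 0.5, "R": 0.5}
value = aggregate_transition({"L", "R"}, {"G"}, prior, P, "go")
print(value)  # approximately 0.6, up to floating-point rounding
```

The result depends on the chosen prior, which is the dependence that the interval-based alternative mentioned in the excerpt avoids.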
T. Dean, R. Givan. Model minimization in Markov decision processes. In Proceedings of the AAAI-97, pp. 106-111, 1997.
T. Dean and R. Givan. Model minimization in Markov decision processes. AAAI-97, pp.106--111, Providence, 1997.
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings AAAI-97, 1997.
Dean, Thomas and Givan, Robert 1997. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence. AAAI. 106-111.
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings AAAI-97. AAAI, 1997.
....methods address this issue by exploiting redundancy in problem specification to reduce the size of the MDP model. Symmetries in a problem specification can give rise to special forms of redundancy that are not exploited by existing minimization methods. In this work we extend Dean and Givan's [5] model minimization framework to include symmetries. We base our framework on concepts derived from finite state automata and group theory. 1 Introduction Markov Decision Processes (MDPs) [21] are a popular way to model stochastic sequential decision problems. But most modelling and solution ....
....decision problems. But most modelling and solution approaches to MDPs suffer from the fact that they scale poorly with the size of the problem. While modelling real world scenarios, often there is a lot of redundancy in the MDP model. Model minimization methods introduced by Dean and Givan [5] exploit such redundancy in the problem specification to derive smaller models, i.e. models with fewer states, by aggregating equivalent states. Figure 1 illustrates the model minimization process. The gridworld on the left is the given MDP. This has the usual gridworld dynamics with 4 ....
Dean, T. and Givan, R. 1997. Model minimization in Markov Decision Processes. In Proceedings of AAAI-97.
....] Backstrom and Klein [ 1991 ], Bylander [ 1994 ], and Gupta and Nau [ 1991 ] provide basic results concerning the complexity of STRIPS planning and special cases. Etzioni [ 1993 ] describes a particular algorithm for reachability analysis and provides a survey of related techniques. In [ Dean and Givan, 1997 ] we show how model minimization can be used to solve implicit (or factored) Markov decision processes (MDPs) with very large state spaces, and prove that our model minimization based algorithms are asymptotically equivalent to existing methods (e.g. [ Boutilier et al. 1995 ]) that operate on ....
....minimization can be used to solve implicit (or factored) Markov decision processes (MDPs) with very large state spaces, and prove that our model minimization based algorithms are asymptotically equivalent to existing methods (e.g. [ Boutilier et al. 1995 ]) that operate on implicit MDPs. In [ Dean et al. 1997 ] we show how model reduction techniques can be used to trade time for space in computing approximately optimal solutions to Markov decision processes. Finally, in the longer version of this paper, we show how the methods of this paper can be used to understand the advantages of the ....
Dean, Thomas and Givan, Robert 1997. Model minimization in Markov decision processes. In Proceedings AAAI-97. AAAI.
....use in solving partially observable MDPs. Puterman [13] provides an excellent introduction to Markov decision processes and techniques involving bounding value functions. Boutilier and Dearden [5] and Boutilier et al. [7] describe methods for solving implicitly described MDPs and Dean and Givan [8] reinterpret this work in terms of computing explicitly described MDPs with aggregate states. Bounded parameter MDPs allow us to represent uncertainty about or variation in the parameters of a Markov decision process. Interval value functions capture the resulting variation in policy values. In ....
Dean, T. and Givan, R., "Model Minimization in Markov Decision Processes," Proceedings of AAAI-97, Providence, RI, 1997.
....about the structure of good plans. 5.1.4 Model Minimization and Reduction Methods The abstraction techniques defined above can be recast in terms of minimizing a stochastic automaton, providing a unifying view of the different methods and offering new insights into the abstraction process [35]. From automata theory we know that for any given finite state machine M recognizing a language L there exists a unique minimal finite state machine M' which also recognizes L. It could be that M = M', but it might also be the case that M' is exponentially smaller than M. This minimal ....
....31(b). Now the property is satisfied for all pairs of clusters and the model shown in Figure 31(b) is the minimal model. 2 The Lee and Yannakakis algorithm for nondeterministic finite state machines has been extended by Givan and Dean to handle classical STRIPS planning problems [57] and MDPs [35]. The basic step of splitting a cluster is closely related to goal regression and this relationship is explored in [57]. There are variants of the model reduction approach that apply to the case in which the action space is large and in a factored form [36], for example, when each action is ....
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 106--111. AAAI, 1997.
....section we discuss in more detail how BMDPs relate to MDPIPs. In a related paper, we have shown how BMDPs can be used as part of a strategy for efficiently approximating the solution of MDPs with very large state spaces and dynamics compactly encoded in a factored (or implicit) representation [ Dean et al. 1997 ]. In this paper, we focus exclusively on BMDPs, on the BMDP analog of value functions, called interval value functions, and on policy selection for a BMDP. We provide BMDP analogs of the standard (exact) MDP algorithms for computing the value function for a fixed policy (plan) and (more ....
....$\sum_{q \in B_j} F_{pq}(\alpha)$. Intuitively, this means that all states in a block behave approximately the same (assuming the lower and upper bounds are close to each other) in terms of transitions to other blocks even though they may differ widely with regard to transitions to individual states. In [ Dean et al. 1997 ] we discuss methods for using an implicit representation of an exact MDP with a large number of states to construct an explicit BMDP with a possibly much smaller number of states based on an aggregation method. We then show that policies computed for this BMDP can be extended to the original ....
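The block-level bounds discussed in the excerpt above can be illustrated with a short Python sketch. It assumes, as one plausible reading and not as the cited construction, that the lower and upper bounds for a pair of blocks are the minimum and maximum, over states p in the source block, of the probability mass that p sends into the target block. The function name `interval_transition` and the example numbers are hypothetical.

```python
def interval_transition(source_block, target_block, P, action):
    """Lower and upper bounds on the block-to-block transition probability.

    P[(p, a)] is a dict {next_state: probability}. The bounds are taken as
    the min and max over source states, an assumption made for this sketch.
    """
    masses = [
        # Total probability that state p moves into the target block.
        sum(prob for q, prob in P[(p, action)].items() if q in target_block)
        for p in source_block
    ]
    return min(masses), max(masses)


# Hypothetical block {L, R}: the two states differ in how often they reach G.
P = {
    ("L", "go"): {"G": 0.8, "L": 0.2},
    ("R", "go"): {"G": 0.4, "R": 0.6},
}
lower, upper = interval_transition({"L", "R"}, {"G"}, P, "go")
print(lower, upper)
```

When the two bounds coincide for every pair of blocks and every action, the states of each block are indistinguishable at the block level, which corresponds to the exact aggregation case described in the other excerpts.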
Dean, Thomas and Givan, Robert 1997. Model minimization in Markov decision processes. In Proceedings AAAI-97. AAAI.
....parameters over the primitive states belonging to the aggregates. In a related paper, we have shown how BMDPs can be used as part of a strategy for efficiently approximating the solution of MDPs with very large state spaces and dynamics compactly encoded in a factored (or implicit) representation [Dean et al. 1997]. In this paper, we focus exclusively on BMDPs, on the BMDP analog of value functions, called interval value functions, and on policy selection for a BMDP. We provide BMDP analogs of the standard (exact) MDP algorithms for computing the value function for a fixed policy (plan) and (more generally) ....
....$\sum_{q \in B_j} F_{pq}(\alpha)$ (18). Intuitively, this means that all states in a block behave approximately the same (assuming the lower and upper bounds are close to each other) in terms of transitions to other blocks even though they may differ widely with regard to transitions to individual states. [Dean et al. 1997] discuss methods for using an implicit representation of an exact MDP with a large number of states to construct a BMDP with a much smaller number of states based on an aggregation like that just discussed. We then show how to use this BMDP to compute policies with desirable properties with respect ....
Dean, Thomas and Givan, Robert 1997. Model minimization in Markov decision processes. In Proceedings AAAI-97. AAAI.
.... [ Hartmanis and Stearns, 1966 ] and stochastic processes [ Kemeny and Snell, 1960 ] and has surfaced more recently in the work on model checking in computer aided verification [ Burch et al. 1994; Lee and Yannakakis, 1992 ]. Building on the work of Lee and Yannakakis [ 1992 ] we have shown [ Dean and Givan, 1997 ] that several existing algorithms are asymptotically equivalent to first constructing the minimal reduced MDP and then solving this MDP using traditional methods that operate on the flat (unfactored) representations. The minimal model may be exponentially larger than the original compact MDP. In ....
....in the estimated value function from one iteration to the next during value iteration). The methods for manipulating factored representations of MDPs were largely borrowed from [ Boutilier et al. 1995b ], which provides an iterative algorithm for finding optimal solutions to factored MDPs. Dean and Givan [ 1997 ] describe a model minimization algorithm for solving factored MDPs which is asymptotically equivalent to the algorithm in [ Boutilier et al. 1995b ]. Boutilier and Dearden [ extend the work in [ Boutilier et al. 1995b ] to compute approximate solutions to factored MDPs by associating ....
Dean, Thomas and Givan, Robert 1997. Model minimization in Markov decision processes. In Proceedings AAAI-97. AAAI.
....However, the computational difficulty of applying classic dynamic programming algorithms to realistic problems has spurred much research into techniques to deal with large state and action spaces. These include function approximation [4], reachability considerations [8], and aggregation techniques [11, 6, 7]. One general method for tackling large MDPs is decomposition [10, 15, 17, 5]. An MDP is either specified in terms of a set of pseudo-independent subprocesses [17] or automatically decomposed into such subprocesses [5]. These subMDPs are then solved and the solutions to these subMDPs are ....
T. Dean and R. Givan. Model minimization in Markov decision processes. Proc. AAAI-97, pp.106--111, Providence, 1997.
....where these spaces are generally too large to be explicitly enumerated. Considerable research has been directed toward the solution of Markov decision processes (MDPs) with large state and action spaces. These include function approximation [2], reachability analyses [5], and aggregation techniques [7, 3, 4]. Despite these advances, little attention has been paid to the reuse of policies or value functions generated for one MDP in the solution of a related MDP. While such reasoning is common in classical planning, for instance, through the use of macros [8, 16, 13] or plan repair strategies ....
T. Dean and R. Givan. Model minimization in Markov decision processes. AAAI-97, pp.106--111, Providence, 1997.
....provide examples in which our techniques provide leverage and examples in which they fail to do so, and summarize the results of experiments with a preliminary implementation. 1. Introduction The methods developed in this paper extend specific planning algorithms [Boutilier et al., 1995; Dean and Givan, 1997] developed for handling large state spaces to handle large action spaces. The basic approach involves reformulating a problem with large state and action spaces as a problem with much smaller state and action spaces. These methods are particularly effective in cases in which there are a large set ....
....nonlinear operator that is at the heart of value iteration requires computing for each state a value maximizing over all actions and taking expectations over all states (11). In structured methods for solving MDPs with factorial representations, such as those described in [Boutilier et al., 1995; Dean and Givan, 1997], instead of considering the consequences of an action taking individual states to individual states, we consider how an action takes sets of states to sets of states. Here we extend this idea to consider how sets of actions take sets of states to sets of states (see Figure 2). The result is a ....
[Article contains additional citation context not shown here]
Dean, Thomas and Givan, Robert, 1997. Model minimization in Markov decision processes. In Proceedings AAAI-97. AAAI.
No context found.
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 106--111, 1997.
No context found.
T. Dean and R. Givan. Model minimization in Markov decision processes. Proceedings of the National Conference on Artificial Intelligence, 14:106--111, 1997.
No context found.
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 106--111, 1997.
No context found.
Dean, T., and R. Givan. 1997. Model minimization in Markov decision processes. In Proc. 14th National Conference on Artificial Intelligence, pp. 106--111, Providence, RI.
No context found.
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In AAAI/IAAI, pages 106--111, 1997.
No context found.
Dean, T. and Givan, R.: 1997, Model minimization in Markov decision processes, in: Proc. of the American Association of Artificial Intelligence (AAAI-97).
No context found.
T. Dean and R. Givan, Model minimization in Markov decision processes, in ########### ## ### #### ######## ########## ## ######### ############ ### ### ########## ############ ## ######### ### ########## ########## #################, (Menlo Park), pp. 106--111, AAAI Press, July 27--31, 1997.
No context found.
T. Dean and R. Givan, "Model minimization in Markov decision processes," pp. 106--111, 1997.
No context found.
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the American Association of Artificial Intelligence (AAAI97). AAAI Press, 1997.
No context found.
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the American Association of Artificial Intelligence (AAAI97). AAAI Press, 1997.
No context found.
T. Dean and R. Givan. Model minimization in Markov decision processes. In Proc. AAAI-97. MIT Press, 1997.
No context found.
T. Dean and R. Givan. Model minimization in Markov decision processes. Proceedings of AAAI, 1997.
No context found.
Thomas Dean and Robert Givan. Model minimization in Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 106--111, 1997.