W. S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research 28, 47--66 (1991).
....as partially observable MDPs. Thus POMDPs have been considered in areas such as system inspection, maintenance, and management, which have a large body of literature; we cite only a few examples [101, 75, 117, 61]. General references on POMDPs include surveys by Monahan [90] and Lovejoy [79], and books by Bertsekas [9] and White [123]. In AI research, MDPs have recently been adopted as a framework for acting and planning under uncertainty. For instance, probabilistic planning problems [74, 12] are a significant example of POMDPs, which motivated our analysis of infinite-horizon ....
....7.1.1 POMDPs. Early papers on POMDP problem formulations include those by Drake [36], Åström [6], and Aoki [3]. Sondik and Smallwood [113, 115] are probably the first to address the computational difficulties associated with solving POMDPs. Key references on applications and algorithms are the surveys [90, 79]. In recent years, many algorithms and heuristics have been developed for solving POMDPs, especially in the artificial intelligence research community. Analysis of the complexity of solving MDPs began with the work of Papadimitriou and Tsitsiklis [98]. The computational complexity of ....
W. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, pages 47-66, 1991.
....holds for POMDPs under the total undiscounted reward and the average reward criteria. However, under the discounted criterion, the optimal value is approximable to within any ε > 0 because of the discount factor, and value iteration algorithms in particular take advantage of this [30]. 4.6 Existence of Optimal Finite Controllers and Optimal Strings. The question of whether an optimal finite controller exists for a given POMDP is an interesting one. A similar case to the threshold isolation can be made for motivating such a question: given a POMDP and an initial distribution, ....
....paper. Early papers on POMDP problem formulations include those by Drake [16], Åström [3], and Aoki [2]. Sondik and Smallwood [46,47] are probably the first to address the computational difficulties associated with solving POMDPs. Earlier references on applications and algorithms are the surveys [37,30]. Analysis of the complexity of solving MDPs and POMDPs began with the work of Papadimitriou and Tsitsiklis [40], in which they formally establish NP-hardness results for a variety of finite-horizon POMDP problems but only conjecture the undecidability of the infinite-horizon POMDP case. ....
W. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, pages 47-66, 1991.
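The remark above that, under the discounted criterion, the optimal value is approximable to within any ε > 0 rests on a standard contraction argument. If every one-step reward satisfies |r| ≤ R_max and value iteration starts from V_0 = 0, then after N backups ‖V_N − V*‖∞ ≤ γ^N R_max / (1 − γ). The sketch below solves that bound for the smallest N that guarantees a tolerance ε. The helper name `vi_iterations_for_tolerance` is hypothetical, and the code is not taken from Lovejoy's survey or from the citing paper.

```python
import math


def vi_iterations_for_tolerance(gamma: float, r_max: float, eps: float) -> int:
    """Smallest N with gamma**N * r_max / (1 - gamma) <= eps (hypothetical helper).

    Assumes one-step rewards are bounded by r_max in absolute value and value
    iteration starts from V_0 = 0, so ||V_N - V*||_inf <= gamma**N * r_max / (1 - gamma).
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("discount factor must lie in (0, 1)")
    if r_max < 0.0 or eps <= 0.0:
        raise ValueError("need r_max >= 0 and eps > 0")
    initial_gap = r_max / (1.0 - gamma)  # bound on ||V_0 - V*|| = ||V*||
    if initial_gap <= eps:
        return 0
    # Each Bellman backup shrinks the sup-norm error by at least a factor gamma.
    return math.ceil(math.log(eps / initial_gap) / math.log(gamma))


print(vi_iterations_for_tolerance(0.9, 1.0, 0.01))  # 66
```

The bound is conservative. In practice, implementations usually stop on the change between successive iterates rather than on this a priori count.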
....our results. Early papers on POMDP problem formulations include those by Drake [10], Åström [3], and Aoki [2]. Sondik and Smallwood [30,31] are probably the first to address the computational difficulties associated with solving POMDPs. Key references on applications and algorithms are the surveys [21,19]. Analysis of the complexity of solving POMDPs began with the work of Papadimitriou and Tsitsiklis [25], in which they formally establish NP-hardness results for a variety of finite-horizon problems but only conjecture the undecidability of the infinite-horizon case. The computational ....
W. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, pages 47--66, 1991.
.... In this chapter, I discuss algorithms that address this issue using a parameterized representation of the exact value functions produced in value iteration. Other algorithms attempt to represent approximations of the infinite-horizon value function more directly [100], but I will not discuss these representations here. 7.3 Algorithms for Solving Information-State MDPs. The information-state MDP is a special kind of MDP, and many different algorithms are available for solving it. The algorithms in Chapter 2 do not apply directly, because those algorithms were ....
....provided that there is some linear predictor that performs well; could these results shed some light on learning good policies? Algorithmic approaches to solving POMDPs, especially in terms of the information-state MDP, have been surveyed extensively. The surveys by Monahan [109], Lovejoy [100], and White [176] are all clear and concise, and they contain a great deal of useful information. Although in this chapter I focus on approximating the infinite-horizon problem using exact methods for the finite horizon, other methods have been explored. Lovejoy's survey cites several other methods ....
[Article contains additional citation context not shown here]
William S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28:47–66, 1991.
....is the reason for using the term non-Markovian tasks in this paper. .... the belief state can be computed, which indicates the relative probabilities of being in different environmental states, and from which the optimal policy can be computed using techniques derived from dynamic programming (Lovejoy, 1991). If no a priori model of the environment is available, there are different ways to proceed. One way is to build a model online that learns to predict observations and rewards, and in this way learns to infer the environmental state at each point. A separate controller can then use these ....
Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28, 47--66.
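The context above notes that, given a model, the belief state can be computed and then used for dynamic programming. The update is Bayes' rule: after action a and observation o, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). This is a minimal sketch. It assumes a tabular model stored as NumPy arrays with the hypothetical layouts `T[a, s, s2] = P(s2 | s, a)` and `O[a, s2, o] = P(o | s2, a)`. The function name `belief_update` is also illustrative only.

```python
import numpy as np


def belief_update(b, T, O, a, o):
    """Bayes-filter update of a POMDP belief state (illustrative sketch).

    b: current belief, shape (S,), nonnegative and summing to 1
    T: assumed layout T[a, s, s2] = P(s2 | s, a), shape (A, S, S)
    O: assumed layout O[a, s2, o] = P(o | s2, a), shape (A, S, num_obs)
    """
    predicted = b @ T[a]                   # P(s2 | b, a) = sum_s b(s) T(s2 | s, a)
    unnormalized = O[a, :, o] * predicted  # weight by observation likelihood
    evidence = unnormalized.sum()          # P(o | b, a)
    if evidence == 0.0:
        raise ValueError("observation o has zero probability under belief b")
    return unnormalized / evidence


# Two states, one action that leaves the state unchanged, and a sensor
# that reports the true state with probability 0.85.
T = np.array([np.eye(2)])
O = np.array([[[0.85, 0.15], [0.15, 0.85]]])
print(belief_update(np.array([0.5, 0.5]), T, O, a=0, o=0))  # [0.85 0.15]
```

The normalizing constant `evidence` is P(o | b, a). The same quantity weights successor beliefs in the dynamic-programming backup over the information state.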
.... where the utility is an arbitrary function of the final state (or the accumulation of rewards received earlier). Recently, some have claimed that Markov decision processes (MDPs) (Bellman 1957; Puterman 1994) and partially observable Markov decision problems (POMDPs) (Monahan 1982; Lovejoy 1991; Zhang & Liu 1997; Kaelbling, Littman & Cassandra 1998) are the appropriate framework for developing decision-theoretic planners (e.g., Boutilier, Dearden & Goldszmidt 1995; Geffner & Bonet 1998). MDPs, POMDPs, and dynamical systems in general (Luenberger 1979) are based on the notion of a state. ....
....will let us characterise loop termination. 3 Comparison with Other Representations. 3.1 POMDPs and Bayesian nets. Here is a way to motivate this paper that is orthogonal to the presentation of Section 1: • We start with partially observable Markov decision problems (POMDPs) (Monahan 1982; Lovejoy 1991; Kaelbling et al. 1998) as the underlying model. • We describe states in terms of propositions or random variables. • We want to be able to express locality, meaning that some variables depend on only a few other variables; this gives us Bayesian networks (Pearl 1988) or dynamic Bayesian networks ....
[Article contains additional citation context not shown here]
Lovejoy, W. (1991). A survey of algorithmic methods for partially observable Markov decision processes, Annals of Operations Research 28: 47--66.
....to be undecidable. The results apply to corresponding approximation problems as well. 1 Introduction. We show that problems in probabilistic planning (Kushmerick, Hanks & Weld 1995; Boutilier, Dean & Hanks 1999) and infinite-horizon partially observable Markov decision processes (POMDPs) (Lovejoy 1991; White 1993) are uncomputable. These models are central to the study of decision-theoretic planning and stochastic control problems, and no computability results have previously been established for probabilistic planning. The undecidability of finding an optimal policy for an infinite-horizon ....
Lovejoy, W. 1991. A survey of algorithmic methods for partially observable Markov decision processes.
....necessary to disambiguate situations. POMDP Approach. Another strategy is to use hidden Markov model (HMM) techniques to learn a model of the environment, including the hidden state, and then use that model to construct a perfect-memory controller (Cassandra, Kaelbling & Littman, 1994; Lovejoy, 1991; Monahan, 1982). Chrisman (1992) showed how the forward-backward algorithm for learning HMMs could be adapted to learning POMDPs. He, and later McCallum (1993), also gave heuristic state-splitting rules that attempt to learn the smallest possible model for a given environment. The resulting model ....
Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28, 47–66.
....are noise-corrupted; at least part of the current state value is inaccessible. The POMDP requires information about how observations are related to state and action values and hence is more data-intensive than the MDP. Current exact numerical solution procedures for the POMDP are presented in [2, 3, 4, 5, 15, 18, 19, 21, 25, 26, 30]. This paper is organized as follows. In Section 2, the POMDP is defined and several fundamental results are presented on which our optimal solution approach is based. A discussion of how we integrate GA and MIP is presented in Section 3. Section 4 briefly reviews the random keys GA, dc niche ....
....add to X ## in order to produce X # so that A(#, X # ) A(#,X) 2.7 Determination of A(#, X) Assume A(#, X ## ) is given for finite set of witness points X ## # X. The algorithm in Figure 2 determines A(#, X # ) A(#,X) Note that since # # # #, v(x) # z(x) and u # # 0. See [2, 3, 4, 5, 15, 18, 19, 21, 25, 26, 30] for more details of LP-based algorithms for determining A(#, X) from # and A(#, X # ) Step 0. Set # # = A(#, X ## ) and X # = X ## . Step 1. Determine the x # # X that causes the maximization in the following problem to be attained: u # # MAX v(x) z(x) S.T. v(x) max x# : ....
W. S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28:47--66, 1991.
....1 This work was supported by AFOSR Grant F49620 96 10028. This paper was published in Proc. 1997 Conf. Decision and Control. Because of the uncertain nature of the underlying object types, dynamic sensor management problems are a class of partially observed Markov decision problems (POMDPs) [2, 1, 6, 7]. As such, this class of problems can be solved using stochastic dynamic programming [3]. However, for large numbers of objects, the required state space is high-dimensional, consisting of the conditional probability distributions of all of the objects. The sensor management problems discussed in ....
....constraints into expected-value constraints. The modified SM formulation can be solved using Lagrangian relaxation [10] to decouple the multi-object problem into many single-object problems. The single-object problems are of small dimension and can be solved using standard algorithms for POMDPs [6, 7]. The solutions of the single-object problems are used to solve iteratively for the optimal values of the coordination variables (Lagrange multipliers). The approach is illustrated using an example involving classification of 10,000 unknown objects. The remainder of this paper is ....
[Article contains additional citation context not shown here]
W. S. Lovejoy, "A survey of algorithmic methods for partially observable Markov decision processes," Annals of Operations Research, v. 28, 1991.
....resulting in the agent's uncertainty shrinking at each step. Generally, some actions in some situations will decrease the uncertainty while others will increase it. 2.3.2. Planning. A variety of algorithms have been developed for explicitly solving partially observable Markov decision processes [28]. However, the problem is so computationally challenging [29] that most techniques are too inefficient to be used on all but the smallest problems. The Witness algorithm [30, 31] finds exact solutions to discounted finite-horizon partially observable Markov decision processes using value iteration. ....
W. Lovejoy, "A survey of algorithmic methods for partially observable Markov decision processes," Annals of Operations Research, vol. 28, pp. 47--66, 1991.
....the value function of a POMDP with 2 states. 4 Literature Review. 4.1 History: Operations Research and Artificial Intelligence. POMDPs were formulated by A. Drake in 1962 ([11]) and have attracted the attention of researchers in Operations Research (see Monahan's and Lovejoy's surveys [25] and [21]). About ten years ago, researchers in the Artificial Intelligence community approached the field, and in recent years there have been a number of PhD theses about POMDPs ([24], [18], [15], [3], and [12]). 4.2 Piecewise Linear Convexity. Smallwood and Sondik formulated the optimal control problem ....
W.S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, Vol 28, pages 47-66, 1991.
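The context above refers to the piecewise linear convexity established by Smallwood and Sondik. For a finite horizon, the optimal value function over beliefs can be written as V(b) = max_i α_i · b for a finite set of vectors α_i. The sketch below only evaluates such a representation. The function name `pwlc_value` and the example vectors are hypothetical, and computing the vectors themselves (for instance by exact value iteration) is not shown.

```python
import numpy as np


def pwlc_value(b, alphas):
    """Evaluate V(b) = max_i alphas[i] . b; return (value, maximizing index).

    b: belief vector, shape (S,)
    alphas: one alpha vector per row, shape (K, S)
    """
    scores = alphas @ b           # value of each linear piece at belief b
    best = int(np.argmax(scores))
    return float(scores[best]), best


alphas = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]])
print(pwlc_value(np.array([0.5, 0.5]), alphas))  # (0.6, 2)
print(pwlc_value(np.array([1.0, 0.0]), alphas))  # (1.0, 0)
```

A maximum of linear functions is convex in b, which is the property the section heading names. In the policy-tree view, the maximizing index also identifies which plan is best at b.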
....complex; the agent chooses between actions based on the amount of information they provide, the amount of reward they produce, and how they change the state of the world. This paper is intended to make two contributions. The first is to recapitulate work from the operations research literature [36,42,56,59,64] and to describe its connection to closely related work in AI. The second is to describe a novel algorithmic approach for solving pomdps exactly. We begin by introducing the theory of Markov decision processes (mdps) and pomdps. We then outline a novel algorithm for solving pomdps off-line and ....
William S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28(1):47--65, 1991.
.... particular state, yielding a formalism known as Markov Decision Processes (MDP) [20]. When uncertainty is allowed with respect to the actual state of a system, for example because only part of the state is observable, an MDP is called a Partially Observable Markov Decision Process (POMDP) [14]. In an MDP the problem is to find optimal sequences of actions given state transitions; in a POMDP, optimal action sequences are sought with respect to probability distributions over the states. Although the MDP framework offers a very general and unrestrictive formalism for representing ....
W.S. Lovejoy, A survey of algorithmic methods for partially observable Markov decision processes, Annals of Operations Research 28 (1991) 47-66.
....shown to be undecidable. The results apply to corresponding approximation problems as well. 1 Introduction. We show that problems in probabilistic planning (Kushmerick, Hanks & Weld 1995; Boutilier, Dean & Hanks 1999) and infinite-horizon partially observable Markov decision processes (POMDPs) (Lovejoy 1991; White 1993) are uncomputable. These models are central to the study of decision-theoretic planning and stochastic control problems, and no computability results have previously been established for probabilistic planning. The undecidability of finding an optimal policy for an infinite-horizon ....
Lovejoy, W. 1991. A survey of algorithmic methods for partially observable Markov decision processes.
....only more complex; the agent chooses between actions that differ in the amount of information they provide, the amount of reward they produce, and how they change the state of the world. Much of the content of this paper is a recapitulation of work in the operations research literature [28, 33, 48, 50, 54]. We have developed new ways of viewing the problem that are, perhaps, more consistent with the AI perspective; for example, we give a novel development of exact finite-horizon pomdp algorithms in terms of policy trees instead of the classical algebraic approach [48]. We begin by introducing the ....
William S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28(1):47--65, 1991.
....complex; the agent chooses between actions based on the amount of information they provide, the amount of reward they produce, and how they change the state of the world. This paper is intended to make two contributions. The first is to recapitulate work from the operations research literature [30,35,50,52,57] and to describe its connection to closely related work in AI. The second is to describe a novel algorithmic approach for solving pomdps exactly. We begin by introducing the theory of Markov decision processes (mdps) and pomdps. We then outline a novel algorithm for solving pomdps off-line and ....
William S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28(1):47--65, 1991.
....for the sample environment is P 3 (red) R, P 3 (blue) L that takes 5 total steps to reach the goal. This paper is concerned with finding optimal memoryless policies. The problem of finding more general optimal policies is addressed by research on partially observable Markov decision processes [6, 2] and is even harder in some sense although the policies are typically of higher quality. We can also distinguish policies as to whether they are deterministic or probabilistic. The intractability arguments of the next section only apply to deterministic policies, which consistently choose the same ....
William S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28:47--66, 1991.
....status or precisely where it is. The theory of partially observable Markov decision processes (pomdp's) [Åström, 1965; Smallwood and Sondik, 1973; Cassandra et al., 1994] models this situation and provides a basis for computing optimal behavior. A variety of algorithms exist for solving pomdp's [Lovejoy, 1991], but because the problem is so computationally challenging [Papadimitriou and Tsitsiklis, 1987], most techniques are too inefficient to be used on all but the smallest problems (2 to 5 states [Cheng, 1988]). Recently, the Witness algorithm [Cassandra, 1994; Littman, 1994] has been used to solve ....
Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28:47--66.
....of as generating probabilistic plans. Second, there are many instances in which simple probabilistic plans perform nearly as well as much larger and more complicated deterministic plans; this notion is often exploited in the field of randomized algorithms. Work by Platzman (1981; described by Lovejoy, 1991) shows how randomized plans can be useful for planning in partially observable domains. 3. Looping Plans. Looping plans can be applied to infinite-horizon control. The complexity of plan existence and plan evaluation in flat domains (Theorems 1 and 2) does not depend on the ....
Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28 (1), 47--65.
No context found.
W. S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research 28, 47--66 (1991).
No context found.
Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research 28, 47--66.
No context found.
W. S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research 28, 47--66 (1991).
No context found.
W.S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28(1):47--65, 1991.
No context found.
William S. Lovejoy. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28:47--66, 1991.
CiteSeer.IST - Copyright Penn State and NEC