Edward Sondik. The Optimal Control of Partially Observable Markov Decision Processes. PhD thesis, Stanford University, 1971.
....be nice to take into account the fact that monitoring will be approximate. This may lead to ways to solve POMDPs in a bounded optimal fashion. Partially Observable Markov Decision Processes (POMDPs) The POMDP concept was first introduced in the control theory and operations research communities [9, 1, 35, 31, 6, 29] as a framework to model stochastic dynamical systems and to make optimal decisions. This framework was later considered by the artificial intelligence community as a principled approach to planning under uncertainty [4, 22]. Compared to other methods, POMDPs have the advantage of a ....
....Note as well that many aspects of this algorithm will be reused by the algorithms introduced for error bound computation and approximation search. Among the policy iteration algorithms, Hansen's algorithm [13, 14] is reviewed because it is conceptually simple (compared to Sondik's algorithm [35, 36]), it has been observed to outperform value iteration algorithms on several examples, and it circumvents the need for belief state monitoring by computing policies represented by finite state controllers. 2.3.1 Incremental Pruning Incremental pruning was first introduced by Zhang and Liu [38] and ....
Edward J. Sondik. The optimal control of partially observable Markov decision processes. PhD thesis, Stanford University, Palo Alto, 1971.
....[Figure 4: Linear program (DOMINATESLP) for checking whether a set of vectors dominates a vector.] .... is very successful, and often converges much faster than value iteration. Sondik [1978] suggested the use of policy iteration for POMDPs, proposing that the policy be represented as a finite state machine. Hansen [1998] proposed a more practical and implementable version of policy iteration for POMDPs, and proved that it converges to the optimal policy. He also showed, empirically, ....
E. J. Sondik. The optimal control of partially observable Markov decision processes over the infinite horizon: Discounted costs. Operations Research, 26:282--304, 1978.
.... things are not quite as bad as they might seem, since knowledge of the observation history may be summarized in the form of a belief state (the current distribution over states), which is itself updated based only upon the current observation, and knowledge of which is sufficient for optimal behaviour [1, 31, 32]. Thus, the main contribution of this paper is to extend GPOMDP to classes of parameterized policies whose actions depend on an internal state (rather than the current observation or a finite history of observations). Since, in general, it will not be possible to store the entire belief state for ....
E. J. Sondik. The optimal control of partially observable Markov decision processes over the infinite horizon: Discounted costs. Operations Research, 26, 1978.
.... things are not quite as bad as they might seem, since knowledge of the observation history may be summarized in the form of a belief state (the current distribution over states), which is itself updated based only upon the current observation, and knowledge of which is sufficient for optimal behaviour [1, 31, 32]. Thus, the main contribution of this paper is to extend GPOMDP to classes of parameterized policies whose actions depend on an internal state (rather than the current observation or a finite history of observations). Since, in general, it will not be possible to store the entire belief state for ....
R. D. Smallwood and E. J. Sondik. The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21:1071--1088, 1973.
....[Figure 4: Linear program for checking whether a set of vectors dominates a vector. It maximizes d over a belief b constrained to the probability simplex, and returns either "dominated" or b.] .... is very successful, and often converges much faster than value iteration. Sondik [1978] suggested the use of policy iteration for POMDPs, proposing that the policy be represented as a finite state machine. Hansen [1998] proposed a more practical and implementable version of policy iteration for POMDPs, and proved that it converges to the optimal policy. He also showed, empirically, ....
E. J. Sondik. The optimal control of partially observable Markov decision processes over the infinite horizon: Discounted costs. Operations Research, 26:282--304, 1978.
....are the speech utterances given by the speech recognition system, from which some knowledge about the current state can be inferred. By accepting the partial observability of the world, the dialogue problem becomes one that is addressed by Partially Observable Markov Decision Processes (POMDPs) (Sondik, 1971). Finding an optimal policy for a given POMDP model corresponds to defining an optimal dialogue strategy. Optimality is attained within the context of a set of rewards that define the relative value of taking various actions. We will show that conventional MDP solutions are insufficient, and that ....
....appropriately for better or worse sensing. As the speech recognition degrades, the POMDP policy acquires reward more slowly, but makes fewer mistakes and blind guesses compared to a conventional MDP policy. There are several POMDP algorithms that may be the natural choice for policy generation (Sondik, 1971; Monahan, 1982; Parr and Russell, 1995; Cassandra et al., 1997; Kaelbling et al., 1998; Thrun, 1999). However, solving real-world dialogue scenarios is computationally intractable for full-blown POMDP solvers, as the complexity is doubly exponential in the number of states. We therefore will use ....
E. Sondik. 1971. The Optimal Control of Partially Observable Markov Decision Processes. Ph.D. thesis, Stanford University, Stanford, California.
....are noise corrupted; at least part of the current state value is inaccessible. The POMDP requires information regarding how observations are related to state and action values and hence is more data intensive than the MDP. Current exact numerical solution procedures for the POMDP are presented in [2, 3, 4, 5, 15, 18, 19, 21, 25, 26, 30]. This paper is organized as follows. In Section 2, the POMDP is defined and several fundamental results are presented on which our optimal solution approach is based. A discussion of how we integrate GA and MIP is presented in Section 3. Section 4 briefly reviews the random keys GA, dc niche ....
....be accrued over the problem horizon, relative to the set of all strategies, conditioned on the a priori pmv. 2.2 Preliminary Results Let $X = \{x : x_i \ge 0,\ i \in S,\ \sum_{i \in S} x_i = 1\}$. Define $x(t) = \{x_i(t),\ i \in S\}$, where $x_i(t)$ is the probability that $s(t) = i$, given the history at stage $t$. It is shown in [25, 30] that the resulting process, the information process $\{x(t),\ t = 0, 1, \ldots, T\}$, is a sufficient statistic for the POMDP. Hence, it is sufficient to base action selection at stage $t$ on $x(t)$. Let $v_{t,T}(x)$ be the maximum expected total discounted reward to be accrued from stage $t$ through the terminal stage, ....
[Article contains additional citation context not shown here]
R. D. Smallwood and E. J. Sondik. The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21:1071--1088, 1973.
....applicability in two ways. First, the POMDP is more data intensive than the MDP, additionally requiring information regarding how observations are related to state and action values. Second, current numerical solution procedures for the POMDP are not tractable for realistically sized problems. See [21, 24, 27, 32, 33, 34, 35, 41, 42] for further discussions. This paper is organized as follows. In Section 2, the POMDP is defined and several fundamental results are presented on which our heuristics are based. The basics of GA are discussed in Section 3 within the context of the POMDP. Section 4 reviews the concept of random ....
E. J. Sondik. The optimal control of partially observable Markov decision processes over the infinite horizon: Discounted costs. Operations Research, 26:282--304, 1978.
....applicability in two ways. First, the POMDP is more data intensive than the MDP, additionally requiring information regarding how observations are related to state and action values. Second, current numerical solution procedures for the POMDP are not tractable for realistically sized problems. See [21, 24, 27, 32, 33, 34, 35, 41, 42] for further discussions. This paper is organized as follows. In Section 2, the POMDP is defined and several fundamental results are presented on which our heuristics are based. The basics of GA are discussed in Section 3 within the context of the POMDP. Section 4 reviews the concept of random ....
....accrued over the problem horizon, relative to the set of all strategies, conditioned on the a priori pmv. 2.2 Preliminary Results Let $X = \{x : x_i \ge 0,\ i \in S,\ \sum_{i \in S} x_i = 1\}$. Define $x(t) = \{x_i(t),\ i \in S\}$, where $x_i(t)$ is the probability that $s(t) = i$, given the history at stage $t$. It is shown in [33, 40] that the resulting process, the information process $\{x(t),\ t = 0, 1, \ldots, T\}$, is a sufficient statistic for the POMDP. Hence, it is sufficient to base action selection at stage $t$ on $x(t)$. Let $v_{t,T}(x)$ be the maximum expected total discounted reward to be accrued from stage $t$ through the terminal stage, ....
R. D. Smallwood and E. J. Sondik. The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21:1071--1088, 1973.
....be modeled as Markov decision processes (MDPs) (Bellman, 1957; Howard, 1960; Puterman, 1994; Boutilier, Dean, & Hanks, 1999) and their extensions. The model of choice for problems similar to patient management is the partially observable Markov decision process (POMDP) (Drake, 1962; Astrom, 1965; Sondik, 1971; Lovejoy, 1991b). The POMDP represents two sources of uncertainty: stochasticity of the underlying controlled process (e.g. disease dynamics in the patient management problem) and imperfect observability of its states via a set of noisy observations (e.g. symptoms, findings, results of tests). In ....
....to belief-state MDPs is that the belief space is infinite and we need to compute an update $V_i = H V_{i-1}$ for all of it. This poses the following threats: the value function for the $i$th step may not be representable by finite means and/or computable in a finite number of steps. To address this problem, Sondik (Sondik, 1971; Smallwood & Sondik, 1973) showed that one can guarantee the computability of the $i$th value function as well as its finite description for a belief-state MDP by considering only piecewise linear and convex representations of value function estimates (see Figure 4). In particular, Sondik showed ....
[Article contains additional citation context not shown here]
Sondik, E. J. (1971). The optimal control of partially observable Markov decision processes. Ph.D. thesis, Stanford University.
....a planner that models the effect of ambiguous sensing can act to disambiguate the user's intention by asking for confirmation of which box. Such a planning methodology does in fact exist; the Partially Observable Markov Decision Process (POMDP) maintains a probabilistic model of sensing and acting (Sondik, 1971). Whereas the simple Markov Decision Process (MDP) models the problem of unpredictability in how the world changes as a result of actions, the POMDP models the problem of not even being able to determine what the state is. Instead of making decisions based on the current perceived state of the ....
....stated above, but the current state at any point in time is not known, and can only be inferred from observations over time. 2.0.2 Partially Observable Markov Decision Processes Partially Observable Markov Decision Processes (POMDPs) are a framework for planning in partially observable worlds (Sondik, 1971; Cassandra et al., 1994). At no time is a system acting in the world guaranteed to know the state of the world, and the system may not even know its own state with respect to the world (e.g. its exact position may be only partially observable). The POMDP consists of an underlying, unobservable ....
[Article contains additional citation context not shown here]
Sondik, E. (1971). The Optimal Control of Partially Observable Markov Decision Processes. PhD thesis, Stanford University, Stanford, California.
....good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient ascent method for finding locally optimal stochastic policies. 1 INTRODUCTION In many application domains, the partially observable Markov decision process (POMDP) [1, 22, 23, 5, 9, 4, 13] is a much more realistic model than its completely observable counterpart, the classic MDP [11, 20]. However, the complexity resulting from the lack of observability limits the application of POMDPs to dramatically small decision problems. One of the difficulties of the optimal Bayesian ....
E.J. Sondik. The optimal control of partially observable Markov decision processes over the infinite horizon: Discounted costs. Operations Research, 26, 1978.
....good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient ascent method for finding locally optimal stochastic policies. 1 INTRODUCTION In many application domains, the partially observable Markov decision process (POMDP) [1, 22, 23, 5, 9, 4, 13] is a much more realistic model than its completely observable counterpart, the classic MDP [11, 20]. However, the complexity resulting from the lack of observability limits the application of POMDPs to dramatically small decision problems. One of the difficulties of the optimal Bayesian ....
R.D. Smallwood and E.J. Sondik. The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21:1071--1088, 1973.
No context found.
Edward Sondik. The Optimal Control of Partially Observable Markov Decision Processes. PhD thesis, Stanford University, 1971.
No context found.
Sondik, E. J. (1978). The optimal control of partially observable Markov decision processes over the infinite horizon: Discounted costs. Operations Research, 26.
No context found.
E.J. Sondik, The optimal control of partially observable Markov Decision Processes, Stanford, 1971
No context found.
Sondik, E. J. (1971). The optimal control of partially observable Markov decision processes. Doctoral dissertation, Stanford University.
No context found.
E. J. Sondik. The optimal control of partially observable Markov decision processes. PhD thesis, Stanford University, 1971.
No context found.
E. J. Sondik. The optimal control of partially observable Markov decision processes. PhD thesis, Stanford University, 1971.
No context found.
E. J. Sondik. The optimal control of partially observable Markov decision processes. PhD thesis, Stanford University, 1971.
No context found.
Edward J. Sondik. The optimal control of partially observable Markov decision processes. PhD thesis, Stanford University, Palo Alto, 1971.
No context found.
E. J. Sondik. The Optimal Control of Partially Observable Markov Decision Processes. PhD thesis, Stanford University, 1971.
No context found.
E. J. Sondik. The Optimal Control of Partially Observable Markov Decision Processes. PhD thesis, Stanford University, California, 1971.
No context found.
E. J. Sondik, "The optimal control of partially observable Markov decision processes," Ph.D. dissertation, Stanford University, 1971.
CiteSeer.IST - Copyright Penn State and NEC