Results 1–10 of 14
Value-function approximations for partially observable Markov decision processes
 Journal of Artificial Intelligence Research
, 2000
Abstract

Cited by 168 (1 self)
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price — exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
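The indirect observability this abstract describes is typically handled by maintaining a belief state, a probability distribution over the hidden states that is updated by Bayes' rule after each action and observation; the approximation methods surveyed all operate on such beliefs. A minimal stdlib-only sketch of the standard update (the toy `T`/`O` tensor encoding is illustrative, not taken from the paper):

```python
def belief_update(b, a, o, T, O):
    """One Bayes-filter step for a POMDP belief state.

    b: list, b[s] = current P(state = s)
    T: T[a][s][s2] = P(next state s2 | state s, action a)
    O: O[a][s2][o] = P(observation o | next state s2, action a)
    Returns the normalized posterior belief after taking a and seeing o.
    """
    n = len(b)
    unnorm = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    z = sum(unnorm)  # probability of observing o; assumed > 0 here
    return [u / z for u in unnorm]
```

For a two-state model with an informative observation, a single update shifts a uniform belief toward the state most consistent with what was observed.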
Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes
 Journal of Artificial Intelligence Research
, 2001
Abstract

Cited by 61 (4 self)
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: It enabled value iteration to converge after only a few iterations on all the test problems. 1. Introduction POMDPs model sequential decision making problems where effects of actions are nondeterministic and the state of the world is not known with certainty. They have attracted many researchers in Operations Research and Artificial Intelligence because of their potential applications in a wide range of areas (Monahan 1982, Cassandra 1998b), one of which is planning under uncertai...
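Value iteration itself is easy to state; the difficulty this paper targets is the number of Bellman backups needed before convergence. For reference, a stdlib-only sketch of plain value iteration in the fully observable case, stopped when the Bellman residual falls below a tolerance (the POMDP version replaces the state-value vector with sets of linear functions over beliefs; the `T`/`R` encoding here is an illustrative assumption):

```python
def value_iteration(T, R, gamma=0.9, eps=1e-9):
    """Plain value iteration for a fully observable MDP.

    T: T[a][s][s2] = P(s2 | s, a)
    R: R[s][a] = immediate reward
    Repeats Bellman backups until successive value vectors differ by < eps.
    """
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    while True:
        V_new = [max(R[s][a] + gamma * sum(T[a][s][s2] * V[s2]
                                           for s2 in range(n_s))
                     for a in range(n_a))
                 for s in range(n_s)]
        if max(abs(v1 - v0) for v1, v0 in zip(V_new, V)) < eps:
            return V_new
        V = V_new
```

Because the residual shrinks only geometrically (by a factor of `gamma` per sweep), discount factors close to 1 force many iterations, which is exactly the slowness the paper's acceleration method attacks.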
Incremental methods for computing bounds in partially observable Markov decision processes
 In Proceedings of the Fourteenth National Conference on Artificial Intelligence
, 1997
Abstract

Cited by 40 (1 self)
Partially observable Markov decision processes (POMDPs) allow one to model complex dynamic decision or control problems that include both action outcome uncertainty and imperfect observability. The control problem is formulated as a dynamic optimization problem with a value function combining costs or rewards from multiple steps. In this paper we propose, analyse and test various incremental methods for computing bounds on the value function for control problems with infinite discounted horizon criteria. The methods described and tested include novel incremental versions of the grid-based linear interpolation method and a simple lower bound method with Sondik's updates. Both of these can work with arbitrary points of the belief space and can be enhanced by various heuristic point selection strategies. Also introduced is a new method for computing an initial upper bound: the fast informed bound method. This method is able to improve significantly on the standard and commonly used upper boun...
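The two bound families mentioned here have simple shapes, because the optimal POMDP value function is convex over the belief simplex: a lower bound is the maximum of a set of linear "alpha-vectors" (the form Sondik's updates produce), and a simple upper bound interpolates the values at the simplex corners (the fully observable MDP values). A sketch of evaluating both at a belief point, assuming the alpha-vectors and corner values are already given (this is not the paper's incremental algorithm, only the bound shapes it refines):

```python
def lower_bound(b, alphas):
    """Sondik-style lower bound: max over linear alpha-vectors at belief b."""
    return max(sum(ai * bi for ai, bi in zip(alpha, b)) for alpha in alphas)

def corner_upper_bound(b, v_corner):
    """Simple upper bound: linear interpolation of the corner
    (fully observable MDP) values across the belief simplex."""
    return sum(vi * bi for vi, bi in zip(v_corner, b))
```

For bounds derived from a consistent model, the true value at any belief is sandwiched between these two quantities, which is what makes incremental tightening at selected belief points useful.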
The Censored Newsvendor and the Optimal Acquisition of Information
 Operations Research
, 1998
Abstract

Cited by 33 (3 self)
This paper investigates the effect of demand censoring on the optimal policy in newsvendor inventory models with general parametric demand distributions and unknown parameter values. We show that the newsvendor problem with observable lost sales reduces to a sequence of single-period problems while the newsvendor problem with unobservable lost sales requires a dynamic analysis. Using a Bayesian Markov decision process approach we show that the optimal inventory level in the presence of partially observable demand is higher than when demand is completely observed. We explore the economic rationality for this observation and illustrate it with numerical examples. Key words: Inventory, Bayesian Markov decision processes, lost sales, demand estimation, censoring. In spite of the extensive research on inventory models, there still remain some practical issues that have not received due consideration. One of them is demand estimation and its effect on optimal policies. Most results in stocha...
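The single-period benchmark behind this comparison is the classical critical-fractile quantity, q* = F⁻¹(cᵤ/(cᵤ+cₒ)); the paper's point is that with censored (unobservable lost-sales) demand the optimal level exceeds this myopic quantity, because stocking more also buys information about demand. A sketch of the myopic benchmark for normally distributed demand (the price/cost parameters below are illustrative, and salvage value is assumed zero):

```python
from statistics import NormalDist

def newsvendor_quantity(price, cost, mean, sd):
    """Myopic (fully observed) newsvendor quantity for Normal(mean, sd) demand.

    Critical fractile = underage / (underage + overage); with zero salvage,
    underage = price - cost and overage = cost, giving (price - cost) / price.
    """
    frac = (price - cost) / price
    return NormalDist(mean, sd).inv_cdf(frac)
```

At a 50% margin the fractile is 0.5 and the myopic order equals mean demand; richer margins push the order up the demand distribution, and censoring (per the paper) pushes it higher still.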
Planning and control in stochastic domains with imperfect information
, 1997
Abstract

Cited by 32 (5 self)
Partially observable Markov decision processes (POMDPs) can be used to model complex control problems that include both action outcome uncertainty and imperfect observability. A control problem within the POMDP framework is expressed as a dynamic optimization problem with a value function that combines costs or rewards from multiple steps. Although the POMDP framework is more expressive than other simpler frameworks, like Markov decision processes (MDP), its associated optimization methods are more demanding computationally and only very small problems can be solved exactly in practice. Our work focuses on two possible approaches that can be used to solve larger problems: approximation methods and exploitation of additional problem structure. First, a number of new efficient approximation methods and improvements of existing algorithms are proposed. These include (1) the fast informed bound method based on approximate dynamic programming updates that lead to piecewise linear and convex v...
Algorithms for Partially Observable Markov Decision Processes
 HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY
, 2001
Abstract

Cited by 20 (1 self)
Partially Observable Markov Decision Process (POMDP) is a general sequential decision-making model where the effects of actions are...
Bayesian Sequential Decision Processes with Censored Observations: Derivative Analysis and Applications
Abstract
Censored (or truncated) observations are quite prevalent in practice. How does the existence of imperfect observations affect optimal decisions? We consider this problem in the setting of a general finite-horizon Bayesian sequential decision process. We first prove a general derivative result that resembles the classic envelope theorem. We then show that for a class of Bayesian sequential decision processes, there exists a recursive formula for the first-order derivative of the Bayesian dynamic programming objective function. With this formula, the derivative can be computed by a simple backward iteration, and the optimal decision in each period can be determined based on the first-order condition. For illustrative purposes, we give three application examples in the contexts of dynamic pricing, online auction, and optimal inventory control. The insights obtained from these applications all suggest that the existence of imperfect observations tends to shift optimal decisions in the direction of reducing the effect of censoring.
An Envelope Theorem for Bayesian Dynamic Program and Its Application to an Inventory Problem
, 2009
Abstract
A generalized envelope theorem is established for a Bayesian dynamic programming problem. An application of the theorem is given in a Bayesian inventory management problem with unobserved lost sales. Specifically, we show that the optimal inventory level with unobserved lost sales is greater than the optimal inventory level with observed lost sales. We prove this result under a continuous demand distribution, which complements the existing work in the literature. We further comment that the results can be easily extended to the Markov-modulated demand process.
1 Project Description
Abstract
This project will examine the problem of controlling the inventory of a product with partially observed, nonstationary, random demand. The probability distribution for the demand process is not known with certainty at any point in time, and this distribution may randomly change from one period to the next. The underlying demand process is partially observed through the previous demand observations, which are themselves random. Because the control decisions are made with only partial information about the demand process, the level of uncertainty and the cost of suboptimal decisions are much higher than for most problems considered in the research literature. The nonstationary aspect of the demand process further increases the uncertainty because older observations of demand are less valuable in identifying the current state than more recent observations. This problem is an accurate representation of the inventory control problems faced by many organizations. However, it has not been directly addressed in the inventory literature or by existing decision support systems; therefore, inventory managers are forced to make potentially costly simplifying assumptions when addressing this challenging problem. The primary objectives of this project are to 1. Develop a modeling framework that adequately captures the important aspects of the problem,
Multidisciplinary Development of a Proposed Early Warning and Automated Response System
Abstract
It is becoming increasingly clear that changes in the environment can promote growth of disease vectors like mosquitoes and rodents as well as a rise in waterborne illnesses like cholera and other enteric diseases. Our paper is based on the concept that a multidisciplinary informatics scaffold can serve as an Early Warning and Automated Response System (EWARS) for a variety of communicable diseases. It will utilize multiple information resources including Remote Sensing (RS), Global Positioning Systems (GPS) and Geographic Information Systems (GIS) for exposure assessment (John et al 2004). Early warning systems are not new (Witt et al 2009). However, what is novel in the proposal is the integration of an adaptive Fuzzy Logic Decision Support System into the early warning component being developed by the investigative team. It also markedly differs from other efforts to provide decision support in the unique degree of detail and process guidance delivered, as well as in the broad multidisciplinary nature of the study group. Use of such a system will enable prophylactic public health interventions to be rapidly deployed based on a real-time, scientifically based assessment of the threat environment. Particular attention will be paid to environmental factors amenable to corrective interventions. The informatics component of the system will be based on open source software, and the software developed during this project will also be in the open source domain. The project is multinational and multidisciplinary.