Results 1–10 of 71
A framework for sequential planning in multiagent settings
 Journal of Artificial Intelligence Research
, 2005
"... This paper extends the framework of partially observable Markov decision processes (POMDPs) to multiagent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over physical states of the environment and over models of other agents, and they use Bayesian ..."
Abstract

Cited by 129 (32 self)
This paper extends the framework of partially observable Markov decision processes (POMDPs) to multiagent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over physical states of the environment and over models of other agents, and they use Bayesian update to maintain their beliefs over time. The solutions map belief states to actions. Models of other agents may include their belief states and are related to agent types considered in games of incomplete information. We express the agents’ autonomy by postulating that their models are not directly manipulable or observable by other agents. We show that important properties of POMDPs, such as convergence of value iteration, the rate of convergence, and piecewise linearity and convexity of the value functions, carry over to our framework. Our approach complements a more traditional approach to interactive settings which uses Nash equilibria as a solution paradigm. We seek to avoid some of the drawbacks of equilibria, which may be non-unique and are not able to capture off-equilibrium behaviors. We do so at the cost of having to represent, process, and continually revise models of other agents. Since the agent’s beliefs may be arbitrarily nested, the optimal solutions to decision-making problems are only asymptotically computable. However, approximate belief updates and approximately optimal plans are computable. We illustrate our framework using a simple application domain, and we show examples of belief updates and value functions.
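The Bayesian belief update the abstract refers to is, in the single-agent (flat POMDP) case, a standard discrete Bayes filter. A minimal sketch in Python; the arrays `T` and `O` and their index layout are illustrative assumptions, not notation from the paper, and the interactive layer (beliefs over other agents' models) is not shown:

```python
# Minimal discrete POMDP belief update (a sketch under assumed conventions):
#   T[s][a][s2] = P(s2 | s, a)   -- transition probabilities
#   O[s2][a][o] = P(o | s2, a)   -- observation probabilities

def belief_update(belief, action, obs, T, O):
    """Bayes filter: b'(s2) is proportional to O(o|s2,a) * sum_s T(s2|s,a) b(s)."""
    states = range(len(belief))
    new_belief = [
        O[s2][action][obs] * sum(T[s][action][s2] * belief[s] for s in states)
        for s2 in states
    ]
    norm = sum(new_belief)
    if norm == 0:
        raise ValueError("observation has zero probability under this belief")
    return [p / norm for p in new_belief]
```

In the paper's interactive POMDPs, `belief` would additionally range over models of the other agents, and the update would recurse through their predicted actions, which is why exact solutions are only asymptotically computable.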
Recursive Markov decision processes and recursive stochastic games
 In Proc. of 32nd Int. Coll. on Automata, Languages, and Programming (ICALP’05
, 2005
"... Abstract. We introduce Recursive Markov Decision Processes (RMDPs) and Recursive Simple Stochastic Games (RSSGs), and study the decidability and complexity of algorithms for their analysis and verification. These models extend Recursive Markov Chains (RMCs), introduced in [EY05a,EY05b] as a natural ..."
Abstract

Cited by 52 (11 self)
We introduce Recursive Markov Decision Processes (RMDPs) and Recursive Simple Stochastic Games (RSSGs), and study the decidability and complexity of algorithms for their analysis and verification. These models extend Recursive Markov Chains (RMCs), introduced in [EY05a,EY05b] as a natural model for verification of probabilistic procedural programs and related systems involving both recursion and probabilistic behavior. RMCs define a class of denumerable Markov chains with a rich theory generalizing that of stochastic context-free grammars and multi-type branching processes, and they are also intimately related to probabilistic pushdown systems. RMDPs and RSSGs extend RMCs with one controller or two adversarial players, respectively. Such extensions are useful for modeling nondeterministic and concurrent behavior, as well as modeling a system’s interactions with an environment. We provide a number of upper and lower bounds for deciding, given an RMDP (or RSSG) A and probability p, whether player 1 has a strategy to force termination at a desired exit with probability at least p. We also address “qualitative” termination questions, where p = 1, and model checking questions.
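Termination probabilities of RMCs, which underlie the analysis described above, arise as the least fixed point of a system of polynomial equations. A hedged single-equation sketch: one component that either exits immediately with probability `q` or spawns two sequential subcalls with probability `p` (parameter names are illustrative), so the termination probability is the least non-negative solution of x = q + p·x², approximated by iterating from 0:

```python
# Least fixed point of x = q + p*x**2 by Kleene iteration from 0.
# Iterating from below converges monotonically to the *least* solution,
# which is the actual termination probability.

def rmc_termination(p, q, iters=10000):
    x = 0.0
    for _ in range(iters):
        x = q + p * x * x
    return x
```

For example, with p = 0.7 and q = 0.3 the equation has two roots, 3/7 and 1, and the iteration correctly converges to the smaller root 3/7; in the subcritical case p = 0.3, q = 0.7 it converges to 1 (almost-sure termination).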
Complexity of Planning with Partial Observability
 ICAPS 2004. Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling
, 2004
"... We show that for conditional planning with partial observability the problem of testing existence of plans with success probability 1 is 2EXPcomplete. This result completes the complexity picture for nonprobabilistic propositional planning. We also give new proofs for the EXPhardness of conditio ..."
Abstract

Cited by 48 (3 self)
We show that for conditional planning with partial observability the problem of testing existence of plans with success probability 1 is 2EXP-complete. This result completes the complexity picture for non-probabilistic propositional planning. We also give new proofs for the EXP-hardness of conditional planning with full observability and the EXPSPACE-hardness of conditional planning without observability. The proofs demonstrate how lack of full observability allows the encoding of exponential-space Turing machines in the planning problem, and how the necessity to have branching in plans corresponds to the move to a complexity class defined in terms of alternation from the corresponding deterministic complexity class. Lack of full observability necessitates the use of belief states, the number of which is exponential in the number of states, and alternation corresponds to the choices a branching plan can make.
On Decision Problems for Probabilistic Büchi Automata
, 2008
"... Probabilistic Büchi automata (PBA) are finitestate acceptors for infinite words where all choices are resolved by fixed distributions and where the accepted language is defined by the requirement that the measure of the accepting runs is positive. The main contribution of this paper is a complement ..."
Abstract

Cited by 41 (5 self)
Probabilistic Büchi automata (PBA) are finite-state acceptors for infinite words where all choices are resolved by fixed distributions and where the accepted language is defined by the requirement that the measure of the accepting runs is positive. The main contribution of this paper is a complementation operator for PBA and a discussion of several algorithmic problems for PBA. All interesting problems, such as checking emptiness or equivalence for PBA or checking whether a finite transition system satisfies a PBA specification, turn out to be undecidable. An important consequence of these results is a series of undecidability results for stochastic games with incomplete information, modelled by partially observable Markov decision processes and ω-regular winning objectives. Furthermore, we discuss an alternative semantics for PBA where it is required that almost all runs for an accepted word are accepting, which turns out to be less powerful but has a decidable emptiness problem.
Feature reinforcement learning: Part I. Unstructured MDPs
 Journal of General Artificial Intelligence
, 2009
"... www.hutter1.net Generalpurpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and nonMarkovian. On the other hand, reinforcement learning is welldeveloped for small finite state Markov decision processes (MDPs). Up ..."
Abstract

Cited by 23 (9 self)
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite-state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II.
Formal models and algorithms for decentralized control of multiple agents
 Journal of Autonomous Agents and MultiAgent Systems
, 2008
"... Over the last five years, the AI community has shown considerable interest in decentralized control of multiple decision makers or “agents ” under uncertainty. This problem arises in many application domains, such as multirobot coordination, manufacturing, information gathering, and load balancing. ..."
Abstract

Cited by 18 (9 self)
Over the last five years, the AI community has shown considerable interest in decentralized control of multiple decision makers or “agents” under uncertainty. This problem arises in many application domains, such as multi-robot coordination, manufacturing, information gathering, and load balancing. Such problems must be treated as decentralized decision problems because each agent may have different partial information about the other agents and about the state of the world. It has been shown that these problems are significantly harder than their centralized counterparts, requiring new formal models and algorithms to be developed. Rapid progress in recent years has produced a number of different frameworks, complexity results, and planning algorithms. The objectives of this paper are to provide a comprehensive overview of these results, to compare and contrast the existing frameworks, and to provide a deeper understanding of their relationships with one another, their strengths, and their weaknesses. While we focus on cooperative systems, we do point out important connections with game-theoretic approaches. We analyze five different formal frameworks, three different optimal algorithms, as well as a series of approximation techniques. The paper provides interesting insights into the structure of decentralized problems, the expressiveness of …
Qualitative analysis of partiallyobservable Markov decision processes
 In CoRR: 0909.1645
, 2009
"... Abstract. We study observationbased strategies for partiallyobservable Markov decision processes (POMDPs) with omegaregular objectives. An observationbased strategy relies on partial information about the history of a play, namely, on the past sequence of observations. We consider the qualitative ..."
Abstract

Cited by 17 (9 self)
We study observation-based strategies for partially observable Markov decision processes (POMDPs) with ω-regular objectives. An observation-based strategy relies on partial information about the history of a play, namely, on the past sequence of observations. We consider the qualitative analysis problem: given a POMDP with an ω-regular objective, decide whether there is an observation-based strategy to achieve the objective with probability 1 (almost-sure winning) or with positive probability (positive winning). Our main results are twofold. First, we present a complete picture of the computational complexity of the qualitative analysis of POMDPs with parity objectives (a canonical form to express ω-regular objectives) and its subclasses. Our contribution consists in establishing several upper and lower bounds that were not known in the literature. Second, we present optimal bounds (matching upper and lower bounds) on the memory required by pure and randomized observation-based strategies for the qualitative analysis of POMDPs with parity objectives and its subclasses.
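A key fact behind qualitative analyses of this kind is that exact belief probabilities are irrelevant: only the *support* of the belief (the set of states the play might be in) matters, so strategies can be built over a subset construction. A minimal sketch of one step of that construction, under the simplifying (assumed) model that each state emits one deterministic observation; `trans` and `obs_of` are illustrative names, not the paper's notation:

```python
# One step of the belief-support subset construction: from a set of
# possible current states, compute the possible next supports, split by
# which observation the strategy would see.

def support_successors(support, action, trans, obs_of):
    """trans[s][a] = set of states reachable with positive probability;
    obs_of[s] = the (deterministic) observation emitted in state s.
    Returns {observation: next belief support}."""
    succ = set()
    for s in support:
        succ.update(trans[s][action])
    by_obs = {}
    for s2 in succ:
        by_obs.setdefault(obs_of[s2], set()).add(s2)
    return {o: frozenset(ss) for o, ss in by_obs.items()}
```

Iterating this map from the initial support explores only finitely many supports (at most 2^|S|), which is what makes almost-sure and positive winning questions amenable to algorithmic analysis where quantitative questions are not.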
POMDP-based coding rate adaptation for type-I hybrid ARQ systems over fading channels with memory
 IEEE Trans. Wireless Commun
, 2006
"... Abstract — We address the issue of optimal coding rate scheduling for adaptive typeI hybrid automatic repeat request wireless systems. In this scheme, the coding rate is varied depending on channel, buffer and incoming traffic conditions. In general, we consider the hidden Markov model for both tim ..."
Abstract

Cited by 16 (1 self)
We address the issue of optimal coding rate scheduling for adaptive type-I hybrid automatic repeat request (ARQ) wireless systems. In this scheme, the coding rate is varied depending on channel, buffer, and incoming traffic conditions. In general, we consider a hidden Markov model for both the time-varying flat fading channel and the bursty correlated incoming traffic. It is shown that the appropriate framework for computing the optimal coding rate allocation policies is the partially observable Markov decision process (POMDP). In this framework, the optimal coding rate allocation policy maximizes the reward function, which is a weighted sum of throughput and buffer occupancy with appropriate sign. Since a polynomial amount of space is needed to calculate the optimal policy even for a simple POMDP problem, we investigate maximum-likelihood, voting, and QMDP heuristic approaches for the purpose of efficient and real-time solution. Our results show that the three heuristics perform close to the completely observable system state case if the fading and/or traffic state mixing rate is slow. On the other hand, when the channel fading is fast, the QMDP heuristic is the most throughput-efficient among the considered heuristics. Also, its performance is close to the optimal coding rate allocation policy of the fully observable system state case. We also explore the performance of the proposed heuristics in the bursty correlated traffic case and show that the maximum-likelihood and voting heuristics consistently outperform the non-adaptive case.
Index Terms—Packet scheduling, partially observable Markov decision process, wireless time-varying fading channel, adaptive type-I hybrid automatic repeat request, hidden Markov model.
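The QMDP heuristic mentioned in the abstract is well known from the POMDP literature: solve the underlying fully observable MDP by value iteration, then act greedily on the belief-weighted Q-values, Q(b, a) = Σ_s b(s)·Q(s, a). A minimal sketch (the array layouts for `T` and `R` are illustrative assumptions, not the paper's notation):

```python
# QMDP heuristic sketch:
#   T[s][a][s2] = P(s2 | s, a), R[s][a] = immediate reward (assumed layout).

def qmdp_action(belief, T, R, gamma=0.95, iters=200):
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states
    for _ in range(iters):  # value iteration on the underlying MDP
        Q = [[R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        V = [max(Q[s]) for s in range(n_states)]
    # pick the action maximizing the belief-weighted Q-value
    scores = [sum(belief[s] * Q[s][a] for s in range(n_states))
              for a in range(n_actions)]
    return max(range(n_actions), key=scores.__getitem__)
```

QMDP implicitly assumes the state becomes fully observable after one step, which is consistent with the abstract's finding that it works best when the channel mixes quickly.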
Quantitative model checking revisited: neither decidable nor approximable
 In FORMATS’07, LNCS 4763
, 2007
"... Abstract. Quantitative model checking computes the probability values of a given property quantifying over all possible schedulers. It turns out that maximum and minimum probabilities calculated in such a way are overestimations on models of distributed systems in which components are loosely coup ..."
Abstract

Cited by 16 (10 self)
Quantitative model checking computes the probability values of a given property quantifying over all possible schedulers. It turns out that maximum and minimum probabilities calculated in such a way are overestimations on models of distributed systems in which components are loosely coupled and share little information with each other (and hence arbitrary schedulers may be too powerful). Therefore, we focus on the quantitative model checking problem restricted to distributed schedulers that are obtained only as a combination of local schedulers (i.e., the schedulers of each component) and show that this problem is undecidable. In fact, we show that there is no algorithm that can compute an approximation to the maximum probability of reaching a state within a given bound when restricted to distributed schedulers.