Monte-Carlo Planning in Large POMDPs
In Advances in Neural Information Processing Systems 23, 2010
This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent’s belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, Monte-Carlo sampling is used to break the curse of dimensionality both during belief state updates and during planning. Second, only a black box simulator of the POMDP is required, rather than explicit probability distributions. These properties enable POMCP to plan effectively in significantly larger POMDPs than has previously been possible. We demonstrate its effectiveness in three large POMDPs. We scale up a well-known benchmark problem, rocksample, by several orders of magnitude. We also introduce two challenging new POMDPs: 10 × 10 battleship and partially observable PacMan, with approximately 10^18 and 10^56 states respectively. Our Monte-Carlo planning algorithm achieved a high level of performance with no prior knowledge, and was also able to exploit simple domain knowledge to achieve better results with less search. POMCP is the first general purpose planner to achieve high performance in such large and unfactored POMDPs.
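The abstract says the belief update needs only a black-box simulator, not explicit probability distributions. A minimal sketch of that idea, assuming a hypothetical simulator signature `(state, action, rng) -> (next_state, observation)`: keep a set of state particles, push sampled particles through the simulator, and retain only those whose simulated observation matches the real one (rejection sampling). This illustrates the general technique in the spirit of POMCP; it is not the authors' implementation, and `noisy_coin` is an invented toy domain.

```python
import random
from typing import Callable, Hashable, List, Tuple

# Hypothetical black-box simulator signature: (state, action, rng) -> (next_state, observation).
Simulator = Callable[[Hashable, int, random.Random], Tuple[Hashable, Hashable]]


def update_particles(
    particles: List[Hashable],
    action: int,
    observation: Hashable,
    simulator: Simulator,
    n_particles: int,
    rng: random.Random,
    max_tries: int = 100_000,
) -> List[Hashable]:
    """Monte-Carlo belief update by rejection sampling through a black-box simulator."""
    new_particles: List[Hashable] = []
    tries = 0
    while len(new_particles) < n_particles and tries < max_tries:
        tries += 1
        state = rng.choice(particles)                    # sample a state from the current belief
        next_state, obs = simulator(state, action, rng)  # one simulated step, no explicit model
        if obs == observation:                           # keep samples consistent with the real observation
            new_particles.append(next_state)
    return new_particles


def noisy_coin(state, action, rng):
    """Toy simulator (hypothetical): hidden bit never changes; sensor is correct 90% of the time."""
    obs = state if rng.random() < 0.9 else 1 - state
    return state, obs


rng = random.Random(0)
prior = [0] * 500 + [1] * 500
posterior = update_particles(prior, action=0, observation=1,
                             simulator=noisy_coin, n_particles=1000, rng=rng)
print(len(posterior), sum(posterior) / len(posterior))
```

With a uniform prior and one observation of `1`, roughly 90% of the surviving particles should be `1`. The `max_tries` cap guards against observations that the simulator almost never produces, in which case fewer than `n_particles` particles are returned.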
Multi-Robot Coordination with Periodic Connectivity
We consider the problem of multi-robot coordination subject to constraints on the configuration. Specifically, we examine the case in which a mobile network of robots must search, survey, or cover an environment while remaining connected. While many algorithms utilize continual connectivity for such tasks, we relax this requirement and introduce the idea of periodic connectivity, where the network must regain connectivity at a fixed interval. We show that, in some cases, this problem reduces to the well-studied NP-hard multi-robot informative path planning (MIPP) problem, and we propose an online algorithm that scales linearly in the number of robots and allows for arbitrary periodic connectivity constraints. We prove theoretical performance guarantees and validate our approach in the coordinated search domain in simulation and in real-world experiments. Our proposed algorithm significantly outperforms a gradient method that requires continual connectivity and performs competitively with a market-based approach, but at a fraction of the computational cost.
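To make the periodic-connectivity constraint concrete, here is a small checker, assuming (as a simplification not taken from the paper) a disk communication model in which two robots can talk when their distance is at most `comm_range`, and discrete time steps. Connectivity is only required at `t = 0, period, 2*period, ...`; between those instants robots are free to disconnect. The function names and trajectories are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def is_connected(points: Sequence[Point], comm_range: float) -> bool:
    """True if the disk graph (edge when distance <= comm_range) is connected."""
    if not points:
        return True
    seen = {0}
    stack = [0]
    while stack:                                   # depth-first search from robot 0
        i = stack.pop()
        for j, p in enumerate(points):
            if j not in seen and math.dist(points[i], p) <= comm_range:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(points)


def satisfies_periodic_connectivity(trajectories: List[List[Point]],
                                    comm_range: float, period: int) -> bool:
    """Check connectivity only at t = 0, period, 2*period, ...; robots may disconnect in between."""
    horizon = len(trajectories[0])
    for t in range(0, horizon, period):
        if not is_connected([traj[t] for traj in trajectories], comm_range):
            return False
    return True


base = [(0.0, 0.0)] * 4
scout = [(0.5, 0.0), (3.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
print(satisfies_periodic_connectivity([base, scout], comm_range=1.0, period=2))  # True
print(satisfies_periodic_connectivity([base, scout], comm_range=1.0, period=1))  # False
```

The scout leaves communication range on odd steps, which violates continual connectivity (`period=1`) but satisfies a period of 2, illustrating the extra freedom the relaxed constraint gives a planner.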
Relatively Robust Grasping
In this paper, we present an approach for robustly grasping objects under positional uncertainty. We maintain a belief state (a probability distribution over world states), model the problem as a partially observable Markov decision process (POMDP), and select actions with a receding horizon using forward search through the belief space. Our actions are world-relative trajectories, or fixed trajectories expressed relative to the most-likely state of the world. We localize the object, ensure its reachability, and robustly grasp it at a goal position by using information-gathering, reorientation, and goal actions. We choose among candidate actions in a tractable way online by computing and storing the observation models needed for belief update offline. This framework is used to successfully grasp objects (including a power drill and a Brita pitcher) despite significant uncertainty, both in simulation and with an actual robot arm.
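The split between offline observation models and online belief updates can be sketched for the simplest case: a static object over a few discrete positions and a hypothetical contact sensor. The table `P(o | s)` is computed once offline, so each online Bayes update is a lookup and a normalization. The most-likely state is what a world-relative trajectory would be expressed relative to. The sensor model and all names here are illustrative assumptions; the paper's actual state space and action effects are richer.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Belief = Dict[Hashable, float]
ObsTable = Dict[Tuple[Hashable, Hashable], float]


def precompute_observation_model(states: Iterable[Hashable], observations: Iterable[Hashable],
                                 likelihood: Callable[[Hashable, Hashable], float]) -> ObsTable:
    """Offline step: tabulate P(o | s) once so online belief updates are just lookups."""
    observations = list(observations)
    return {(s, o): likelihood(s, o) for s in states for o in observations}


def belief_update(belief: Belief, obs: Hashable, table: ObsTable) -> Belief:
    """Online Bayes update for a static object: b'(s) is proportional to P(o | s) b(s)."""
    unnorm = {s: p * table[(s, obs)] for s, p in belief.items()}
    z = sum(unnorm.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: v / z for s, v in unnorm.items()}


def most_likely_state(belief: Belief) -> Hashable:
    """The state that world-relative trajectories would be expressed relative to."""
    return max(belief, key=belief.get)


# Hypothetical contact sensor: reports "contact" with probability 0.8 when the object is at position 1.
def contact_likelihood(s, o):
    p_contact = 0.8 if s == 1 else 0.2
    return p_contact if o == "contact" else 1.0 - p_contact


positions = [0, 1, 2]
table = precompute_observation_model(positions, ["contact", "none"], contact_likelihood)
b = belief_update({s: 1 / 3 for s in positions}, "contact", table)
print(b, most_likely_state(b))
```

Starting from a uniform belief, one "contact" reading moves the belief to about 1/6, 2/3, 1/6, and position 1 becomes the most-likely state.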
Bayesian Theory of Mind
We present a computational framework for Theory of Mind (ToM): the human ability to make joint inferences about the unobservable beliefs and preferences underlying the observed actions of other agents. These mental state attributions can be understood as Bayesian inferences in a probabilistic generative model for rational action, or planning under uncertain and incomplete information, formalized as a Partially Observable Markov Decision Problem (POMDP). That is, we posit that ToM inferences approximately reconstruct the combination of a reward function and belief state trajectory for an agent based on observing that agent’s action sequence in a given environment. We test this POMDP model by showing human subjects the trajectories of agents moving in simple spatial environments and asking for joint inferences about the agents’ utilities and beliefs about unobserved aspects of the environment. Our model performs substantially better than two simpler variants: one in which preferences are inferred without reference to an agent’s beliefs, and another in which beliefs are inferred without reference to the agent’s dynamic observations in the environment. We find that preference inferences are substantially more robust and consistent with our model’s predictions than are belief inferences, in line with classic work showing that the ability to infer goals is more concretely grounded in visual data, develops earlier in infancy, and can be localized to specific neurons in the primate brain.
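The core inversion step (observe actions, infer what the agent wants) can be illustrated with a deliberately reduced model: a fully observable 1-D corridor, a set of candidate goal positions, and a Boltzmann-rational agent whose action probability grows with how much it reduces distance to the goal. This sketch infers preferences only and omits beliefs, so it corresponds to the simpler "preferences without beliefs" variant the abstract contrasts against, not the full POMDP model. The corridor, `beta`, and goal set are hypothetical.

```python
import math
from typing import Dict, Sequence


def action_likelihood(x: int, a: int, goal: int, beta: float, actions=(-1, 1)) -> float:
    """Boltzmann-rational policy: P(a | x, goal) proportional to exp(-beta * distance after a)."""
    scores = {b: -beta * abs(x + b - goal) for b in actions}
    m = max(scores.values())                          # subtract max for numerical stability
    z = sum(math.exp(v - m) for v in scores.values())
    return math.exp(scores[a] - m) / z


def goal_posterior(start: int, actions: Sequence[int], goals: Sequence[int],
                   beta: float = 2.0) -> Dict[int, float]:
    """Bayesian inverse planning: P(goal | actions) proportional to P(goal) * prod_t P(a_t | x_t, goal)."""
    post = {g: 1.0 / len(goals) for g in goals}       # uniform prior over goals
    x = start
    for a in actions:
        for g in goals:
            post[g] *= action_likelihood(x, a, g, beta)
        x += a                                        # the agent's observed trajectory
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}


post = goal_posterior(start=5, actions=[1, 1, 1], goals=[0, 10])
print(post)
```

Three consecutive steps to the right make goal 10 overwhelmingly more probable than goal 0. Lowering `beta` models a noisier agent and makes the same evidence less decisive.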
unknown title
The difficulties encountered in sequential decision-making problems under uncertainty are often linked to the large size of the state space. Exploiting the structure of the problem, for example by employing a factored representation, is usually an efficient approach but, in the case of partially observable Markov decision processes, the fact that some state variables may be visible has not been sufficiently appreciated. In this article, we present a complementary analysis and discussion about MOMDPs, a formalism that exploits the fact that the state space may be factored into one visible part and one hidden part. Starting from a POMDP description, we dig into the structure of the belief update, the value function, and the consequences for value iteration, specifically how classical algorithms can be adapted to this factorization, and demonstrate the resulting benefits through an empirical evaluation.
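With the state split into a visible part x and a hidden part y, the belief only has to cover y, because x' is observed directly. The standard mixed-observability update is b'(y') = η · O(o | x', y', a) · Σ_y T(x', y' | x, y, a) · b(y), where η normalizes. A minimal sketch follows, assuming hypothetical dictionary-returning model functions `T` and `O`; it shows the belief-update structure only, not the article's value-iteration adaptations.

```python
from typing import Callable, Dict, Hashable, Tuple

Dist = Dict[Hashable, float]
# Hypothetical model interfaces:
#   T(x, y, a) -> {(x_next, y_next): probability}
#   O(x_next, y_next, a) -> {observation: probability}
Transition = Callable[[Hashable, Hashable, Hashable], Dict[Tuple[Hashable, Hashable], float]]
ObsModel = Callable[[Hashable, Hashable, Hashable], Dist]


def momdp_belief_update(b_y: Dist, x, a, x_next, o, T: Transition, O: ObsModel) -> Dist:
    """Update the belief over the hidden part y only; the visible part x_next is observed directly."""
    new_b: Dist = {}
    for y, p_y in b_y.items():
        for (xn, yn), p_t in T(x, y, a).items():
            if xn == x_next:                      # discard transitions inconsistent with the observed x'
                new_b[yn] = new_b.get(yn, 0.0) + p_y * p_t
    for yn in new_b:
        new_b[yn] *= O(x_next, yn, a).get(o, 0.0)  # weight by the observation likelihood
    z = sum(new_b.values())
    if z == 0.0:
        raise ValueError("(x_next, o) has zero probability under the current belief")
    return {yn: v / z for yn, v in new_b.items()}


# Toy model (hypothetical): robot cell x is visible, target cell y is hidden, nothing moves.
def T(x, y, a):
    return {(x, y): 1.0}


def O(x_next, y_next, a):
    if x_next == y_next:
        return {"seen": 0.9, "not_seen": 0.1}
    return {"seen": 0.1, "not_seen": 0.9}


b = momdp_belief_update({0: 0.5, 1: 0.5}, x=0, a="stay", x_next=0, o="seen", T=T, O=O)
print(b)
```

The belief vector has one entry per hidden value instead of one per (visible, hidden) pair, which is where the savings come from when the visible part is large.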
Structured Parameter Elicitation
The behavior of a complex system often depends on parameters whose values are unknown in advance. To operate effectively, an autonomous agent must actively gather information on the parameter values while progressing towards its goal. We call this problem parameter elicitation. Partially observable Markov decision processes (POMDPs) provide a principled framework for such planning-under-uncertainty tasks, but they suffer from high computational complexity. However, POMDPs for parameter elicitation often possess special structural properties, specifically, factorization and symmetry. This work identifies these properties and exploits them for efficient solution through a factored belief representation. The experimental results show that our new POMDP solvers outperform SARSOP and MOMDP, two of the fastest general-purpose POMDP solvers available, and can handle significantly larger problems.
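A factored belief representation can be illustrated under two simplifying assumptions that are ours, not necessarily the paper's: the unknown parameters are independent, and each observation is informative about a single parameter. The belief is then a product of per-parameter marginals, so k parameters with d values each need k·d numbers instead of d^k, and an observation only touches one factor. The parameter names and likelihood are hypothetical.

```python
from typing import Callable, Dict, Hashable

Marginal = Dict[Hashable, float]
FactoredBelief = Dict[str, Marginal]


def update_factor(belief: FactoredBelief, param: str,
                  likelihood: Callable[[Hashable], float]) -> FactoredBelief:
    """Bayes-update one parameter's marginal; the other factors are left untouched."""
    unnorm = {v: p * likelihood(v) for v, p in belief[param].items()}
    z = sum(unnorm.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    updated = dict(belief)                        # shallow copy: other marginals are shared, not modified
    updated[param] = {v: q / z for v, q in unnorm.items()}
    return updated


def joint_probability(belief: FactoredBelief, assignment: Dict[str, Hashable]) -> float:
    """Under the independence assumption the joint is the product of the marginals."""
    p = 1.0
    for param, value in assignment.items():
        p *= belief[param][value]
    return p


belief = {"friction": {"low": 0.5, "high": 0.5}, "mass": {"light": 0.5, "heavy": 0.5}}
# Hypothetical observation that only informs friction (e.g., slip observed).
belief = update_factor(belief, "friction", lambda v: 0.25 if v == "low" else 0.75)
print(belief["friction"], joint_probability(belief, {"friction": "high", "mass": "light"}))
```

If observations couple several parameters, this product form is no longer exact, and a solver would need a richer factorization.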
Improving the Efficiency of Clearing with Multi-Agent Teams
We present an anytime algorithm for coordinating multiple autonomous searchers to find a potentially adversarial target on a graphical representation of a physical environment. This problem is closely related to the mathematical problem of searching for an adversary on a graph. Prior methods in the literature treat multi-agent search as either a worst-case problem (i.e., clear an environment of an adversarial evader with potentially infinite speed), or an average-case problem (i.e., minimize average capture time given a model of the target’s motion). Both of these problems have been shown to be NP-hard, and optimal solutions typically scale exponentially in the number of searchers. We propose treating search as a resource allocation problem, which leads to a scalable anytime algorithm for generating schedules that clear the environment of a worst-case adversarial target and have good average-case performance considering a nonadversarial motion model. Our algorithm yields theoretically bounded average-case performance and allows for online and decentralized operation, making it applicable to real-world search tasks. We validate our proposed algorithm through a large number of experiments in simulation and with a team of robot and human searchers in an office building.
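The worst-case notion of "clearing" against an evader with potentially infinite speed can be made concrete with a checker for a given searcher schedule. This is a simplified discrete model assumed for illustration: searchers occupy graph vertices at each step, an occupied vertex is cleared, and after every step contamination instantly spreads from any contaminated vertex through all unguarded vertices. It ignores an evader slipping past a searcher along an edge during a move, and it does not address the average-case objective or the paper's resource-allocation algorithm.

```python
from typing import Dict, Iterable, List, Set

Graph = Dict[int, List[int]]


def schedule_clears_graph(adj: Graph, schedule: Iterable[Iterable[int]]) -> bool:
    """Worst-case check: can an evader with unbounded speed survive the searcher schedule?"""
    contaminated: Set[int] = set(adj)             # initially the evader could be anywhere
    for positions in schedule:
        guarded = set(positions)
        contaminated -= guarded                   # occupied vertices are cleared
        frontier = list(contaminated)
        while frontier:                           # recontamination spreads through unguarded vertices
            u = frontier.pop()
            for v in adj[u]:
                if v not in guarded and v not in contaminated:
                    contaminated.add(v)
                    frontier.append(v)
    return not contaminated


path = {0: [1], 1: [0, 2], 2: [1]}
cycle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
sweep = [[0], [1], [2]]
print(schedule_clears_graph(path, sweep))   # True
print(schedule_clears_graph(cycle, sweep))  # False
```

A single searcher sweeping a path clears it, but the same sweep fails on a cycle because the vertex left behind is recontaminated around the other side. That gap is why the number of searchers and their coordination matter.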
Learning the Behavior Model of a Robot
Complex artifacts are designed today from well-specified and well-modeled components. Most often, however, the models of these components cannot be composed into a global functional model of the artifact. A significant observation, modeling, and identification effort is required to obtain such a global model, which is needed in order to better understand, control, and improve the designed artifact. Robotics provides a good illustration of this need. Autonomous robots are able to achieve increasingly complex tasks, relying on more advanced sensori-motor functions. To better understand their behavior and improve their performance, it becomes necessary, but more difficult, to characterize and model at the global level how robots behave in a given environment. Low-level models of sensors, actuators, and controllers cannot be easily combined into a behavior model. High-level operator models used for planning are sometimes also available, but they are generally too coarse to represent the actual robot behavior. We propose here a general framework for learning, from observation data, the behavior model of a robot when performing a given task. The behavior is modeled as a Dynamic Bayesian Network, a convenient structured stochastic representation. We show how such a probabilistic model can be learned and how it can be used to improve the robot's behavior online with respect to a specific environment and user preferences. The framework and algorithms are detailed and substantiated by experimental results for autonomous navigation tasks.
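The simplest ingredient of learning a Dynamic Bayesian Network from observation data is estimating a conditional probability table between consecutive time slices by counting. The sketch below, a reduced single-variable slice assumed for illustration, estimates P(state at t+1 | state at t, action at t) from logged episodes with additive (Laplace) smoothing. The paper's networks have many variables and structure; the log format and state names here are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Step = Tuple[Hashable, Hashable]  # (state, action) at one time slice


def learn_transition_cpt(episodes: List[List[Step]], alpha: float = 1.0
                         ) -> Dict[Step, Dict[Hashable, float]]:
    """Estimate P(state_{t+1} | state_t, action_t) by counting, with additive (Laplace) smoothing."""
    counts: Dict[Step, Counter] = defaultdict(Counter)
    values = set()
    for episode in episodes:
        values.update(state for state, _ in episode)
        for (state, action), (next_state, _) in zip(episode, episode[1:]):
            counts[(state, action)][next_state] += 1   # one observed transition
    cpt = {}
    for parents, counter in counts.items():
        total = sum(counter.values())
        cpt[parents] = {v: (counter[v] + alpha) / (total + alpha * len(values))
                        for v in sorted(values)}
    return cpt


# Hypothetical navigation log: (terrain state, speed mode) per time step.
log = [[("free", "fast"), ("free", "fast"), ("blocked", "slow"), ("free", "fast")]]
cpt = learn_transition_cpt(log)
print(cpt[("blocked", "slow")])
```

Smoothing keeps unseen transitions at non-zero probability, which matters when a learned model is later used to choose actions online from few observations.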
Author manuscript, published in "22nd International Conference on Tools with Artificial Intelligence - ICTAI 2010 (2010)"
The difficulties encountered in sequential decision-making problems under uncertainty are often linked to the large size of the state space. Exploiting the structure of the problem, for example by employing a factored representation, is usually an efficient approach but, in the case of partially observable Markov decision processes, the fact that some state variables may be visible has not been sufficiently appreciated. In this article, we present a complementary analysis and discussion about MOMDPs, a formalism that exploits the fact that the state space may be factored into one visible part and one hidden part. Starting from a POMDP description, we dig into the structure of the belief update, the value function, and the consequences for value iteration, specifically how classical algorithms can be adapted to this factorization, and demonstrate the resulting benefits through an empirical evaluation.

