Results 1–10 of 29
CHOMP: Covariant Hamiltonian Optimization for Motion Planning
Abstract

Cited by 21 (5 self)
Popular high-dimensional motion planning algorithms seem to pay an overly high price in terms of trajectory quality for their performance. In domains sparsely populated by obstacles, the heuristics used by sampling-based planners to navigate narrow passages can be needlessly complex; furthermore, additional post-processing is required to remove the jerky or extraneous motions from the paths that such planners generate. In this paper, we present CHOMP, a novel method for continuous path refinement that uses covariant gradient techniques to improve the quality of sampled trajectories. Our optimization technique converges over a wider range of input paths and is able to optimize higher-order dynamics of trajectories than previous path optimization strategies. As a result, CHOMP can be used as a standalone motion planner in many real-world planning queries. The effectiveness of our proposed method is demonstrated in manipulation planning for a 6-DOF robotic arm as well as in trajectory generation for a walking quadruped robot.
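The covariant update that distinguishes this line of work from plain gradient descent can be illustrated with a minimal sketch (a simplified toy, not the paper's implementation): represent a 1-D trajectory by its interior waypoints, form the finite-difference smoothness metric A, and precondition the Euclidean gradient with A⁻¹. Because the toy cost here is purely quadratic, a single unit covariant step lands exactly on the smoothest path between the fixed endpoints.

```python
import numpy as np

n = 50                                     # interior waypoints; endpoints fixed at 0 and 1
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # smoothness metric (tridiagonal, SPD)
b = np.zeros(n)
b[-1] = -1.0                               # boundary term from the fixed endpoint at 1
A_inv = np.linalg.inv(A)

def grad_U(xi):
    """Euclidean gradient of the smoothness cost U(xi) = 0.5*xi@A@xi + b@xi."""
    return A @ xi + b

xi = np.random.default_rng(0).uniform(0.0, 1.0, n)  # jerky initial path

# Covariant step: precondition the gradient with A^-1. For this quadratic
# cost, one unit step lands on the minimizer: the straight line from 0 to 1.
xi = xi - A_inv @ grad_U(xi)
```

A plain Euclidean step would move mainly the waypoints where the gradient is large and need many iterations; the A⁻¹ preconditioning spreads each local change smoothly over the whole trajectory, which is the behavior the abstract attributes to covariant gradient techniques.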
Computational Rationalization: The Inverse Equilibrium Problem
Abstract

Cited by 13 (2 self)
Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task. When restricted to the single-agent decision-theoretic setting, inverse optimal control techniques assume that observed behavior is an approximately optimal solution to an unknown decision problem. These techniques learn a utility function that explains the example behavior and can then be used to accurately predict or imitate future behavior in similar observed or unobserved situations. In this work, we consider similar tasks in competitive and cooperative multi-agent domains. Here, unlike single-agent settings, a player cannot myopically maximize its reward; it must speculate on how the other agents may act to influence the game's outcome. Employing the game-theoretic notion of regret and the principle of maximum entropy, we introduce a technique for predicting and generalizing behavior, as well as recovering a reward function in these domains.
Bayesian multitask inverse reinforcement learning
Abstract

Cited by 12 (4 self)
We generalise the problem of inverse reinforcement learning to multiple tasks, from a set of demonstrations. Each demonstration may represent one expert trying to solve a different task. Alternatively, one may see each demonstration as given by a different expert trying to solve the same task. Our main technical contribution is to solve the problem by formalising it as statistical preference elicitation, via a number of structured priors whose form captures our biases about the relatedness of different tasks or expert policies. We show that our methodology allows us not only to learn efficiently from multiple experts but also to differentiate effectively between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and imitation learning from multiple teachers.
Bootstrapping Apprenticeship Learning
Abstract

Cited by 6 (1 self)
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is maximizing a utility function that is a linear combination of state-action features. Most IRL algorithms use a simple Monte Carlo estimation to approximate the expected feature counts under the expert's policy. In this paper, we show that the quality of the learned policies is highly sensitive to the error in estimating the feature counts. To reduce this error, we introduce a novel approach for bootstrapping the demonstration by assuming that (i) the expert is (near-)optimal, and (ii) the dynamics of the system are known. Empirical results on gridworld and car-racing problems show that our approach is able to learn good policies from a small number of demonstrations.
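The Monte Carlo feature-count estimate that this abstract identifies as the main error source can be sketched in a few lines (the tiny demonstrations and one-hot feature map below are illustrative assumptions, not from the paper):

```python
import numpy as np

def feature_counts(trajectories, phi, gamma=0.9):
    """Empirical discounted feature expectations:
    mu ~ E[sum_t gamma^t * phi(s_t, a_t)], averaged over demonstrations."""
    mu = np.zeros_like(phi(0, 0))
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            mu = mu + gamma ** t * phi(s, a)
    return mu / len(trajectories)

def phi(s, a):
    """One-hot state-action features for a 2-state, 2-action problem."""
    f = np.zeros(4)
    f[2 * s + a] = 1.0
    return f

demos = [[(0, 1), (1, 0), (1, 0)],    # two short expert trajectories
         [(0, 1), (1, 0)]]
mu_hat = feature_counts(demos, phi)   # -> array([0., 1., 1.305, 0.])
```

With only a handful of trajectories this average has high variance; that sensitivity is what the paper addresses by bootstrapping the demonstration under the (near-)optimality and known-dynamics assumptions.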
Preference elicitation and inverse reinforcement learning, 2011
Abstract

Cited by 5 (0 self)
We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent's preferences, policy, and, optionally, the obtained reward sequence from observations. We examine the relation of the resulting approach to other statistical methods for inverse reinforcement learning via analysis and experimental results. We show that preferences can be determined accurately, even if the observed agent's policy is suboptimal with respect to its own preferences. In that case, significantly improved policies with respect to the agent's preferences are obtained, compared to both other methods and to the performance of the demonstrated policy.
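The Bayesian idea can be reduced to its simplest possible instance as a sketch (a two-armed bandit with a Boltzmann-rational demonstrator; all names and numbers here are illustrative assumptions, not the paper's model): observed choices update a posterior over candidate utility functions, and the posterior stays informative even when some choices are suboptimal.

```python
import numpy as np

beta = 2.0                                  # assumed demonstrator rationality
candidates = [np.array([1.0, 0.0]),         # hypothesis 1: prefers action 0
              np.array([0.0, 1.0])]         # hypothesis 2: prefers action 1
posterior = np.array([0.5, 0.5])            # flat prior over the hypotheses

def choice_prob(action, utility):
    """Boltzmann-rational choice model: P(a) proportional to exp(beta*utility[a])."""
    p = np.exp(beta * utility)
    return (p / p.sum())[action]

observed = [0, 0, 1, 0]                     # demonstrated choices; one is suboptimal
for a in observed:
    posterior = posterior * np.array([choice_prob(a, u) for u in candidates])
posterior = posterior / posterior.sum()
```

Here the single suboptimal choice (action 1) lowers, but does not overturn, the posterior mass on the first utility hypothesis.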
Maximum Causal Entropy Correlated Equilibria for Markov Games
Abstract

Cited by 3 (2 self)
Motivated by a machine learning perspective, in which game-theoretic equilibrium constraints should serve as guidelines for predicting agents' strategies, we introduce maximum causal entropy correlated equilibria (MCECE), a novel solution concept for general-sum Markov games. In line with this perspective, an MCECE strategy profile is a uniquely defined joint probability distribution over actions for each game state that minimizes the worst-case prediction of agents' actions under log-loss. Equivalently, it maximizes the worst-case growth rate for gambling on the sequences of agents' joint actions under uniform odds. We present a convex optimization technique for obtaining MCECE strategy profiles that resembles value iteration in finite-horizon games. We assess the predictive benefits of our approach by predicting the strategies generated by previously proposed correlated equilibrium solution concepts, and compare against those previous approaches on that same prediction task.
Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data
Abstract

Cited by 1 (1 self)
We formalize and study the problem of learning the structure and parameters of graphical games from strictly behavioral data. We cast the problem as maximum likelihood estimation (MLE) based on a generative model defined by the pure-strategy Nash equilibria (PSNE) of the game. The formulation brings out the interplay between goodness-of-fit and model complexity: good models capture the equilibrium behavior represented in the data while controlling the true number of PSNE, including those potentially unobserved. We provide a generalization bound for MLE. We discuss several optimization algorithms, including convex loss minimization (CLM), sigmoidal approximations, and exhaustive search. We formally prove that games in our hypothesis space have a small true number of PSNE, with high probability; thus, CLM is sound. We illustrate our approach and show and discuss promising results on synthetic data and the U.S. congressional voting records.
Predictive Inverse Optimal Control in Large Decision Processes via Heuristic-based Search
Abstract

Cited by 1 (1 self)
Human behavior is much more structured than physical limitations require; variability in tasks ranging from manipulating an object (Castiello, 2005) to locomoting (Taga, 1995) is relatively small. For robots …
The Principle of Maximum Causal Entropy for Estimating Interacting Processes
Abstract

Cited by 1 (0 self)
The principle of maximum entropy provides a powerful framework for estimating joint, conditional, and marginal probability distributions. However, there are many important distributions with elements of interaction and feedback where its applicability has not been established. This work presents the principle of maximum causal entropy, an approach based on directed information theory for estimating an unknown process based on its interactions with a known process. We demonstrate the breadth of the approach using two applications: a predictive solution for inverse optimal control in decision processes and computing equilibrium strategies in sequential games.
Index Terms: Maximum entropy, statistical estimation, causal entropy, directed information, inverse optimal control, inverse reinforcement learning, correlated equilibrium.
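The decision-process application mentioned in this abstract rests on a soft Bellman recursion. A minimal sketch under simplifying assumptions (a tiny finite-horizon MDP with made-up rewards and dynamics; not the paper's derivation):

```python
import numpy as np

def soft_value_iteration(P, R, horizon):
    """Soft Bellman recursion: V = log sum_a exp(Q), pi = exp(Q - V).
    P[a] is the |S|x|S| transition matrix of action a; R is |S|x|A| rewards."""
    n_s, n_a = R.shape
    V = np.zeros(n_s)
    policies = []
    for _ in range(horizon):
        Q = np.stack([R[:, a] + P[a] @ V for a in range(n_a)], axis=1)
        V = np.log(np.exp(Q).sum(axis=1))          # soft maximum over actions
        policies.append(np.exp(Q - V[:, None]))    # stochastic policy per state
    return policies[::-1]                          # time-ordered policies

# Toy decision process: action 1 from state 0 earns reward and switches state.
P = [np.array([[1.0, 0.0], [0.0, 1.0]]),   # action 0: stay
     np.array([[0.0, 1.0], [1.0, 0.0]])]   # action 1: switch states
R = np.array([[0.0, 1.0],
              [0.0, 0.0]])
pis = soft_value_iteration(P, R, horizon=3)
```

Each `pis[t]` row is a proper distribution over actions; the recursion is the softmax analogue of value iteration, and replacing the log-sum-exp with a hard max would recover a deterministic optimal policy.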
Probabilistic inverse reinforcement learning in unknown environments (arXiv:1307.3785)
Abstract

Cited by 1 (0 self)
We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.