Results 1–10 of 11
An expectation maximization algorithm for continuous Markov decision processes with arbitrary rewards
In Twelfth Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2009
Abstract

Cited by 13 (2 self)
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterized in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widely used linear quadratic Gaussian (LQG) controllers. On the other hand, it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions (though costly) eliminate the need for simulation and, hence, avoid approximation error. The experiments show that for the same cost of computation, policy optimization methods that rely on analytical tractability have higher value than the ones that rely on simulation.
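The analytical tractability the abstract refers to stems from a standard identity: the expectation of a mixture-of-Gaussians reward under a Gaussian state distribution has a closed form. A minimal one-dimensional sketch (toy numbers, not the paper's setup), checked against Monte Carlo:

```python
import numpy as np

def gauss_pdf(x, mean, var):
    # Density of a univariate Gaussian N(mean, var) at x.
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def expected_mog_reward(m, S, weights, mus, vars_):
    # Closed-form expectation of a mixture-of-Gaussians reward
    # r(x) = sum_i w_i N(x; mu_i, v_i) under a Gaussian state x ~ N(m, S):
    # E[r(x)] = sum_i w_i N(mu_i; m, S + v_i).
    return sum(w * gauss_pdf(mu, m, S + v)
               for w, mu, v in zip(weights, mus, vars_))

# Hypothetical mixture reward and Gaussian state marginal.
m, S = 0.5, 0.2
weights, mus, vars_ = [1.0, 0.5], [0.0, 2.0], [0.3, 0.1]

analytic = expected_mog_reward(m, S, weights, mus, vars_)

# Monte Carlo check of the same expectation.
rng = np.random.default_rng(0)
x = rng.normal(m, np.sqrt(S), size=200_000)
mc = np.mean(sum(w * gauss_pdf(x, mu, v)
                 for w, mu, v in zip(weights, mus, vars_)))
```

Because this expectation (and, with more algebra, its gradients) is exact, no trajectory sampling is needed to evaluate the expected reward, which is the source of the claimed accuracy and speed advantage over simulation-based methods.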
New inference strategies for solving Markov decision processes using reversible jump MCMC
In UAI, 2009
Abstract

Cited by 12 (4 self)
In this paper we build on previous work which uses inference techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectories in order to sample more freely. Finally, we show how to incorporate these techniques in a principled manner to obtain estimates of the optimal policy.
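A toy sketch of the general idea (not the paper's algorithm): run Metropolis-Hastings on a joint target over a policy parameter and a sampled trajectory, weighted by the trajectory's reward, so that high-reward trajectories pull the policy parameter toward them. The model, reward, and prior below are all hypothetical one-dimensional stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(theta, x):
    # Joint target p(theta, x) proportional to r(x) * p(x | theta) * p(theta).
    log_reward = -x ** 2                 # toy reward r(x) = exp(-x^2)
    log_traj = -0.5 * (x - theta) ** 2   # one-step "trajectory" x ~ N(theta, 1)
    log_prior = -0.5 * theta ** 2        # theta ~ N(0, 1) prior
    return log_reward + log_traj + log_prior

theta, x = 3.0, 3.0                      # deliberately poor start
samples = []
for _ in range(20_000):
    # Joint random-walk proposal on (theta, x).
    th_p, x_p = theta + 0.5 * rng.normal(), x + 0.5 * rng.normal()
    if np.log(rng.uniform()) < log_target(th_p, x_p) - log_target(theta, x):
        theta, x = th_p, x_p
    samples.append(theta)

theta_mean = np.mean(samples[5000:])     # discard burn-in
```

The "strong correlations" problem the abstract mentions is visible even here: theta and x are updated jointly, so proposals that move one without the other are often rejected; the paper's contribution is a principled way to break such couplings.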
Approximate Dynamic Programming
, 2008
Abstract

Cited by 7 (0 self)
This is an updated version of the research-oriented Chapter 6 on Approximate Dynamic Programming. In addition to editorial revisions and rearrangements, it includes an account of new research (joint with J. Yu), which is collected mostly in the new Section 6.7. The chapter will be periodically updated as new research becomes available, and will replace the current Chapter 6 in the book’s next printing.
Learning Nonparametric Policies by Imitation
Abstract

Cited by 4 (0 self)
A long-cherished goal in artificial intelligence has been the ability to endow a robot with the capacity to learn and generalize skills from watching a human teacher. Such an ability to learn by imitation has remained hard to achieve due to a number of factors, including the problem of learning in high-dimensional spaces and the problem of uncertainty. In this paper, we propose a new probabilistic approach to the problem of teaching a high degree-of-freedom robot (in particular, a humanoid robot) flexible and generalizable skills via imitation of a human teacher. The robot uses inference in a graphical model to learn sensor-based dynamics and infer a stable plan from a teacher’s demonstration of an action. The novel contribution of this work is a method for learning a nonparametric policy which generalizes a fixed action plan to operate over a continuous space of task variation. A notable feature of the approach is that it does not require any knowledge of the physics of the robot or the environment. By leveraging advances in probabilistic inference and Gaussian process regression, the method produces a nonparametric policy for sensor-based feedback control in continuous state and action spaces. We present experimental and simulation results using a Fujitsu HOAP-2 humanoid robot demonstrating imitation-based learning of a task involving lifting objects of different weights from a single human demonstration.
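A minimal illustration of the nonparametric-policy idea under strong simplifying assumptions (one-dimensional state, plain GP regression in NumPy; the demonstration data, kernel settings, and function names are hypothetical, not from the paper):

```python
import numpy as np

def rbf_kernel(A, B, length=0.5):
    # Squared-exponential kernel between two sets of 1-D states.
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / length ** 2)

def gp_policy(s_demo, a_demo, noise=1e-4):
    # Fit the GP once on demonstrated (state, action) pairs;
    # return a function s -> predicted action, i.e. a nonparametric policy.
    K = rbf_kernel(s_demo, s_demo) + noise * np.eye(len(s_demo))
    alpha = np.linalg.solve(K, a_demo)
    return lambda s: rbf_kernel(np.atleast_1d(s), s_demo) @ alpha

# Hypothetical demonstration: the teacher's action happens to be sin(s).
s_demo = np.linspace(-2.0, 2.0, 25)
a_demo = np.sin(s_demo)

policy = gp_policy(s_demo, a_demo)
pred = float(policy(1.0)[0])   # close to sin(1.0), without knowing sin
```

The point mirrors the abstract: the policy is defined directly by the demonstration data and a kernel, with no parametric model of the robot or environment, and it generalizes smoothly to states between the demonstrated ones.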
Teaching Old Dogs New Tricks: Incremental Multimap Regression for Interactive Robot Learning from Demonstration
, 2010
Abstract

Cited by 4 (0 self)
We consider autonomous robots as having associated control policies that determine their actions in response to perceptions of the environment. Often, these controllers are explicitly transferred from a human via programmatic description or physical instantiation. Alternatively, Robot Learning from Demonstration (RLfD) can enable a robot to learn a policy from observing only demonstrations of the task itself. We focus on interactive, teleoperative teaching, where the user manually controls the robot and provides demonstrations while receiving learner feedback. With regression, the collected perception-actuation pairs are used to directly estimate the underlying policy mapping. This dissertation contributes an RLfD methodology for interactive, mixed-initiative learning of unknown tasks. The goal of the technique is to enable users to implicitly instantiate autonomous robot controllers that perform desired tasks as well as the demonstrator, as measured by task-specific metrics. With standard regression techniques, we show that such “on-par” learning is restricted to policies typified by a many-to-one mapping (a unimap) from perception to actuation. Thus, controllers representable as multi-state Finite State Machines (FSMs) and that exhibit a one-to-many mapping (a multimap) cannot be learnt. To be able to do so we must address the three issues of model selection (how many subtasks or FSM states), policy learning (for each subtask),
Inference strategies for solving semi-Markov decision processes
Abstract

Cited by 1 (0 self)
Semi-Markov decision processes (SMDPs) generalize standard MDPs to domains where time is not discretized equally between every set of states and actions [3]. Instead we can define a jump-Markov process where the amount of time spent in each state is a random variable. This formulation gives us an intuitive way to reason about actions where it is also necessary to take into account how long these actions will take to perform. Formally we can define an SMDP as a continuous-time controlled stochastic process (x(t), u(t)) consisting, respectively, of states and actions at every point in time t, where state transitions occur at random arrival times T_n. In particular, the process is stationary in between jumps, i.e. x(t) = x_n and u(t) = u_n
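The jump process described above is easy to simulate. A toy sketch (the three-state embedded chain, transition matrix, and mean sojourn times are hypothetical): between arrival times T_n the state is constant, and the sojourn in each state is a random draw whose distribution depends on that state.

```python
import numpy as np

rng = np.random.default_rng(2)

n_states = 3
# Hypothetical embedded transition matrix and state-dependent mean sojourns.
P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.4, 0.6, 0.0]])
mean_hold = np.array([1.0, 0.5, 2.0])

def simulate_smdp(x0, n_jumps):
    # Returns jump times T_n and the piecewise-constant state path x_n.
    times, states = [0.0], [x0]
    x = x0
    for _ in range(n_jumps):
        hold = rng.exponential(mean_hold[x])   # random time spent in state x
        x = rng.choice(n_states, p=P[x])       # embedded Markov chain jump
        times.append(times[-1] + hold)
        states.append(x)
    return np.array(times), np.array(states)

T, X = simulate_smdp(x0=0, n_jumps=1000)
```

With exponential sojourns this reduces to a continuous-time Markov chain; the SMDP formulation allows arbitrary sojourn distributions, which is what makes reasoning about action durations natural.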
On solving general state-space sequential decision problems using inference algorithms
Abstract

Cited by 1 (1 self)
A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. For example, Toussaint et al. (2006) use EM with optimal smoothing in the E step to solve finite state-space Markov Decision Processes. In this paper, we extend this EM approach in two directions. First, we derive a nontrivial EM algorithm for linear Gaussian models where the reward function is represented by a mixture of Gaussians, as opposed to the less flexible classical single quadratic function. Second, in order to treat arbitrary continuous state-space models, we present an EM algorithm with particle smoothing. However, by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its smoothing counterparts. Moreover, this observation also enables us to design an alternative full Bayesian approach for policy search, which can be implemented using a single MCMC run.
A Hybrid Multi-Robot Control Architecture
 Master’s thesis, Graduate School of Engineering, Air Force Institute of Technology (AETC), Wright-Patterson AFB OH
, 2007
Abstract

Cited by 1 (0 self)
AFIT/GCS/ENG/0802 Multi-robot systems provide system redundancy and enhanced capability versus single robot systems. Implementations of these systems are varied, each with specific design approaches geared towards an application domain. Some traditional single robot control architectures have been expanded for multi-robot systems, but these expansions predominantly focus on the addition of communication capabilities. Both design approaches are application specific and limit the generalizability of the system. This work presents a redesign of a common single robot architecture in order to provide a more sophisticated multi-robot system. The single robot architecture chosen for application is the Three Layer Architecture (TLA). The primary strength of TLA is its ability to perform both reactive and deliberative decision making, enabling the robot to behave in a sophisticated manner and to perform well in stochastic environments. The redesign of this architecture includes incorporation of the Unified Behavior Framework (UBF) into the controller layer and the addition of a sequencer-like layer (called
Bayesian Policy Learning with Trans-Dimensional MCMC
Abstract
A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new understanding, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to carry out full Bayesian policy search, without the need for gradients and with a single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulation methods in situations where the reward is a rare event.
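A toy contrast illustrating the rare-event point (purely illustrative numbers, not the paper's experiments): when the reward is a narrow spike far from where plain simulation puts its samples, naive Monte Carlo almost never sees it, while a Metropolis-Hastings chain whose target is proportional to the reward concentrates on it.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_reward(x):
    # Hypothetical rare-event reward: a tight bump centred at x = 5.
    return -(x - 5.0) ** 2 / 0.02

# Plain Monte Carlo from a nominal N(0, 1) essentially never lands
# anywhere near the reward region.
naive = rng.normal(0.0, 1.0, size=100_000)
naive_hits = np.mean(np.abs(naive - 5.0) < 0.5)

# Metropolis-Hastings targeting p(x) proportional to the reward itself.
x, chain = 0.0, []
for _ in range(50_000):
    prop = x + rng.normal()
    if np.log(rng.uniform()) < log_reward(prop) - log_reward(x):
        x = prop
    chain.append(x)
chain = np.array(chain[10_000:])   # discard burn-in
```

After burn-in the chain sits inside the reward spike, which is exactly the behavior that makes reward-proportional sampling efficient when high reward is a rare event under forward simulation.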
A Bayesian Developmental Approach to Robotic Goal-Based Imitation Learning
Abstract
A fundamental challenge in robotics today is building robots that can learn new skills by observing humans and imitating human actions. We propose a new Bayesian approach to robotic learning by imitation inspired by the developmental hypothesis that children use self-experience to bootstrap the process of intention recognition and goal-based imitation. Our approach allows an autonomous agent to: (i) learn probabilistic models of actions through self-discovery and experience, (ii) utilize these learned models for inferring the goals of human actions, and (iii) perform goal-based imitation for robotic learning and human-robot collaboration. Such an approach allows a robot to leverage its increasing repertoire of learned behaviors to interpret increasingly complex human actions and use the inferred goals for imitation, even when the robot has very different actuators from humans. We demonstrate our approach using two different scenarios: (i) a simulated robot that learns human-like gaze following behavior, and (ii) a robot that learns to imitate human actions in a tabletop organization task. In both cases, the agent learns a probabilistic model of its own actions, and uses this model for goal inference and goal-based imitation. We also show that the robotic agent can use its probabilistic model to seek human assistance when it recognizes that its inferred actions are too uncertain, risky, or impossible to perform, thereby opening the door to human-robot collaboration.