## Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods (2001)

### Download Links

- [pecan.srv.cs.cmu.edu]
- [repository.cmu.edu]
- DBLP

### Other Repositories/Bibliography

Venue: International Conference on Robotics and Automation

Citations: 116 (1 self)

### Citations

5464 | Reinforcement Learning: An Introduction
- Sutton, R., et al.
- 1998

Citation Context: ...h an algorithm encourages is the antithesis of what we would hope for in building safe controllers. The literature on the exploration/exploitation problem in reinforcement learning is extensive. See [6] for further discussion of the problem. II. Preliminary Setup We address first the formalism necessary to discuss our results. The measure-theoretic details are of little importance and can be ignored...
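The exploration/exploitation trade-off this context refers to can be illustrated with a minimal ε-greedy bandit sketch. This is a generic illustration of the dilemma, not the paper's method; all names and values below are invented for the example:

```python
import random

def epsilon_greedy(estimates, epsilon=0.1):
    """With probability epsilon pick a random arm (explore);
    otherwise pick the arm with the highest value estimate (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

# Toy 3-armed bandit with true mean rewards 0.2, 0.5, 0.8.
true_means = [0.2, 0.5, 0.8]
estimates, counts = [0.0] * 3, [0] * 3
random.seed(0)
for _ in range(2000):
    a = epsilon_greedy(estimates)
    reward = true_means[a] + random.gauss(0, 0.1)
    counts[a] += 1
    estimates[a] += (reward - estimates[a]) / counts[a]  # incremental mean update
```

After enough steps the best arm dominates the pull counts, but the forced exploration is exactly the kind of deliberate "bad" action that is problematic on costly physical systems like a helicopter.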

3828 | Dynamic Programming
- Bellman
- 1957

Citation Context: ...f robustness to undermodeling. Further, even given a good model, the complexity of building optimal policies typically rises exponentially in the number of dimensions (the "curse of dimensionality" [2]). Finally, learning systems, and particularly those operating in the physical world where experiments are costly and time-consuming, must face the well-known exploration/exploitation dilemma. The lea...

301 | Near-optimal reinforcement learning in polynomial time
- Kearns, Singh
- 1998

Citation Context: ...o the stochastic transitions of an MDP. Good empirical results were obtained, but this method relies on an assumption that model error is uncorrelated through time and space, which is rarely the case. [5] make exploration deliberative and guarantee near-optimal performance in polynomial time. Although this leads to nice theoretical results about the complexity of reinforcement learning, the aggressive...

256 | PEGASUS: A policy search method for large MDPs and POMDPs
- Ng, Jordan
- 2000

Citation Context: ...called strategies or policies here) that come from a restricted class, denoted Π. For the purposes of this paper, we will consider all offline simulations to be on a deterministic simulative model [7] where we can sample a typical event ω ∈ Ω under the distribution Q (the joint distribution of initial states, models, and Markov noise in transitions and observations) and that each such ω can b...
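The deterministic simulative model of PEGASUS [7] can be sketched as follows: draw a fixed set of random "scenarios" ω once, then evaluate every candidate policy on that same set, so the Monte Carlo value estimate becomes a deterministic function of the policy. The toy scalar dynamics and all names below are illustrative, not from the paper:

```python
import random

def make_scenarios(n, horizon, seed=0):
    """Sample n scenarios: an initial state plus pre-drawn transition
    noise for every step (this plays the role of omega)."""
    rng = random.Random(seed)
    return [(rng.uniform(-1, 1), [rng.gauss(0, 0.1) for _ in range(horizon)])
            for _ in range(n)]

def rollout(policy_gain, scenario):
    """Deterministic rollout of a toy scalar system x' = x + u + w."""
    x, noise = scenario
    total = 0.0
    for w in noise:
        u = -policy_gain * x   # linear state-feedback policy
        x = x + u + w
        total -= x * x         # reward: negative squared deviation from 0
    return total

def value_estimate(policy_gain, scenarios):
    return sum(rollout(policy_gain, s) for s in scenarios) / len(scenarios)

scenarios = make_scenarios(n=100, horizon=20)
```

Because every policy sees the same scenarios, repeated evaluations return identical values and differences between policies are not masked by fresh simulation noise.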

211 | Algorithms for sequential decision making
- Littman
- 1996

Citation Context: ...e the Markov property that makes dynamic programming an efficient solution technique. The problem becomes similar to the one of finding memoryless policies in a POMDP, and thus a reduction similar to [10] proves the result. D. Sampling Algorithms Until this point we have deferred the question of sampling from the space Ω. In the case of Bayesian parametric approximators of system dynamics, sa...

49 | Using locally weighted regression for robot learning
- Atkeson
- 1991

Citation Context: ...o policy evaluation. However, in many problems in robotics, it has been demonstrated that non-parametric regression techniques admirably serve to model the often highly non-linear and noisy dynamics [11]. These techniques make it impossible to directly sample from the space of possible models. Some non-parametric models like Locally Weighted Bayesian Regression do make it possible to sample from a set...
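Locally weighted regression of the kind cited in [11] can be sketched in a few lines: each query re-weights the training data with a Gaussian kernel and fits a local model (here a locally constant fit, i.e. a kernel-weighted mean; full LWR fits a local linear model). The toy dynamics data and all names are illustrative:

```python
import math

def lwr_predict(query, xs, ys, bandwidth=0.5):
    """Kernel-weighted average of targets ys at points xs
    (locally constant fit with a Gaussian kernel)."""
    weights = [math.exp(-((query - x) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Toy nonlinear "dynamics" data: next-state samples from x' = sin(x).
xs = [i * 0.1 for i in range(-30, 31)]
ys = [math.sin(x) for x in xs]
```

A small bandwidth tracks the nonlinearity closely but is noise-sensitive; a large one smooths heavily. Note the model is defined only through the stored data, which is why, as the context says, one cannot directly sample a model from it.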

30 | System identification of small-size unmanned helicopter dynamics
- Mettler, Tischler, et al.
- 1999

Citation Context: ...e so-called "core dynamics" of the helicopter, the pitch, roll, and horizontal translations. The dynamic instabilities are known to lie in these dynamics, and control of these is therefore paramount [12]. Existing proportional-derivative (PD) controllers, tediously tuned by the helicopter team, were used on the yaw-heave dynamics. From a high level, the goal will be the regulation of the helicopter ho...
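A proportional-derivative controller of the kind used on the yaw-heave dynamics can be sketched as below. The gains and the toy double-integrator plant are invented for illustration, not taken from the helicopter system:

```python
def pd_control(error, prev_error, kp, kd, dt):
    """PD law: u = Kp * e + Kd * de/dt (derivative by finite difference)."""
    return kp * error + kd * (error - prev_error) / dt

# Regulate a toy double integrator (x'' = u) toward setpoint 0.
x, v, dt = 1.0, 0.0, 0.01
prev_error = -x
for _ in range(2000):
    error = -x                                            # error w.r.t. setpoint 0
    u = pd_control(error, prev_error, kp=4.0, kd=3.0, dt=dt)
    prev_error = error
    v += u * dt                                           # acceleration = control input
    x += v * dt
```

The proportional term pulls the state toward the setpoint while the derivative term damps the approach; tuning Kp and Kd by hand is exactly the tedious process the context describes.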

25 | Exploiting model uncertainty estimates for safe dynamic control learning
- Schneider
- 1996

Citation Context: ...he control cycle by suitably limiting the complexity of the controller structure. Finally, it is often the case that physical insight leads to good selections of controller class. A. Previous Work In [4], safety is addressed by treating learned model uncertainty as another source of noise to be incorporated into the stochastic transitions of an MDP. Good empirical results were obtained, but this meth...

7 | Memory-based stochastic optimization
- Moore, Schneider
- 1995

Citation Context: ...istic approaches commonly used in experiment design. (For a discussion of the application of stochastic optimization in artificial intelligence and a description of the algorithms mentioned here, see [9].) Algorithms developing controllers to maximize this criterion can be seen as searching for a good experiment to perform to collect information; they are essentially designed according to the "optimi...

4 | Robustness and exploration in policy-search based reinforcement learning
- Bagnell, Schneider
- 2001

Citation Context: ...robustness and exploration In many applications it will be important to consider optimization criteria that more explicitly encourage robustness and exploration. We address these issues at length in [8]. Briefly, the central idea for safety and robustness criteria is to consider maximizing the performance on the worst model in a large set of models, or on almost all trajectories the controller exec...
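The worst-case criterion sketched in this context is a max-min search: score each candidate policy by its performance on the worst model in a sampled set, then pick the policy with the best such score. The scalar "models", candidate gains, and performance function below are all invented for illustration:

```python
def performance(policy_gain, model_pole):
    """Toy closed-loop score: for x' = model_pole*x - policy_gain*x the
    closed-loop pole is (model_pole - policy_gain); smaller magnitude is better."""
    return -abs(model_pole - policy_gain)

models = [0.8, 1.0, 1.3]         # sampled plausible models (e.g. posterior draws)
policies = [0.0, 0.5, 1.0, 1.5]  # candidate controller gains

def robust_score(policy):
    """Performance of the policy on its worst model in the set."""
    return min(performance(policy, m) for m in models)

best = max(policies, key=robust_score)
```

A policy tuned to any single model can fail badly on another; maximizing the minimum over the model set trades some nominal performance for a guarantee across all sampled models.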



2 | Approximate planning in large POMDPs via reusable trajectories
- Kearns, Mansour, et al.
- 1999

Citation Context: ...that is not apparent in any reasonable number of evaluations. This property is captured in theorems relating the uniform convergence of such estimates and the complexity of the searched policy class [3]. Structured policies are very natural in the robotics field as well. It is natural to build restrictions we would like on the controller directly into its structure. One can also easily limit the amoun...
