Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment
Citations
427 | Policy gradient methods for reinforcement learning with function approximation
- Sutton, McAllester, et al.
- 2000
Citation Context: ...$p_\pi(\tau)\,R(\tau)\,d\tau$, where $T$ is the set of all possible trajectories with horizon $H$, $R(\tau) = \frac{1}{H}\sum_{t=1}^{H} r(s_t, a_t, s_{t+1})$ (2), and $p_\pi(\tau) = P_0(s_1)\prod_{t=1}^{H} P(s_{t+1} \mid s_t, a_t)\,\pi(a_t \mid s_t)$ (3). Policy Gradient methods (Sutton et al. 1999; Peters et al. 2005) represent the agent's policy $\pi$ as a function defined over a vector $\theta \in \mathbb{R}^d$ of control parameters and a vector of state features given by the transformation $\Phi : S \to \mathbb{R}^m$. By substitu...
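The excerpt above defines the expected return as $\int p_\pi(\tau)\,R(\tau)\,d\tau$ for a policy parameterized by $\theta$ over state features $\Phi(s)$. As a rough illustration of how such a policy-gradient update can be estimated from sampled trajectories, the hedged Python sketch below implements a REINFORCE-style estimator for a Gaussian policy over linear state features; the feature map `phi`, the noise level `sigma`, and the trajectory format are illustrative assumptions, not details taken from the cited papers.

```python
import numpy as np

def phi(s):
    """Placeholder state-feature map Phi: S -> R^m (here just the identity)."""
    return s

def sample_action(theta, s, sigma=0.1):
    """Sample from a Gaussian policy a ~ N(theta^T phi(s), sigma^2 I)."""
    mean = theta.T @ phi(s)
    return mean + sigma * np.random.randn(*mean.shape)

def reinforce_gradient(theta, trajectories, sigma=0.1):
    """Monte-Carlo estimate of grad_theta J(theta) from sampled trajectories.

    Each trajectory is a list of (s, a, r) tuples; the mean reward over the
    horizon plays the role of R(tau) from the excerpt above.
    """
    grad = np.zeros_like(theta)
    for traj in trajectories:
        ret = np.mean([r for (_, _, r) in traj])        # R(tau) = (1/H) sum_t r_t
        score = np.zeros_like(theta)
        for (s, a, _) in traj:
            # gradient of log N(a; theta^T phi(s), sigma^2 I) with respect to theta
            score += np.outer(phi(s), a - theta.T @ phi(s)) / sigma**2
        grad += score * ret
    return grad / len(trajectories)

# One gradient-ascent step on the policy parameters (step size is arbitrary):
# theta += 0.01 * reinforce_gradient(theta, trajectories)
```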
314 | The explicit linear quadratic regulator for constrained systems
- Bemporad, Morari, et al.
- 2002
Citation Context: ...nsfer for Policy Gradients (MAXDT-PG) framework is detailed in Algorithm 1. Special Cases: Our work can be seen as an extension of the simpler model-based case with a linear-quadratic regulator (LQR) (Bemporad et al. 2002) policy, which is derived and explained in the online appendix accompanying this paper. Although the assumptions made by the model-based case seem restrictive, the analysis in the appendix covers a ...
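The special case referenced above uses a linear-quadratic regulator policy. For context only, here is a minimal sketch of a finite-horizon, discrete-time LQR computed by a backward Riccati recursion; the dynamics (A, B), costs (Q, R), and horizon are placeholder values, and this is not the derivation from the paper's appendix or from Bemporad et al.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, H):
    """Backward Riccati recursion for x_{t+1} = A x_t + B u_t with stage cost
    x_t^T Q x_t + u_t^T R u_t. Returns time-varying gains K_t such that the
    optimal control is u_t = -K_t x_t."""
    P = Q.copy()
    gains = []
    for _ in range(H):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # gains[t] is the gain to apply at step t

# Placeholder double-integrator dynamics for illustration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.01]])
K = finite_horizon_lqr(A, B, Q, R, H=50)
```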
116 | Policy gradient methods for robotics - Peters, Schaal - 2006
111 | A survey of robot learning from demonstration
- Argall, Chernova, et al.
- 2009
Citation Context: ...rate learning. Since policy gradient methods are prone to becoming stuck in local maxima, it is crucial that the policy be initialized in a sensible fashion. A common technique (Peters & Schaal 2006; Argall et al. 2009) for policy initialization is to first collect demonstrations from a human controlling the system, then use supervised learning to fit policy parameters that maximize the likelihood of the human-demo...
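As a sketch of the demonstration-based initialization described in the excerpt above, the snippet below fits the parameters of a linear Gaussian policy to human-demonstrated state-action pairs by least squares, which is the maximum-likelihood solution when the policy covariance is fixed; the data arrays, dimensions, and feature choice are hypothetical.

```python
import numpy as np

def fit_policy_from_demonstrations(states, actions):
    """Maximum-likelihood fit of theta for a Gaussian policy with mean theta^T phi(s).

    With a fixed-covariance Gaussian policy, maximizing the likelihood of the
    demonstrated (state, action) pairs reduces to ordinary least squares.
    states: (n, m) array of state features; actions: (n, k) array of actions.
    """
    theta, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return theta  # shape (m, k)

# Hypothetical demonstration data: 200 samples, 4 state features, 2 action dims.
demo_states = np.random.randn(200, 4)
demo_actions = np.random.randn(200, 2)
theta0 = fit_policy_from_demonstrations(demo_states, demo_actions)
```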
100 | Transfer learning for reinforcement learning domains: A survey
- Taylor, Stone
Citation Context: ...transfer learning (TL) to initialize the policy for a new target domain based on knowledge from one or more source tasks. In RL transfer, the source and target tasks may differ in their formulations (Taylor & Stone 2009). In particular, when the source and target tasks have different state and/or action...
89 | Natural actor-critic
- Peters, Vijayakumar, et al.
- 2005
Citation Context: ...$T$ is the set of all possible trajectories with horizon $H$, $R(\tau) = \frac{1}{H}\sum_{t=1}^{H} r(s_t, a_t, s_{t+1})$ (2), and $p_\pi(\tau) = P_0(s_1)\prod_{t=1}^{H} P(s_{t+1} \mid s_t, a_t)\,\pi(a_t \mid s_t)$ (3). Policy Gradient methods (Sutton et al. 1999; Peters et al. 2005) represent the agent's policy $\pi$ as a function defined over a vector $\theta \in \mathbb{R}^d$ of control parameters and a vector of state features given by the transformation $\Phi : S \to \mathbb{R}^m$. By substituting this parameteri...
87 | Reinforcement learning and dynamic programming using function approximators - Busoniu, Babuska, et al. - 2010
63 | Transfer learning via inter-task mappings for temporal difference learning
- Taylor, Stone, et al.
Citation Context: ...when the source and target tasks have different state and/or action spaces, an inter-task mapping (Taylor et al. 2007a) that describes the relationship between the two tasks is typically needed. This paper introduces a framework for autonomously learning an inter-task mapping for cross-domain transfer in policy grad...
55 | Building portable options: Skill transfer in reinforcement learning
- Konidaris, Barto
Citation Context: ...very different tasks (Taylor & Stone 2009). However, the majority of existing work assumes that a) the source task and target task are similar enough that no mapping is needed (Banerjee & Stone 2007; Konidaris & Barto 2007), or b) an inter-task mapping is provided to the agent (Taylor et al. 2007a; Torrey et al. 2008). The main difference between these methods and this paper is that we are interested in learning a mapp...
51 | Design and control of quadrotors with application to autonomous flying - Bouabdallah - 2007
47 | Value-function-based transfer for reinforcement learning using structure mapping - Liu, Stone - 2006
42 | Transfer via inter-task mappings in policy search reinforcement learning
- Taylor, Whiteson, et al.
- 2007
Citation Context: ...when the source and target tasks have different state and/or action spaces, an inter-task mapping (Taylor et al. 2007a) that describes the relationship between the two tasks is typically needed. This paper introduces a framework for autonomously learning an inter-task mapping for cross-domain transfer in policy grad...
38 | Reinforcement learning in robotics: a survey
- Kober, Bagnell, et al.
- 2013
Citation Context: ...ndomly initialized policy, as shown in our experiments. Although we focus on policy gradient methods, our approach could easily be adapted to other policy search methods (e.g., PoWER, REPS, etc.; see Kober et al. 2013). Learning an Inter-State Mapping: Unsupervised Manifold Alignment (UMA) is a technique that efficiently discovers an alignment between two datasets (Wang & Mahadevan 2009). UMA was developed to align...
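UMA discovers correspondences between the two datasets without supervision (Wang & Mahadevan 2009), which is more involved than the illustration below. As a much simpler stand-in that only shows what an inter-state alignment produces, this sketch computes an orthogonal Procrustes map between two state sets whose correspondences are assumed to be known row-by-row; the sample data, dimensions, and the supervised setup are assumptions, not the UMA algorithm itself.

```python
import numpy as np

def procrustes_alignment(X_source, X_target):
    """Orthogonal map W minimizing ||X_source W - X_target||_F for point sets
    already in row-by-row correspondence (a simplification: UMA finds such
    correspondences without supervision)."""
    Xs = X_source - X_source.mean(axis=0)
    Xt = X_target - X_target.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xs.T @ Xt)
    return U @ Vt  # maps centered source states toward the target space

# Hypothetical state samples with matching rows (50 samples, 3-D states):
src = np.random.randn(50, 3)
tgt = src @ np.linalg.qr(np.random.randn(3, 3))[0] + 0.01 * np.random.randn(50, 3)
W = procrustes_alignment(src, tgt)
```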
31 | Autonomous transfer for reinforcement learning
- Taylor, Kuhlmann, et al.
- 2008
Citation Context: ...ween two tasks (Liu & Stone 2006), background knowledge about the range or type of state variables (Taylor et al. 2007b), or transition models for each possible mapping could be generated and tested (Taylor et al. 2008). However, there are currently no general methods to learn an inter-task mapping without requiring either background knowledge that is not typically present in RL settings, or an expensive analysis o...
29 | General game learning using knowledge transfer
- Banerjee, Stone
- 2007
Citation Context: ...omous transfer between very different tasks (Taylor & Stone 2009). However, the majority of existing work assumes that a) the source task and target task are similar enough that no mapping is needed (Banerjee & Stone 2007; Konidaris & Barto 2007), or b) an inter-task mapping is provided to the agent (Taylor et al. 2007a; Torrey et al. 2008). The main difference between these methods and this paper is that we are inter...
26 | Relational macros for transfer in reinforcement learning
- Torrey, Shavlik, et al.
- 2007
Citation Context: ...he source task and target task are similar enough that no mapping is needed (Banerjee & Stone 2007; Konidaris & Barto 2007), or b) an inter-task mapping is provided to the agent (Taylor et al. 2007a; Torrey et al. 2008). The main difference between these methods and this paper is that we are interested in learning a mapping between tasks. There has been some recent work on learning such mappings. For example, mappi...
10 | Local Procrustes for manifold embedding: a measure of embedding quality and embedding algorithms. Machine Learning 77(1) - Goldberg, Ritov
9 | Reinforcement learning transfer via sparse coding
- Ammar, Taylor, et al.
- 2012
Citation Context: ...ing from an exponential explosion. In previous work, we used sparse coding, sparse projection, and sparse Gaussian processes to learn an inter-task mapping between MDPs with arbitrary variations (Bou Ammar et al. 2012). However, this previous work relied on a Euclidean distance correlation between source and target task triplets, which may fail for highly dissimilar tasks. Additionally, it placed restrictions on t...
7 | Natural actor-critic. Neurocomputing
- Peters, Schaal
- 2008
Citation Context: ...olicy gradient reinforcement learning (RL) algorithms have been applied with considerable success to solve high-dimensional control problems, such as those arising in robotic control and coordination (Peters & Schaal 2008). These algorithms use gradient ascent to tune the parameters of a policy to maximize its expected performance. Unfortunately, this gradient ascent procedure is prone to becoming trapped in local max...
3 | Alignment-based transfer learning for robot models - Bócsi, Csato, et al. - 2013
2 | Nonlinear tracking and landing controller for quadrotor aerial robots - Voos, Ammar, et al.