21 citations found.
Wiering, M., Schmidhuber, J.: HQ-learning. Adaptive Behavior 6(2) (1997) 219--246


This paper is cited in the following contexts:
Hierarchical Methods for Planning under Uncertainty - Pineau (2001)   (Correct)

....of POMDP solutions. 2.4.2 POMDP hierarchical approaches. Preliminary attempts at hierarchical partitioning POMDP problems into subtasks typically make strict assumptions about prior knowledge of low-level tasks and ordering, which are substantially restrictive. The HQ-learning algorithm [61] learns a sequence of subgoals, assuming that each subgoal is satisfied through a reactive policy, and subgoal completion is fully observable. Castanon [12] addresses a very specific sensor management problem, in which he decomposes a multi-object detection problem into many single-object ....

M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6(2), 1997.


Combining Latent Learning with Dynamic Programming in the.. - Gérard, Meyer, Sigaud   (Correct)

....with ambiguous perceptions, ACS uses action sequences. In the LCS framework, several other ways have been explored. CXCS [TB00] builds sequences of classifiers and XCSM [Lan98] uses explicit memory registers to define internal states. In the general Reinforcement Learning framework, Wiering [WS97] and Sun [SP00] propose to learn how to divide a non-Markov problem into several Markov ones (see figure 24). These techniques all require that information about changes of the internal state can be associated to transitions between situations. Unfortunately, MACS classifiers do not represent ....

M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6(2):219--246, 1997.


Propagation of Q-values in Tabular TD(lambda) - Preux (2002)   (Correct)

....does not involve any extra parameter. Second, as a natural extension of tabular TD(λ) algorithms, propagation can be used instead of these algorithms in many places and applications. For example, it can take advantage of techniques to speed up Q(λ) or be used in hierarchical Q(λ) to solve POMDPs [15]. Third, though not studied here, existing convergence proofs should be able to be adapted from other algorithms to PTD. Fourth, PTD is worthy when combining reinforcement learning with training (or supervised learning) by avoiding tabular TD to be unable to come back to the taught ....

M. Wiering and J. Schmidhuber. HQ-Learning. Adaptive Behavior, 6(2):219--246, 1997.


Reinforcement Learning by Policy Search - Peshkin (2000)   (7 citations)  (Correct)

....of previous observations. For many problems this can be quite effective [36, 47]. More generally, we may use a finite-size memory, which can possibly be infinite-horizon (the system remembers only a finite number of events, but these events can be arbitrarily far in the past). Wiering and Schmidhuber [64] proposed such an approach, involving learning a policy that is a finite sequence of memoryless policies. Another class of approaches assumes complete knowledge of the underlying process, modeled as a POMDP. Given a model, it is possible to attempt optimal solution [25] or to search for ....

Marco Wiering and Juergen Schmidhuber. HQ-learning. Adaptive Behavior, 6(2), 1997.


Using Classifier Systems as Adaptive Expert Systems for Control - Sigaud, Gérard (2000)   (Correct)

....CSs, choosing one among them at each time step. The decision CS shares their inputs with the behavior CSs, but it also needs information on previous decisions. Its output tells which basic behavior must be fired at a given time step. Interestingly, our architecture is very similar to those of [Wiering and Schmidhuber, 1997] and [Sun and Sessions, 2000], who both presented learning algorithms devoted to applying adaptive algorithms to them. 4.5 Modularity in our experimental setup. The notion of role appears naturally in the strategy we presented in section 2.2. In our solution, at least one agent must push the ....

Wiering, M. and Schmidhuber, J. (1997). HQ-learning. Adaptive Behavior, 6(2):219--246.


Solving POMDPs by Searching the Space of Finite Policies - Meuleau, Kim, Kaelbling.. (1999)   (13 citations)  (Correct)

....will call it, a finite policy graph. Many previous approaches implicitly rely on a similar hypothesis: Some authors [14, 12, 2, 27] search for optimal reactive (or memoryless) policies, McCallum [15, 16] searches the space of policies using a finite-horizon memory, Wiering and Schmidhuber's HQL [26] learns finite sequences of reactive policies, and Peshkin et al. [19] look for optimal finite external-memory policies. All these examples are particular cases of finite policy graphs, with a set of extra structural constraints in each case (i.e. not every node transition and action choice is ....

....A Monte Carlo approach based on Watkins' Q-learning [25, 24] is also applicable to our problem. For instance, we can use Q-learning based on observation-action pairs to find (with no guarantee of convergence) the optimal RP for a POMDP [14]. Another instance is Wiering and Schmidhuber's HQL [26], which learns finite sequences of RPs. Figure 3: The load-unload problem with 8 locations: the agent starts in the Unload location (U) and receives a reward each time it returns to this place after passing through the Load location (L). The problem is partially observable because the ....

[Article contains additional citation context not shown here]

M. Wiering and J. Schmidhuber. HQ-Learning. Adaptive Behavior, 6(2):219--246, 1997.


Learning Finite-State Controllers for Partially.. - Meuleau, Peshkin.. (1999)   (17 citations)  (Correct)

....finite policy graphs. Many existing algorithms for learning to plan in POMDPs rely on a similar assumption. For instance, some researchers [14, 3, 32] try to learn memoryless (or reactive) policies, McCallum's learning algorithm [19, 20] uses a finite-horizon memory, Wiering and Schmidhuber's HQL [31] learns finite sequences of reactive policies using an implicit memory of some of the previous observations, and Peshkin et al. [23] look for optimal finite external-memory policies. All these finite-memory architectures correspond to finite policy graphs with a particular structure in each case ....

M. Wiering and J. Schmidhuber. HQ-Learning. Adaptive Behavior, 6(2):219--246, 1997.


Learning Policies with External Memory - Peshkin, Meuleau, Kaelbling (1999)   (19 citations)  (Correct)

....of previous observations. For many problems this can be quite effective [12, 18]. More generally, we may use a finite-size memory, which can possibly be infinite-horizon (the system remembers only a finite number of events, but these events can be arbitrarily far in the past). Wiering and Schmidhuber [26] proposed such an approach, learning a policy that is a finite sequence of memoryless policies. Another class of approaches assumes complete knowledge of the underlying process, modeled as a partially observable Markov decision process (POMDP). Given a model, it is possible to attempt optimal ....

Marco Wiering and Juergen Schmidhuber. HQ-learning. Adaptive Behavior, 6(2), 1997.


Schmidhuber's Lab - Turteltaub (1999)   (Correct)

....100 papers on diverse topics including fine arts [96] and the nature of surprises [97]. Apparently he even founded a religion [94]. Most of his articles, however, are about machines that learn from experience. I have started to compile an incomplete list of references to work by him and his lab [117, 116, 39, 50, 40, 42, 43, 41, 52, 49, 56, 44, 54, 47, 48, 51, 53, 57, 46, 68, 45, 55, 69, 64, 65, 59, 66, 58, 67, 60, 63, 61, 73, 71, 79, 70, 74, 62, 72, 75, 78, 82, 80, 76, 81, 77, 84, 89, 88, 94, 87, 85, 96, 83, 100, 86, 90, 99, 91, 93, 105, 119, 95, 92, 97, 120, 118, 98, 125, 130, 129, 126, 128, 124, 123, 122, 131, 127, 35, 34, 36, 38, 32, 33, 37, 27, 28, 25, 24, 22, 23, 15, 9, 21, 10, 16, 26, 17, 18, 6, 7, 8, 13, 11, 20, 19, 14, 12, 115, 114, 121, 30, 106, 108, 107, 29, 31, 109, 110, 111, 112, 113, 5, 101, 103, 104, 4, 3, 2, 1, 102]. Hopefully I'll be able to add missing entries soon. Future work will concentrate on categorizing related papers and establishing common threads. ....

M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6(2):219--246, 1998.


Hierarchical Reinforcement Learning of Low-Dimensional.. - Morimoto, Doya (1998)   (1 citation)  (Correct)

....modules explore a large area using a representation of the low-dimensional state space. In contrast, the lower learning modules explore small areas in the original high-dimensional state space. Hierarchical reinforcement learning using subgoals has been applied to discrete maze search tasks [6, 7] and to the sequential control of a two-link robot arm [8]. In [7] the subgoals were movement directions in a coarse-grain state space. In [6] and [8] the subgoals were specific points in a lower-level state space. One problem in applying a hierarchical method to high-dimensional problems is ....

....areas in the original high-dimensional state space. Hierarchical reinforcement learning using subgoals has been applied to discrete maze search tasks [6, 7] and to the sequential control of a two-link robot arm [8]. In [7] the subgoals were movement directions in a coarse-grain state space. In [6] and [8] the subgoals were specific points in a lower-level state space. One problem in applying a hierarchical method to high-dimensional problems is that even a very coarse division of the state space, e.g. two in each variable, can result in a very large number of states in the upper level. ....

M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6(2), 1997.


Reinforcement learning of dynamic motor sequence: Learning to .. - Morimoto, Doya   (Correct)

....The basic idea is to divide a highly non-linear task in a high-dimensional space into two levels: local optimization in the high-dimensional space and global exploration in a lower-dimensional space. Hierarchical reinforcement learning using subgoals has been applied to discrete maze search tasks [10, 3] and to sequential control of a two-link robot arm [9]. In [3] the subgoals were movement direction in a coarse-grain state space. In [10] and [9] the subgoals were specific points in the lower-level state space. In our stand-up task, we take only the joint and pitch angles of intermediate ....

....space and global exploration in a lower-dimensional space. Hierarchical reinforcement learning using subgoals has been applied to discrete maze search tasks [10, 3] and to sequential control of a two-link robot arm [9]. In [3] the subgoals were movement direction in a coarse-grain state space. In [10] and [9] the subgoals were specific points in the lower-level state space. In our stand-up task, we take only the joint and pitch angles of intermediate postures as the subgoals while other variables, such as angular velocities, are left unspecified. This enables the learning of subgoals in a ....

M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6(2), 1997.


Module-Based Reinforcement Learning: Experiments.. - Kalmar.. (1998)   (1 citation)  (Correct)

....to mention just a few examples. Inventing subgoals, macro operators and hierarchies turns out to be a more difficult problem. Some results in this direction include those of Thrun and Schwartz (1995) and Tóth, Kovács, and Lőrincz (1995) whose algorithms learn skills useful in multiple tasks, or Wiering and Schmidhuber (1997) who deal with partial observability and follow a divide-and-conquer approach. Switching controls have also received considerable attention among traditional control theorists under the name hybrid control (Brockett, 1993; Grossman, Nerode, Ravn, and ....

M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6(2), 1997.


Hierarchical Reinforcement Learning Based on Subgoal.. - Bakker, Schmidhuber (2004)   Self-citation (Schmidhuber)   (Correct)

No context found.

M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6:2, 1997.


Evolutionary Computation versus Reinforcement Learning - Schmidhuber (2000)   Self-citation (Schmidhuber)   (Correct)

....environments [12, 19] but they (a) are practically limited to very small problems [18] and (b) do require knowledge of a discrete state space model of the environment. To various degrees, problem (b) also holds for certain hierarchical RL approaches to memory based input disambiguation [24, 25, 26, 20, 50]. Although no discrete models are necessary for DPRL systems with function approximators based on recurrent neural networks [31, 17] the latter do suffer from a lack of theoretical foundation, perhaps even more so than the backgammon player. EC, however, does not care at all for Markovian ....

....DPRL. Some researchers address the case where an external teacher provides intermediate subgoals and/or prewired macro actions consisting of sequences of lower-level actions [21, 48, 44, 39, 10, 7, 45]. Others focus on the more ambitious goal of automatically learning useful subgoals and macros [30, 8, 24, 26, 5, 50, 42]. Most current work in hierarchical DPRL aims at speeding up credit assignment in fully observable environments. Approaches like HQ-learning [50], however, additionally achieve a qualitative (as opposed to just quantitative) decomposition by learning to decompose problems that cannot be solved at ....

[Article contains additional citation context not shown here]

M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6(2):219--246, 1998.


Market-Based Reinforcement Learning in Partially.. - Kwee, Hutter, Schmidhuber (2001)   (1 citation)  Self-citation (Schmidhuber)   (Correct)

....and memorize important past events in complex, partially observable settings, such as in the introductory example above. Several recent, non-standard RL approaches in principle are able to deal with partially observable environments and can learn to memorize certain types of relevant events [7, 8, 6, 3, 9, 10, 13, 18, 17, 14]. None of them, however, represents a satisfactory solution to the general problem of learning in worlds described by partially observable Markov Decision Processes (POMDPs). [Footnote: This work was supported by SNF grants 21 55409.98 and 2000 61847.00.] The approaches are either ad hoc, or work in ....

M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6(2):219-246, 1998.


Imitative Reinforcement Learning for Soccer Playing Robots - Latzke, Behnke, Bennewitz   (Correct)

No context found.

Wiering, M., Schmidhuber, J.: HQ-learning. Adaptive Behavior 6(2) (1997) 219--246


Reinforcement Learning by Policy Search - Peshkin (2001)   (7 citations)  (Correct)

No context found.

Marco Wiering and Juergen Schmidhuber, HQ-learning, Adaptive Behavior 6 (1998), no. 2, 219-246.


Reinforcement Learning for Problems with Hidden State - Hasinoff (2003)   (Correct)

No context found.

M. Wiering and J. Schmidhuber. HQ-Learning. Adaptive Behavior, 6(2):219--246, 1998.


An Experimental Comparison between ATNoSFERES and ACS - Landau, Sigaud, Picault.. (2003)   (Correct)

No context found.

M. Wiering and J. Schmidhuber. HQ-Learning. Adaptive Behavior, 6(2):219--246, 1997.


Stochastic Local Search for POMDP Controllers - Braziunas, Boutilier (2004)   (2 citations)  (Correct)

No context found.

Marco Wiering and Juergen Schmidhuber. HQ-learning. Adaptive Behavior, 6(2):219--246, 1997.


Q-Cut - Dynamic Discovery of Sub-Goals in Reinforcement.. - Menache, Mannor, Shimkin   (Correct)

No context found.

M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6(2):219--246, 1997.


CiteSeer.IST - Copyright Penn State and NEC