A recent approach to market-based RL (Hayek4) is reimplemented and evaluated in a toy POMDP setting
Abstract: Unlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov Decision Processes (POMDPs), where an agent needs to learn short-term memories of relevant previous events in order to execute optimal actions. Most previous work, however, has focused on reactive settings (MDPs) instead of POMDPs. Here we reimplement a recent approach to market-based RL and for the first time evaluate it in a toy POMDP setting.
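The abstract's point that a POMDP agent needs short-term memory can be illustrated with a minimal sketch. The two-step cue task below is a hypothetical toy world chosen for illustration; it is not the environment evaluated in the paper, and it does not implement Hayek or any market-based learner. The names `BLANK`, `episode_return`, `make_reactive`, and `memory_policy` are all assumptions made for this example.

```python
import itertools

# Hypothetical two-step task (not the paper's environment):
# step 0 shows a cue (0 or 1); step 1 shows BLANK for both cues.
# Only the action taken at step 1 is scored: reward 1 if it equals the cue.
BLANK = 2


def episode_return(cue, policy):
    """Run one episode. policy(obs, memory) returns (action, new_memory)."""
    memory = None
    _, memory = policy(cue, memory)    # step 0: cue visible, action ignored
    action, _ = policy(BLANK, memory)  # step 1: cue hidden, action scored
    return 1.0 if action == cue else 0.0


def average_return(policy):
    """Average over both cues, assumed equally likely."""
    return sum(episode_return(cue, policy) for cue in (0, 1)) / 2


def make_reactive(table):
    """Memoryless policy: the action depends only on the current observation."""
    def policy(obs, memory):
        return table[obs], None
    return policy


def memory_policy(obs, memory):
    """Stores the cue at step 0 and replays it when the observation is BLANK."""
    if obs != BLANK:
        return 0, obs
    return memory, memory


# Enumerate every deterministic reactive policy over the three observations.
best_reactive = max(
    average_return(make_reactive(dict(zip((0, 1, BLANK), actions))))
    for actions in itertools.product((0, 1), repeat=3)
)

print(best_reactive)                  # 0.5
print(average_return(memory_policy))  # 1.0
```

Every reactive policy must pick one fixed action for the `BLANK` observation, so it can match only one of the two cues; a single remembered bit is enough to close the gap.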
Similar documents based on text:
2.4: Gradient-based Reinforcement Planning in Policy-Search.. - Kwee, Hutter, Schmidhuber (2001)
0.4: Distribution of Mutual Information from Complete And.. - Hutter, Zaffalon (2004)
0.4: Optimal Control Using the Transport Equation: the Liouville.. - Kwee, Schmidhuber (2000)
BibTeX entry:
I. Kwee, M. Hutter, and J. Schmidhuber. Market-based reinforcement learning in partially observable worlds. Proceedings of the International Conference on Artificial Neural Networks (ICANN-2001). http://citeseer.ist.psu.edu/kwee01marketbased.html
@article{ hutter:01market,
author = "Ivo Kwee and Marcus Hutter and Juergen Schmidhuber",
title = "Market-Based Reinforcement Learning in Partially Observable Worlds",
number = "IDSIA-10-01",
institution = "Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA)",
address = "Manno (Lugano), CH",
month = aug,
year = "2001",
pages = "865--873",
journal = "Proceedings of the International Conference on Artificial Neural Networks (ICANN-2001)",
editor = "Georg Dorffner and Horst Bischof and Kurt Hornik",
publisher = "Springer",
series = "Lecture Notes in Computer Science (LNCS 2130)",
url = "citeseer.ist.psu.edu/kwee01marketbased.html",
url = "http://www.hutter1.de/ai/pmarket.htm",
url2 = "http://arxiv.org/abs/cs.AI/0105025",
ftp = "ftp://ftp.idsia.ch/pub/techrep/IDSIA-10-01.ps.gz",
keywords = "Hayek system; reinforcement learning; partial observable environment",
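abstract = "Unlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov Decision Processes (POMDPs), where an agent needs to learn short-term memories of relevant previous events in order to execute optimal actions. Most previous work, however, has focused on reactive settings (MDPs) instead of POMDPs. Here we reimplement a recent approach to market-based RL and for the first time evaluate it in a toy POMDP setting." }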
Citations (may not include all citations):
281: Machine Learning - Watkins, Dayan - 1992
187: Planning and acting in partially observable stochastic domai.. - Kaelbling, Littman et al. - 1995
113: Learning policies for partially observable environments: Sca.. - Littman, Cassandra et al. - 1995
81: Reinforcement learning algorithm for partially observable Ma.. - Jaakkola, Singh et al. - 1995
59: Overcoming incomplete perception with utile distinction memo.. - McCallum - 1993
41: Reinforcement learning in Markovian and nonMarkovian environ.. - Schmidhuber - 1991
39: Continual Learning in Reinforcement Environments - Ring - 1994
36: Learning to predict by the methods of temporal differences - Sutton - 1988
33: Shifting inductive bias with success-story algorithm - Schmidhuber, Zhao et al. - 1997
23: Solving POMDPs with Levin search and EIRA - Wiering, Schmidhuber - 1996
21: Adaptive Behavior - Wiering, Schmidhuber - 1998
20: Evolutionary principles in self-referential learning - Schmidhuber - 1987
20: A local learning algorithm for dynamic feedforward and recur.. - Schmidhuber - 1989
17: Properties of the bucket brigade - Holland - 1985
12: ZCS: A zeroth level classifier system - Wilson - 1994
[Article contains additional citations not shown here]
Documents on the same site (http://www.idsia.ch/techrep.html):
Improvements and Comparison of Heuristics for.. - Brimberg, Hansen, .. (1997)
Learning to Forget: Continual Prediction with LSTM - Felix A. Gers, Jürgen.. (1999)
Online Local Gain Adaptation for Multi-Layer Perceptrons - Schraudolph (1998)
CiteSeer.IST - Copyright Penn State and NEC