Market-Based Reinforcement Learning in Partially Observable Worlds (2001)
Ivo Kwee, Marcus Hutter, Jürgen Schmidhuber
Proceedings of the International Conference on Artificial Neural Networks (ICANN-2001)



A recent approach to market-based RL (Hayek4) is reimplemented and evaluated in a toy POMDP setting.

Abstract: Unlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov Decision Processes (POMDPs), where an agent needs to learn short-term memories of relevant previous events in order to execute optimal actions. Most previous work, however, has focused on reactive settings (MDPs) instead of POMDPs. Here we reimplement a recent approach to market-based RL and for the first time evaluate it in a toy POMDP setting.


BibTeX entry:

I. Kwee, M. Hutter, and J. Schmidhuber. Market-based reinforcement learning in partially observable worlds. Proceedings of the International Conference on Artificial Neural Networks (ICANN-2001). http://citeseer.ist.psu.edu/kwee01marketbased.html

@article{ hutter:01market,
  author =       "Ivo Kwee and Marcus Hutter and Juergen Schmidhuber",
  title =        "Market-Based Reinforcement Learning in Partially Observable Worlds",
  number =       "IDSIA-10-01",
  institution =  "Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA)",
  address =      "Manno(Lugano), CH",
  month =        aug,
  year =         "2001",
  pages =        "865--873",
  journal =      "Proceedings of the International Conference on Artificial Neural Networks (ICANN-2001)",
  editor =       "Georg Dorffner and Horst Bischof and Kurt Hornik",
  publisher =    "Springer",
  series =       "Lecture Notes in Computer Science (LNCS 2130)",
  url =          "citeseer.ist.psu.edu/kwee01marketbased.html",
  url =          "http://www.hutter1.de/ai/pmarket.htm",
  url2 =         "http://arxiv.org/abs/cs.AI/0105025",
  ftp =          "ftp://ftp.idsia.ch/pub/techrep/IDSIA-10-01.ps.gz",
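  keywords =     "Hayek system; reinforcement learning; partial observable environment",
  abstract =     "Unlike traditional reinforcement learning (RL), market-based
                 RL is in principle applicable to worlds described by partially
                 observable Markov Decision Processes (POMDPs), where an agent
                 needs to learn short-term memories of relevant previous events
                 in order to execute optimal actions. Most previous work, however,
                 has focused on reactive settings (MDPs) instead of POMDPs. Here
                 we reimplement a recent approach to market-based RL and for the
                 first time evaluate it in a toy POMDP setting." }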
