Reinforcement learning with self-modifying policies, in Learning to (1997)

by J Schmidhuber, J Zhao, N Schraudolph

Results 1 - 10 of 23

An introduction to collective intelligence

by David H. Wolpert, Kagan Tumer - Handbook of Agent technology. AAAI, 1999
Cited by 80 (16 self)

Optimal Ordered Problem Solver

by Jürgen Schmidhuber, 2002
Cited by 47 (12 self)
We present a novel, general, optimally fast, incremental way of searching for a universal algorithm that solves each task in a sequence of tasks. The Optimal Ordered Problem Solver (OOPS) continually organizes and exploits previously found solutions to earlier tasks, efficiently searching not only the space of domain-specific algorithms, but also the space of search algorithms. Essentially we extend the principles of optimal nonincremental universal search to build an incremental universal learner that is able to improve itself through experience.

A computer scientist’s view of life, the universe, and everything

by Jürgen Schmidhuber - Foundations of Computer Science: Potential - Theory - Cognition, 1997
Cited by 27 (11 self)
Is the universe computable? If so, it may be much cheaper in terms of information requirements to compute all computable universes instead of just ours. I apply basic concepts of Kolmogorov complexity theory to the set of possible universes, and chat about perceived and true randomness, life, generalization, and learning in a given universe. Preliminaries. Assumptions. A long time ago, the Great Programmer wrote a program that runs all possible universes on His Big Computer. “Possible” means “computable”: (1) Each universe evolves on a discrete time scale. (2) Any universe’s state at a given time is describable by a finite number of bits. One of the many universes is ours, despite some who evolved in it and claim it is incomputable. Computable universes. Let TM denote an arbitrary universal Turing machine with unidirectional output tape. TM’s input and output symbols are “0”, “1”, and “,” (comma). TM’s possible input programs can be ordered

HQ-Learning

by Marco Wiering, Jürgen Schmidhuber - Adaptive Behavior, 1997
Cited by 20 (1 self)
HQ-learning is a hierarchical extension of Q(λ)-learning designed to solve certain types of partially observable Markov decision problems (POMDPs). HQ automatically decomposes POMDPs into sequences of simpler subtasks that can be solved by memoryless policies learnable by reactive subagents. HQ can solve partially observable mazes with more states than those used in most previous POMDP work.

What's Interesting?

by Jürgen Schmidhuber, 1997
Cited by 15 (6 self)
Interestingness depends on the observer's current knowledge and computational abilities. Things are boring if either too much or too little is known about them --- if they appear either trivial or random. Interesting are unexpected regularities that seem easy to figure out. I attempt to implement these ideas in a "curious", "creative" explorer with two coevolving "brains". It executes a lifelong sequence of instructions whose modifiable probabilities are conditioned on both brains --- both must agree on each instruction. There are special instructions for comparing computational results. The brains can predict outcomes of such comparisons. If their opinions differ, then the winner will get rewarded, the loser punished. Hence each brain wants to lure the other into agreeing upon instruction subsequences involving comparisons that surprise it. The surprised brain adapts. In turn, the other loses a source of reward --- an incentive to shift the focus of interest. Both brains deal with the...

Exploring the Predictable

by Jürgen Schmidhuber, 2002
Cited by 15 (6 self)
Details of complex event sequences are often not predictable, but their reduced abstract representations are. I study an embedded active learner that can limit its predictions to almost arbitrary computable aspects of spatio-temporal events. It constructs probabilistic algorithms that (1) control interaction with the world, (2) map event sequences to abstract internal representations (IRs), (3) predict IRs from IRs computed earlier. Its goal is to create novel algorithms generating IRs useful for correct IR predictions, without wasting time on those learned before. This requires an adaptive novelty measure which is implemented by a coevolutionary scheme involving two competing modules collectively designing (initially random) algorithms representing experiments. Using special instructions, the modules can bet on the outcome of IR predictions computed by algorithms they have agreed upon. If their opinions differ then the system checks who's right, punishes the loser (the surprised one), and rewards the winner. An evolutionary or reinforcement learning algorithm forces each module to maximize reward. This motivates both modules to lure each other into agreeing upon experiments involving predictions that surprise it. Since each module essentially can veto experiments it does not consider profitable, the system is motivated to focus on those computable aspects of the environment where both modules still have confident but different opinions. Once both share the same opinion on a particular issue (via the loser's learning process, e.g., the winner is simply copied onto the loser), the winner loses a source of reward -- an incentive to shift the focus of interest onto novel experiments. My simulations include an example where surprise-generation of this kind helps to speed up ...

A survey of collectives

by Kagan Tumer, David Wolpert - In Collectives and the Design of Complex Systems, 2004
Cited by 14 (7 self)
Due to the increasing sophistication and miniaturization of computational components, complex, distributed systems of interacting agents are becoming ubiquitous. Such systems, where each agent aims to optimize its own performance, but where there is a well-defined set of system-level performance criteria, are called collectives. The fundamental problem in analyzing/designing such systems is in determining how the combined actions of a large number of agents lead to “coordinated” behavior on the global scale. Examples of artificial systems which exhibit such behavior include packet routing across a data network, control of an array of communication satellites, coordination of multiple rovers, and dynamic job scheduling across a distributed computer grid. Examples of natural systems include ecosystems, economies, and the organelles within a living cell. No current scientific discipline provides a thorough understanding of the relation between the structure of collectives and how well they meet their overall performance criteria. Although still very young, research on collectives has resulted in successes both in understanding and designing such systems. It is expected that as it matures and draws upon other disciplines related to collectives, this field will greatly expand the range of computationally addressable tasks. Moreover, in addition to drawing on them, such a fully developed field of collective intelligence may provide insight into already established scientific fields, such as mechanism design, economics, game theory, and population biology. This chapter provides a survey of the emerging science of collectives.

Reinforcement Learning Soccer Teams with Incomplete World Models

by Marco Wiering, Rafal P. Salustowicz, Jürgen Schmidhuber, 1999
Cited by 13 (2 self)
We use reinforcement learning (RL) to compute strategies for multiagent soccer teams. RL may profit significantly from world models (WMs) estimating state transition probabilities and rewards. In high-dimensional, continuous input spaces, however, learning accurate WMs is intractable. Here we show that incomplete WMs can help to quickly find good action selection policies. Our approach is based on a novel combination of CMACs and prioritized sweeping-like algorithms. Variants thereof outperform both Q(λ)-learning with CMACs and the evolutionary method Probabilistic Incremental Program Evolution (PIPE) which performed best in previous comparisons. Keywords: reinforcement learning, CMAC, world models, simulated soccer, Q(λ), evolutionary computation, PIPE. 1. Introduction. Our goal is to build teams of autonomous agents that learn to play soccer from very sparse reinforcement signals: only scoring a goal yields reward for the successful team. Team members try to maximize reward by improv...

Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements

by Jürgen Schmidhuber, 2003
Cited by 11 (3 self)
An old dream of computer scientists is to build an optimally efficient universal problem solver. We show how to solve arbitrary computational problems in an optimal fashion inspired by Kurt Gödel's celebrated self-referential formulas (1931). Our Gödel machine's initial software includes an axiomatic description of: the Gödel machine's hardware, the problem-specific utility function (such as the expected future reward of a robot), known aspects of the environment, costs of actions and computations, and the initial software itself (this is possible without introducing circularity). It also includes a typically sub-optimal initial problem-solving policy and an asymptotically optimal proof searcher searching the space of computable proof techniques, that is, programs whose outputs are proofs. Unlike previous approaches, the self-referential Gödel machine will rewrite any part of its software, including axioms and proof searcher, as soon as it has found a proof that this will improve its future performance, given its typically limited computational resources. We show that self-rewrites are globally optimal (no local minima!), since provably none of all the alternative rewrites and proofs (those that could be found by continuing the proof search) are worth waiting for.

H-PIPE: Facilitating Hierarchical Program Evolution through Skip Nodes

by Rafal Salustowicz, Jürgen Schmidhuber, 1998
Cited by 6 (0 self)
To evolve structured programs we introduce H-PIPE, a hierarchical extension of Probabilistic Incremental Program Evolution (PIPE - Sałustowicz and Schmidhuber, 1997). Structure is induced by "hierarchical instructions" (HIs) limited to top-level, structuring program parts. "Skip nodes" (SNs) inspired by biology's introns (non-coding segments) allow for switching program parts on and off. In our experiments H-PIPE outperforms PIPE, and SNs facilitate synthesis of certain structured programs but not unstructured ones. We conclude that introns can be particularly useful in the presence of structural bias. Keywords: Probabilistic Incremental Program Evolution, Structured Programs, Hierarchical Programs, Introns, Non-Coding Segments. 1 Introduction and Previous Work. Overview. Hierarchical Probabilistic Incremental Program Evolution (H-PIPE) is a novel method for synthesizing structured programs. It uses the PIPE paradigm (Sałustowicz and Schmidhuber, 1997) to iteratively generate succes...