Results 1  10
of
69
Formal Theory of Creativity, Fun, and Intrinsic Motivation (19902010)
"... The simple but general formal theory of fun & intrinsic motivation & creativity (1990) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditio ..."
Abstract

Cited by 75 (16 self)
 Add to MetaCart
(Show Context)
The simple but general formal theory of fun & intrinsic motivation & creativity (1990) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditional field of active learning, and is related to old but less formal ideas in aesthetics theory and developmental psychology. It has been argued that the theory explains many essential aspects of intelligence including autonomous development, science, art, music, humor. This overview first describes theoretically optimal (but not necessarily practical) ways of implementing the basic computational principles on exploratory, intrinsically motivated agents or robots, encouraging them to provoke event sequences exhibiting previously unknown but learnable algorithmic regularities. Emphasis is put on the importance of limited computational resources for online prediction and compression. Discrete and continuous time formulations are given. Previous practical but nonoptimal implementations (1991, 1995, 19972002) are reviewed, as well as several recent variants by others (2005). A simplified typology addresses current confusion concerning the precise nature of intrinsic motivation.
Integrating Samplebased Planning and Modelbased Reinforcement Learning
"... Recent advancements in modelbased reinforcement learning have shown that the dynamics of many structured domains (e.g. DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. Unfortunately, these algorithms all require access to a planner that computes ..."
Abstract

Cited by 38 (5 self)
 Add to MetaCart
Recent advancements in modelbased reinforcement learning have shown that the dynamics of many structured domains (e.g. DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. Unfortunately, these algorithms all require access to a planner that computes a near optimal policy, and while many traditional MDP algorithms make this guarantee, their computation time grows with the number of states. We show how to replace these overmatched planners with a class of samplebased planners—whose computation time is independent of the number of states—without sacrificing the sampleefficiency guarantees of the overall learning algorithms. To do so, we define sufficient criteria for a samplebased planner to be used in such a learning system and analyze two popular samplebased approaches from the literature. We also introduce our own samplebased planner, which combines the strategies from these algorithms and still meets the criteria for integration into our learning system. In doing so, we define the first complete RL solution for compactly represented (exponentially sized) state spaces with efficiently learnable dynamics that is both sample efficient and whose computation time does not grow rapidly with the number of states.
The Adaptive kMeteorologists Problem and Its Application to Structure Learning and Feature Selection in Reinforcement Learning
"... The purpose of this paper is threefold. First, we formalize and study a problem of learning probabilistic concepts in the recently proposed KWIK framework. We give details of an algorithm, known as the Adaptive kMeteorologists Algorithm, analyze its samplecomplexity upper bound, and give a matchi ..."
Abstract

Cited by 36 (6 self)
 Add to MetaCart
(Show Context)
The purpose of this paper is threefold. First, we formalize and study a problem of learning probabilistic concepts in the recently proposed KWIK framework. We give details of an algorithm, known as the Adaptive kMeteorologists Algorithm, analyze its samplecomplexity upper bound, and give a matching lower bound. Second, this algorithm is used to create a new reinforcementlearning algorithm for factoredstate problems that enjoys significant improvement over the previous stateoftheart algorithm. Finally, we apply the Adaptive kMeteorologists Algorithm to remove a limiting assumption in an existing reinforcementlearning algorithm. The effectiveness of our approaches is demonstrated empirically in a couple benchmark domains as well as a robotics navigation problem. 1.
An objectoriented representation for efficient reinforcement learning
 Proceedings of 25 th International Conference on Machine Learning, Finland 14  P a g e
, 2008
"... Rich representations in reinforcement learning have been studied for the purpose of enabling generalization and making learning feasible in large state spaces. We introduce ObjectOriented MDPs (OOMDPs), a representation based on objects and their interactions, which is a natural way of modeling en ..."
Abstract

Cited by 35 (7 self)
 Add to MetaCart
(Show Context)
Rich representations in reinforcement learning have been studied for the purpose of enabling generalization and making learning feasible in large state spaces. We introduce ObjectOriented MDPs (OOMDPs), a representation based on objects and their interactions, which is a natural way of modeling environments and offers important generalization opportunities. We introduce a learning algorithm for deterministic OOMDPs and prove a polynomial bound on its sample complexity. We illustrate the performance gains of our representation and algorithm in the wellknown Taxi domain, plus a reallife videogame. 1.
Robust Bounds for Classification via Selective Sampling
"... We introduce a new algorithm for binary classification in the selective sampling protocol. Our algorithm uses Regularized Least Squares (RLS) as base classifier, and for this reason it can be efficiently run in any RKHS. Unlike previous marginbased semisupervised algorithms, our sampling condition ..."
Abstract

Cited by 25 (6 self)
 Add to MetaCart
(Show Context)
We introduce a new algorithm for binary classification in the selective sampling protocol. Our algorithm uses Regularized Least Squares (RLS) as base classifier, and for this reason it can be efficiently run in any RKHS. Unlike previous marginbased semisupervised algorithms, our sampling condition hinges on a simultaneous upper bound on bias and variance of the RLS estimate under a simple linear label noise model. This fact allows us to prove performance bounds that hold for an arbitrary sequence of instances. In particular, we show that our sampling strategy approximates the margin of the Bayes optimal classifier to any desired accuracy ε by asking Õ ( d/ε2) queries (in the RKHS case d is replaced by a suitable spectral quantity). While these are the standard rates in the fully supervised i.i.d. case, the best previously known result in our harder setting was Õ ( d3 /ε4). Preliminary experiments show that some of our algorithms also exhibit a good practical performance. 1.
The Strategic Student Approach for LifeLong Exploration and Learning
 in IEEE Conference on Development and Learning / EpiRob
, 2012
"... Abstract—This article introduces the strategic student metaphor: a student has to learn a number of topics (or tasks) to maximize its mean score, and has to choose strategically how to allocate its time among the topics and/or which learning method to use for a given topic. We show that under which ..."
Abstract

Cited by 23 (17 self)
 Add to MetaCart
(Show Context)
Abstract—This article introduces the strategic student metaphor: a student has to learn a number of topics (or tasks) to maximize its mean score, and has to choose strategically how to allocate its time among the topics and/or which learning method to use for a given topic. We show that under which conditions a strategy where time allocation or learning method is chosen from the easier to the more complex topic is optimal. Then, we show an algorithm, based on multiarmed bandit techniques, that allows empirical online evaluation of learning progress and approximates the optimal solution under more general conditions. Finally, we show that the strategic student problem formulation allows to view in a common framework many previous approaches to active and developmental learning. I.
A unifying framework for computational reinforcement learning theory
, 2009
"... Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervisedlearning algorithms such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understand ..."
Abstract

Cited by 23 (7 self)
 Add to MetaCart
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervisedlearning algorithms such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize longterm utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploring the problem that may reduce shortterm utility. The need for active exploration is common in many problems in daily life, engineering, and sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts in the hope of discovering a better gameplaying strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach nearoptimal strategies
Robust selective sampling from single and multiple teachers
, 2010
"... We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the ins ..."
Abstract

Cited by 20 (1 self)
 Add to MetaCart
(Show Context)
We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Using a simple onlinetobatch conversion technique, our selective sampling algorithm can be converted into a statistical (poolbased) active learning algorithm. We extend our algorithm and analysis to the multipleteacher setting, where the algorithm can choose which subset of teachers to query for each label.
Generalizing apprenticeship learning across hypothesis classes
 In ICML
, 2010
"... All intext references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately. ..."
Abstract

Cited by 19 (10 self)
 Add to MetaCart
(Show Context)
All intext references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately.
Provably Efficient Learning with Typed Parametric Models
"... To quickly achieve good performance, reinforcementlearning algorithms for acting in large continuousvalued domains must use a representation that is both sufficiently powerful to capture important domain characteristics, and yet simultaneously allows generalization, or sharing, among experiences. ..."
Abstract

Cited by 15 (3 self)
 Add to MetaCart
(Show Context)
To quickly achieve good performance, reinforcementlearning algorithms for acting in large continuousvalued domains must use a representation that is both sufficiently powerful to capture important domain characteristics, and yet simultaneously allows generalization, or sharing, among experiences. Our algorithm balances this tradeoff by using a stochastic, switching, parametric dynamics representation. We argue that this model characterizes a number of significant, realworld domains, such as robot navigation across varying terrain. We prove that this representational assumption allows our algorithm to be probably approximately correct with a sample complexity that scales polynomially with all problemspecific quantities including the statespace dimension. We also explicitly incorporate the error introduced by approximate planning in our sample complexity bounds, in contrast to prior Probably Approximately Correct (PAC) Markov Decision Processes (MDP) approaches, which typically assume the estimated MDP can be solved exactly. Our experimental results on constructing plans for driving to work using real car trajectory data, as well as a small robot experiment on navigating varying terrain, demonstrate that our dynamics representation enables us to capture realworld dynamics in a sufficient manner to produce good performance.