Knows What It Knows: A Framework for Self-Aware Learning (2008)

by L. Li, M. Littman, T. Walsh
Venue: ICML

Results 1 - 10 of 71

Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010)

by Jürgen Schmidhuber
"... The simple but general formal theory of fun & intrinsic motivation & creativity (1990-) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditio ..."
Abstract - Cited by 75 (16 self) - Add to MetaCart
The simple but general formal theory of fun & intrinsic motivation & creativity (1990-) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditional field of active learning, and is related to old but less formal ideas in aesthetics theory and developmental psychology. It has been argued that the theory explains many essential aspects of intelligence including autonomous development, science, art, music, humor. This overview first describes theoretically optimal (but not necessarily practical) ways of implementing the basic computational principles on exploratory, intrinsically motivated agents or robots, encouraging them to provoke event sequences exhibiting previously unknown but learnable algorithmic regularities. Emphasis is put on the importance of limited computational resources for online prediction and compression. Discrete and continuous time formulations are given. Previous practical but non-optimal implementations (1991, 1995, 1997-2002) are reviewed, as well as several recent variants by others (2005-). A simplified typology addresses current confusion concerning the precise nature of intrinsic motivation.

Citation Context

...he computational cost of learning new skills, e.g., [79]. While others recently have started to study active RL as well, e.g., Brafman and Tennenholtz (R-MAX Algorithm [10]), Li et al. (KWIK framework [44]), and Strehl et al. [112], our more general systems measure and maximize algorithmic [37], [45], [80], [110] novelty (learnable, but previously unknown compressibility or predictability) of self-gene...
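As a rough illustration of the compression-progress principle summarized in the abstract above, the following Python sketch computes an intrinsic reward as the drop in prediction error after the learner updates on new data. The Predictor interface is a hypothetical stand-in for illustration, not an implementation from the paper.

# Hedged sketch: intrinsic reward as improvement in prediction error
# ("compression progress"). The Predictor interface is hypothetical.

class Predictor:
    """Any online model with an error measure and an update rule."""
    def error(self, history):          # e.g., negative log-likelihood / code length
        raise NotImplementedError
    def update(self, history):         # improve the model on the observed history
        raise NotImplementedError

def intrinsic_reward(predictor, history):
    """Reward = reduction in prediction error after learning on the history."""
    error_before = predictor.error(history)
    predictor.update(history)
    error_after = predictor.error(history)
    return error_before - error_after   # positive when the data became more compressible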

Integrating Sample-based Planning and Model-based Reinforcement Learning

by Thomas J. Walsh, Sergiu Goschin, Michael L. Littman
"... Recent advancements in model-based reinforcement learning have shown that the dynamics of many structured domains (e.g. DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. Unfortunately, these algorithms all require access to a planner that computes ..."
Abstract - Cited by 38 (5 self) - Add to MetaCart
Recent advancements in model-based reinforcement learning have shown that the dynamics of many structured domains (e.g. DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. Unfortunately, these algorithms all require access to a planner that computes a near-optimal policy, and while many traditional MDP algorithms make this guarantee, their computation time grows with the number of states. We show how to replace these over-matched planners with a class of sample-based planners—whose computation time is independent of the number of states—without sacrificing the sample-efficiency guarantees of the overall learning algorithms. To do so, we define sufficient criteria for a sample-based planner to be used in such a learning system and analyze two popular sample-based approaches from the literature. We also introduce our own sample-based planner, which combines the strategies from these algorithms and still meets the criteria for integration into our learning system. In doing so, we define the first complete RL solution for compactly represented (exponentially sized) state spaces with efficiently learnable dynamics that is both sample efficient and whose computation time does not grow rapidly with the number of states.
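The class of sample-based planners discussed above can be illustrated with a sparse-sampling-style sketch whose cost depends only on the sampling width and planning depth, not on the number of states. This is a generic example under an assumed generative-model interface (model.sample), not the planner introduced in the paper.

# Hedged sketch of a sparse-sampling-style planner: computation depends on the
# sampling width C and depth H rather than on |S|. The model.sample(s, a)
# -> (next_state, reward) interface is an assumption for illustration.

import random

def sparse_sampling_q(model, state, actions, depth, width=5, gamma=0.95):
    """Estimate Q(state, a) for each action by recursive sampling."""
    if depth == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = model.sample(state, a)      # generative-model call
            next_q = sparse_sampling_q(model, next_state, actions, depth - 1, width, gamma)
            total += reward + gamma * max(next_q.values())
        q[a] = total / width
    return q

def plan(model, state, actions, depth=3):
    q = sparse_sampling_q(model, state, actions, depth)
    return max(q, key=q.get)     # greedy action w.r.t. sampled Q-estimates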

The Adaptive k-Meteorologists Problem and Its Application to Structure Learning and Feature Selection in Reinforcement Learning

by Carlos Diuk, Lihong Li, Bethany R. Leffler
"... The purpose of this paper is three-fold. First, we formalize and study a problem of learning probabilistic concepts in the recently proposed KWIK framework. We give details of an algorithm, known as the Adaptive k-Meteorologists Algorithm, analyze its sample-complexity upper bound, and give a matchi ..."
Abstract - Cited by 36 (6 self) - Add to MetaCart
The purpose of this paper is three-fold. First, we formalize and study a problem of learning probabilistic concepts in the recently proposed KWIK framework. We give details of an algorithm, known as the Adaptive k-Meteorologists Algorithm, analyze its sample-complexity upper bound, and give a matching lower bound. Second, this algorithm is used to create a new reinforcement-learning algorithm for factored-state problems that enjoys significant improvement over the previous state-of-the-art algorithm. Finally, we apply the Adaptive k-Meteorologists Algorithm to remove a limiting assumption in an existing reinforcement-learning algorithm. The effectiveness of our approaches is demonstrated empirically in a couple of benchmark domains as well as a robotics navigation problem.

Citation Context

...le to belong to a class with certain probability. While Kearns and Schapire (1994) study PAC-learning of probabilistic concepts, this paper considers learning in the recently proposed KWIK framework (Li et al., 2008). The first contribution of this paper is to formalize two KWIK learning problems, known as the k-Meteorologists and the Adaptive k-Meteorologists, expand an algorithmic idea introduced by Li et al. ...
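A minimal sketch of the k-Meteorologists idea described above: keep a pool of candidate predictors, answer only when the survivors agree to within epsilon, otherwise report "don't know" and use the revealed outcome to eliminate clearly worse predictors by squared error. The thresholds and elimination rule below are illustrative simplifications, not the paper's exact algorithm; None stands in for the KWIK "don't know" symbol.

# Hedged sketch of a k-Meteorologists-style KWIK predictor.

class KMeteorologists:
    def __init__(self, experts, epsilon=0.1, elimination_gap=1.0):
        self.experts = list(experts)          # each expert: callable x -> prob in [0, 1]
        self.loss = {i: 0.0 for i in range(len(experts))}
        self.active = set(range(len(experts)))
        self.epsilon = epsilon
        self.gap = elimination_gap

    def predict(self, x):
        preds = {i: self.experts[i](x) for i in self.active}
        if max(preds.values()) - min(preds.values()) <= self.epsilon:
            return sum(preds.values()) / len(preds)     # survivors agree: safe to answer
        return None                                      # disagreement: say "I don't know"

    def observe(self, x, outcome):
        for i in self.active:
            self.loss[i] += (self.experts[i](x) - outcome) ** 2
        best = min(self.loss[i] for i in self.active)
        self.active = {i for i in self.active if self.loss[i] - best <= self.gap}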

An object-oriented representation for efficient reinforcement learning

by Carlos Diuk, Andre Cohen, Michael L. Littman - Proceedings of the 25th International Conference on Machine Learning, Finland, 2008
"... Rich representations in reinforcement learning have been studied for the purpose of enabling generalization and making learning feasible in large state spaces. We introduce Object-Oriented MDPs (OO-MDPs), a representation based on objects and their interactions, which is a natural way of modeling en ..."
Abstract - Cited by 35 (7 self) - Add to MetaCart
Rich representations in reinforcement learning have been studied for the purpose of enabling generalization and making learning feasible in large state spaces. We introduce Object-Oriented MDPs (OO-MDPs), a representation based on objects and their interactions, which is a natural way of modeling environments and offers important generalization opportunities. We introduce a learning algorithm for deterministic OO-MDPs and prove a polynomial bound on its sample complexity. We illustrate the performance gains of our representation and algorithm in the well-known Taxi domain, plus a real-life videogame.

Citation Context

...ich work as follows. Using examples of transitions (s, a, s′), a learning algorithm constructs the transition model T. The learning algorithm must satisfy the KWIK (knows what it knows) conditions (Li et al., 2008), which say: (1) all predictions must be accurate (assuming a valid hypothesis class), and (2) however, the learning algorithm may also return ⊥, which indicates that it cannot yet predict the output...
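The KWIK conditions quoted above can be summarized by a small protocol sketch: the learner either predicts or returns a "don't know" symbol, and the true output is revealed only on "don't know" steps, whose total number must stay within a bound. The interface names here are assumptions for illustration.

# Hedged sketch of the KWIK interaction protocol: None stands in for the ⊥ symbol.

def run_kwik_protocol(learner, inputs, true_function, max_unknowns):
    """Feed an input stream to a KWIK learner and count how often it says 'don't know'."""
    unknowns = 0
    for x in inputs:
        prediction = learner.predict(x)
        if prediction is None:                      # learner declines to predict (⊥)
            unknowns += 1
            learner.observe(x, true_function(x))    # label revealed only on ⊥
        # otherwise the prediction is required to be epsilon-accurate
    assert unknowns <= max_unknowns, "KWIK bound exceeded"
    return unknowns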

Robust Bounds for Classification via Selective Sampling

by Nicolò Cesa-bianchi, Francesco Orabona
"... We introduce a new algorithm for binary classification in the selective sampling protocol. Our algorithm uses Regularized Least Squares (RLS) as base classifier, and for this reason it can be efficiently run in any RKHS. Unlike previous margin-based semisupervised algorithms, our sampling condition ..."
Abstract - Cited by 25 (6 self) - Add to MetaCart
We introduce a new algorithm for binary classification in the selective sampling protocol. Our algorithm uses Regularized Least Squares (RLS) as base classifier, and for this reason it can be efficiently run in any RKHS. Unlike previous margin-based semi-supervised algorithms, our sampling condition hinges on a simultaneous upper bound on bias and variance of the RLS estimate under a simple linear label noise model. This fact allows us to prove performance bounds that hold for an arbitrary sequence of instances. In particular, we show that our sampling strategy approximates the margin of the Bayes optimal classifier to any desired accuracy ε by asking Õ(d/ε²) queries (in the RKHS case d is replaced by a suitable spectral quantity). While these are the standard rates in the fully supervised i.i.d. case, the best previously known result in our harder setting was Õ(d³/ε⁴). Preliminary experiments show that some of our algorithms also exhibit good practical performance.

Citation Context

...mechanism generating labels and instances; however, they are unable to prove bounds on the label query rate as we do here. The KWIK model of (Strehl & Littman, 2008) —see also the more general setup in (Li et al., 2008)— is closest to the setting considered in this paper. There the goal is to approximate the Bayes margin to within a given accuracy ε. The authors assume arbitrary sequences of instances and the same ...
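A rough sketch of the kind of RLS-based selective sampler described above: maintain the regularized least-squares solution online and query a label only when the predicted margin is small relative to the estimate's uncertainty at the current instance. The particular threshold used here is illustrative and is not the bound-achieving condition from the paper.

# Hedged sketch of a margin-vs-uncertainty selective sampler built on RLS.

import numpy as np

class SelectiveRLS:
    def __init__(self, dim, reg=1.0):
        self.A = reg * np.eye(dim)      # regularized correlation matrix
        self.b = np.zeros(dim)

    def step(self, x, get_label):
        w = np.linalg.solve(self.A, self.b)
        margin = float(w @ x)
        uncertainty = float(x @ np.linalg.solve(self.A, x))   # RLS variance proxy
        if abs(margin) <= np.sqrt(uncertainty):               # low confidence: ask
            y = get_label()                                    # query the teacher (+1 or -1)
            self.A += np.outer(x, x)
            self.b += y * x
        return 1 if margin >= 0 else -1                        # always output a prediction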

The Strategic Student Approach for LifeLong Exploration and Learning

by Manuel Lopes, Pierre-Yves Oudeyer - in IEEE Conference on Development and Learning / EpiRob, 2012
"... Abstract—This article introduces the strategic student metaphor: a student has to learn a number of topics (or tasks) to maximize its mean score, and has to choose strategically how to allocate its time among the topics and/or which learning method to use for a given topic. We show that under which ..."
Abstract - Cited by 23 (17 self) - Add to MetaCart
This article introduces the strategic student metaphor: a student has to learn a number of topics (or tasks) to maximize its mean score, and has to choose strategically how to allocate its time among the topics and/or which learning method to use for a given topic. We show under which conditions a strategy that allocates time, or chooses the learning method, from the easier to the more complex topic is optimal. Then, we present an algorithm, based on multi-armed bandit techniques, that allows empirical online evaluation of learning progress and approximates the optimal solution under more general conditions. Finally, we show that the strategic student problem formulation allows many previous approaches to active and developmental learning to be viewed in a common framework.

Citation Context

...he case where each arm provides an object among k classes. For this we can model each arm as a multinomial distribution and we know that we can learn the task by making a number of queries bounded by [37]: B(ε, δ) = (8n/ε²) ln(2n/δ). A more complex example is when each model is itself a reinforcement learning problem. The learning curve has been shown to be polynomial ([18], [37]). This means that the e...
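A toy sketch of the strategic-student allocation idea: treat each topic as a bandit arm and spend the next study step on the topic showing the largest recent empirical learning progress, with occasional random exploration. This greedy variant is illustrative only; the paper's algorithm is based on multi-armed bandit techniques with formal guarantees.

# Hedged sketch: pick the next topic to study by recent learning progress.

import random

def choose_topic(score_history, explore_prob=0.1, window=5):
    """score_history: dict topic -> list of past scores (higher is better)."""
    if random.random() < explore_prob:
        return random.choice(list(score_history))          # occasional random exploration
    def progress(scores):
        recent = scores[-window:]
        return recent[-1] - recent[0] if len(recent) >= 2 else float("inf")  # try new topics first
    return max(score_history, key=lambda t: progress(score_history[t]))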

A unifying framework for computational reinforcement learning theory

by Lihong Li , 2009
"... Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understand ..."
Abstract - Cited by 23 (7 self) - Add to MetaCart
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms, such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself by exploring the problem, which may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies

Robust selective sampling from single and multiple teachers

by Ofer Dekel, Claudio Gentile, Karthik Sridharan , 2010
"... We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the ins ..."
Abstract - Cited by 20 (1 self) - Add to MetaCart
We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Using a simple online-to-batch conversion technique, our selective sampling algorithm can be converted into a statistical (pool-based) active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label.

Citation Context

...ronment. Inspired by known online ridge regression algorithms (e.g., (Hoerl & Kennard, 1970; Lai & Wei, 1982; Vovk, 2001; Azoury & Warmuth, 2001; Cesa-Bianchi et al., 2003; Cesa-Bianchi et al., 2005; Li et al., 2008; Strehl & Littman, 2008; Cavallanti et al., 2009; Cesa-Bianchi et al., 2009)), we begin by presenting a new robust selective sampling algorithm within the label-noise setting considered in (Cavallant...

Generalizing apprenticeship learning across hypothesis classes

by Thomas J. Walsh, Kaushik Subramanian, Michael L. Littman, Carlos Diuk - In ICML, 2010
"... All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately. ..."
Abstract - Cited by 19 (10 self) - Add to MetaCart
All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately.

Citation Context

...Frameworks for Learning Models. In this section, we introduce a class of dynamics where PAC-MDP-Trace behavior can be induced. In autonomous reinforcement learning, the recent development of the KWIK (Li et al., 2008) or “Knows What It Knows” framework has unified the analysis of models that can be efficiently learned. The KWIK-learning protocol consists of an agent seeing an infinite stream of inputs xt ∈ X. For...

Provably Efficient Learning with Typed Parametric Models

by Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy
"... To quickly achieve good performance, reinforcement-learning algorithms for acting in large continuous-valued domains must use a representation that is both sufficiently powerful to capture important domain characteristics, and yet simultaneously allows generalization, or sharing, among experiences. ..."
Abstract - Cited by 15 (3 self) - Add to MetaCart
To quickly achieve good performance, reinforcement-learning algorithms for acting in large continuous-valued domains must use a representation that is both sufficiently powerful to capture important domain characteristics, and yet simultaneously allows generalization, or sharing, among experiences. Our algorithm balances this tradeoff by using a stochastic, switching, parametric dynamics representation. We argue that this model characterizes a number of significant, real-world domains, such as robot navigation across varying terrain. We prove that this representational assumption allows our algorithm to be probably approximately correct with a sample complexity that scales polynomially with all problem-specific quantities including the state-space dimension. We also explicitly incorporate the error introduced by approximate planning in our sample complexity bounds, in contrast to prior Probably Approximately Correct (PAC) Markov Decision Processes (MDP) approaches, which typically assume the estimated MDP can be solved exactly. Our experimental results on constructing plans for driving to work using real car trajectory data, as well as a small robot experiment on navigating varying terrain, demonstrate that our dynamics representation enables us to capture real-world dynamics in a sufficient manner to produce good performance.

Citation Context

...the agent to quickly learn a good strategy. At a high level, our work falls into the category of model-based reinforcement-learning algorithms in which the MDP model (Equation 1) can be KWIK-learned (Li et al., 2008; Li, 2009), and thus it is efficient in exploring the world. The Knows What It Knows (KWIK) framework is an alternate learning framework which incorporates characteristics of the Probably Approximat...
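A toy sketch of the typed, switching parametric dynamics idea summarized above: each state carries a type (for instance, a terrain class), and a separate linear model per type is fit only from experience of that type, answering "don't know" for types with no data yet. The linear form and names are illustrative assumptions, not the paper's exact representation.

# Hedged sketch of a typed, switching parametric dynamics model.

import numpy as np

class TypedDynamicsModel:
    def __init__(self, num_types, state_dim):
        self.data = {t: ([], []) for t in range(num_types)}   # per-type (inputs, next states)
        self.weights = {t: None for t in range(num_types)}
        self.state_dim = state_dim

    def observe(self, state_type, s, s_next):
        xs, ys = self.data[state_type]
        xs.append(s)
        ys.append(s_next)
        X, Y = np.array(xs), np.array(ys)
        # least-squares fit of the per-type linear dynamics s_next ≈ s @ W
        self.weights[state_type] = np.linalg.lstsq(X, Y, rcond=None)[0]

    def predict(self, state_type, s):
        W = self.weights[state_type]
        if W is None:
            return None            # KWIK-style "don't know" until this type has data
        return np.asarray(s) @ W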
