Results 1-10 of 36
Knows What It Knows: A Framework For Self-Aware Learning
Cited by 71 (20 self)
We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes and open problems.
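As a rough illustration of the "knows what it knows" idea, the sketch below shows a learner that either returns a prediction it is sure of or explicitly answers "I don't know" and is then shown the true label. It assumes the simplest possible setting, a deterministic target function over a finite input set learned by memorization. The names `UNKNOWN`, `MemorizationLearner`, and `run_protocol` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Hashable, Iterable, List

# Sentinel standing for the learner's "I don't know" answer (hypothetical name).
UNKNOWN = object()


class MemorizationLearner:
    """Illustrative learner: predicts only for inputs whose label it has seen."""

    def __init__(self) -> None:
        self._labels: Dict[Hashable, object] = {}
        self.unknown_count = 0  # how many times the learner admitted ignorance

    def predict(self, x: Hashable) -> object:
        """Return the memorized label for x, or UNKNOWN if x was never observed."""
        if x in self._labels:
            return self._labels[x]
        self.unknown_count += 1
        return UNKNOWN

    def observe(self, x: Hashable, y: object) -> None:
        """Store the true label revealed after an UNKNOWN answer."""
        self._labels[x] = y


def run_protocol(
    learner: MemorizationLearner,
    target: Callable[[Hashable], object],
    inputs: Iterable[Hashable],
) -> List[object]:
    """Feed inputs one at a time; reveal the label only when the learner says UNKNOWN."""
    answers: List[object] = []
    for x in inputs:
        answer = learner.predict(x)
        if answer is UNKNOWN:
            learner.observe(x, target(x))
        answers.append(answer)
    return answers


learner = MemorizationLearner()
answers = run_protocol(learner, lambda x: x % 3, [0, 1, 0, 4, 1])
```

In this toy setting the learner never outputs a wrong label, and it can answer `UNKNOWN` at most once per distinct input, so the number of such answers is bounded by the size of the input set.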
Reinforcement Learning in Finite MDPs: PAC Analysis
Cited by 52 (6 self)
We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These "PAC-MDP" algorithms include the well-known E³ and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state-of-the-art by presenting bounds for the problem in a unified theoretical framework. We also present a more refined analysis that yields insight into the differences between the model-free Delayed Q-learning and the model-based R-MAX. Finally, we conclude with open problems.
A unifying framework for computational reinforcement learning theory, 2009
Cited by 23 (7 self)
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms, such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploring the problem, which may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies
Exploration in Relational Domains for Model-based Reinforcement Learning, 2012
Cited by 10 (1 self)
A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of model-based reinforcement learning in large stochastic relational domains by developing relational extensions of the concepts of the E³ and R-MAX algorithms. Efficient exploration in exponentially large state spaces needs to exploit the generalization of the learned model: what in a propositional setting would be considered a novel situation and worth exploration may in the relational setting be a well-known context in which exploitation is promising. To address this we introduce relational count functions which generalize the classical notion of state and action visitation counts. We provide guarantees on the exploration efficiency of our framework using count functions under the assumption that we had a relational KWIK learner and a near-optimal planner. We propose a concrete exploration algorithm which integrates a practically efficient probabilistic rule learner and a relational planner (for which there are no guarantees, however) and employs the contexts of learned relational rules as features to model the novelty of states and actions. Our results in noisy 3D simulated robot manipulation problems and in domains of the international planning competition demonstrate that our approach is more effective than existing propositional and factored exploration techniques.
Automatic Feature Selection for Model-Based Reinforcement Learning in Factored MDPs
Cited by 9 (3 self)
Feature selection is an important challenge in machine learning. Unfortunately, most methods for automating feature selection are designed for supervised learning tasks and are thus either inapplicable or impractical for reinforcement learning. This paper presents a new approach to feature selection specifically designed for the challenges of reinforcement learning. In our method, the agent learns a model, represented as a dynamic Bayesian network, of a factored Markov decision process, deduces a minimal feature set from this network, and efficiently computes a policy on this feature set using dynamic programming methods. Experiments in a stock-trading benchmark task demonstrate that this approach can reliably deduce minimal feature sets and that doing so can substantially improve performance and reduce the computational expense of planning. Keywords: reinforcement learning; feature selection; factored MDPs
Efficient learning of relational models for sequential decision making, 2010
Cited by 8 (1 self)
The exploration-exploitation tradeoff is crucial to reinforcement-learning (RL) agents, and a significant number of sample complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, near-optimal behavior in all but a polynomial number of timesteps in the agent's lifetime. In this work, we prove similar results for certain relational representations, primarily a class we call "relational action schemas". These generalized models allow us to specify state transitions in a compact form, for instance describing the effect of picking up a generic block instead of picking up 10 different specific blocks. We present theoretical results on crucial subproblems in action-schema learning using the KWIK framework, which allows us to characterize the sample efficiency of an agent learning these models in a reinforcement-learning setting. These results are extended in an apprenticeship learning paradigm where an agent has access not only to its environment, but also to a teacher that can demonstrate traces of state/action/state sequences. We show that the class of action schemas that are efficiently learnable in this paradigm is strictly larger than those learnable in the online setting. We link
Rational general reinforcement learning
Cited by 7 (3 self)
We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models. The algorithm is shown to be near-optimal for all but O(N log² N) timesteps with high probability. Infinite classes are also considered where we show that compactness is a key criterion for determining the existence of uniform sample-complexity bounds. A matching lower bound is given for the finite case.
Online Feature Selection for Model-based Reinforcement Learning
Trung Thanh Nguyen
Cited by 5 (0 self)
We propose a new framework for learning the world dynamics of feature-rich environments in model-based reinforcement learning. The main idea is formalized as a new, factored state-transition representation that supports efficient online learning of the relevant features. We construct the transition models through predicting how the actions change the world. We introduce an online sparse coding learning technique for feature selection in high-dimensional spaces. We derive theoretical guarantees for our framework and empirically demonstrate its practicality in both simulated and real robotics domains.
Regret Bounds for Reinforcement Learning with Policy Advice
Cited by 4 (0 self)
In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with policy advice (RLPA) algorithm which leverages this input set and learns to use the best policy in the set for the reinforcement learning task at hand. We prove that RLPA has a sublinear regret of Õ(√T) relative to the best input policy, and that both this regret and its computational complexity are independent of the size of the state and action space. Our empirical simulations support our theoretical analysis. This suggests RLPA may offer significant advantages in large domains where some prior good policies are provided.