Accelerating Reinforcement Learning through Implicit Imitation
Journal of Artificial Intelligence Research, 2003
Cited by 78 (0 self)
Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments ...
A Bayesian Sampling Approach to Exploration in Reinforcement Learning
Cited by 68 (9 self)
We present a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set), drives exploration by sampling multiple models from the posterior and selecting actions optimistically. It extends previous work by providing a rule for deciding when to resample and how to combine the models. We show that our algorithm achieves near-optimal reward with high probability with a sample complexity that is low relative to the speed at which the posterior distribution converges during learning. We demonstrate that BOSS performs quite favorably compared to state-of-the-art reinforcement-learning approaches and illustrate its flexibility by pairing it with a nonparametric model that generalizes across states.
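As a rough illustration of the best-of-sampled-set idea, the sketch below (a hypothetical toy setup, not the paper's implementation: a 3-state MDP with known rewards, Dirichlet posteriors over transitions, and the resampling rule omitted) samples K models from the posterior, solves each, and acts greedily with respect to the per-action maximum of their Q-values:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, K, gamma = 3, 2, 5, 0.95                 # toy sizes; not from the paper
R = np.array([[0., 0.], [0., 0.], [1., 1.]])   # known reward table R[s, a]
counts = np.ones((S, A, S))                    # Dirichlet(1,...,1) transition posterior

def q_values(P):
    """Plain value iteration for one sampled transition model P[s, a, s']."""
    Q = np.zeros((S, A))
    for _ in range(300):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

def boss_action(state):
    # Sample K full models from the posterior, solve each, and act
    # optimistically: pick the action whose Q-value is best in ANY sampled model.
    sampled_q = [
        q_values(np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                           for s in range(S)]))
        for _ in range(K)
    ]
    best_of_set = np.max(sampled_q, axis=0)    # "best of sampled set"
    return int(np.argmax(best_of_set[state]))

action = boss_action(0)
```

After each observed transition (s, a, s'), incrementing `counts[s, a, s']` updates the posterior; the paper additionally specifies when to redraw the sample set, which this sketch leaves out.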
Knows What It Knows: A Framework For Self-Aware Learning
Cited by 67 (20 self)
We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes and open problems.
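The defining KWIK contract is that a learner may answer "I don't know" but must never answer incorrectly. A minimal sketch of the simplest KWIK-learnable class, a memorization learner over noise-free labels (the class name and `None`-as-⊥ convention are this sketch's choices, not the paper's notation):

```python
class MemorizationKWIK:
    """KWIK learner that memorizes input/label pairs.

    It predicts a stored label when it has seen the input before and
    otherwise admits ignorance by returning None (the KWIK "I don't know"
    symbol), so its predictions are never wrong.
    """

    def __init__(self):
        self.table = {}

    def predict(self, x):
        return self.table.get(x)      # None == "I don't know"

    def observe(self, x, y):
        self.table[x] = y             # labels are assumed noise-free

learner = MemorizationKWIK()
assert learner.predict("s1") is None  # must admit uncertainty at first
learner.observe("s1", 1)
assert learner.predict("s1") == 1     # never wrong once it answers
```

The number of "I don't know" answers here is at most the number of distinct inputs, which is what makes the class KWIK-learnable with a finite bound.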
Near-Bayesian exploration in polynomial time (full version). Available at http://ai.stanford.edu/˜kolter
2009
Cited by 65 (0 self)
We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem, by considering a distribution over possible models and acting to maximize expected reward; unfortunately, the Bayesian solution is intractable for all but very restricted cases. In this paper we present a simple algorithm, and prove that with high probability it is able to perform ɛ-close to the true (intractable) optimal Bayesian policy after some small (polynomial in quantities describing the system) number of time steps. The algorithm and analysis are motivated by the so-called PAC-MDP approach, and extend such results into the setting of Bayesian RL. In this setting, we show that we can achieve lower sample complexity bounds than existing algorithms, while using an exploration strategy that is much greedier than the (extremely cautious) exploration of PAC-MDP algorithms.
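One way to get exploration that is greedier than PAC-MDP methods, loosely in the spirit of this abstract, is to plan in the posterior-mean model with a reward bonus that decays with visit counts. The bonus form β/(1 + n(s, a)) and all constants below are assumptions of this sketch, not details taken from the paper:

```python
import numpy as np

def bonus_value_iteration(P_mean, R, n_visits, beta=2.0, gamma=0.95, iters=300):
    """Value iteration on the mean model with a count-decaying bonus.

    P_mean[s, a, s'] : posterior-mean transition probabilities
    R[s, a]          : mean reward estimates
    n_visits[s, a]   : how often (s, a) has been tried so far
    """
    R_opt = R + beta / (1.0 + n_visits)        # optimism fades as counts grow
    Q = np.zeros(R.shape)
    for _ in range(iters):
        Q = R_opt + gamma * P_mean @ Q.max(axis=1)
    return Q

# Untried actions look better than heavily tried ones with the same reward:
P = np.full((2, 2, 2), 0.5)                    # toy 2-state, 2-action model
R = np.zeros((2, 2))
n = np.array([[0.0, 100.0], [0.0, 100.0]])     # action 0 untried, action 1 well-tried
Q = bonus_value_iteration(P, R, n)
```

Because the bonus shrinks like 1/n rather than forcing a fixed number of visits per state-action pair, the resulting policy stops exploring a pair much sooner than a PAC-MDP known-state threshold would, which is the "greedier" behavior the abstract alludes to.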
PAC bounds for multi-armed bandit and Markov decision processes
In Fifteenth Annual Conference on Computational Learning Theory (COLT), 2002
Cited by 61 (2 self)
The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given n arms, it suffices to pull the arms O((n/ɛ²) log(1/δ)) times to find an ɛ-optimal arm with probability of at least 1 − δ. This is in contrast to the naive bound of O((n/ɛ²) log(n/δ)). We derive another algorithm whose complexity depends on the specific setting of the rewards, rather than the worst-case setting. We also provide a matching lower bound. We show how, given an algorithm for the PAC model multi-armed bandit problem, one can derive a batch learning algorithm for Markov decision processes. This is done essentially by simulating value iteration, and in each iteration invoking the multi-armed bandit algorithm. Using our PAC algorithm for the multi-armed bandit problem we improve the dependence on the number of actions.
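The naive bound referred to above corresponds to a simple baseline: pull every arm the same number of times and keep the best empirical mean; a Hoeffding union bound over all n arms is what introduces the log n factor that the paper removes. A minimal sketch of that baseline (the `pull` interface and Bernoulli arms are illustrative choices):

```python
import math
import random

def naive_pac_bandit(pull, n_arms, eps, delta):
    """Return an eps-optimal arm with probability >= 1 - delta (naive baseline).

    Pull each arm m = ceil((2/eps^2) * ln(2n/delta)) times; by Hoeffding's
    inequality every empirical mean is then within eps/2 of its true value
    simultaneously, so the empirically best arm is eps-optimal.
    """
    m = math.ceil((2.0 / eps**2) * math.log(2 * n_arms / delta))
    means = [sum(pull(a) for _ in range(m)) / m for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: means[a])

random.seed(0)
probs = [0.2, 0.5, 0.8]   # Bernoulli arms; arm 2 is best
best = naive_pac_bandit(lambda a: float(random.random() < probs[a]),
                        n_arms=3, eps=0.2, delta=0.1)
```

The paper's median-elimination-style improvement spends fewer pulls on clearly bad arms, achieving the O((n/ɛ²) log(1/δ)) total instead of paying the union-bound log n on every arm.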
Efficient structure learning in factored-state MDPs
2007
Cited by 60 (9 self)
We consider the problem of reinforcement learning in factored-state MDPs in the setting in which learning is conducted in one long trial with no resets allowed. We show how to extend existing efficient algorithms that learn the conditional probability tables of dynamic Bayesian networks (DBNs) given their structure to the case in which DBN structure is not known in advance. Our method learns the DBN structures as part of the reinforcement-learning process and provably provides an efficient learning algorithm when combined with factored Rmax.
PAC model-free reinforcement learning
In ICML ’06: Proceedings of the 23rd international conference on Machine learning, 2006
Cited by 58 (13 self)
For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm—Delayed Q-Learning. We prove it is PAC, achieving near-optimal performance except for Õ(SA) timesteps using O(SA) space, improving on the Õ(S²A) bounds of the best previous algorithms. This result proves efficient reinforcement learning is possible without learning a model of the MDP from experience. Learning takes place from a single continuous thread of experience—no resets or parallel sampling is used. Beyond its smaller storage and experience requirements, Delayed Q-learning's per-experience computation cost is much less than that of previous PAC algorithms.
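The "delayed" flavor of the update can be sketched as follows: Q-values start optimistic and are lowered only after m buffered samples agree the current value is too high by a margin, which bounds how often updates can occur. This is a simplified illustration; the constants m and ε₁ are arbitrary here, and the paper's LEARN-flag bookkeeping is omitted:

```python
import collections

class DelayedQ:
    """Simplified sketch of delayed, optimistic, model-free Q-learning."""

    def __init__(self, n_states, n_actions, gamma=0.9, m=5, eps1=0.05):
        self.gamma, self.m, self.eps1 = gamma, m, eps1
        self.n_actions = n_actions
        # Optimistic initialization at the maximum possible value 1/(1-gamma)
        # (assuming rewards in [0, 1]).
        self.Q = collections.defaultdict(lambda: 1.0 / (1 - gamma))
        self.buf = collections.defaultdict(list)   # pending targets per (s, a)

    def act(self, s):
        return max(range(self.n_actions), key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s_next):
        target = r + self.gamma * max(self.Q[(s_next, b)]
                                      for b in range(self.n_actions))
        self.buf[(s, a)].append(target)
        if len(self.buf[(s, a)]) == self.m:
            avg = sum(self.buf[(s, a)]) / self.m
            # Attempted update: only lower Q if the evidence is decisive.
            if self.Q[(s, a)] - avg >= 2 * self.eps1:
                self.Q[(s, a)] = avg + self.eps1
            self.buf[(s, a)].clear()               # simplification: always reset

agent = DelayedQ(n_states=1, n_actions=2)
a = agent.act(0)
agent.update(0, a, 0.0, 0)
```

Because each successful update lowers Q(s, a) by at least ε₁ and Q is bounded below, the total number of updates (and hence of exploratory mistakes) is bounded, which is the core of the PAC argument.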
Bayesian sparse sampling for online reward optimization
In ICML ’05: Proceedings of the 22nd international conference on Machine learning, 2005
Cited by 53 (5 self)
We present an efficient “sparse sampling” technique for approximating Bayes-optimal decision making in reinforcement learning, addressing the well-known exploration versus exploitation tradeoff. Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making while controlling computational cost. The idea is to grow a sparse lookahead tree, intelligently, by exploiting information in a Bayesian posterior—rather than enumerate action branches (standard sparse sampling) or compensate myopically (value of perfect information). The outcome is a flexible, practical technique for improving action selection in simple reinforcement learning scenarios.
Efficient learning equilibrium
In Proceedings of NIPS, 2002
Cited by 52 (6 self)
We introduce efficient learning equilibrium (ELE), a normative approach to learning in non-cooperative settings. In ELE, the learning algorithms themselves are required to be in equilibrium. In addition, the learning algorithms must arrive at a desired value after polynomial time, and a deviation from the prescribed ELE becomes irrational after polynomial time. We prove the existence of an ELE (where the desired value is the expected payoff in a Nash equilibrium) and of a Pareto-ELE (where the objective is the maximization of social surplus) in repeated games with perfect monitoring. We also show that an ELE does not always exist in the imperfect monitoring case. Finally, we discuss the extension of these results to general-sum stochastic games.
Transfer Learning for Reinforcement Learning Domains: A Survey
Cited by 49 (6 self)
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.