Results 1–10 of 14
Knows What It Knows: A Framework for Self-Aware Learning
Abstract

Cited by 71 (20 self)
We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes and open problems.
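The defining feature of a KWIK learner is that it may answer "I don't know" instead of predicting, but any prediction it does make must be (nearly) correct. A minimal sketch, assuming a deterministic lookup-table hypothesis class; the class name and interface below are illustrative, not from the paper:

```python
class KWIKMemorizer:
    """Toy KWIK learner: predicts f(x) exactly once observed,
    otherwise admits ignorance by returning None."""

    def __init__(self):
        self.table = {}

    def predict(self, x):
        # Return None ("I don't know") for inputs never seen before;
        # a KWIK learner is never allowed to guess wrongly.
        return self.table.get(x)

    def observe(self, x, y):
        # The protocol reveals the true label only after the learner
        # declines to predict.
        self.table[x] = y

learner = KWIKMemorizer()
assert learner.predict("s1") is None   # admits uncertainty
learner.observe("s1", 4)
assert learner.predict("s1") == 4      # now certain, and must be correct
```

The "I don't know" signal is what makes the framework useful for exploration: an agent can direct experience toward exactly the inputs where the learner has declined to predict.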
Online regret bounds for undiscounted continuous reinforcement learning
 In Advances in Neural Information Processing Systems (NIPS), 2012
Abstract

Cited by 11 (4 self)
We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy satisfying the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.
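The two ingredients named in the abstract can be sketched in a few lines: discretize the continuous state into aggregated bins, and score each bin optimistically with an upper confidence bound. The bin width and bonus constant below are illustrative assumptions, not values from the paper:

```python
import math

def aggregate(state, bin_width=0.1):
    """State aggregation: map a continuous state in [0, 1) to a bin index."""
    return int(state / bin_width)

def ucb_reward(total_reward, count, t, c=1.0):
    """Optimism in the face of uncertainty: empirical mean reward of a bin
    plus a confidence bonus that shrinks as the bin is visited more often."""
    if count == 0:
        return float("inf")  # maximal optimism for unvisited bins
    return total_reward / count + c * math.sqrt(math.log(t) / count)

assert aggregate(0.25) == 2
assert ucb_reward(0.0, 0, 10) == float("inf")   # unvisited: explore first
assert ucb_reward(5.0, 10, 100) > 0.5           # bonus lifts the mean of 0.5
```

Acting greedily with respect to these optimistic estimates drives the agent toward under-visited bins, which is the mechanism behind the regret analysis.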
Cover Tree Bayesian Reinforcement Learning
 Journal of Machine Learning Research
Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning
Abstract
We consider the problem of undiscounted reinforcement learning in continuous state space. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Under the assumption that the rewards and transition probabilities are Lipschitz, for 1-dimensional state space a regret bound of Õ(T^{3/4}) after any T steps has been given by Ortner and Ryabko (2012). Here we improve upon this result by using nonparametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be Õ(T^{2/3}) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher-dimensional state space.
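Nonparametric kernel density estimation, the tool the abstract uses to estimate transition probability densities, can be sketched as follows. The Gaussian kernel and the bandwidth h are standard textbook choices assumed here for illustration, not taken from the paper:

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, samples, h=0.2):
    """Kernel density estimate at x from observed next-state samples:
    the average of kernel bumps of width h centered on each sample."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / h) for s in samples) / (n * h)

# The estimated density is higher near the bulk of the samples than far away.
samples = [0.4, 0.5, 0.6]
assert kde(0.5, samples) > kde(2.0, samples)
```

Replacing histogram-style aggregation with such smooth estimates is what lets the bound improve when the true transition densities are themselves smooth.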
Scaling Up Reinforcement Learning through Targeted Exploration
 In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence
Abstract
Recent Reinforcement Learning (RL) algorithms, such as R-MAX, make (with high probability) only a small number of poor decisions. In practice, these algorithms do not scale well as the number of states grows because they spend too much effort exploring. We introduce an RL algorithm, State-TArgeted R-MAX (STAR-MAX), that explores a subset of the state space called the exploration envelope ξ. When ξ equals the total state space, STAR-MAX behaves identically to R-MAX. When ξ is a subset of the state space, a recovery rule β is needed to keep exploration within ξ. We compared existing algorithms with our algorithm employing various exploration envelopes. With an appropriate choice of ξ, STAR-MAX scales far better than existing RL algorithms as the number of states increases. A possible drawback of our algorithm is its dependence on a good choice of ξ and β. However, we show that an effective recovery rule β can be learned online and that ξ can be learned from demonstrations. We also find that even randomly sampled exploration envelopes can improve cumulative rewards compared to R-MAX. We expect these results to lead to more efficient methods for RL in large-scale problems.
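The envelope idea can be sketched as a small modification of R-MAX-style optimism: treat a state as maximally rewarding only while it is under-visited AND inside the envelope ξ. Everything below (function name, reward scale, knownness threshold) is a hypothetical illustration, not the paper's implementation:

```python
R_MAX = 1.0          # optimistic reward for unknown states (assumed scale)
KNOWN_THRESHOLD = 5  # visits before a state counts as known (assumed value)

def exploration_value(state, visits, envelope, reward_estimates):
    """Optimistic value used for action selection: full optimism only for
    under-visited states inside the envelope, learned estimates elsewhere."""
    if state in envelope and visits.get(state, 0) < KNOWN_THRESHOLD:
        return R_MAX                            # explore: unknown, inside xi
    return reward_estimates.get(state, 0.0)     # exploit known estimates

envelope = {"s0", "s1"}
visits = {"s0": 10}
rewards = {"s0": 0.3}
assert exploration_value("s1", visits, envelope, rewards) == R_MAX  # in xi
assert exploration_value("s0", visits, envelope, rewards) == 0.3    # known
assert exploration_value("s9", visits, envelope, rewards) == 0.0    # outside
```

Because states outside ξ never receive the optimism bonus, exploration effort concentrates inside the envelope; the recovery rule β (not sketched here) handles excursions back into ξ.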
Integrating Grammatical Inference into Robotic Planning ∗
 In JMLR: Workshop and Conference Proceedings 21:69–83 (The 11th ICGI), 2012
Abstract
This paper presents a method for the control synthesis of robotic systems in unknown, dynamic, and adversarial environments. We (1) incorporate a grammatical inference module that identifies the governing dynamics of the adversarial environment and (2) utilize game theory to compute a motion plan for a system given a task specification. The framework is flexible and modular, since different games can be formulated for different system objectives and different grammatical inference algorithms can be utilized depending on the abstract nature of the dynamic environment.
Efficient Model-based Exploration in Continuous State-space Environments
Abstract
The impetus for exploration in reinforcement learning (RL) is decreasing uncertainty about the environment for the purpose of better decision making. As such, exploration plays a crucial role in the efficiency of RL algorithms. In this dissertation, I consider continuous-state control problems and introduce a new methodology for representing uncertainty that engenders more efficient algorithms. I argue that the new notion of uncertainty allows for more efficient use of function approximation, which is essential for learning in continuous spaces. In particular, I focus on a class of algorithms referred to as model-based methods and develop several such algorithms that are much more efficient than the current state-of-the-art methods. These algorithms attack the long-standing “curse of dimensionality”: learning complexity often scales exponentially with problem dimensionality. I introduce algorithms that can exploit the dependency structure between state variables to exponentially decrease the sample complexity of learning, both in cases where the dependency structure is provided by the user a priori and in cases where the algorithm has to find it on its own. I also use the new uncertainty notion to derive a multiresolution exploration scheme, and demonstrate how this new ...
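A back-of-the-envelope calculation shows why exploiting dependency structure between state variables can decrease sample complexity exponentially, as the abstract claims: with a factored model, "knownness" is tracked per state variable (or small parent set) rather than per joint state. The counting functions below are an illustration of this gap, not the dissertation's algorithms:

```python
def joint_states(n_vars, n_vals):
    """Number of joint states a flat (unfactored) model must cover:
    exponential in the number of state variables."""
    return n_vals ** n_vars

def factored_entries(n_vars, n_vals):
    """Number of entries a fully factored model with independent variables
    must cover: linear in the number of state variables."""
    return n_vars * n_vals

# Ten 4-valued state variables: over a million joint states vs. 40 entries.
assert joint_states(10, 4) == 1048576
assert factored_entries(10, 4) == 40
```

Real factored models condition each variable on a small parent set, so the count sits between these two extremes, but it remains exponential only in the largest parent set rather than in the full dimensionality.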
Integrating Grammatical Inference into Robotic Planning ∗
 In JMLR: Workshop and Conference Proceedings 21:1–16 (The 11th International Conference on GI), 2012
Abstract
This paper shows how grammatical inference (GI) and game-theoretic techniques can be jointly utilized for robotic planning. The planning problem is to find a sequence of robot maneuvers so that a desired task is completed; the maneuvers themselves are assumed to be implemented by some existing low-level controllers. The challenge here is that the environment ...
Real-Time Scheduling via Reinforcement Learning
Abstract
Cyber-physical systems, such as mobile robots, must respond adaptively to dynamic operating conditions. Effective operation of these systems requires that sensing and actuation tasks are performed in a timely manner. Additionally, execution of mission-specific tasks such as imaging a room must be balanced against the need to perform more general tasks such as obstacle avoidance. This problem has been addressed by maintaining the relative utilization of shared resources among tasks near a user-specified target level. Producing optimal scheduling strategies requires complete prior knowledge of task behavior, which is unlikely to be available in practice. Instead, suitable scheduling strategies must be learned online through interaction with the system. We consider the sample complexity of reinforcement learning in this domain, and demonstrate that while the problem's state space is countably infinite, we may leverage the problem's structure to guarantee efficient learning.
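One way to read the setup above is as an MDP whose state tracks each task's share of a shared resource and whose reward penalizes deviation from the user-specified target utilization. The functions below are a hypothetical sketch of that reward shape, not the paper's formulation:

```python
def utilization(counts):
    """Observed resource shares: each task's run count over the total."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def reward(counts, targets):
    """Negative L1 distance between observed shares and target shares;
    the maximum reward of 0 is achieved exactly at the target utilization."""
    u = utilization(counts)
    return -sum(abs(a - b) for a, b in zip(u, targets))

# Running task 0 three times and task 1 once matches a (0.75, 0.25) target.
assert reward([3, 1], [0.75, 0.25]) == 0.0
assert reward([1, 3], [0.75, 0.25]) < 0.0
```

The run counts grow without bound, which is where the countably infinite state space comes from; the shares themselves live in a bounded simplex, and structure like that is what can make learning tractable.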