Results 1–10 of 23
Selecting the State-Representation in Reinforcement Learning
Abstract

Cited by 16 (3 self)
The problem of selecting the right state-representation in a reinforcement learning problem is considered. Several models (functions mapping past observations to a finite set) of the observations are given, and it is known that for at least one of these models the resulting state dynamics are indeed Markovian. Knowing neither which of the models is the correct one nor the probabilistic characteristics of the resulting MDP, it is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several). We propose an algorithm that achieves this, with a regret of order T^(2/3), where T is the horizon time.
Consistency of Feature Markov Processes
, 2010
Abstract

Cited by 10 (8 self)
We study long-term sequence prediction (forecasting). We approach this by investigating criteria for choosing a compact, useful state representation. The state is supposed to summarize useful information from the history. We want a method that is asymptotically consistent in the sense that it will provably, eventually, only choose between alternatives that satisfy an optimality property related to the criterion used. We extend our work to the case where there is side information that one can take advantage of, and we briefly discuss the active setting, where an agent takes actions to achieve desirable outcomes.
Avoiding Unintended AI Behaviors
Abstract

Cited by 8 (5 self)
Abstract: Artificial intelligence (AI) systems too complex for predefined environment models and actions will need to learn environment models and to choose actions that optimize some criteria. Several authors have described mechanisms by which such complex systems may behave in ways not intended in their designs. This paper describes ways to avoid such unintended behavior. For hypothesized powerful AI systems that may pose a threat to humans, this paper proposes a two-stage agent architecture that avoids some known types of unintended behavior. For the first stage of the architecture, this paper shows that the most probable finite stochastic program to model a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions.
Feature reinforcement learning in practice
 In Proc. 9th European Workshop on Reinforcement Learning (EWRL9), volume 7188 of LNAI
, 2011
Abstract

Cited by 6 (5 self)
Abstract. Following a recent surge in using history-based methods for resolving perceptual aliasing in reinforcement learning, we introduce an algorithm based on the feature reinforcement learning framework called ΦMDP [12]. To create a practical algorithm we devise a stochastic search procedure for a class of context trees based on parallel tempering and a specialized proposal distribution. We provide the first empirical evaluation for ΦMDP. Our proposed algorithm achieves superior performance to the classical U-tree algorithm [18] and the recent active-LZ algorithm [6], and is competitive with MC-AIXI-CTW [26], which maintains a Bayesian mixture over all context trees up to a chosen depth. We are encouraged by our ability to compete with this sophisticated method using an algorithm that simply picks one single model and uses Q-learning on the corresponding MDP. Our ΦMDP algorithm is much simpler, yet consumes less time and memory. These results show promise for our future work on attacking more complex and larger problems.
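The stochastic search over context trees mentioned above is based on parallel tempering. As a rough illustration of the general technique (a toy sketch over an arbitrary discrete objective, not the paper's specialized procedure or its proposal distribution), one might write:

```python
import math
import random

def parallel_tempering(energy, propose, init, temps, steps, seed=0):
    """Generic parallel-tempering search: several Metropolis chains run at
    different temperatures and occasionally swap states, so hot chains
    explore while cold chains refine. Toy sketch; `init` is assumed
    immutable. Returns the best (energy, state) pair seen."""
    rng = random.Random(seed)
    states = [init] * len(temps)
    energies = [energy(init)] * len(temps)
    best_e, best_s = energies[0], states[0]
    for _ in range(steps):
        # one Metropolis step per chain
        for i, T in enumerate(temps):
            cand = propose(states[i], rng)
            e = energy(cand)
            if e <= energies[i] or rng.random() < math.exp((energies[i] - e) / T):
                states[i], energies[i] = cand, e
                if e < best_e:
                    best_e, best_s = e, cand
        # propose a swap between a random pair of neighbouring temperatures
        i = rng.randrange(len(temps) - 1)
        delta = (1 / temps[i] - 1 / temps[i + 1]) * (energies[i] - energies[i + 1])
        if delta >= 0 or rng.random() < math.exp(delta):
            states[i], states[i + 1] = states[i + 1], states[i]
            energies[i], energies[i + 1] = energies[i + 1], energies[i]
    return best_e, best_s
```

For instance, minimizing energy(x) = (x − 7)² over the integers with ±1 proposals quickly locates the minimum: the hot chains keep the search from getting stuck while the cold chains exploit the best candidates.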
Adaptive Bandits: Towards the best history-dependent strategy
Abstract

Cited by 5 (0 self)
We consider multi-armed bandit games with possibly adaptive opponents. We introduce models Θ of constraints based on equivalence classes on the common history (the information shared by the player and the opponent), which define two learning scenarios: (1) The opponent is constrained, i.e. he provides rewards that are stochastic functions of equivalence classes defined by some model θ* ∈ Θ. The regret is measured with respect to (w.r.t.) the best history-dependent strategy. (2) The opponent is arbitrary and we measure the regret w.r.t. the best strategy among all mappings from classes to actions (i.e. the best history-class-based strategy) for the best model in Θ. This makes it possible to model opponents (case 1) or strategies (case 2) that handle finite memory, periodicity, standard stochastic bandits, and other situations. When Θ = {θ}, i.e. only one model is considered, we derive tractable algorithms achieving a tight regret (at time T) bounded by Õ(√(TAC)), where C is the number of classes of θ. When many models are available, all known algorithms achieving a regret of O(√T) are unfortunately not tractable and scale poorly with the number of models |Θ|. Our contribution here is to provide tractable algorithms with regret bounded by T^(2/3) C^(1/3) log(|Θ|)^(1/2).
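To make the single-model scenario Θ = {θ} concrete, one can treat each history-equivalence class as its own stochastic bandit and run one UCB1 instance per class. The following is a hypothetical sketch of that idea (the class and method names are ours, and the plain UCB1 index is illustrative; the paper's actual algorithms achieving the Õ(√(TAC)) bound are more refined):

```python
import math

class ClassBasedUCB:
    """Hypothetical sketch for the single-model case Theta = {theta}:
    one independent UCB1 bandit per history-equivalence class."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.stats = {}  # class_id -> ([pull counts], [reward sums])

    def select(self, class_id):
        counts, sums = self.stats.setdefault(
            class_id, ([0] * self.n_arms, [0.0] * self.n_arms))
        # play every arm once within this class before using the UCB1 index
        for a in range(self.n_arms):
            if counts[a] == 0:
                return a
        t = sum(counts)
        return max(range(self.n_arms),
                   key=lambda a: sums[a] / counts[a]
                   + math.sqrt(2 * math.log(t) / counts[a]))

    def update(self, class_id, arm, reward):
        counts, sums = self.stats[class_id]
        counts[arm] += 1
        sums[arm] += reward
```

Against an opponent whose rewards depend only on the class of the current history (e.g. the parity of the round number), each per-class UCB1 instance converges to the best action for its class.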
One decade of universal artificial intelligence
 In Theoretical Foundations of Artificial General Intelligence
, 2012
Abstract

Cited by 3 (3 self)
The first decade of this century has seen the nascency of the first mathematical theory of general artificial intelligence. This theory of Universal Artificial Intelligence (UAI) has made significant contributions to many theoretical, philosophical, and practical AI questions. In a series of papers culminating in the book (Hutter, 2005), an exciting, sound, and complete mathematical model for a super-intelligent agent (AIXI) has been developed and rigorously analyzed. While nowadays most AI researchers avoid discussing intelligence, the award-winning PhD thesis (Legg, 2008) provided the philosophical embedding and investigated the UAI-based universal measure of rational intelligence, which is formal, objective, and non-anthropocentric. Recently, effective approximations of AIXI have been derived and experimentally investigated in a JAIR paper (Veness et al., 2011). This practical breakthrough has resulted in some impressive applications, finally muting earlier critique that UAI is only a theory. For the first time, without providing any domain knowledge, the same ...
Sketch of an AGI architecture with illustration
Abstract

Cited by 2 (0 self)
Here we present a framework for AGI inspired by knowledge about the only working prototype: the brain. We treat the neurobiological findings as directives. The main algorithmic modules are defined, and solutions for each subtask are given together with the available mathematical (hard) constraints. The main themes are compressed sensing, factor learning, independent process analysis, and low-dimensional embedding for optimal state representation, to be used by a particular RL system that can be integrated with a robust controller. However, blending the suggested partial solutions is not a straightforward task. Nevertheless, we begin to combine these modules and illustrate their operation on a simulated problem. We discuss the steps needed to complete the integration.
Unsupervised model-free representation learning
Abstract

Cited by 2 (1 self)
Abstract. Numerous control and learning problems face the situation where sequences of high-dimensional, highly dependent data are available, but little or no feedback is provided to the learner. In such situations it may be useful to find a concise representation of the input signal that preserves as much as possible of the relevant information. In this work we are interested in problems where the relevant information is in the time-series dependence. Thus, the problem can be formalized as follows. Given a series of observations X0, ..., Xn coming from a large (high-dimensional) space X, find a representation function f mapping X to a finite space Y such that the series f(X0), ..., f(Xn) preserves as much information as possible about the original time-series dependence in X0, ..., Xn. For stationary time series, the function f can be selected as the one maximizing the time-series information I∞(f) = h0(f(X)) − h∞(f(X)), where h0(f(X)) is the Shannon entropy of f(X0) and h∞(f(X)) is the entropy rate of the time series f(X0), ..., f(Xn), .... In this paper we study the functional I∞(f) from the learning-theoretic point of view. Specifically, we provide some uniform approximation results and study the behaviour of I∞(f) in the problem of optimal control.
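The criterion I∞(f) = h0(f(X)) − h∞(f(X)) admits a simple plug-in estimate on a finite alphabet, approximating the entropy rate h∞ by an order-k conditional entropy. The sketch below only illustrates the quantity being maximized (the function and parameter names are ours; this is not the paper's estimator or its approximation results):

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution in `counts`."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def time_series_information(series, k=1):
    """Plug-in estimate of I_inf = h0 - h_inf for a finite-alphabet series,
    approximating the entropy rate h_inf by the order-k conditional entropy
    H(X_k | X_0, ..., X_{k-1})."""
    h0 = entropy(Counter(series))
    # H(X_k | context) = H of (k+1)-grams minus H of k-gram contexts
    joint = Counter(tuple(series[i:i + k + 1]) for i in range(len(series) - k))
    ctx = Counter(tuple(series[i:i + k]) for i in range(len(series) - k))
    h_inf = entropy(joint) - entropy(ctx)
    return h0 - h_inf
```

A constant series carries no time-series information (the estimate is 0), while a deterministic alternating series over {0, 1} attains the maximum of 1 bit: its marginal entropy is 1 while its entropy rate is 0.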
Feature Reinforcement Learning using Looping Suffix Trees
Abstract

Cited by 2 (1 self)
There has recently been much interest in history-based methods using suffix trees to solve POMDPs. However, these suffix trees cannot efficiently represent environments that have long-term dependencies. We extend the recently introduced CTΦMDP algorithm to the space of looping suffix trees, which have previously only been used in solving deterministic POMDPs. The resulting algorithm replicates results from CTΦMDP for environments with short-term dependencies, while it outperforms LSTM-based methods on T-Maze, a deep-memory environment.
Decision Support for Safe AI Design
Abstract

Cited by 1 (1 self)
Abstract: There is considerable interest in ethical designs for artificial intelligence (AI) that do not pose risks to humans. This paper proposes using elements of Hutter's agent-environment framework to define a decision support system for simulating, visualizing, and analyzing AI designs to understand their consequences. The simulations do not have to be accurate predictions of the future; rather, they show the futures that an agent design predicts will fulfill its motivations, and these can be explored by AI designers to find risks to humans. In order to safely create a simulation model, this paper shows that the most probable finite stochastic program to explain a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions. It also discusses the risks of running an AI in a simulated environment.