Results 21-30 of 5,613

Tree-based batch mode reinforcement learning
Journal of Machine Learning Research, 2005
Cited by 224 (42 self)
Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the so-called Q-function based on a set of four-tuples (x_t, u_t, r_t, x_{t+1}) where x_t denotes the system state at time t, u_t the control action taken, r_t the instantaneous reward obtained and x_{t+1} the successor state of the system, and by determining the control policy from this Q-function. The Q-function approximation may be obtained from the limit of a sequence of (batch mode) supervised learning problems. Within this framework we describe the use of several classical tree-based supervised learning methods (CART, Kd-tree, tree bagging) and two newly proposed ensemble algorithms, namely extremely and totally randomized trees. We study their performances on several examples and find that the ensemble methods based on regression trees perform well in extracting relevant information about the optimal control policy from sets of four-tuples. In particular, the totally randomized trees give good results while ensuring the convergence of the sequence, whereas by relaxing the convergence constraint even better accuracy results are provided by the extremely randomized trees.

A Survey of Collaborative Filtering Techniques
2009
Cited by 216 (0 self)
As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences for other users. In this paper, we first introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, and privacy protection, along with their possible solutions. We then present three main categories of CF techniques: memory-based, model-based, and hybrid CF algorithms (which combine CF with other recommendation techniques), with examples of representative algorithms in each category and an analysis of their predictive performance and their ability to address the challenges. From basic techniques to the state of the art, we attempt to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.

A stochastic model of human-machine interaction for learning dialog strategies
in IEEE Transactions on Speech and Audio Processing, 2000

Infinite-horizon policy-gradient estimation
Journal of Artificial Intelligence Research, 2001
Cited by 208 (5 self)
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm’s chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter β ∈ [0, 1) (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter β is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter, Bartlett, & Weaver, 2001) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward.

Perseus: Randomized point-based value iteration for POMDPs
Journal of Artificial Intelligence Research, 2005
Cited by 204 (17 self)
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.

Probabilistic Algorithms in Robotics
AI Magazine vol
Cited by 199 (6 self)
This article describes a methodology for programming robots known as probabilistic robotics. The probabilistic paradigm pays tribute to the inherent uncertainty in robot perception, relying on explicit representations of uncertainty when determining what to do. This article surveys some of the progress in the field, using in-depth examples to illustrate some of the nuts and bolts of the basic approach. Our central conjecture is that the probabilistic approach to robotics scales better to complex real-world applications than approaches that ignore a robot’s uncertainty.

Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva
2000
Cited by 196 (38 self)
This paper describes Minerva, an interactive tour-guide robot that was successfully deployed in a Smithsonian museum. Minerva's software is pervasively probabilistic, relying on explicit representations of uncertainty in perception and control. This article describes

Learning attractor landscapes for learning motor primitives
in Advances in Neural Information Processing Systems, 2003
Cited by 195 (28 self)
Many control problems take place in continuous state-action spaces, e.g., as in manipulator robotics, where the control objective is often defined as finding a desired trajectory that reaches a particular goal state. While reinforcement learning offers a theoretical framework to learn such control policies from scratch, its applicability to higher dimensional continuous state-action spaces remains rather limited to date. Instead of learning from scratch, in this paper we suggest to learn a desired complex control policy by transforming an existing simple canonical control policy. For this purpose, we represent canonical policies in terms of differential equations with well-defined attractor properties. By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system. We demonstrate our techniques in the context of learning a set of movement skills for a humanoid robot from demonstrations of a human teacher. Policies are acquired rapidly, and, due to the properties of well-formulated differential equations, can be reused and modified online under dynamic changes of the environment. The linear parameterization of nonparametric regression moreover lends itself to recognize and classify previously learned movement skills. Evaluations in simulations and on an actual 30 degree-of-freedom humanoid robot exemplify the feasibility and robustness of our approach.

Agent-based computational economics: Growing economies from the bottom up
Artificial Life, 2002
Cited by 192 (5 self)
Agent-based computational economics (ACE) is the computational study of economies modeled as evolving systems of autonomous interacting agents. Thus, ACE is a specialization to economics of the basic complex adaptive systems paradigm. This study outlines the main objectives and defining characteristics of the ACE methodology, and discusses similarities and distinctions between ACE and artificial life research. Eight ACE research areas are identified, and a number of publications in each area are highlighted for concrete illustration. Open questions and directions for future ACE research are also considered. The study concludes with a discussion of the potential benefits associated with ACE modeling, as well as some potential difficulties. Keywords: Agent-based computational economics; artificial life; learning; evolution of norms; markets; networks; parallel experiments with humans and computational agents; computational laboratories.

Stochastic Dynamic Programming with Factored Representations
1997
Cited by 189 (10 self)
Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate the combinatorial problems associated with such methods, we propose new representational and computational techniques for MDPs that exploit certain types of problem structure. We use dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decision-tree representation of rewards. Based on this representation, we develop versions of standard dynamic programming algorithms that directly manipulate decision-tree representations of policies and value functions. This generally obviates the need for state-by-state computation, aggregating states at the leaves of these trees and requiring computations only for each aggregate state. The key to these algorithms is a decision-theoretic generalization of classic regression analysis, in which we determine the features relevant to predicting expected value. We demonstrate the method empirically on several planning problems,