Results 1  10
of
498
Reinforcement Learning I: Introduction
, 1998
"... In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search ..."
Abstract

Cited by 5481 (118 self)
 Add to MetaCart
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve longterm goals.
DecisionTheoretic Planning: Structural Assumptions and Computational Leverage
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1999
"... Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives ..."
Abstract

Cited by 509 (4 self)
 Add to MetaCart
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDPrelated methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
Synchronization and linearity: an algebra for discrete event systems
, 2001
"... The first edition of this book was published in 1992 by Wiley (ISBN 0 471 93609 X). Since this book is now out of print, and to answer the request of several colleagues, the authors have decided to make it available freely on the Web, while retaining the copyright, for the benefit of the scientific ..."
Abstract

Cited by 362 (11 self)
 Add to MetaCart
The first edition of this book was published in 1992 by Wiley (ISBN 0 471 93609 X). Since this book is now out of print, and to answer the request of several colleagues, the authors have decided to make it available freely on the Web, while retaining the copyright, for the benefit of the scientific community. Copyright Statement This electronic document is in PDF format. One needs Acrobat Reader (available freely for most platforms from the Adobe web site) to benefit from the full interactive machinery: using the package hyperref by Sebastian Rahtz, the table of contents and all LATEX crossreferences are automatically converted into clickable hyperlinks, bookmarks are generated automatically, etc.. So, do not hesitate to click on references to equation or section numbers, on items of thetableofcontents and of the index, etc.. One may freely use and print this document for one’s own purpose or even distribute it freely, but not commercially, provided it is distributed in its entirety and without modifications, including this preface and copyright statement. Any use of thecontents should be acknowledged according to the standard scientific practice. The
PEGASUS: A policy search method for large MDPs and POMDPs
 In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence
, 2000
"... We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a model. Our approach is based on the following observation: Any (PO)MDP can be transformed into an "equivalent&qu ..."
Abstract

Cited by 256 (9 self)
 Add to MetaCart
We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a model. Our approach is based on the following observation: Any (PO)MDP can be transformed into an "equivalent" POMDP in which all state transitions (given the current state and action) are deterministic. This reduces the general problem of policy search to one in which we need only consider POMDPs with deterministic transitions. We give a natural way of estimating the value of all policies in these transformed POMDPs. Policy search is then simply performed by searching for a policy with high estimated value. We also establish conditions under which our value estimates will be good, recovering theoretical results similar to those of Kearns, Mansour and Ng [7], but with "sample complexity" bounds that have only a polynomial rather than exponential dependence on the horizon time. Our method appl...
Monetary Policy Evaluation with Noisy Information
, 1998
"... This paper investigates the implications of noisy information regarding the measurement of economic activity for the evaluation of monetary policy. A common implicit assumption in such evaluations is that policymakers observe the current state of the economy promptly and accurately and can therefore ..."
Abstract

Cited by 237 (26 self)
 Add to MetaCart
This paper investigates the implications of noisy information regarding the measurement of economic activity for the evaluation of monetary policy. A common implicit assumption in such evaluations is that policymakers observe the current state of the economy promptly and accurately and can therefore adjust policy based on this information. However, in reality, decisions are made in real time when there is considerable uncertainty about the true state of affairs in the economy. Policy must be made with partial information. Using a simple model of the U.S. economy, I show that failing to account for the actual level of information noise in the historical data provides a seriously distorted picture of feasible macroeconomic outcomes and produces inefficient policy rules. Naive adoption of policies identified as efficient when such information noise is ignored results in macroeconomic performance worse than actual experience. When the noise content of the data is properly taken into account, policy reactions are cautious and less sensitive to the apparent imbalances in the unfiltered data. The resulting policy prescriptions reflect the recognition that excessively activist policy can increase rather than decrease economic instability.
Ratedistortion methods for image and video compression
 IEEE Signal Process. Mag. 1998
"... In this paper we provide an overview of ratedistortion (RD) based optimization techniques and their practical application to image and video coding. We begin with a short discussion of classical ratedistortion theory and then we show how in many practical coding scenarios, such as in standardsco ..."
Abstract

Cited by 219 (7 self)
 Add to MetaCart
In this paper we provide an overview of ratedistortion (RD) based optimization techniques and their practical application to image and video coding. We begin with a short discussion of classical ratedistortion theory and then we show how in many practical coding scenarios, such as in standardscompliant coding environments, resource allocation can be put in an RD framework. We then introduce two popular techniques for resource allocation, namely, Lagrangian optimization and dynamic programming. After a discussion of these two techniques as well as some of their extensions, we conclude with a quick review of recent literature in these areas citing a number of applications related to image and video compression and transmission. We
Asynchronous stochastic approximation and Qlearning
 Machine Learning
, 1994
"... Abstract £ We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Qlearning algorithm, areinforcement learning method for solving Markov decision problems, and establi ..."
Abstract

Cited by 202 (4 self)
 Add to MetaCart
Abstract £ We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Qlearning algorithm, areinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
EXACT AND APPROXIMATE ALGORITHMS FOR PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES
, 1998
"... Automated sequential decision making is crucial in many contexts. In the face of uncertainty, this task becomes even more important, though at the same time, computing optimal decision policies becomes more complex. The more sources of uncertainty there are, the harder the problem becomes to solve. ..."
Abstract

Cited by 183 (2 self)
 Add to MetaCart
Automated sequential decision making is crucial in many contexts. In the face of uncertainty, this task becomes even more important, though at the same time, computing optimal decision policies becomes more complex. The more sources of uncertainty there are, the harder the problem becomes to solve. In this work, we look at sequential decision making in environments where the actions have probabilistic outcomes and in which the system state is only partially observable. We focus on using a model called a partially observable Markov decision process (POMDP) and explore algorithms which address computing both optimal and approximate policies for use in controlling processes that are modeled using POMDPs. Although solving for the optimal policy is PSPACEcomplete (or worse), the study and improvements of exact algorithms lends insight into the optimal solution structure as well as providing a basis for approximate solutions. We present some improvements, analysis and empirical comparisons for some existing and some novel approaches for computing the optimal POMDP policy exactly. Since it is also hard (NPcomplete or worse) to derive close approximations to the optimal solution for POMDPs, we consider a number of approaches for deriving policies that yield suboptimal control and empirically explore their performance on a range of problems. These approaches
Planning Under Time Constraints in Stochastic Domains
 ARTIFICIAL INTELLIGENCE
, 1993
"... We provide a method, based on the theory of Markov decision processes, for efficient planning in stochastic domains. Goals are encoded as reward functions, expressing the desirability of each world state; the planner must find a policy (mapping from states to actions) that maximizes future reward ..."
Abstract

Cited by 178 (20 self)
 Add to MetaCart
We provide a method, based on the theory of Markov decision processes, for efficient planning in stochastic domains. Goals are encoded as reward functions, expressing the desirability of each world state; the planner must find a policy (mapping from states to actions) that maximizes future rewards. Standard goals of achievement, as well as goals of maintenance and prioritized combinations of goals, can be specified in this way. An optimal policy can be found using existing methods, but these methods require time at best polynomial in the number of states in the domain, where the number of states is exponential in the number of propositions (or state variables). By using information about the starting state, the reward function, and the transition probabilities of the domain, we restrict the planner's attention to a set of world states that are likely to be encountered in satisfying the goal. Using this restricted set of states, the planner can generate more or less complete ...