Results 1 - 10 of 2,216,016
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
 Journal of Artificial Intelligence Research
, 2000
"... This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. Th ..."
Cited by 443 (6 self)
Generalization in Reinforcement Learning: Safely Approximating the Value Function
 Advances in Neural Information Processing Systems 7
, 1995
"... To appear in: G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, MIT Press, Cambridge MA, 1995. A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a genera ..."
Cited by 307 (4 self)
by techniques based on dynamic programming (DP). These algorithms compute a value function ...
Advances in Prospect Theory: Cumulative Representation of Uncertainty
 JOURNAL OF RISK AND UNCERTAINTY, 5:297-323 (1992)
, 1992
"... We develop a new version of prospect theory that employs cumulative rather than separable decision weights and extends the theory in several respects. This version, called cumulative prospect theory, applies to uncertain as well as to risky prospects with any number of outcomes, and it allows differ ..."
Cited by 1717 (17 self)
different weighting functions for gains and for losses. Two principles, diminishing sensitivity and loss aversion, are invoked to explain the characteristic curvature of the value function and the weighting functions. A review of the experimental evidence and the results of a new experiment confirm a
The rate-distortion function for source coding with side information at the decoder
 IEEE Trans. Inform. Theory
, 1976
"... Abstract: Let $\{(X_k, Y_k)\}_{k=1}^{\infty}$ be a sequence of independent drawings of a pair of dependent random variables X, Y. Let us say that X takes values in the finite set $\mathcal{X}$. It is desired to encode the sequence $\{X_k\}$ in blocks of length n into a binary stream of rate R, which can in turn be decoded as a seque ..."
Cited by 1060 (1 self)
the infimum is with respect to all auxiliary random variables Z (which take values in a finite set $\mathcal{Z}$) that satisfy: i) Y, Z conditionally independent given X; ii) there exists a function $f: \mathcal{Y} \times \mathcal{Z} \to \hat{\mathcal{X}}$, such that $E[D(X, f(Y,Z))] \le d$. Let $R_{X|Y}(d)$ be the rate-distortion function which results when the encoder
Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition
 in Conference Record of The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers
, 1993
"... In this paper we describe a recursive algorithm to compute representations of functions with respect to nonorthogonal and possibly overcomplete dictionaries of elementary building blocks e.g. affine (wavelet) frames. We propose a modification to the Matching Pursuit algorithm of Mallat and Zhang (199 ..."
Cited by 637 (1 self)
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1999
"... Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives ..."
Cited by 515 (4 self)
or plans. Planning problems commonly possess structure in the reward and value functions used to de...
Least-Squares Policy Iteration
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
"... We propose a new approach to reinforcement learning for control problems which combines value-function approximation with linear architectures and approximate policy iteration. This new approach ..."
Cited by 462 (12 self)
Stochastic Perturbation Theory
, 1988
"... In this paper classical matrix perturbation theory is approached from a probabilistic point of view. The perturbed quantity is approximated by a first-order perturbation expansion, in which the perturbation is assumed to be random. This permits the computation of statistics estimating the variatio ..."
Cited by 907 (36 self)
and the eigenvalue problem. Key words. perturbation theory, random matrix, linear system, least squares, eigenvalue, eigenvector, invariant subspace, singular value AMS(MOS) subject classifications. 15A06, 15A12, 15A18, 15A52, 15A60 1. Introduction. Let A be a matrix and let F be a matrix valued function of A
A Scalable Content-Addressable Network
 IN PROC. ACM SIGCOMM 2001
, 2001
"... Hash tables – which map “keys” onto “values” – are an essential building block in modern software systems. We believe a similar functionality would be equally valuable to large distributed systems. In this paper, we introduce the concept of a Content-Addressable Network (CAN) as a distributed infra ..."
Cited by 3371 (32 self)
End-to-End Arguments in System Design
, 1984
"... This paper presents a design principle that helps guide placement of functions among the modules of a distributed computer system. The principle, called the end-to-end argument, suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost o ..."
Cited by 1037 (10 self)