Results 1–10 of 5,203,008
Prioritized Sweeping Converges to the Optimal Value Function
"... Prioritized sweeping (PS) and its variants are model-based reinforcement-learning algorithms that have demonstrated superior performance in terms of computational and experience efficiency in practice. This note establishes the first—to the best of our knowledge—formal proof of convergence to the op ..."
Abstract

Cited by 2 (0 self)
... to the optimal value function when they are used as planning algorithms. We also describe applications of this result to provably efficient model-based reinforcement learning in the PAC-MDP framework. We do not address the issue of convergence rate in the present paper.
On The Relation Between Discounted And Average Optimal Value Functions
, 1998
"... : We investigate the relation between discounted and average deterministic optimal control problems for nonlinear control systems. In particular we are interested in the corresponding optimal value functions. Using the concepts of Viability, Chain Controllability and Controllability a global converg ..."
Abstract

Cited by 8 (0 self)
Bisimulation Metrics are Optimal Value Functions
Norm Ferns
"... Bisimulation is a notion of behavioural equivalence on the states of a transition system. Its definition has been extended to Markov decision processes, where it can be used to aggregate states. A bisimulation metric is a quantitative analog of bisimulation that measures how similar states are fr ..."
Abstract
... is the optimal value function of an optimal coupling of two copies of the original model. We prove the result in the general case of continuous state spaces. This result has important implications in understanding the complexity of computing such metrics, and opens up the possibility of more efficient ...
An upper bound on the loss from approximate optimal-value functions
 Machine Learning
, 1994
"... Many reinforcement learning approaches can be formulated using the theory of Markov decision processes and the associated method of dynamic programming (DP). The value of this theoretical understanding, however, is tempered by many practical concerns. One important question is whether DP-based appro ..."
Abstract

Cited by 82 (5 self)
... based approaches that use function approximation rather than lookup tables can avoid catastrophic effects on performance. This note presents a result of Bertsekas (1987) which guarantees that small errors in the approximation of a task's optimal value function cannot produce arbitrarily bad performance when ...
E-Convexity of the Optimal Value Function in Parametric Nonlinear Programming
"... Consider a general parametric optimization problem P(ε) of the form min_x f(x, ε), s.t. x ∈ R(ε). Convexity and generalized convexity properties of the optimal value function f* and the solution set map S* form an important part of the theoretical basis for sensitivity, stability, and par ..."
Abstract
On the Rate of Convergence of Infinite Horizon Discounted Optimal Value Functions
, 1998
"... In this paper we investigate the rate of convergence of the optimal value function of an infinite horizon discounted optimal control problem as the discount rate tends to zero. Using the Integration Theorem for Laplace transformations we provide conditions on averaged functionals along suitable traj ..."
Abstract
Particle swarm optimization
, 1995
"... A concept for the optimization of nonlinear functions using particle swarm methodology is introduced. The evolution of several paradigms is outlined, and an implementation of one of the paradigms is discussed. Benchmark testing of the paradigm is described, and applications ..."
Abstract

Cited by 3535 (22 self)
Mining the Network Value of Customers
 In Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining
, 2002
"... One of the major applications of data mining is in helping companies determine which potential customers to market to. If the expected profit from a customer is greater than the cost of marketing to her, the marketing action for that customer is executed. So far, work in this area has considered only ..."
Abstract

Cited by 562 (11 self)
... only the intrinsic value of the customer (i.e., the expected profit from sales to her). We propose to model also the customer's network value: the expected profit from sales to other customers she may influence to buy, the customers they may influence, and so on recursively. Instead of viewing a market ...
An Upper Bound on the Loss from Approximate Optimal-Value Functions
 Machine Learning
, 1994
"... Many reinforcement learning approaches can be formulated using the theory of Markov decision processes and the associated method of dynamic programming (DP). The value of this theoretical understanding, however, is tempered by many practical concerns. One important question is whether DP-based a ..."
Abstract
... based approaches that use function approximation rather than lookup tables can avoid catastrophic effects on performance. This note presents a result of Bertsekas (1987) which guarantees that small errors in the approximation of a task's optimal value function cannot produce arbitrarily bad performance ...