Results 1 – 10 of 163,407
Model-based Hierarchical Average-reward Reinforcement Learning
In Proceedings of the Nineteenth International Conference on Machine Learning, 2002
"... There is a growing interest in using task hierarchies to tame the complexity of reinforcement learning. In this paper, we extend the MAXQ framework to hierarchical average-reward reinforcement learning. We motivate and introduce the notions of recursive gain-optimality and hierarchical gain-optimality ..."
Cited by 7 (2 self)
Process-oriented Planning and Average-reward Optimality
In IJCAI-95, 1995
"... We argue that many AI planning problems should be viewed as process-oriented, where the aim is to produce a policy or behavior strategy with no termination condition in mind, as opposed to goal-oriented. The full power of Markov decision models, adopted recently for AI planning, becomes apparent with process-oriented problems. The question of appropriate optimality criteria becomes more critical in this case; we argue that average-reward optimality is most suitable. While construction of average-optimal policies involves a number of subtleties and computational difficulties, certain aspects ..."
Cited by 17 (4 self)
Scaling Average-reward Reinforcement Learning for Product Delivery
"... Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state space and action space, and high stochasticity. We give partial solutions to each of these curses that provide order-of-magnitude speedups in execution time over standard approaches. We demonstrate our methods in the domain of product delivery. We present experimental results on refinements of a model-based average-reward algorithm called H-Learning. ..."
Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery
"... Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce "tabular linear functions" that generalize tile-coding and linear value functions. Action-space complexity is reduced by replacing complete joint-action-space search with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-Learning, which is an afterstate version of H-Learning. Our extensions make it practical to apply reinforcement learning to a domain of product delivery, an optimization problem that combines inventory control and vehicle routing. ..."
Cited by 10 (6 self)
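The entry above uses the average-reward optimality criterion. As a rough, generic illustration of that criterion (not the paper's model-based H-Learning or its ASH-Learning variant), the sketch below runs tabular R-learning, a well-known model-free average-reward method, on a hypothetical two-state MDP whose optimal gain is 1.0; the MDP, step sizes, and exploration rate are all invented for this example.

```python
# Generic average-reward update (R-learning style) on a toy MDP.
# This is an illustrative sketch only, not the H-Learning algorithm
# from the paper above; both target the long-run average reward rho.
import random

random.seed(0)

# Hypothetical 2-state, 2-action MDP: action 1 switches states, and
# switching from state 1 back to state 0 earns reward 2; everything
# else earns 0. The optimal policy alternates, so the gain is 1.0.
def step(s, a):
    if a == 0:                  # stay
        return s, 0.0
    r = 2.0 if s == 1 else 0.0  # switch
    return 1 - s, r

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
rho = 0.0                       # running estimate of the average reward
alpha, beta, eps = 0.1, 0.02, 0.1

s = 0
for _ in range(30000):
    greedy = max((0, 1), key=lambda a: Q[(s, a)])
    a = random.choice((0, 1)) if random.random() < eps else greedy
    s2, r = step(s, a)
    # Average-adjusted Bellman backup: subtract rho instead of discounting.
    target = r - rho + max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    if a == greedy:             # update rho only on greedy steps
        rho += beta * (r + max(Q[(s2, 0)], Q[(s2, 1)])
                       - max(Q[(s, 0)], Q[(s, 1)]) - rho)
    s = s2

print(round(rho, 2))  # converges near the true gain of 1.0
```

The learned greedy policy switches in both states, and `rho` approaches the optimal gain of 1.0 for this toy problem.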
On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Process Problems
In Artificial Intelligence
"... We investigate the computability of problems in probabilistic planning and partially observable infinite-horizon Markov decision processes. The undecidability of the string-existence problem for probabilistic finite automata is adapted to show that the following problem of plan existence is ... total-reward models, average-reward models, and state-avoidance models are all shown to be undecidable. The results apply to corresponding approximation problems as well. ..."
Cited by 103 (0 self)
Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs
"... Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it ..."
Self-Improving Factory Simulation using Continuous-time Average-reward Reinforcement Learning
In Proceedings of the 14th International Conference on Machine Learning, 1997
"... Many factory optimization problems, from inventory control to scheduling and reliability, can be formulated as continuous-time Markov decision processes. A primary goal in such problems is to find a gain-optimal policy that minimizes the long-run average cost. This paper describes a new average-reward ..."
Cited by 54 (13 self)
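For reference, the gain that gain-optimal policies optimize can be stated, in the discrete-time analogue of the continuous-time setting described in the abstract above, as the long-run average reward per step (with costs, the limit is minimized instead of maximized):

```latex
% Gain of a policy \pi started in state s (discrete-time form);
% a gain-optimal policy optimizes \rho^\pi(s) for every state s.
\rho^\pi(s) \;=\; \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}^\pi\!\left[\, \sum_{t=0}^{N-1} r_t \;\middle|\; s_0 = s \right]
```

In the continuous-time case the sum becomes an integral of the reward rate and the normalization is by elapsed time rather than by the number of steps.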
Extending Hierarchical Reinforcement Learning to Continuous-Time, Average-Reward, and Multi-Agent Models
2003
"... Hierarchical reinforcement learning (HRL) is a general framework that studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work on HRL has been limited to the discrete-time discounted-reward semi-Markov decision process (SMDP) model. In this ..."
Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs (Theoretical Proofs)
2014
"... Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Exist ..."
Cited by 3 (2 self)