Results 1–10 of 177,349
Average reward timed games
In FORMATS: Formal Modeling and Analysis of Timed Systems, Lecture Notes in Computer Science 3829, 2005
Cited by 12 (2 self)
"... Abstract. We consider real-time games where the goal consists, for each player, in maximizing the average amount of reward he or she receives per time unit. We consider zero-sum rewards, so that a reward of +r to one player corresponds to a reward of −r to the other player. The games are played on d ..."

Hierarchical Average Reward Reinforcement Learning
Cited by 4 (0 self)
"... Hierarchical reinforcement learning (HRL) is a general framework for scaling reinforcement learning (RL) to problems with large state and action spaces by using the task (or action) structure to restrict the space of policies. Prior work in HRL including HAMs, options, MAXQ, and PHAMs has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. The average reward optimality criterion has been recognized to be more appropriate for a wide class of continuing tasks than the discounted framework. Although average reward RL has been studied for decades, prior ..."

General discounting versus average reward
In Proc. 17th International Conf. on Algorithmic Learning Theory (ALT'06), volume 4264 of LNAI, 2006
Cited by 11 (7 self)
"... Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (discounted value). We consider essentially arbit ..."

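The two quantities compared in this abstract can be made concrete with a small sketch (illustrative code, not from the paper): the average value U over cycles 1..m versus a normalized discounted value from cycle k onward, computed for an arbitrary finite reward sequence.

```python
# Illustrative sketch: average value U_m = (1/m) * sum_{i=1..m} r_i versus
# the normalized discounted value (1 - gamma) * sum_{i>=k} gamma^(i-k) r_i.
# The reward sequence, gamma, and the (1 - gamma) normalization are
# assumptions made here for illustration.

def average_value(rewards, m):
    """Average reward over cycles 1..m."""
    return sum(rewards[:m]) / m

def discounted_value(rewards, k, gamma):
    """Normalized discounted reward from cycle k onward, truncated at the
    end of the finite sequence."""
    return (1 - gamma) * sum(
        gamma ** (i - k) * r for i, r in enumerate(rewards[k:], start=k)
    )

rewards = [1.0] * 500                      # a constant reward sequence
print(average_value(rewards, 500))         # exactly 1.0
print(discounted_value(rewards, 0, 0.99))  # close to 1.0 for long sequences
```

For a constant reward stream both notions agree in the limit; the paper's question is how they relate for essentially arbitrary sequences and discounts.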
Average reward optimization objective in partially . . .
"... Another example of a controlled system (see Sec. 4.2, Figure 3). The first two plots describe the behavior of the system, which rotates the state by a → 10° and b → −10°, with the reset states aligned in a way that taking the opposite action brings the system to its topmost point. The x and y in the plots represent possible actions such that x = y. The bottom plot demonstrates how the average reward changes as a function of α and β, where the policies having 2 hidden states (S ∈ {1, 2}) are parametrized as: ..."

Optimizing Average Reward Using Discounted Rewards
"... 1 Introduction Sequential decision making problems are usually formulated as dynamic programming problems in which the agent must maximize some measure of future reward. In many domains, it is appropriate to optimize the average reward. Often, discounted formulations with a discount factor γ close ..."

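The connection this abstract alludes to can be sketched numerically (assumptions mine, not the paper's): for an ergodic Markov reward process, the normalized discounted value (1 − γ)·V_γ approaches the average reward ρ as γ → 1. The 2-state chain below is an arbitrary example.

```python
# Illustrative sketch: (1 - gamma) * V_gamma -> rho as gamma -> 1 for a
# small ergodic Markov reward process. The chain and rewards are
# hypothetical numbers chosen for the demonstration.
import numpy as np

P = np.array([[0.9, 0.1],      # transition matrix of a 2-state chain
              [0.2, 0.8]])
r = np.array([1.0, 0.0])       # reward received in each state

def normalized_discounted_value(gamma):
    # Discounted value solves V = r + gamma * P V
    V = np.linalg.solve(np.eye(2) - gamma * P, r)
    return (1 - gamma) * V

# Average reward: rho = stationary distribution @ r
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()
rho = pi @ r                   # = 2/3 for this chain

for gamma in (0.9, 0.99, 0.999):
    print(gamma, normalized_discounted_value(gamma), rho)
```

As γ increases, both components of (1 − γ)·V_γ converge to ρ, which is why discounted methods with γ near 1 are often used as a proxy for average-reward optimization.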
Average Reward Optimization Objective In Partially Observable Domains
"... We consider the problem of average reward optimization in domains with partial observability, within the modeling framework of linear predictive state representations (PSRs) (Littman et al., 2001). The key to average-reward computation is to have a well-defined stationary behavior of a system, so the ..."

Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results
1996
Cited by 130 (13 self)
"... This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms is described, ranging from synchronous dyna ..."

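One representative model-free algorithm covered in surveys of this area is Schwartz's R-learning, an average-reward analogue of Q-learning. A minimal tabular sketch follows; the two-state environment, step sizes, and exploration rate are illustrative assumptions, not taken from the paper.

```python
# Minimal tabular R-learning sketch. The update maintains a running
# estimate rho of the average reward alongside the action values:
#   Q(s,a) += alpha * (r - rho + max_a' Q(s',a') - Q(s,a))
#   rho    += beta  * (r - rho + max Q(s') - max Q(s))   [greedy steps only]
# Environment and hyperparameters below are illustrative.
import random

random.seed(0)
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
rho = 0.0                        # running estimate of the average reward
alpha, beta = 0.1, 0.01          # step sizes for Q and rho

def step(s, a):
    """Toy dynamics: action 0 stays, action 1 switches state;
    reward 1 only for switching out of state 0."""
    s2 = s if a == 0 else 1 - s
    r = 1.0 if (s == 0 and a == 1) else 0.0
    return s2, r

s = 0
for t in range(20000):
    greedy = Q[s].index(max(Q[s]))
    a = greedy if random.random() > 0.1 else random.randrange(n_actions)
    s2, r = step(s, a)
    Q[s][a] += alpha * (r - rho + max(Q[s2]) - Q[s][a])
    if a == greedy:              # rho is updated only on greedy actions
        rho += beta * (r - rho + max(Q[s2]) - max(Q[s]))
    s = s2
print(round(rho, 2))             # approaches the optimal gain of the toy MDP
```

In this toy MDP the optimal policy alternates between the two states, earning reward 1 every second step, so the learned ρ should settle near 0.5.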
Auto-exploratory Average Reward Reinforcement Learning
Artificial Intelligence, 1996
Cited by 36 (10 self)
"... We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning, which automatically explores the unexp ..."

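H-learning is model-based: it estimates transition probabilities and rewards online and backs up values against that model. As a simplified illustration of the certainty-equivalence idea behind such methods (this is relative value iteration on a known toy model, not the paper's algorithm), the gain-optimal backup can be sketched as:

```python
# Relative value iteration sketch for the average-reward criterion,
# assuming the model is known (H-learning instead estimates it online).
# The 2-state MDP and tau are illustrative assumptions.
import numpy as np

P = [np.array([[1.0, 0.0], [0.0, 1.0]]),   # action 0: stay (reward 0)
     np.array([[0.0, 1.0], [1.0, 0.0]])]   # action 1: switch
R = [np.array([0.0, 0.0]),
     np.array([1.0, 0.0])]                 # switching out of state 0 pays 1

# Aperiodicity transform: P_tau = (1 - tau) I + tau P leaves the gain of
# every policy unchanged and makes the iteration converge.
tau = 0.5
P = [(1 - tau) * np.eye(2) + tau * Pa for Pa in P]

h = np.zeros(2)                 # bias (relative value) function
for _ in range(100):
    # Bellman backup: f(s) = max_a [ R(s,a) + sum_s' P(s'|s,a) h(s') ]
    f = np.max([R[a] + P[a] @ h for a in range(2)], axis=0)
    rho, h = f[0], f - f[0]     # pin state 0 as the reference state
print(rho)                      # 0.5, the optimal average reward here
```

The optimal policy alternates between the states (reward 1 every second step), so the iteration settles at gain ρ = 0.5 with the bias h expressed relative to the reference state.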
Hierarchically Optimal Average Reward Reinforcement Learning
In Proceedings of the Nineteenth International Conference on Machine Learning, 2002
Cited by 11 (4 self)
"... Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space defined by a task hierarchy, and a weaker local model called recursive optimality. In this paper, we introduce two new average-reward HRL algorithms for finding hierarchically optimal policies. ..."

Complexity of probabilistic planning under average rewards
In Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001
Cited by 4 (0 self)
"... A general and expressive model of sequential decision making under uncertainty is provided by the Markov decision processes (MDPs) framework. Complex applications with very large state spaces are best modelled implicitly (instead of explicitly by enumerating the state space), for example as precondition-effect operators, the representation used in AI planning. This kind of representation is very powerful, and it makes the construction of policies/plans computationally very complex. In many applications, average reward per unit time is the relevant rationality criterion, as opposed ..."
