Results 1 – 10 of 331
On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision process problems
Artificial Intelligence
"... We investigate the computability of problems in probabilistic planning and partially observable infinite-horizon Markov decision processes. The undecidability of the string-existence problem for probabilistic finite automata is adapted to show that the following problem of plan existence ... total reward models, average-reward models, and state-avoidance models are all shown to be undecidable. The results apply to corresponding approximation problems as well."
Cited by 103 (0 self)
H-learning: A Reinforcement Learning Method to Optimize Undiscounted Average Reward
, 1994
"... In this paper, we introduce a model-based reinforcement learning method called H-learning, which optimizes undiscounted average reward. We compare it with three other reinforcement learning methods in the domain of scheduling Automatic Guided Vehicles, transportation robots used in modern manufacturing plants and facilities. The four methods differ along two dimensions: they are either model-based or model-free, and they optimize either discounted total reward or undiscounted average reward. Our experimental results indicate that H-learning is more robust with respect to changes in the domain parameters ..."
Cited by 10 (4 self)
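The snippet does not quote H-learning's update equations, so as a rough illustration of the undiscounted average-reward criterion it optimizes, here is standard relative value iteration on a made-up two-state MDP with a known model (all states, actions, and numbers are invented, and this is not the paper's algorithm):

```python
# Relative value iteration for the undiscounted average-reward criterion
# on a toy two-state MDP with a known model. Illustrates the optimality
# criterion H-learning targets, not the H-learning update rule itself.
# P[s][a] = transition distribution, R[s][a] = expected immediate reward.

P = {0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
R = {0: {0: 1.0, 1: 0.0},
     1: {0: 0.0, 1: 2.0}}

h = [0.0, 0.0]  # relative (bias) values, normalized so h[0] == 0
for _ in range(500):
    # Bellman backup: T h(s) = max_a [ R(s,a) + sum_s' P(s'|s,a) h(s') ]
    new = [max(R[s][a] + sum(p * h[t] for t, p in enumerate(P[s][a]))
               for a in (0, 1))
           for s in (0, 1)]
    gain = new[0] - h[0]            # per-step average-reward estimate
    h = [v - new[0] for v in new]   # renormalize at reference state 0

print(round(gain, 3))  # → 1.778 (optimal average reward per step)
```

At the fixed point, the backup satisfies T h = g·e + h, so the subtracted constant converges to the optimal gain g (here 16/9).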
Stationary anonymous sequential games with undiscounted rewards
NETGCOOP 2011: International Conference on Network Games, Control and Optimization
, 2011
"... Stationary anonymous sequential games with undiscounted rewards are a special class of games that combines features from both population games (infinitely many players) and stochastic games. We extend the theory for these games to the cases of total expected cost as well as to the expected average ..."
Cited by 1 (0 self)
Risk-Averse Control of Undiscounted Transient Markov Models
, 2012
"... We use Markov risk measures to formulate a risk-averse version of the undiscounted total cost problem for a transient controlled Markov process. Using the new concept of a multikernel, we derive conditions for a system to be risk-transient, that is, to have finite risk over an infinite time horizon. ..."
Self-determination and persistence in a real-life setting: Toward a motivational model of high school dropout
Journal of Personality and Social Psychology
, 1997
"... The purpose of this study was to propose and test a motivational model of high school dropout. The model posits that the behaviors of teachers, parents, and the school administration toward students influence students' perceptions of competence and autonomy. The less autonomy supportive the so ..."
Cited by 183 (19 self)
Constrained Markov Decision Models With Weighted Discounted Rewards
Math. of Operations Research
, 1993
"... This paper deals with constrained optimization of Markov Decision Processes. Both the objective function and the constraints are sums of standard discounted rewards, but each with a different discount factor. Such models arise, e.g., in production and in applications involving multiple time scales. We prove ..."
Cited by 22 (5 self)
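To make the weighted-discounted criterion concrete: for a fixed policy, each reward stream is an ordinary discounted sum with its own discount factor, and the objective adds them. A minimal evaluation sketch, assuming an invented two-state chain and two invented reward streams (none of these numbers are from the paper):

```python
# Weighted discounted reward of a fixed policy on a two-state Markov
# chain: two reward streams, each discounted with its own factor, summed.

P = [[0.7, 0.3], [0.4, 0.6]]   # transition matrix under the fixed policy
r1, b1 = [1.0, 0.0], 0.9       # first reward stream, discount factor 0.9
r2, b2 = [0.0, 1.0], 0.5       # second reward stream, discount factor 0.5

def discounted_value(r, beta, iters=2000):
    """Solve V = r + beta * P V by fixed-point iteration (beta < 1)."""
    V = [0.0, 0.0]
    for _ in range(iters):
        V = [r[s] + beta * sum(P[s][t] * V[t] for t in (0, 1))
             for s in (0, 1)]
    return V

V1 = discounted_value(r1, b1)
V2 = discounted_value(r2, b2)
# Total weighted discounted value, indexed by start state:
weighted = [a + b for a, b in zip(V1, V2)]
print([round(v, 3) for v in weighted])
```

Each stream is evaluated independently because the discount factors differ; only at the end are the two value functions added, which is what distinguishes these models from standard single-discount MDPs.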
Auto-exploratory Average Reward Reinforcement Learning
Artificial Intelligence
, 1996
"... We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning, which automatically explores the unexp ..."
Cited by 36 (10 self)
An Algorithm to Calculate Transient Distributions of Cumulative Reward
Commun. in Statist. – Stochastic Models
, 1998
"... Markov reward models have been widely used to solve a variety of problems. In these models, reward rates are associated with the states of a continuous-time Markov chain, and impulse rewards are associated with transitions of the chain. Reward rates are gained per unit time in the associated state, and ..."
Cited by 26 (2 self)
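As a concrete illustration of this model class (not the paper's algorithm), the expected accumulated reward over [0, T] combines rate rewards integrated over state occupancy with impulse rewards weighted by transition rates. A sketch with an invented two-state generator, using plain Euler integration of the Kolmogorov forward equations:

```python
# Expected accumulated reward of a continuous-time Markov reward model
# over [0, T]: reward rates on states plus impulse rewards on transitions.
# Generator, rates, and impulses below are invented for illustration.

Q = [[-2.0, 2.0], [1.0, -1.0]]   # CTMC generator (two states)
rate = [3.0, 0.0]                # reward rate per unit time in each state
imp = [[0.0, 0.5], [0.2, 0.0]]   # impulse reward gained on each transition

def expected_reward(p0, T, steps=100_000):
    dt = T / steps
    p = list(p0)                 # transient state distribution
    total = 0.0
    for _ in range(steps):
        # rate reward accrued during dt:
        total += dt * sum(p[i] * rate[i] for i in (0, 1))
        # expected impulse reward: transition i->j fires at rate p[i]*Q[i][j]
        total += dt * sum(p[i] * Q[i][j] * imp[i][j]
                          for i in (0, 1) for j in (0, 1) if i != j)
        # Euler step of the forward equations: dp/dt = p Q
        p = [p[i] + dt * sum(p[k] * Q[k][i] for k in (0, 1)) for i in (0, 1)]
    return total

print(round(expected_reward([1.0, 0.0], T=1.0), 3))  # → 2.269
```

For this two-state chain the answer can be checked in closed form (the occupancy of state 0 is 1/3 + (2/3)e^{-3t} starting from state 0), which is why a crude first-order integrator suffices here; the paper's contribution concerns the full transient distribution, not just this expectation.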
Return on Reward
, 2013
"... Return on reward, i.e. the average economic performance per employee that a company generates based on its average total labor expenses, is a topic that seems to have received surprisingly little attention in the past. While companies manage their (often fictive) financial indicators (e.g. capital ..."
Recursive Stochastic Games with Positive Rewards
"... We study the complexity of a class of Markov decision processes and, more generally, stochastic games, called 1-exit Recursive Markov Decision Processes (1-RMDPs) and 1-exit Recursive Simple Stochastic Games (1-RSSGs), with strictly positive rewards. These are a class of finitely presented countable-state zero-sum stochastic games with a total expected reward objective. They subsume standard finite-state MDPs and Condon's simple stochastic games and correspond to optimization and game versions of several classic stochastic models with rewards. Such stochastic models arise naturally as models ..."
Cited by 10 (4 self)