P. Tseng. Solving H-horizon stationary Markov decision process in time proportional to log(H). Operations Research Letters, 9(5):287-297, 1990.
....little about their asymptotic complexity. It is known, however, that algorithms based on value iteration have no better than a pseudo-polynomial run time on MDP problems [Tse90, Lit96]. (Footnote: An algorithm has pseudo-polynomial run time complexity if it runs in time polynomial in the unary representation of the ....) In this paper, we analyze the basic value iteration procedure on the deterministic MDP problem under the average reward criterion, or the DMDP problem, and we establish several positive results. The DMDP problem is also known as the maximum (or minimum) mean cycle problem in a directed weighted ....
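To make the excerpt above concrete, here is a minimal sketch of basic value iteration on a deterministic MDP under the average reward criterion, read as a maximum mean cycle computation. It is not taken from any of the cited papers. The function name `value_iteration`, the edge-list encoding, and the three-vertex example graph are all illustrative assumptions, and every vertex is assumed to have at least one outgoing edge.

```python
def value_iteration(edges, n, k):
    """Run k steps of v_{t+1}(u) = max over edges (u, v, w) of w + v_t(v), with v_0 = 0.

    edges is a list of (u, v, w) triples: taking the action u -> v yields reward w.
    Assumes every vertex 0..n-1 has at least one outgoing edge (hypothetical encoding).
    """
    values = [0] * n
    for _ in range(k):
        new_values = [float("-inf")] * n
        for u, v, w in edges:
            # Best total reward over t+1 steps from u when the first move is u -> v.
            new_values[u] = max(new_values[u], w + values[v])
        values = new_values
    return values


# Hypothetical example: the cycle 0 -> 1 -> 0 has mean reward (1 + 3) / 2 = 2,
# and the self-loop at vertex 2 has mean reward 1.
EDGES = [(0, 1, 1), (1, 0, 3), (1, 2, 0), (2, 2, 1)]

K = 100
totals = value_iteration(EDGES, 3, K)
# Dividing the k-step value by k estimates the best average reward from each vertex.
averages = [total / K for total in totals]
```

In this sketch the per-step averages approach the best mean cycle value reachable from each vertex. The number of steps needed for a good estimate can grow with the magnitude of the rewards, which is the sense in which the excerpt calls value iteration pseudo-polynomial.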
....small reward y c . Thus value iteration may take many iterations to converge to an optimal or near-optimal policy, or to bring the value of a vertex to within, say, a constant factor of its optimal value. This is basically formalized in the statement that value iteration is a pseudo-polynomial algorithm [119], meaning that it has a run time polynomial in n, m, and W (versus log W); see [119, 76]. However, as we will see, on MDP(2) problems, value iteration variants, or basically a sequence of value propagation or Bellman-Ford operations [1], are used in binary search schemes to give polynomial time ....
....or near-optimal policy, or to bring the value of a vertex to within, say, a constant factor of its optimal value. This is basically formalized in the statement that value iteration is a pseudo-polynomial algorithm [119], meaning that it has a run time polynomial in n, m, and W (versus log W); see [119, 76]. However, as we will see, on MDP(2) problems, value iteration variants, or basically a sequence of value propagation or Bellman-Ford operations [1], are used in binary search schemes to give polynomial time algorithms for solving the MDP(2) problem. In Chapter 7, value iteration is shown to ....
....they formally establish NP-hardness results for a variety of finite horizon POMDP problems, but only conjecture as to the undecidability of the POMDP infinite horizon case. The computational complexity of finite horizon control problems has received considerable attention recently; see for example [48,10,39,20]. For the infinite-horizon case, two questions, the complexity of goal state reachability with either nonzero probability or probability one, which reduce to reachability computations and are decidable, had been studied by Alur et al. [1] and Littman [26]. Other work considered infinite horizon ....
.... case, two questions, the complexity of goal state reachability with either nonzero probability or probability one, which reduce to reachability computations and are decidable, had been studied by Alur et al. [1] and Littman [26]. Other work considered infinite horizon fully observable MDPs [40,48], fully observable MDPs with exponentially many states but with compact representations [27,28], and the complexity of infinite horizon problems on stochastic games, which generalize MDPs [14,31]. Littman, Goldsmith, and Mundhenk [29] analyze the complexity of propositional probabilistic planning ....
....in which they formally establish NP-hardness results for a variety of finite horizon problems, but only conjecture as to the undecidability of the infinite horizon case. The computational complexity of finite horizon control problems has received considerable attention recently; see for example [32,6,24,14]. For the infinite horizon case, two questions, the complexity of goal state reachability with either nonzero probability or probability one, which reduce to reachability computations and are decidable, had been studied by Alur et al. [1] and Littman [17]. It is now well established that optimal ....