Results 1 - 6 of 6
Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
2013
Abstract

Cited by 2 (0 self)
Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the γ-discounted optimal policy. We consider two variations of PI: Howard’s PI, which changes the actions in all states with a positive advantage, and Simplex-PI, which only changes the action in the state with maximal advantage. We show that Howard’s PI terminates after at most $n(m-1)\left\lceil \frac{1}{1-\gamma}\log\left(\frac{1}{1-\gamma}\right)\right\rceil$ iterations, improving by a factor $O(\log n)$ a result by Hansen et al. (2013), while Simplex-PI terminates after at most $n(m-1)\left\lceil \frac{n}{1-\gamma}\log\left(\frac{n}{1-\gamma}\right)\right\rceil$ iterations, improving by a factor 2 a result by Ye (2011). Under some structural assumptions on the MDP, we then consider bounds that are independent of the discount factor γ. When the MDP is deterministic, we show that Simplex-PI terminates after at most $2n^2 m(m-1)\lceil 2(n-1)\log n\rceil \lceil 2n\log n\rceil = O(n^4 m^2 \log^2 n)$ iterations, improving by a factor $O(n)$ a bound obtained by Post and Ye (2012). We generalize this result to stochastic MDPs: given a measure of the maximal transient time $\tau_t$ and the maximal time $\tau_r$ to revisit states in recurrent classes under all policies, we show that Simplex-PI terminates after at most $n^2 m(m-1)\left(\lceil \tau_r\log(n\tau_r)\rceil + \lceil \tau_r\log(n\tau_t)\rceil\right)\lceil \tau_t\log(n(\tau_t+1))\rceil = \tilde{O}(n^2 \tau_t \tau_r m^2)$ iterations. We explain why similar results seem hard to derive for Howard’s PI. Finally, under the additional (restrictive) assumption that the state space is partitioned into two sets, corresponding to states that are transient (respectively recurrent) for all policies, we show that Simplex-PI and Howard’s PI terminate after at most $n(m-1)\left(\lceil \tau_t\log(n\tau_t)\rceil + \lceil \tau_r\log(n\tau_r)\rceil\right) = \tilde{O}(nm(\tau_t+\tau_r))$ iterations.
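The two update rules compared in this abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the three-state example MDP, the tolerance `eps`, and the use of the natural logarithm in the bound are all assumptions made for illustration. Howard's PI switches to the greedy action in every state with a positive advantage, while Simplex-PI switches only in the state with the largest advantage.

```python
import math

def evaluate(P, R, gamma, policy):
    """Solve (I - gamma * P_pi) v = r_pi by Gauss-Jordan elimination."""
    n = len(policy)
    # Augmented matrix [I - gamma * P_pi | r_pi]
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][policy[i]][j] for j in range(n)]
         + [R[i][policy[i]]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

def advantages(P, R, gamma, v):
    """adv[s][a] = R[s][a] + gamma * sum_t P[s][a][t] * v[t] - v[s]."""
    n = len(v)
    return [[R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n)) - v[s]
             for a in range(len(R[s]))] for s in range(n)]

def policy_iteration(P, R, gamma, variant="howard", eps=1e-9):
    """Return (policy, values, number of policy changes)."""
    n = len(R)
    policy = [0] * n
    iterations = 0
    while True:
        v = evaluate(P, R, gamma, policy)
        adv = advantages(P, R, gamma, v)
        best = [max(range(len(adv[s])), key=lambda a: adv[s][a]) for s in range(n)]
        improvable = [s for s in range(n) if adv[s][best[s]] > eps]
        if not improvable:
            return policy, v, iterations
        if variant == "simplex":
            # Simplex-PI: change only the state with maximal advantage.
            improvable = [max(improvable, key=lambda s: adv[s][best[s]])]
        for s in improvable:  # Howard's PI: change all improvable states.
            policy[s] = best[s]
        iterations += 1

def howard_bound(n, m, gamma):
    """n(m-1) * ceil(1/(1-gamma) * log(1/(1-gamma))); natural log is assumed."""
    h = 1.0 / (1.0 - gamma)
    return n * (m - 1) * math.ceil(h * math.log(h))

# Hypothetical deterministic MDP: action 0 stays put, action 1 moves to the
# next state (cyclically); only staying in state 2 yields a reward.
P = [[[1, 0, 0], [0, 1, 0]],
     [[0, 1, 0], [0, 0, 1]],
     [[0, 0, 1], [1, 0, 0]]]
R = [[0, 0], [0, 0], [1, 0]]
gamma = 0.9

howard_policy, howard_v, howard_iters = policy_iteration(P, R, gamma, "howard")
simplex_policy, simplex_v, simplex_iters = policy_iteration(P, R, gamma, "simplex")
print(howard_policy, howard_iters, simplex_policy, simplex_iters)
print("Howard bound:", howard_bound(3, 2, gamma))
```

On an instance this small both variants need far fewer iterations than the worst-case bound allows; the sketch only shows how the two switching rules differ, not how tight the bounds are.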
Mobility-Induced Service Migration in Mobile Micro-Clouds
Abstract
Mobile micro-cloud is an emerging technology in distributed computing, which is aimed at providing seamless computing/data access to the edge of the network when a centralized service may suffer from poor connectivity and long latency. Different from the traditional cloud, a mobile micro-cloud is smaller and deployed closer to users, typically attached to a cellular basestation or wireless network access point. Due to the relatively small coverage area of each basestation or access point, when a user moves across areas covered by different basestations or access points which are attached to different micro-clouds, issues of service performance and service migration become important. In this paper, we consider such migration issues. We model the general problem as a Markov decision process (MDP), and show that, in the special case where the mobile user follows a one-dimensional asymmetric random walk mobility model, the optimal policy for service migration is a threshold policy. We obtain the analytical solution for the cost resulting from arbitrary thresholds, and then propose an algorithm for finding the optimal thresholds. The proposed algorithm is more efficient than standard mechanisms for solving MDPs. Index Terms: Cloud computing, Markov decision process (MDP), mobile micro-cloud, mobility, service migration, wireless networks.
unknown title
Abstract
An exponential lower bound for Cunningham’s rule