### Table 1 Policy iteration

2001

"... In PAGE 4: ... There are two related dynamic programming algorithms for indefinite-horizon MDPs: policy iteration and value iteration. Policy iteration is summarized in Table 1. It interleaves the dynamic-programming update, used for policy improvement, with policy evaluation. ..."
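The interleaving of policy evaluation and policy improvement described in this excerpt can be sketched as follows. This is a minimal tabular implementation under assumed conventions (transition tensor `P`, reward matrix `R`, discount `gamma`); it is an illustration, not code from the cited paper:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Tabular policy iteration for a finite discounted MDP.

    P: transition probabilities, shape (A, S, S)
    R: expected immediate rewards, shape (S, A)
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)  # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(n_states), :]
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: the greedy one-step dynamic-programming update.
        Q = R.T + gamma * P @ V          # shape (A, S)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V             # policy is stable, hence optimal
        policy = new_policy
```

Because each improvement step is greedy with respect to an exactly evaluated value function, the loop terminates after finitely many iterations for a finite MDP.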

### Table 6: The Policy Iteration Algorithm

1997

"... In PAGE 29: ... Note that if we plug f into Equation (3), the new policy is unchanged; it is still the optimal policy. This defines the algorithm known as policy iteration, shown in Table 6. It is easy to show that policy iteration converges in a fixed number of iterations. ..."

Cited by 172


### Table 2: The policy iteration algorithm

"... In PAGE 18: ... The policy is initially chosen at random, and the process terminates when no improvement can be found. The algorithm is shown in Table 2. This process converges to an optimal policy (Puterman 1994). ... In PAGE 20: ... The representation of the policy and utility functions are also structured, using decision trees. In standard policy iteration, the value of the candidate policy is computed on each iteration by solving a system of |S| linear equations (step 2 in Table 2), which is computationally prohibitive for large real-world planning problems. Modified policy iteration replaces this step with an iterative approximation of the value function V^π by a series of value functions V_0, V_1, ... ..."
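The contrast drawn in this excerpt, between solving |S| linear equations exactly and the iterative approximation used by modified policy iteration, can be illustrated on a small assumed example (the function names and array shapes here are my own, not the paper's):

```python
import numpy as np

def evaluate_exact(P_pi, R_pi, gamma):
    """Step 2 of standard policy iteration: solve the |S| x |S| linear system
    (I - gamma * P_pi) V = R_pi for the value of the candidate policy."""
    n = len(R_pi)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

def evaluate_modified(P_pi, R_pi, gamma, k=20):
    """Modified policy iteration: approximate V_pi by a series V_0, V_1, ...
    of k successive Bellman backups instead of an exact solve."""
    V = np.zeros_like(R_pi)
    for _ in range(k):
        V = R_pi + gamma * P_pi @ V
    return V
```

Each backup contracts the error by a factor of gamma, so a modest number of sweeps already gives a value estimate accurate enough to drive the improvement step, at far lower cost than the exact solve for large state spaces.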

### Table 2.1: The policy iteration algorithm

2007

### Table 3 Policy iterations for traditional algorithms in Example 3

1999

### Table 5.1 Cost comparisons between the centralized and distributed adaptive policy iteration algorithms. The systems used for the comparison are flexible beams like that in Figure 5.15. The comparisons are between the cost per time step of the adaptive policy iteration algorithm (Figure 5.1) and the distributed adaptive policy iteration algorithm (Figure 5.14).

1994

### Table 1: Optimal policy for the COFFEE domain

"... and naturalness are described in some detail in [8]. The optimal policy and the corresponding value function V for this example are shown in Table 1, as computed by policy iteration using a discount factor of 0.95. While policy iteration explicitly computes an action and value for each of the 64 states, the policy and value function exhibit regularities that permit the compact expression shown in the table. (This fact itself suggests that more reasonable implementations of policy iteration might exploit such structure; see Section 6.) ..."

"... In PAGE 27: ... The policy computed using the abstract MDP version of the coffee problem is described in Table 2, which shows the action and value for each of the eight abstract states. When compared to the optimal policy for the original problem (Table 1), we see that an "optimal" action is chosen at all but one of the 64 states. As we would expect given the method of construction, the policy is optimal except in the state where it is raining and the robot can pick up the umbrella before going to the coffee shop: in the abstract policy the robot immediately heads for coffee, ignoring the umbrella. ..."
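The construction this excerpt alludes to, collapsing many concrete states into a few abstract ones and solving the smaller MDP, can be sketched generically. This toy aggregation (averaging dynamics and rewards over state groups) is an assumed illustration and has nothing to do with the actual COFFEE domain:

```python
import numpy as np

def aggregate_mdp(P, R, groups):
    """Build an abstract MDP by averaging dynamics and rewards over state groups.

    P: transition probabilities, shape (A, S, S)
    R: expected rewards, shape (S, A)
    groups: list of index arrays partitioning the S concrete states
    """
    n_actions = P.shape[0]
    m = len(groups)
    P_abs = np.zeros((n_actions, m, m))
    R_abs = np.zeros((m, n_actions))
    for i, gi in enumerate(groups):
        R_abs[i] = R[gi].mean(axis=0)
        for j, gj in enumerate(groups):
            # probability of landing anywhere in group j, averaged over group i
            P_abs[:, i, j] = P[:, gi][:, :, gj].sum(axis=2).mean(axis=1)
    return P_abs, R_abs
```

A policy computed on the abstract MDP can then be lifted back to the concrete states; as the excerpt notes, such a policy may be optimal almost everywhere while losing optimality in the few states whose distinctions the abstraction erased.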