### Table 2: Parameters used in reinforcement learning

"... In PAGE 4: ... Note that Hall and Mars found that the SLA outperformed common static policies, such as FCFS, EDF and SP. We then applied our RL system under the same simulation conditions, using the parameters shown in Table 2. The measured mean delay for the batch algorithm is shown in Figure 4, and for the ε-greedy algorithm in Figure 5. ... ..."
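The excerpt above contrasts a batch RL algorithm with an ε-greedy variant. As a minimal sketch of standard ε-greedy action selection over tabulated Q-values (the function name and signature are illustrative, not from the cited paper):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore a random action; otherwise exploit.

    q_values: list of estimated action values for the current state.
    Returns the index of the chosen action.
    """
    if random.random() < epsilon:
        # Explore: uniform random action
        return random.randrange(len(q_values))
    # Exploit: action with the largest estimated value
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0` this degenerates to a purely greedy policy, which is how the exploration rate is typically annealed toward the end of training.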

### TABLE II GENERAL LIKELIHOOD RATIO POLICY GRADIENT ESTIMATOR "EPISODIC REINFORCE" WITH AN OPTIMAL BASELINE.

2006

Cited by 9

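The table title above refers to the episodic likelihood-ratio ("REINFORCE") gradient estimator with the variance-minimizing baseline b = E[(∇log π)²R] / E[(∇log π)²]. A minimal one-parameter sketch of that estimator (names and the scalar-parameter simplification are my own, not the paper's notation):

```python
def reinforce_with_baseline(grad_logps, returns):
    """Likelihood-ratio gradient estimate with the optimal (variance-minimizing) baseline.

    grad_logps: per-episode sums of d/dtheta log pi(a|s) (scalar parameter case).
    returns:    per-episode returns R(tau).
    """
    # Optimal baseline: b = E[(grad log pi)^2 * R] / E[(grad log pi)^2]
    num = sum(g * g * r for g, r in zip(grad_logps, returns))
    den = sum(g * g for g in grad_logps)
    b = num / den if den else 0.0
    # REINFORCE estimate: average of grad log pi * (R - b)
    return sum(g * (r - b) for g, r in zip(grad_logps, returns)) / len(returns)
```

Subtracting the baseline leaves the estimator unbiased (the baseline term has zero expectation) while reducing its variance, which is the point of the "optimal baseline" in the table title.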

### Table 2: Parameters for the policy gradient algorithm in the ball acquisition learning task

2007

Cited by 5


### TABLE II REINFORCEMENT LEARNING PARAMETERS

2004

Cited by 2

### Table 2. Learning rates used by the policy gradient algorithms in the experiments of this section.

"... In PAGE 11: ... The optimal solution is c ≈ 0.92 and = 0.001 (B(c; ) = 0.1003), corresponding to 1 ≈ 0.16 and 2 → ∞. Table 2 summarizes the learning rates used in the ex-... In PAGE 12: ... For a fixed sample size, each method uses a fixed learning rate vector, and the learning rates are not tuned during a trial. We tried many learning rates and those in Table 2 yielded the best performance. The selected learning rates for BPNG are significantly larger than those for BPG and MCPG as shown in Table 2.... In PAGE 12: ... We tried many learning rates and those in Table 2 yielded the best performance. The selected learning rates for BPNG are significantly larger than those for BPG and MCPG as shown in Table 2. This explains why BPNG initially learns faster than BPG and MCPG, but eventually performs worse.... ..."

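The excerpt notes that each method (BPG, BPNG, MCPG) uses a fixed, untuned learning-rate vector throughout a trial. A minimal sketch of one such ascent step with a fixed per-parameter learning-rate vector (the function name is illustrative, not from the paper):

```python
def policy_gradient_step(theta, grad, learning_rates):
    """One gradient-ascent step on policy parameters theta.

    learning_rates is a fixed vector, one rate per parameter, held
    constant for the whole trial (no tuning during learning).
    """
    return [t + lr * g for t, lr, g in zip(theta, learning_rates, grad)]
```

A larger fixed rate (as the excerpt reports for BPNG) gives faster initial progress but can overshoot near the optimum, consistent with BPNG eventually performing worse.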

### Table 1. Four reinforcement learning algorithms, the counterpart of the Bellman equation for each, and each of the corresponding residual algorithms.

1995

"... In PAGE 7: ... Given the Bellman equation counterpart for a reinforcement learning algorithm, it is straightforward to derive the associated direct, residual gradient, and residual algorithms. As can be seen from Table 1, all of the residual algorithms can be implemented incrementally except for residual value iteration. Value iteration requires that an expected value be calculated for each possible action, and then the maximum found.... In PAGE 7: ... This is clearly impractical, and appears to have been one of the motivations behind the development of Q-learning. Table 1 also shows that for a deterministic MDP, all of the algorithms can be implemented without a model, except for residual value iteration. This may simplify the design of a learning system, since there is no need to learn a model of the MDP.... ..."

Cited by 145
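The last excerpt describes deriving direct, residual-gradient, and residual algorithms from a Bellman equation counterpart. Under the standard formulation of residual algorithms for linear value functions, a single update blends the current and successor feature gradients; the sketch below assumes V(s) = w · φ(s), with the mixing weight `phi` interpolating between the direct update (`phi=0`) and the residual-gradient update (`phi=1`). Names are illustrative, not the paper's notation:

```python
def residual_update(w, feats_s, feats_s2, reward, alpha=0.1, gamma=0.9, phi=1.0):
    """One residual-algorithm step for a linear value function V(s) = w . feats(s).

    phi=0.0 recovers the direct (TD-style) update; phi=1.0 the full
    residual-gradient update; intermediate values blend the two.
    """
    v_s = sum(wi * fi for wi, fi in zip(w, feats_s))
    v_s2 = sum(wi * fi for wi, fi in zip(w, feats_s2))
    delta = reward + gamma * v_s2 - v_s  # Bellman residual
    # The update direction mixes the current-state gradient with the
    # (discounted) successor-state gradient.
    return [wi + alpha * delta * (fs - phi * gamma * fs2)
            for wi, fs, fs2 in zip(w, feats_s, feats_s2)]
```

This incremental form matches the excerpt's point that the residual algorithms (other than residual value iteration, which needs a per-action expectation and maximization) can be run sample by sample without a model.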