A. McGovern and J. E. B. Moss. Scheduling straight-line code using reinforcement learning and rollouts. In S. Solla, editor, Advances in Neural Information Processing Systems 11, Cambridge, MA, 1998. MIT Press.
....[Figure 1: Transitions and rewards in the two-state system.] Klopf [Harmon et al. 1994] introduced advantage updating, which estimates the value of each state and the relative advantage of each action using separate approximation architectures. More recently, McGovern and Moss [McGovern and Moss, 1998] have used temporal-difference learning to develop an instruction scheduler for an optimising compiler. Their approach uses table lookup rather than function approximation, and combines possible successor states into a single feature vector, which is mapped to a preference indicator. Bertsekas ....
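The snippet above describes McGovern and Moss's approach only in outline. A minimal sketch of the general idea, a table-lookup preference over pairs of candidate successor states whose feature vectors are combined into one key, might look like the following. Everything here is an assumption for illustration: the class name `PreferenceTable`, the integer-tuple feature encoding, the step size, and the rule for picking among candidates are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple

# Hypothetical discrete encoding of a possible successor state.
Features = Tuple[int, ...]


class PreferenceTable:
    """Table-lookup preference over pairs of candidate successors.

    The two successors' feature tuples are concatenated into a single key;
    a positive stored value means "prefer the left candidate".
    """

    def __init__(self, step_size: float = 0.1) -> None:
        self.step_size = step_size
        self.table: DefaultDict[Features, float] = defaultdict(float)

    @staticmethod
    def key(left: Features, right: Features) -> Features:
        # Combine both possible successors into one feature vector.
        return left + right

    def preference(self, left: Features, right: Features) -> float:
        # Unseen pairs read as 0.0 (no preference) without adding entries.
        return self.table.get(self.key(left, right), 0.0)

    def update(self, left: Features, right: Features, target: float) -> None:
        # Move the stored value a fraction of the way toward the target.
        k = self.key(left, right)
        self.table[k] += self.step_size * (target - self.table[k])

    def choose(self, candidates: Sequence[Features]) -> int:
        # Return the index of the candidate preferred over the most others.
        wins = [
            sum(1 for j, other in enumerate(candidates)
                if j != i and self.preference(c, other) > 0)
            for i, c in enumerate(candidates)
        ]
        return max(range(len(candidates)), key=wins.__getitem__)


prefs = PreferenceTable(step_size=0.5)
x, y = (1, 0), (0, 1)
prefs.update(x, y, 1.0)    # x-before-y judged better
prefs.update(y, x, -1.0)   # y-before-x judged worse
print(prefs.preference(x, y), prefs.choose([x, y]))  # 0.5 0
```

The `update` method is a plain incremental update toward a supplied target; in a temporal-difference setting that target would itself be bootstrapped from later estimates rather than given directly.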
Amy McGovern and Eliot Moss. Scheduling Straight-Line Code Using Reinforcement Learning and Rollouts. Advances in Neural Information Processing (NIPS '98), 11, 1998.
....the merits of approximating the gradient of the Q function, whilst Baird [18], Harmon, and Klopf [19] introduced advantage updating, which estimates the value of each state and the relative advantage of each action using separate approximation architectures. More recently, McGovern and Moss [20] have used temporal-difference learning to develop an instruction scheduler for an optimising compiler. Their approach uses table lookup rather than function approximation, and combines possible successor states into a single feature vector, which is mapped to a preference indicator. Bertsekas ....
Amy McGovern and Eliot Moss. Scheduling Straight-Line Code Using Reinforcement Learning and Rollouts. Advances in Neural Information Processing (NIPS '98), 11, 1998.
No context found.
A. McGovern and J. E. B. Moss. Scheduling straight-line code using reinforcement learning and rollouts. In S. Solla, editor, Advances in Neural Information Processing Systems 11, Cambridge, MA, 1998. MIT Press.
....and a set of (more than one) candidate instructions to add to the schedule. The scheduler appends each candidate to the partial schedule and then follows a fixed policy, π, to schedule the remaining instructions. When the schedule is complete, the scheduler evaluates the .... [Footnote 2: In previous papers (McGovern and Moss, 1998; McGovern, Moss, and Barto, 1999) we referred to this as ORIG. This change is to reduce confusion.] To appear in Machine Learning: Special Issue on Reinforcement Learning, 2000. [Figure: rollout with policy π from a partial schedule; best average ....]
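The rollout procedure in the snippet above can be sketched as follows, using a toy problem in place of a real instruction scheduler. The snippet is cut off before it says how a completed schedule is evaluated, so the cost function here is a stand-in; the names `rollout_select`, `fixed_order_policy`, and `sum_of_completion_times`, the instruction durations, and the cost measure are all hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Schedule = List[str]


def rollout_select(
    partial: Schedule,
    candidates: Sequence[str],
    policy: Callable[[Schedule], Schedule],
    cost: Callable[[Schedule], float],
    n_rollouts: int = 1,
) -> str:
    """Return the candidate whose rollouts have the lowest average cost.

    For each candidate: append it to the partial schedule, let the fixed
    policy schedule the remaining instructions, and evaluate the result.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    best, best_avg = candidates[0], float("inf")
    for c in candidates:
        avg = mean(cost(policy(partial + [c])) for _ in range(n_rollouts))
        if avg < best_avg:
            best, best_avg = c, avg
    return best


# Hypothetical toy problem: each instruction has a duration, and a
# schedule's cost is the sum of completion times (lower is better).
DURATIONS: Dict[str, int] = {"a": 3, "b": 1, "c": 2}


def fixed_order_policy(schedule: Schedule) -> Schedule:
    # Fixed policy: append unscheduled instructions in alphabetical order.
    return schedule + [i for i in sorted(DURATIONS) if i not in schedule]


def sum_of_completion_times(schedule: Schedule) -> float:
    t, total = 0, 0
    for i in schedule:
        t += DURATIONS[i]
        total += t
    return total


print(rollout_select([], ["a", "b", "c"], fixed_order_policy,
                     sum_of_completion_times))  # b
```

Here the rollouts cost 13 for `a`, 11 for `b`, and 13 for `c`, so `b` is chosen. With a deterministic fixed policy one rollout per candidate is enough; averaging over several rollouts (`n_rollouts > 1`) only matters when the policy is stochastic.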
McGovern, A. & Moss, E. (1998). Scheduling straight-line code using reinforcement learning and rollouts.
No context found.
Amy McGovern, Eliot Moss, and Andrew G. Barto. Scheduling straight-line code using reinforcement learning and rollouts. Technical Report 99-23, University of Massachusetts, Amherst, April 1999.