J. Lo, S. Parekh, S. Eggers, H. Levy, and D. Tullsen. Software-directed register deallocation for simultaneous multithreaded processors. IEEE Transactions on Parallel and Distributed Systems, 10(9), Sept. 1999.
....Wallace and Bagherzadeh [22] and later Monreal et al. [16] delay allocation of physical registers to the execution stage. This is complementary to our work, and can be combined with it to achieve even better resource utilization. Lozano and Gao [12], Martin et al. [15], and Lo et al. [11] use the compiler to analyze the code and pass on dead register information to the hardware, in order to deallocate physical registers. The latter approaches require instruction set support: special symbolic registers [12], register kill instructions [11, 15], or cloned versions of opcodes that ....
....Gao [12], Martin et al. [15], and Lo et al. [11] use the compiler to analyze the code and pass on dead register information to the hardware, in order to deallocate physical registers. The latter approaches require instruction set support: special symbolic registers [12], register kill instructions [11, 15], or cloned versions of opcodes that implicitly kill registers [11]. Our approach does not require changes in the instruction set or compiler support; thus, it works with legacy application binaries. The third category of related work would include work that recycles load and store queue entries. ....
[Article contains additional citation context not shown here]
J. L. Lo, S. S. Parekh, S. J. Eggers, H. M. Levy, and D. M. Tullsen. Software-directed register deallocation for simultaneous multithreaded processors. IEEE Transactions on Parallel and Distributed Systems, 10(9):922--933, September 1999.
....to sit in the cache for some time after commit. Some previous work has considered earlier deallocation of physical registers by using dead value information which exploits the fact that the last use of a register can be used for a deallocation marker instead of waiting for the next redefinition [9, 7]. Figure 7 shows these differences in pictorial form. The clear bar represents regions where the physical register is allocated but does not contain a valid value. The black bar shows where the physical register is allocated and must contain valid data (until a new value is committed to ....
....designs. Finally, this approach could be useful on simultaneous multithreaded processors which require very large logical register files to house the contents of the multiple thread contexts that are simultaneously live in the machine. Previous research has used a merged register renaming scheme [7], which means that the physical register file (which contains both architected and speculative state) must be extremely large. For example, for 4 threads at 32 registers each, the PRF would need to be larger than 128, and in particular it would be 128 plus the maximum number of in flight ....
Jack L. Lo, Sujay S. Parekh, Susan J. Eggers, Henry M. Levy, Dean M. Tullsen. Software-Directed Register Deallocation for Simultaneous Multithreaded Processors. IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 9, September 1999, pp. 922-933.
....to sit in the cache for some time after commit. Some previous work has considered earlier deallocation of physical registers by using dead value information which exploits the fact that the last use of a register can be used for a deallocation marker instead of waiting for the next redefinition [Mart97, Lo99]. Figure 4.22 shows these differences in pictorial form. The clear bar represents regions where the physical register is allocated but does not contain a valid value. The black bar shows where the physical register is allocated and must contain valid data (until a new value is committed to ....
....designs. Finally, this approach could be useful on simultaneous multithreaded processors which require very large logical register files to house the contents of the multiple thread contexts that are simultaneously live in the machine. Previous research has used a merged register renaming scheme [Lo99], which means that the physical register file (which contains both architected and speculative state) must be extremely large. For example, for 4 threads at 32 registers each, the PRF would need to be larger than 128, and in particular it would be 128 plus the maximum number of in flight ....
Jack L. Lo, Sujay S. Parekh, Susan J. Eggers, Henry M. Levy, Dean M. Tullsen. Software-Directed Register Deallocation for Simultaneous Multithreaded Processors. IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 9, September 1999, pp. 922-933.
J. Lo, S. Parekh, S. Eggers, H. Levy, and D. Tullsen. Software-directed register deallocation for simultaneous multithreaded processors. IEEE Transactions on Parallel and Distributed Systems, 10(9), Sept. 1999.
....by 90% for the SPEC95 benchmarks. Monreal et al. [17] focus on conserving renaming registers by delaying the pipeline stage at which physical registers for destination operands are allocated. They find a 25% reduction in the number of renaming registers with little loss in performance. Lo et al. [15] investigate deallocating registers on SMT after their last use via compiler-inserted annotations. They observed up to an average speedup of 60% with the most efficient annotation mechanisms. They also found that deallocating the registers of idle contexts supports a 25% reduction in the number of ....
LO, J., PAREKH, S., EGGERS, S., LEVY, H., AND TULLSEN, D. Software-directed register deallocation for simultaneous multithreaded processors. IEEE Transactions on Parallel and Distributed Systems (September 1999).
....modifications to an out-of-order superscalar necessary to support a four-context SMT translated into only a 6% increase in chip area [27]. 2.1.1.2 SMT simulator core. The SMT application-level simulator is a detailed, stand-alone, execution-based simulator used extensively in previous SMT studies [22, 43, 44, 45, 46, 47, 59, 77, 81, 82, 83]. It models the processor pipeline and memory system in great detail. While the simulator excels at modelling user-level code, it lacks the facilities necessary to accurately model an operating system. On a real machine, implicit or explicit user requests for OS service begin with a trap, ....
....by 90% for the SPEC95 benchmarks. Monreal et al. [51] focus on conserving renaming registers by delaying the pipeline stage at which physical registers for destination operands are allocated. They find a 25% reduction in the number of renaming registers with little loss in performance. Lo et al. [47] investigate deallocating registers on SMT after their last use via compiler-inserted annotations. They observed up to an average speedup of 60% with the most efficient annotation mechanisms. They also found that deallocating the registers of idle contexts supports a 25% reduction in the number of ....
LO, J., PAREKH, S., EGGERS, S., LEVY, H., AND TULLSEN, D. Software-directed register deallocation for simultaneous multithreaded processors. IEEE Transactions on Parallel and Distributed Systems 10, 9 (September 1999).
....each cycle. SMT works by converting thread-level parallelism into instruction-level parallelism, effectively feeding instructions from different threads into the functional units of a wide-issue, out-of-order superscalar processor [42, 41]. Over the last six years, SMT has been broadly studied [22, 23, 21, 45, 24, 43, 35], and Compaq has recently announced that the Alpha 21464 will include SMT [10]. As a general-purpose throughput-enhancing mechanism, simultaneous multithreading is especially well suited to applications that are inherently multithreaded, such as database and Web servers, as well as multiprogrammed ....
J. Lo, S. Parekh, S. Eggers, H. Levy, and D. Tullsen. Software-directed register deallocation for simultaneous multithreaded processors. IEEE Transactions on Parallel and Distributed Systems, 10(9), September 1999.
....without intervening writes of the register. Calder, et al. [1] and Gabbay and Mendelson [5] show that value locality can be profiled efficiently. They also show that static value locality is highly predictable across different inputs, which we also found. Martin, et al. [10] and Lo, et al. [9] also recognize the utility of dead registers. Martin, et al. seek to identify dead registers in hardware to avoid writing useless information into them, while Lo, et al. make dead registers available for renaming. We are attempting to find ways to put the dead registers to work with useful data. ....
J. Lo, S. Parekh, S. Eggers, H. Levy, and D. Tullsen. Software-directed register deallocation for simultaneous multithreaded processors. IEEE Transactions on Parallel and Distributed Systems, to appear.
J. L. Lo, S. S. Parekh, S. J. Eggers, H. M. Levy, and D. M. Tullsen. Software-directed register deallocation for simultaneous multithreaded processors. IEEE Trans. Parallel and Distributed Systems, 10(9):922--933, September 1999.
J. L. Lo, S. S. Parekh, S. J. Eggers, H. M. Levy, and D. M. Tullsen. Software-directed register deallocation for simultaneous multithreaded processors. IEEE Transactions on Parallel and Distributed Systems, 10(9), 1999.