S. Breach, T. N. Vijaykumar, and G. S. Sohi. The anatomy of the register file in a multiscalar processor. In Proc. of the 27th Annual Intl. Symp. on Microarchitecture, pages 181--190, 1994.
....framework 11 There have also been several multithreading techniques proposed for dynamically scheduled architectures. In the MultiScalar paradigm, there are multiple superscalar cores, called processing units, each consisting of its own private register file, I-cache, and functional units [2] [3]. Each processing unit is assigned a task, which is a contiguous region of the dynamic instruction sequence. Tasks are created statically by partitioning the control flow graph of the program. During the execution of a program, register values can flow from one task to another. Each task has two ....
S. E. Breach, T. N. Vijaykumar and G. S. Sohi, "The Anatomy of the Register File in a Multiscalar Processor", in Proc. 27th Ann. Int'l Symp. Microarchitecture, San Jose, CA, Dec. 1994.
....special issue on cache memories of the Transactions on Computers [73,84] 25 2.2 Processor paradigms 2.2.1 Multiscalar paradigm 2.2.1.1 Managing complexity in multiscalar processors Work in the area of multiscalar processors, by Franklin and Sohi [20,21,22,24] and Sohi, Breach, and Vijaykumar [8,9,10,100,113,114,115], first recognized the complexity of implementing wide instruction issue in the context of centralized resources. They pointed out the difficulty of scaling instruction fetch and dispatch bandwidth, the register file, and the instruction window and associated issue mechanisms. The result of their ....
....to support parallel invalidation and verification. Forms of selective recovery appear in at least three other contexts. Firstly, although entire tasks are squashed, the multiscalar processor selectively repairs registers in a well-orchestrated effort among the multiple, distributed register files [8]. Secondly, although all instructions are squashed after a branch misprediction, the instruction reuse buffer [98] selectively re-executes instructions based on the state of the reuse buffer. Finally, the DMT architecture [2] performs selective re-execution; instructions are re-fetched from the ....
S. Breach, T. Vijaykumar, and G. Sohi. The Anatomy of the Register File in a Multiscalar Processor. 27th International Symposium on Microarchitecture, pages 181--190, November 1994.
....also be identified statically from the binary. At run time, register values are forwarded from one processor to another with the aid of a ring structure. Recovery from mis-speculation is achieved by maintaining two copies of the registers, along with a set of register masks, in each processing unit [2]. The Dynamic Speculative Multithreaded processor [17] follows the Multiscalar approach and also incorporates additional hardware to allow threads to be generated dynamically at run time. Overall, these processors have sufficient hardware support to tailor the architecture for speculative execution. ....
....bus. The buffer would potentially hold all the live registers after the last speculative thread until a new thread is initiated on the successor. In addition, this would also require further hardware support in the form of duplicate register sets in each processor to enable recovery from squashes [2]. Alternatively, a global register set may be maintained to store these values [24] but at the cost of maintaining a centralized structure. [Figure 5: Logic to check register availability.] In our scheme, we ....
[Article contains additional citation context not shown here]
S. Breach, T. N. Vijaykumar, and G. Sohi. The Anatomy of the Register File in a Multiscalar Processor. In 27th International Symposium on Microarchitecture (MICRO-27), pages 181-190, December 1994.
....applications using this speculative approach [23]. Some designs are largely specialized towards speculation, like the Multiscalar [22] and Trace [20] processors. They add significant hardware, such as duplicate registers for each processor along with a buffered ring network for communication [3], a centralized register file and a global free list [28], or a centralized global register set and per-processor register sets [20]. The other designs have less hardware support for speculation [10, 12, 23, 24]. The philosophy of these speculative light systems is to augment the CMP with just ....
....would have to potentially hold all the live registers after the last speculative thread until a new thread is initiated on the successor. In addition, this approach would require further hardware support in the form of duplicate register sets in each processor to enable recovery from squashes [3]. Alternatively, a global register set may be maintained to store these values [20] but at the cost of maintaining an additional centralized structure. [Figure 8: Logic to check register availability.] ....
[Article contains additional citation context not shown here]
S. Breach, T. N. Vijaykumar, and G. Sohi. The Anatomy of the Register File in a Multiscalar Processor. In 27th International Symposium on Microarchitecture (MICRO-27), pages 181-190, December 1994.
....(in parallel) In order to guarantee the sequential semantics, a Multiscalar processor performs the following: • Enforce the strict sequential execution order in each processing unit. • Provide a single view of the register file for the distributed register files located on each processing unit [17]. • Support data dependence speculation by the Address Resolution Buffer (ARB) [39, 52]. • Speculate on task-level execution through hardware-based path prediction [70]. Since each task corresponds to a thread, TLP is extracted at runtime through aggressive speculation on data and control ....
Scott E. Breach, T. N. Vijaykumar, and Gurindar S. Sohi. The anatomy of the register file in a multiscalar processor. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 181--190, San Jose, California, November 30--December 2, 1994. ACM SIGMICRO and IEEE-CS TC-MICRO.
....it can also be done from the binary. At run time, register values are forwarded from one processor to another with the aid of a ring structure, while recovery from mis-speculation is achieved by maintaining two copies of the registers, along with a set of register masks, in each processing unit [7]. Overall, both these processors have sufficient hardware support that tailors the architecture for speculative execution, thereby enabling them to achieve high performance on existing sequential binaries without the need for full re-compilation of the source program. A direct consequence of this, ....
....generates the register. We could allow the values to be stored, by using a buffered communication mechanism, rather than using a simple broadcast bus. But this would also require further hardware support in the form of duplicate register sets in each processor to enable recovery from squashes [7]. Alternatively, a global register set may be maintained to store the values [69] but at the cost of maintaining a centralized structure. In our scheme, we add minimal hardware to support a consumer-initiated approach, where communication occurs when the consumer needs the register. To support ....
[Article contains additional citation context not shown here]
S.E. Breach, T. N. Vijaykumar, and G.S. Sohi. The Anatomy of the Register File in a Multiscalar Processor. In 27th International Symposium on Microarchitecture (MICRO-27), pages 181-190, December 1994.
....their own cleanup code we recently added exception handling to Haskell for this purpose [10, 11]. Multiscalar processors perform a considerable amount of speculative evaluation and must clean up their internal state when a speculative evaluation is terminated. For example, Breach et al. [1] describe an architecture which tracks dependencies between different stages of the processor. Terminating one stage automatically terminates those stages which have used values produced by the terminating stage. Like our technique, cleanup is performed automatically; unlike our approach work done ....
S. Breach, T. N. Vijaykumar, and G. S. Sohi. The anatomy of the register file in a multiscalar processor. In 27th Annual International Symposium on Microarchitecture (MICRO-27), pages 181--190. ACM press, 1994.
....speculation, while in section 4, we present an implementation framework for this method. In section 5, we provide experimental data on the dynamic behavior of memory dependences and present an evaluation of an implementation of the method we propose within the context of a Multiscalar processor [3,4,7,20]. Finally, in section 6 we list what, in our opinion, are the contributions of this work and offer concluding remarks. In the discussion that follows we are concerned with data dependence speculation; accordingly, we use the terms data dependence speculation, data speculation, and speculation ....
....resulting in an aggregate execution rate of multiple instructions per cycle. In this organization, the instruction window is bounded by the first instruction in the earliest executing task and the last instruction in the latest executing task. More details of the Multiscalar model can be found in [3,4,7,8,20]. In a Multiscalar processor, dependences may be characterized as intra-task (within a task) or inter-task (between individual tasks). The results herein are all simulated executions in which intra-task memory data dependences are not speculated, but inter-task memory data dependences are freely ....
S. E. Breach, T. N. Vijaykumar, and G. S. Sohi. The anatomy of the register file in a multiscalar processor. In Proc. of the 27th Annual International Symposium on Microarchitecture, pages 181--190, Dec. 1994.
....Hardware provides for the synchronization and the forwarding of data around the ring. Both register and memory state have to be communicated to maintain sequential semantics across a single address space and register file. The hardware to perform this has been discussed in other publications [1][5]. The global sequencer does not examine each instruction in a task before predicting the next task; rather it predicts the starting address of the next task to be executed using information from the task header of the most recently predicted task and dynamic prediction hardware. This predicted ....
S. E. Breach, T. N. Vijaykumar, and G. S. Sohi. The anatomy of the register file in a multiscalar processor. In Proceedings of the 27th Annual International Symposium on Microarchitecture, December 1994.
....complexity of superscalar processors. The trace window organization proposed in [4] is the basis for the microarchitecture presented here. Conceivably, other register file and memory organizations could be superimposed on this organization; e.g. the original multiscalar distributed register file [12], or the distributed speculative versioning cache [13]. So far we have discussed microarchitectures that distribute the instruction window based on task or trace boundaries. Dependence-based clustering is an interesting alternative [14] [15]. Similar to trace processors, the window and execution ....
S. Breach, T. Vijaykumar, and G. Sohi. The anatomy of the register file in a multiscalar processor. 27th Intl. Symp. on Microarchitecture, pages 181--190, Nov 1994.
....or special signal wait instructions. Implicit synchronization and communication occur as side effects of other instructions. Note that an explicit communication mechanism can be built from an explicit synchronization mechanism. The Multiscalar architecture provides implicit register forwarding [2], resulting in implicit synchronization and communication. The last assignment to a register in each thread is forwarded to younger threads. When younger threads read the same register, they will stall until the new value is received from an older thread. Each node determines whether it needs ....
Scott E. Breach, T. N. Vijaykumar, and Gurindar S. Sohi. The anatomy of the register file in a Multiscalar processor. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 181--190, San Jose, California, November 30--December 2, 1994.
....threads are identified statically by a compiler. Register values are forwarded from one processor to another with the aid of a ring structure, while recovery from mis-speculation is achieved by maintaining two copies of the registers, along with a set of register masks, in each processing unit [1]. Overall, both of these processors have sufficient hardware support that tailors the architecture for speculative execution, thereby enabling them to achieve high performance on existing sequential binaries without the need for re-compilation. A direct consequence of this, however, is that a ....
....generates the register. We could allow the values to be stored, by using a buffered communication mechanism, rather than using a simple broadcast bus. However, this would require further hardware support in the form of duplicate register sets in each processor to enable recovery from squashes [1]. Alternatively, a global register set may be maintained to store the values [9] but at the cost of maintaining a centralized structure. In our approach, we add minimal hardware to support the consumer-initiated form of communication. Specifically, the SS has simple logic that allows a consumer ....
S. Breach, T. N. Vijaykumar, and G. Sohi. The Anatomy of the Register File in a Multiscalar Processor. In 27th International Symposium on Microarchitecture (MICRO-27), pages 181--190, December 1994.
No context found.
S. Breach, T. N. Vijaykumar, and G. S. Sohi. The anatomy of the register file in a multiscalar processor. In Proc. of the 27th Annual Intl. Symp. on Microarchitecture, pages 181--190, 1994.
No context found.
S. E. Breach, T. N. Vijaykumar, and G. S. Sohi. The Anatomy of the Register File in a Multiscalar Processor. In Microarchitecture, pages 181--190, November 1994.
No context found.
S. Breach, T. N. Vijaykumar, and G. S. Sohi. The anatomy of the register file in a multiscalar processor. In Proc. of the 27th Annual Intl. Symp. on Microarchitecture, pages 181--190, 1994.
No context found.
Scott E. Breach, T. N. Vijaykumar, and Gurindar S. Sohi. The anatomy of the register file in a multiscalar processor. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 181--190, 1994.
....instructions, as such the pipeline cannot establish data dependencies among the skipped instructions and the post-reconvergent instructions. Previous schemes that fetch instructions out of order face similar problems: The Multiscalar architecture uses the compiler to specify register dependencies [3]. The Dynamic Multithreading architecture employs value speculation and intricate recovery [1]. In a conventional out-of-order pipeline's rename stage, instructions map their architectural destination register to a new physical register, and place the new, architectural-to-physical rename map in ....
S. Breach, T. Vijaykumar, and G. Sohi. The anatomy of the register file in a multiscalar processor. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 181--190, Nov. 1994.
....state among each other. Explicit threads, however, execute as in shared memory multiprocessors and only share memory and not register state. Because register communication in Multiplex is identical to that in Multiscalar, we do not discuss register communication any further and refer the reader to [6,27]. In this section we focus on memory data communication and speculation among threads. Both explicit CMPs and proposals for scalable implicit CMPs [21,28,12] rely on snoopy bus-based protocols to maintain memory data integrity. In both CMPs, the CPUs' private caches enable efficient data sharing ....
S. Breach, T. Vijaykumar, and G. Sohi. The anatomy of the register file in a multiscalar processor. In Proceedings of the 27th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 27), pages 181--190, Nov. 1994.
....and memory values among each other and explicit threads communicate only memory values. Multiplex uses Multiscalar's register communication mechanism for register dependencies among implicit threads, and we do not discuss the details of the register communication mechanism and refer the reader to [8,29]. In this section we focus on memory data communication among both implicit and explicit threads. In both explicit and implicit modes, the CPUs' private caches enable efficient data sharing by making copies of accessed data close to each CPU. The main responsibility of the memory system in both ....
S. Breach, T. Vijaykumar, and G. Sohi. The anatomy of the register file in a multiscalar processor. In Proceedings of the 27th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 27), pages 181--190, Nov. 1994.
....a number of different implementations of this method. In Section 5, we provide an evaluation of these implementations within the context of a Multiscalar processor [15, 16, 17, 18]. Finally, we provide a summary of this work in Section 6 and offer concluding remarks. 2 Data Speculation As a program executes, data values are produced and consumed by instructions of the program; such values are conveyed from the producer to the consumer by binding the value to a named ....
....recent dynamically scheduled superscalar processors which implement data speculation of memory references (albeit with no regard for the accuracy of this data speculation) [23,24]. In a dynamically scheduled processing model with multiple (dynamic) program counters, such as the Multiscalar model [15,16,17,18], the problem of data speculation is especially important. In the Multiscalar model, multiple program counters are used to sequence through the static (sequential) program in parallel, with heavy use of control and data speculation. Here, a load may be issued before it is even known if any ....
[Article contains additional citation context not shown here]
S. E. Breach, T. N. Vijaykumar, and G. S. Sohi. The anatomy of the register file in a multiscalar processor. In Proc. MICRO-27, pages 181--190, December 1994.
....dependences involving the register ri, which holds the variable i, are unambiguously known. Consequently, memory dependences are typically honored via dynamic speculation and verification by the hardware [37] [38] and register dependences are typically honored via synchronization and communication [18] [39] [95] as specified by the compiler. It is important to note that if a memory dependence is known at compile time then the dependence need not be speculated on, but may be honored via synchronization and communication, like register dependences. Similarly, it may be advantageous to employ ....
....In Section 4.5, I describe the details of the implementation of register communication generation. In Section 4.6, I discuss related work. 4.1 Register communication model The Multiscalar architecture provides a distributed physical register file implementation of a single logical register file [18]. I now explain the abstract model of communication of register values among the different physical register files to maintain the semantics of a single logical register file. In the set of all architectural registers, there are two mutually exclusive and collectively exhaustive subsets: 1) The ....
S. Breach, T. Vijaykumar, and G. Sohi. The anatomy of the register file in a multiscalar processor. In Conference Record of the 27th Annual International Symposium on Microarchitecture, pages 181--190, San Jose, CA, Nov. 1994. Association for Computing Machinery.
....computed by another task executing on a different PU. Intra-task dependences are handled by the processing units, similar to superscalar processors. In the case of inter-task register data dependences, a producer task communicates the required value to the consumer task when it has been computed [3]. In the case of inter-task memory data dependences, memory dependence speculation is employed; a task begins by speculating that it does not depend on any previous task for memory values and executes loads from the specified addresses. If the speculation is incorrect (i.e. a previous task ....
....a task: (1) many memory dependences are unknown or ambiguous at compile time and (2) including a data dependence within a task may result in a task with more successors than desired. There are many data dependence detection techniques for memory dependences through memory disambiguation schemes [3] [20]. These techniques work well for programs that do not employ intricate pointers. Due to the prevalence of pointers in most of our benchmarks, we rely on the memory dependence synchronization mechanism [11] to avoid excessive squashing and the ARB to ensure correctness. But register ....
S. Breach, T. Vijaykumar, and G. Sohi. The anatomy of the register file in a multiscalar processor. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 181--190, San Jose, CA, Nov. 1994. Association for Computing Machinery.
....on overall performance. In Section 6, we offer concluding remarks. 2 Register Communication Model The register file of a Multiscalar processor provides the appearance of a logically centralized register file, yet is implemented as physically decentralized register files, queues, and control logic [4]. Each processing unit has its own set of hardware registers; hence, each task has its own renamed version of the hardware registers. This approach allows a Multiscalar processor to exploit intra-task register communication locality within a single processing unit and to recover the precise ....
S. Breach, T. Vijaykumar, and G. Sohi. The anatomy of the register file in a multiscalar processor. In Conference Record of the 27th Annual International Symposium on Microarchitecture, pages 181--190, San Jose, CA, November 1994. Association for Computing Machinery.
No context found.
S. E. Breach, T. N. Vijaykumar, and G. S. Sohi. The anatomy of the register file in a multiscalar processor. In Proc. MICRO-27, pages 181--190, December 1994.
No context found.
S. E. Breach, T. N. Vijaykumar, G. S. Sohi, "The Anatomy of the Register File in a Multiscalar Processor," In Proc. MICRO-27, pp.181-190, December, 1994.