26 citations found.
M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin -- Madison, 1993.


This paper is cited in the following contexts:

Complexity-Effective Superscalar Processors - Palacharla (1997)   (161 citations)  (Correct)

....into two clusters with bypasses between clusters taking an extra cycle to complete. The selection logic steers instructions buffered in a central window to the execution cluster based on dependences. The exact steering algorithm used has not been made public. Multiscalar processors [Bre,FS92, Fra93,SBV95] pioneered the concept of using decentralized processor resources to reduce complexity. Multiple clusters, each similar in structure to a narrow superscalar, are used to execute different portions of the serial program. The different portions of the program are called tasks and can be ....

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin-- Madison, November 1993.


Compiler Optimization of Scalar Value Communication.. - Zhai, Colohan.. (2002)   (5 citations)  (Correct)

....[8, 26] parallelization and has been exploited in previous work [6, 22, 38]. All schemes for TLS support include some form of DOACROSS synchronization, although few use the compiler to optimize this aspect of speculative execution. The most relevant related work is the Wisconsin Multiscalar [11, 27, 35] compiler, which performs synchronization and scheduling for register values [35]. The Multiscalar effort also evaluated hardware support for automatically detecting and synchronizing data dependences [23]. The Multiscalar scheduler was designed with Multiscalar tasks in mind, and these usually ....

FRANKLIN, M. The Multiscalar Architecture. PhD thesis, University of Wisconsin -- Madison, 1993.


Design and Evaluation of an Optimistic CPU: The WarpEngine - Littin (2000)   (Correct)

....in the DEC Alpha 21264 [Gwennap, 1996] but it can be restrictive when considering larger instruction windows. 2.3.4 Multiscalar One solution to the problems encountered when increasing the size of the IRB is to run several streams of instructions in parallel. The Multiscalar paradigm [Franklin, 1993; Sohi et al., 1995] evolved from the Expandable Split Window for reorder buffers concept proposed by Franklin and Sohi [1992]. Annotated code is broken into tasks that are executed as though they are independent programs with their own thread of control. A large instruction window is produced by ....

Franklin, M. [1993]. The Multiscalar Architecture. PhD thesis, University of Wisconsin-- Madison.


Trace Processors: Exploiting Hierarchy And Speculation - Rotenberg (1999)   (3 citations)  (Correct)

....recovery from data misspeculation. Control flow hierarchy and existing data speculation support are leveraged to manage the complexity of exploiting control independence. The trace processor architecture draws from a substantial and influential body of research. The multiscalar execution paradigm [20,22,100] and its adaptation from compiler defined tasks to dynamic traces [112,103,96,81] lay solid foundations for the trace processor. These and other related work are discussed at length in Chapter 2, Related Work. My contributions are twofold. First, I fully develop a trace processor ....

....trace cache. Two trace cache papers appear in a special issue on cache memories of the Transactions on Computers [73,84]. 2.2 Processor paradigms 2.2.1 Multiscalar paradigm 2.2.1.1 Managing complexity in multiscalar processors Work in the area of multiscalar processors, by Franklin and Sohi [20,21,22,24] and Sohi, Breach, and Vijaykumar [8,9,10,100,113,114,115] first recognized the complexity of implementing wide instruction issue in the context of centralized resources. They pointed out the difficulty of scaling instruction fetch and dispatch bandwidth, the register file, and the instruction ....

M. Franklin. The Multiscalar Architecture. PhD Thesis, University of Wisconsin - Madison, November 1993.


Value Locality And Speculative Execution - Lipasti (1997)   (31 citations)  (Correct)

....restrictions on parallel issue. Recent work has focused primarily on reducing the latency of specific types of instructions (usually loads from memory) by rearranging pipeline stages [9, 10], initiating memory accesses earlier [11], or speculating that dependences to earlier stores do not exist [12, 13, 14, 15]. The most relevant prior work in the area of eliminating data flow dependences consists of the Tree Machine [16,17], which uses a value cache to store and look up the results of recurring arithmetic expressions to eliminate redundant computation (the value cache, in effect, performs common ....

....the former is speculative disambiguation, which optimistically assumes that an earlier definition does not alias with a current use, and provides a mechanism for checking the accuracy of that assumption. Speculative disambiguation has been implemented both in software [13] as well as in hardware [12, 14, 15]. Another example of this type of speculation occurs implicitly in most control speculative processors, whenever execution proceeds speculatively past a join in the control flow graph where multiple reaching definitions for a storage location are live [1]. By speculating past that join, the ....

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin-Madison, 1993.


Design And Evaluation Of A Multiscalar Processor - Breach (1998)   (10 citations)  (Correct)

....not be restricted to only those situations in which guarantees may be given about the control and data dependences in the program. 1.7 Multiscalar Processors Much of the multiscalar work to date has been invested to introduce the fundamental ideas and mechanisms upon which the paradigm is based [25]. A comprehensive design and evaluation of the aspects of a multiscalar processor that are essential for high performance has been lacking. The foundation for such an investigation, though, has been laid in recent work that considers the roles of hardware and software for realistic implementations ....

....and potential benefit in terms of capacity to salvage correct work. The most complex and potentially beneficial approach is to discard only instructions in each task that have violated control and/or data dependences, and to reissue these instructions as well as those dependent upon them [25]. A slightly less complex and possibly beneficial alternative is to discard all instructions in tasks beyond the misspeculation, but to rely on instruction reuse to reduce the amount of work that must be performed again [85]. However, if misspeculations are an uncommon event, such complex ....

[Article contains additional citation context not shown here]

Manoj Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin - Madison, 1993.


The Anatomy of the Register File in a Multiscalar Processor - Scott Breach Vijaykumar (1994)   (15 citations)  (Correct)

....of the register file is described. Illustrative examples detailing important aspects of the operation of the register file and an evaluation of its effectiveness are provided. 1 Introduction The Multiscalar architecture is a novel architecture for exploiting instruction level parallelism (ILP) [1] [2] that speculatively executes multiple operations in parallel, yet provides the semblance of sequential execution. An implementation of this architecture, a Multiscalar processor, is a collection of execution engines which share a common register namespace. Each execution engine, called a ....

....addresses, the return addresses, and the create mask of some task). A full crossbar interconnects the stages to twice as many banks of interleaved data cache. Each bank has been fixed in the configuration of 8 kbytes of direct mapped data cache in 64 byte blocks and 256 address resolution entries [1] (which constitutes a total data cache of 64 kbytes and 128 kbytes for 4 stage and 8 stage processors respectively). The control flow prediction has been fixed in a PAs configuration [7] with 4 targets per prediction and 6 outcome histories. The prediction storage is composed of a first level ....

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin-Madison, Tech Report 1196, November 1993.


A Dynamic Approach to Improve the Accuracy of Data Speculation - Andreas Moshovos (1996)   (2 citations)  (Correct)

....a number of different implementations of this method. In Section 5, we provide an evaluation of these implementations within the context of a Multiscalar processor [15, 16, 17, 18]. Finally, we provide a summary of this work in Section 6 and offer concluding remarks. 2 Data Speculation As a program executes, data values are produced and consumed by instructions of the program; such values are conveyed from the producer to the consumer by binding the value to a named ....

....processor carries out data speculation). The precise state recovery hardware (used to recover from a control misspeculation) also serves the purpose of recovery from a data misspeculation. Likewise, in a Multiscalar processor, memory disambiguation hardware (the Address Resolution Buffer, or ARB [15,13]) is responsible for detecting a misspeculation, and the precise state recovery hardware is used to recover from a data misspeculation. To minimize the net cost of data misspeculation, we need to improve the accuracy of data speculation. A data speculation is erroneous if there is indeed a true ....

[Article contains additional citation context not shown here]

M. Franklin. The Multiscalar Architecture. Ph.D. thesis, University of Wisconsin-Madison, Madison, WI 53706, November 1993.


The Potential for Using Thread-Level Data Speculation to.. - Steffan, Mowry (1998)   (72 citations)  (Correct)

....re-executing the thread which performed the failed speculative load. While instruction level data speculation has received much attention [5, 9, 20], the only relevant work on thread-level data speculation for non-numeric codes when we performed our study was the Wisconsin Multiscalar architecture [3, 4, 21]. This tightly coupled ring architecture assigns threads around the ring in program order, provides a hardware mechanism for forwarding register values between processors, and uses a centralized structure called the address resolution buffer (ARB) [4, 21] to detect data dependences through ....

....[Figure residue: (b) TLDS execution of compress. Epochs 1 to 5 run on Processors 1 to 4, each accessing hash[] entries and ending in an attempted commit; Epoch 4 incurs a violation and is redone.] ....

[Article contains additional citation context not shown here]

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin -- Madison, 1993.


Compiling for the Multiscalar Architecture - Vijaykumar (1998)   (24 citations)  (Correct)

....the appearance of sequential program order. 2.3.1 Control flow Dynamic prediction unravels the control flow among tasks and each predicted task is assigned to a PU for execution. Execution proceeds by assigning tasks to PUs. After assigning a task for execution, a control flow speculation [37] [38] [55] [81] [95] is made which predicts one of the many possible successors of the task, similar to branch prediction employed by superscalar processors [56] [90] [114] [115]. Since the tasks are derived from a sequential program and are predicted in the original program order (similar to ....

....rtmp, ri, 1 st rtmp, 0[rptr2] Continue: add ri, ri, 1 blt ri, rn, Loop Out: But all the dependences involving the register ri, which holds the variable i, are unambiguously known. Consequently, memory dependences are typically honored via dynamic speculation and verification by the hardware [37] [38] and register dependences are typically honored via synchronization and communication [18] [39] [95] as specified by the compiler. It is important to note that if a memory dependence is known at compile time then the dependence need not be speculated on, but may be honored via ....

[Article contains additional citation context not shown here]

M. Franklin. The Multiscalar Architecture. Ph.D. thesis, University of Wisconsin-Madison, Madison, WI 53706, Nov. 1993.


Compiling for the Multiscalar Architecture - Vijaykumar (1998)   (24 citations)  (Correct)

....which leads to the identification of more independent instructions and wider processors can execute more independent instructions simultaneously. But, larger windows and wider processors may be harder to engineer at high clock speeds, limiting performance. The Multiscalar architecture [36] [38] [95] is a novel microarchitecture to achieve high performance on general purpose, sequential programs. To engineer a large window and a wide processor at high clock speed, the Multiscalar architecture splits one wide processor into many narrow processing units and one large window into many ....

....any interconnect may be used, since data values and other information can be communicated from a predecessor task to a successor task only (and not the other way), a simple unidirectional ring may be easier to engineer. .... communicate data values and other information from one unit to another [36] [38]. Apart from the PUs, there is a hardware predictor which predicts the tasks to be executed. The sequencer assigns tasks to PUs in the predicted program order and keeps track of the program order among them. The PUs are also connected to the memory disambiguation mechanism called the address ....

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin-Madison, Nov. 1993.


Task Selection for a Multiscalar Processor - Vijay (1998)   (17 citations)  (Correct)

.... processors may be harder to engineer at high clock speeds due to quadratic wire delays, limiting overall performance (e.g. the DEC Alpha 21264 already implements a two cluster pipeline because bypassing across a single larger cluster would not fit within a cycle). The Multiscalar architecture [5] [6] [14] advocates a distributed processor organization to avail of the advantages of large windows and wide issue pipeline without impeding improvements in clock speeds. The key idea is to split one large window into multiple smaller windows and one wide-issue processing unit (PU) into multiple ....

M. Franklin. The Multiscalar Architecture. Ph.D. thesis, University of Wisconsin-Madison, Madison, WI 53706, Nov. 1993.


Complexity-Effective Superscalar Processors - Palacharla (1998)   (161 citations)  (Correct)

....two clusters with bypasses between clusters taking an extra cycle to complete. The selection logic steers instructions buffered in a central window to the execution cluster based on dependences. The exact steering algorithm used has not been made public. Multiscalar processors [Bre,FS92, Fra93,SBV95] pioneered the concept of using decentralized processor resources to reduce complexity. Multiple clusters, each similar in structure to a narrow superscalar, are used to execute different portions of the serial program. The different portions of the program are called tasks and can be ....

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin-- Madison, November 1993.


The Potential for Thread-Level Data Speculation in.. - Steffan, Mowry (1997)   (14 citations)  (Correct)

....of transistors that can be integrated onto a single VLSI chip continues its dramatic rate of increase, processor architects are faced with the pleasant challenge of finding the best way to translate these additional resources into improved performance. While there have been several proposals [6, 7, 14, 15, 23, 28, 27], perhaps one of the more compelling options is to integrate multiple processors onto a single chip [5, 20, 17, 14, 7]. From a VLSI perspective, single chip multiprocessors are attractive because their distributed nature allows the bulk of the interconnections to be localized, thus avoiding the ....

....are faced with the pleasant challenge of finding the best way to translate these additional resources into improved performance. While there have been several proposals [6, 7, 14, 15, 23, 28, 27], perhaps one of the more compelling options is to integrate multiple processors onto a single chip [5, 20, 17, 14, 7]. From a VLSI perspective, single chip multiprocessors are attractive because their distributed nature allows the bulk of the interconnections to be localized, thus avoiding the delays associated with long wires [20]. While single chip multiprocessing will clearly increase computational ....

[Article contains additional citation context not shown here]

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin -- Madison, 1993.


Exceeding the Dataflow Limit via Value Prediction - Lipasti, Shen (1996)   (136 citations)  (Correct)

....restrictions on parallel issue. Recent work has focused primarily on reducing the latency of specific types of instructions (usually loads from memory) by rearranging pipeline stages [9, 10], initiating memory accesses earlier [11], or speculating that dependences to earlier stores do not exist [12, 13, 14, 15]. The most relevant prior work in the area of eliminating data flow dependences consists of the Tree Machine [16,17], which uses a value cache to store and look up the results of recurring arithmetic expressions to eliminate redundant computation (the value cache, in effect, performs common ....

....the former is speculative disambiguation, which optimistically assumes that an earlier definition does not alias with a current use, and provides a mechanism for checking the accuracy of that assumption. Speculative disambiguation has been implemented both in software [13] as well as in hardware [12, 14, 15]. Another example of this type of speculation occurs implicitly in most control speculative processors, whenever execution proceeds speculatively past a join in the control flow graph where multiple reaching definitions for a storage location are live [1]. By speculating past that join, the ....

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin-Madison, 1993.


A Feasibility Study of Hierarchical Multithreading - Zahran, Franklin (2002)   Self-citation (Franklin)   (Correct)

....microarchitectural techniques and compiler optimizations. But exploiting parallelism at this fine granularity and from a single thread seems to reach its limit, and the need for exploiting parallelism at different granularities and from multiple threads arises. Many proposals such as multiscalar [4][11], trace processing [10], superthreading [12], and clustered multithreading [3] have been proposed to exploit thread level parallelism (TLP). However, all of these proposals exploit TLP at a single granularity only. In this paper we investigate the potential of a hierarchical multithreading ....

....thread ordering. No explicit synchronization operations are necessary. The purpose of identifying threads in such a model is to indicate that those threads are good candidates for parallel execution in a multithreaded processor. Prior proposals using sequential threads are the multiscalar model [4][11], the superthreading model [12], the trace processing model [10], and the dynamic multithreading model [1]. In the sequential threads model, threads can be nonspeculative or speculative from the control point of view. A thread is nonspeculative if it is guaranteed to commit, and is ....

[Article contains additional citation context not shown here]

M. Franklin, The Multiscalar Processor. PhD thesis, University of Wisconsin-Madison, 1993.


Dynamic Speculation and Synchronization of Data Dependencies - Moshovos, al. (1997)   (85 citations)  Self-citation (Architecture)   (Correct)

....speculation, while in section 4, we present an implementation framework for this method. In section 5, we provide experimental data on the dynamic behavior of memory dependences and present an evaluation of an implementation of the method we propose within the context of a Multiscalar processor [3,4,7,20]. Finally, in section 6 we list what, in our opinion, are the contributions of this work and offer concluding remarks. In the discussion that follows we are concerned with data dependence speculation; accordingly, we use the terms data dependence speculation, data speculation, and speculation ....

....In the latter case, the effects of the speculation must be undone. Consequently, some means are required for detecting erroneous speculation and for ensuring correct behavior. Several mechanisms that provide this functionality, in either software and/or hardware, have been proposed [7,8,9,10,16,18]. Though data dependence speculation may improve performance when it is successful, it may as well lead to performance degradation because a penalty is typically incurred on misspeculation. Consequently, to gain the most out of data dependence speculation we would like to use it as aggressively ....

[Article contains additional citation context not shown here]

M. Franklin. The Multiscalar Architecture. Ph.D. thesis, University of Wisconsin-Madison, Madison, WI 53706, Nov. 1993.


Synthesis Of The Kestrel Multiscalar Processor - Padmaja Nandula (1998)   Self-citation (Thesis)   (Correct)

No context found.

M. Franklin, The Multiscalar Architecture. PhD thesis, University of Wisconsin-Madison, 1993.


Register Communication Strategies for the Multiscalar.. - Vijaykumar Scott (1996)   (3 citations)  Self-citation (Architecture)   (Correct)

....these hardware and compiler techniques on a Multiscalar processor configuration. The key result we obtained is that aggressive hardware support for register data speculation can be outperformed by simpler hardware supplemented by compiler analyses. 1 Introduction The Multiscalar architecture [1] [2] is a novel paradigm to exploit instruction level parallelism. Sequential programs are partitioned into code fragments called tasks, which are assigned to a collection of processing units connected via a unidirectional ring for communication. Each processing unit executes the instructions of its ....

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin-Madison, November 1993.


Compiler Optimization of Value Communication for Thread-Level.. - Zhai (2005)   (Correct)

No context found.

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin -- Madison, 1993.


Hardware Support for Thread-Level Speculation - Steffan (2003)   (Correct)

No context found.

M. Franklin. The Multiscalar Architecture. PhD thesis, University of Wisconsin -- Madison, 1993.


Procedure Cloning and Integration for Converting Parallelism.. - So, Dean (2003)   (Correct)

No context found.

M. Franklin. The Multiscalar Architecture. PhD Thesis, University of Wisconsin-Madison, November 1993.


Memory Dependence Prediction - Andreas Ioannis Moshovos   (Correct)

No context found.

M. Franklin. The Multiscalar Architecture. Ph.D. thesis, University of Wisconsin-Madison, Madison, WI 53706, November 1993.



CiteSeer.IST - Copyright Penn State and NEC