40 citations found.
A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro, 13(3):48--61, 1993.


This paper is cited in the following contexts:

A Case Study of Shared Memory and Message Passing: The Triangle.. - Lew

....for Computer Science. Alewife is a distributed shared-memory, cache-coherent multiprocessor. Each processor is tightly coupled with a memory bank. Each node contains its own memory bank, part of which is used as a portion of the single shared address space. Each node consists of a Sparcle processor [3] clocked at 20MHz, a 64KB direct-mapped cache with 16-byte cache lines, a communications and memory management unit (CMMU), a floating-point coprocessor, an Elko-series mesh routing chip (EMRC) from Caltech, and 8MB of memory [2]. The EMRCs implement a direct network [24] with a two-dimensional ....

A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin. "Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors." IEEE Micro, June 1993, pp. 48-61.


Handling Long-latency Loads in a Simultaneous Multithreading.. - Tullsen, Brown (2001)   (9 citations)

....of hardware multithreading primarily because it hides short latencies (which can often dominate performance on a uniprocessor) much more effectively. For example, neither fine-grain multithreaded architectures [2, 8], which context switch every cycle, nor coarse-grain multithreaded architectures [1, 12], which context switch only on long-latency operations, can hide the latency of a single-cycle integer add if there is not sufficient parallelism in the same thread. What has not been shown previously is that an SMT processor does not necessarily handle very long-latency operations as well as ....

....examined in this paper. One important reason for that has been the inability of pre-2000 instantiations of the SPEC benchmarks to put significant pressure on a reasonable cache hierarchy. Less aggressive models of multithreading are less prone to such problems. Coarse-grain multithreading [1, 12] is aimed only at the long-latency load problem, and makes no attempt to address any other machine latency. Because coarse-grain architectures allow only one thread to have access to execution resources at any time, they always flush stalled threads completely from the machine. Fine-grain ....

A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro, June 1993.


The 'Uniform Heterogeneous Multi-threaded' Processor Architecture - Towner, May (2001)

.... order execution, and register renaming to extract instruction-level parallelism (ILP), the performance of a single thread can be greatly improved. A common design approach has been to take an existing uni-processor implementing these features, and add extra hardware to implement multi-threading [6, 7, 8, 9]. Such processors are more complex than equivalent uni-processors. In this paper we attack poor single-thread performance from a different angle. Rather than trying to make the processor more effective at executing a single thread, we try to ensure that multiple threads are always available. We ....

....tolerated by the multi-threaded processor nodes. The first commercial multi-threaded processor, the HEP [2], was originally designed for this purpose, and the later derivative, the Tera [18], is used similarly. Other examples of multi-threaded processors in multi-processor systems include Sparcle [8], MAJC [19], and the Para PC [20]. There are a number of ways in which multi-threaded processors may be differentiated from each other. These include their execution unit design (e.g. pipelined, superscalar, VLIW), their instruction issue policy, and their memory consistency model (e.g. CREW) ....

Anant Agarwal, John Kubiatowicz, David Kranz, Beng-Hong Lim, Donald Yeung, Godfrey D'Souza, and M. Parkin. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro, 13(3):48--61, June 1993.


Architectural Support for an Efficient Implementation of a.. - Grahn, Stenström (1995)

....directory protocol requests take longer times as a result of software handler invocations. To achieve a fast software handler start-up, the processor needs a fast interrupt mechanism. It may be implemented either as conventional interrupt hardware or as a multithreaded processor, e.g. Sparcle [2], which is used in the Alewife [1]. In this study we assume the context switch times from the optimized Sparcle processor described in [2] and use them as a base for how fast a software handler starts to execute. Further, the access times of the interrupt and send buffers are the same as for the SLC, i.e. 3 pclocks. The default latency of a software handler is 50 pclocks. In these 50 pclocks the following actions are included: ....

A. Agarwal, J. Kubiatowicz, D. Kranz, B-H. Lim, D. Yeung, G. D'Souza, and M. Parkin, "Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors," IEEE Micro, 13(3):48-61, June 1993.


Synchronisation in a Multithreaded Processor - Sen, Muller, May (2000)

....suspended [Figure 2: Synchronisation table] on the current synchronisation to resume. Previous multithreaded architectures such as the Denelcor HEP [5] and MIT Alewife [6] use a derivative of the I-structure to synchronise using data. This is accomplished by tagging each memory word with a full/empty bit which can be tested. While this is an effective method for dealing with communication via shared memory, it is expensive to tag every memory word. In addition, ....

Anant Agarwal, John Kubiatowicz, David Kranz, Beng-Hong Lim, Donald Yeung, Godfrey D'Souza, and Mike Parkin. Sparcle: An evolutionary processor design for large-scale multiprocessors. In IEEE Micro, pages 48--61, 1993.


Reducing Data and Control Transfer Overhead through.. - Bhoedjang, Verstoep, .. (2000)

....application-level performance advantage of NI-level multicasting. Some papers compare host-level and NI-level multicasting using microbenchmarks, but they use host-level multicast schemes that are less aggressive than the scheme used by LCI host and LCI mixed. Research systems such as the Alewife [1] have attempted to reduce the nominal cost of an interrupt. Even without hardware support, operating systems can dispatch interrupts more efficiently by saving less processor state (and less often) [15]. These proposals, however, have not found their way into commercial architectures and operating ....

A. Agarwal, J. Kubiatowicz, D. Kranz, B. Lim, D. Yeung, G. D'Souza, and M. Parkin. Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors. IEEE Micro, 13(3):48--61, June 1993.


Simulation Study of Multithreaded Virtual Processor - Lee, Kwak, Carlson, Al. (1998)

....in systems such as HEP [16] and Tera [3]. However, these systems require considerable modification to the underlying architecture. There has also been an effort to integrate multithreading support into an existing processor: the MIT Alewife machine uses a modified SPARC processor called Sparcle [1]. However, Sparcle is based on an outdated processor design; therefore, it is unclear what effect multithreading will have on modern superscalar architectures. Multithreading has also been extensively used strictly as a programming paradigm (i.e. software-controlled multithreading) on ....

Agarwal, A. et al., "Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors," Proc. of Workshop on Scalable Shared Memory Multiprocessor, Kluwer Academic Publishers, 1991.


A Parallel Object-Oriented System for Realizing Reusable and.. - Lim (1993)   (7 citations)

....(e.g. by having a small processor state). We think that if a machine like the CM-5 were to use the MIPS R4000 [178] instead of a Sparc, the overheads of our implementation would be much less. Otherwise, a processor design targeted for multiprocessors also holds better promise. One such example is Sparcle [93], which supports fast context switching by partitioning the set of register windows among multiple threads. [Footnote 43: Altering a C compiler was beyond the scope of our work and we could not find any reliable C compiler that does not use the Sparc register windows.] ....

Anant Agarwal et al. Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors. IEEE Micro, 13(3):48--61, June 1993.


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)

....is required, the latency is hidden by using even more streams. A very similar architecture is used in the more recent Tera system [18, 15, 16]. The MIT Alewife takes a slightly different approach: threads are switched only upon a remote access or a synchronization failure, not on every instruction [3, 4]. In all these architectures the role of the operating system is limited to mapping multiple threads to each PE. A third class of systems with hardware scheduling that amounts to local queues is represented by the MIT J-Machine. This is a message-driven architecture, where threads are created and ....

....8 Alliant PEs). A server process executes periodically on each cluster, .... [Figure: loop iterations i = 1 to 9, each computing f(x) on x: a[i-1] a[i], distributed across PE 1 to PE 4] ....

[Article contains additional citation context not shown here]

A. Agarwal, J. Kubiatowicz, D. Kranz, B-H. Lim, D. Yeung, G. D'Souza, and M. Parkin, "Sparcle: an evolutionary processor design for large-scale multiprocessors". IEEE Micro 13(3), pp. 48--61, Jun 1993.


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)

.... the Denelcor HEP, was based on this design [325, 180, 194], and a very similar architecture is used in the more recent Tera system [11, 9, 10]. The MIT Alewife takes a slightly different approach: threads are switched only upon a remote access or a synchronization failure, not on every instruction [2, 3]. In all these architectures the role of the operating system is limited to mapping multiple threads to each PE. A third class of systems with hardware scheduling that amounts to local queues is represented by the MIT J-Machine. This is a message-driven architecture, where threads are created and ....

A. Agarwal, J. Kubiatowicz, D. Kranz, B-H. Lim, D. Yeung, G. D'Souza, and M. Parkin, "Sparcle: an evolutionary processor design for large-scale multiprocessors". IEEE Micro 13(3), pp. 48--61, Jun 1993.


TTAs: Missing the ILP complexity wall - Corporaal (1999)   (3 citations)

....switching quickly to another runnable thread of execution. Some of these architectures are presented as descendants of dataflow architectures [3, 4]. They tend to avoid the dynamic token matching. Others are inspired by concurrent object-oriented computation, like the J-Machine [5] and Sparcle [6]. More recently, several research groups have proposed the use of multiprocessor architectures running multiple threads extracted from a single (sequential) program, with the aim of exceeding the amount of instruction-level parallelism exploitable from a single thread [7, 8, 9, 10, 11]. Each thread is ....

Anant Agarwal et al. Sparcle: an evolutionary processor design for large-scale multiprocessors. IEEE Micro, pages 48--61, June 1993.


Processor Mechanisms for Software Shared Memory - Carter

....as H-Threads, and selects an instruction to issue based on operand and instruction availability. This allows zero-cycle multithreading between the threads running on each cluster, eliminating the context switch overhead found on many earlier multithreaded processors, such as the Alewife SPARCLE [2]. To implement zero-cycle multithreading, the MAP chip replicates the instruction decode and register read stages of the pipeline for each thread slot and adds an additional pipeline stage, known as the synchronization stage, to the pipeline, as shown in Figure 3.3. Instructions from each of the ....

Anant Agarwal et al. Sparcle: an evolutionary processor design for large-scale multiprocessors. IEEE Micro, 13(3):48-61, June 1993.


Efficient Strategies for Software-Only Directory Protocols.. - Grahn, Stenström (1995)   (7 citations)

....is how many cycles we charge for the software handlers. First, to achieve a fast start-up of the software handlers, the processor must have a fast interrupt mechanism. It can be implemented either as conventional interrupt hardware in the processor or as a multithreaded processor, e.g. Sparcle [2], which is used in the MIT Alewife [1]. In this study we take the latency numbers from the optimized Sparcle processor described in [2] and use them as a base for how fast a software handler can be dispatched. Conservatively, the state and directory information of a block are not cached in the simulations. To partly compensate for the slow access to the directory and state memory, we simulate a direct path between the memory and ....

A. Agarwal, J. Kubiatowicz, D. Kranz, B-H. Lim, D. Yeung, G. D'Souza, and M. Parkin, "Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors", IEEE Micro, 13(3):48-61, June 1993.


A Queuing Model of Multithreading: A Case Study - Vlassov, Thorelli

....1: Introduction. Most recent scalable shared-memory architectures typically provide different combinations of latency-reducing and latency-tolerating mechanisms, such as coherent caching, weak ordering, data prefetching, and multithreading [3, 6, 8]. Multithreading [11, 15, 18] is used for hiding long memory latency in multiprocessor systems, and aims to increase system efficiency. A number of threads are allocated to a processing node, which switches thread contexts according to some context switch policy, such as switch on cache misses, ....

....synchronization latencies caused by synchronization locks. These latencies are more application-dependent and mainly unpredictable. The node of the Efficient Architecture for Running Threads, EARTH [16], the node of the Multiple Executing Thread Architecture, META [7], and the Sparcle processor [3] of the MIT Alewife machine can be mentioned as examples of processors with block multithreading. Multi-Threaded Architectures (MTAs) are still the subject of considerable interest, and many aspects of these architectures have been studied and continue to be studied in both industry and ....

A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and Mike Parkin, "Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors", IEEE Micro, June 1993, pp. 48-61.


Design and Implementation of a Multi-purpose Cluster System Network .. - Ang (1999)

....that we studied. Alewife: Alewife [1, 58], an earlier experimental machine built at MIT, is similar to our work in that it supports both message passing and shared memory. Its communication support is programmable in an interesting way. Alewife uses a modified Sparc processor, called Sparcle [2], which has hardware multithreading and fast interrupt support. This makes it feasible for its CMMU [57], the cache and memory management unit which also serves as the network interface unit, to interrupt Sparcle and have software take over some of its functions. For instance, this is used to ....

....become incoherent. The most common way of maintaining coherence among caches of a bus-based SMP is through bus-snooping techniques, which maintain bookkeeping at cache-line granularity [7]. In distributed implementations, cache coherence is typically maintained with a directory-based approach [65, 66, 1, 2, 62], which again keeps cache-line granularity state information. Compared to uncached shared memory, caching shared memory takes advantage of data locality, both temporal and spatial, at the cost of a fair amount of bookkeeping and design complexity. NIU access to the system bus is needed to make this ....

A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin. Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors. IEEE Micro, 13(3), May/Jun 1993.


Non-Preemptive Scheduling of Real-Time Threads on.. - Jonsson, Lönn, Shin (1998)

.... existing real-time systems can be classified as MLC architectures, for instance, real-time operating systems with support for user- or kernel-level threads [10, 11]. In addition, thread-level parallelism is believed to be the only alternative to maintain the performance growth of microprocessors [12, 13]. Because one can predict an increasing demand for computing power in modern real-time applications, such as multimedia and Virtual Reality workloads [14], it is very likely that multithreaded architectures will also be used widely in future real-time systems. In order to analyze the real-time ....

....of a number of components. For contemporary computer architectures, the context level can be classified as being on-chip, primary memory, or address space. An on-chip context is typically found in multiple-context, multithreaded architectures where the registers are either duplicated in hardware [12, 19] or partitioned in software [20], so that the processor can simultaneously hold multiple active threads. The cost for switching between two on-chip contexts on a pipelined processor is in the range of 5-10 clock cycles. This includes the cost for flushing the instruction pipeline, saving ....

[Article contains additional citation context not shown here]

A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin, "Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors," IEEE Micro, vol. 13, no. 3, pp. 48--61, June 1993.


The Named-State Register File: Implementation and Performance - Nuth, Dally (1995)   (7 citations)

....concurrent threads in order to mask communication and synchronization latencies. Most parallel applications frequently pass data among processors. Fine-grain programs send messages every 75 to 100 instructions [12], each of which may require a round-trip latency of more than 100 instruction cycles [3]. Threads also often synchronize with other threads to exchange data. A thread may only run 20 to 80 instructions [6] between synchronization points, and may wait an unbounded amount of time at any synchronization point [20]. Stalling for every remote access or synchronization point would waste a ....

....processor to quickly switch among those threads, although switching outside of that small set is no faster than on a conventional processor. Multithreaded processors may interleave successive instructions from different threads on a cycle-by-cycle basis [27, 16, 24, 19] or as blocks of instructions [8, 3]. Although the techniques introduced in this paper are applicable to both forms of multithreading, this discussion will concentrate on block multithreading. 3.1 Segmented Register Files: Figure 2 describes a typical implementation of a multithreaded processor [27, 16, 3, 28]. This processor ....

[Article contains additional citation context not shown here]

Anant Agarwal et al. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro, June 1993.


A System Area Network Characterization In A Commercial Cluster - Booth (1998)

....without kernel modifications or special privileges. Standard Unix interfaces, compilers, and linkers are used. As a result of this flexibility, TreadMarks has been ported to IBM, DEC, HP, SGI, and Sun systems. Other related low-latency prototype work includes the Alewife Machine [29], Sparcle [28], and StarT-NG [30]. 2.4 The LogP Model: The LogP Model [1, 16, 17] describes a distributed-memory multiprocessor system in which processors physically communicate using point-to-point messages. The LogP Model provides a tool to investigate design trade-offs in communication ....

A. Agarwal, J. Kubiatowicz, D. Kranz, B. Lim, D. Yeung, G. D'Souza, and M. Parkin. Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors. IEEE Micro, pp. 48-61, June 1993.


Characterizing a New Class of Threads in Scientific .. - Rodrigues, Murphy, ..

No context found.

A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro, 13(3):48--61, 1993.


Proceedings of 12th Intl Conference on Parallel.. - Initial Observations Of

No context found.

A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro, June 1993.


Performance Implication of Fine-Grained Synchronization in .. - Merino, Vlassov, al.

No context found.

Agarwal, A.; Kubiatowicz, J.D.; Kranz, D.; Lim, B.H.; Yeung, D.; D'Souza, G. and Parkin, M.: "Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors", Laboratory for Computer Science, Massachusetts Institute of Technology, 1993


Compiling for Instruction Cache Performance on a.. - Kumar, Tullsen (2002)

No context found.

A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro, June 1993.


Superscalar Performance in a Multithreaded Microprocessor - Gunther (1993)   (3 citations)

No context found.

A. Agarwal, J. Kubiatowicz, D. Kranz, B. Lim, D. Yeung, G. D'Souza, and M. Parkin, "Sparcle: an evolutionary processor design for large-scale multiprocessors," IEEE Micro, vol. 13, no. 3, pp. 48--61, June 1993.


Parallelization of Fine-grained Irregular DAGs - Chong, Sharma, Brewer, Saltz

No context found.

Anant Agarwal, J. Kubiatowicz, D. Kranz, Beng-Hong Lim, D. Yeung, G. D'Souza, and Mike Parkin. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro, pages 48--61, June 1993.


Asynchrony in parallel computing: From dataflow to.. - Silc, Robic, Ungerer (1997)   (2 citations)

No context found.

A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin, Sparcle: An evolutionary processor design for large-scale multiprocessors, IEEE Micro, 13 (June 1993), pp. 48-61.

CiteSeer.IST - Copyright Penn State and NEC