J. K. Bennett, S. Dwarkadas, J. Greenwood, and E. Speight. Willow: a scalable shared memory multiprocessor. In Proceedings. Supercomputing '92, pages 336--345, November 1992.


This paper is cited in the following contexts:
Architectural Support for Compiler-Generated Data-Parallel Programs - Klaiber (1994)   (1 citation)

....is a large body of work aimed at improving the performance of cache-coherent shared memory architectures. For example, researchers have studied adaptive or user/compiler-selectable cache coherence mechanisms that use different coherency protocols for different sharing patterns [Carter et al. 91, Bennett et al. 92, Stenstrom et al. 93]. Some machines like the KSR-1 [KSR 92] provide processor instructions to prefetch or poststore data, or load data in a state that facilitates future writes. Most of these techniques try to improve performance by giving the application more explicit control over how and when ....

....that a significant fraction of the total traffic used by the shared memory architectures for these benchmarks is explicit synchronization. We argue that those machines would benefit greatly from architectural support, such as adaptive or user-selectable cache coherence protocols [Carter et al. 91, Bennett et al. 92], full/empty bits on memory words [Agarwal et al. 91, Alverson et al. 90], a network or network interface dedicated to barrier synchronization and reduction (e.g. the control network on the CM-5 [TMC 91b]), or direct access to message passing primitives as implemented on the Alewife machine ....

[Article contains additional citation context not shown here]

J. K. Bennett, S. Dwarkadas, J. Greenwood, and E. Speight. Willow: a scalable shared memory multiprocessor. In Proceedings. Supercomputing '92, pages 336--345, November 1992.


A Comparison of Message Passing and Shared Memory.. - Klaiber, Levy (1994)   (14 citations)

....that a significant fraction of the total traffic used by the shared memory architectures for these benchmarks is explicit synchronization. We argue that those machines would benefit greatly from architectural support, such as adaptive or user-selectable cache coherence protocols [Carter et al. 91, Bennett et al. 92], full/empty bits [Agarwal et al. 91, Alverson et al. 90] on memory words, a network or network interface dedicated to barrier synchronization and reduction (e.g. the control network on the CM-5 [TMC 91]), or direct access to message passing primitives as implemented on the Alewife machine ....

J. K. Bennett, S. Dwarkadas, J. Greenwood, and E. Speight. Willow: a scalable shared memory multiprocessor. In Proceedings. Supercomputing '92, pages 336--345, November 1992.


Carlsberg: A Distributed Execution Environment Providing.. - Koch, Fowler (1994)   (1 citation)

....These advantages have prompted the design and/or construction of a large number of parallel computer systems with hardware 1 and software [4,5,11,14,27,29,34] implementations of the shared memory abstraction. Scaleable Shared Memory has become the rallying cry of numerous research projects [2,3,8,25,26]. On the other hand, coherent shared memory can have performance problems compared with alternative abstractions such as raw message passing, object-based distributed languages, and remote procedure call or object invocation. These recognized performance problems motivate current [21,22] and ....

J.K. Bennett, S. Dwarkadas, J.A. Greenwood, and E. Speight. Willow: A scalable shared memory multiprocessor. In Proceedings of Supercomputing '92, pages 336--345, November 1992.


On the Synchronization Mechanisms in Distributed Shared.. - Ramachandran, Singhal (1994)

....A DSM system can be implemented either in hardware or software. Early DSMs were software-based and were implemented on multicomputers whose architecture provided no support for a shared memory environment [19, 23, 27, 3]. Increasingly, support for DSM is being provided in hardware for efficiency [2, 11, 12, 1, 17]. This has enabled the development of distributed memory computers with hardware support for shared memory. Software-implemented systems are designed to provide the DSM on existing distributed memory systems. These systems were implemented either as an entire operating system or as a library of ....

....provide mechanisms for accessing the distributed memory as if it were a single address space. A variety of architectures exist. In some systems, local memory of a processor is used as its cache and the local memories of the rest of the system are viewed as the global memory for that processor [11, 1, 6]. Other systems support a hierarchical view of memory similar to the cache/main/secondary memory hierarchy in traditional operating systems. In these systems, the memory hierarchy is divided into memory that is local to a processor, local to a cluster of processors, and global to the entire ....

[Article contains additional citation context not shown here]

J. Bennett, S. Dwarkadas, J. Greenwood, and E. Speight. "Willow: A Scalable Shared Memory Multiprocessor". In Proceedings of Supercomputing '92, pages 336--345, Nov 1992.


Efficient Distributed Shared Memory Based On Multi-Protocol.. - Carter (1993)   (45 citations)

....variable is present before it is accessed. Thus, Midway is able to detect access violations without taking page faults, which eliminates the time spent handling interrupts. Hardware DSMs. Recently, a number of designs for hardware distributed shared memory machines have been published [ALKK90, BDGS92, BFKR92, DSF88, LLG+90, WL92, WHL92]. Munin is most related to the DASH project [LLG+90], from which it adopted the concept of release consistency. Unlike Munin, DASH uses a write-invalidate protocol for all consistency maintenance. Munin uses the flexibility of its software ....

....protocols and migration when appropriate. The differences between DASH's implementation of release consistency and Munin's implementation of release consistency were explained in detail in Section 2.1, and the effect on performance is detailed in Section 4.6.1. The Willow multiprocessor [BDGS92] is a scalable shared memory architecture designed to support over a thousand commercial microprocessors. It attacks the bottlenecks often found in large-scale shared memory multiprocessor systems (inefficient synchronization, memory latency and bandwidth limitations, bus contention, cache ....

J.K. Bennett, S. Dwarkadas, J.A. Greenwood, and E. Speight. Willow: A scalable shared memory multiprocessor. In Proceedings of Supercomputing '92, pages 336--345, November 1992.


The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture - Bedichek (1994)   (6 citations)

....Multiprocessors are generally more difficult to design and build than multicomputers. However, many researchers believe that multiprocessors are easier to program [85, 93, 52]. This has led a number of researchers to study how multiprocessors with hundreds of processors should be designed [58, 2, 12, 8]. Multiprocessors require hardware either to keep track of the location and status of cache lines as they move through the system, or to effect extremely low-latency internode communication, as in the Cray T3D. Software can make multicomputers perform as multiprocessors [59]. However, performance ....

John K. Bennett, Sandhya Dwarkadas, Jay Greenwood, and Evan Speight. Willow: A scalable shared memory multiprocessor. Proceedings. Supercomputing '92, pages 336--345, November 1992.


The Potential of Compile-Time Analysis to Adapt the Cache.. - Mounes-Toussi, Lilja (1995)   (1 citation)

....network. In addition, it recognizes the potential for the compiler to select updating instead of invalidating. The scheme proposed by Goshe and Simhadri [14] uses only run-time information to choose between updating and invalidating. In distributed shared memory multiprocessor systems, Willow [7], Munin [6], and DASH [19] provide the hardware to support updating and invalidating, and they acknowledge the capability of the compiler to select either one of these strategies. Furthermore, the scheme implemented on the Galactica Net [31] uses a combined software and hardware approach for ....

J. K. Bennett, S. Dwarkadas, J. Greenwood, and E. Speight. Willow: A scalable shared memory multiprocessor. International Conference on Supercomputing, pages 336--345, 1992.


Synchronization, Coherence, and Consistency for High Performance .. - Dwarkadas (1992)   Self-citation (Bennett Dwarkadas)

....that is both time and space efficient with respect to existing methods. This technique is used to develop and evaluate architectural design decisions for a hierarchical bus-based shared memory architecture. Several of these ideas are incorporated into the Willow shared memory multiprocessor [12]. The dissertation is organized as follows. Chapter 2 provides an overview of previous and ongoing research efforts relevant to this work. Chapter 3 describes the execution-driven simulator and validates the results generated against an existing multiprocessor and a cycle-level simulator. ....

....efficient large-scale shared memory multiprocessors [54, 3, 20, 42]. This work augments these efforts with a view to studying the advantages of exploiting the snooping ability of bus-based architectures. Preliminary results contributing to the design of the Willow multiprocessor are presented in [12]. We explore the advantages of an architecture based on a hierarchy of buses, with caches and memory distributed throughout the hierarchy. The hierarchy removes the bottleneck of a single bus resource, while retaining part of the snooping ability of a bus-based architecture. It also allows ....

[Article contains additional citation context not shown here]

J. K. Bennett, S. Dwarkadas, J. Greenwood, and E. Speight. Willow: A Scalable Shared-Memory Multiprocessor. In Supercomputing '92, to appear, November 1992.


The Effects of Architecture on the Performance of Latency.. - Rajat Mukherjee   Self-citation (Bennett Greenwood)

....shared memory systems. We present a survey of related research in Section 6 and present our conclusions in Section 7. The context switching technique described in this paper has been implemented in SALSA, an operating system for Willow, a large-scale hierarchical shared memory multiprocessor [2]. Further details on this work can be found in [10, 11]. 2 Simulation Environment. Our simulation environment used MPSAS, a detailed instruction-level simulator that was developed at Sun Microsystems, Inc. and modified for Willow. The kernel, run-time system, and programs were compiled using the ....

John K. Bennett, Sandhya Dwarkadas, Jay G. Greenwood, and Evan W. Speight. Willow: A Scalable Shared Memory Multiprocessor. In International Conference on Supercomputing, Minneapolis, MN, November 1992.


Techniques for Reducing Consistency-Related.. - Carter, Bennett.. (1993)   (59 citations)  Self-citation (Bennett)

....shared variable is present before it is accessed. Thus, Midway is able to detect access violations without taking page faults, which eliminates the time spent handling interrupts. 9.2 Hardware DSMs. Recently, a number of designs for hardware distributed shared memory machines have been published [2, 9, 13, 22, 41, 57, 58]. We limit our discussion to those systems that are most related to the work presented in this paper. We have adopted from the DASH project [41] the concept of release consistency. The differences between DASH's implementation of release consistency and Munin's implementation of release ....

J.K. Bennett, S. Dwarkadas, J.A. Greenwood, and E. Speight. Willow: A scalable shared memory multiprocessor. In Proceedings of Supercomputing '92, pages 336--345, November 1992.


ASPEN: High-Performance Hardware Support for Distributed.. - Maxham (1994)   Self-citation (Bennett)

....composed of large DRAM modules (exact sizes are not given). The goal was to limit bus traffic to communication between processors. Where Aspen uses a toroidal network, other designers have used a hierarchy of busses. The Encore Gigamax [36], Stanford Paradigm [11], and Rice Willow architecture [7] propose such an interconnect designed to provide bandwidth scalability. Paradigm and Willow use an additional network for data movement. Paradigm's network is used as an alternative to the busses for communication between leaf processors, while Willow's relocates data to the nearest cache visible ....

J.K. Bennett, S. Dwarkadas, J.G. Greenwood, and E. Speight. Willow: a scalable shared memory multiprocessor. In International Conference on Supercomputing, Minneapolis, MN, November 1992.

