John Heinlein, Kourosh Gharachorloo, and Anoop Gupta. Integrating multiple communication paradigms in high performance multiprocessors. Technical Report CSL-TR-94-604, Stanford University, Computer Systems Laboratory, February 1994.
....and the Hewlett Packard Research Grants Program, and the University of Utah. These observations have led a number of researchers to propose building programmable multiprocessor cache controllers that can execute a variety of caching protocols [7, 31], support multiple communication paradigms [10, 18], or accept guidance from software [20, 25]. Programmable controllers would seem at first glance to be an ideal combination of software's greater flexibility and hardware's greater speed: dedicated hardware could be used to handle the common cases efficiently, while software could be used to ....
....not to assume any specific hardware implementation of the programmable cache controller or any specific machine architecture, so as not to bias our results. The reference system model that we used, illustrated in Figure 1, is a generalization of the combined features of Alewife [10, 11], FLASH [18, 20], and Typhoon [25]. We assume a distributed shared memory (DSM) model, whereby the main memory in the machine is equally distributed across the processing nodes. Every block of physical memory has associated with it a home node, which is the node at which the main memory copy of the block ....
J. Heinlein, K. Gharachorloo, and A. Gupta. Integrating multiple communication paradigms in high performance multiprocessors. Technical Report CSL-TR-94-604, Stanford Computer Systems Laboratory, February 1994.
....as a compiled SRAM array, so the comparison is highly unfavorable to the network interface, which is dominated by the area of random gate array logic. ³ 300MHz processors are common in 1997. ⁴ Heinlein describes a scheme for safely passing both virtual and physical addresses to an NI in [27]. A two-node FUGU machine exists running a subset of our applications. We used the machine to run the microbenchmarks behind Tables 4.4 and 4.5 and for gross calibration of the simulator against applications. Most of the results in Chapter 7 come from the simulator. 6.1.2 Fast Simulator. Most of ....
....an injectdma operation. The handling of the physical address from the TLB probe operation is a security hole that we accept in the prototype. A real implementation would pass the physical address securely from the TLB to the DMA engine, for instance using the technique described by Heinlein in [27]. void dma_handler(handlername [arg0 [... arg3]], addrexp, lenexp) { ⟨atomic handler code⟩ user_active_global(); ⟨thread code⟩ } The DMA block address in the handler, addrexp, and the block length, lenexp, are expressions in terms of variables available at the receiver, including arg0 ....
John Heinlein, Kourosh Gharachorloo, and Anoop Gupta. Integrating Multiple Communication Paradigms in High Performance Multiprocessors. Technical Report CSL-TR-94-604, Stanford, February 1994.
....shared main memory. An optical bus system is employed that provides a cycle time of 2 ns [Siemens AG 1992]. Although the architecture provides a shared memory view to the user, much of the actual communication between nodes is done via messages (as also seen in the Stanford FLASH architecture [Heinlein, Gharachorloo & Gupta 1994]). Though a detailed description of the architecture is beyond the scope of this paper³, there are four properties which lead to important design choices: • The MSparc processor's instruction set is fully compatible with that of the Sparc architecture [Damm et al. 1994], Sparc International ....
Heinlein, J., Gharachorloo, K. & Gupta, A. [1994], Integrating multiple communication paradigms in high performance multiprocessors, Technical Report CSL-TR-94-604, Computer Systems Laboratory, Stanford University.
....propose to use this access for the purpose of synchronisation, rather for the purpose of running a user-level custom handler, so that an application-specific protocol can be implemented. Similarly, the Flash architecture allows user-level handlers to allow application-specific coherency protocols [8]. In [13], the authors replace synchronisation associated with a write followed by a read with message-passing sends and receives, thereby providing fine-grain synchronisation. Similarly, research on [8] has concentrated on supporting both message passing and shared memory programming models. Hybrid message-passing/shared-memory models can therefore be used to replace barriers and implicit shared-memory data transfer. In contrast, we do not propose to support message passing. Our approach is to ....
J. Heinlein, K. Gharachorloo, and A. Gupta. Integrating multiple communication paradigms in high performance multiprocessors. Technical Report CSL-TR-94-604, Computer Systems Laboratory, Stanford University, Stanford, CA 94305-4070, February 1994.
.... customized for each type of protocol. This paper focuses on the hardware and software mechanisms for efficiently supporting message passing protocols on the FLASH architecture (see [8] for an evaluation of cache coherence protocols on FLASH; our earlier work on message passing in FLASH appears in [7]). Our goal has been to provide a general set of mechanisms that apply to a wide variety of messaging protocols, ranging from simple memory copy and simple active messages (e.g., fetch-and-increment) to more complex protocols such as Intel NX [18] and MPI (Message Passing Interface) [15]. By ....
John Heinlein, Kourosh Gharachorloo, and Anoop Gupta. Integrating multiple communication paradigms in high performance multiprocessors. Technical Report CSL-TR-94-604, Stanford University, Computer Systems Laboratory, February 1994.
.... without sacrificing protection; achieve transfer bandwidth and latency comparable to a message passing machine containing dedicated hardware support for this task; and operate in harmony with other key attributes of the machine, including cache coherence, virtual memory, and multiprogramming [HGG93]. We achieve high performance because MAGIC efficiently streams data to the receiver. The performance is further improved by the elimination of processor interrupts and system calls in the common case, and by the avoidance of extra copying of message data. To distinguish a user-level message from ....
John Heinlein, Kourosh Gharachorloo, and Anoop Gupta. Integrating Multiple Communication Paradigms in High Performance Multiprocessors. Stanford University Technical Report, to appear.