Effects of communication latency, overhead, and bandwidth in a cluster architecture (1997)
Venue: | In Proceedings of the 24th Annual International Symposium on Computer Architecture |
Citations: | 108 - 6 self |
Citations
1420 | The SPLASH-2 programs: characterization and methodological considerations. ISCA,
- Woo, Ohara, et al.
- 1995
Citation Context: ...put sizes, so we escape the difficulties of attempting to size the machine parameters down to levels appropriate for the small problems feasible on a simulator and then extrapolating to the real case [45]. These issues have driven a number of efforts to develop powerful simulators [38, 39], as well as to develop flexible hardware prototypes [24]. The drawback of a real system is that it is most suited...
1054 | Active messages: a mechanism for integrating communication and computation
- Eicken, Culler, et al.
- 1992
Citation Context: ...Presidential Faculty Fellowship. The authors can be contacted at {rmartin, vahdat, culler, tea}@cs.berkeley.edu. reflective memory operations [7, 21], and providing lean communication software layers [35, 43, 44]. Recently, we have seen a shift to designs that accept a reduction in communication performance to obtain greater generality (e.g., Flash vs. Dash), greater opportunity for specialization (e.g. Tempe...
597 | U-Net: A User-Level Network Interface For Parallel And Distributed Computing
- Eicken, Basu, et al.
- 1995
Citation Context: ...Presidential Faculty Fellowship. The authors can be contacted at {rmartin, vahdat, culler, tea}@cs.berkeley.edu. reflective memory operations [7, 21], and providing lean communication software layers [35, 43, 44]. Recently, we have seen a shift to designs that accept a reduction in communication performance to obtain greater generality (e.g., Flash vs. Dash), greater opportunity for specialization (e.g. Tempe...
560 | LogP: Towards a realistic Model for Parallel Computation,
- Culler, Karp, et al.
- 1993
Citation Context: ...d to allow the latency, overhead, per-message bandwidth and per-byte bandwidth to be adjusted independently. This four-parameter characterization of communication performance is based on the LogP model [2, 14], the framework for our systematic investigation of the communication design space. By adjusting these parameters, we can observe changes in the execution time of applications on a spectrum of systems ...
349 | The Stanford FLASH Multiprocessor.
- Kuskin, Ofelt, et al.
- 1994
Citation Context: ...focused on improving various aspects of communication performance. These investigations cover a vast spectrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicat...
311 | High Performance Messaging On Workstations: Illinois Fast Messages (FM) For Myrinet
- Pakin, Lauria, et al.
- 1995
Citation Context: ...Presidential Faculty Fellowship. The authors can be contacted at {rmartin, vahdat, culler, tea}@cs.berkeley.edu. reflective memory operations [7, 21], and providing lean communication software layers [35, 43, 44]. Recently, we have seen a shift to designs that accept a reduction in communication performance to obtain greater generality (e.g., Flash vs. Dash), greater opportunity for specialization (e.g. Tempe...
309 | Tempest and Typhoon: User-Level Shared Memory.
- Reinhardt, Larus, et al.
- 1994
Citation Context: ...ecently, we have seen a shift to designs that accept a reduction in communication performance to obtain greater generality (e.g., Flash vs. Dash), greater opportunity for specialization (e.g. Tempest [38]), or a cleaner communication interface (e.g., T3E vs. T3D). At the same time, a number of investigations are focusing on bringing the communication performance of clusters closer to that of the more ...
287 | LogGP: Incorporating long messages into the LogP model.
- Alexandrov, Ionescu, et al.
- 1995
Citation Context: ...d to allow the latency, overhead, per-message bandwidth and per-byte bandwidth to be adjusted independently. This four-parameter characterization of communication performance is based on the LogP model [2, 14], the framework for our systematic investigation of the communication design space. By adjusting these parameters, we can observe changes in the execution time of applications on a spectrum of systems ...
274 | Protocol Verification as a Hardware Design Aid
- Dill, Drexler, et al.
- 1992
Citation Context: ...The dark spots in Figure 4f indicate the presence of “hot” objects which are visible from multiple points in the scene. Parallel Murφ: In this parallel version of a popular protocol verification tool [18, 42], the exponential space of all reachable protocol states is explored to catch protocol bugs. Each processor maintains a work queue of unexplored states. A hash function maps states to “owning” proces...
267 | Virtual Memory Mapped Network Interface for the SHRIMP Multicomputer
- Blumrich, Li, et al.
- 1994
Citation Context: ...controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work was supported in part by the Defense Advanced Research Proje...
194 | Monsoon: an explicit token-store architecture.
- Papadopoulos, Culler
- 1990
Citation Context: ...ctrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work...
193 | The MIT Alewife machine: architecture and performance.
- Agarwal, Bianchini, et al.
- 1995
Citation Context: ...communication performance. These investigations cover a vast spectrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], pr...
145 | Synchronization and communication in the T3E multiprocessor
- Scott
- 1996
Citation Context: ...focused on improving various aspects of communication performance. These investigations cover a vast spectrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicat...
141 | A Case for NOW (Networks of Workstations).
- Anderson, Culler, et al.
- 1995
Citation Context: ...ce (e.g., T3E vs. T3D). At the same time, a number of investigations are focusing on bringing the communication performance of clusters closer to that of the more tightly integrated parallel machines [3, 12, 21, 35, 43]. Moving forward from these research alternatives, a crucial question to answer is how much do the improvements in communication performance actually improve application performance. The goal of this ...
132 | The Manchester Prototype Dataflow Computer.
- Gurd, Kirkham, et al.
- 1985
Citation Context: ...ctrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work...
123 | The DASH prototype: Implementation and performance
- Lenoski, Laudon, et al.
- 1992
Citation Context: ...communication performance. These investigations cover a vast spectrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], pr...
112 | Dynamic self-invalidation: Reducing coherence overhead in shared-memory multiprocessors.
- Lebeck, Wood
- 1995
Citation Context: ...ke the one reported here in the DSM context. The Wind Tunnel team has explored a number of cooperative shared memory design points relative to the Tempest interface through simulation and prototyping [30, 38], focused primarily on protocols. In principle, the Wind Tunnel provides enough power to systematically determine application sensitivity to the LogGP parameters within a given protocol, although the ...
98 | TNet: A Reliable System Area Network.
- Horst
- 1995
Citation Context: ...essor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work was supported in part by the Defense Advanced Research Projects Agency (N00600-93-C-2481, F30602-95-C-0014), the National Science Foundation (CDA 9401156), Sun Microsystems, Cal...
92 | High-performance sorting on networks of workstations.
- Arpaci-Dusseau, Arpaci-Dusseau, et al.
- 1997
Citation Context: ...ighboring processors. The communication to computation ratio is determined by the size of the graph. NOW-sort: The version of NOW-sort used in this study sorts records from disk-to-disk in two passes [4]. The sort is highly tuned, setting the MinuteSort world record in 1997. The sorting algorithm contains two phases. In the first phase, each processor reads the records from disk and sends them to t...
92 | Complete computer simulation: The SimOS approach
- Rosenblum, Herrod, et al.
- 1995
Citation Context: ...ers down to levels appropriate for the small problems feasible on a simulator and then extrapolating to the real case [45]. These issues have driven a number of efforts to develop powerful simulators [38, 39], as well as to develop flexible hardware prototypes [24]. The drawback of a real system is that it is most suited to investigate design points that are “slower” than the base hardware. Thus, to perfo...
90 | Architectural requirements of parallel scientific applications with explicit communication
- Cypher, Ho, et al.
- 1993
Citation Context: ... systematically determine application sensitivity to the LogGP parameters within a given protocol, although the small local memory of the underlying CM5 may limit the study to small data sets. Cypher [16] described the characteristics of a set of substantial message passing applications also showing that application behavior varies widely. The applications were developed in the context of fairly heavy ...
87 | Supporting systolic and memory communication in iWarp
- Borkar, Cohn, et al.
- 1990
Citation Context: ...ctrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work...
87 | Memory Channel network for PCI
- Gillett
- 1996
Citation Context: ...erson was also supported by a National Science Foundation Presidential Faculty Fellowship. The authors can be contacted at {rmartin, vahdat, culler, tea}@cs.berkeley.edu. reflective memory operations [7, 21], and providing lean communication software layers [35, 43, 44]. Recently, we have seen a shift to designs that accept a reduction in communication performance to obtain greater generality (e.g., Flas...
63 | The KSR1: Bridging the gap between shared memory and MPPs.
- Frank, Rothnie, et al.
- 1993
Citation Context: ...communication performance. These investigations cover a vast spectrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], pr...
63 | LogP Quantified: The Case for Low-Overhead Local Area Networks
- Keeton, Patterson, et al.
- 1995
Citation Context: ... the system becomes similar to a switched LAN implementation. Currently, 100 µs of overhead with latency and gap values similar to our network is approximately characteristic of TCP/IP protocol stacks [27, 28, 43]. At this extreme, applications slow down from 2x to over 50x. Clearly, efforts to reduce cluster communication overhead have been successful. Further, all but one of our applications demonstrate a li...
59 | An architecture of a dataflow single chip processor
- Sakai, Yamaguchi, et al.
- 1989
Citation Context: ...ctrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work...
58 | Fast Parallel Sorting Under LogP: Experience with the CM-5
- Dusseau, Culler, et al.
- 1996
Citation Context: ...lelized when scaled from 16 to 32 processors. Each application is discussed briefly below. Radix Sort: sorts a large collection of 32-bit keys spread over the processors, and is thoroughly analyzed in [19]. It progresses as two iterations of three phases. First, each processor determines the local rank for one digit of its keys. Second, the global rank of each key is calculated from local histograms. F...
56 | Empirical evaluation of the CRAY-T3D: A compiler perspective
- Arpaci, Culler, et al.
- 1995
Citation Context: ...focused on improving various aspects of communication performance. These investigations cover a vast spectrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicat...
53 | Assessing Fast Network Interfaces.
- Culler, Liu, et al.
- 1996
Citation Context: ...ur cluster, the Berkeley NOW, are summarized in Table 1. For reference, we also provide measured LogGP characteristics for two tightly integrated parallel processors, the Intel Paragon and Meiko CS-2 [15]. 3 Methodology In this section we describe the empirical methodology of our study. The experimental apparatus consists of commercially available hardware and system software, augmented with publicly ...
50 | Parallelizing the Murphi Verifier
- Stern, Dill
- 1997
Citation Context: ...The dark spots in Figure 4f indicate the presence of “hot” objects which are visible from multiple points in the scene. Parallel Murφ: In this parallel version of a popular protocol verification tool [18, 42], the exponential space of all reachable protocol states is explored to catch protocol bugs. Each processor maintains a work queue of unexplored states. A hash function maps states to “owning” proces...
41 | The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors
- Holt, Heinrich, et al.
- 1995
Citation Context: ..., the investment may be better directed toward improving the performance of the communication system. 6 Related Work The Flash team recently conducted a study with very similar goals under simulation [25]. This study focuses on understanding the performance requirements for a communication controller for a cache-coherent distributed memory machine. It is difficult to make a direct quantitative comparison...
27 | The Paragon Implementation of the NX Message Passing Interface
- Pierce, Regnier
- 1994
Citation Context: ...er [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work was supported in part by the Defense Advanced Research Projects Agency (N00600-93-C-2481, F30602-95-C-0014), th...
24 | The Performance Impact of Flexibility
- Heinrich, Kuskin, et al.
- 1994
Citation Context: ...e on a simulator and then extrapolating to the real case [45]. These issues have driven a number of efforts to develop powerful simulators [38, 39], as well as to develop flexible hardware prototypes [24]. The drawback of a real system is that it is most suited to investigate design points that are “slower” than the base hardware. Thus, to perform the study we must use a prototype communication layer ...
16 | Message passing on the Meiko CS-2
- Barton, Cownie, et al.
- 1994
Citation Context: ...er [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work was supported in part by the Defense Advanced Research Projects Agency (N00600-93-C-2481, F30602-95-C-0014), th...
15 | HPAM: An Active Message Layer for a Network of Workstations
- Martin
- 1994
Citation Context: ...programming model does not provide automatic replication with cache coherence, a number of the applications perform application-specific software caching. The language has been ported to many platforms [2, 34, 43, 44]. The sources for the applications, compiler, and communication layer can be obtained from a publicly available site 1. 3.2 Technique The key experimental innovation is to modify the communication ...
12 | The Epsilon-2 Hybrid Dataflow Architecture
- Grafe, Hoch
- 1990
Citation Context: ...ctrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work...
12 | Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines
- Lumetta, Krishnamurthy, et al.
- 1995
Citation Context: ...e has been reached before. If the state is new, the processor adds it to the work queue to be validated against an assertion list. Connected Components: First, a graph is spread across all processors [33]. Each processor then performs a connected components on its local subgraph to collapse portions of its components into representative nodes. Next, the graph is globally adjusted to point remote edges...
11 | Myrinet: A Gigabit-per-Second Local-Area Network
- Boden, Cohen, et al.
- 1995
Citation Context: ...er [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work was supported in part by the Defense Advanced Research Projects Agency (N00600-93-C-2481, F30602-95-C-0014), th...
11 | Avalanche: A Communication and Memory Architecture for Scalable Parallel Computing
- Carter, Davis, et al.
- 1995
Citation Context: ...ctrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work...
11 | The Importance of Non-Data-Touching Overheads in TCP/IP
- Kay, Pasquale
- 1993
Citation Context: ...the system becomes similar to a switched LAN implementation. Currently, 100 µs of overhead with latency and gap values similar to our network is approximately characteristic of TCP/IP protocol stacks [27, 28, 43]. At this extreme, applications slow down from 2x to over 50x. Clearly, efforts to reduce cluster communication overhead have been successful. Further, all but one of our applications demonstrate a li...
8 | The J-Machine Architecture and Evaluation
- Dally, Keen, et al.
- 1993
Citation Context: ...ctrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work...
5 | The Network Architecture of the CM-5
- Pierre, Wond, et al.
- 1992
Citation Context: ...controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work was supported in part by the Defense Advanced Research Proje...
5 |
Semen evaluation.
- Ax, Dally, et al.
- 2000
Citation Context: ...ctrum of alternatives, ranging from integrating message transactions into the memory controller [5, 10, 29, 41] or the cache controller [1, 20, 32], to incorporating messaging deep into the processor [9, 11, 12, 17, 22, 23, 36, 40], integrating the network interface on the memory bus [7, 31], providing dedicated message processors [6, 8, 37], providing various kinds of bulk transfer support [5, 26, 29, 37], supporting This work...