  Accelerating shared virtual memory via general-purpose network interface support (2001) [4 citations — 1 self]

by Angelos Bilas, Dongming Jiang
ACM Transactions on Computer Systems
http://www.eecg.toronto.edu/~bilas/files/genima-tocs.ps.gz

Abstract:

Clusters of symmetric multiprocessors (SMPs) are important platforms for high performance computing. With the success of hardware cache-coherent distributed shared memory (DSM), a lot of effort has also been made to support the coherent shared address space programming model in software on clusters. Much research has been done in fast communication on clusters and in protocols for supporting software shared memory across them. However, the performance of software virtual memory (SVM) is still far from that achieved on hardware DSM systems. The goal of this paper is to improve the performance of SVM on system area network clusters by considering communication and protocol layer interactions. We first examine what are the important communication system bottlenecks that stand in the way of improving parallel performance of SVM clusters; in particular, which parameters of the communication architecture are most important to improve further relative to processor speed, which ones are already adequate on modern systems for most applications, and how will this change with technology in the future. We find that the most important communication subsystem cost to improve is the overhead of generating and delivering interrupts for asynchronous protocol processing.

Citations

784 Myrinet: A Gigabit-per-second Local Area Network – Boden, Cohen, et al. - 1995
477 TreadMarks: Distributed shared memory on standard workstations and operating systems – Keleher, Dwarkadas, et al. - 1994
269 Virtual memory mapped network interface for the SHRIMP multicomputer – Blumrich, Li, et al. - 1994
236 Multi-level adaptive solutions to boundary-value problems – Brandt - 1977
174 A comparison of sorting algorithms for the connection machine CM-2 – Blelloch, Leiserson, et al. - 1991
137 Performance evaluation of two home-based lazy release consistency protocols for shared virtual memory systems – Zhou, Iftode, et al. - 1996
118 The virtual interface architecture – Dunning, Regnier, et al. - 1998
111 A hierarchical O(N log N) force calculation algorithm – Barnes, Hut - 1986
108 FFTs in external or hierarchical memory – Bailey - 1990
95 Effects of communication latency, overhead, and bandwidth in a cluster architecture – Martin, Vahdat, et al. - 1997
93 Improving Release-Consistent Shared Virtual Memory Using Automatic Update – Iftode, Dubnicki, et al. - 1996
82 SoftFLASH: Analyzing the Performance of Clustered Distributed Virtual Shared Memory – Erlichson, Nuckolls, et al. - 1996
81 Parallel visualization algorithms: performance and architectural implications – Singh, Gupta, et al. - 1994
70 VMMC2: efficient support for reliable, connection-oriented communication – Dubnicki, Bilas, et al. - 1997
68 Methodological Considerations and Characterization of the SPLASH-2 Parallel Application Suite – Woo, Ohara, et al. - 1995
64 Active Messages: a Mechanism for Integrated Communication and Computation – Eicken, Culler - 1992
64 Volume rendering on scalable shared-memory MIMD architectures – Nieh, Levoy - 1992
60 Application restructuring and performance portability across shared virtual memory and hardware-coherent multiprocessors – Jiang, Shan, et al. - 1997
59 Decoupled Hardware Support for Distributed Shared Memory – Reinhardt, Pfile, et al. - 1996
54 Understanding Application Performance on Shared Virtual Memory Systems – Iftode, Singh, et al.
47 VM-Based Shared Memory on Low-Latency, Remote-Memory-Access Networks – Kontothanassis, Hunt, et al. - 1997
45 Home-based svm protocols for smp clusters: design and performance – Samanta, Bilas, et al. - 1998
44 Relaxed Consistency and Coherence Granularity in DSM Systems: A Performance Evaluation – Zhou, Iftode, et al. - 1997
42 Fine-Grain Software Distributed Shared Memory on SMP Clusters – Scales, Gharachorloo, et al. - 1998
38 Design Issues and Tradeoffs for Write Buffers – Skadron, Clark - 1997
37 Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory – Kontothanassis, Scott - 1996
35 Hiding Communication Latency and Coherence Overhead in Software DSMs – Bianchini, Kontothanassis, et al. - 1996
33 Performance Evaluation of a Cluster-Based Multiprocessor Built from ATM Switches and Bus-Based Multiprocessor Servers – Karlsson, Stenstrom - 1996
31 User-Space Communication: A Quantitative Study – Araki, Bilas, et al. - 1998
31 Fast Interrupt Priority Management for Operating System Kernels – Stodolsky, Bershad, et al. - 1993
29 Overview of network memory channel for PCI – Gillett, Collins, et al. - 1996
28 Implementing Fine-Grain Distributed Shared Memory On Commodity SMP Workstations – Schoinas, Falsafi, et al. - 1996
27 Finding and Exploiting Parallelism in an Ocean Simulation Program: Experience, Results and Implications – Singh, Hennessy - 1992
24 The effects of communication parameters on end performance of shared virtual memory clusters – Bilas, Singh - 1997
23 Scheduling communication on an SMP node parallel machine – Falsafi, DA - 1997
23 Augmint: A multiprocessor simulation environment for intel x86 architectures – Sharma, Nguyen, et al. - 1996
21 Implications of hierarchical N-body techniques for multiprocessor architecture – Singh, Gupta, et al. - 1995
14 Performance Monitoring in a Myrinet-connected Shrimp Cluster – Liao, Martonosi, et al. - 1998
13 Accelerating shared virtual memory using commodity ni support to avoid asynchronous message handling – Bilas, Liao, et al. - 1999
13 VMMC-2: efficient support for reliable, connection-oriented communication – Dubnicki, Bilas, et al. - 1997
11 Design issues and tradeoffs for write buffers – Skadron, Clark - 1997
10 ServerNet SAN I/O architecture – Horst, Garcia - 1997
9 Limits to the performance of software shared memory: A layered approach – Bilas, Jiang, et al. - 1999
9 Hierarchical N-body methods – Hernquist - 1988
8 Supporting a coherent shared address space across SMP nodes: An application-driven investigation – Bilas, Iftode, et al. - 1996
8 Telegraphos: A Substrate for High Performance Computing on Workstation Clusters – Katevenis, Markatos, et al. - 1997
8 The fast messages (fm) 2.0 streaming interface – Pakin, Buchanan, et al. - 1996
6 The effects of latency and occupancy on the performance of dsm multiprocessors – Holt, Heinrich, et al. - 1995
6 Architectural and application bottlenecks in scalable DSM multiprocessors – Holt, Singh, et al. - 1996
5 The SGI Origin2000: a scalable cc-numa server – Laudon, Lenoski - 1997