An important aspect of a high-speed network system is the ability to transfer data directly between the network interface and application buffers. Such a direct data path requires the network interface to "know" the virtual-to-physical address translation of a user buffer, i.e., the physical memory location of the buffer. This paper presents an efficient address translation architecture, User-managed TLB (UTLB), which eliminates system calls and device interrupts from the common communication path. UTLB also supports application-specific policies to pin and unpin application memory. We report micro-benchmark results for an implementation on Myrinet PC clusters. A trace-driven analysis is used to compare the UTLB approach with the interrupt-based approach. It is also used to study the effects of UTLB cache size, associativity, and prefetching. Our results show that the UTLB approach delivers robust performance with relatively small translation cache sizes.
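To make the idea of a trace-driven study of cache size and associativity concrete, the following is a minimal sketch of a generic set-associative translation cache with LRU replacement, replayed over a list of virtual addresses. It is not the paper's UTLB implementation: the 4 KB page size, the class and function names, and the `translate` callback (a stand-in for whatever supplies the physical frame on a miss) are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


class TranslationCache:
    """Hypothetical set-associative cache of virtual page -> physical frame, LRU per set."""

    def __init__(self, num_entries, associativity):
        if num_entries % associativity != 0:
            raise ValueError("num_entries must be a multiple of associativity")
        self.associativity = associativity
        self.num_sets = num_entries // associativity
        # One ordered dict per set; insertion order doubles as LRU order.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def lookup(self, vaddr, translate):
        """Return the frame for vaddr, calling translate(vpn) only on a miss."""
        vpn = vaddr // PAGE_SIZE
        entries = self.sets[vpn % self.num_sets]
        if vpn in entries:
            entries.move_to_end(vpn)  # mark as most recently used
            self.hits += 1
            return entries[vpn]
        self.misses += 1
        frame = translate(vpn)
        if len(entries) >= self.associativity:
            entries.popitem(last=False)  # evict the least recently used entry
        entries[vpn] = frame
        return frame


def replay(trace, cache, translate):
    """Replay a trace of virtual addresses and return the miss rate."""
    for vaddr in trace:
        cache.lookup(vaddr, translate)
    return cache.misses / len(trace) if trace else 0.0


# Usage: a 4-entry, 2-way cache and a made-up trace and page table.
cache = TranslationCache(num_entries=4, associativity=2)
trace = [0, 4096, 0, 8192, 16384, 0]
miss_rate = replay(trace, cache, translate=lambda vpn: vpn + 1000)
```

Varying `num_entries` and `associativity` over the same trace gives the kind of sensitivity comparison the abstract refers to, although the trace and the fake page table here are invented for the example.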