by Xiaodong Zhang, Xiaohan Qin
IEEE Transactions on Software Engineering
http://www.cs.washington.edu/homes/xqin/numa.ps
Abstract:
Non-Uniform Memory Access (NUMA) architectures make it possible to build large-scale shared memory multiprocessor systems, in contrast to non-scalable Uniform Memory Access (UMA) architectures. Most NUMA multiprocessor operations, such as scheduling and synchronizing processes, accessing data in memory modules from processors, and allocating distributed memory space to different processors, are performed through interconnection networks such as a multistage switching network. The efficiency of these basic operations determines the parallel processing performance of a NUMA multiprocessor. This paper presents several analytical models to predict and evaluate the overhead of interprocessor communication, process scheduling, process synchronization, and remote memory access, taking network contention and memory contention into account. Performance measurements supporting the models and analyses were carried out, through several numerical examples, on the BBN GP1000, a NUMA shared memory multiprocessor. Together, the analytical and experimental results clarify the effects that matter for the effective use of a NUMA shared memory multiprocessor. The results may be used to determine optimal strategies for developing an efficient programming environment for a NUMA system.

Index Terms: barrier, interprocessor communication, interconnection network, NUMA architectures, pre-scheduling, self-scheduling, remote memory access, shared memory, UMA architectures.