  Performance prediction and evaluation of parallel processing on a NUMA multiprocessor (1991) [8 citations — 2 self]

by Xiaodong Zhang, Xiaohan Qin
IEEE Transactions on Software Engineering
http://www.cs.washington.edu/homes/xqin/numa.ps

Abstract:

Non-Uniform Memory Access (NUMA) architectures make it possible to build large-scale shared memory multiprocessor systems, in contrast to non-scalable Uniform Memory Access (UMA) architectures. Most NUMA multiprocessor operations, such as scheduling and synchronizing processes, accessing data in memory modules from processors, and allocating distributed memory space to different processors, are performed through interconnection networks such as a multistage switching network. The efficiency of these basic operations determines the parallel processing performance of a NUMA multiprocessor. This paper presents several analytical models to predict and evaluate the overhead of interprocessor communication, process scheduling, process synchronization, and remote memory access, taking network contention and memory contention into account. Performance measurements supporting the models and analyses were made through several numerical examples on the BBN GP1000, a NUMA shared memory multiprocessor. Together, the analytical and experimental results give a clear and comprehensive understanding of these effects, which are important for the effective use of a NUMA shared memory multiprocessor. The results in this paper may be used to determine optimal strategies in developing an efficient programming environment for a NUMA system.

Index Terms: barrier, interprocessor communication, interconnection network, NUMA architectures, pre-scheduling, self-scheduling, remote memory access, shared memory, UMA architectures.
