  A Framework for Multiprocessor Performance Characterization and Calibration (1992) [2 citations — 0 self]

by Arun K. Nanda
ftp://ftp.cps.msu.edu/pub/acs/reports/Thesis/nanda.ps.gz

Abstract:

In parallel programs using the shared-variable paradigm, run-time communication overhead manifests itself along three principal dimensions: shared data accesses (including memory contention, cache misses, and non-local memory access latencies), inter-process synchronization operations, and global barrier synchronizations. Performance measurements that quantify the rate at which an algorithm's communication costs increase as more processors are used are integral to the study of the algorithm's efficiency and scalability. In this thesis, we explore the problem of performance characterization of a multiprocessor in the context of the shared-variable programming model, with emphasis on characterizing the dynamic run-time behavior. We have developed a hierarchical model to characterize multiprocessor system performance using a multi-phase computation structure with concurrent asynchronous execution within a phase. Two sets of system characterization parameters have been proposed that completely describe the static and dynamic behavior of a given input workload on a target multiprocessor system. The characterization parameters are calibrated by experimental measurements on the input workload. A series of

Citations

979 An introduction to probability theory and its applications, volume I – Feller - 1967
848 Memory coherence in shared virtual memory systems – Li, Hudak - 1989
705 SPLASH: Stanford Parallel Applications for Shared Memory – Singh, Weber, et al. - 1992
375 Algorithms for Scalable Synchronization on Shared-memory Multiprocessors – Mellor-Crummey, Scott - 1991
269 Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities – Amdahl - 1967
218 Dependence graphs and compiler optimizations – Kuck, Kuhn, et al. - 1981
213 The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers – Berry, Chen, et al. - 1989
186 Hot-spot Contention and Combining in Multistage Interconnection Networks – Pfister, Norton - 1985
179 A Fast Mutual Exclusion Algorithm – Lamport - 1983
143 Efficient Synchronization Primitives for Large-Scale Cache-Coherent Multiprocessors – Goodman, Vernon, et al. - 1989
140 The Livermore Fortran kernels: a computer test of the numerical performance range – McMahon - 1986
127 The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors – Anderson, Lazowska, et al.
126 Mirage: A Coherent Distributed Shared Memory Design – Fleisch, Popek - 1989
116 A survey of cache coherence schemes for multiprocessors – Stenstrom - 1990
113 Performance of processor-memory interconnections for multiprocessors – Patel - 1981
110 A characterisation of sharing in parallel programs and its application to coherency protocol evaluation – Eggers, Katz - 1988
108 The implementation of a coherent memory abstraction on a NUMA multiprocessor: Experiences with Platinum – Cox, Fowler - 1989
106 Allocating independent subtasks on parallel processors – Kruskal, Weiss - 1984
104 Dhrystone: a synthetic systems programming benchmark – Weicker - 1984
92 Distributing Hot-Spot Addressing in Large-Scale Multiprocessors – Yew, Tzeng, et al. - 1987
81 The NYU Ultracomputer: designing an MIMD shared memory parallel computer – Gottlieb, Grishman, et al. - 1983
81 Efficient Synchronization on Multiprocessors with Shared Memory – Kruskal, Rudolph, et al. - 1988
75 Supercomputer performance evaluation and the Perfect Benchmarks – Cybenko, Kipp, et al. - 1990
71 Impossibility and universality results for wait-free synchronization – Herlihy - 1988
68 A Hypercube Shared Virtual Memory System – Li, Schaefer - 1989
63 The IBM research parallel processor prototype (RP3): Introduction and architecture – Pfister, Brantley, et al. - 1985
62 Plus: A Distributed Shared-Memory System – Bisiani, Ravishankar
60 Machine characterization based on an abstract high level machine – Saavedra-Barrera, Smith, et al. - 1989
59 A synthetic benchmark – Curnow, Wichmann - 1976
57 Two algorithms for barrier synchronization – Hensgen, Finkel, et al. - 1988
46 Adaptive backoff synchronization techniques – Agarwal, Cherian - 1989
46 The fuzzy barrier: a mechanism for high speed synchronization of processors – Gupta - 1989
39 SPEC Benchmark Suite: Designed for today's advanced system – Uniejewski - 1989
38 Coherence of Distributed Shared Memory: Unifying – Ramachandran, Ahamad, et al. - 1989
37 Vector access performance in parallel memories using a skewed storage scheme – Harper, Jump - 1987
36 Performance of Synchronous Parallel Algorithms with Regular Structures – Madala, Sinclair - 1991
35 How not to lie with statistics: the correct way to summarize benchmark results – Fleming, Wallace - 1986
34 The Monarch parallel processor hardware design – Rettberg, Crowther, et al. - 1990
33 The butterfly barrier – Brooks - 1986
33 On the Effective Bandwidth of Interleaved Memories in Vector Processing Systems – Oed, Lange - 1985
32 Characterizing Computer Performance with a Single Number – Smith - 1988
31 Square Multiprocessor: Early Experiences and Performance – Kendall - 1992
28 The performance of spin lock alternatives for shared memory multiprocessors – Anderson - 1990
27 The NAS Kernel Benchmark Program – Bailey, Barton - 1985
27 The LINPACK benchmark: An explanation – Dongarra - 1987
27 Computer Performance Evaluation Methodology – Heidelberger, Lavenberg - 1984
27 The prime memory system for array access – Lawrie, Vora - 1982
26 Multiprocessor Performance – Gelenbe - 1989
26 Synchronization with multiprocessor caches – Lee, Ramachandran - 1990
25 Performance Observability – Malony - 1990