
Comparative modeling and evaluation of CC-NUMA and COMA on hierarchical ring architectures (1995) [6 citations, 1 self]

by Xiaodong Zhang, Yong Yan
IEEE Transactions on Parallel and Distributed Systems
http://www.cs.wm.edu/~zhang/../hpcs/WWW/HTML/publications/./papers/TR-94-03-01.ps.Z

Abstract:

Parallel computing performance on scalable shared-memory architectures is affected by the structure of the interconnection network linking processors to memory modules and by the efficiency of the memory/cache management system. Cache-Coherent Non-Uniform Memory Access (CC-NUMA) and Cache-Only Memory Access (COMA) are two effective memory systems, and the hierarchical ring is a hardware-efficient interconnection network. This paper focuses on comparative performance modeling and evaluation of CC-NUMA and COMA on a hierarchical ring shared-memory architecture. We present analytical models of the two memory systems for comparative evaluation. Extensive performance measurements of data migration have been conducted on the KSR-1, a COMA hierarchical ring shared-memory machine. The experimental results support the analytical models, and we present practical observations and comparisons of the two cache-coherent memory systems. Our analytical and experimental results show that a COMA system balances the workload well. However, the overhead of frequent data movement may offset the gains obtained from the improved load balance. We believe our performance results could be further generalized to the two memory systems on a hierarchical network architecture. Although a CC-NUMA system may not automatically balance the load at the system level, it gives the user the option of explicitly managing data locality for a possible performance improvement.
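The trade-off stated in the abstract can be illustrated with a deliberately simplified cost model. COMA pulls a block toward whoever uses it, but each move has its own cost. This is a sketch, not the authors' analytical model. The latency values, the two-level ring layout with `ring_size` processors per local ring, the flat `migration` overhead, and the assumption that every reference misses in the processor cache are all hypothetical choices made for illustration. Under CC-NUMA the block stays at a fixed home node. Under COMA it migrates to the last processor that touched it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RingCosts:
    """Hypothetical latencies (arbitrary cycles) for a two-level ring hierarchy."""
    local: int = 1        # block already held by the requester's own node
    same_ring: int = 20   # requester and holder sit on the same local ring
    cross_ring: int = 60  # request climbs to the global ring and back down
    migration: int = 30   # assumed extra COMA cost to relocate the block


def transfer_cost(src: int, dst: int, ring_size: int, c: RingCosts) -> int:
    """Cost for processor `src` to reach a block held at node `dst`."""
    if src == dst:
        return c.local
    if src // ring_size == dst // ring_size:
        return c.same_ring
    return c.cross_ring


def cc_numa_cost(trace: Sequence[int], home: int, ring_size: int, c: RingCosts) -> int:
    """Fixed placement: every reference (assumed to miss in cache) goes to `home`."""
    return sum(transfer_cost(p, home, ring_size, c) for p in trace)


def coma_cost(trace: Sequence[int], first_owner: int, ring_size: int, c: RingCosts) -> int:
    """Migrating placement: the block moves to each new requester."""
    owner = first_owner
    total = 0
    for p in trace:
        if p == owner:
            total += c.local
        else:
            # Fetch from the current owner, then pay to relocate the block.
            total += transfer_cost(p, owner, ring_size, c) + c.migration
            owner = p
    return total


if __name__ == "__main__":
    c = RingCosts()
    # One remote processor reuses the block: migration pays off.
    reuse = [5] * 10
    print(cc_numa_cost(reuse, 0, 4, c), coma_cost(reuse, 0, 4, c))  # 600 99
    # Two processors on different local rings alternate: the block ping-pongs.
    pingpong = [1, 5] * 5
    print(cc_numa_cost(pingpong, 0, 4, c), coma_cost(pingpong, 0, 4, c))  # 400 860
```

With these made-up numbers, a single remote user makes COMA far cheaper. Alternating users on different rings reverse the outcome, because each reference pays both a cross-ring transfer and the migration overhead. This mirrors, in miniature, the abstract's point that data movement can offset COMA's load-balancing gains.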
