  Reducing remote conflict misses: NUMA with remote cache versus COMA (1997) [25 citations — 0 self]

by Zheng Zhang, Josep Torrellas
in Proceedings of International Symposium on High Performance Computer Architecture
ftp://ftp.cs.uiuc.edu/pub/research-groups/csrd/iacoma/numarc.ps

Abstract:

Many future applications for scalable shared-memory multiprocessors are likely to have large working sets that overflow secondary or tertiary caches. Two possible solutions to this problem are to add a very large cache called remote cache that caches remote data (NUMA-RC), or organize the machine as a cache-only memory architecture (COMA). This paper tries to determine which solution is best. To compare the performance of the two organizations for the same amount of total memory, we introduce a model of data sharing. The model uses three data sharing patterns: replication, read-mostly migration, and read-write migration. Replication data is accessed in read-mostly mode by several processors, while migration data is accessed largely by one processor at a time. For large working sets, the weight of the migration data largely determines whether COMA outperforms NUMA-RC. Ideally, COMA only needs to fit the replication data in its extra memory; the migration data will simply be swapped between attraction memories. The remote cache of NUMA-RC, instead, needs to house both the replication and the migration data. However, simulations of seven Splash2 applications show that COMA does not outperform NUMA-RC. This is due to two reasons. First, the extra memory added has more associativity in NUMA-RC than in COMA and, therefore, can be utilized better by the working set in NUMA-RC. Second, COMA memory accesses are more expensive. Of course, our results are affected by the applications used, which have been optimized for a cache-coherent NUMA machine. Overall, since NUMA-RC is cheaper, NUMA-RC is more cost-effective for these applications.
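The idealized capacity argument in the abstract can be sketched in a few lines. This is only an illustration of the reasoning, not the authors' model: the class name `WorkingSet`, the function `extra_memory_needed_kb`, and the sizes used are hypothetical, and the sketch deliberately ignores the two effects the abstract says decide the outcome in practice (associativity of the extra memory and the cost of COMA memory accesses).

```python
from dataclasses import dataclass


@dataclass
class WorkingSet:
    """Remote working set of one node, split by the three sharing patterns
    named in the abstract. All sizes are hypothetical, in KB."""
    replication_kb: float
    read_mostly_migration_kb: float
    read_write_migration_kb: float

    @property
    def migration_kb(self) -> float:
        # Both migration patterns are accessed largely by one processor at a time.
        return self.read_mostly_migration_kb + self.read_write_migration_kb


def extra_memory_needed_kb(ws: WorkingSet, organization: str) -> float:
    """Idealized extra memory required to hold the remote working set.

    Assumption (from the abstract's "ideally" case): COMA only has to fit
    the replication data, because migration data is swapped between
    attraction memories; the NUMA-RC remote cache has to hold both.
    """
    if organization == "COMA":
        return ws.replication_kb
    if organization == "NUMA-RC":
        return ws.replication_kb + ws.migration_kb
    raise ValueError(f"unknown organization: {organization!r}")


# Hypothetical example: a migration-heavy working set.
ws = WorkingSet(replication_kb=100,
                read_mostly_migration_kb=50,
                read_write_migration_kb=25)
coma_kb = extra_memory_needed_kb(ws, "COMA")
numa_rc_kb = extra_memory_needed_kb(ws, "NUMA-RC")
```

Under these assumptions the gap between the two organizations equals the migration data, which is why the abstract says the weight of the migration data largely determines whether COMA can win. The simulation results reported in the abstract show that this ideal advantage did not materialize for the Splash2 applications studied.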

Citations

149 DDM - A Cache-Only Memory Architecture – Hagersten, Landin, et al. - 1992
133 STiNG: A CC-NUMA Computer System for the Commercial Marketplace – Lovett, Clapp - 1996
111 Comparative performance evaluation of cache-coherent NUMA and COMA architectures – Stenstrom, Joe, et al. - 1992
69 An Argument for Simple COMA – Saulsbury, Wilkinson, et al. - 1995
62 Simulation of Multiprocessors: Accuracy and Performance – Goldschmidt - 1993
50 The S3.mp Scalable Shared Memory Multiprocessor – Nowatzyk - 1995
40 Evaluating the Memory Overhead Required for COMA Architectures – Joe, Hennessy - 1994
16 COMA-F: A Non-Hierarchical Cache Only Memory Architecture – Joe - 1995
4 The SPLASH-2 Programs: Characterization and Methodological Considerations – Woo, Ohara, et al. - 1995