  Reducing Remote Conflict Misses in Shared-Memory Multiprocessors: NUMA with Remote Cache and COMA

ftp://ftp.csrd.uiuc.edu/pub/misc/zzhang/research/l3coma.ps

Abstract:

Many future applications for scalable shared-memory multiprocessors are likely to have large working sets that overflow secondary or tertiary caches. Two possible solutions to this problem are to add a very large cache, called a remote cache, that caches remote data (NUMA-RC), or to organize the machine as a cache-only memory architecture (COMA). This paper tries to determine which solution is better. To compare the performance of the two organizations for the same amount of total memory, we introduce a model of data sharing. The model uses three data sharing patterns: replication, read-mostly migration, and read-write migration. Replication data is accessed in read-mostly mode by several processors, while migration data is accessed largely by one processor at a time. For large working sets, the weight of the migration data largely determines whether COMA outperforms NUMA-RC. Ideally, COMA only needs to fit the replication data in its extra memory; the migration data will simply be swapped between attraction memories. The remote cache of NUMA-RC, in contrast, needs to house both the replication and the migration data. However, simulations of seven Splash2 applications show that COMA does not outperform NUMA-RC. There are several reasons for this beyond the fact that COMA memory accesses are more expensive. First, the extra memory added has more associativity in NUMA-RC than in COMA and, therefore, can be utilized better by the working set. Second, simple data mastership assignment algorithms in COMA may cause what we call false replication. Finally, many of the Splash2 applications have been optimized for a cache-coherent NUMA machine. Overall, since NUMA-RC is cheaper, NUMA-RC is more cost-effective for these applications.
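
The idealized capacity argument in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the `WorkingSet` type, the `ideal_extra_memory_kb` function, and the kilobyte figures are hypothetical names and values chosen only for illustration. The sketch assumes the ideal case described above, where COMA's extra memory holds only replication data and the NUMA-RC remote cache holds both replication and migration data.

```python
from dataclasses import dataclass


@dataclass
class WorkingSet:
    """Hypothetical per-node remote working set, split by sharing pattern."""
    replication_kb: int  # data read by several processors
    migration_kb: int    # data used largely by one processor at a time


def ideal_extra_memory_kb(ws: WorkingSet, organization: str) -> int:
    """Extra memory needed to hold the remote working set in the ideal case.

    Assumption (from the abstract's idealized argument only):
    - COMA: migration data is swapped between attraction memories,
      so only the replication data needs extra space.
    - NUMA-RC: the remote cache must hold replication and migration data.
    """
    if organization == "COMA":
        return ws.replication_kb
    if organization == "NUMA-RC":
        return ws.replication_kb + ws.migration_kb
    raise ValueError(f"unknown organization: {organization}")


# Illustrative numbers only; they are not taken from the paper.
ws = WorkingSet(replication_kb=256, migration_kb=768)
coma_kb = ideal_extra_memory_kb(ws, "COMA")
numa_rc_kb = ideal_extra_memory_kb(ws, "NUMA-RC")
```

With a migration-heavy working set like this one, the ideal COMA requirement is much smaller than the NUMA-RC requirement. The abstract's simulations show that this ideal advantage does not translate into better performance in practice, for the reasons it lists (associativity, false replication, and NUMA-tuned applications), none of which this sketch models.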

Citations

149 DDM - A Cache-Only Memory Architecture – Hagersten, Landin, et al. - 1992
133 STiNG: A CC-NUMA Computer System for the Commercial Marketplace – Lovett, Clapp - 1996
111 Comparative performance evaluation of cache-coherent NUMA and COMA architectures – Stenstrom, Joe, et al. - 1992
69 An Argument for Simple COMA – Saulsbury, Wilkinson, et al. - 1995
62 Simulation of Multiprocessors: Accuracy and Performance – Goldschmidt - 1993
50 The S3.mp Scalable Shared Memory Multiprocessor – Nowatzyk - 1995
40 Evaluating the Memory Overhead Required for COMA Architectures – Joe, Hennessy - 1994
16 COMA-F: A Non-Hierarchical Cache Only Memory Architecture – Joe - 1995
4 The SPLASH-2 Programs: Characterization and Methodological Considerations – Woo, Ohara, et al. - 1995