Reducing remote conflict misses: NUMA with remote cache versus COMA (1997) [25 citations — 0 self]
Abstract:
Many future applications for scalable shared-memory multiprocessors are likely to have large working sets that overflow secondary or tertiary caches. Two possible solutions to this problem are to add a very large cache, called a remote cache, that caches remote data (NUMA-RC), or to organize the machine as a cache-only memory architecture (COMA). This paper tries to determine which solution is better. To compare the performance of the two organizations for the same amount of total memory, we introduce a model of data sharing. The model uses three data sharing patterns: replication, read-mostly migration, and read-write migration. Replication data is accessed in read-mostly mode by several processors, while migration data is accessed largely by one processor at a time. For large working sets, the weight of the migration data largely determines whether COMA outperforms NUMA-RC. Ideally, COMA only needs to fit the replication data in its extra memory; the migration data will simply be swapped between attraction memories. The remote cache of NUMA-RC, instead, needs to house both the replication and the migration data. However, simulations of seven Splash2 applications show that COMA does not outperform NUMA-RC. There are two reasons for this. First, the extra memory added has more associativity in NUMA-RC than in COMA and, therefore, can be utilized better by the working set in NUMA-RC. Second, COMA memory accesses are more expensive. Of course, our results are affected by the applications used, which have been optimized for a cache-coherent NUMA machine. Overall, since NUMA-RC is cheaper, NUMA-RC is more cost-effective for these applications.
Citations
| 149 | DDM - A Cache-Only Memory Architecture – Hagersten, Landin, et al. - 1992 |
| 133 | STiNG: A CC-NUMA Computer System for the Commercial Marketplace – Lovett, Clapp - 1996 |
| 111 | Comparative performance evaluation of cache-coherent NUMA and COMA architectures – Stenstrom, Joe, et al. - 1992 |
| 69 | An Argument for Simple COMA – Saulsbury, Wilkinson, et al. - 1995 |
| 62 | Simulation of Multiprocessors: Accuracy and Performance – Goldschmidt - 1993 |
| 50 | The S3.mp Scalable Shared Memory Multiprocessor – Nowatzyk - 1995 |
| 40 | Evaluating the Memory Overhead Required for COMA Architectures – Joe, Hennessy - 1994 |
| 16 | COMA-F: A Non-Hierarchical Cache Only Memory Architecture – Joe - 1995 |
| 4 | The SPLASH-2 Programs: Characterization and Methodological Considerations – Woo, Ohara, et al. - 1995 |

