Results 1 -
2 of
2
Data Forwarding in Scalable Shared-Memory Multiprocessors
- In Proceedings of the 1995 International Conference on Supercomputing
, 1995
"... Scalable shared-memory multiprocessors are often slowed down by long-latency memory accesses. One way to cope with this problem is to use data forwarding to overlap memory accesses with computation. With data forwarding, when a processor produces a datum, in addition to updating its cache, it sends ..."
Abstract
-
Cited by 23 (4 self)
- Add to MetaCart
Scalable shared-memory multiprocessors are often slowed down by long-latency memory accesses. One way to cope with this problem is to use data forwarding to overlap memory accesses with computation. With data forwarding, when a processor produces a datum, in addition to updating its cache, it sends a copy of the datum to the caches of the processors that the compiler identified as consumers of it. As a result, when the consumer processors access the datum, they find it in their caches. This paper addresses two main issues. First, it presents a framework for a compiler algorithm for forwarding. Second, using address traces, it evaluates the performance impact of different levels of support for forwarding. Our simulations of a 32-processor machine show that a slightlyoptimistic support for forwarding speeds up five applications by, on average, 50% for large caches and 30% for small caches. For large caches, many sharing read misses can be eliminated, while for smaller caches, forwarding ...
Speeding up the Memory Hierarchy in Flat COMA Multiprocessors
- In Proc. of the 3rd IEEE Symposium on High-PerformanceComputer Architecture (HPCA3
, 1997
"... Scalable Flat Cache Only Memory Architectures (Flat COMA) are designed for reduced memory access latencies while minimizing programmer and operating system involvement. Indeed, to keep memory access latencies low, neither the programmer needs to perform clever data placement nor the operating system ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Scalable Flat Cache Only Memory Architectures (Flat COMA) are designed for reduced memory access latencies while minimizing programmer and operating system involvement. Indeed, to keep memory access latencies low, neither the programmer needs to perform clever data placement nor the operating system needs to perform page migration. The hardware automatically replicates the data and migrates it to the attraction memories of the nodes that use it. Unfortunately, part of the latency of memory accesses is superfluous. In particular, reads often perform unnecessary attraction memory accesses, require too many network hops, or perform necessary attraction memory accesses inefficiently. In this paper, we propose relatively inexpensive schemes that address these three problems. To eliminate unnecessary attraction memory accesses, we propose a small direct-mapped cache called Invalidation Cache (IVC). To reduce the number of network hops, the IVC is augmented with hint pointers to processors. T...

