Abstract: Integrating support for block data transfer has become an
important emphasis in recent cache-coherent shared address space
multiprocessors. This paper examines the potential performance
benefits of adding this support. A set of ambitious
hardware mechanisms is used to study performance gains in
five important scientific computations that appear to be good
candidates for using block transfer. Our conclusion is that the
benefits of block transfer are not substantial for hardware cache-...
Context of citations to this paper:
...and an upper triangular matrix. The factorization uses blocking to exploit temporal locality w.r.t. individual submatrix elements [12]. Originally designed to run on shared memory systems, this benchmark can only be used on a single SMP node of the IBM SP. Some...
.... N body method), FFT (complex 1-D version of the radix-√n six-step FFT algorithm [Bai90]), LU (blocked LU decomposition, see [WSH94] for more details), CLU (blocked LU decomposition with contiguous allocation of data, more optimized version of LU), Radix (integer radix...
Cited by:
Integrating Non-blocking Synchronisation in Parallel.. - Tsigas, Zhang (2002)
Efficient Runtime Support for Cluster-Based Distributed Shared.. - Speight (1997)
DSZOOM - Low Latency Software-Based Shared Memory - Radovic, Hagersten (2001)
Active bibliography (related documents):
0.6: The Performance Advantages of Integrating Block Data Transfer in.. - Woo (1996)
0.2: Comparing MPI Performance of SCI and VIA - Seifert, Balkanski, Rehm (2000)
0.2: ILP versus TLP on SMT - Mitchell, Carter, Ferrante, Tullsen (1999)
Similar documents based on text:
0.6: Coherent Block Data Transfer in the FLASH Multiprocessor - Heinlein, Bosch, Jr.. (1997)
0.5: A Comparison of MPI, SHMEM and Cache-coherent Shared.. - Hongzhang Shan And
0.5: Real-Time Block Transfer Under a Link Sharing Hierarchy - Xie, Lam (1996)
Related documents from co-citation:
13: SPLASH: Stanford parallel applications for shared memory - Singh, Weber et al. - 1992
11: The Stanford FLASH Multiprocessor - Kuskin - 1994
10: Integrating Message-Passing and Shared-Memory: Early Experience - Kranz, Johnson et al. - 1993
BibTeX entry:
S.C. Woo, J.P. Singh, and J.L. Hennessy. The performance advantages of integrating block data transfer in cache-coherent multiprocessors. In Proceedings of the 6th Symposium on Architectural Support for Programming Languages and Operating Systems, pages 219--231, October 1994. http://citeseer.ist.psu.edu/woo94performance.html
@inproceedings{woo94performance,
  author = "Steven Cameron Woo and Jaswinder Pal Singh and John L. Hennessy",
  title = "The Performance Advantages of Integrating Block Data Transfer in Cache-Coherent Multiprocessors",
  booktitle = "Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems",
  address = "San Jose, California",
  pages = "219--229",
  year = "1994",
  url = "http://citeseer.ist.psu.edu/woo94performance.html"
}
Citations (may not include all citations):
362: The Stanford FLASH Multiprocessor - Kuskin - 1994
357: The Directory-Based Cache Coherence Protocol for the DASH Mu.. - Lenoski, Laudon et al. - 1990
301: The Midway Distributed Shared Memory System - Bershad, Zekauskas et al. - 1993
268: Tempest and Typhoon: User-level Shared Memory - Reinhardt, Larus et al. - 1994
267: Multi-Level Adaptive Solutions to Boundary-Value Problems - Brandt - 1977
212: APRIL: A Processor Architecture for Multiprocessing - Agarwal, Lim et al. - 1990
182: A Comparison of Sorting Algorithms for the Connection Machin.. - Blelloch - 1991
138: SPLASH: Stanford Parallel Applications for Shared Memory - Singh, Weber et al. - 1992
96: Integrating Message-Passing and Shared-Memory: Early Experie.. - Kranz - 1993
61: FFTs in External or Hierarchical Memory - Bailey - 1990
45: Simulation of Multiprocessors: Accuracy and Performance - Goldschmidt - 1993
15: Scaling Parallel Programs for Multiprocessors: Methodology a.. - Singh, Hennessy et al. - 1993
8: An Evaluation of Software Distributed Shared Memory for Next.. - Dwarkadas, Keleher et al. - 1993
8: The Performance Advantages of Integrating Message Passing in.. - Woo, Singh et al. - 1993
5: The NAS Parallel Benchmarks - Bailey - 1991
2: Cray T3D System Architecture and Overview - Inc, Architecture et al. - 1993
Documents on the same site (http://www.cs.princeton.edu/~jps/papers/appls-arch.html):
Scope Consistency: A Bridge between Release Consistency and.. - Iftode (1996)
Understanding Application Performance on Shared Virtual Memory.. - Iftode (1996)
Irregular Applications under Software Shared Memory - Iftode, Singh, Li (1996)