|
711
|
Advanced compiler design and implementation
– Muchnick
- 1997
|
|
498
|
The NAS parallel benchmarks
– Bailey, Barton, et al.
- 1991
|
|
160
|
Parallel programming in Split-C
– Culler, Dusseau, et al.
- 1993
|
|
145
|
The Livermore Fortran Kernels: A computer test of the numerical performance range
– McMahon
- 1986
|
|
143
|
Communication optimization and code generation for distributed memory machines
– Amarasinghe, Lam
- 1993
|
|
142
|
An Optimizing Fortran D Compiler for MIMD Distributed-Memory Machines
– Tseng
- 1993
|
|
127
|
Optimizing Compilers for Modern Architectures
– Allen, Kennedy
- 2002
|
|
120
|
Efficient and correct execution of parallel programs that share memory
– Shasha, Snir
- 1988
|
|
109
|
Combining loop transformations considering caches and scheduling
– Wolf, Maydan, et al.
- 1996
|
|
103
|
Co-Array Fortran for parallel programming
– Numrich, Reid
- 1998
|
|
59
|
What are race conditions? Some issues and formalizations
– Netzer, Miller
- 1992
|
|
49
|
Global communication analysis and optimization
– Chakrabarti, Gupta, et al.
- 1996
|
|
35
|
Analyses and optimizations for shared address space programs
– Krishnamurthy, Yelick
- 1996
|
|
27
|
UPC performance and potential: A NPB experimental study
– El-Ghazawi, Cantonnet
- 2002
|
|
27
|
Communication optimizations for parallel C programs
– Zhu, Hendren
- 1998
|
|
25
|
A performance analysis of the Berkeley UPC Compiler
– Chen, Bonachea, et al.
- 2003
|
|
24
|
An Evaluation of Current High-Performance Networks
– Bell, Bonachea, et al.
- 2003
|
|
17
|
Co-array Fortran performance and potential: An NPB experimental study
– Coarfa, Dotsenko, et al.
- 2003
|
|
15
|
GASNet specification
– Bonachea
- 2002
|
|
15
|
A global communication optimization technique based on data-flow analysis and linear algebra
– Kandemir, Banerjee, et al.
- 1999
|
|
14
|
Titanium: A high-performance Java dialect
– Yelick, et al.
- 1998
|
|
11
|
Evaluating support for global address space languages on the Cray X1
– Bell, Chen, et al.
- 2004
|
|
9
|
Efficient Matrix Chain Ordering in Polylog Time
– Bradford, Rawlins, et al.
- 1994
|
|
9
|
Performance Monitoring and Evaluation of a UPC Implementation on a NUMA Architecture
– Cantonnet, Yao, et al.
- 2003
|
|
8
|
Proposal for extending the UPC memory copy library functions and supporting extensions to GASNet, v1.0
– Bonachea
- 2004
|
|
8
|
Titanium language reference manual
– Hilfinger, et al.
- 2001
|
|
7
|
Loop induction variable canonicalization in parallelizing compilers
– Liu, Lo, et al.
- 1996
|
|
6
|
MuPC: A run time system for Unified Parallel C
– Savant, Seidel
- 2002
|
|
4
|
C/C++ reference manual
– Cray
|
|
4
|
UPC implementation of an unbalanced tree search benchmark
– Prins, Huan, et al.
- 2003
|
|
3
|
X1 system overview
– Cray
|
|
2
|
Evaluating the impact of programming language features on the performance of parallel applications on cluster architectures
– Berlin, Huan, et al.
- 2003
|
|
2
|
A new algorithm for partial redundancy elimination based on SSA form
– Chow, Chan, et al.
- 1997
|
|
2
|
Effective representation of aliases and indirect memory operations in SSA form
– Chow, Chan, et al.
- 1996
|
|
2
|
An HPF compiler for the IBM SP2
– Gupta, Midkiff, et al.
- 1995
|
|
2
|
Canadian Task Force on the Periodic Health Examination (1994) Canadian Guide to Clinical Preventive Health Care (pp. 620–631). Ottawa: Canada Communication Group
– Lin
- 2003
|
|
2
|
Array prefetching for irregular array accesses in Titanium
– Su, Yelick
- 2004
|