Implementation of NAS Parallel Benchmarks in High Performance Fortran
"... We present an HPF implementation of BT, SP, LU, FT, CG and MG of the NPB2.3-serial benchmark set. The implementation is based on HPF performance model of the benchmark specific primitive operations with distributed arrays. We present profiling and performance data on SGI Origin 2000 and compare the ..."
Abstract
-
Cited by 32 (4 self)
- Add to MetaCart
We present an HPF implementation of the BT, SP, LU, FT, CG, and MG benchmarks from the NPB2.3-serial benchmark set. The implementation is based on an HPF performance model of the benchmark-specific primitive operations with distributed arrays. We present profiling and performance data on an SGI Origin 2000 and compare the results with NPB2.3. We discuss advantages and limitations of HPF and the pghpf compiler.
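The performance model above centers on primitive operations over distributed arrays. As a minimal sketch of the underlying idea (ours, not the paper's), the C program below computes the local index range that an HPF-style BLOCK distribution assigns to each process; the function name and the uniform-block assumption are purely illustrative.

    #include <stdio.h>

    /* Illustrative only: local index range owned by process `rank` when
     * an array of `n` elements is BLOCK-distributed over `nprocs`
     * processes, mirroring HPF's DISTRIBUTE(BLOCK) rule: blocks of size
     * ceil(n/nprocs), with the last process possibly owning fewer. */
    static void block_range(int n, int nprocs, int rank, int *lo, int *hi)
    {
        int blk = (n + nprocs - 1) / nprocs;    /* ceiling division */
        *lo = rank * blk;
        if (*lo > n) *lo = n;                   /* trailing empty blocks */
        *hi = (*lo + blk < n) ? *lo + blk : n;  /* exclusive upper bound */
    }

    int main(void)
    {
        for (int r = 0; r < 4; r++) {           /* 10 elements, 4 processes */
            int lo, hi;
            block_range(10, 4, r, &lo, &hi);
            printf("rank %d owns [%d, %d)\n", r, lo, hi);
        }
        return 0;
    }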
Porting and Performance Evaluation of Irregular Codes using OpenMP
In Proceedings of the First European Workshop on OpenMP, September/October 1999
"... In the last two years, OpenMP has been gaining popularity as a standard for developing portable shared memory parallel programs. With the improvements in centralized shared memory technologies and the emergence of distributed shared memory (DSM) architectures, several medium-to-large physical and lo ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
In the last two years, OpenMP has been gaining popularity as a standard for developing portable shared memory parallel programs. With the improvements in centralized shared memory technologies and the emergence of distributed shared memory (DSM) architectures, several medium-to-large physical and logical shared memory configurations are now available. Thus, OpenMP stands to be a promising medium for developing scalable and portable parallel programs. In this paper ...
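For readers unfamiliar with the term, "irregular codes" access arrays through runtime index arrays rather than affine subscripts. The C/OpenMP sketch below is our own illustration, not code from the paper; the index array idx is hypothetical. It shows such a gather loop parallelized with a single OpenMP directive, which is what makes OpenMP attractive for porting these codes.

    #include <stdio.h>

    #define N 8

    int main(void)
    {
        /* Indirect (gather) access pattern typical of irregular codes. */
        int    idx[N] = {3, 0, 7, 2, 5, 1, 6, 4};
        double x[N]   = {1, 2, 3, 4, 5, 6, 7, 8};
        double y[N];

        #pragma omp parallel for        /* iterations are independent */
        for (int i = 0; i < N; i++)
            y[i] = 2.0 * x[idx[i]];     /* gather through the index array */

        for (int i = 0; i < N; i++)
            printf("y[%d] = %g\n", i, y[i]);
        return 0;
    }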
An Evaluation of Data-Parallel Compiler Support for Line-Sweep Applications
In Journal of Instruction-Level Parallelism, 2003
"... Data parallel compilers have long aimed to equal the performance of carefully hand-optimized parallel codes. For tightly-coupled applications based on line sweeps, this goal has been particularly elusive. In the Rice dHPF compiler, we have developed a wide spectrum of optimizations that enable us ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Data-parallel compilers have long aimed to equal the performance of carefully hand-optimized parallel codes. For tightly coupled applications based on line sweeps, this goal has been particularly elusive. In the Rice dHPF compiler, we have developed a wide spectrum of optimizations that enable us to closely approach hand-coded performance for tightly coupled line-sweep applications, including the NAS SP and BT benchmark codes. From lightly modified copies of the standard serial versions of these benchmarks, dHPF generates MPI-based parallel code that is within 4% of the performance of the hand-crafted MPI implementations of these codes for a 102³ problem size (Class B) on 64 processors. We describe and quantitatively evaluate the impact of the partitioning, communication, and memory-hierarchy optimizations implemented by dHPF that enable us to approach hand-coded performance with compiler-generated parallel code.
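As a minimal illustration of what makes line sweeps hard for data-parallel compilers (our own sketch, not code from the paper), the C program below carries a first-order recurrence along one dimension of a small 2D array: each line is inherently serial, so only the independent lines can be partitioned, and compilers must pipeline or repartition to parallelize the swept dimension.

    #include <stdio.h>

    #define NI 4
    #define NJ 5

    int main(void)
    {
        double a[NI][NJ];

        for (int i = 0; i < NI; i++)
            for (int j = 0; j < NJ; j++)
                a[i][j] = i + j + 1.0;

        /* Line sweep along j: a[i][j] depends on a[i][j-1], so each line
         * is serial. Only the independent lines (rows i) are trivially
         * parallel, which is why sweeps such as those in NAS SP and BT
         * resist naive data-parallel compilation. */
        for (int i = 0; i < NI; i++)
            for (int j = 1; j < NJ; j++)
                a[i][j] += 0.5 * a[i][j - 1];

        printf("a[%d][%d] = %g\n", NI - 1, NJ - 1, a[NI - 1][NJ - 1]);
        return 0;
    }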