11 citations found.
George Cybenko, Lyle Kipp, Lynn Pointer, and David Kuck. Supercomputing Performance Evaluation and the Perfect Benchmarks. In Supercomputing '90, pages 254--266, 1990.

This paper is cited in the following contexts:
Performance Analysis of Multiprocessors Memory System - Temam   (Correct)

....is such that it should be considered with the same care.

2.2 An Example of Real Numerical loop nest

The purpose of this section is to illustrate the influence of communications on a real numerical loop nest. This loop nest is extracted from the routine EFLUX of FLO52, a Perfect Club benchmark [4].

C     parallel
      DO 3 N=1,4
      DO 3 J=2,JL
      DO 3 I=2,IL
        DW(I,J,N) = FS(I,J,N) - FS(I-1,J,N)
3     CONTINUE

[Figure 5: Influence of Data Layout. Axes: # procs, m.]

C     parallel
      DO 5 J=2,jl
      DO 5 I=2,IL
        XX = X(I,J,1) - X(I-1,J,1)
        YX = X(I,J,2) ....


Cache Awareness in Blocking Techniques, Part II - Temam, Fricker, Jalby   (Correct)

....the same cache line is equal to L_S / C_S. So, such phenomena have a very low probability to occur, but they might be very difficult to trace back in a program. Therefore they should be detected at compile time or run time. For instance, in subroutine HYD (and other subroutines) of the Perfect [10] benchmark ADM, ping-pong phenomena were detected. A 61% decrease of the average memory access time was obtained by slightly varying the base addresses of arrays P and PS in HYD (see figure 8.3.2.3f). Spatial self-interferences can also frequently occur because of the widespread use of ....
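
The ping-pong effect described in this excerpt can be illustrated with a minimal sketch, assuming a direct-mapped cache and two arrays accessed alternately. The cache geometry, the 8-byte element size, the base addresses, and the function name `count_misses` are all hypothetical choices made for illustration; none of them come from the cited paper or from ADM.

```python
def count_misses(base_p, base_ps, n, cache_size=1024, line_size=32):
    """Count misses in a direct-mapped cache for the access pattern
    P(0), PS(0), P(1), PS(1), ... over n elements of 8 bytes each.
    All parameters are illustrative assumptions."""
    num_lines = cache_size // line_size
    tags = [None] * num_lines  # memory line currently held by each cache slot
    misses = 0
    for i in range(n):
        for base in (base_p, base_ps):
            addr = base + 8 * i
            line = addr // line_size   # memory line containing this address
            slot = line % num_lines    # direct-mapped placement
            if tags[slot] != line:
                misses += 1
                tags[slot] = line      # evict whatever was there
    return misses


# Base addresses a multiple of the cache size apart: P(i) and PS(i)
# always map to the same slot and evict each other on every access.
aligned = count_misses(0, 4096, 256)

# Shifting one base address by a single cache line removes the conflict.
shifted = count_misses(0, 4096 + 32, 256)
```

Under these assumptions the aligned layout misses on every access, while the shifted layout misses only on the first touch of each line, which is the kind of sensitivity to base addresses the excerpt reports.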


Improving Single-Process Performance with Multithreaded.. - Farcy, Temam (1996)   (3 citations)  (Correct)

....For the first category, we selected Gcc, Compress, Simulator, Contour, Grep, Latex and Segmentation. These seven programs were only used to increase the load on the processor and no specific performance evaluation was done on those. For numerical codes, we picked three of the Perfect Club [5] benchmarks: FLO52, ARC2D, MDG. Since this study is focused on analyzing the behavior of multiple shared context threads, statistics are collected over parallel sections only, even though all instructions are executed. Considering our goal was to evaluate the best achievable performance with a ....


Software Assistance for Data Caches - Temam, Drach (1995)   (14 citations)  (Correct)

....information is incorrect. As a consequence, the AMAT metric (Average Memory Access Time) instead of the CPI metric (Cycles Per Instruction) was used.

Benchmarks

The benchmarks used are all numerical codes: real application benchmarks ADM, MDG, BDN, DYF, ARC, FLO, TRF from the Perfect Club Suite [6], the Livermore Loops benchmark LIV, the NAS and Slalom benchmarks, and two numerical primitives Matrix Vector multiply MV and Sparse Matrix Vector multiply SpMV.

Notations and Parameters

A baseline cache configuration called Standard or Stand. has been used on many graphs. It corresponds to the ....


Using Virtual Lines to Enhance Locality Exploitation, Part III - Temam, Jegou (1994)   (8 citations)  (Correct)

....time of cache C 2 : A collection of 10 benchmarks has been used. Though it is difficult to choose a representative set of benchmarks, we tried to combine different types of codes to illustrate the maximum number of behaviors. Some numerical codes have been picked in the Perfect Club Suite [1] (benchmarks AP, ARC, BDNA, WS), some codes are Unix tools (CC is the GNU cc compiler, CPR is the compress utility, TEX is the LaTeX compiler), and other codes are numerical primitives (LL is a Lawrence Livermore loop, SPMV is sparse matrix vector multiply loop, and MM is matrix matrix multiply ....


Cache Interference Phenomena - Temam, Fricker, Jalby (1994)   (53 citations)  (Correct)

....any type of reuse can be disrupted. In general, the larger the reuse distance for a given datum, the higher the probability it is flushed from cache, because the more data are loaded before reuse occurs. Consider loop 2a which is a modified version of a loop in ARC2D, a Perfect Club benchmark [1]. The reuse distance associated with the spatial reuse of LDA(k) is equal to 1 iteration of loop k (unlikely to be disrupted) while the reuse distance associated with the temporal reuse of LDA(k) is equal to KU iterations of loop k (more likely to be disrupted, depending on the value of KU ) ....
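
The notion of reuse distance used in this excerpt can be sketched as follows. This is a minimal illustration only: it measures distance in number of accesses on a made-up trace, whereas the excerpt measures it in loop iterations, and the function name `reuse_distances` is a hypothetical helper rather than anything from the cited paper.

```python
def reuse_distances(trace):
    """For each re-access in the trace, return (address, distance), where
    distance is the number of accesses since that address was last used.
    Measuring in accesses (not loop iterations) is an assumption."""
    last_seen = {}   # address -> position of its most recent access
    distances = []
    for pos, addr in enumerate(trace):
        if addr in last_seen:
            distances.append((addr, pos - last_seen[addr]))
        last_seen[addr] = pos
    return distances


# Illustrative trace: "a" is reused quickly, "b" after more intervening accesses.
example = reuse_distances(["a", "b", "a", "c", "b"])
```

A larger distance means more data are loaded between the two uses, which is why the excerpt treats long-distance reuse as more likely to be disrupted.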


Cache Awareness in Blocking Techniques - Temam, Fricker, Jalby (1998)   (1 citation)  (Correct)

....2 (for the sake of clarity, the scale has been modified)

2.3.4 Putting it all together

In this section, the number of cache misses of loop 2.3.4a is computed, based on the techniques presented in the previous sections. This loop is a modified version of a loop in ARC2D, a Perfect Club benchmark [16]. In this loop, several interference phenomena occur simultaneously (internal cross-interferences, external cross-interferences of self dependences group dependences). The leading dimension of arrays F, U is KU. One parameter, δ, is used to characterize the distance between the different DO ....

....L_S / C_S. So, such phenomena have a very low probability to occur, but the associated performance degradation might be very difficult to trace back in a program. Therefore they should be detected at compile time or run time. For instance, in subroutine HYD (and other subroutines) of the Perfect [16] benchmark ADM, ping-pong phenomena were detected. The average memory access time was divided by 2, simply by slightly varying the base addresses of two arrays P and PS in subroutine HYD (see figure 3.2f). Spatial self-interferences can also frequently occur because of the widespread use ....

George Cybenko, Lyle Kipp, Lynn Pointer, and David Kuck. Supercomputing Performance Evaluation and the Perfect Benchmarks. In Proceedings of IEEE Supercomputing'90 Conference, pages 254--266, 1990.


Software Assistance for Data Caches - Temam, Drach (1995)   (14 citations)  (Correct)

....Thanks to this tool, all array references in each code were instrumented. For more details on the issues of source code tracing see [17].

Benchmarks

The benchmarks used are all numerical codes: real application benchmarks ADM, MDG, BDN, DYF, ARC, FLO, TRF from the Perfect Club Suite [4], the Livermore Loops benchmark LIV, the NAS and Slalom benchmarks, and two numerical primitives Matrix Vector multiply MV and Sparse Matrix Vector multiply SpMV.

Notations and Parameters

A baseline cache configuration called Standard or Stand. has been used on many graphs. It corresponds to the ....


Streaming Prefetch - Temam (1995)   (4 citations)  (Correct)

....were passed to the simulator. To come as close as possible to the conditions of a superscalar processor, we assumed one load/store request is sent to the cache every cycle, without disruption (due to branches or else).

Benchmarks and traces

Seven benchmarks from the Perfect Club Suite [2] were used. For each benchmark, a 50 million entry trace was extracted. In our case, source code trace would have been the easiest solution, but because hardware implementation issues had to be finely studied, we decided against it and extracted object code traces to get all load/store references. ....


Using Virtual Lines to Enhance Locality Exploitation - Temam, Jegou (1994)   (8 citations)  (Correct)

....time of cache C 2 : A collection of 10 benchmarks has been used. Though it is difficult to choose a representative set of benchmarks, we tried to combine different types of codes to illustrate the maximum number of behaviors. Some numerical codes have been picked in the Perfect Club Suite [1] (benchmarks AP, ARC, BDNA, WS), some codes are Unix tools (CC is the GNU cc compiler, CPR is the compress utility, TEX is the LaTeX compiler), and other codes are numerical primitives (LL is a Lawrence Livermore loop, SPMV is sparse matrix vector multiply loop, and MM is matrix matrix multiply ....


with the Spa package [2] (the target file of - Compress Contained   (Correct)

No context found.
