### Table 1: Level 3 BLAS operations.

in GEMM-Based Level 3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark

1998

"... In PAGE 33: ...Table1 0: GEMM-ratios for the IBM ESSL library on IBM SP2 thin node and wide node. Underlying routines: ESSL.... In PAGE 34: ...Table1 1: GEMM-ratios on a single node of the Intel PARAGON. Underlying routines: Paragon Basic Math Library and KD-GEMV.... In PAGE 35: ...Table1 2: GEMM-ratios for machine speci#0Cc libraries on SGI Indy with MIPS R4000 and R4400 processor. Underlying routines: SGI library libblas.... In PAGE 36: ...Table1 3: GEMM-ratios on a single node of the Parsytec GC#2FPP for the original level 3 BLAS model implementations from netlib. Underlying routines: BP-DGEMM and original netlib BLAS #28second column#29, BP-DGEMM, POL-DGEMV and original netlib BLAS #28third column#29.... In PAGE 37: ...Table1 4: Multiprocessor performance of the GEMM-Based DSYR2K on the ALLIANT FX#2F2816. Dimensions DSYR2K DGEMM K N p M#0Dops S p E p M#0Dops E GEMM 512 512 1 34.... In PAGE 37: ...88 0.50 Table1 5: Performance in M#0Dops for NEC SX-3. Leading dimension of arrays is 512.... In PAGE 38: ...Table1 6: Performance in M#0Dops for NEC SX-3. Leading dimension of arrays are 530.... In PAGE 38: ...5 1986.9 Table1 7: Benchmark results for canonical input #0Cle DMARK01 Machines GEMM- Original Vendor DGEMM Comments based netlib supplied M#0Dops IBM RS6K 530H 0.73 0.... In PAGE 39: ...Table1 8: Benchmark results for canonical input #0Cle DMARK02 Machines GEMM- Original Vendor DGEMM Comments based netlib supplied M#0Dops IBM RS6K 530H 0.83 0.... ..."

Cited by 51

### Table 2: Level 3 BLAS parameter lists.

in GEMM-Based Level 3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark

1998

Cited by 51

### Table 5: Level 3 BLAS performance indicator

"... In PAGE 6: ... Table 4 describes the ASC MSRC machines discussed in this report, as they were con- #0Cgured during these timings. Table5 shows the compute kernel indicators, while tables 6, 7, and 8 show the perfor- mance of various message passing libraries across the systems. The measurement labeled F MM is our #5Cachievable peak quot; for uniprocessor #0Doating point performance, whichwehave arbitrarily chosen to be a matrix-matrix multiplication of order 500.... ..."

### Table 5.5: The effect of block size on factorization time. Results from DEC 3000-400. (Column headers: Level 2 BLAS, Level 3 BLAS, Level 3 BLAS, Level 3 BLAS)

### Table 9. Results with Level 1 BLAS divided by those with Level 3 BLAS and block size 32.

in The design of MA48, a code for the direct solution of sparse unsymmetric linear systems of equations

### Table 10. Results with Level 2 BLAS divided by those with Level 3 BLAS and block size 32.

in The design of MA48, a code for the direct solution of sparse unsymmetric linear systems of equations

### Table 3: Examples of Level 3 BLAS Operations (Column headers: Function, BLAS Name, Operation; first row: Sparse Vector Update, SAXPYI)

1990

Cited by 6