A. Henriksson and I. Jonsson. High-Performance Matrix Multiplication on the IBM SP High Node. Master Thesis, UMNAD 98.235, Department of Computing Science, Umeå University, S-901 87 Umeå, June 1998.
....dimensions inherited from the block formats. The fixed leading dimensions guarantee that there are no, or only small, performance differences between the different transpose arguments of the GEMM operation (AB, A^T B, AB^T, A^T B^T). For a thorough explanation of these kernels, see [7] and [8]. In our view, the input matrices that an application (algorithm) will use define how their submatrices will be loaded. Knowing the full algorithm and the machine architecture allows us to choose nb so that misalignment never or only rarely occurs. C11 C12 C21 C22 A11 A12 A21 A22 B11 B12 B21 ....
....information (next set of blocks) in the argument list to the RB_GEMM subroutine. A third way of doing prefetching is to build, in advance, a batch list of the multiplications to be performed. Using this list, finding the next set is trivial. The last method is used in our C implementation [8].

function [C(1:M, 1:N)] = RB_GEMM(C(1:M, 1:N), A(1:M, 1:K), B(1:K, 1:N), m, n, k)
  if m = 1 and n = 1 and k = 1 then {Leaf node}
    DGEMM(C(1:M, 1:N), A(1:M, 1:K), B(1:K, 1:N)) {Call kernel routine}
  else if m = max{m, n, k} then
    m1 = m/2; m2 = m - m1; {# ....
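
The recursion and the batch-list idea in the excerpt above can be illustrated with a small sketch. It is not the cited C implementation: the function names (`split`, `build_batch`, `block_kernel`, `rb_gemm`) are hypothetical, the branches for n and k are assumed to mirror the m branch that the excerpt shows before it is cut off, a plain triple loop stands in for the DGEMM kernel, and the matrix dimensions are assumed to be multiples of the block size nb.

```python
def split(lo, hi):
    """Split the half-open block range [lo, hi) in two: m1 = m/2, m2 = m - m1."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid, hi)


def build_batch(mr, nr, kr, batch):
    """Recursively append leaf block products (i, j, p) to the batch list.

    mr, nr, kr are half-open ranges of block indices. Only the m branch is
    taken from the excerpt; the n and k branches are an assumption.
    """
    m, n, k = mr[1] - mr[0], nr[1] - nr[0], kr[1] - kr[0]
    if m == 1 and n == 1 and k == 1:      # leaf node: one block product
        batch.append((mr[0], nr[0], kr[0]))
    elif m == max(m, n, k):               # split the largest dimension
        for half in split(*mr):
            build_batch(half, nr, kr, batch)
    elif n == max(m, n, k):
        for half in split(*nr):
            build_batch(mr, half, kr, batch)
    else:
        for half in split(*kr):
            build_batch(mr, nr, half, batch)


def block_kernel(C, A, B, i, j, p, nb):
    """Stand-in for the DGEMM leaf call: C_ij += A_ip * B_pj on nb x nb blocks."""
    for r in range(i * nb, (i + 1) * nb):
        for c in range(j * nb, (j + 1) * nb):
            acc = 0.0
            for t in range(p * nb, (p + 1) * nb):
                acc += A[r][t] * B[t][c]
            C[r][c] += acc


def rb_gemm(C, A, B, nb):
    """Compute C += A * B by building the batch list first, then running it."""
    M, K, N = len(A), len(B), len(B[0])
    batch = []
    build_batch((0, M // nb), (0, N // nb), (0, K // nb), batch)
    for idx, (i, j, p) in enumerate(batch):
        # With a precomputed list, the next set of blocks is just the next
        # entry; a real implementation could issue prefetches for it here.
        nxt = batch[idx + 1] if idx + 1 < len(batch) else None
        block_kernel(C, A, B, i, j, p, nb)
    return batch
```

Called as `rb_gemm(C, A, B, 2)` on list-of-lists matrices, the function updates C in place and returns the batch list, so the order in which the recursion visits the blocks can be inspected directly.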
[Article contains additional citation context not shown here]