J. Demmel, LAPACK: a Portable Linear Algebra Library for High-Performance Computers, Concurrency: Practice and Experience, vol. 3(6) (1991), 655--666.
.... To simplify the creation of efficient algorithms on these different architectures, computational primitives have been developed, such as the basic linear algebra subroutines (BLAS): BLAS1, BLAS2, and BLAS3 [16]. Recently the BLAS3 standard was fully exploited in the definition of the LAPACK library [17]. The arrival of parallel distributed memory architectures increased the complexity of defining numerical libraries that these architectures can exploit efficiently. In fact, it is well known that just a few complete application codes are ported to these new architectures, because of the ....
J. Demmel, LAPACK: a Portable Linear Algebra Library for High-Performance Computers, Concurrency: Practice and Experience, vol. 3(6) (1991), 655--666.
....requires blocking in space. In general, full space-time blocking is required to give a universal implementation of data locality that will lead to good performance on both distributed and hierarchical memory machines. This strategy is used in the implementation of the BLAS 3 primitives in LAPACK [7].
[Figure 7: The fundamental time constants of a hierarchical memory parallel computer. Labels: CPU, cache memory, main memory, Node 1, Node 2; time constants t_calc, t_comm, t_mem.]
The directives in High Performance Fortran essentially specify data locality, so we believe that an HPF compiler can use the concepts of this ....
Demmel, J. (1991). LAPACK: A Portable Linear Algebra Library for High-Performance Computers. Concurrency: Practice and Experience 3, 655-666.