D. Bailey, K. Lee, and H. Simon, Using Strassen's algorithm to accelerate the solution of linear systems, The Journal of Supercomputing #, pp. 357--371, 1990.
....enables portable application code to obtain high performance provided that an optimized library (e.g. [AGZ94, KHM94]) is available and affordable. Developing an optimized library, however, is a difficult and time-consuming task. Even excluding algorithmic variants such as Strassen's method [BLS91] for matrix multiplication, these routines have a large design space with many parameters such as blocking sizes, loop nesting permutations, loop unrolling depths, software pipelining strategies, register allocations, and instruction schedules. Furthermore, these parameters have complicated ....
D. H. Bailey, K. Lee, and H. D. Simon. Using Strassen's algorithm to accelerate the solution of linear systems. J. Supercomputing, 4:357--371, 1991.
....the specifications. This freedom is most relevant in the case of the level 3 BLAS, where fast matrix multiplication techniques can be applied. There has been interest in using Strassen's method, which has been demonstrated to be viable for practical computation in terms of both speed and accuracy [12, 81]. With the use of level 3 BLAS based on fast matrix multiplication techniques, the existing normwise backward error bounds for block and partitioned algorithms remain valid with appropriate increases in the constant terms [40]. Section 2.2, Block and Partitioned Algorithms: The recognition that on ....
David H. Bailey, King Lee, and Horst D. Simon. Using Strassen's algorithm to accelerate the solution of linear systems. J. Supercomputing, 4:357--371, 1991.
....with the technique analyzed in [25], which enables the product of two complex matrices to be formed using only three real matrix multiplications. Several researchers are experimenting with the use of fast BLAS3 in linear equation solvers. In particular, we mention the work of Bailey, Lee and Simon [3], who use Strassen's method for the matrix multiplications arising in the LAPACK LU factorization routine SGETRF. Our purpose in this work is to investigate the numerical stability of block algorithms that employ fast BLAS3. We restrict our attention mainly to the block algorithms used in LAPACK ....
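The three-multiplication technique mentioned here (often called the 3M method) rests on the identity $(A+iB)(C+iD) = (AC - BD) + i\big((A+B)(C+D) - AC - BD\big)$. A minimal pure-Python sketch, assuming matrices stored as nested lists; `matmul` is a hypothetical stand-in for whatever real product (conventional or Strassen-based) a Level 3 BLAS would supply, and the other helper names are illustrative only.

```python
Matrix = list[list[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Conventional real product (stand-in for a real GEMM or Strassen routine)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X: Matrix, Y: Matrix) -> Matrix:
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def sub(X: Matrix, Y: Matrix) -> Matrix:
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def complex_matmul_3m(A: Matrix, B: Matrix, C: Matrix, D: Matrix) -> tuple[Matrix, Matrix]:
    """Return (real, imag) of (A + iB)(C + iD) using three real products."""
    T1 = matmul(A, C)                    # AC
    T2 = matmul(B, D)                    # BD
    T3 = matmul(add(A, B), add(C, D))    # (A+B)(C+D) = AC + AD + BC + BD
    real = sub(T1, T2)                   # AC - BD
    imag = sub(sub(T3, T1), T2)          # AD + BC
    return real, imag
```

The fourth real product of the straightforward scheme is traded for a few extra matrix additions, which are only $O(n^2)$ work.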
D.H. Bailey, K. Lee and H.D. Simon, Using Strassen's algorithm to accelerate the solution of linear systems, J. Supercomputing, 4 (1991), pp. 357--371.
....[15] Strassen's method is not as accurate as conventional matrix multiplication when the matrices are badly row or column scaled, but if either the matrices are already reasonably scaled or the bad scaling is first removed, it is adequate. Thus it may be used in Level 3 BLAS implementations [25, 7]. Fourth, there is possibly a tradeoff between stability and speed in certain algorithms. Some modern parallel architectures are designed to support particular communication patterns and so may execute one algorithm (call it Algorithm A) much less efficiently than another (Algorithm B), even ....
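One simple way to remove row and column scaling before a fast product is to factor $A = D_r\hat{A}$ and $B = \hat{B}D_c$ with diagonal $D_r$, $D_c$, form $\hat{A}\hat{B}$, and rescale. The sketch below assumes powers of two for the diagonal entries (so the scaling itself adds no rounding error) and takes a hypothetical `mult` argument standing in for a Strassen-based routine. It is one illustrative equilibration choice, not the procedure of the cited works, and it does not address bad scaling along the inner dimension (columns of $A$ paired with rows of $B$).

```python
import math

Matrix = list[list[float]]


def classical(X: Matrix, Y: Matrix) -> Matrix:
    """Conventional product; stands in for any fast routine passed as `mult`."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def pow2_exponent(v: float) -> int:
    """Exponent e with |v| < 2**e (0 when v == 0), so that v * 2**-e is exact."""
    return math.frexp(v)[1] if v != 0 else 0


def scaled_matmul(A: Matrix, B: Matrix, mult=classical) -> Matrix:
    """Compute A @ B as D_r (A_hat @ B_hat) D_c with power-of-two diagonal scalings."""
    # D_r: one power of two per row of A, taken from that row's largest entry.
    row_e = [pow2_exponent(max(abs(x) for x in row)) for row in A]
    # D_c: one power of two per column of B.
    col_e = [pow2_exponent(max(abs(x) for x in col)) for col in zip(*B)]
    A_hat = [[math.ldexp(x, -e) for x in row] for row, e in zip(A, row_e)]
    B_hat = [[math.ldexp(x, -e) for x, e in zip(row, col_e)] for row in B]
    C_hat = mult(A_hat, B_hat)
    # Undo the scaling: C[i][j] = 2**row_e[i] * C_hat[i][j] * 2**col_e[j].
    return [[math.ldexp(c, re + ce) for c, ce in zip(row, col_e)]
            for row, re in zip(C_hat, row_e)]
```

Because multiplying by a power of two only changes the exponent, the rescaling steps are exact barring overflow or underflow; all rounding happens inside `mult`.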
D. H. Bailey, K. Lee, and H. D. Simon. Using Strassen's algorithm to accelerate the solution of linear systems. J. Supercomputing, 4:357--371, 1991.
....with an average of 4.31. If a faster multiplication is desired, the most promising possibilities involve the 3M method and Strassen's method. Recent experience with Strassen's method on real matrices has shown that on certain machines it can produce useful speedups for n in the hundreds [1, 2]. If the computing environment is such that complex arithmetic is implemented very efficiently, it may be best to use Strassen's method alone in complex arithmetic. For example, in experiments in Algol W on an IBM 360/67, Brent [3] found that a complex matrix multiplication took less than three ....
David H. Bailey, King Lee, and Horst D. Simon, Using Strassen's algorithm to accelerate the solution of linear systems, J. Supercomputing, 4 (1991), pp. 357--371.
....[15] Strassen's method is not as accurate as conventional matrix multiplication when the matrices are badly row or column scaled, but if either the matrices are already reasonably scaled or the bad scaling is first removed, it is adequate. Thus it may be used in Level 3 BLAS implementations [24, 7]. Fourth, there is possibly a tradeoff between stability and speed in certain algorithms. Some modern parallel architectures are designed to support particular communication patterns and so may execute one algorithm (call it Algorithm A) much less efficiently than another (Algorithm B), even ....
D. H. Bailey, K. Lee, and H. D. Simon. Using Strassen's algorithm to accelerate the solution of linear systems. J. Supercomputing, 4:357--371, 1991.
....is only faster for large matrices. In practice, once $k$ is large enough that the $n/2^k \times n/2^k$ submatrices fit in fast memory, conventional matrix multiply may be used. A drawback of Strassen's method is the need for extra storage for intermediate results. It has been implemented on the Cray-2 [9, 8] and IBM 3090 [50]. The conventional error bound for matrix multiplication is as follows: $|fl_{\mathrm{conv}}(A \cdot B) - A \cdot B| \le n \cdot \epsilon \cdot |A| \cdot |B|$, where the absolute values of matrices and the inequality are meant componentwise. The bound for Strassen's method [13, 14, 49] is $\|fl_{\mathrm{Strassen}}(A \cdot B) - A \cdot B\| \le f(n) \cdot \epsilon \cdot \|A\| \cdot \ldots$ ....
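The recursion described in this passage can be sketched in a few lines. This is a minimal pure-Python illustration, assuming square matrices whose order is a power of two and using a fixed size threshold `cutoff` (a hypothetical parameter) as a proxy for "the submatrices fit in fast memory"; it uses Strassen's original seven products and shows the structure only, not a fast implementation.

```python
Matrix = list[list[float]]


def classical(A: Matrix, B: Matrix) -> Matrix:
    """Conventional product, used once blocks are at or below the cutoff."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(X: Matrix, Y: Matrix) -> Matrix:
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def sub(X: Matrix, Y: Matrix) -> Matrix:
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def split(M: Matrix):
    """Return the four quadrants (M11, M12, M21, M22) of an even-order matrix."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A: Matrix, B: Matrix, cutoff: int = 64) -> Matrix:
    """Strassen product of n x n matrices, n a power of two (assumed)."""
    n = len(A)
    if n <= max(cutoff, 1):
        return classical(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven half-size products instead of eight (10 block additions on inputs).
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    # Recombine with 8 more block additions (18 per level in total).
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Stitch the quadrants back into one matrix.
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```

The workspace drawback noted in the passage is visible here: every `add`, `sub`, and `M_i` allocates a new half-size matrix.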
D. H. Bailey, K. Lee, and H. D. Simon. Using Strassen's algorithm to accelerate the solution of linear systems. J. Supercomputing, 4:357--371, 1991.
....enables portable application code to obtain high performance provided that an optimized library (e.g. [AGZ94, KHM94]) is available and affordable. Developing an optimized library, however, is a difficult and time-consuming task. Even excluding algorithmic variants such as Strassen's method [BLS91] for matrix multiplication, these routines have a large design space with many parameters such as blocking sizes, loop nesting permutations, loop unrolling depths, software pipelining strategies, register allocations, and instruction schedules. Furthermore, these parameters have complicated ....
D. H. Bailey, K. Lee, and H. D. Simon. Using Strassen's algorithm to accelerate the solution of linear systems. J. Supercomputing, 4:357--371, 1991.
....only $O(7^n)$ operations, compared to $O(8^n)$ for conventional matrix multiplication. Efficient parallel implementations of this algorithm have been described in [1, 10]. This algorithm has been used for fast matrix multiplication in implementing level 3 BLAS [9] and linear algebra routines [2]. In this paper, we describe the tensor product formulation of Strassen's matrix multiplication algorithm, and discuss program generation for shared memory vector processors such as the Cray Y-MP. Achieving high performance on these architectures requires operating on large vectors and reducing ....
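For matrices of order $2^k$, the operation counts quoted here follow from simple recurrences: each level replaces one product by seven half-size products plus 18 half-size additions (Strassen's original formulas, recursing all the way to $1 \times 1$, an assumption made here for simplicity). The sketch below evaluates them. The addition count has the closed form $6(7^k - 4^k)$, so both counts are $O(7^k) = O(n^{\log_2 7})$ with $\log_2 7 \approx 2.807$.

```python
import math


def strassen_mults(k: int) -> int:
    """Scalar multiplications for 2^k x 2^k matrices: M(k) = 7 M(k-1), M(0) = 1."""
    return 1 if k == 0 else 7 * strassen_mults(k - 1)


def strassen_adds(k: int) -> int:
    """Scalar additions: A(k) = 7 A(k-1) + 18 * (2^(k-1))^2, A(0) = 0."""
    return 0 if k == 0 else 7 * strassen_adds(k - 1) + 18 * 4 ** (k - 1)


def conventional_counts(k: int) -> tuple[int, int]:
    """(multiplications, additions) of the triple loop for n = 2^k."""
    n = 2 ** k
    return n ** 3, n * n * (n - 1)


if __name__ == "__main__":
    for k in range(1, 11):
        mults, adds = conventional_counts(k)
        print(f"n={2 ** k:5d}  Strassen {strassen_mults(k) + strassen_adds(k):>14,d}"
              f"  conventional {mults + adds:>14,d}")
    print(f"exponent log2(7) = {math.log2(7):.3f}")
```

With full recursion the Strassen total exceeds the conventional one for small n, which is one reason practical codes stop recursing early.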
D.H. Bailey, K. Lee, and H.D. Simon. Using Strassen's Algorithm to Accelerate the Solution of Linear Systems. Journal of Supercomputing, 4(4):357--371, Jan. 1991.
....an appropriate cutoff criterion for stopping the recursions early is crucial to obtaining competitive performance on matrices of practical size. Finally, excessive amounts of memory should not be required to store temporary results. Previous work addressing these issues can be found in [3, 5, 6, 4, 11, 12, 13, 14, 19, 28]. We note that two periods of work on Strassen's algorithm are seen here: an early period from 1969 to 1976, and a recent one from 1988 to the present. In this paper we expand on this previous work by developing detailed models of computation cost for multiple variants of Strassen's algorithm on ....
....many important aspects of Strassen's matrix multiplication algorithm for matrices of arbitrary size. The algorithm can be applied in a straightforward fashion to square matrices whose order is a power of 2, but issues arise for all other matrices. Earlier work addressing these issues can be found in [3, 5, 6, 4, 11, 12, 13, 14, 19, 28], but to our knowledge our development in this paper is the first to address the issues fully through both theoretical modeling and practical implementation. We begin in this section by reviewing the algorithm and discussing a framework for modeling its performance. The standard algorithm for ....
[Article contains additional citation context not shown here]
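One simple, if wasteful, answer to the arbitrary-size issue raised in this passage is static zero padding: embed an $m \times k$ by $k \times n$ product in $N \times N$ matrices, with $N$ the next power of two, and trim the result. The sketch below assumes a square power-of-two multiply passed in as `mult` (a hypothetical parameter; the conventional product is a placeholder). Less wasteful alternatives exist, for example handling an odd dimension by peeling off one row or column at each recursion level; they are not shown.

```python
Matrix = list[list[float]]


def classical(A: Matrix, B: Matrix) -> Matrix:
    """Placeholder for a square, power-of-two Strassen routine."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def next_pow2(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def pad(M: Matrix, size: int) -> Matrix:
    """Copy M into the top-left corner of a size x size zero matrix."""
    out = [[0.0] * size for _ in range(size)]
    for i, row in enumerate(M):
        out[i][:len(row)] = row
    return out


def padded_multiply(A: Matrix, B: Matrix, mult=classical) -> Matrix:
    """Multiply (m x k)(k x n) by zero-padding both operands to N x N."""
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    N = next_pow2(max(m, k, n))
    C = mult(pad(A, N), pad(B, N))
    # The zero blocks contribute nothing, so the leading m x n block is A @ B.
    return [row[:n] for row in C[:m]]
```

The cost of this simplicity is easy to see: a 65 × 65 product is padded to 128 × 128, almost doubling every dimension.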
D. H. Bailey, K. Lee, and H. D. Simon. Using Strassen's Algorithm to Accelerate the Solution of Linear Systems. Journal of Supercomputing, 4(5):357--371, 1990.
....182.7 to 166.8 seconds on a matrix of order n = 4028. The change would have no effect on the right-looking algorithm, since in all the matrices it multiplies at least one dimension is r, which was smaller than 192 in all the experiments. A similar experiment carried out by Bailey, Lee, and Simon [2] showed that Strassen's algorithm can accelerate LAPACK's right-looking LU factorization on a Cray Y-MP. The largest improvements in performance, however, occurred when large values of r were used. The fastest factorization of a matrix of order n = 2048, for example, was obtained with r = 512. ....
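The role of the block size r can be seen in a right-looking blocked LU factorization, where most of the work is the trailing update $A_{22} \leftarrow A_{22} - L_{21}U_{12}$, an $(n-j-r) \times r$ by $r \times (n-j-r)$ product. The sketch below is a minimal illustration, assuming no pivoting (LAPACK's routine uses partial pivoting, omitted here) and a hypothetical `mult` argument into which a Strassen-based product could be plugged. Since the inner dimension of every product it forms is at most r, a small r leaves little room for a fast multiply to help.

```python
Matrix = list[list[float]]


def classical(X: Matrix, Y: Matrix) -> Matrix:
    """Conventional product; a Strassen-based routine could be passed instead."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def blocked_lu(A: Matrix, r: int, mult=classical) -> Matrix:
    """Right-looking blocked LU without pivoting.

    Returns a copy of A overwritten with U (on and above the diagonal) and the
    multipliers of the unit lower triangular L (below the diagonal).
    """
    n = len(A)
    A = [[float(x) for x in row] for row in A]  # work on a copy
    for j in range(0, n, r):
        b = min(r, n - j)  # width of the current block column
        # 1. Unblocked LU of the panel A[j:n, j:j+b].
        for p in range(j, j + b):
            for i in range(p + 1, n):
                A[i][p] /= A[p][p]
                for c in range(p + 1, j + b):
                    A[i][c] -= A[i][p] * A[p][c]
        # 2. U12 = L11^{-1} A12 by forward substitution (unit diagonal).
        for p in range(j, j + b):
            for i in range(p + 1, j + b):
                for c in range(j + b, n):
                    A[i][c] -= A[i][p] * A[p][c]
        # 3. Trailing update A22 -= L21 @ U12: the product a fast multiply
        #    would take over. Its inner dimension is b <= r.
        if j + b < n:
            L21 = [row[j:j + b] for row in A[j + b:]]
            U12 = [row[j + b:] for row in A[j:j + b]]
            for i, prow in enumerate(mult(L21, U12), start=j + b):
                for c, v in enumerate(prow, start=j + b):
                    A[i][c] -= v
    return A
```

Without pivoting, every block size gives the same factors in exact arithmetic; only the shapes handed to `mult` change.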
D. H. Bailey, K. Lee, and H. D. Simon, Using Strassen's algorithm to accelerate the solution of linear systems, J. of Supercomputing, 4 (1990), pp. 357--371.
....numbers and the solution of dense eigenproblems, as discussed in Section 3. There has been much interest recently in the use of versions of the sequential Level 3 BLAS based on Strassen's matrix multiplication algorithm [45] as building blocks for linear algebra libraries, such as ScaLAPACK [8, 25, 36]. We intend to pursue the possible use of such routines in our future work. Strassen's algorithm reduces the computational complexity of multiplying two $N \times N$ matrices from $O(N^3)$ to $O(N^{2.807})$, though for variants of the algorithm the exponent is even smaller. Although Strassen's ....
.... as the traditional method, it is believed to be sufficiently stable for many applications, and this view is supported by numerical experiments [36]. It has been shown that on one processor of a Cray Y-MP the use of Strassen's method can speed up LU factorization for moderately sized matrices [8]. However, for parallel implementations on distributed memory architectures it is not clear which is the best approach, since Strassen's method favors larger block sizes, but this increases load imbalance in the panel factorization and triangular solve phases. Furthermore, a practical issue that ....
D. H. Bailey, K. Lee, and H. D. Simon. Using Strassen's algorithm to accelerate the solution of linear systems. J. Supercomputing, 4:357--371, 1990.
....iterative improvement of $x = A^{-1}b$, once $A^{-1}$ has been computed, is only an $O(n^2)$ process. Our use of Strassen's matrix inversion algorithm is motivated by the success of Strassen's matrix multiplication method [20], which has turned out to be useful on high performance computers [2, 4, 9, 16]. It is found that although the rounding error properties of this algorithm are less favorable than for conventional matrix multiplication, the algorithm is still useful as a Level 3 BLAS routine in many circumstances. As we shall see in §3.1, Strassen's matrix inversion algorithm is fundamentally ....
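Strassen's inversion algorithm referred to here works on a 2 × 2 block partition, with one recursive inverse of $A_{11}$, one of the negated Schur complement $A_{21}A_{11}^{-1}A_{12} - A_{22}$, and six matrix products. Below is a minimal sketch of those block formulas as they are commonly stated, assuming the order is a power of two and that every leading block and Schur complement met during the recursion is nonsingular (there is no pivoting); `mul` is a conventional placeholder where a fast product could be substituted.

```python
Matrix = list[list[float]]


def mul(X: Matrix, Y: Matrix) -> Matrix:
    """Conventional product; a Strassen-based routine could be used instead."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def sub(X: Matrix, Y: Matrix) -> Matrix:
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def strassen_inverse(A: Matrix) -> Matrix:
    """Inverse of an n x n matrix (n a power of two) by Strassen's block formulas."""
    n = len(A)
    if n == 1:
        return [[1.0 / A[0][0]]]
    h = n // 2
    A11 = [r[:h] for r in A[:h]]
    A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]
    A22 = [r[h:] for r in A[h:]]
    R1 = strassen_inverse(A11)   # A11^{-1}
    R2 = mul(A21, R1)            # A21 A11^{-1}
    R3 = mul(R1, A12)            # A11^{-1} A12
    R4 = mul(A21, R3)            # A21 A11^{-1} A12
    R5 = sub(R4, A22)            # -S, with S = A22 - A21 A11^{-1} A12
    R6 = strassen_inverse(R5)    # -S^{-1}
    C12 = mul(R3, R6)            # -A11^{-1} A12 S^{-1}
    C21 = mul(R6, R2)            # -S^{-1} A21 A11^{-1}
    R7 = mul(R3, C21)
    C11 = sub(R1, R7)            # A11^{-1} + A11^{-1} A12 S^{-1} A21 A11^{-1}
    C22 = [[-x for x in row] for row in R6]  # S^{-1}
    return ([a + b for a, b in zip(C11, C12)] +
            [a + b for a, b in zip(C21, C22)])
```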
D. H. Bailey, K. Lee & H. D. Simon, Using Strassen's algorithm to accelerate the solution of linear systems, J. Supercomputing 4 (1990).
....methods get acceptable answers. In other words, caveat emptor for either class of matrix-matrix multiplication in real codes. Stability discussions are contained in [2] and [5] along with their references. An interesting application to solving linear systems of equations is contained in [1] and [3]. This paper is actually interested in a highly portable Level 3 BLAS interface for computing $C \leftarrow \alpha \cdot \mathrm{op}(A)\,\mathrm{op}(B) + \beta \cdot C$ (2), where $\mathrm{op}(X)$ is $X$, $X$ transpose, $X$ conjugate transpose, or $X$ conjugate, and $\mathrm{op}(A)$ is $M \times K$, $\mathrm{op}(B)$ is $K \times N$, and $C$ is $M \times N$. Most of ....
D. H. Bailey, K. Lee, and H. Simon, Using Strassen's algorithm to accelerate the solution of linear systems, J. Supercomp., 4 (1990), pp. 357--371.
....acceptable answers. In other words, caveat emptor for either class of matrix-matrix multiplication in real codes. For a more complete discussion of this issue, see [3] and [8] along with their references. An interesting application to solving linear systems of equations is contained in [1] and [4]. This paper is actually interested in a highly portable Level 3 BLAS [6] interface for computing $C \leftarrow \alpha \cdot \mathrm{op}(A)\,\mathrm{op}(B) + \beta \cdot C$ (2), where $\mathrm{op}(X)$ is $X$, $X$ transpose, $X$ conjugate transpose, or $X$ conjugate, and $\mathrm{op}(A)$ is $M \times K$, $\mathrm{op}(B)$ is $K \times N$, and $C$ is $M \times N$. Most ....
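The interface in (2) can be mimicked in a few lines. The sketch below is illustrative only and is not the interface of the paper quoted here: the one-letter flags `'N'`, `'T'`, `'C'` follow the usual BLAS convention, `'R'` for "conjugate without transpose" is a label chosen for this sketch, and `mult` is a hypothetical hook for a Strassen-style product.

```python
Matrix = list[list[complex]]


def classical(X: Matrix, Y: Matrix) -> Matrix:
    """Conventional product; a Strassen-style routine could be passed as `mult`."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def op(X: Matrix, flag: str) -> Matrix:
    """op(X) = X ('N'), X^T ('T'), conjugate transpose ('C'), or conjugate ('R')."""
    if flag == "N":
        return [list(r) for r in X]
    if flag == "T":
        return [list(c) for c in zip(*X)]
    if flag == "C":
        return [[x.conjugate() for x in c] for c in zip(*X)]
    if flag == "R":
        return [[x.conjugate() for x in r] for r in X]
    raise ValueError(f"unknown op flag {flag!r}")


def gemm(alpha, A, flag_a, B, flag_b, beta, C, mult=classical):
    """Return alpha * op(A) op(B) + beta * C as a new matrix (C is not modified)."""
    opA, opB = op(A, flag_a), op(B, flag_b)
    M, K, N = len(opA), len(opB), len(opB[0])
    if any(len(r) != K for r in opA) or len(C) != M or any(len(r) != N for r in C):
        raise ValueError("op(A) must be M x K, op(B) K x N, and C M x N")
    P = mult(opA, opB)
    return [[alpha * p + beta * c for p, c in zip(pr, cr)] for pr, cr in zip(P, C)]
```

Keeping the transposition and scaling outside `mult` means a fast product only ever has to handle the plain case $\mathrm{op}(A)\,\mathrm{op}(B)$.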
D. H. Bailey, K. Lee, and H. Simon, Using Strassen's algorithm to accelerate the solution of linear systems, J. Supercomp., 4 (1990), pp. 357--371.
....establishing an appropriate cutoff criterion for stopping the recursions early is crucial to obtaining competitive performance on matrices of practical size. Finally, excessive amounts of memory should not be required to store temporary results. Earlier work addressing these issues can be found in [2, 3, 4, 5, 8, 9, 10, 11, 17, 19]. In this paper we report on our development of a general, efficient, and portable implementation of Strassen's algorithm that is usable in any program in place of calls to DGEMM, the Level 3 BLAS matrix multiplication routine. Careful consideration has been given to all of the issues mentioned ....
....one of the matrix dimensions is smaller than the optimal cutoff of 12 for square matrices. Therefore, at least theoretically, when considering rectangular matrices, the cutoff criterion (7) should be used instead of the simpler condition $m \le 12$ or $k \le 12$ or $n \le 12$, which has been used by others [3, 8]. Alternatively, instead of using the operation count model to predict the proper cutoff condition, one can empirically determine the appropriate cutoff in a manner very similar to the theoretical analysis. This will require a more complicated set of experiments for rectangular matrices than for ....
[Article contains additional citation context not shown here]
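To see why a rectangular criterion can differ from "stop if any dimension is at most 12", one can compare operation counts for a single level of recursion. The sketch below assumes Winograd's variant of Strassen's algorithm (7 half-size products and 15 block additions: 4 on blocks of $A$, 4 on blocks of $B$, 7 on blocks of $C$), conventional multiplication below that level, even dimensions, and equal cost for every flop. Under these assumptions one level pays off exactly when $mkn > 4(mk + kn + mn)$, whose square break-even point is $n = 12$, consistent with the cutoff quoted in the excerpt; whether this coincides with the paper's criterion (7) cannot be checked from the excerpt.

```python
def conventional_flops(m: int, k: int, n: int) -> int:
    """Flops for an (m x k)(k x n) product: m*n dot products of length k."""
    return m * n * (2 * k - 1)


def winograd_one_level_flops(m: int, k: int, n: int) -> int:
    """One level of Winograd's variant, conventional products below (m, k, n even)."""
    m2, k2, n2 = m // 2, k // 2, n // 2
    products = 7 * conventional_flops(m2, k2, n2)
    additions = 4 * m2 * k2 + 4 * k2 * n2 + 7 * m2 * n2  # blocks of A, B, C
    return products + additions


def recurse_by_model(m: int, k: int, n: int) -> bool:
    """True when one more level lowers the flop count: mkn > 4(mk + kn + mn)."""
    return m * k * n > 4 * (m * k + k * n + m * n)


def recurse_by_simple_rule(m: int, k: int, n: int, tau: int = 12) -> bool:
    """The simpler rule: stop as soon as any dimension is <= tau."""
    return min(m, k, n) > tau
```

For example, with m = 10 and k = n = 1000 the simple rule stops recursing, while the model still predicts a saving from one more level.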
D. H. Bailey, K. Lee, and H. D. Simon. Using Strassen's Algorithm to Accelerate the Solution of Linear Systems. Journal of Supercomputing, 4(5):357--371, 1990.
....is not just this asymptotic complexity but its reduction of the problem to smaller subproblems which eventually fit in fast memory; once the subproblems fit in fast memory, standard matrix multiplication may be used. This approach has led to speedups on relatively large matrices on some machines [15]. A drawback is the need for significant workspace and somewhat lower numerical stability, although it is adequate for many purposes [49, 104]. Given the complexity of optimizing the implementation of matrix multiplication, we cannot expect all other matrix algorithms to be equally optimized on ....
D. H. Bailey, K. Lee, and H. D. Simon. Using Strassen's algorithm to accelerate the solution of linear systems. J. Supercomputing, 4:357--371, 1991.
....than the conventional scheme. It has been demonstrated that Strassen's algorithm is now practical and in fact produces real speedups for matrices with dimensions larger than about 128 [2]. Further, Strassen's algorithm can be employed to accelerate a variety of linear algebra calculations [3, 9] by substituting a Strassen-based matrix multiply routine for the conventional matrix multiply routine in a LAPACK [1] implementation. If a Strassen-based flop count were adopted for computing the megaflops rate in the solution of a 16,000 × 16,000 linear system, the resulting rate would ....
....flop count were adopted for computing the megaflops rate in the solution of a 16,000 × 16,000 linear system, the resulting rate would have to be cut by roughly one third from the usual reckoning. In this vein, I myself must confess to citing potentially misleading performance figures in [3]. These articles include one-processor Cray-2 and Y-MP performance rates for some Strassen matrix routines. Following established custom, my co-authors and I computed megaflops rates based on the classical flop count for matrix multiplication ($2n^3$). But since the Strassen routines can produce ....
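The accounting issue raised here can be made concrete for a single matrix product. The sketch below assumes Strassen's original algorithm (18 block additions per level), recursion while the order is even and above a hypothetical `cutoff`, and the conventional count $n^2(2n-1)$ below it, and compares the result with the customary $2n^3$. It covers one product only, not the full linear-system solve discussed in the passage.

```python
def strassen_flops(n: int, cutoff: int) -> int:
    """Flops actually performed by Strassen's method on n x n matrices."""
    if n <= cutoff or n % 2:
        return n * n * (2 * n - 1)  # conventional: n^2 dot products of length n
    h = n // 2
    return 7 * strassen_flops(h, cutoff) + 18 * h * h


def reported_vs_actual(n: int, cutoff: int, seconds: float) -> tuple[float, float]:
    """(Mflop/s from the classical 2n^3 count, Mflop/s from the Strassen count)."""
    classical = 2 * n ** 3
    return classical / seconds / 1e6, strassen_flops(n, cutoff) / seconds / 1e6
```

For any timing, the two rates differ by the factor `strassen_flops(n, cutoff) / (2 * n**3)`, independent of the time measured.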
D. H. Bailey, K. Lee, and H. D. Simon, "Using Strassen's Algorithm to Accelerate the Solution of Linear Systems", Journal of Supercomputing, vol. 4, no. 4 (Jan. 1991), pp. 357--371.
No context found.
D. Bailey, K. Lee, and H. Simon, Using Strassen's algorithm to accelerate the solution of linear systems, The Journal of Supercomputing #, pp. 357--371, 1990.
No context found.
Bailey, D. H., Lee, K., and Simon, H. D. Using Strassen's algorithm to accelerate the solution of linear systems. (Manuscript), 1990.
CiteSeer.IST - Copyright Penn State and NEC