| C.H.Bischof. Adaptive Blocking in the QR Factorization. The Journal of Supercomputing, 3:193--208, 1989. |
....track of an iteration index and two indices equal to the first and last column the processor is working on. The sizes of the blocks to be updated depend on the size of the remaining matrix and the number of processors available. Near the end of the computation, a form of adaptive blocking is used [13]. As the remaining problem size decreases, both the number of processors and the sizes of the blocks being updated are decreased. A processor that will both update and factorize a block will update only the columns it will factor, i.e. last # first # jb # 1 in Figure 11. There is no fixed ....
C. Bischof, "Adaptive Blocking in the QR Factorization," J. Supercomputing 3, 193--208 (1989).
....machine would require its own special tables. Second, we could devise an automatic installation procedure which could run just a few benchmarks and automatically produce the necessary tables. Third, we could devise algorithms which tuned themselves at run time, choosing parameters automatically [9, 10]. The choice of method depends on the degree of portability we desire; we return to this in Section 6 below. Finally, we have determined that floating point exception handling impacts efficiency. Since overflow is a fatal exception on some machines, completely portable code must avoid it at all ....
C. Bischof. Adaptive blocking in the QR factorization. J. Supercomputing, 3(3):193--208, 1989.
....machine would require its own special tables. Second, we could devise an automatic installation procedure which could run just a few benchmarks and automatically produce the necessary tables. Third, we could devise algorithms which tuned themselves at run time, choosing parameters automatically [9, 10]. The choice of method depends on the degree of portability we desire; we return to this in section 6 below. Finally, we have determined that floating point exception handling impacts efficiency. Since overflow is a fatal exception on some machines, completely portable code must avoid it at all ....
C. Bischof. Adaptive blocking in the QR factorization. J. Supercomputing, 3(3):193--208, 1989.
....research projects have also been focusing on algorithms for matrix factorizations for DMM. For example, in the mid eighties non-block algorithms were discussed in [16, 17, 23, 24]. For more references, see [14]. The development of distributed block algorithms has started more recently (e.g. see [3, 13]). This paper is a contribution to the design, analysis, and evaluation of distributed block algorithms for some matrix factorizations which are efficient, and scalable in the sense that they preserve their maximal performance (measured in Mflops/node) when both the problem size and the number of ....
C. Bischof, "Adaptive Blocking in the QR Factorization", The Journal of Supercomputing, No. 3, Vol. 3, Kluwer Academic Publishers, (1989), pp 193-208.
....keep track of an iteration index and two indices equal to the first and last column the processor is working on. The size of the blocks to update depends on the size of the remaining matrix and the number of processors available. Near the end of the computation, a form of adaptive blocking is used [2]. As the remaining problem size decreases, both the number of processors and the sizes of the blocks being updated are decreased. A processor that will both update and factorize a block will only update the columns it will factor. There is no fixed synchronization in the algorithm; it is an ....
C. Bischof. Adaptive blocking in the QR factorization. The Journal of Supercomputing, 3:193--208, 1989.
....ERB CHGE CT92 0005) Householder reflections. Since these sequential algorithms have a high arithmetic complexity, the development of parallel algorithms is of considerable interest. Several parallel orthogonal factorization algorithms have been designed for various machines. We cite just a few: [3] for the Intel iPSC 1, [5] for the nCUBE 10, [9] for a network of transputers, [1] for the nCUBE 2, [2] for the CM 200, all of them for dense matrices; and [14] (CM 2), [13] (Fujitsu AP1000), [12] (Cray T3D) for sparse matrices. We have implemented the Givens method with column pivoting for ....
C.H.Bischof. Adaptive Blocking in the QR Factorization. The Journal of Supercomputing, 3:193--208, 1989.
....Householder reflections and Givens rotations. Since these sequential algorithms have a high arithmetic complexity, the development of parallel algorithms is of considerable interest. Several parallel orthogonal factorization algorithms have been designed for various machines. We cite just a few: [2] for the Intel iPSC 1, [3] for the nCUBE 10, [4] for a network of transputers, [5] for the nCUBE 2, [6] for the CM 200, all of them for dense matrices; and [7] (CM 2), [8] (Fujitsu AP1000), [9] (Cray T3D) for sparse matrices. We have implemented the MGS procedure with column pivoting for sparse ....
Bischof, C.H.: Adaptive Blocking in the QR Factorization. The Journal of Supercomputing, 3 (1989) 193--208
....only $\frac{4}{3} n^3$ flops, 9 times fewer [13, p. 248]. In fact, on current computer architectures with steep memory hierarchies, using the SVD may take over 15 times longer than QR decomposition. This is because the QR decomposition algorithm can be reorganized to exploit the memory hierarchy [3], but the conventional SVD algorithm is much less amenable to this reorganization. The SVD is usually computed in two phases: Phase I: Use orthogonal transformations to reduce A to an upper bidiagonal matrix: $A = (U_1 \; U_2) \begin{pmatrix} B \\ 0 \end{pmatrix} V^T$ (1.3), where $(U_1 \; U_2) \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}$ ....
C. Bischof. Adaptive blocking in the QR factorization. J. Supercomputing, 3(3):193--208, 1989.
....the size of the next panel dynamically, permitting a dynamic tradeoff of bandwidth, latency, and load balancing during the level 3 factorization. Van de Geijn utilizes the blocking of the data distribution to achieve a fixed panel size [10]. Bischof has also explored variable blocking algorithms [2, 3]. The following function calls implement LU factorization within Cdense: void lufactorCmatrixlvl2( Cluinfolvl2 LU, Cmatrix B, Cvector rhs) void lufactorCmatrixlvl3( Cluinfolvl3 LU, Cmatrix B, Cvector rhs) Both single (row replicated) and multiple right hand sides are supported. The ....
Christian H. Bischof. Adaptive blocking in the QR factorization. The Journal of Supercomputing, 3(3):193--208, 1989.
....research projects have also been focusing on algorithms for matrix factorizations for DMM. For example, in the mid eighties non-block algorithms were discussed in [15, 16, 22, 23]. For more references, see [13]. The development of distributed block algorithms has started more recently (e.g. see [3, 12]). This paper is a contribution to the design, analysis, and evaluation of distributed block algorithms for some matrix factorizations which are efficient, and scalable in the sense that they preserve their maximal performance when both the problem size and the number of processors increase. ....
C. Bischof, "Adaptive Blocking in the QR Factorization", The Journal of Supercomputing, No. 3, Vol. 3, Kluwer Academic Publishers, (1989), pp 193-208.
No context found.
C.H.Bischof. Adaptive Blocking in the QR Factorization. The Journal of Supercomputing, 3:193--208, 1989.
No context found.
Bischof, C.H.: Adaptive Blocking in the QR Factorization. The Journal of Supercomputing, 3 (1989) 193-208