18 citations found.
J. Choi, J. J. Dongarra, and D. W. Walker, The design and implementation of the ScaLAPACK LU, QR, and Cholesky routines, Technical Report, ORNL/TM-12470, Oak Ridge National Laboratory, September, 1994.

This paper is cited in the following contexts:
P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj   (Correct)

....a code for quantum mechanical calculations of solids. We compare predicted against measured performance parameters for number of transfers, amount of data transferred, transfer times, and work distribution for changing problem and machine sizes. 5.1 Cholesky Factorization. Cholesky factorization [9] factors an $n \times n$, symmetric, positive definite matrix into the product of a lower triangular matrix $L$ and its transpose, i.e. $A = LL^T$ (or $A = U^T U$, where $U$ is upper triangular). It is assumed that the lower triangular portion of $A$ is stored in the lower triangle of a two dimensional ....

J. Choi, J. J. Dongarra, S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley. The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines. Report ORNL/TM-12470, Oak Ridge National Laboratory, Oak Ridge, TN, 1994. LAPACK Working Note 80.


P³T+: A Performance Estimator for Distributed and Parallel.. - Pozgaj, Fahringer (2000)   (Correct)

.... HPF DISTRIBUTE (CYCLIC, ONTO PR : A . A = 2 N
      DO 10 I=1,N
        A(I,I) = SQRT(A(I,I))
        A(I+1:N,I) = A(I+1:N,I) / A(I,I)
        DO 20 K=I+1,N
        DO 20 J=I+1,N
          IF (K .GE. J) THEN
            A(K,J) = A(K,J) - A(K,I) * A(J,I)
          ENDIF
 20     CONTINUE
 10   CONTINUE
Figure 4.6: Cholesky factorization
Cholesky factorization [16] factors an $n \times n$, symmetric, positive definite matrix into the product of a lower triangular matrix $L$ and its transpose, i.e. $A = LL^T$ (or $A = U^T U$, where $U$ is upper triangular). It is assumed that the lower triangular portion of $A$ is stored in the lower triangle ....

J. Choi, J. J. Dongarra, S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley. The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines. Report ORNL/TM-12470, Oak Ridge National Laboratory, Oak Ridge, TN, 1994. LAPACK Working Note 80.
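
The Cholesky loop quoted in the context above can be sketched in Python as follows. This is a minimal serial illustration under stated assumptions, not the HPF code of the citing paper: the function name `cholesky_lower` and the list-of-lists storage are hypothetical choices made for this example, and no data distribution is modeled.

```python
import math


def cholesky_lower(a):
    """Return the lower triangular factor L with a = L * L^T.

    `a` is a symmetric positive definite matrix given as a list of lists.
    Only its lower triangle is read; the input is not modified.
    """
    n = len(a)
    # Copy the lower triangle of `a`; the strict upper triangle is set to zero.
    l = [[float(a[i][j]) if j <= i else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        if l[i][i] <= 0.0:
            raise ValueError("matrix is not positive definite")
        # Diagonal entry of column i.
        l[i][i] = math.sqrt(l[i][i])
        # Scale the part of column i below the diagonal.
        for k in range(i + 1, n):
            l[k][i] /= l[i][i]
        # Update the trailing lower triangle (only entries with k >= j).
        for k in range(i + 1, n):
            for j in range(i + 1, k + 1):
                l[k][j] -= l[k][i] * l[j][i]
    return l
```

Calling `cholesky_lower([[4.0, 2.0], [2.0, 3.0]])` returns a 2 x 2 lower triangular factor; multiplying it by its transpose reproduces the input up to rounding.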


Evaluation of P³T+: A Performance Estimator.. - Fahringer, Pozgaj, .. (1999)   (Correct)

....a code for quantum mechanical calculations of solids. We compare predicted against measured performance parameters for number of transfers, amount of data transferred, transfer times, and work distribution for changing problem and machine sizes. 5.1 Cholesky Factorization. Cholesky factorization [7] factors an $n \times n$, symmetric, positive definite matrix into the product of a lower triangular matrix $L$ and its transpose, i.e. $A = LL^T$ (or $A = U^T U$, where $U$ is upper triangular). It is assumed that the lower triangular portion of $A$ is stored in the lower triangle of a two dimensional array ....

J. Choi, J. J. Dongarra, S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley. The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines. Report ORNL/TM-12470, Oak Ridge National Laboratory, Oak Ridge, TN, 1994. LAPACK Working Note 80.


Performance Bottlenecks and Potentials of Parallel.. - Yan, Du, Zhang, Zhang (1997)   (Correct)

....3 Experimental environments and applications. 3.1 Application programs. We selected three programs from the NAS parallel benchmarks [2]: EP, MG, and IS; four numerical applications: LU decomposition (LU), matrix multiplication (MM), merge sorting (MS), and Cholesky factorization (Cholesky) [4]; and an edge detection (ED) [12] from image processing. 6 Kernel EP (Embarrassing Parallel) represents the computation and data movement characteristics of large scale computational fluid dynamics applications. It executes $2^n$ iterations of a loop, where a pair of random numbers are generated ....

J. Choi, J. J. Dongarra, and D. W. Walker, The design and implementation of the ScaLAPACK LU, QR, and Cholesky routines, Technical Report, ORNL/TM-12470, Oak Ridge National Laboratory, September, 1994.


Experience with industrial applications on MIMD machines - Bjørstad   (Correct)

....with a multi layered software library design supporting a higher level application code. The ScaLAPACK software [6] is a distributed memory implementation of some important routines from the well known library LAPACK [1]. The implementation is based on a set of communication subprograms, BLACS [9], [10], while the numerical kernels running on each node are the familiar BLAS [7], [8], [15], [16]. The design of LAPACK addresses data locality, thus the software is well suited to modern RISC processor architectures with a memory hierarchy. In order to achieve performance one should then focus ....

J. Dongarra, R. A. van de Geijn, and R. C. Whaley, A users' guide to the blacs, tech. report, Oak Ridge National Laboratory, December 1993.


Experience with industrial applications on MIMD machines - Bjørstad   (Correct)

....Norway. Figure 3: An offshore oil production platform analysis using SESAM. Parallel execution can take place at a coarse level between different substructures, but also at a finer level within the Schur complement reduction of each substructure. In Europort this will be done by using ScaLAPACK [6] for the parallel execution of a set of matrix algorithms. Several previous projects carried out at Parallab have addressed these issues individually [2], [3], [4], [14]. The current project will result in an integrated, two level parallel code. Such a code will be portable across a range of ....

....fine grain level of parallelism in both SESAM and SWAN on ScaLAPACK, one of the important tasks is the implementation of this package on the Parsytec machine. This approach is consistent with a multi layered software library design supporting a higher level application code. The ScaLAPACK software [6] is a distributed memory implementation of some important routines from the well known library LAPACK [1]. The implementation is based on a set of communication subprograms, BLACS [9], [10], while the numerical kernels running on each node are the familiar BLAS [7], [8], [15], [16]. The design of ....

J. Choi, J. Dongarra, D. W. Walker, and R. C. Whaley, Scalapack reference manual, Tech. Report ORNL/TM-12470, Oak Ridge National Laboratory, April 1994.


The Dypac System: A Dynamic Processor Allocation and.. - Schikuta (1993)   (Correct)

....because of the variety of the available parallel hardware architectures and the accompanying system software. The frameworks of the proprietary message passing packages of the different parallel hardware architectures differ considerably. A number of packages exist (e.g. Express [Parasoft90], PVM [Dongarra91][Sunderam90], Zipcode [Skjellum92], or the MPI initiative [MPIF93]) which try to provide a common platform for different architectures; but generally they lack in availability, conciseness and simplicity. A quite comprehensive survey of parallel programming tools can be found in [Cheng93]. In this ....

Dongarra J.J., Geist G.A., Manchek R., Sunderam V.S., A user's guide to PVM, Techn. Rep. No. ORNL/TM-11826, Oak Ridge National Laboratory, July 1991


Recent Developments in Dense Numerical Linear Algebra - Higham (2000)   Self-citation (Report)   (Correct)

....methods described later in this paper. A C translation of LAPACK is available, as well as a C wrapper for most of the Fortran version of LAPACK [49]. For more about LAPACK, see the users guide [2]. ScaLAPACK is a subset of LAPACK routines redesigned for distributed memory parallel computers [27], [55]. ScaLAPACK routines make use of BLAS, Parallel BLAS (distributed memory versions of the level 2 and 3 BLAS) and a set of low level communication primitives called the BLACS. For solving square linear systems, implementations of partitioned LU and Cholesky factorization are provided that use ....

Jaeyoung Choi, Jack J. Dongarra, Susan Ostrouchov, Antoine P. Petitet, David W. Walker, and R. Clint Whaley. The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines. Report ORNL/TM-12470, Oak Ridge National Laboratory, Oak Ridge, TN, USA, September 1994. 26 pp. LAPACK Working Note 80.


The Design and Implementation of the Parallel Out-of-Core.. - D'Azevedo, Dongarra (1997)   (4 citations)  Self-citation (Dongarra Design Lu Cholesky Tm-)   (Correct)

....$= n_0$ is always chosen where most of the computation is the updating of $A_{22} \leftarrow A_{22} - L_{21} U_{12}$. A left-looking variant results if $k = n - n_0$. The in-core ScaLAPACK factorization routines for LU, QR and Cholesky factorization, use a right-looking variant for good load balancing [1]. Other work has shown [2, 3] that for an out-of-core factorization, a left-looking variant generates less I/O volume compared to the right-looking variant. Toledo [5] shows that the recursively partitioned algorithm ($k = n/2$) may be more efficient than the left-looking variant when a very large ....

J. CHOI, J. J. DONGARRA, L. S. OSTROUCHOV, A. P. PETITET, D. W. WALKER, AND R. C. WHALEY, The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines, Tech. Report ORNL/TM-12470, Oak Ridge National Laboratory, 1994.
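
The right-looking variant mentioned in the context above, in which most of the work is the trailing update $A_{22} \leftarrow A_{22} - L_{21} U_{12}$, can be sketched as follows. This is a serial illustration under simplifying assumptions: it omits the pivoting that a production LU routine would need, it does not model any data distribution, and the name `lu_right_looking` and the block-size parameter `nb` are hypothetical.

```python
def lu_right_looking(a, nb):
    """Blocked right-looking LU factorization without pivoting.

    Returns a new matrix holding U on and above the diagonal and the
    strictly lower part of the unit lower triangular L below it.
    `a` is a square list of lists; `nb` is the panel width.
    """
    if nb < 1:
        raise ValueError("block size must be positive")
    n = len(a)
    a = [[float(x) for x in row] for row in a]  # work on a copy
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Step 1: unblocked factorization of the panel (columns k0..k1-1).
        for j in range(k0, k1):
            pivot = a[j][j]
            if pivot == 0.0:
                raise ZeroDivisionError("zero pivot; this sketch does not pivot")
            for i in range(j + 1, n):
                a[i][j] /= pivot
                for c in range(j + 1, k1):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 2: U12 = inverse(L11) * A12 by forward substitution
        # (L11 has a unit diagonal, so no division is needed).
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                for c in range(k1, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 3: trailing update A22 <- A22 - L21 * U12.
        for i in range(k1, n):
            for c in range(k1, n):
                s = 0.0
                for j in range(k0, k1):
                    s += a[i][j] * a[j][c]
                a[i][c] -= s
    return a
```

The trailing update in step 3 touches every remaining row and column at each panel step, which is the property the quoted context associates with good load balancing. A left-looking variant would instead defer updates and apply all previous panels to the current one just before factoring it.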


Installation Guide for ScaLAPACK - Choi, Dongarra, Ostrouchov, Petitet, .. (1995)   (2 citations)  Self-citation (Choi Dongarra Walker)   (Correct)

No context found.

J. Choi, J. J. Dongarra, and D. W. Walker, PUMMA Reference Manual, Technical Report ORNL/TM-12494, Oak Ridge National Laboratory, Mathematical Sciences Section, Oak Ridge, Tennessee, (in preparation) 1993.


An MPI Implementation of the BLACS - Deshpande, Sawyer, Walker (1996)   (1 citation)  Self-citation (Walker)   (Correct)

....paper an MPI [9] implementation of the Basic Linear Algebra Communication Subprograms (BLACS) is presented. The BLACS are message passing routines that communicate matrices among processes arranged in a two-dimensional virtual process topology. It forms the basic communication layer for ScaLAPACK [2, 1]. MPI provides the most suitable message passing layer for BLACS, since it is widely available, has high level functionality to support the BLACS communication semantics as discussed in [4] and also has several advantages over other available communication libraries like PVM [5]. This ....

....on the Intel Paragon, an i860 XP S22MP model with three processors per node with 64MB memory. Our MPI BLACS library was compared against the native Intel NX BLACS library for optimum block size. The Paragon performance of both these libraries is significantly less than that reported in [1] for reasons which are not known. For the optimal mesh configurations, performance of the MPI BLACS and NX BLACS version deviated only slightly, indicating that the overhead of NMPI, which is based on NX, is small. Non optimal mesh sizes revealed a much larger discrepancy in performance (see ....

J. Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley. The Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines. Technical Report ORNL/TM-12470, Oak Ridge National Laboratory, Sept. 1994.
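
The two-dimensional virtual process topology described in the context above can be illustrated with a small sketch that maps a linear process rank to grid coordinates. Row-major ordering is an assumption made for this example, and the helper names `grid_coords` and `grid_rank` are hypothetical; they are not BLACS or MPI routines.

```python
def grid_coords(rank, nprow, npcol):
    """Map a linear rank to (process row, process column), row-major order."""
    if not 0 <= rank < nprow * npcol:
        raise ValueError("rank outside the process grid")
    return divmod(rank, npcol)


def grid_rank(prow, pcol, nprow, npcol):
    """Inverse mapping: (process row, process column) back to a linear rank."""
    if not (0 <= prow < nprow and 0 <= pcol < npcol):
        raise ValueError("coordinates outside the process grid")
    return prow * npcol + pcol
```

On a 2 x 3 grid, for instance, rank 5 maps to process row 1 and process column 2, and `grid_rank(1, 2, 2, 3)` maps back to 5.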


The Design and Implementation of the Parallel Out-of-Core.. - D'Azevedo, Dongarra (1997)   (4 citations)  Self-citation (Dongarra)   (Correct)

....0 is always chosen where most of the computation is the updating of $A_{22} \leftarrow A_{22} - L_{21} U_{12}$. A left-looking variant results if $k = n - n_0$. The in-core ScaLAPACK factorization routines for LU, QR and Cholesky factorization, all use a right-looking variant for good load balancing [1]. Other work has shown [2, 3] that for out-of-core factorization, a left-looking variant generates less I/O volume compared to the right-looking variant. Toledo [5] shows that the recursively partitioned algorithm ($k = n/2$) may be more efficient than the left-looking variant for very large matrices ....

J. Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley, The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines, Tech. Report ORNL/TM-12470, Oak Ridge National Laboratory, 1994.


An MPI Implementation of the BLACS - Deshpande, Sawyer, Walker (1996)   (1 citation)  Self-citation (Walker)   (Correct)

No context found.

Jaeyoung Choi, Jack J. Dongarra, L. Susan Ostrouchov, Antoine P. Petitet, David W. Walker, and R. Clint Whaley. The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines. Technical Report ORNL/TM-12470, Oak Ridge National Laboratory, September 1994.


The Design and Implementation of the Parallel Out-of-Core.. - D'Azevedo, Dongarra (1997)   (4 citations)  Self-citation (Dongarra)   (Correct)

....$= n_0$ is always chosen where most of the computation is the updating of $A_{22} \leftarrow A_{22} - L_{21} U_{12}$. A left-looking variant results if $k = n - n_0$. The in-core ScaLAPACK factorization routines for LU, QR and Cholesky factorization, use a right-looking variant for good load balancing [1]. Other work has shown [2, 3] that for an out-of-core factorization, a left-looking variant generates less I/O volume compared to the right-looking variant. Toledo [5] shows that the recursively partitioned algorithm ($k = n/2$) may be more efficient than the left-looking variant when a very large ....

J. Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley, The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines, Tech. Report ORNL/TM-12470, Oak Ridge National Laboratory, 1994.


Modeling and Characterizing Parallel Computing Performance on - Heterogeneous Networks Of   (Correct)

No context found.

J. Choi, J. J. Dongarra, and D. W. Walker, The design and implementation of the ScaLAPACK LU, QR, and Cholesky routines, Technical Report, ORNL/TM-12470, Oak Ridge National Laboratory, September, 1994.


P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj (2001)   (Correct)

No context found.

J. Choi, J. J. Dongarra, S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley. The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines. Report ORNL/TM-12470, Oak Ridge National Laboratory, Oak Ridge, TN, 1994. LAPACK Working Note 80.


Development of Parallel BLAS with ARCH Object-Oriented Parallel.. - Adamo (1994)   (Correct)

No context found.

J. Choi et al. PB-BLAS Reference Manual. Oak Ridge National Laboratory, March, 1994.


High Performance Fortran Interfacing to ScaLAPACK - Lorenzo, Müller, Murakami.. (1996)   (4 citations)  (Correct)

No context found.

Jaeyoung Choi, Jack J. Dongarra, L. Susan Ostrouchov, Antoine P. Petitet, David W. Walker, and R. Clint Whaley. The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines. Technical Report ORNL/TM-12470, Oak Ridge National Laboratory, September 1994.

CiteSeer.IST - Copyright Penn State and NEC