## Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy

Venue: ACM Trans. Math. Softw.

Citations: 20 (1 self)

### Citations

2282 | Iterative Methods for Sparse Linear Systems - Saad - 1996
Citation Context: ...the memory requirements become prohibitively high and direct sparse methods are no longer feasible. Iterative methods are a remedy because only a few working vectors and the primary data are required [8, 37]. Two popular iterative solvers on which we will illustrate the techniques addressed in this paper are the Conjugate Gradient (CG) method (for symmetric and positive definite matrices) and the General...

1483 | The Algebraic Eigenvalue Problem - Wilkinson - 1965
Citation Context: ...tive Refinement The iterative refinement technique is a well known method that has been extensively studied and applied in the past. A fully detailed description of this method can be found elsewhere [13, 28, 43, 51, 9]. The iterative refinement approach has been used in the past to improve the accuracy of linear systems’ solutions and it is shown in Algorithm 3. Algorithm 3 The iterative refinement method for the s...
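The iterative refinement scheme referenced in this context is simple enough to sketch. Below is an illustrative mixed precision variant in Python/NumPy, not the paper's code: the factorization and triangular solves run in single precision, while residuals and corrections are accumulated in double precision (a dense matrix is assumed here for brevity; the paper applies the same idea to sparse factorizations).

```python
# Sketch of mixed precision iterative refinement for A x = b.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_refine(A, b, tol=1e-12, max_iter=30):
    A32 = A.astype(np.float32)
    lu = lu_factor(A32)                         # O(n^3) work done in single
    x = lu_solve(lu, b.astype(np.float32)).astype(np.float64)
    nb = np.linalg.norm(b)
    for _ in range(max_iter):
        r = b - A @ x                           # residual in double precision
        if np.linalg.norm(r) <= tol * nb:
            break
        z = lu_solve(lu, r.astype(np.float32))  # cheap O(n^2) single solve
        x = x + z.astype(np.float64)            # correction applied in double
    return x
```

Provided the matrix is not too badly conditioned, the iterate converges to the accuracy of a full double precision solve while the dominant cost stays in single precision.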

1156 | Accuracy and Stability of Numerical Algorithms - Higham - 2002
Citation Context: ...and only resorting to double precision at critical stages, while attempting to provide the full double precision accuracy. This technique is supported by the well known theory of iterative refinement [13, 28], which has been successfully applied to the solution of dense linear systems [30]. This work is an extension of the work by Langou et al. [30] to the case of sparse linear systems, covering both dire...

1005 | Multi-Grid Methods and Applications - Hackbusch - 1985
Citation Context: ...n is spent on the preconditioner. For example, a simple diagonal preconditioner may not benefit from it, while a domain decomposition-based [35], block diagonal preconditioner, or a multigrid V-cycle [27], may benefit. Also, multigrid-based solvers may benefit both in speed (as the bulk of the computation is in their V/W-cycles) and memory requirements. An example of successful application of this typ...

773 | Applied Numerical Linear Algebra - Demmel - 1997
Citation Context: ...and only resorting to double precision at critical stages, while attempting to provide the full double precision accuracy. This technique is supported by the well known theory of iterative refinement [13, 28], which has been successfully applied to the solution of dense linear systems [30]. This work is an extension of the work by Langou et al. [30] to the case of sparse linear systems, covering both dire...

622 | Templates for the Solution of Linear Systems - Barrett, Berry, et al. - 1993
Citation Context: ...orth further exploration is to use a truncated version of GMRES [39]. Another interesting approach is self adaptivity [16]. Here, to do a fair comparison, we ran it for m = 25, 50 (PETSc’s default [7]), 100, 150, 200, and 300, and chose the best execution time. Experiments show that the mixed precision method suggested is stable in regard to changing the restart values in the inner and outer loops...

428 | Domain Decomposition Methods for Partial Differential Equations - Quarteroni, Valli - 1999
Citation Context: ...ing speed, depends on what percent of the overall computation is spent on the preconditioner. For example, a simple diagonal preconditioner may not benefit from it, while a domain decomposition-based [35], block diagonal preconditioner, or a multigrid V-cycle [27], may benefit. Also, multigrid-based solvers may benefit both in speed (as the bulk of the computation is in their V/W-cycles) and memory re...

357 | A flexible inner-outer preconditioned GMRES algorithm - Saad - 1993
Citation Context: ...arithmetic. The robustness of variations of this nesting of iterative methods, also known in the literature as inner-outer iteration, has been studied before, both theoretically and computationally [24, 36, 41, 6, 34, 50, 48]. The general appeal of these methods is that computational speedup is possible when the inner solver uses an approximation to the original matrix that is also faster to apply. Moreover, even if no fa...

305 | The multifrontal solution of indefinite sparse symmetric linear systems - Duff, Reid - 1983
Citation Context: ...inear systems, covering both direct and iterative solvers. 2 Sparse Direct and Iterative Solvers Most sparse direct methods for solving linear systems of equations are variants of either multifrontal [18] or supernodal [5] factorization approaches. There is a number of freely available packages that implement these methods. We have chosen for our tests the software package MUMPS [1, 2, 3] as the repre...

284 | An Introduction to Continuum Mechanics - Gurtin - 1981

268 | Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods - Barrett, Berry, et al. - 1994

266 | A fully asynchronous multifrontal solver using distributed dynamic scheduling - Amestoy, Duff, et al. - 2001
Citation Context: ...ther multifrontal [18] or supernodal [5] factorization approaches. There is a number of freely available packages that implement these methods. We have chosen for our tests the software package MUMPS [1, 2, 3] as the representative of the multifrontal approach and SuperLU [32, 14, 15, 31] for the supernodal approach. Our main reason for selecting these two software packages is that they are implemented in ...

261 | A supernodal approach to sparse partial pivoting - Demmel, Eisenstat, et al. - 1999
Citation Context: ...is a number of freely available packages that implement these methods. We have chosen for our tests the software package MUMPS [1, 2, 3] as the representative of the multifrontal approach and SuperLU [32, 14, 15, 31] for the supernodal approach. Our main reason for selecting these two software packages is that they are implemented in both single and double precision, which is not the case for other freely availab...

185 | Multifrontal parallel distributed symmetric and unsymmetric solvers - Amestoy, Duff, et al. - 2000
Citation Context: ...ther multifrontal [18] or supernodal [5] factorization approaches. There is a number of freely available packages that implement these methods. We have chosen for our tests the software package MUMPS [1, 2, 3] as the representative of the multifrontal approach and SuperLU [32, 14, 15, 31] for the supernodal approach. Our main reason for selecting these two software packages is that they are implemented in ...

152 | An unsymmetric-pattern multifrontal method for sparse LU factorization - Davis, Duff - 1997

144 | SuperLU DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems - Li, Demmel - 2003

143 | Hybrid scheduling for the parallel solution of linear systems - Amestoy, Guermouche, et al. - 2006
Citation Context: ...ther multifrontal [18] or supernodal [5] factorization approaches. There is a number of freely available packages that implement these methods. We have chosen for our tests the software package MUMPS [1, 2, 3] as the representative of the multifrontal approach and SuperLU [32, 14, 15, 31] for the supernodal approach. Our main reason for selecting these two software packages is that they are implemented in ...

114 | A combined unifrontal/multifrontal method for unsymmetric sparse matrices - Davis, Duff - 1999
Citation Context: ...h. Our main reason for selecting these two software packages is that they are implemented in both single and double precision, which is not the case for other freely available solvers such as UMFPACK [19, 11, 12]. Fill-ins, and the associated memory requirements, are inherent for direct sparse methods. And although there are various reordering techniques designed to minimize the amount of these fill-ins, for ...

99 | A column pre-ordering strategy for the unsymmetric-pattern multifrontal method - Davis - 2004
Citation Context: ...h. Our main reason for selecting these two software packages is that they are implemented in both single and double precision, which is not the case for other freely available solvers such as UMFPACK [19, 11, 12]. Fill-ins, and the associated memory requirements, are inherent for direct sparse methods. And although there are various reordering techniques designed to minimize the amount of these fill-ins, for ...

95 | An asynchronous parallel supernodal algorithm for sparse Gaussian elimination - Gilbert, Demmel, et al. - 1999
Citation Context: ...is a number of freely available packages that implement these methods. We have chosen for our tests the software package MUMPS [1, 2, 3] as the representative of the multifrontal approach and SuperLU [32, 14, 15, 31] for the supernodal approach. Our main reason for selecting these two software packages is that they are implemented in both single and double precision, which is not the case for other freely availab...

93 | Self adapting linear algebra algorithms and software - Demmel - 2005
Citation Context: ...n time, and sometimes the convergence can get even worse [20]. An alternative worth further exploration is to use a truncated version of GMRES [39]. Another interesting approach is self adaptivity [16]. Here, to do a fair comparison, we ran it for m = 25, 50 (PETSc’s default [7]), 100, 150, 200, and 300, and chose the best execution time. Experiments show that the mixed precision method suggested i...
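The restart-size comparison quoted in this context can be reproduced in a few lines. The sketch below uses SciPy's restarted GMRES; the function name and the test matrix are illustrative, not the paper's setup.

```python
# Run restarted GMRES for several restart lengths m and keep the fastest
# converged run, mirroring the "fair comparison" procedure quoted above.
import time
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres

def best_restart(A, b, restarts=(25, 50, 100, 150, 200, 300)):
    best = None
    for m in restarts:
        t0 = time.perf_counter()
        x, info = gmres(A, b, restart=m, maxiter=5000)
        elapsed = time.perf_counter() - t0
        if info == 0 and (best is None or elapsed < best[1]):
            best = (m, elapsed, x)  # keep the fastest converged run
    return best
```

Timing-based selection like this is of course machine dependent; the point of the quoted experiment is that the mixed precision scheme remained stable across all these restart values.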

85 | Computer Solution of Linear Algebraic Systems - Forsythe, Moler - 1967
Citation Context: ...s 4 and 6 are performed in double precision. We are not aware of related work [45, 44, 22] that uses our exact approach. The error analysis for the mixed precision, iterative refinement, explained in [33, 21, 23], shows that using this approach it is possible to achieve the same accuracy as if the system was solved in full double precision arithmetic, provided that the matrix is not too badly conditioned. Fro...

81 | GMRES: A Generalized Minimal Residual Method for Solving Nonsymmetric Linear Systems - Saad, Schultz - 1986
Citation Context: ... techniques addressed in this paper are the Conjugate Gradient (CG) method (for symmetric and positive definite matrices) and the Generalized Minimal Residual (GMRES) method for nonsymmetric matrices [38]. The preconditioned versions of the two algorithms are given correspondingly in Algorithms 1 and 2 below with the descriptions that follow the standard notation [8, 37]. The preconditioners, denoted ...

72 | Theory of inexact Krylov subspace methods and applications to scientific computing - Simoncini, Szyld - 2003
Citation Context: ...kes PCG single to do a fixed (e.g., 0.3) relative reduction for the initial residual r0. Work on criteria to compute the (variable) number of inner iterations guaranteeing convergence can be found in [40]. Algorithm 5 PCG PCG ( b, xo, Etol, ... ) ... 3: PCG single ( ri−1, zi−1, NumIters, ... ) ... Algorithm 6 PCG single ( b, x, NumIters, ... ) 1: r0 = b; x0 = 0 2: for i = 1 to NumIters do ... 15: [che...
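The inner-outer scheme quoted in this context (PCG calling a single-precision PCG for a fixed relative residual reduction) can be sketched as follows. This is an illustrative sketch with hypothetical names, not the paper's code; the inner solver is plain, unpreconditioned CG for brevity.

```python
# Inner-outer iteration in mixed precision: the inner CG runs in float32
# until a fixed relative residual reduction (e.g., 0.3); the outer loop
# computes residuals and applies corrections in float64.
import numpy as np

def cg_single(A32, b32, rel_reduction=0.3, max_iter=1000):
    x = np.zeros_like(b32)
    r = b32.copy()
    p = r.copy()
    rr = float(r @ r)
    target = rel_reduction * np.sqrt(rr)   # stop at a fixed relative reduction
    for _ in range(max_iter):
        Ap = A32 @ p
        alpha = rr / float(p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = float(r @ r)
        if np.sqrt(rr_new) <= target:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def mixed_pcg(A, b, tol=1e-12, max_outer=100):
    A32 = A.astype(np.float32)
    x = np.zeros_like(b)
    nb = np.linalg.norm(b)
    for _ in range(max_outer):
        r = b - A @ x                       # outer residual in double
        if np.linalg.norm(r) <= tol * nb:
            break
        z = cg_single(A32, r.astype(np.float32))
        x = x + z.astype(np.float64)        # correction applied in double
    return x
```

Because the inner solver only needs a fixed relative reduction, its limited single precision accuracy does not cap the final accuracy; the double precision outer loop drives the residual down to the requested tolerance.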

71 | GMRESR: a family of nested GMRES methods - Vorst, Vuik - 1994

68 | Inexact preconditioned conjugate gradient method with inner-outer iteration - Golub, Ye - 1999
Citation Context: ...arithmetic. The robustness of variations of this nesting of iterative methods, also known in the literature as inner-outer iteration, has been studied before, both theoretically and computationally [24, 36, 41, 6, 34, 50, 48]. The general appeal of these methods is that computational speedup is possible when the inner solver uses an approximation to the original matrix that is also faster to apply. Moreover, even if no fa...

64 | Flexible conjugate gradient - Notay - 2000
Citation Context: ...arithmetic. The robustness of variations of this nesting of iterative methods, also known in the literature as inner-outer iteration, has been studied before, both theoretically and computationally [24, 36, 41, 6, 34, 50, 48]. The general appeal of these methods is that computational speedup is possible when the inner solver uses an approximation to the original matrix that is also faster to apply. Moreover, even if no fa...

54 | Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems) - Langou, Luszczek, et al. - 2006
Citation Context: ... the full double precision accuracy. This technique is supported by the well known theory of iterative refinement [13, 28], which has been successfully applied to the solution of dense linear systems [30]. This work is an extension of the work by Langou et al. [30] to the case of sparse linear systems, covering both direct and iterative solvers. 2 Sparse Direct and Iterative Solvers Most sparse direct...

46 | A black box generalized conjugate gradient solver with inner iterations and variable-step preconditioning - Axelsson, Vassilevski - 1991

40 | Matrix Algorithms - Stewart - 2001
Citation Context: ...tive Refinement The iterative refinement technique is a well known method that has been extensively studied and applied in the past. A fully detailed description of this method can be found elsewhere [13, 28, 43, 51, 9]. The iterative refinement approach has been used in the past to improve the accuracy of linear systems’ solutions and it is shown in Algorithm 3. Algorithm 3 The iterative refinement method for the s...

39 | Sparse Gaussian Elimination on High Performance Computers - Li - 1996
Citation Context: ...is a number of freely available packages that implement these methods. We have chosen for our tests the software package MUMPS [1, 2, 3] as the representative of the multifrontal approach and SuperLU [32, 14, 15, 31] for the supernodal approach. Our main reason for selecting these two software packages is that they are implemented in both single and double precision, which is not the case for other freely availab...

34 | Flexible inner-outer Krylov subspace methods - Simoncini, Szyld

31 | Accelerating double precision FEM simulations with GPUs - Göddeke, Strzodka, et al. - 2005
Citation Context: ... mixed precision, iterative refinement where the most expensive steps, 1 and 5, are performed in single precision and steps 4 and 6 are performed in double precision. We are not aware of related work [45, 44, 22] that uses our exact approach. The error analysis for the mixed precision, iterative refinement, explained in [33, 21, 23], shows that using this approach it is possible to achieve the same accuracy a...

31 | Self-adapting numerical software for next generation applications - Dongarra, Eijkhout - 2002
Citation Context: ...fact, an estimate (up to the order of magnitude) of the condition number (often available from previous runs or the physical problem properties) may become an input parameter to an adaptive algorithm [17] that attempts to utilize the fastest hardware available, if its limited precision can guarantee convergence. Also, the methods for sparse eigenvalue problems that result in Lanczos and Arnoldi algori...

30 | GMRESR: a family of nested GMRES methods - Vorst, Vuik - 1993
Citation Context: ...S format, the nonzero matrix coefficients in SP, twice the outer restart size number of vectors in DP, and inner restart size number of vectors in SP. The Generalized Conjugate Residuals (GCR) method [50, 49] is comparable to the FGMRES and can replace it successfully as the outer iterative solver. 4 Results Overview 4.1 The Test Collection for Mixed precision Sparse Direct and Iterative Solvers We tested...

29 | Iterative refinement in floating point - Moler - 1967
Citation Context: ...s 4 and 6 are performed in double precision. We are not aware of related work [45, 44, 22] that uses our exact approach. The error analysis for the mixed precision, iterative refinement, explained in [33, 21, 23], shows that using this approach it is possible to achieve the same accuracy as if the system was solved in full double precision arithmetic, provided that the matrix is not too badly conditioned. Fro...

24 | High-performance parallel implicit CFD - Gropp, Kaushik, et al.
Citation Context: ...-based solvers may benefit both in speed (as the bulk of the computation is in their V/W-cycles) and memory requirements. An example of successful application of this type of approach in CFD was done [25, 26] in a PETSc solver, which was accelerated with a Schwarz preconditioner using block-incomplete factorizations over the separate subdomains that are stored in single precision. Regarding robustness, th...

20 | The tortoise and the hare restart GMRES - Embree
Citation Context: ...etter, because one assumes that bigger m will get closer to the full GMRES. Following this assumption though does not guarantee better execution time, and sometimes the convergence can get even worse [20]. An alternative worth further exploration is to use a truncated version of GMRES [39]. Another interesting approach is self adaptivity [16]. Here, to do a fair comparison, we ran it for m = 25, 50...

20 | Efficient high accuracy solutions with GMRES(m) - Turner, Walker - 1992
Citation Context: ... Indeed, the mixed precision, iterative refinement can be interpreted as a preconditioned Richardson iteration with the preconditioner computed and applied (during the iterations) in single precision [46]. This interpretation can be further extended from Richardson to any preconditioned, iterative method. And in general, as long as the iterative method at hand is backward stable and converges, one can...

18 | New insights in GMRES-like methods with variable preconditioners - Vuik - 1995

16 | Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE solvers from low precision components - Strzodka, Göddeke - 2006
Citation Context: ... mixed precision, iterative refinement where the most expensive steps, 1 and 5, are performed in single precision and steps 4 and 6 are performed in double precision. We are not aware of related work [45, 44, 22] that uses our exact approach. The error analysis for the mixed precision, iterative refinement, explained in [33, 21, 23], shows that using this approach it is possible to achieve the same accuracy a...

15 | Progress in sparse matrix methods in large sparse linear systems on vector supercomputers - Ashcraft, Grimes, et al. - 1987
Citation Context: ...ring both direct and iterative solvers. 2 Sparse Direct and Iterative Solvers Most sparse direct methods for solving linear systems of equations are variants of either multifrontal [18] or supernodal [5] factorization approaches. There is a number of freely available packages that implement these methods. We have chosen for our tests the software package MUMPS [1, 2, 3] as the representative of the m...

13 | Relaxation strategies for nested Krylov methods - Eshof, Sleijpen, et al.

8 | DQGMRES: a direct quasi-minimal residual algorithm based on incomplete orthogonalization - Saad, Wu - 1996
Citation Context: ...s assumption though does not guarantee better execution time, and sometimes the convergence can get even worse [20]. An alternative worth further exploration is to use a truncated version of GMRES [39]. Another interesting approach is self adaptivity [16]. Here, to do a fair comparison, we ran it for m = 25, 50 (PETSc’s default [7]), 100, 150, 200, and 300, and chose the best execution time. Experi...

7 | Mixed precision methods for convergent iterative schemes - Strzodka, Göddeke - 2006
Citation Context: ... mixed precision, iterative refinement where the most expensive steps, 1 and 5, are performed in single precision and steps 4 and 6 are performed in double precision. We are not aware of related work [45, 44, 22] that uses our exact approach. The error analysis for the mixed precision, iterative refinement, explained in [33, 21, 23], shows that using this approach it is possible to achieve the same accuracy a...

6 | Iterative refinement and reliable computing - Bjorck - 1987
Citation Context: ...tive Refinement The iterative refinement technique is a well known method that has been extensively studied and applied in the past. A fully detailed description of this method can be found elsewhere [13, 28, 43, 51, 9]. The iterative refinement approach has been used in the past to improve the accuracy of linear systems’ solutions and it is shown in Algorithm 3. Algorithm 3 The iterative refinement method for the s...

4 | The effect of non-optimal bases on the convergence of Krylov subspace methods - Simoncini, Szyld - 2005
Citation Context: ...expect to have only a constant number of outer iterations until convergence. We note that non-constant preconditioning can be better accommodated in GMRES (next section). See also Simoncini and Szyld [42] for a way to interpret and theoretically study the effects of non-constant preconditioning. 3.2.2 GMRES-based Inner-Outer Iteration Methods For our outer loop, we take the flexible GMRES (FGMRES [36,...

4 | Wavelet software - Cai, Li - 2007

2 | Latency, bandwidth, and concurrent issue limitations in high-performance CFD - Gropp, Kaushik, et al. - 2001
Citation Context: ...-based solvers may benefit both in speed (as the bulk of the computation is in their V/W-cycles) and memory requirements. An example of successful application of this type of approach in CFD was done [25, 26] in a PETSc solver, which was accelerated with a Schwarz preconditioner using block-incomplete factorizations over the separate subdomains that are stored in single precision. Regarding robustness, th...

2 | Using mixed precision for sparse matrix computations to enhance the performance while achieving the 64-bit accuracy - Buttari, Dongarra, Kurzak, Luszczek, Tomov - 2006
Citation Context: ...rices in this smaller subset were chosen in order to provide examples of all the significant features observed on the test suite. The results for all the 41 matrices in the test suite can be found in [10]. For the iterative sparse methods we used matrices from an adaptive 3D PDEs discretization package. Presented are results on a set of five matrices of increasing size, coming from the adaptive discre...
