
## A PARALLEL SWEEPING PRECONDITIONER FOR HETEROGENEOUS 3D HELMHOLTZ EQUATIONS∗

Citations: | 4 - 3 self |

### Citations

425 |
Absorbing boundary conditions for the numerical simulation of waves.
- Engquist, Majda
- 1977
Citation Context ...elmholtz matrix represented in block tridiagonal form. There are two crucial differences between the two methods: • Roughly speaking, AILU can be viewed as using Absorbing Boundary Conditions (ABC’s) [13] instead of PML when forming approximate subdomain auxiliary problems. While ABC’s result in strictly 2D local subproblems, versus the quasi-2D subdomain problems which result from using PML, they are...

306 |
The multifrontal solution of indefinite sparse symmetric linear equations.
- Duff
- 1983
Citation Context ... for two-dimensional problems [14, 28], there is not yet justification for three-dimensional problems. This paper therefore focuses on the second approach, which relies on multifrontal factorizations [27, 34, 12, 21] of the approximate auxiliary problems in order to achieve an $O(\gamma^2 N^{4/3})$ setup cost and an $O(\gamma N \log N)$ application cost, where $\gamma(\omega)$ denotes the number of grid points used for each Perfectly Matched Laye...

267 | A fully asynchronous multifrontal solver using distributed dynamic scheduling.
- Amestoy, Duff, et al.
- 2001
Citation Context ...es like cyclic reduction (which is a special case of a multifrontal algorithm), their straightforward application destroys the Schur complement properties that we exploit for our fast algorithm. [Footnote 5] Cf. [1], which advocates for only distributing the root frontal matrix two-dimensionally and using a one-dimensional distribution for all other fronts. [...] processes are used i...

266 |
Nested dissection of a regular finite element mesh.
- George
- 1973
Citation Context ... for two-dimensional problems [14, 28], there is not yet justification for three-dimensional problems. This paper therefore focuses on the second approach, which relies on multifrontal factorizations [27, 34, 12, 21] of the approximate auxiliary problems in order to achieve an $O(\gamma^2 N^{4/3})$ setup cost and an $O(\gamma N \log N)$ application cost, where $\gamma(\omega)$ denotes the number of grid points used for each Perfectly Matched Laye...

237 | A sparse matrix arithmetic based on H-matrices. I. Introduction to H-matrices.
- Hackbusch
- 1999
Citation Context ... Helmholtz operator in block tridiagonal form in a manner which exploits a radiation boundary condition. The first approach performs a block tridiagonal factorization algorithm in H-matrix arithmetic [25, 22], while the second approach approximates the Schur complements of the factorization using auxiliary problems with artificial radiation boundary conditions. Though the H-matrix sweeping preconditioner ...

201 |
Rounding Errors in Algebraic Processes.
- Wilkinson
- 1994
Citation Context ...We also note that, while it is widely believed that direct inversion is numerically unstable, in [11] Druinsky and Toledo provide a review of (apparently obscure) results dating back to Wilkinson (in [42]) which show that x := inv(A)*b is as accurate as a backwards stable solve if reasonable assumptions are met on the accuracy of inv(A). Since inv(A)*b is argued to be more accurate when the columns of...

130 | Highly scalable parallel algorithms for sparse matrix factorization
- Gupta, Karypis, et al.
- 1994
Citation Context ... [35].⁵ More specifically, we make use of supernodal [3] elimination trees defined through nested dissection (see Figs. 2.1 and 2.2), which have been shown to result in highly scalable factorizations [24, 23] and moderately scalable triangular solutions [26]. Roughly speaking, the analysis in [26] shows that, if $p_F$ processes are used in the multifrontal factorization of our quasi-2D subdomain problems, th...

123 |
The Multifrontal Method for Sparse Matrix Solution: Theory and Practice.
- Liu
- 1992
Citation Context ... for two-dimensional problems [14, 28], there is not yet justification for three-dimensional problems. This paper therefore focuses on the second approach, which relies on multifrontal factorizations [27, 34, 12, 21] of the approximate auxiliary problems in order to achieve an $O(\gamma^2 N^{4/3})$ setup cost and an $O(\gamma N \log N)$ application cost, where $\gamma(\omega)$ denotes the number of grid points used for each Perfectly Matched Laye...

89 |
Is the pollution effect of the FEM avoidable for the Helmholtz equation considering high wave numbers?
- Babuska, Sauter
- 1997
Citation Context ...nt methods, it is necessary to increase the number of degrees of freedom in each direction at least linearly with the number of wavelengths spanned by the domain. In order to combat pollution effects [4], which are closely related to phase errors in the discrete solution, one must use asymptotically more than a constant number of [...] ∗This work was partially supported by the sponsors of the Texas Consort...

81 | GMRES: A Generalized Minimal Residual Method for Solving Nonsymmetric Linear Systems.
- Saad, Schultz
- 1986

67 |
An Iterative Method for the Helmholtz Equation.
- Bayliss, Goldstein, et al.
- 1983
Citation Context ...ly damped version of the Helmholtz operator, say $J \equiv \left[ -\Delta - \frac{(\omega + i\alpha)^2}{c^2(x)} \right]$ (1.5), where $\alpha \approx 2\pi$ is responsible for the artificial damping. This is in contrast to shifted Laplacian preconditioners [5, 16], where $\alpha$ is typically $O(\omega)$ [18], and our motivation is to avoid introducing large long-range dispersion error by damping the long range interactions in the preconditioner. Just as A refers to the dis...

62 | On a Class of Preconditioners for Solving the Helmholtz Equation.
- Erlangga, Vuik, et al.
- 2004
Citation Context ...ly damped version of the Helmholtz operator, say $J \equiv \left[ -\Delta - \frac{(\omega + i\alpha)^2}{c^2(x)} \right]$ (1.5), where $\alpha \approx 2\pi$ is responsible for the artificial damping. This is in contrast to shifted Laplacian preconditioners [5, 16], where $\alpha$ is typically $O(\omega)$ [18], and our motivation is to avoid introducing large long-range dispersion error by damping the long range interactions in the preconditioner. Just as A refers to the dis...

62 |
A new implementation of sparse Gaussian elimination
- Schreiber
- 1982

49 | Superfast multifrontal method for large structured linear systems of equations
- Chandrasekaran, Gu, et al.
Citation Context ...ith a properly tuned two-grid approach, large-scale heterogeneous 3D problems can be solved with impressive timings. There has also been a recent effort to extend the fast-direct methods presented in [43] from definite elliptic problems into the realm of low-to-moderate frequency time-harmonic wave equations [40, 41]. While their work has resulted in a significant constant speedup versus applying a cl...

46 | Sweeping Preconditioner for the Helmholtz Equation: Moving Perfectly Matched Layers. Multiscale Model.
- Engquist, Ying
- 2011
Citation Context ...$\omega^3$), every linear solve required $\Omega(\omega^4)$ work with iterative techniques. Engquist and Ying recently introduced two classes of sweeping preconditioners for Helmholtz equations without internal resonance [14, 15]: Both approaches approximate a block $LDL^T$ factorization of the Helmholtz operator in block tridiagonal form in a manner which exploits a radiation boundary condition. The first approach performs a bl...

42 |
Advances in Iterative Methods and Preconditioners for the Helmholtz Equation.
- Erlangga
- 2008
Citation Context ...requency of Eq. (1.1) not only increased the size of the linear system by at least a factor of $2^d$, it also doubled the number of iterations required for convergence with preconditioned Krylov methods [6, 17, 18]. Thus, denoting the number of degrees of freedom in a three-dimensional finite-element or finite-difference discretization as $N = \Omega(\omega^3)$, every linear solve required $\Omega(\omega^4)$ work with iterative techniqu...

38 | MPI: a standard Message Passing Interface.
- Walker, Dongarra
- 1996
Citation Context ...onds to position $(i \bmod r, \lfloor i/r \rfloor \bmod c)$ in the process grid (if the grid is constructed with a column-major ordering of the process ranks; see the left side of Fig. 2.4). Then a call to MPI_Allgather [10] within each row of the process grid would allow for each process to collect all of the data necessary to form $x_S[M_C, \star]$, as for any process row index $s \in \{0, 1, \ldots, r-1\}$, $\{i \in \mathbb{N}_0 : i \bmod r = s\}$ ...

38 | Why it is difficult to solve Helmholtz problems with classical iterative methods.
- Ernst, Gander
- 2012
Citation Context ...requency of Eq. (1.1) not only increased the size of the linear system by at least a factor of $2^d$, it also doubled the number of iterations required for convergence with preconditioned Krylov methods [6, 17, 18]. Thus, denoting the number of degrees of freedom in a three-dimensional finite-element or finite-difference discretization as $N = \Omega(\omega^3)$, every linear solve required $\Omega(\omega^4)$ work with iterative techniqu...

36 |
Collective communication: theory, practice, and experience. Concurrency and Computation: Practice and Experience
- Chan, Heimlich, et al.
- 2007
Citation Context ...Under reasonable assumptions, both of these redistributions can be shown to have per-process communication volume lower bounds of $\Omega(n/\sqrt{p})$ (if $F_{TL}$ is $n \times n$) and latency lower bounds of $\Omega(\log_2(\sqrt{p}))$ [9]. We also note that translating between $x_S[V_C, \star]$ and $x_S[V_R, \star]$ simply requires permuting which process [running header: J. POULSON ET AL., p. 12] ...

32 | Multilevel preconditioners constructed from inverse-based ILUs.
- Bollhofer, Saad
- 2006

28 | Elemental: A New Framework for Distributed Memory Dense Matrix Computations.
- Poulson, Marker, et al.
- 2010
Citation Context ... domain onto the PML-padded auxiliary domain simply requires individually extending each supernodal subvector by zero in the $x_3$ direction. Consider an element-wise two-dimensional cyclic distribution [30] of a frontal matrix F over q processes using an $r \times c$ process grid, where r and c are $O(\sqrt{q})$. Then the (i, j) entry will be stored by the process in the $(i \bmod r, j \bmod c)$ position in the process grid....

27 |
Algebraic multilevel preconditioner for the Helmholtz equation in heterogeneous media
- Bollhöfer, Grote, et al.
Citation Context ...requency of Eq. (1.1) not only increased the size of the linear system by at least a factor of $2^d$, it also doubled the number of iterations required for convergence with preconditioned Krylov methods [6, 17, 18]. Thus, denoting the number of degrees of freedom in a three-dimensional finite-element or finite-difference discretization as $N = \Omega(\omega^3)$, every linear solve required $\Omega(\omega^4)$ work with iterative techniqu...

21 | Convergence properties of block GMRES and matrix polynomials, Linear Algebra Appl
- Simoncini, Gallopoulos
- 1996
Citation Context ...ach was not pursued in this paper due to the modest storage space available on Lonestar and is left for future work. Another performance improvement might come from exploiting block variants of GMRES [36], which can potentially lower the number of required iterations. 2.5. Clique. In order to implement the previously discussed techniques for scalable multifrontal factorizations and solves (via selecti...

20 |
Construction and arithmetics
- Grasedyck, Hackbusch
Citation Context ... Helmholtz operator in block tridiagonal form in a manner which exploits a radiation boundary condition. The first approach performs a block tridiagonal factorization algorithm in H-matrix arithmetic [25, 22], while the second approach approximates the Schur complements of the factorization using auxiliary problems with artificial radiation boundary conditions. Though the H-matrix sweeping preconditioner ...

19 |
Communication reduction in parallel sparse Cholesky factorization on a hypercube
- George, Liu, et al.
- 1987
Citation Context ...el multifrontal algorithms. While a large number of techniques exist for parallelizing multifrontal factorizations and triangular solves, we focus on parallelizations which combine subtree-to-subteam [20] mappings of processes to the elimination tree [34] that also make use of two-dimensional distributions of the frontal matrices [35].⁵ More specifically, we make use of supernodal [3] elimination tree...

19 |
A fast direct solver for scattering problems involving elongated structures
- Martinsson, Rokhlin
- 2007
Citation Context ...mplements of the factorization using auxiliary problems with artificial radiation boundary conditions. Though the H-matrix sweeping preconditioner has theoretical support for two-dimensional problems [14, 28], there is not yet justification for three-dimensional problems. This paper therefore focuses on the second approach, which relies on multifrontal factorizations [27, 34, 12, 21] of the approximate au...

17 |
On 3D Modeling of Seismic Wave Propagation Via a Structured Parallel Multifrontal Direct Helmholtz Solver. Geophys.
- Wang, Hoop, et al.
- 2011
Citation Context ...mings. There has also been a recent effort to extend the fast-direct methods presented in [43] from definite elliptic problems into the realm of low-to-moderate frequency time-harmonic wave equations [40, 41]. While their work has resulted in a significant constant speedup versus applying a classical multifrontal algorithm to the full 3D domain [41], their results have so far still demonstrated the same O...

15 |
Progress in sparse matrix methods in large sparse linear systems on vector supercomputers
- Ashcraft, Grimes, et al.
- 1987
Citation Context ...tree-to-subteam [20] mappings of processes to the elimination tree [34] that also make use of two-dimensional distributions of the frontal matrices [35].⁵ More specifically, we make use of supernodal [3] elimination trees defined through nested dissection (see Figs. 2.1 and 2.2), which have been shown to result in highly scalable factorizations [24, 23] and moderately scalable triangular solutions [2...

15 |
AILU for Helmholtz problems: a new preconditioner based on the analytic parabolic factorization.
- Gander
- 2001
Citation Context ...and, doubling the subdomain sizes allows for more parallelism in both the setup and solve phases, and less sweeps seem to be required. Another closely related method is the Analytic ILU factorization [19]. Like the sweeping preconditioner, it uses local approximations of the Schur complements of the block $LDL^T$ factorization of the Helmholtz matrix represented in block tridiagonal form. There are two c...

14 |
Domain-Separator Codes for the parallel solution of sparse linear systems
- Raghavan
- 2002
Citation Context ...ditional computation. 2.2. Selective inversion. The lackluster scalability of dense triangular solves is well known and a scheme known as selective inversion was introduced in [32] and implemented in [31] specifically to avoid the issue; the approach is characterized by directly inverting every distributed dense triangular matrix which would have been solved against in a normal multifrontal triangular...

13 |
Efficient parallel sparse triangular solution using selective inversion
- Raghavan
- 1998
Citation Context ... even if they require additional computation. 2.2. Selective inversion. The lackluster scalability of dense triangular solves is well known and a scheme known as selective inversion was introduced in [32] and implemented in [31] specifically to avoid the issue; the approach is characterized by directly inverting every distributed dense triangular matrix which would have been solved against in a normal...

12 | A rapidly converging domain decomposition method for the Helmholtz equation. ArXiv e-prints
- Stolk
- 2012
Citation Context ...arallelization should carry over to more general wave equations in a conceptually trivial way. 1.2. Related work. A domain decomposition variant of the sweeping preconditioner was recently introduced [37] which results in fast convergence rates, albeit at the expense of requiring PML padding on both sides of each subdomain. Recalling our previous analysis with respect to the PML size, γ, the memory us...

10 |
Sparse matrix factorization on massively parallel computers
- Gupta, Koric, et al.
- 2009
Citation Context ... [35].⁵ More specifically, we make use of supernodal [3] elimination trees defined through nested dissection (see Figs. 2.1 and 2.2), which have been shown to result in highly scalable factorizations [24, 23] and moderately scalable triangular solutions [26]. Roughly speaking, the analysis in [26] shows that, if $p_F$ processes are used in the multifrontal factorization of our quasi-2D subdomain problems, th...

10 |
Notes on Perfectly Matched Layers (PMLs)
- Johnson
- 2010
Citation Context ...approximate auxiliary problems in order to achieve an $O(\gamma^2 N^{4/3})$ setup cost and an $O(\gamma N \log N)$ application cost, where $\gamma(\omega)$ denotes the number of grid points used for each Perfectly Matched Layer (PML) [29]. While the sweeping preconditioner is competitive with existing techniques even for a single right-hand side, its main advantage is for problems with large numbers of right-hand sides, as the precondi...

7 | A high performance two dimensional scalable parallel algorithm for solving sparse triangular systems
- Joshi, Gupta, et al.
- 1997
Citation Context ...3] elimination trees defined through nested dissection (see Figs. 2.1 and 2.2), which have been shown to result in highly scalable factorizations [24, 23] and moderately scalable triangular solutions [26]. Roughly speaking, the analysis in [26] shows that, if $p_F$ processes are used in the multifrontal factorization of our quasi-2D subdomain problems, then we must have $\gamma n = \Omega(p_F^{1/2})$ in order to maint...

6 |
A sweeping preconditioner for time-harmonic Maxwell’s equations with finite elements
- Tsuji, Engquist, et al.
Citation Context ...weeping preconditioner and a full multifrontal factorization. [...] that the moving PML sweeping preconditioner is equally effective for time-harmonic Maxwell’s equations [38, 39], and we believe that the same will hold true for time-harmonic linear elasticity. The rest of the paper will be presented in the context of the Helmholtz equation, but we emphasize that the paralleliz...

5 |
An Improved Two-Grid Preconditioner for the Solution of Three-Dimensional Helmholtz Problems in Heterogeneous Media. Numer. Linear Algebra Appl.
- Calandra, Gratton, et al.
- 2013
Citation Context ...LU to the O(1) iterations needed with the sweeping preconditioner (for problems without internal resonance). Two other iterative methods warrant mentioning: the two-grid shifted-Laplacian approach of [8] and the multilevel-ILU approach of [6]. Though both require O(ω) iterations for convergence, they have very modest memory requirements. In particular, [8] demonstrates that, with a properly tuned two...

4 |
Scalability of Sparse Direct Solvers, Graph Theory and Sparse
- Schreiber
- 1993
Citation Context ...lves, we focus on parallelizations which combine subtree-to-subteam [20] mappings of processes to the elimination tree [34] that also make use of two-dimensional distributions of the frontal matrices [35].⁵ More specifically, we make use of supernodal [3] elimination trees defined through nested dissection (see Figs. 2.1 and 2.2), which have been shown to result in highly scalable factorizations [24, ...

4 |
A sweeping preconditioner for Yee’s finite difference approximation of time-harmonic Maxwell’s Equations
- Tsuji, Ying
Citation Context ...weeping preconditioner and a full multifrontal factorization. [...] that the moving PML sweeping preconditioner is equally effective for time-harmonic Maxwell’s equations [38, 39], and we believe that the same will hold true for time-harmonic linear elasticity. The rest of the paper will be presented in the context of the Helmholtz equation, but we emphasize that the paralleliz...

2 |
Efficient scalable algorithms for hierarchically semiseparable matrices
- Wang, Li, et al.
- 2011
Citation Context ...mings. There has also been a recent effort to extend the fast-direct methods presented in [43] from definite elliptic problems into the realm of low-to-moderate frequency time-harmonic wave equations [40, 41]. While their work has resulted in a significant constant speedup versus applying a classical multifrontal algorithm to the full 3D domain [41], their results have so far still demonstrated the same O...

1 |
How accurate is inv(A)*b?
- Druinsky, Toledo
Citation Context ... number of processes, selective inversion will be shown to yield a very large performance improvement. We also note that, while it is widely believed that direct inversion is numerically unstable, in [11] Druinsky and Toledo provide a review of (apparently obscure) results dating back to Wilkinson (in [42]) which show that x := inv(A)*b is as accurate as a backwards stable solve if reasonable assumpti...