C. C. Douglas and J. H. et al. Cache optimization for structured and unstructured grid multigrid. Elect. Trans. Numer. Anal., 10:25--40, 2000.

This paper is cited in the following contexts:
Compile-time Composition of Run-time Data and Iteration.. - Strout, Carter, Ferrante (2003)   (4 citations)

.... We build on the Kelly and Pugh framework to describe the effects of run time data and iteration reordering transformations for locality, which include consecutive packing [6], graph partitioning [9], bucket tiling [18], lexicographical grouping [6], full sparse tiling [26], and cache blocking [7]. We show how the symbolic effect of a run time transformation can be propagated to relevant data mappings and dependences. Given a selected composition of run time reorderings, the resulting data mappings and dependences can be used by any subsequent choice of run time transformation. While this ....

....all data and iteration reorderings have been computed. Not shown in this example, but implemented in the experiments, it is possible to do another data reordering after the sparse tiling inspector. 4. GENERALIZED SPARSE TILING Sparse tiling techniques, full sparse tiling [26] and cache blocking [7], were developed for an important kernel used in Finite Element Methods, Gauss-Seidel. Sparse tiling results in run time generated tiles or iteration slices [21] which cut across loops that only touch a subset of the total data. By performing an iteration reordering based on the sparse tiling, ....

Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiß. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.


Combining Performance Aspects of Irregular.. - Strout, Carter.. (2002)   (1 citation)

....The seed partition is at the first convergence iteration. Figure 3. A visual comparison of the two sparse tiling techniques. There are two known sparse tiling techniques. Our previous work [33] developed a sparse tiling technique which in this paper we call full sparse tiling. Douglas et al. [12] described another sparse tiling technique which they refer to as cache blocking of unstructured grids. In this paper, we will refer to their technique as cache block sparse tiling. Figures 3(a) and 3(b) illustrate how the full sparse tiling and the cache block sparse tiling techniques divide the ....

....#. The schedule function provides a list of iteration points to execute for each tile at each convergence iteration. Generate tile dependence graph identifying which tiles may be executed in parallel. Either full tile growth (called serial sparse tiling in [33]) or cache blocking tile growth [12] can be used to grow tiles based on an initial matrix graph partitioning. We show results for both methods. Our experiments were conducted using the IBM Blue Horizon and SUN Ultra at the San Diego Supercomputer Center. Details on both machines are given in Table 1. We generated three matrices by ....

[Article contains additional citation context not shown here]

Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rüde, and Christian Weiß. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.


Compile-time Composition of Run-time Data and Iteration.. - Strout, Carter, Ferrante (2003)   (4 citations)

....when non-affine memory references are involved. We can also express run time data and iteration reordering transformations for locality, which include consecutive packing [7], graph partitioning [12], bucket tiling [21], lexicographical grouping [7], full sparse tiling [29], and cache blocking [9]. Describing the effect of run time data and iteration reorderings in a compile time framework has several advantages. First, both run time and compile time transformations are uniformly described. Secondly, the transformation legality checks provide constraints on the run time reordering ....

....for reordering the iteration of the j loop. Figure 4: Example of Figure 2 mapping after the CPACK data reordering followed by a lexGroup iteration reordering. Sparse tiling programming techniques, full sparse tiling [29] and cache blocking [9], were developed for an important kernel used in Finite Element Methods, Gauss-Seidel. Sparse tiling results in run time generated tiles or iteration slices [24] that cut between loops or across an outer loop and that only access a subset of the total data. By performing an iteration reordering ....

[Article contains additional citation context not shown here]

C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, 10:21--40, February 2000.


Using Sparse Tiling with Symmetric Multigrid - Strout, Carter, Ferrante (2002)

....graph from the previous convergence iteration. [...] Our previous work [21] developed a sparse tiling technique which in this paper we call sparse tiling with full tile growth or full sparse tiling. Douglas et al. [5] described a sparse tiling technique which they refer to as cache blocking of unstructured grids. In this paper, we will refer to their technique as sparse tiling with cache blocking tile growth or cache block sparse tiling. Figures 2(a) and 2(b) illustrate how the full sparse tiling and the cache ....

....that the row ordering is arbitrary for the sparse matrix in question. [...] overhead could be amortized over many calls to Gauss-Seidel using the same sparse matrix. Multigrid is an example of an algorithm that calls Gauss-Seidel multiple times with the same sparse matrix as input. Douglas et al. [5] showed that cache block sparse tiling helps the overall performance of multigrid; however, the overhead studied was that of partitioning and reordering the meshes that underlie the matrices at each multigrid level. Only geometric multigrid methods generate a mesh for each level of multigrid. When ....

[Article contains additional citation context not shown here]

Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rüde, and Christian Weiß. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.
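
The context above notes that multigrid calls Gauss-Seidel repeatedly on the same sparse matrix, which is what lets any reordering overhead be amortized. A minimal sketch of that kernel follows, assuming a compressed sparse row (CSR) layout; the function name gauss_seidel_sweep and the small test matrix are illustrative assumptions, not code from the cited papers, and the sketch shows only the plain, untiled sweep that sparse tiling would reschedule.

```python
def gauss_seidel_sweep(row_ptr, col_idx, vals, b, x):
    """One in-place Gauss-Seidel sweep over a CSR matrix (hypothetical helper)."""
    n = len(b)
    for i in range(n):
        diag = 0.0
        acc = b[i]
        # Walk the nonzeros of row i; x already holds updated values for rows < i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = vals[k]
            else:
                acc -= vals[k] * x[j]
        x[i] = acc / diag
    return x


# Illustrative 3x3 matrix [[4,-1,0],[-1,4,-1],[0,-1,4]] in CSR form.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [3.0, 2.0, 3.0]  # chosen so that the exact solution is [1, 1, 1]
x = [0.0, 0.0, 0.0]

# The same matrix is traversed on every call, as in a multigrid smoother.
for _ in range(25):
    gauss_seidel_sweep(row_ptr, col_idx, vals, b, x)
```

Every sweep streams the whole matrix through the cache once, which is the access pattern the tiling techniques in these contexts try to improve.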


Using Graphics Cards for Quantized FEM Computations - Rumpf, Strzodka (2001)   (7 citations)

....component of the data block. Therefore many typical scientific computing applications perform at about 1 of the peak processor performance. This disastrous situation is still little acknowledged. Though there are successful concepts how to significantly optimize storage and access for caching [4, 16, 2], overall performance remains far away from peak values. Unfortunately this will not change in near future, because cache sizes are miles away from reaching the size of graphics memory, and even then they would still lack the vector operations on entire data blocks. In conclusion we see that when ....

C.C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiß. Cache optimization for structured and unstructured multigrid. Electronic Transactions on Numerical Analysis (ETNA), 1999.


Proof of Correctness for Sparse Tiling of Gauss-Seidel - Strout, Carter, Ferrante

....they do not improve the temporal or inter iteration locality on the sparse matrix, because in their rescheduled code the entire sparse matrix is traversed each convergence iteration. rescheduling Increasing inter iteration locality for iterative computations on regular meshes is explored by [3], [16], [17], and [6]. The only other technique to our knowledge which handles inter iteration locality for irregular meshes is unstructured cache blocking by Douglas et al. [3]. They tile the iteration space graph resulting from unstructured grids in the context of the Multigrid algorithm using ....

....iteration. rescheduling Increasing inter iteration locality for iterative computations on regular meshes is explored by [3], [16], [17], and [6]. The only other technique to our knowledge which handles inter iteration locality for irregular meshes is unstructured cache blocking by Douglas et al. [3]. They tile the iteration space graph resulting from unstructured grids in the context of the Multigrid algorithm using Gauss-Seidel as a smoother. They achieve overall speedups up to 2 with 2D meshes containing 3983, 15679, and 62207 nodes on an SGI O2. They partition the mesh into cells using ....

Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21-40, February 2000.


Rescheduling for Locality in Sparse Matrix Computations - Strout, Carter, Ferrante (2001)   (2 citations)

....improving the intra iteration locality [4, 7, 1, 3, 6]. We will be presenting a technique for improving both the intra and inter iteration locality. The only other technique we are aware of which accomplishes this for iterative algorithms on irregular meshes is cache blocking by Douglas et al. [2]. On the left is an Iteration Space Graph which has been broken into tiles using the cache blocking technique described by Douglas et al. They partition the mesh using the Metis [5] partitioner; the red lines show the mesh partitioning. They then grow tiles from this partitioning forward through ....

....points. Since a single tile uses less data than the entire computation, each tile exhibits good inter and intra iteration data locality. The second phase of the new schedule has the same locality problems as the original computation. Here we show the second phase in orange. Douglas et al. [2] found that 7.19% to 33.67% of the nodes in their sample meshes had iteration points in this second phase. 3 Serial Sparse Tiling We call our technique serial sparse tiling. We also grow tiles from a partitioning, but we grow the tiles backwards through the iteration space. Notice that the ....

Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.


Rescheduling for Locality in Sparse Matrix Computations - Strout, Carter, Ferrante

....Fig. 3. Tile layers for Tile0, Tile1, Tile2, and Tile3. The tile layers for Tile0 are shaded. We refer to the runtime tiling of sparse matrix computations as sparse tiling. This paper describes and implements a serial sparse tiling, in that the resulting schedule is serial. Douglas et al. [4] describe a parallel sparse tiling for Gauss-Seidel. They partition the mesh and then grow tiles forward through the iteration space (in the direction of the convergence iterator) in such a way that the tiles do not depend on one another and therefore can be executed in parallel. After executing ....

....Gauss-Seidel. For example, on a 3D mesh with N = 40,687 an overall speedup of 1.17 is observed even when the overhead cost is included. This indicates that the tradeoff between raw speedup and overhead must be considered when calculating partition sizes. 4 Related Work Douglas et al. [4] do tiling on the iteration space graph resulting from unstructured grids in the context of the Multigrid algorithm using Gauss-Seidel as a smoother. They achieve overall speedups up to 2 with 2D meshes containing 3983, 15679, and 62207 nodes on an SGI O2. They are able to reschedule their tiles ....

Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.


Cache-Efficient Multigrid Algorithms - Sriram Sellappa And (2001)   (4 citations)

....blocking covers [9] work with their framework for the Jacobi scheme. Stals and Rude [11] studied program transformations for the Red-Black Gauss-Seidel method. They explore blocking along one dimension for two dimensional problems, but our work involves two dimensional blocking. Douglas et al. [5] investigate cache optimizations for structured and unstructured multigrid. They focus only on the Red-Black Gauss-Seidel relaxation scheme. Povitsky [10] discusses a different wavefront approach to a cache-friendly algorithm to solve PDEs. Bromley et al. [4] developed a compiler module to ....

C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache optimization for structured and unstructured grid multigrid. Electronic Transactions on Numerical Analysis, 10:21--40, 2000. University of Kentucky, Louisville, KY, USA. ISSN 1068--9613.
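
For readers unfamiliar with the Red-Black Gauss-Seidel relaxation named in the context above, a minimal unblocked sketch follows. It assumes a 5-point stencil for the Poisson problem on a square grid stored as nested lists; the function name red_black_sweep and the test problem are illustrative assumptions, and none of the cache blocking studied in the cited work is applied here.

```python
def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for -Laplace(u) = f (5-point stencil)."""
    n = len(u)
    for color in (0, 1):  # 0 = "red" points, 1 = "black" points
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == color:
                    # All four neighbours have the opposite colour, so points
                    # of one colour can be updated in any order.
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1]
                                      + h * h * f[i][j])
    return u


n = 9
h = 1.0 / (n - 1)
f = [[0.0] * n for _ in range(n)]
# Boundary held at 1, interior started at 0; the exact solution is u = 1.
u = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
     for i in range(n)]
for _ in range(200):
    red_black_sweep(u, f, h)
```

Each sweep makes two full passes over the grid, one per colour, which is why this scheme is a common target for the blocking and fusion techniques discussed in these contexts.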


Cache-Efficient Multigrid Algorithms (Extended Abstract) - Sellappa, al.

....dimension a row 5 at a time, whereas our wavefront advances in blocks along both dimensions. This difference is key in enabling stencil optimizations. They show improvements in Mflop/s rate with their techniques, but do not have experimental data exploring the memory behavior. Douglas et al. [6] investigate cache optimizations for structured and unstructured multigrid. They focus only on the Red-Black Gauss-Seidel relaxation scheme. Their cache efficient algorithm is similar to ours, with wavefronts propagating along both dimensions. However, their subgrids are parallelogram shaped ....

C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache optimization for structured and unstructured grid multigrid. Electronic Transactions on Numerical Analysis, 10:21--40, 2000. University of Kentucky, Louisville, KY, USA. ISSN 1068--9613.


DiMEPACK - A Cache-Optimized Multigrid Library - Kowarschik, Weiß (2001)   Self-citation (Kowarschik Weiß)

No context found.

C.C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiß. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transactions on Numerical Analysis, 10:21-40, February 2000.


An Overview of Cache Optimization Techniques and Cache-Aware .. - Kowarschik, Weiß (2003)   Self-citation (Kowarschik Weiß)

No context found.

C.C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiß. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transactions on Numerical Analysis, 10:21-40, 2000.


Cache Performance Optimizations for Parallel Lattice.. - Wilke, Pohl..   Self-citation (Kowarschik Rüde)

No context found.

C.C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiß. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transactions on Numerical Analysis, 10:21-40, 2000.


A Note on Cache Memory Methods for Multigrid in Three Dimensions - Douglas, Thorne (1998)   Self-citation (Douglas)

No context found.

C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss, Cache optimization for structured and unstructured grid multigrid, Elect. Trans. Numer. Anal. 10 (2000), 21-40.


Mathematical Models and Methods in Applied Sciences - World Scientific Publishing   Self-citation (Douglas)

No context found.

C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss, Cache optimization for structured and unstructured grid multigrid, Elect. Trans. Numer. Anal. 10 (2000), 21-40.


Fast, Adaptively Refined Computational Elements in 3D - Douglas, Hu, Ray, Thorne..   Self-citation (Douglas Hu)

....allows us to experiment with different cache optimizations on different processors easily. We are exploring a number of storage schemes for optimizing cache effects. For serial computers, modifying algorithms and data structures for standard multigrid has been investigated for a variety of problems [7, 9, 10, 17]. For adaptively refined grids on parallel processors, there are many more possibilities than in the serial case. We are implementing many possibilities in order to see which ones work well on which parallel architectures. After approximating the solution on the grid hierarchy, there may be ....

Douglas, C.C., Hu, J., Kowarschik, M., Rüde, U., and Weiss, C.: Cache optimization for structured and unstructured grid multigrid. Elect. Trans. Numer. Anal. 10 (2000), 21-40.


Preprocessing Costs Of Cache Based Multigrid - Douglas, HU, al. (1998)   Self-citation (Douglas Hu)

No context found.

C. C. Douglas, J. Hu, M. Kowarschik, and U. Rude. Cache optimization for structured and unstructured grid multigrid. Electron. Trans. Numer. Anal., 9, 2000.


Portable memory hierarchy techniques for PDE solvers.. - Douglas, Haase, Hu.. (2000)   Self-citation (Douglas Hu Kowarschik Weiss)

....the northwest and the northeast. A portion of the northeast border is fixed with respect to x. Portions of the southeastern border and the south central border are fixed with respect to x and y. All other boundary conditions are homogeneous Neumann. Results are given in Table 5. More examples are in [3] and color pictures are linked in the Kentucky Full Caches (KFCs) page, http: www.ccs.uky.edu douglas ccd kfcs.html. Acknowledgments The work presented here, a truly international effort, was supported in part by the DFG Ru 422 7 1,2, NATO grant CRG 971574, and NSF grants DMS 9707040, ....

C.C. Douglas, J. Hu, M. Kowarschik, U. Rüde, and C. Weiss, Cache optimization for structured and unstructured grid multigrid, ETNA, 10(2000), pp: 21-40.


Portable memory hierarchy techniques for PDE solvers - Douglas, Haase, Hu.. (2000)   Self-citation (Douglas Hu Kowarschik Weiss)

....aware and one using standard form, can be compared. If the norm of the difference of the two computed solutions is not identically zero, then there is a bug in the cache aware implementation. In addition, the convergence rates of our cache aware iterative methods are identical to standard ones (see [4] and [7]). We begin with a tutorial on how computers work. We continue with some simple, constant coefficient, matrix free problems and methods. Finally, we end up with variable coefficient, coupled PDEs on unstructured grids. While we concentrate on two dimensional problems in this article, we note ....

C.C. Douglas, J. Hu, M. Kowarschik, U. Rüde, and C. Weiss, Cache optimization for structured and unstructured grid multigrid, ETNA, 10(2000), pp: 21-40.
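
The check described in the context above, comparing a reordered implementation against the standard one and requiring a difference of exactly zero, can be sketched on a toy problem. The example below is an assumption-laden stand-in, not the authors' code: it uses 1D Gauss-Seidel for -u'' = f and a simple loop fusion in which the second sweep trails the first by one node, so each value is reused soon after it is written. Both function names are hypothetical.

```python
def two_sweeps_standard(u, f, h):
    """Two lexicographic Gauss-Seidel sweeps, each a full pass over the grid."""
    n = len(u)
    for _ in range(2):
        for i in range(1, n - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u


def two_sweeps_fused(u, f, h):
    """The same two sweeps fused into one pass; sweep 2 trails sweep 1 by one node."""
    n = len(u)
    for i in range(1, n - 1):
        # Sweep 1 at node i: u[i-1] holds its sweep-1 value, u[i+1] is still old.
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
        if i >= 2:
            # Sweep 2 at node i-1: u[i-2] is final, u[i] holds its sweep-1 value.
            u[i - 1] = 0.5 * (u[i - 2] + u[i] + h * h * f[i - 1])
    if n >= 3:
        # Sweep 2 at the last interior node, which the loop above never reaches.
        u[n - 2] = 0.5 * (u[n - 3] + u[n - 1] + h * h * f[n - 2])
    return u


n = 17
h = 1.0 / (n - 1)
f = [float(i % 5) for i in range(n)]
a = two_sweeps_standard([0.0] * n, f, h)
b = two_sweeps_fused([0.0] * n, f, h)
max_diff = max(abs(p - q) for p, q in zip(a, b))
```

Because the fused version performs the same floating-point operations on the same operands, the difference should be exactly zero rather than merely small, so any nonzero max_diff would point to a scheduling bug.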


Maximizing Cache Memory Usage for Multigrid Algorithms - Douglas, Hu, Iskandarani, .. (1999)   (3 citations)  Self-citation (Douglas Hu Kowarschik Weiss)

....grids (see Figure 6) can frequently be accommodated using techniques similar to structured grids. In particular, the number of graph connections in the matrices A i are usually predictable, just like in the structured grid case. Both of these cases are considered in [5], [6], and [7]. 3 Combining Multigrid Components A typical multigrid method is based on a V cycle multigrid method (see Figure 7). Implementing a W or F Cycle (or any other correction cycle) is a trivial extension. All multigrid correction algorithms are a simple combination of two distinct parts: the ....

....cache aware, the last step must use a somewhat smaller (m i ) ij than the rest of the iterations. This is because the interpolant on level i 1 is added to a much larger vector, which is typically four times the size of the vector on level i. 4 Numerical Results and Conclusions In [5], [6], and [7] a collection of problems are solved on structured grids (2D and 3D) and unstructured grids (in 2D). Speedups range from 100 to 300 over using standard, well coded implementations. Reducing the number of times data passes through cache to the absolute minimum eliminates one of the major areas ....

C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache optimization for structured and unstructured grid multigrid. Electron. Trans. Numer. Anal., 9, 2000.


Cache-aware Multigrid Methods for Solving Poisson's.. - Weiss, Kowarschik.. (1999)   (1 citation)  Self-citation (Kowarschik)

....be achieved by making efforts towards cache aware implementations of these algorithms. In the case of more general situations, e.g. involving unstructured meshes, the situation is much more complicated. Nevertheless, significant speedups can also be gained by applying such acceleration techniques [10]. In Section 2 we consider the cache behavior of our relaxation procedure, which can be shown to be the computationally most intensive part of a multigrid cycle. We describe our data locality optimization techniques in Section 3 and present our results. In Section 4 we draw several final ....

C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiß. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transactions on Numerical Analysis, 9, 2000.


Fixed and Adaptive Cache Aware Algorithms for.. - Douglas, Hu, Karl.. (2000)   (1 citation)  Self-citation (Douglas Hu Kowarschik)

....node in cache block i ; i ≠ j. The number of relaxations possible on any node i in j is the length of the shortest path between i and any node in j , where the length of a path is the number of nodes in a path. The work required to find the distance of every node in j from j is O(n). See [4] for a description of the algorithms. We assume that the grid has been divided into k cache blocks and that within a block the numbering is contiguous. In general, let L j i denote those nodes in block j which are distance i from j . We renumber the nodes in j , beginning with subblock L j ....

C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiß. Cache optimization for structured and unstructured grid multigrid. Electron. Trans. Numer. Anal., 9, 2000.
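
The distance computation mentioned in the context above can be sketched as a multi-source breadth-first search. The set symbols in the quoted passage were lost in extraction, so this sketch assumes the distances are measured from the boundary of cache block j, taken here to mean nodes of block j with at least one neighbour in another block, and it counts path length in nodes so boundary nodes get distance 1. The function name and the toy graph are hypothetical, and this is not the algorithm from reference [4].

```python
from collections import deque


def distance_from_block_boundary(adj, block_of, j):
    """Breadth-first distances (counted in nodes) from the boundary of block j."""
    dist = {}
    queue = deque()
    for v in range(len(adj)):
        # Assumption: a boundary node of block j touches some other block.
        if block_of[v] == j and any(block_of[w] != j for w in adj[v]):
            dist[v] = 1
            queue.append(v)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            # Stay inside block j and visit each node only once.
            if block_of[w] == j and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


# Toy path graph 0-1-2-3-4-5-6-7 in which nodes 2..6 form block 1.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7], [6]]
block_of = [0, 0, 1, 1, 1, 1, 1, 2]
dist = distance_from_block_boundary(adj, block_of, 1)
```

Each node is enqueued at most once and each edge is inspected a bounded number of times, which is consistent with the linear work bound quoted above for graphs of bounded degree. Grouping nodes by their value in dist gives one layer per distance.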


Memory Characteristics of Iterative Methods - Weiss, Karl, Kowarschik, Rüde (1999)   Self-citation (Kowarschik)

....hundred kilobytes in size. Besides, the investigation of the cache friendly treatment of adaptively refined structured grids and even completely unstructured meshes e.g. arising in the context of finite element discretizations of partial differential equations has only recently begun [7, 14]. In all these more general cases, it is not the grid vector of the unknowns but the coefficients of the sparse matrix of the resulting linear system of equations which determines the cache behavior and eventually the performance of the code, since this is the data structure which consumes most of ....

C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiß. Cache Optimization for Structured and Unstructured Grid Multigrid. Submitted to Electronic Transactions on Numerical Analysis (ETNA), 1999.


Software Methods to Improve Data Locality and Cache Behavior - Beyls (2004)

No context found.

C. C. Douglas and J. H. et al. Cache optimization for structured and unstructured grid multigrid. Elect. Trans. Numer. Anal., 10:25--40, 2000.


Adaptive Finite Element/difference Method for Inverse Elastic.. - Beilina (2003)

No context found.

C. Douglas, J. Hu, M. Kowarschik, U. Rude and C. Weiss, Cache optimization for structured and unstructured grid multigrid, ETNA volume 10, pp.21-40, 2000.


Adaptive hybrid FEM/FDM methods for inverse scattering problems - Beilina (2002)

No context found.

C. Douglas, J. Hu, M. Kowarschik, U. Rude and C. Weiss. Cache optimization for structured and unstructured grid multigrid. ETNA volume 10, pp.21-40, (2000)


A Hybrid Method for Elastic Waves - Beilina (2003)

No context found.

C. Douglas, J. Hu, M. Kowarschik, U. Rude and C. Weiss. Cache optimization for structured and unstructured grid multigrid. ETNA volume 10, pp.21-40, (2000)


Algorithms + Data Structures + Transformations = Portable Program .. - Strout (2000)

No context found.

Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000. URL: http://wwwbode.informatik.tu-muenchen.de/Par/arch/cache/index.html.


Combining Performance Aspects of Irregular.. - Strout, Carter.. (2002)   (1 citation)

No context found.

Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rüde, and Christian Weiß. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.


Proof of Correctness for Sparse Tiling of Gauss-Seidel - Strout, Carter, Ferrante (2003)

No context found.

Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, 10:21-40, February 2000.


Comparison of Solvers for a Bioelectric Field Problem - Mohr (2001)

No context found.

C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiß. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, 10:21-40, February 2000.