C. C. Douglas and J. H. et al. Cache optimization for structured and unstructured grid multigrid. Elect. Trans. Numer. Anal., 10:25--40, 2000.
....We build on the Kelly and Pugh framework to describe the effects of run time data and iteration reordering transformations for locality, which include consecutive packing [6], graph partitioning [9], bucket tiling [18], lexicographical grouping [6], full sparse tiling [26], and cache blocking [7]. We show how the symbolic effect of a run time transformation can be propagated to relevant data mappings and dependences. Given a selected composition of run time reorderings, the resulting data mappings and dependences can be used by any subsequent choice of run time transformation. While this ....
....all data and iteration reorderings have been computed. Not shown in this example, but implemented in the experiments, it is possible to do another data reordering after the sparse tiling inspector. 4. GENERALIZED SPARSE TILING Sparse tiling techniques, full sparse tiling [26] and cache blocking [7], were developed for an important kernel used in Finite Element Methods, Gauss-Seidel. Sparse tiling results in run time generated tiles or iteration slices [21] which cut across loops that only touch a subset of the total data. By performing an iteration reordering based on the sparse tiling, ....
Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.
....The seed partition is at the first convergence iteration. Figure 3. A visual comparison of the two sparse tiling techniques. There are two known sparse tiling techniques. Our previous work [33] developed a sparse tiling technique which in this paper we call full sparse tiling. Douglas et al. [12] described another sparse tiling technique which they refer to as cache blocking of unstructured grids. In this paper, we will refer to their technique as cache block sparse tiling. Figures 3(a) and 3(b) illustrate how the full sparse tiling and the cache block sparse tiling techniques divide the ....
....#. The schedule function provides a list of iteration points to execute for each tile at each convergence iteration. Generate tile dependence graph identifying which tiles may be executed in parallel. Either full tile growth (called serial sparse tiling in [33]) or cache blocking tile growth [12] can be used to grow tiles based on an initial matrix graph partitioning. We show results for both methods. Our experiments were conducted using the IBM Blue Horizon and SUN Ultra at the San Diego Supercomputer Center. Details on both machines are given in Table 1. We generated three matrices by ....
[Article contains additional citation context not shown here]
Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rüde, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.
....when non-affine memory references are involved. We can also express run time data and iteration reordering transformations for locality, which include consecutive packing [7], graph partitioning [12], bucket tiling [21], lexicographical grouping [7], full sparse tiling [29], and cache blocking [9]. Describing the effect of run time data and iteration reorderings in a compile time framework has several advantages. First, both run time and compile time transformations are uniformly described. Secondly, the transformation legality checks provide constraints on the run time reordering ....
....for reordering the iteration of the j loop. [Figure 4: Example of Figure 2 mapping after the CPACK data reordering followed by a lexGroup iteration reordering.] Sparse tiling programming techniques, full sparse tiling [29] and cache blocking [9], were developed for an important kernel used in Finite Element Methods, Gauss-Seidel. Sparse tiling results in run time generated tiles or iteration slices [24] that cut between loops or across an outer loop and that only access a subset of the total data. By performing an iteration reordering ....
[Article contains additional citation context not shown here]
C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, 10:21--40, February 2000.
....graph from the previous convergence iteration. .... Our previous work [21] developed a sparse tiling technique which in this paper we call sparse tiling with full tile growth or full sparse tiling. Douglas et al. [5] described a sparse tiling technique which they refer to as cache blocking of unstructured grids. In this paper, we will refer to their technique as sparse tiling with cache blocking tile growth or cache block sparse tiling. Figures 2(a) and 2(b) illustrate how the full sparse tiling and the cache ....
....that the row ordering is arbitrary for the sparse matrix in question. .... overhead could be amortized over many calls to Gauss-Seidel using the same sparse matrix. Multigrid is an example of an algorithm that calls Gauss-Seidel multiple times with the same sparse matrix as input. Douglas et al. [5] showed that cache block sparse tiling helps the overall performance of multigrid; however, the overhead studied was that of partitioning and reordering the meshes that underlie the matrices at each multigrid level. Only geometric multigrid methods generate a mesh for each level of multigrid. When ....
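As a minimal sketch of the kernel these excerpts discuss, the following shows plain Gauss-Seidel sweeps over a sparse matrix in compressed sparse row (CSR) form, applied repeatedly to the same matrix. It is not the tiled schedule of any cited paper; the function names `gauss_seidel_sweep` and `solve` and the small test matrix are assumptions chosen for illustration.

```python
def gauss_seidel_sweep(row_ptr, col_idx, vals, b, x):
    """Perform one in-place Gauss-Seidel sweep for A x = b (A in CSR form)."""
    for i in range(len(b)):
        diag = 0.0
        acc = b[i]
        # Walk the nonzeros of row i; entries x[j] with j < i already hold
        # values updated during this sweep, which is what distinguishes
        # Gauss-Seidel from Jacobi.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = vals[k]
            else:
                acc -= vals[k] * x[j]
        if diag == 0.0:
            raise ValueError(f"row {i} has no nonzero diagonal entry")
        x[i] = acc / diag
    return x


def solve(row_ptr, col_idx, vals, b, sweeps=50):
    """Run a fixed number of sweeps (hypothetical driver, zero initial guess)."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        # Every sweep traverses the whole matrix again; this repeated
        # traversal is the source of the poor inter-iteration locality.
        gauss_seidel_sweep(row_ptr, col_idx, vals, b, x)
    return x


# Illustrative 3x3 tridiagonal system: rows (4,-1,0), (-1,4,-1), (0,-1,4).
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [3.0, 2.0, 3.0]
x = solve(row_ptr, col_idx, vals, b)
```

Because each call to `gauss_seidel_sweep` reads every nonzero of the matrix, any reordering or tiling overhead computed once for a given matrix can in principle be reused across all sweeps on that matrix.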
[Article contains additional citation context not shown here]
Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rüde, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.
....component of the data block. Therefore many typical scientific computing applications perform at about 1 of the peak processor performance. This disastrous situation is still little acknowledged. Though there are successful concepts for significantly optimizing storage and access for caching [4, 16, 2], overall performance remains far away from peak values. Unfortunately this will not change in the near future, because cache sizes are miles away from reaching the size of graphics memory, and even then they would still lack the vector operations on entire data blocks. In conclusion we see that when ....
C.C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache optimization for structured and unstructured multigrid. Electronic Transactions on Numerical Analysis (ETNA), 1999.
....they do not improve the temporal or inter iteration locality on the sparse matrix, because in their rescheduled code the entire sparse matrix is traversed each convergence iteration. Increasing inter iteration locality for iterative computations on regular meshes is explored by [3], [16], [17], and [6]. The only other technique to our knowledge which handles inter iteration locality for irregular meshes is unstructured cache blocking by Douglas et al. [3]. They tile the iteration space graph resulting from unstructured grids in the context of the Multigrid algorithm using ....
....iteration. Increasing inter iteration locality for iterative computations on regular meshes is explored by [3], [16], [17], and [6]. The only other technique to our knowledge which handles inter iteration locality for irregular meshes is unstructured cache blocking by Douglas et al. [3]. They tile the iteration space graph resulting from unstructured grids in the context of the Multigrid algorithm using Gauss-Seidel as a smoother. They achieve overall speedups up to 2 with 2D meshes containing 3983, 15679, and 62207 nodes on an SGI O2. They partition the mesh into cells using ....
Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21-40, February 2000.
....improving the intra iteration locality [4, 7, 1, 3, 6]. We will be presenting a technique for improving both the intra and inter iteration locality. The only other technique we are aware of that accomplishes this for iterative algorithms on irregular meshes is cache blocking by Douglas et al. [2]. On the left is an Iteration Space Graph which has been broken into tiles using the cache blocking technique described by Douglas et al. They partition the mesh using the Metis [5] partitioner; the red lines show the mesh partitioning. They then grow tiles from this partitioning forward through ....
....points. Since a single tile uses less data than the entire computation, each tile exhibits good inter and intra iteration data locality. The second phase of the new schedule has the same locality problems as the original computation. Here we show the second phase in orange. Douglas et al. [2] found that 7.19% to 33.67% of the nodes in their sample meshes had iteration points in this second phase. 3 Serial Sparse Tiling We call our technique serial sparse tiling. We also grow tiles from a partitioning, but we grow the tiles backwards through the iteration space. Notice that the ....
Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.
....Fig. 3. Tile layers for Tile0, Tile1, Tile2, and Tile3. The tile layers for Tile0 are shaded. We refer to the runtime tiling of sparse matrix computations as sparse tiling. This paper describes and implements a serial sparse tiling, in that the resulting schedule is serial. Douglas et al. [4] describe a parallel sparse tiling for Gauss-Seidel. They partition the mesh and then grow tiles forward through the iteration space (in the direction of the convergence iterator) in such a way that the tiles do not depend on one another and therefore can be executed in parallel. After executing ....
....Gauss-Seidel. For example, on a 3D mesh with N = 40,687 an overall speedup of 1.17 is observed even when the overhead cost is included. This indicates that the tradeoff between raw speedup and overhead must be considered when calculating partition sizes. 4 Related Work Douglas et al. [4] perform tiling on the iteration space graph resulting from unstructured grids in the context of the Multigrid algorithm using Gauss-Seidel as a smoother. They achieve overall speedups up to 2 with 2D meshes containing 3983, 15679, and 62207 nodes on an SGI O2. They are able to reschedule their tiles ....
Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.
....blocking covers [9] work with their framework for the Jacobi scheme. Stals and Rude [11] studied program transformations for the Red-Black Gauss-Seidel method. They explore blocking along one dimension for two dimensional problems, but our work involves two dimensional blocking. Douglas et al. [5] investigate cache optimizations for structured and unstructured multigrid. They focus only on the Red-Black Gauss-Seidel relaxation scheme. Povitsky [10] discusses a different wavefront approach to a cache-friendly algorithm to solve PDEs. Bromley et al. [4] developed a compiler module to ....
C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache optimization for structured and unstructured grid multigrid. Electronic Transactions on Numerical Analysis, 10:21--40, 2000. University of Kentucky, Louisville, KY, USA. ISSN 1068--9613.
No context found.
C.C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transactions on Numerical Analysis, 10:21-40, February 2000.
No context found.
C.C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transactions on Numerical Analysis, 10:21-40, 2000.
No context found.
C.C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transactions on Numerical Analysis, 10:21-40, 2000.
No context found.
C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss, Cache optimization for structured and unstructured grid multigrid, Elect. Trans. Numer. Anal. 10 (2000), 21-40.
No context found.
C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss, Cache optimization for structured and unstructured grid multigrid, Elect. Trans. Numer. Anal. 10 (2000), 21-40.
....allows us to experiment with different cache optimizations on different processors easily. We are exploring a number of storage schemes for optimizing cache effects. For serial computers, modifying algorithms and data structures for standard multigrid has been investigated for a variety of problems [7, 9, 10, 17]. For adaptively refined grids on parallel processors, there are many more possibilities than in the serial case. We are implementing many possibilities in order to see which ones work well on which parallel architectures. After approximating the solution on the grid hierarchy, there may be ....
Douglas, C.C., Hu, J., Kowarschik, M., Rüde, U., and Weiss, C.: Cache optimization for structured and unstructured grid multigrid. Elect. Trans. Numer. Anal. 10 (2000), 21-40.
No context found.
C. C. Douglas, J. Hu, M. Kowarschik, and U. Rude. Cache optimization for structured and unstructured grid multigrid. Electron. Trans. Numer. Anal., 9, 2000.
No context found.
C. C. Douglas and J. H. et al. Cache optimization for structured and unstructured grid multigrid. Elect. Trans. Numer. Anal., 10:25--40, 2000.
No context found.
C. Douglas, J. Hu, M. Kowarschik, U. Rude and C. Weiss, Cache optimization for structured and unstructured grid multigrid, ETNA volume 10, pp.21-40, 2000.
No context found.
C. Douglas, J. Hu, M. Kowarschik, U. Rude and C. Weiss. Cache optimization for structured and unstructured grid multigrid. ETNA volume 10, pp.21-40, (2000)
No context found.
C. Douglas, J. Hu, M. Kowarschik, U. Rude and C. Weiss. Cache optimization for structured and unstructured grid multigrid. ETNA volume 10, pp.21-40, (2000)
No context found.
C. C. Douglas and J. H. et al. Cache optimization for structured and unstructured grid multigrid. Elect. Trans. Numer. Anal., 10:25--40, 2000.
No context found.
Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000. URL: http://wwwbode.informatik.tu-muenchen.de/Par/arch/cache/index.html.
No context found.
Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rüde, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, pages 21--40, February 2000.
No context found.
Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich Rude, and Christian Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, 10:21-40, February 2000.
No context found.
C. C. Douglas, J. Hu, M. Kowarschik, U. Rude, and C. Weiss. Cache Optimization for Structured and Unstructured Grid Multigrid. Electronic Transaction on Numerical Analysis, 10:21-40, February 2000.