F. Bodin, E. Granston, T. Montaut, Loop Transformations to Prevent False Sharing, International Journal of Parallel Programming.
....on the analysis of array subscripting patterns. Dependence analysis [5] is one example for parallelizing compilers, but access region analysis is also crucial for array privatization [4], communication optimization for Non-Uniform Memory Access (NUMA) multiprocessors [2, 3], locality enhancement [8], and interprocedural summarization [9]. Compiler modules implementing such techniques must represent the array accesses in some standard fashion. For instance, Tu and Padua [4] approximated access regions for array privatization with the triplet notation. The same notation was used in papers by ....
F. Bodin, E. Granston, T. Montaut, Loop Transformations to Prevent False Sharing, International Journal of Parallel Programming.
....cache coherence protocol, such a page will move back and forth between processors. This will enormously increase the number of communications between processors. Several solutions can be applied to solve this problem, such as array padding, a specific loop scheduling strategy, compiler optimizations [14], or the use of specific cache coherence protocols [18]. Depending on the degree of false sharing and the way the program has been parallelized, one of these optimizations is probably better than the others. In order to decide which of these optimizations has to be used, a fine analysis of this ....
E. D. Granston, T. Montaut, and F. Bodin. Loop transformations to prevent false sharing. International Journal of Parallel Programming, vol. 23(no. 4):263--301, August 1995.
....our case, it would be desirable to have a classification based on the class of overhead that is being reduced: for example, transformations for communication, transformations for synchronisation, etc. While there have been studies of transformations in the context of reducing particular overheads [37, 87, 158], the same transformations can be applied to reduce different sources of overhead; therefore, a strictly overhead-based classification would not result in an orthogonal classification scheme. A more general approach has been followed in [12]; based on this, we identify three major classes of ....
....automatic parallelisation, where decisions are usually taken on the basis of a machine-based economic model, dynamic loop mapping schemes do not provide any insight into the mapping phase. Thus, when modelling communication costs, researchers have preferred to use static schemes for mapping loops [87, 88]. 2.3.2 Mapping Loops for Distributed Memory Computers In distributed memory computers, given that communication is a dominant cost, parallelism is usually exploited by means of the data parallel paradigm (see Section 1.4.2.1). The elements of arrays are distributed among processors and each ....
E. Granston, T. Montaut, F. Bodin, "Loop Transformations to Prevent False Sharing", International Journal of Parallel Programming, 23-4, 1995, pp. 263--301; also available as Technical Report CRPC-TR95528, Center for Research on Parallel Computation, Rice University, May 1995.
....can still appear for some array leading sizes [BS95]. Modifying access patterns is not sufficient to eliminate interferences. It can be combined with array data layout optimizations, array leading-size padding, and inter-array padding. Padding has been used to align arrays on multiprocessors [GMB95, TLH90, Por89] but can also be used to align arrays in caches [BCJ94]. For non-scientific applications, however, such techniques are irrelevant, since complex data structures are not accessed using indexes. The orthogonal approach, data distribution, is better suited. Although scalars benefit from efficient ....
Elana D. Granston, Thierry Montaut, and F. Bodin. Loop transformations to prevent false sharing. International Journal of Parallel Programming, 1995.
....[BS95]. Modifying access patterns is not sufficient to eliminate interferences, so it can be combined with array layout (distribution) optimizations. It is quite limited: only array leading-size padding and inter-array padding can be used. Padding has been used to align arrays on multiprocessors [GMB95, TLH90, Por89] but has also been used to align arrays in caches [BCJ94]. For non-scientific applications, however, code transformations like tiling are irrelevant, since complex structures are not accessed using indexes. The orthogonal approach, data distribution, is better suited. Although scalars benefit from ....
Elana D. Granston, Thierry Montaut, and F. Bodin. Loop transformations to prevent false sharing. International Journal of Parallel Programming, 1995.
....on the scheduling strategy employed and the information regarding this scheduling strategy that is available to the compiler. When static inter-epoch scheduling is employed, optimizations such as aligning accesses across several doall loops can increase the number of redundancies that exist [GMB]. Note that, even when reuse is detected, insufficient temporal locality or local storage space may prevent us from capitalizing on it. Hence, redundancy analysis is most beneficial when combined with locality-enhancing program transformations. Acknowledgements We would like to thank Edward ....
Elana D. Granston, Thierry Montaut, and François Bodin. Loop Transformations to Prevent False Sharing. To appear in the International Journal of Parallel Programming.
No context found.
F. Bodin, E. Granston, T. Montaut, Loop Transformations to Prevent False Sharing, International Journal of Parallel Programming.
No context found.
F. Bodin, E. Granston, T. Montaut, Loop Transformations to Prevent False Sharing, International Journal of Parallel Programming.