Mounes-Toussi F., Lilja D.J., Li Z. An Evaluation of a Compiler Optimization for Improving the Performance of a Coherence Directory. Proceedings of the International Conference on Supercomputing, pages 75-84, Manchester U.K., July 1994.
....these directives are only performance hints; the directory hardware is still responsible for correctness. The use of hybrid techniques, where the compiler is used to eliminate unnecessary coherence actions rather than relying on it to maintain coherence, is also investigated by Mounes-Toussi et al. [15] for CC NUMAs. They note the critical role of (but do not develop) compiler analysis and show that if static scheduling and accurate dependence analysis were available, then their coherence scheme would show marked improvement across all the programs they examined from the Perfect ....
....occurring on different processors, e.g. W 1 i ; W 2 i refer to the local data written on processors 1 and 2 respectively. We use an owner-computes rule and static scheduling. To eliminate multiple-writer false sharing, arrays are padded and partitioned along page boundaries where necessary [15]. Therefore, each processor writes to the same data throughout the lifetime of the program and no two processors write to the same pages. We also assume no read-write false sharing when invalidating copies and making local data exclusive. The state of a page after a particular action is dependent ....
Mounes-Toussi F., Lilja D.J., Li Z. An Evaluation of a Compiler Optimization for Improving the Performance of a Coherence Directory. Proceedings of the International Conference on Supercomputing, pages 75-84, Manchester U.K., July 1994.
....correctness and is limited in the amount of inter-epoch locality that can be exploited by the size of the tag and epoch counters, as well as by the conservative nature of the compiler analysis. Other approaches include a directory-based method, Dynamic Self Invalidation [7], and hybrid techniques [6, 9], each of which is developed for CC NUMAs. In [9] they note the critical role of (but do not develop) compiler analysis. We use an SPMD execution model, exploiting data parallelism with owner-computes scheduling. Owner computes guarantees that the write to any page will be performed by the same ....
....locality that can be exploited by the size of the tag and epoch counters, as well as by the conservative nature of the compiler analysis. Other approaches include a directory-based method, Dynamic Self Invalidation [7], and hybrid techniques [6, 9], each of which is developed for CC NUMAs. In [9] they note the critical role of (but do not develop) compiler analysis. We use an SPMD execution model, exploiting data parallelism with owner-computes scheduling. Owner computes guarantees that the write to any page will be performed by the same processor, and hence two or more write actions ....
Mounes-Toussi F., Lilja D.J., Li Z. An Evaluation of a Compiler Optimization for Improving the Performance of a Coherence Directory, Proceedings of ICS '94, July 1994.
....these directives are only performance hints; the directory hardware is still responsible for correctness. The use of hybrid techniques, where the compiler is used to eliminate unnecessary coherence actions rather than relying on it to maintain coherence, is also investigated by Mounes-Toussi et al. [15] for CC NUMAs. They note the critical role of (but do not develop) compiler analysis and show that if static scheduling and accurate dependence analysis were available, then their coherence scheme would show marked improvement across all the programs they examined from the Perfect Benchmark suite. ....
....occurring on different processors, e.g. W 1 i and W 2 i refer to the local data written on processors 1 and 2 respectively. We use an owner-computes rule and static scheduling. To eliminate multiple-writer false sharing, arrays are padded and partitioned along page boundaries where necessary [15]. Therefore, each processor writes to the same data throughout the lifetime of the program and no two processors write to the same pages. We also assume no read-write false sharing when invalidating copies and making local data exclusive. The state of a page after a particular action is ....
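The padding and page-aligned partitioning described in the excerpt above can be illustrated with a minimal sketch. It is not taken from the cited paper: the page size, element size, and the function names padded_partition and pages_written are assumptions chosen for illustration. Each processor's block is rounded up to a whole number of pages, so under owner-computes no two processors write to the same page.

```python
PAGE_SIZE = 4096  # bytes per page (assumed value)
ELEM_SIZE = 8     # bytes per array element (assumed value)


def padded_partition(n_elems, n_procs, page_size=PAGE_SIZE, elem_size=ELEM_SIZE):
    """Return one (start, end) element range per processor.

    Each range starts on a page boundary; the gap between the end of one
    range and the start of the next is padding that nobody writes.
    """
    elems_per_page = page_size // elem_size
    block = -(-n_elems // n_procs)                 # ceil: elements owned per processor
    pages_per_block = -(-block // elems_per_page)  # ceil: pages needed per block
    stride = pages_per_block * elems_per_page      # padded block length in elements
    ranges = []
    for p in range(n_procs):
        owned = min(block, max(0, n_elems - p * block))
        ranges.append((p * stride, p * stride + owned))
    return ranges


def pages_written(rng, page_size=PAGE_SIZE, elem_size=ELEM_SIZE):
    """Set of page numbers touched when the owner writes its whole range."""
    start, end = rng
    if end <= start:
        return set()
    elems_per_page = page_size // elem_size
    return set(range(start // elems_per_page, (end - 1) // elems_per_page + 1))


if __name__ == "__main__":
    for proc, rng in enumerate(padded_partition(1000, 4)):
        print(proc, rng, sorted(pages_written(rng)))
```

With 1000 elements on 4 processors, an unpadded split would place the first two 250-element blocks on the same 512-element page; the padded layout gives each block its own page at the cost of unused space.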
Mounes-Toussi F., Lilja D.J., Li Z. An Evaluation of a Compiler Optimization for Improving the Performance of a Coherence Directory, Proc. of Inter. Conf. on Super., July 1994.
....discuss the computation of the array KILL set [13, 33, 10, 20] but active research on this and related subjects is still ongoing. Of course, array data flow analysis can be quite important when trying to improve memory referencing locality and when eliminating redundant coherence operations [26, 27, 29], but, as our simulation results show, array data flow analysis is not as important for the pointer cache overflow problem being addressed in this study. Variable privatization [8, 20] is another technique which may be used to reduce pointer thrashing to a certain degree. Privatization, of course, ....
F. Mounes-Toussi, D. J. Lilja, and Z. Li. An evaluation of a compiler optimization for improving the performance of a coherence directory. In Proc. ACM International Conference on Supercomputing, pages 75--84, July 1994.