H. Yu and L. Rauchwerger. Adaptive Reduction Parallelization. In Proc. of the 14th ACM Int. Conf. on Supercomputing, Santa Fe, NM, May 2000.
H. Yu and L. Rauchwerger. Adaptive reduction parallelization. In Proc. of the 14th ACM ICS, Santa Fe, NM, May 2000.
....we can compact the entire access pattern and keep the algorithm scalable with data size and number of processors. The cost of the scalability is that every data access will be more expensive. This implementation, especially its use in reduction optimizations, has been explained in more detail in [20, 19]. 5.3 Feedback-Guided Load Balancing: One of the drawbacks of the R-LRPD test is the requirement that the speculative loop be statically block scheduled in order to commit partial work. Because our techniques target irregular codes, load balancing does indeed pose ....
....main loop in BJT has been previously parallelized (a DOALL followed by a cross-processor reduction) by first speculatively distributing the linked-list traversal (of the circuit topology) and then using a sparse version of the LRPD test. A more detailed description of the technique is available in [20, 19]. Loop 70 in DCDCMP is fully parallel with a premature exit and has been parallelized with our techniques described in [14, 4]. Loop 15 in DCDCMP (LU decomposition) is partially parallel due to the sparse nature of the circuit topology. We employ a sparse version of the R-LRPD test that can ....
H. Yu and L. Rauchwerger. Adaptive reduction parallelization. In Proceedings of the 14th ACM International Conference on Supercomputing, Santa Fe, NM, May 2000.
....based on a time stamp comparison. Overall, with this support, at the end of execution, the shared array would have the last values and no copy-out would have been necessary. 7 Related Work: Nearly all of the past work on reduction parallelization has been based on software-only transformations [8, 27]. The most related architectural work that we are aware of is the work of Larus et al. [20], Zhang et al. [28], and the work on advanced synchronization mechanisms [3, 9, 10, 16, 17, 18, 23, 24, 25, 29]. Larus et al. briefly mention an idea similar to PCLR as one application of their Reconcilable ....
H. Yu and L. Rauchwerger. Adaptive Reduction Parallelization. In Proc. 14th ACM Intl. Conf. on Supercomputing, May 2000.
....by static compiler analysis, e.g., type analysis. For example, when a reduction operation is recognized or specifically called by the program, the compiler may decide between the standard parallel equivalent and histogram reductions if enough knowledge can be extracted from the code [35]. The second stage in an application's life is driven by the run-time system. It starts by reading in and/or sampling the input data that are relevant to the unfinished optimizations. This relevant data is analyzed with fast, approximate methods and essential characteristics are extracted. ....
....for that matter, other optimizations) that will work well in all cases. We have designed an adaptive scheme that will detect the type of reference pattern through static (compiler) and dynamic (run-time) methods and choose the most appropriate scheme from a library of already implemented choices [35]. To find the best choice we establish a taxonomy of different access patterns, devise simple, fast ways to recognize them, and model the various old and newly developed reduction methods in order to find the best match. The characterization of the access pattern is performed at compile time ....