V. Ramakrishnan, I. D. Scherson, and R. Subramanian. Efficient techniques for fast nested barrier synchronization. In ACM Symposium on Parallel Algorithms and Architectures, 1995.
....and here. The thread-based execution model of nested parallelism has been shown to respect the step and work complexities of the source-level metrics [9,5]. However, the overheads and space requirements of realizing this model call for careful run-time scheduling [4], fast synchronization [25], and granularity control (in the sense of [10]) to make it practical. Blelloch [1] and Suciu and Tannen [34,33] have presented nested parallel languages and have argued that these languages can be implemented on the VRAM with the correct step and work complexities. However, these results are based on ....
V. Ramakrishnan, I. D. Scherson, and R. Subramanian. Efficient techniques for fast nested barrier synchronization. In ACM Symposium on Parallel Algorithms and Architectures, 1995.
....to provide hardware support for all nested barriers in a data-parallel program will result in a significant speedup of most data-parallel applications. In this paper, two schemes are presented for supporting nested barriers using only limited hardware. Preliminary results in this area appeared in [16]. The first scheme uses two single-bit trees to support any number of nested barriers. This method relies on code transformations, which increase the code size. The second scheme uses an integer max tree, which requires more expensive hardware, to support an exponential number of nesting levels ....
V. Ramakrishnan, I. D. Scherson, and R. Subramanian. Efficient techniques for fast nested barrier synchronization. In Symposium on Parallel Algorithms and Architectures, pages 157-164, July 1995.
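Both schemes in the excerpt above are built from hardware combining trees: a single-bit tree and an integer max tree. The following is a minimal software model of such a tree, intended only to show what the hardware building block computes. It assumes a binary tree in which each leaf holds one processor's value and each internal node applies a fixed operator (bitwise AND for a single-bit barrier tree, max for an integer max tree). It is a generic illustration, not the paper's nested-barrier protocol, whose details the excerpt does not give. The names `combine_tree`, `barrier_reached`, and `max_tree` are hypothetical.

```python
from typing import Callable, List, Tuple


def combine_tree(leaves: List[int], op: Callable[[int, int], int]) -> Tuple[int, int]:
    """Reduce leaf values pairwise, level by level, as a binary combining tree would.

    Hypothetical helper. Returns (root_value, depth), where depth is the number of
    combining levels, i.e. ceil(log2(len(leaves))) for at least one leaf.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    depth = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # An internal node combines the outputs of its two children.
                next_level.append(op(level[i], level[i + 1]))
            else:
                # An unpaired node at an odd-sized level is passed up unchanged.
                next_level.append(level[i])
        level = next_level
        depth += 1
    return level[0], depth


def barrier_reached(arrived: List[int]) -> bool:
    # Single-bit AND tree: the root is 1 only when every processor has set its bit.
    root, _ = combine_tree(arrived, lambda a, b: a & b)
    return root == 1


def max_tree(values: List[int]) -> int:
    # Integer max tree: the root sees the largest value submitted by any processor.
    root, _ = combine_tree(values, max)
    return root


if __name__ == "__main__":
    print(barrier_reached([1, 1, 1, 0]))  # one processor has not arrived
    print(barrier_reached([1] * 8))       # all eight processors have arrived
    print(max_tree([2, 5, 3, 1, 4]))      # largest submitted value
```

In hardware, all nodes at one level combine in parallel, so the latency of a barrier or max operation grows with the tree depth (logarithmic in the number of processors) rather than with the processor count itself.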