Jyh-Herng Chow and Williams Ludwell Harrison. Switch-stacks: A scheme for microtasking nested parallel loops. In Supercomputing '90, pages 190-199, Nov. 1990.
....language designed for the exploitation of coarse grain task (functional) parallelism. Concurrency is detected dynamically from data access specifications in each task. The language also supports the declaration of hierarchical tasks. Techniques to exploit nested parallelism include switch stacks [4] and process control blocks (PCBs) [13]. A PCB for a parallel loop is used to schedule the iterations of that loop. Reference [13] discusses heuristics that strike a balance between efficient allocation of PCBs versus load balancing problems that arise from barrier synchronization in nested ....
Jyh-Herng Chow and Williams Ludwell Harrison. Switch-stacks: A scheme for microtasking nested parallel loops. In Supercomputing '90, pages 190-199, Nov. 1990.
....loops, consider what happens below. We would be inclined to think that is a valid transformation. There is a problem, though, with the observation that the loops could be run in parallel, since there are, under analysis used for languages like C or Fortran, no cross iteration dependencies. [Figure 5.5: Nondeterminism. First run: v; [1, 2, 4, 4, 5]; Second run: v; [1, 2, 4, 5, 5]; Sequential version: v; [2, 3, 4, 5, 6];] Look at what consecutive Parallel ISETL runs of the cdoall code from the above example produces below. We see we did not preserve sequential ....
....what happens below. We would be inclined to think that is a valid transformation. There is a problem, though, with the observation that the loops could be run in parallel, since there are, under analysis used for languages like C or Fortran, no cross iteration dependencies. [Figure 5.5: Nondeterminism. First run: v; [1, 2, 4, 4, 5]; Second run: v; [1, 2, 4, 5, 5]; Sequential version: v; [2, 3, 4, 5, 6];] Look at what consecutive Parallel ISETL runs of the cdoall code from the above example produces below. We see we did not preserve sequential consistency under our ....
J.H. Chow and W. L. Harrison III. Switch-Stacks: A Scheme for Microtasking Nested Parallel Loops. In Proceedings of Supercomputing '90, pages 190--199, 1990.
....heap. This implementation requires efficient heap management algorithms for maximum effectiveness. Detailed analysis of such algorithms, however, is beyond the scope of this thesis. Another alternative to processor independent activation frame allocation and deallocation is the switch-stack scheme [98]. The latter scheme is efficient in performing allocation and deallocation, but the depth of exploited parallelism is limited by the total number of stacks available. Techniques for implementing the cactus stack with one stack per processor, and the associated restrictions in task scheduling, ....
J.-H. Chow and L. Harrison, "Switch-stacks: A scheme for microtasking nested parallel loops," in Proceedings of Supercomputing'90, pp. 190--199, November 12-16, 1990.
....of the subtree under the first branch. On the other hand, scheduling the outer parallelism would allocate space for the p branches at the top level, and then execute each subtree serially. Hence we use our algorithm without the delay to estimate the memory requirements of previous techniques like [14, 23], which schedule the outer parallelism with higher priority. Cilk uses less memory than this estimate due to its use of randomization: an idle processor steals the topmost thread (representing the outermost parallelism) from the private queue of a randomly picked processor; this thread may not ....
J. H. Chow and W. L. Harrison III. Switch-stacks: A scheme for microtasking nested parallel loops. In Proceedings of Supercomputing, pages 190--199, New York, NY, November 1990.
....the end of the iteration. Assuming that F(B,i,j) does not allocate any space, the serial execution requires O(n) space, since the space for array B is reused for each i iteration. Now consider the parallel implementation of this function on p processors, where p n. Previous scheduling systems [6, 8, 9, 14, 19, 23, 32, 36], which include both heuristic-based and provably space efficient techniques, would schedule the outer level of parallelism first. This results in all the p processors executing one i iteration each, and hence the total space allocated is O(p · n). Our scheduling algorithm also starts by ....
....under the first branch. On the other hand, scheduling the outer parallelism would allocate space for the p branches at the top level, with each processor executing a subtree serially. Hence we use our algorithm without the delay to estimate the memory requirements of previous techniques like [14, 23], which schedule the outer parallelism with higher priority. Cilk uses less memory than this estimate due to its use of randomization: an idle processor steals the topmost thread (representing the outermost parallelism) from the private queue of a randomly picked processor; this thread may not ....
J. H. Chow and W. L. Harrison III. Switch-stacks: A scheme for microtasking nested parallel loops. In Proc. Supercomputing, New York, NY, November 1990.
....the space for array B is reused for each i iteration. Now consider the parallel implementation of this function on p processors, where p n. Previous scheduling systems [Blumofe and Leiserson 1993; Burton 1988; Burton and Simpson 1994; Chow and W. L. Harrison III 1990; Goldstein et al. 1995; Hummel and Schonberg 1991; Halstead 1985; Ruggiero and Sargeant 1987], which include both heuristic based and provably space efficient techniques, would schedule the outer level of parallelism first. This results in all the p processors executing one i iteration each, and ....
....of creating continuations, building structures to hold arguments, executing dummy nodes, and (de)allocating memory from a shared pool of memory, as well as the effects of cache misses and bus contention. .... algorithm without the delay to estimate the memory requirements of previous techniques [Chow and W. L. Harrison III 1990; Hummel and Schonberg 1991] which schedule the outer parallelism with higher priority. Cilk uses less memory than this estimate due to its use of randomization: an idle processor steals the topmost thread (representing the outermost parallelism) from the private queue of a randomly picked ....
Chow, J. H. and W. L. Harrison III 1990. Switch-stacks: A scheme for microtasking nested parallel loops. In Proc. Supercomputing, New York, NY.
....implemented with any of the above thread packages. However, its full potential can only be exploited with kernel support similar to IRIX nanothreads. The initial version of IML is implemented on Windows NT for IA32 using threads and fibers. IML uses fibers as a replacement for Switch Stacks [4][3]. Other features of fibers are not essential to IML. A task in IML can be seen as a fine grain user level thread, and the IML scheduler implements user level context switches through task switching. Such an approach would continue to be useful until user threads become easily reusable or the cost ....
....operations) routines have their own conventional multiprocessing runtime library. Despite IML's flexibility, BLAS3 routines reimplemented with IML did not degrade in performance (Section 5.2.2). 2.1.3 Nested Parallelism. Chow and Harrison studied support for nested parallel loops [4][3], and implemented it on top of the microtasking environment on the Alliant FX/8 mini-supercomputer. IML employs their Switch Stacks approach using Windows NT fibers. IML switches stacks upon entry to the parallel loop while Switch Stacks switches stacks when the initiating processor finishes its ....
J-H. Chow and L. Harrison. Switch Stacks: A scheme for microtasking nested parallel loops. In Proceedings of Supercomputing'90, 1990. Also available as CSRD Report No. 1032.