14 citations found.
S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, August 1996.


This paper is cited in the following contexts:
Portable High-Performance Programs - Frigo (1999)   (1 citation)  (Correct)

....[Figure caption: A spawns two children B and C. The two children can reference objects in A's activation frame, but B and C do not see each other's frame.] ....from our compiler. Other researchers have been able to reduce overheads even more, however, at the expense of portability. For example, lazy threads [68] obtains efficiency at the expense of implementing its own calling conventions, stack layouts, etc. Although we could in principle incorporate such machine-dependent techniques into our compiler, we feel that Cilk-5 strikes a good balance between performance and portability. We also feel that the ....

....of basing a parallel language on C so as to leverage C compiler technology for high-performance codes. Cilk is a faithful extension of C, however, supporting the simplifying notion of a C elision and allowing Cilk to exploit the C compiler technology more readily. TAM [45] and Lazy Threads [68] also analyze many of the same overhead issues in a more general, nonstrict language setting, where the individual performances of a whole host of mechanisms are required for applications to obtain good overall performance. In contrast, Cilk's multithreaded language provides an execution model ....

S. C. GOLDSTEIN, K. E. SCHAUSER, AND D. E. CULLER, Lazy threads: Implementing a fast parallel call, Journal of Parallel and Distributed Computing, 37 (1996), pp. 5--20.


Portable High-Performance Programs - Frigo (1999)   (1 citation)  (Correct)

....[Figure caption: A spawns two children B and C. The two children can reference objects in A's activation frame, but B and C do not see each other's frame.] ....from our compiler. Other researchers have been able to reduce overheads even more, however, at the expense of portability. For example, lazy threads [68] obtains efficiency at the expense of implementing its own calling conventions, stack layouts, etc. Although we could in principle incorporate such machine-dependent techniques into our compiler, we feel that Cilk-5 strikes a good balance between performance and portability. We also feel that the ....

....of basing a parallel language on C so as to leverage C compiler technology for high-performance codes. Cilk is a faithful extension of C, however, supporting the simplifying notion of a C elision and allowing Cilk to exploit the C compiler technology more readily. TAM [45] and Lazy Threads [68] also analyze many of the same overhead issues in a more general, nonstrict language setting, where the individual performances of a whole host of mechanisms are required for applications to obtain good overall performance. In contrast, Cilk's multithreaded language provides an execution model ....

S. C. GOLDSTEIN, K. E. SCHAUSER, AND D. E. CULLER, Lazy threads: Implementing a fast parallel call, Journal of Parallel and Distributed Computing, 37 (1996), pp. 5--20.


Cilk: Efficient Multithreaded Computing - Randall (1998)   (3 citations)  (Correct)

....the principle no longer hold. We believe that Cilk-5 work overhead is nearly as low as possible, given our goal of generating portable C output from our compiler. Other researchers have been able to reduce overheads even more, however, at the expense of portability. For example, lazy threads [46] obtains efficiency at the expense of implementing its own calling conventions, stack layouts, etc. Although we could in principle incorporate such machine-dependent techniques into our compiler, we feel that Cilk-5 strikes a good balance between performance and portability. We also feel that the ....

....of basing a parallel language on C so as to leverage C compiler technology for high-performance codes. Cilk is a faithful extension of C, however, supporting the simplifying notion of a C elision and allowing Cilk to exploit the C compiler technology more readily. TAM [28] and Lazy Threads [46] also analyze many of the same overhead issues in a more general, nonstrict language setting, where the individual performances of a whole host of mechanisms are required for applications to obtain good overall performance. In contrast, Cilk's multithreaded language provides an execution model ....

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, August 1996.


The Implementation of the Cilk-5 Multithreaded Language - Frigo, Leiserson, Randall (1998)   (70 citations)  (Correct)

....the principle no longer hold. We believe that Cilk-5 work overhead is nearly as low as possible, given our goal of generating portable C output from our compiler. Other researchers have been able to reduce overheads even more, however, at the expense of portability. For example, lazy threads [14] obtains efficiency at the expense of implementing its own calling conventions, stack layouts, etc. Although we could in principle incorporate such machine-dependent techniques into our compiler, we feel that Cilk-5 strikes a good balance between performance and portability. We also feel that the ....

....of basing a parallel language on C so as to leverage C compiler technology for high-performance codes. Cilk is a faithful extension of C, however, supporting the simplifying notion of a C elision and allowing Cilk to exploit the C compiler technology more readily. TAM [10] and Lazy Threads [14] also analyze many of the same overhead issues in a more general, nonstrict language setting, where the individual performances of a whole host of mechanisms are required for applications to obtain good overall performance. In contrast, Cilk's multithreaded language provides an execution model ....

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, August 1996.


Dynamic Feedback: An Effective Technique for Adaptive Computing - Diniz, Rinard (1997)   (19 citations)  (Correct)

....overhead in sequential executions of such programs, it does not address the trade-off between lock overhead and waiting overhead. The goal is simply to minimize the lock overhead. 7.7 Parallel Function Calls. Several researchers have developed efficient implementations for parallel function calls [15, 27, 30]. These implementations dynamically match the amount of exploited parallelism to the amount of parallelism available on the parallel hardware platform by selecting between an efficient sequential call and a full parallel call. The selection is based on a dynamic measure of the difference between ....
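
Editorial sketch (not taken from the cited papers): the choice between an efficient sequential call and a full parallel call described in the excerpt above might look roughly like this. The excerpt is truncated before it names the dynamic measure, so the sketch assumes a much simpler one, the number of parallel calls currently in flight. The class LazyCaller, its methods, and the use of a Python thread pool are hypothetical stand-ins, not the mechanism of [15, 27, 30].

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class LazyCaller:
    """Hypothetical helper that picks a sequential or a parallel call at run time."""

    def __init__(self, max_workers):
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.max_workers = max_workers
        self.in_flight = 0  # assumed load measure: outstanding parallel calls
        self.lock = threading.Lock()

    def call(self, fn, *args):
        # Decide under the lock whether a worker is still free.
        with self.lock:
            go_parallel = self.in_flight < self.max_workers
            if go_parallel:
                self.in_flight += 1

        if not go_parallel:
            # Cheap path: run the callee in the caller's own thread and
            # hand back an already completed future.
            done = Future()
            try:
                done.set_result(fn(*args))
            except Exception as exc:
                done.set_exception(exc)
            return done

        # Full parallel call: hand the work to the pool.
        future = self.pool.submit(fn, *args)
        future.add_done_callback(self._release)
        return future

    def _release(self, _future):
        # A parallel call finished, so one worker is free again.
        with self.lock:
            self.in_flight -= 1
```

Callers use the returned future the same way on both paths, so the decision stays invisible to them. When every worker is busy the call costs little more than an ordinary function call.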

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy Threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, August 1996.


Architectural Support for Thread-Level Data Speculation - Steffan, Colohan, Mowry (1997)   (11 citations)  (Correct)

....[Figure residue, panel (c): Loop executed speculatively using recycled threads and static scheduling. The panel's pseudocode and timeline diagram are not recoverable from the extracted text.] ....

....[Figure 2: Speculative execution illustrated using a while loop. (a) A simple while loop, with y initialized so that there is a RAW dependence in iteration 6 of the loop. The diagram contents are not recoverable from the extracted text.] ....

[Article contains additional citation context not shown here]

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, August 1996.


Compiler Optimization of Value Communication for Thread-Level.. - Zhai (2005)   Self-citation (Goldstein)   (Correct)

No context found.

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, August 1996.


Hardware Support for Thread-Level Speculation - Steffan (2003)   Self-citation (Goldstein)   (Correct)

No context found.

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, August 1996.


Hardware Support for Thread-Level Speculation - Steffan (2003)   Self-citation (Goldstein)   (Correct)

No context found.

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, August 1996.


Cluster I/O with River: Making the Fast Case Common - Arpaci-Dusseau, Anderson..   Self-citation (Culler)   (Correct)

....database industry standard sorting benchmarks) it was difficult to attain a high level of performance consistently [2, 3]. There have been many parallel programming environments that are aligned with our River design philosophy of run-time adaptivity. Some examples include Cilk [7], Lazy Threads [23], and Multipol [12]. All of these systems balance load across consumers in order to allow for highly irregular, fine-grained parallel applications. The main difference between River and the systems above is the granularity of communication. Because River limits itself to I/O workloads, data is ....

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy Threads: Implementing a Fast Parallel Call. Journal of Parallel and Distributed Computing, 37(1):5--20, Aug. 1996.


Lazy Threads: Compiler and Runtime Structures for Fine-Grained.. - Goldstein (1997)   (3 citations)  Self-citation (Goldstein)   (Correct)

....[Table residue, last row: Create lazy thread after disconnect 3 0 4] Table 4.3: Times for the primitive operations using spaghetti stacks. These times do not include any control costs. [Figure 4.14: A cactus stack using stacklets.] 4.6 Stacklets. Stacklets, introduced in [24], effect a different compromise between the linked-frame and stack approaches. Whereas spaghetti stacks incur extra cost when a function returns (to handle reclamation of free space), stacklets maintain a stack invariant, but incur extra cost on function call, to check for overflow. A stacklet is a ....
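
Editorial sketch (not taken from the thesis): the trade-off described in the excerpt above, an overflow check on every call in exchange for ordinary stack behaviour inside each stacklet, can be illustrated with a toy model. The excerpt breaks off before it defines a stacklet, so the sketch assumes one is a fixed-capacity chunk that holds several frames and links back to the chunk it overflowed from. The names Stacklet and StackletStack, the size units, and the default capacity are hypothetical. The model covers a single chain only, not the branching of a full cactus stack.

```python
class Stacklet:
    """Assumed model: a fixed-capacity chunk holding several call frames."""

    def __init__(self, capacity, parent=None):
        self.capacity = capacity
        self.used = 0
        self.frames = []      # (name, size) pairs, newest last
        self.parent = parent  # stacklet we overflowed from, if any


class StackletStack:
    """Toy single-chain stack built from linked stacklets."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.top = Stacklet(capacity)
        self.allocations = 1  # number of stacklets ever allocated

    def call(self, name, size):
        if size > self.capacity:
            raise ValueError("frame larger than a stacklet")
        # Extra cost on every call: the overflow check.
        if self.top.used + size > self.top.capacity:
            self.top = Stacklet(self.capacity, parent=self.top)
            self.allocations += 1
        self.top.frames.append((name, size))
        self.top.used += size

    def ret(self):
        # Return is an ordinary pop; an emptied overflow stacklet is dropped.
        name, size = self.top.frames.pop()
        self.top.used -= size
        if not self.top.frames and self.top.parent is not None:
            self.top = self.top.parent
        return name
```

In this model calls that fit in the current stacklet pay only for the comparison. A new chunk is allocated only on overflow, and a return never has to search for reclaimable space.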

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy Threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, August 1996.


Cluster I/O with River: Making the Fast Case Common - Arpaci-Dusseau, Anderson..   Self-citation (Culler)   (Correct)

....database industry standard sorting benchmarks) it was difficult to attain a high level of performance consistently [2, 3]. There have been many parallel programming environments that are aligned with our River design philosophy of run-time adaptivity. Some examples include Cilk [7], Lazy Threads [22], and Multipol [11]. All of these systems balance load across consumers in order to allow for highly irregular, fine-grained parallel applications. The main difference between River and the systems above is the granularity of communication. Because River limits itself to I/O workloads, data is ....

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy Threads: Implementing a Fast Parallel Call. Journal of Parallel and Distributed Computing, 37(1):5--20, Aug. 1996.


Engineering Parallel Symbolic Programs in GpH - Loidl, Trinder, Hammond.. (1999)   (5 citations)  (Correct)

No context found.

S.C. Goldstein, K.E. Schauser, D.E. Culler. Lazy Threads: Implementing a Fast Parallel Call. Journal of Parallel and Distributed Computing, 37(1):5-20, 1996. URL: http://http.cs.berkeley.edu/~sethg/papers/jpdc.ps.Z


Dynamic Feedback: An Effective Technique for Adaptive Computing - Diniz, Rinard (1997)   (19 citations)  (Correct)

No context found.

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy Threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, August 1996.

