23 citations found.
J. Hoch. Compile-time partitioning of a non-strict language into sequential threads. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing, pages 180--189, 1991.

This paper is cited in the following contexts:
A Partitioning-Independent Paradigm for Nested Data.. - Engelhardt, Wendelborn (1995)   (5 citations)  (Correct)

....program and threads within the Omega translation. The process of determining such a mapping is analogous to the dependency graph partitioning procedure undertaken in thread generating compilers for dataflow languages. The literature describes several techniques for thread generation (e.g. [19, 11]) however for our present discussion we choose a simplistic scheme. For each function in the original program we define a thread in the abstract machine translation that is to implement it. This simplistic scheme has the advantage that the representation of data parallel operators may be made ....

J. Hoch. Compile-time partitioning of a non-strict language into sequential threads. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing, pages 180--189, 1991.


Thread Partitioning and Scheduling Based on Cost Model - Tang, al. (1997)   (Correct)

....has more constraints than for strict languages, such as avoiding deadlock. Many algorithms for thread partitioning have been proposed for functional languages. The dependence set method merges threads with the same inputs [19], while the demand set algorithm combines threads with the same output [16, 30]. The combination of both, together with global analysis, is reported in [34, 8]. Moreover, the larger size threads can be generated by applying separation constraint partitioning plus interprocedural analysis [29]. For a review of those algorithms, readers are referred to [31]. In compiling for ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-time partitioning of a non-strict language into sequential threads.


Design and Implementation of an Efficient Thread.. - Amaral, Gao.. (1999)   (Correct)

....merging nodes together with the same dependence set [12]. The dependence set of a node is the union over the input of all its predecessors. Demand set partitioning works similarly to dependence set partitioning, but finds a correct partitioning by grouping together nodes with the same demand set [10, 23]. The demand set of a node is the union over the output of all successor nodes. The combination of dependence set partitioning with demand set partitioning, together with global analysis, is called iterated partitioning [28, 4]. It combines the power of both partitioning algorithms by applying ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-time partitioning of a non-strict language into sequential threads. In Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, Dallas, Texas, December 1--4 1993.
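
The dependence set definition quoted in the context above can be illustrated with a minimal Python sketch. It is not taken from any of the cited papers: the function names, the graph encoding (a dict from node to predecessor list), and the example graph are hypothetical, and the sketch assumes that every input node seeds its own dependence set. It also ignores the extra safety conditions that real partitioners impose.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def dependence_sets(preds, inputs):
    """Return the dependence set of every node (hypothetical helper).

    preds maps each node to the list of its predecessors; inputs is the
    set of input nodes, each of which is assumed to depend only on itself.
    """
    dep = {}
    # static_order() yields each node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        if node in inputs:
            dep[node] = frozenset([node])
        else:
            # Union of the dependence sets of all predecessors.
            dep[node] = frozenset().union(*(dep[p] for p in preds.get(node, ())))
    return dep


def merge_equal_sets(sets):
    """Group nodes whose sets are identical into one candidate thread."""
    groups = defaultdict(list)
    for node, s in sets.items():
        groups[s].append(node)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical example graph with inputs a and b.
preds = {
    "a": [], "b": [],
    "n1": ["a"], "n2": ["n1"],
    "n3": ["a", "b"], "n4": ["n3", "b"],
}
threads = merge_equal_sets(dependence_sets(preds, inputs={"a", "b"}))
print(threads)
```

In this example n1 and n2 depend only on a, while n3 and n4 depend on both inputs, so each pair lands in the same group.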


Compiling For Multithreaded Architectures - Tang (1999)   (1 citation)  (Correct)

....since the graph is a DAG. Finding nodes with identical dependence sets can be implemented by lexicographically sorting the dependence sets. Demand set partitioning works similarly to dependence set partitioning, but finds a correct partitioning by grouping together nodes with the same demand set [64, 112]. The demand set of a node is the union over the output of all successor nodes. The partitioning algorithm is also similar to the one used for dependence set partitioning, but traversing the graph in the reverse topological order. However both dependence set partitioning and demand set ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-time partitioning of a non-strict language into sequential threads. In Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, Dallas, Texas, December 1--4 1993.
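
The implementation hints in the context above (a reverse topological traversal for demand sets, and lexicographic sorting to find identical sets) can be sketched roughly as follows. This is an illustrative sketch only: the names, the successor-list encoding, and the example graph are assumptions rather than the cited authors' code, output nodes are assumed to seed their own demand sets, and no correctness conditions beyond set equality are checked.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from itertools import groupby


def demand_sets(succs, outputs):
    """Return the demand set of every node (hypothetical helper).

    succs maps each node to the list of its successors; outputs is the
    set of output nodes, each of which is assumed to demand only itself.
    """
    dem = {}
    # Passing successors where TopologicalSorter expects predecessors gives
    # a reverse topological order: each node follows all of its successors.
    for node in TopologicalSorter(succs).static_order():
        if node in outputs:
            dem[node] = frozenset([node])
        else:
            # Union of the demand sets of all successors.
            dem[node] = frozenset().union(*(dem[s] for s in succs.get(node, ())))
    return dem


def partition_by_sorting(sets):
    """Find nodes with identical sets by lexicographically sorting the sets."""
    def key(node):
        return tuple(sorted(sets[node]))

    # Sort by the set's sorted-tuple form, breaking ties by node name,
    # so that nodes with equal sets become adjacent.
    ordered = sorted(sets, key=lambda n: (key(n), n))
    return [list(group) for _, group in groupby(ordered, key=key)]


# Hypothetical example graph with outputs x and y.
succs = {
    "n1": ["n2", "n3"], "n2": ["x"],
    "n3": ["y"], "n4": ["y"],
    "x": [], "y": [],
}
partitions = partition_by_sorting(demand_sets(succs, outputs={"x", "y"}))
print(partitions)
```

Sorting makes equal sets adjacent, so a single linear pass with `groupby` is enough to collect each group.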


The Spectrum Of Thread Implementations On Hybrid Multithreaded.. - Shankar (1995)   (Correct)

....process. In stage 1, code is generated as a fine grain dataflow graph. Stage 2 involves the clustering of fine grain dataflow instructions into coarse grain nodes. Several methods have been proposed for the generation of coarse grain graphs starting from fine grained graphs [Ian88, Tra88, SCvE91, HDGS92] The schemes have been described for programs that have been written in Id, a non strict language. The partitioning of the dataflow graph involves the balancing of various issues [Ian88] These are: 1. Maximization of potential parallelism. 2. Maximization of run length of each thread. 3. ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile time partitioning of a non-strict language into sequential threads. Technical report, Sandia National Laboratory, 1992.


Space-Efficient Scheduling of Parallelism with.. - Blelloch, Gibbons, .. (1997)   (13 citations)  (Correct)

....σ can be as large as the work w, resulting in a running time of O(log(pd) · (w/p + d)), which is not work efficient. However, since synchronizations are expensive in any implementation, there has been considerable work in reducing the number of synchronizations using compile time analysis [19, 15, 35, 36, 25]. We plan to explore the use of such methods to improve the running time of our implementation. The implementation described for the scheduling algorithm assumes that a constant fraction of the processors are assigned to the scheduler computation, eliminating them from the work force of the ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-time partitioning of a non-strict language into sequential threads. In Proc. Symposium on Parallel and Distributed Computing, Dec. 1991.


A Multi-Threaded Implementation of Nested Data-Parallelism - Engelhardt, Wendelborn (1997)   (1 citation)  (Correct)

....4.2 Thread Production The purpose of this important compilation stage is to determine a suitable division of the highly structured Adl program into unstructured AMAM threads. Initially, we have chosen a syntax directed thread scheme to determine such division. Other, more complex schemes (e.g. [14, 17]) may be considered in future. The scheme is implemented within the thread colourizer module of the Adl compiler as a walk across the program AST during which nodes are coloured to denote which object thread their implementation should appear within. The approach we adopt first grants each ....

J.E. Hoch. Compile-time partitioning of a nonstrict language into sequential threads. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing, pages 180--189, 1991.


Partitioning Non-strict Functional Languages for Multi-threaded.. - Coorg (1995)   (1 citation)  (Correct)

....Our work provides the first insight into the complexity of the problem it shows that finding an optimal partitioning (minimum number of partitions or largest possible partitions) is NP hard even for basic blocks. Developing heuristics to generate safe partitions is the focus of many papers [12, 21, 20]. These include dependence partitioning [15] based on the notion of dependence sets. Dependence sets are the natural dual of demand sets, where each node is annotated with the set of inputs that it depends on. In dependence set partitioning, nodes with the same dependence set can be merged ....

....sets, if it is possible to do so using dependence sets, but not vice versa. Figure 5 provides an example of this. The dependence sets of nodes 2 and 5 are {a, b} and {b} respectively, but it is still safe to merge them as they have the same tolerance sets. Demand sets themselves were introduced in [21, 12], and the iterated partitioning algorithm (similar to algorithm 2) was introduced in [26, 12]. Most of these papers did not do any global analysis for partitioning; they used the disconnecting operation used in algorithm 3 to handle conditionals and calls to user defined functions. Also, these ....

[Article contains additional citation context not shown here]

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-time Partitioning of a Non-strict Language into Sequential Threads. In Proc. Symp. on Parallel and Distributed Processing, Dec 1991.


Code Generations, Evaluations, and Optimizations in Multithreaded.. - Roh (1995)   (Correct)

....Generally, the partitioning process generates threads during compile time and the scheduling process determines how to schedule these threads at either run time or compile time. Several methods that generate multithreaded code have been developed by various researchers [GN88, EG91, Ian88, SCvE91, HDGS92, Tra91, RNB93] These methods generally fall into two categories. In the top down strategy, codes are generated directly from program data dependence graphs. In the bottom up strategy, fine grain dataflow graphs are first generated, which are then partitioned into a set of threads. The 7 ....

....The programming method therefore hides the non blocking nature of EM 4. The overall effectiveness of this approach remains to be demonstrated. 2.2.2 Bottom up Code Generation Many bottom up code generation strategies based on partitioning of dataflow graphs have been developed [Ian88, SCvE91, HDGS92, Tra91]. Most of these schemes generate threads for programs written in Id [NA92], a non strict language. Iannucci [Ian88] describes the dependence sets method of graph partitioning as well as the conditions that must be satisfied for a correct and efficient partitioning of a dataflow graph. In ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile time partitioning of a non-strict language into sequential threads. Technical report, Sandia National Laboratory, 1992.


Architecture-Dependent Partitioning of Dependence Graphs - Beck, Zehendner, Ungerer (1997)   (Correct)

....from nodes with a different annotation end in that node. If a node is not added, the subgraph starting with this node is cut and the node itself is a starting point for a new traversal, generating a new thread. The algorithm terminates when all instructions are assigned to threads. Hoch et al. [9] enhance Iannucci's algorithm by a further criterion for thread fusion. The goals of their partitioning algorithm are the maximization of the thread length and the minimization of the synchronization between threads. In addition to Iannucci's annotations, all starting nodes of dynamic arcs are ....

J. E. Hoch et al. Compile-time partitioning of a nonstrict language into sequential threads. In Proc. 3rd Symp. on Parallel and Distributed Processing, 1993.


How Much Non-strictness do Lenient Programs Require? - Schauser, Goldstein (1995)   (1 citation)  (Correct)

....cost for context switching. Much research has been devoted to analysis techniques that determine when the full generality of non strictness is not required. These techniques include strictness analysis [Pey87] backwards analysis [Hug88b] path analysis [BH87] and partitioning [Tra91, SCvE91, HDGS91] For lenient languages the compilation approach is to schedule instructions statically into sequential threads and have dynamic scheduling only between threads. The task of identifying portions of the program that can be scheduled statically and ordered into threads is called partitioning ....

....[Tra91] Two operations can be placed into the same thread only if the compiler can statically determine the order in which they execute. In [Sch94] we presented a new thread partitioning algorithm, separation constraint partitioning, which improves substantially on previous work [Tra91, HDGS91, SCvE91] To evaluate our partitioning algorithm, we compared benchmark programs compiled using our algorithm and using strict partitioning. Strict partitioning ignores possible non strictness and compiles functions and conditionals strictly, thus representing the best possible partitioning. ....

[Article contains additional citation context not shown here]

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-time Partitioning of a Non-strict Language into Sequential Threads. In Proc. Symp. on Parallel and Distributed Processing, Dec. 1991.


Partitioning a Lenient Parallel Language into Sequential Threads - Sangho Ha (1995)   (Correct)

....in a large scale parallel system since it allows split phase memory operations and fast context switching between computations without blocking the processor. A thread is usually defined by a sequence of instructions. Its execution may be paused or interrupted as [3,4] may be nonpreemptive[1,5,6,7,8], or may be executed by interleaving among threads[9,10] according to execution model. A lenient parallel language Id[11] supports nonstrict control constructs or data structures for synchronization, thus allowing massively fine grain parallelism. There are several approaches to generate ....

....among threads[9,10] according to execution model. A lenient parallel language Id[11] supports nonstrict control constructs or data structures for synchronization, thus allowing massively fine grain parallelism. There are several approaches to generate multithreaded codes from Id programs[1,4,5,6,7,12,13]. It is an important issue how to partition the dataflow program graph, which is an intermediate form of Id programs, to satisfy the following criteria: maximization of parallelism and run length, and minimization of synchronization costs. Nikhil[14] doesn't aggregate nodes in dataflow graphs. ....

[Article contains additional citation context not shown here]

J.E. Hoch, et al, "Compile-time Partitioning of a Non-strict Language into Sequential Threads", Int'l Proc. of the 3rd IEEE Sympo. on Parallel and Distributed Processing, pp.180-189, 1991.


An Evaluation of Optimized Threaded Code Generation - Roh, Najjar, Shankar, Böhm (1994)   (3 citations)  (Correct)

....code generation schemes for nonblocking threads is to generate as large a thread as possible [6] on the premise that the thread is not going to be too large, due to several constraints imposed by the execution model. Two approaches to thread generation have been proposed: the bottom up method [7, 8, 9, 10] starts with a fine grain dataflow graph and then coalesces instructions into clusters (threads); the top down method [11, 12, 13] generates threads directly from the compiler's intermediate data dependence graph form. In this study, we have combined the two approaches in which we initially ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile time partitioning of a non-strict language into sequential threads. Technical report, Sandia National Laboratory, 1992.


A Common Intermediate Language and its Use in.. - Ariola, Massey, Sami..   (Correct)

....that some latter task is not responsible for providing a binding which some earlier task needs to proceed. This problem of static scheduling to increase task granularity while avoiding deadlock is a main component of compilers for executing these languages on high performance multiprocessors [11, 16, 19, 23, 24, 30]. Comparative analysis of different partitioning techniques is best achieved by first developing an intermediate language common to the DSA language family. A common language allows us to accurately compare currently proposed analyses, as well as develop a single, unified analysis technique. The ....

....because of the prevalence of logic variables, which can cause hidden cycles through aliasing. 4 Non strict functional languages have analogous problems with hidden cycles through I structures. Several researchers have explored the problem of thread partitioning of non strict functional languages [11, 16, 23, 24, 30]. While most of the techniques produce threads within a function and in isolation from the rest of the program, the analysis of Traub et al. [31], referred to in this paper as DD analysis, attempts to improve the thread size within a function by propagating global dependence information. The ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-Time Partitioning of a NonStrict Language into Sequential Threads. In Symposium on Parallel and Distributed Processing, pages 180--189. Dallas, IEEE Computer Society Press, December 1991.


Compilation of Concurrent Declarative Languages - Ariola, Sami, Massey, Tick (1995)   (Correct)

....that some latter task will not provide a binding which some earlier task needs to proceed. This problem of static scheduling to increase task granularity while avoiding deadlock is a main component of the work of compilation for the execution of these languages on high performance multiprocessors [15, 12, 22, 18]. Comparative analysis of different partitioning techniques is best achieved by first developing an intermediate language common to the DSA language family. A common language allows us to accurately compare currently proposed analyses, as well as develop a single, unified analysis technique. The ....

....of the prevalence of logic variables, which can cause hidden cycles through aliasing. 3 Non strict functional languages have analogous problems with hidden cycles through I structures. Several researchers have explored the problem of thread partitioning of non strict functional languages [12, 15, 22, 30]. While most of the techniques produce threads within a function and in isolation from the rest of the program, the analysis of Traub et al. [30], referred to in this paper as Dependence Demand (DD) analysis, attempts to improve the thread size within a function by propagating global dependence ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-Time Partitioning of a NonStrict Language into Sequential Threads. In Symposium on Parallel and Distributed Processing, pages 180--189. Dallas, IEEE Computer Society Press, December 1991.


Generation, Optimization and Evaluation of Multi-Threaded.. - Roh, Najjar, Shankar, Böhm   (Correct)

....by the mapping heuristics. Girkar and Polychronopoulos [12] describe a method to exploit parallelism across the loop and function boundaries through the use of a hierarchical intermediate level graph. Several projects use bottom up, multithreaded code generation strategies based on dataflow graphs [16, 32, 13, 37, 27]. Most of these schemes generate sequential threads for programs written in Id [23] a non strict language. The non strict semantics of Id requires a more careful partitioning strategy than what is required under the strict semantics of Sisal [18] to avoid deadlock [21, 27] The direct bottom up ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile time partitioning of a non-strict language into sequential threads. Technical report, Sandia National Laboratory, 1992.


Partitioning Non-strict Languages for Multi-threaded Code Generation - Coorg (1994)   (3 citations)  (Correct)

....This is a global approach; we need information about the whole program to apply this technique. We call this approach the Dependence Analysis approach. Another less theoretical, more pragmatic approach (called Demand and Dependence (DD) partitioning) started in [21] progressed through [36, 16, 38] and culminated in [37] The approach began as a local approach. That is, the initial algorithms worked as follows: look at small regions in programs and see whether they can be sequentialized. This local analysis was extended to propagate local information to the entire program in [37] 1.3.1 ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele, Compile-time partitioning of a non-strict language into sequential threads, In Proc. 3rd Symp. on Par. and Dist. Processing, IEEE, Dec 1991.


Generation and Quantitative Evaluation of Dataflow Clusters - Roh, Najjar, Böhm (1993)   (3 citations)  (Correct)

.... sets method [Ian88, SCvE91, TCS92] The fine grain code used is the Manchester Dataflow Machine code [GKW85] generated from programs written in Sisal [MSA 85] Similar code generation strategies for the Id language based on the Tagged Token Dataflow Architecture have been described in [SCvE91, HDGS92, Tra91] Second, we evaluate the speedup that is achievable in the coarse grain code over the fine grain counterpart through a comprehensive simulation of a variety of benchmarks on various realistic machine models. Third, the results are used to gauge the effectiveness of various architectural ....

....a variety of machine models obtained from the simulation of a large set of benchmarks are presented and discussed in Section 4. Concluding remarks are provided in Section 5. 2 Previous Work Multi threaded code generation strategies based on dataflow graphs have been described in [Ian88, SCvE91, HDGS92, Tra91] These schemes generate sequential threads for programs written in Id [NA90] a non strict language. The nonstrict semantics of Id requires a more careful partitioning strategy than would be required under the strict semantics of Sisal. Iannucci [Ian88] describes the dependence sets ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile time partitioning of a nonstrict language into sequential threads. Technical report, Sandia National Laboratory, 1992.


An Evaluation of Medium-Grain Dataflow Code - Najjar, Roh, Böhm (1994)   (1 citation)  (Correct)

....storage [32, 9] has many advantages: reduced overheads, improved performance through shorter critical path in sequential code, and exploitation of intra thread locality. Several methods that generate medium coarse grain code for these execution models have been developed by various researchers [14, 12, 21, 40, 20, 42]. These methods generally fall into two categories. In the top down strategy, the code is generated directly from a program data dependence graph. In the bottom up strategy, fine grain dataflow code is first generated, and is then partitioned into a set of tasks or threads. In this paper, several ....

....mapping heuristics. In [15] Girkar and Polychronopoulos describe a method to exploit parallelism across the loop and function boundaries through the use of a hierarchical intermediate level graph. Bottom up, multi threaded code generation strategies based on dataflow graphs have been described in [21, 40, 20, 42]. These schemes generate sequential threads for programs written in Id [30] a non strict language. The non strict semantics of Id requires a more careful partitioning strategy than would be required under the strict semantics of Sisal [23] The bottom up code generation is equivalent to a graph ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile time partitioning of a non-strict language into sequential threads. Technical report, Sandia National Laboratory, 1992.


Partitioning Non-strict Functional Languages for.. - Satyan Coorg (1995)   (1 citation)  (Correct)

....Our work provides the first insight into the complexity of the problem it shows that finding an optimal partitioning (minimum number of partitions or largest possible partitions) is NP hard even for basic blocks. Developing heuristics to generate safe partitions is the focus of many papers [8, 17, 16]. Our technique of using demand and tolerance sets subsume the various heuristics suggested in these papers. Most of these papers did not do any global analysis for partitioning, they used the disconnecting operation used in algorithm 3 to handle conditionals and calls to user defined functions. ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-time Partitioning of a Non-strict Language into Sequential Threads. In Proc. Symp. on Parallel and Distributed Processing, Dec 1991.


Architecture-Dependent Partitioning of Dependence Graphs - Beck, Zehendner, Ungerer (1997)   (Correct)

....from the same thread are stored in registers instead of writing them back to memory. These registers may be referenced by any succeeding instruction in the thread. The next section describes and analyses the different strategies for dependence graph partitioning due to Iannucci [7] Hoch et al. [6], and Schauser [12, 13] which are predecessors and presuppositions of our own architecture dependent partitioning method, presented in section 3. The architecture dependent partitioning algorithm is a heuristic for determining a cost efficient solution that is based on an architecture dependent ....

....nodes with a different annotation end in that node. If a node is not added, the subgraph starting at this node is cut off, and the node itself becomes a starting point for a new traversal, generating a new thread. The algorithm terminates when all instructions are assigned to threads. Hoch et al. [6] enhanced Iannucci's algorithm by a further criterion for thread fusion. The goals of their partitioning algorithm are the maximisation of the thread length and the minimisation of the synchronisation between threads. In addition to Iannucci's annotations, all starting nodes of dynamic arcs are ....

J. E. Hoch et al. Compile-time partitioning of a nonstrict language into sequential threads. In Proc. 3rd Symp. on Parallel and Distributed Processing, 1993.


Thread Partitioning and Scheduling Based On Cost Model - Tang, Wang, Theobald, Gao (1997)   (Correct)

....has more constraints than for strict languages, such as avoiding deadlock. Many algorithms for thread partitioning have been proposed for functional languages. The dependence set method merges threads with the same inputs [13], while the demand set algorithm combines threads with the same output [10, 24]. The combination of both, together with global analysis, is reported in [26, 6]. Moreover, the larger size threads can be generated by applying separation constraint partitioning plus interprocedural analysis [23]. For a review of those algorithms, readers are referred to [22]. In compiling for ....

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-time partitioning of a non-strict language into sequential threads. In Proc. of the Fifth IEEE Symp. on Parallel and Distributed Processing, Dallas, Tex., Dec. 1993.


Separation Constraint Partitioning - A New Algorithm.. - Schauser, Culler.. (1995)   (24 citations)  (Correct)

....known to contribute to the result. imize the length of the threads and minimize the number of thread switches. 1. 1 Contributions The main contribution of this paper is the development of a new thread partitioning algorithm that is substantially more powerful than any previously known [Tra91, HDGS91, SCvE91, TCS92] A compiler for Id90 has been developed with back ends for workstations and the CM 5 multiprocessor. It serves as an experimental platform for studying the effectiveness of the partitioning algorithms. The partitioning algorithms presented in [SCvE91] and [TCS92] are a starting ....

....minimum number of threads is NP complete. Thus, all of the partitioning approaches rely on heuristics to group nodes into threads. Iannucci developed dependence set partitioning, which groups nodes that depend on the same set of inputs [Ian88] Demand set partitioning, presented in [SCvE91] and [HDGS91] is analogous to dependence set partitioning, but it groups nodes which are demanded by the same set of outputs. Iterated partitioning combines the power of dependence and demand set partitioning by applying them iteratively [TCS92, HDGS91] One of the algorithms is applied, then the reduced ....

[Article contains additional citation context not shown here]

J. E. Hoch, D. M. Davenport, V. G. Grafe, and K. M. Steele. Compile-time Partitioning of a Non-strict Language into Sequential Threads. In Proc. Symp. on Parallel and Distributed Processing, Dec. 1991.
