| R. Rugina and M. C. Rinard. Automatic parallelization of divide and conquer algorithms. In PPOPP, 1999. |
....Traverse(c) Figure 56: Pseudocode for the tree traversal routine from the health program. The spawn keyword is simply an annotation that indicates that sibling calls to Traverse are guaranteed to correctly run concurrently. legal to traverse the tree in either depth first or breadth first order [18, 96, 98]. The breadth first traversal tends to execute sibling nodes in the tree concurrently, but does so at the cost of pushing continuations on to the front, and popping continuations from the back, of a FIFO queue. Each of the pointers (to the front and back of the queue) thus forms an implicit ....
Radu Rugina and Martin C. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 1999.
....describe the infrastructure and algorithms of Bitwise, a compiler that performs bitwidth analysis. Bitwise uses SSA as its intermediate form. It performs a numerical data flow analysis. Because we are solving for absolute numerical bitwidths, the more complex symbolic analysis is not needed [22]. We continue by comparing the candidate data flow lattices that were considered in our implementation. 3.1 Candidate Lattices We considered three candidate data structures for propagating the numerical information of our analysis. Figure 3-1 visually depicts the lattice that corresponds to ....
....arithmetic expressions precisely. Technically, this representation maps bitwidth analysis to the more general value range propagation problem. Value range propagation is known to be useful in value prediction, branch prediction, constant propagation, procedure cloning, and program verification [18, 22]. For the Bitwise compiler we chose to propagate data ranges, not only because of their generality, but also because most important applications use arithmetic and will benefit from their exact precision. Unlike a regular set union, we define the data range union operation (⊔) to be the union over ....
Radu Rugina and Martin Rinard. Automatic Parallelization of Divide and Conquer Algorithms. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, Vancouver, BC, June 2000.
....the existing algorithms for array bounds check elimination are heavyweight (e.g. those based on theorem provers [SI77, Nec98, NL98]) and are therefore not suitable for deployment in a dynamic optimization setting. Some simpler algorithms (e.g. those based upon value range analysis [Har77, Pat95, RR99]) cannot eliminate partially redundant checks. Algorithms that can eliminate partial redundancy [MCM82, Gup90, Gup94, Asu92, KW95] operate upon dense program representations (e.g. the control flow graph) and rely upon exhaustive iterative data flow analyzers. Thus, they too do not meet our ....
....as Xu et al. [XMR00]. Although more powerful than ABCD, theorem proving is expensive and therefore unsuitable for a dynamic optimization setting. Value range analysis has been used to compute bounds on the values of index expressions for the purpose of eliminating full redundancy [Har77, Pat95, RR99]. One of our goals was to handle non-scientific programs, which often have complex control flow. Therefore, in addition to full redundancy, it was important for us to handle elimination of partial redundancy. Several conventional iterative data flow style bounds check elimination algorithms have ....
Radu Rugina and Martin Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the ACM SIGPLAN 1999 Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999. ACM SIGPLAN.
....algorithms for array bounds check elimination are heavyweight (e.g. those based on theorem provers [SI77, Nec98, NL98, XMR00]) and are therefore not suitable for deployment in a dynamic optimization environment. Some simpler algorithms (e.g. those based upon value range analysis [Har77, Pat95, RR99]) cannot eliminate partially redundant checks. Algorithms that can eliminate partial redundancy [MCM82, Gup90, Gup94, Asu92, KW95] operate upon dense program representations (e.g. the control flow graph) and rely upon exhaustive iterative data flow analyzers. Thus, they, too, do not meet our ....
....amount shown for each benchmark represents the fraction of upper bound checks that were removed, measured in terms of dynamic instruction counts. For the benchmarks from SPECjvm98 (the top five bars) this fraction is divided between local and global checks. ....eliminating full redundancy [Har77, Pat95, RR99]. One of our goals was to handle non-scientific programs, which often have complex control flow. Therefore, in addition to full redundancy, it was important for us to handle elimination of partial redundancy. Several conventional iterative data flow style bounds check elimination algorithms have ....
Radu Rugina and Martin Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the ACM SIGPLAN 1999 Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999. ACM SIGPLAN.
....describes the infrastructure and algorithms of Bitwise, a compiler that performs bitwidth analysis. Bitwise uses SSA as its intermediate form. It performs a numerical data flow analysis. Because we are solving for absolute numerical bitwidths, the more complex symbolic analysis is not needed [23]. We continue by comparing the candidate data flow lattices that were considered in our implementation. 3.1 Candidate Lattices We considered three candidate data structures for propagating the numerical information of our analysis. Figure 2 visually depicts the lattice that corresponds to each ....
....on arithmetic expressions precisely. Technically, this representation maps bitwidth analysis to the more general value range propagation problem. Value range propagation is known to be useful in value prediction, branch prediction, constant propagation, procedure cloning, and program verification [19, 23]. For the Bitwise compiler we chose to propagate data ranges, not only because of their generality, but also because most important applications use arithmetic and will benefit from their exact precision. Unlike a regular set union, we define the data range union operation (⊔) to be the union over ....
R. Rugina and M. Rinard. Automatic Parallelization of Divide and Conquer Algorithms. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, Vancouver, BC, June 2000.
....is implemented using explicit invocation records. We have implemented Satin by extending the Manta compiler. We discuss the performance of ten applications on a Myrinet based cluster. 1 Introduction There is currently much interest in divide and conquer systems for parallel programming [2, 6, 10, 11, 15]. Divide and conquer style programs start by dividing the problem into subproblems. Each subproblem is then recursively solved, again by dividing it into smaller subproblems. An example of such a system is Cilk [6] which extends C with divide and conquer primitives. Cilk runs these annotated C ....
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 72-83, Atlanta, May 4-6 1999. Massachusetts Institute of Technology.
....is implemented using explicit invocation records. We have implemented Satin by extending the Manta compiler. We discuss the performance of four applications on a Myrinet based cluster. 1 Introduction There is currently much interest in divide and conquer systems for parallel programming [2, 5, 11, 14, 18]. Divide and conquer style programs start by dividing the problem into subproblems. Each subproblem is then recursively solved, again by dividing it into smaller subproblems. An example of such a system is MIT's Cilk [5], which extends C with divide and conquer primitives. Cilk runs these ....
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 72--83, Atlanta, May 4-6 1999. Massachusetts Institute of Technology.
No context found.
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.
No context found.
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.
No context found.
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. Slides for talk given at PPoPP '99; available from http://www.cag.lcs.mit.edu/~rinard/divide and conquer/ ppopp99.slides.ps, May 1999.
No context found.
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.
....extra components would slow the compiler down and increase the complexity. The problem with this situation is that general techniques tend to do relatively pedestrian things to the program. For specific classes of programs, more specialized analyses and transformations would make a huge difference [12, 9, 1]. But because they are not generally useful, they don't make it into widely used compilers. We believe that credible compilation can make it possible to develop lots of different custom compilers that have been specialized for specific classes of applications. The idea is to make a set of credible ....
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.
....in Figure 6.11 on the facing page, with negligible performance cost. However, handling upper bounds completely requires a symbolic analysis that is out of the current scope of this work. Future work will use induction variable analysis and integrate an existing integer linear programming approach [36] to fully address array bounds checks. 6.2.6 Experimental results The full SPTC analysis and optimization has been implemented in the FLEX Java compiler platform. Some quantitative measure of the utility of SPTC is given as Figure 6.12. The run times given are intermediate representation ....
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. Slides for talk given at PPoPP '99; available from http://www.cag.lcs.mit.edu/~rinard/divide and conquer/ ppopp99.slides.ps, May 1999.
....store instructions [2]. The goal is to exploit instruction level parallelism and to determine statically which memory modules specific instructions may access. We have used the pointer analysis results as a foundation for the symbolic analysis and parallelization of divide and conquer algorithms [20]. For efficiency reasons, these programs often access memory using pointers and pointer arithmetic. Our analysis algorithm provides the pointer analysis information required to symbolically analyze such pointer-intensive code. Both of these projects use the pointer analysis algorithm only on ....
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.
....analysis, which is executed in both cases. The availability of design information significantly decreased both the complexity and the implementation time of the analysis. Compared to the implementation in our previous work for the automatic parallelization of divide and conquer algorithms [14, 16], the design based approach presented in this paper eliminated sophisticated interprocedural algorithms based on fixed point algorithms or on reductions to linear programs. These complex analyses were replaced by the simple design verification algorithms presented in the current paper. This ....
....the array regions that each procedure accesses. They then propagate accessed array regions from callees to callers to derive the regions accessed by the complete execution of each procedure. Researchers have recently generalized this approach for recursive procedures that access data via pointers [14, 9]. An issue is maintaining precision in the face of the fixed point computations used to analyze recursive procedures. Our recent generalization of the intraprocedural approach presented in Section 3 to accurately analyze recursive procedures without fixed points eliminates this particular problem ....
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.
....where the running times were not significantly dropped by the availability of design information, both compiler complexity and implementation time were significantly decreased. Compared to the implementation in our previous work for the automatic parallelization of divide and conquer algorithms [14, 16], the design based approach presented in this paper eliminated sophisticated interprocedural algorithms based on fixed point algorithms or on reductions to linear programs. These complex analyses were replaced by the simple design verification algorithms presented in the current paper. This ....
....the array regions that each procedure accesses. They then propagate accessed array regions from callees to callers to derive the regions accessed by the complete execution of each procedure. Researchers have recently generalized this approach for recursive procedures that access data via pointers [14, 9]. An issue is maintaining precision in the face of the fixed point computations used to analyze recursive procedures. Our recent generalization of the intraprocedural approach presented in Section 3 to accurately analyze recursive procedures without fixed points eliminates this particular problem ....
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.
....Eliminated
            Register Bits   Memory Bits
convolve    35.94           25.76
histogram   30.56           73.86
intfir      36.72           1.59
intmatmul   47.32           35.42
jacobi      42.71           75.00
life        65.92           96.88
median      43.75           3.12
mpegcorr    58.20           53.12
pmatch      59.38           47.24
Table 2: Bitwidth Analysis Results
....sequential programs [22, 31, 19]. These algorithms use fixed point approaches to analyze recursive programs, employing a variety of ad hoc techniques (such as artificially limiting the number of iterations or using imprecise widening operators) to avoid the problem of infinite ascending chains in the domain of symbolic ....
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.
....components would slow the compiler down and increase the complexity. The problem with this situation is that general techniques tend to do relatively pedestrian things to the program. For specific classes of programs, more specialized analyses and transformations would make a huge difference [9, 8, 1]. But because they are not generally useful, they don't make it into widely used compilers. We believe that credible compilation can make it possible to develop lots of different custom compilers that have been specialized for specific classes of applications. The idea is to make a set of ....
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.
....in symbolic memory access region analysis, array bounds check elimination, race detection, and parallelizing compilers. 6.1 Symbolic Memory Access Region Analysis Researchers have previously proposed several algorithms for the symbolic analysis of accessed memory regions in sequential programs [18, 24, 15]. These algorithms use fixed point approaches to analyze recursive programs, employing a variety of ad hoc techniques (such as artificially limiting the number of iterations or using imprecise widening operators) to avoid the problem of infinite ascending chains in the domain of symbolic ....
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.
No context found.
R. Rugina and M. C. Rinard. Automatic parallelization of divide and conquer algorithms. In PPOPP, 1999.
No context found.
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the Symposium on Principles and Practice of Parallel Programming, pp. 72--83, 1999.
No context found.
R. Rugina and M. Rinard, "Automatic parallelization of divide and conquer algorithms," in Proc. of PPoPP, pp. 72--83, 1999.
No context found.
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proc. of PPoPP, 1999.
No context found.
R. Rugina and M. Rinard. "AUTOMATIC PARALLELIZATION OF DIVIDE AND CONQUER ALGORITHMS". In Proc. of Principles and Practice of Parallel Programming, pp. 72-83, 1999.
No context found.
Radu Rugina and Martin Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the Seventh ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), pages 72--83, May 1999.