Results 1 - 10 of 47
Maximizing Multiprocessor Performance with the SUIF Compiler
, 1996
"... This paper presents an overview of the SUIF compiler, which automatically parallelizes and optimizes sequential programs for shared-memory multiprocessors. We describe new technology in this system for locating coarse-grain parallelism and for optimizing multiprocessor memory behavior essential to ..."
Abstract
-
Cited by 280 (22 self)
- Add to MetaCart
(Show Context)
This paper presents an overview of the SUIF compiler, which automatically parallelizes and optimizes sequential programs for shared-memory multiprocessors. We describe new technology in this system for locating coarse-grain parallelism and for optimizing multiprocessor memory behavior essential to obtaining good multiprocessor performance. These techniques have a significant impact on the performance of half of the NAS and SPECfp95 benchmark suites. In particular, we achieve the highest SPECfp95 ratio to date of 63.9 on an eight-processor 440 MHz Digital AlphaServer.

1 Introduction

Affordable shared-memory multiprocessors can potentially deliver supercomputer-like performance to the general public. Today, these machines are mainly used in a multiprogramming mode, increasing system throughput by running several independent applications in parallel. The multiple processors can also be used together to accelerate the execution of single applications. Automatic parallelization is a promis...
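A hedged, miniature illustration (ours; not SUIF output, and all names invented) of the coarse-grain parallelism the compiler looks for: iterations of the outer loop are independent, so whole iterations can run on separate processors.

```python
from multiprocessing import Pool

def process_row(i, n=1000):
    # Each outer-loop iteration is independent of the others:
    # this is the coarse-grain parallelism the compiler looks for.
    return sum((i * j) % 7 for j in range(n))

def sequential(rows=100):
    return [process_row(i) for i in range(rows)]

def parallel(rows=100):
    # Run whole outer-loop iterations on separate processors.
    with Pool() as pool:
        return pool.map(process_row, range(rows))

if __name__ == "__main__":
    assert sequential() == parallel()
```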
Symbolic Analysis for Parallelizing Compilers
, 1994
"... Symbolic Domain The objects in our abstract symbolic domain are canonical symbolic expressions. A canonical symbolic expression is a lexicographically ordered sequence of symbolic terms. Each symbolic term is in turn a pair of an integer coefficient and a sequence of pairs of pointers to program va ..."
Abstract
-
Cited by 110 (4 self)
- Add to MetaCart
Symbolic Domain The objects in our abstract symbolic domain are canonical symbolic expressions. A canonical symbolic expression is a lexicographically ordered sequence of symbolic terms. Each symbolic term is in turn a pair of an integer coefficient and a sequence of pairs of pointers to program variables in the program symbol table and their exponents. The latter sequence is also lexicographically ordered. For example, the abstract value of the symbolic expression 2ij + 3jk, in an environment in which i is bound to (1, ((↑i, 1))), j is bound to (1, ((↑j, 1))), and k is bound to (1, ((↑k, 1))), is ((2, ((↑i, 1), (↑j, 1))), (3, ((↑j, 1), (↑k, 1)))). In our framework, an environment is the abstract analogue of the concept of state; an environment is a function from program variables to abstract symbolic values. Each environment e associates a canonical symbolic value e(x) with each variable x ∈ V; it is said that x is bound to e(x). An environment might be represented by...
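To make the canonical form concrete, here is a small sketch (ours, and simplified: it keys terms on variable names rather than symbol-table pointers) of building and comparing canonical symbolic expressions.

```python
def make_term(coeff, factors):
    # A term is (coefficient, (variable, exponent) pairs);
    # the pairs are sorted lexicographically by variable name.
    return (coeff, tuple(sorted(factors)))

def make_expr(*terms):
    # Order terms by their factor sequences so equal expressions
    # always have identical representations.
    return tuple(sorted(terms, key=lambda t: t[1]))

# 2ij + 3jk, with each variable standing for itself in the environment:
expr = make_expr(make_term(2, [("i", 1), ("j", 1)]),
                 make_term(3, [("j", 1), ("k", 1)]))
print(expr)  # ((2, (('i', 1), ('j', 1))), (3, (('j', 1), ('k', 1))))

# The canonical form makes structural equality decide symbolic equality,
# regardless of the order in which terms and factors were written:
assert expr == make_expr(make_term(3, [("k", 1), ("j", 1)]),
                         make_term(2, [("j", 1), ("i", 1)]))
```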
Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler
, 1995
"... This paper presents an extensive empirical evaluation of an interprocedural parallelizing compiler, developed as part of the Stanford SUIF compiler system. The system incorporates a comprehensive and integrated collection of analyses, including privatization and reduction recognition for both array ..."
Abstract
-
Cited by 103 (23 self)
- Add to MetaCart
This paper presents an extensive empirical evaluation of an interprocedural parallelizing compiler, developed as part of the Stanford SUIF compiler system. The system incorporates a comprehensive and integrated collection of analyses, including privatization and reduction recognition for both array and scalar variables, and symbolic analysis of array subscripts. The interprocedural analysis framework is designed to provide analysis results nearly as precise as full inlining but without its associated costs. Experimentation with this system shows that it is capable of detecting coarser granularity of parallelism than previously possible. Specifically, it can parallelize loops that span numerous procedures and hundreds of lines of code, frequently requiring modifications to array data structures such as privatization and reduction transformations. Measurements from several standard benchmark suites demonstrate that an integrated combination of interprocedural analyses can substantially ...
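The two transformations named above are easy to illustrate in miniature. The sketch below (ours, not the SUIF implementation) parallelizes a loop that looks serial by privatizing the scratch variable t and treating the accumulation as a sum reduction whose partial results are combined across workers.

```python
from multiprocessing import Pool

def chunk_sum(args):
    a, lo, hi = args
    total = 0
    for i in range(lo, hi):
        t = a[i] * 2        # 't' is private to this chunk: privatization
        total += t          # '+=' is associative: a recognizable reduction
    return total

def parallel_sum(a, workers=4):
    n = len(a)
    bounds = [(a, k * n // workers, (k + 1) * n // workers)
              for k in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(chunk_sum, bounds))  # combine partial sums

if __name__ == "__main__":
    data = list(range(1000))
    assert parallel_sum(data) == sum(x * 2 for x in data)
```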
Combining Analyses, Combining Optimizations
, 1995
"... This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iter ..."
Abstract
-
Cited by 84 (4 self)
- Add to MetaCart
This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iterative algorithm for solving these frameworks. A framework is shown that combines Constant Propagation, Unreachable Code Elimination, Global Congruence Finding and Global Value Numbering. For these optimizations, the iterative algorithm runs in O(n^2) time.
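A toy example (ours, not from the thesis) shows why a combined framework can beat any sequence of separate passes: constant propagation alone sees two definitions of y reaching the final use and gives up, while the combined analysis uses x = 1 to prove the else branch unreachable, leaving a single reaching definition, so y folds to a constant in one pass.

```python
x = 1
if x == 1:
    y = 1
else:
    y = 2          # unreachable once x = 1 is propagated into the test
print(y + x)       # the combined analysis folds this to print(2)
```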
This thesis then presents an O(n log n) algorithm for combining the same optimizations. This technique also finds many of the common subexpressions found by Partial Redundancy Elimination. However, it requires a global code motion pass, also presented, to make the optimized code correct. The global code motion algorithm removes some Partially Dead Code as a side effect. An implementation demonstrates that the algorithm has shorter compile times than repeated passes of the separate optimizations while producing run-time speedups of 4%–7%.
While global analyses are stronger, peephole analyses can be unexpectedly powerful. This thesis demonstrates parse-time peephole optimizations that find more than 95% of the constants and common subexpressions found by the best combined analysis. Finding constants and common subexpressions while parsing reduces peak intermediate representation size. This speeds up the later global analyses, reducing total compilation time by 10%. In conjunction with global code motion, these peephole optimizations generate excellent code very quickly, a useful feature for compilers that stress compilation speed over code quality.
SUIF Explorer: an interactive and interprocedural parallelizer
, 1999
"... The SUIF Explorer is an interactive parallelization tool that is more effective than previous systems in minimizing the number of lines of code that require programmer assistance. First, the interprocedural analyses in the SUIF system is successful in parallelizing many coarse-grain loops, thus mini ..."
Abstract
-
Cited by 76 (5 self)
- Add to MetaCart
The SUIF Explorer is an interactive parallelization tool that is more effective than previous systems in minimizing the number of lines of code that require programmer assistance. First, the interprocedural analyses in the SUIF system are successful in parallelizing many coarse-grain loops, thus minimizing the number of spurious dependences requiring attention. Second, the system uses dynamic execution analyzers to identify those important loops that are likely to be parallelizable. Third, the SUIF Explorer is the first to apply program slicing to aid programmers in interactive parallelization. The system guides the programmer in the parallelization process using a set of sophisticated visualization techniques. This paper demonstrates the effectiveness of the SUIF Explorer with three case studies. The programmer was able to speed up all three programs by examining only a small fraction of the program and privatizing a few variables.

1. Introduction

Exploiting coarse-grain parallelism i...
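Program slicing, the third technique above, is simple to sketch. The toy backward slicer below (ours, far less capable than the interprocedural slicing in SUIF Explorer) keeps exactly the statements that can affect a chosen variable at the end of a straight-line program.

```python
def backward_slice(stmts, target):
    # stmts: list of (defined_var, set_of_used_vars), in program order.
    needed, kept = {target}, []
    for idx in range(len(stmts) - 1, -1, -1):
        var, uses = stmts[idx]
        if var in needed:
            needed.discard(var)   # this definition satisfies the demand...
            needed |= uses        # ...but its operands are now demanded
            kept.append(idx)
    return sorted(kept)

# a = input(); b = a + 1; c = 7; d = b * 2
# The slice on d keeps statements 0, 1, 3 and drops the irrelevant c = 7.
program = [("a", set()), ("b", {"a"}), ("c", set()), ("d", {"b"})]
assert backward_slice(program, "d") == [0, 1, 3]
```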
Symbolic Range Propagation
- Proceedings of the 9th International Parallel Processing Symposium
, 1994
"... Many analyses and transformations in a parallelizing compiler can benefit from the abilityto compare arbitrary symbolic expressions. In this paper, we describe how one can compare expressions by using symbolic ranges of variables. A range is a lower and upper bound on a variable. We will also des ..."
Abstract
-
Cited by 63 (9 self)
- Add to MetaCart
(Show Context)
Many analyses and transformations in a parallelizing compiler can benefit from the ability to compare arbitrary symbolic expressions. In this paper, we describe how one can compare expressions by using symbolic ranges of variables. A range is a lower and upper bound on a variable. We will also describe how these ranges can be efficiently computed from the program text. Symbolic range propagation has been implemented in Polaris, a parallelizing compiler being developed at the University of Illinois, and is used for symbolic dependence testing, detection of zero-trip loops, determining array sections possibly referenced by an access, and loop iteration-count estimation.
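A minimal sketch of the idea (ours, not Polaris code): bind each variable to a lower and upper bound, evaluate each expression to an interval, and conclude e1 < e2 whenever the intervals cannot overlap, which is exactly the comparison a symbolic dependence test needs.

```python
def interval(expr, env):
    """Interval of coeff * var + const under the variable bounds in env."""
    coeff, var, const = expr
    lo, hi = env[var]
    ends = (coeff * lo + const, coeff * hi + const)
    return min(ends), max(ends)

def definitely_less(e1, e2, env):
    # e1 < e2 holds whenever every value of e1 is below every value of e2.
    return interval(e1, env)[1] < interval(e2, env)[0]

# With 0 <= i <= 9 and 10 <= j <= 100, accesses a[i] and a[j] can never
# touch the same element, so the dependence test can report independence.
env = {"i": (0, 9), "j": (10, 100)}
assert definitely_less((1, "i", 0), (1, "j", 0), env)
```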
Telescoping languages: A strategy for automatic generation of scientific problem-solving systems from annotated libraries
, 2001
"... As machines and programs have become more complex, the process of programming applications that can exploit the power of high-performance systems has become more difficult and correspondingly more labor-intensive. This has substantially widened the software gap the discrepancy between the need for n ..."
Abstract
-
Cited by 51 (7 self)
- Add to MetaCart
(Show Context)
As machines and programs have become more complex, the process of programming applications that can exploit the power of high-performance systems has become more difficult and correspondingly more labor-intensive. This has substantially widened the software gap: the discrepancy between the need for new software and the aggregate capacity of the workforce to produce it. This problem has been compounded by the slow growth of programming productivity, especially for high-performance programs, over the past two decades. One way to bridge this gap is to make it possible for end users to develop programs in high-level domain-specific programming systems. In the past, a major impediment to the acceptance of such systems has been the poor performance of the resulting applications. To address this problem, we are developing a new compiler-based infrastructure, called ...
Interprocedural Constant Propagation: A Study of Jump Function Implementations
- In Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation
, 1993
"... An implementation of interprocedural constant propagation must model the transmission of values through each procedure. In the framework proposed by Callahan, Cooper, Kennedy, and Torczon in 1986, this intraprocedural propagation is modeled with a jump function. While Callahan et al. propose several ..."
Abstract
-
Cited by 49 (5 self)
- Add to MetaCart
An implementation of interprocedural constant propagation must model the transmission of values through each procedure. In the framework proposed by Callahan, Cooper, Kennedy, and Torczon in 1986, this intraprocedural propagation is modeled with a jump function. While Callahan et al. propose several kinds of jump functions, they give no data to help choose between them. This paper reports on a comparative study of jump function implementations. It shows that different jump functions produce different numbers of useful constants; it suggests a particular function, called the pass-through parameter jump function, as the most cost-effective in practice.
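A hedged sketch of the idea (the terminology is the paper's; the code and names are ours): a jump function maps the constants known in a caller to values for a callee's formals, and the pass-through parameter flavor only tracks actuals that are literal constants or unchanged forwards of the caller's own formals.

```python
NONCONST = "not-a-constant"   # lattice value meaning "no known constant"

def pass_through_jump(actual, caller_env):
    """actual is ('const', value), ('param', name), or ('expr', text)."""
    kind, payload = actual
    if kind == "const":
        return payload                             # literal constant argument
    if kind == "param":
        return caller_env.get(payload, NONCONST)   # caller formal, passed through
    return NONCONST                                # anything more complex: give up

# Caller f(n, m) calls g(n, 4, n + m). With n known to be 10 at f's entry,
# g's first two formals receive the constants 10 and 4; the third is unknown.
caller_env = {"n": 10}
actuals = [("param", "n"), ("const", 4), ("expr", "n + m")]
print([pass_through_jump(a, caller_env) for a in actuals])  # [10, 4, 'not-a-constant']
```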
Simplification of Array Access Patterns for Compiler Optimizations
, 1994
"... Existing array region representation techniques are sensitive to the complexityofarray subscripts. In general, these techniques are very accurate and efficient for simple subscript expressions, but lose accuracy or require potentially expensive algorithms for complex subscripts. We found that in sci ..."
Abstract
-
Cited by 33 (6 self)
- Add to MetaCart
Existing array region representation techniques are sensitive to the complexity of array subscripts. In general, these techniques are very accurate and efficient for simple subscript expressions, but lose accuracy or require potentially expensive algorithms for complex subscripts. We found that in scientific applications, many access patterns are simple even when the subscript expressions are complex. In this work, we present a new, general array access representation and define operations for it. This allows us to aggregate and simplify the representation enough that precise region operations may be applied to enable compiler optimizations. Our experiments show that these techniques hold promise for speeding up applications.
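One simple instance of such a representation (ours, purely illustrative, not the paper's) describes an access pattern as a (base, stride, count) triplet; region operations such as an overlap test then become direct, as in the dependence check below.

```python
def touched(base, stride, count):
    # The set of array indices the triplet describes.
    return {base + stride * k for k in range(count)}

def regions_overlap(r1, r2):
    # Enumeration is fine for a small sketch; a real compiler would
    # reason about the triplets symbolically instead.
    return bool(touched(*r1) & touched(*r2))

# a[2i] written for i in 0..49 vs a[2i+1] read for i in 0..49:
# even and odd elements are disjoint, so the accesses are independent.
write = (0, 2, 50)
read = (1, 2, 50)
assert not regions_overlap(write, read)
```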