Results 1 - 10 of 216
The Omega Test: a fast and practical integer programming algorithm for dependence analysis
- Communications of the ACM
, 1992
"... The Omega testi s ani nteger programmi ng algori thm that can determi ne whether a dependence exi sts between two array references, and i so, under what condi7: ns. Conventi nalwi[A m holds thati nteger programmiB techni:36 are far too expensi e to be used for dependence analysi6 except as a method ..."
Abstract - Cited by 522 (15 self)
The Omega test is an integer programming algorithm that can determine whether a dependence exists between two array references, and if so, under what conditions. Conventional wisdom holds that integer programming techniques are far too expensive to be used for dependence analysis, except as a method of last resort for situations that cannot be decided by simpler methods. We present evidence that suggests this wisdom is wrong, and that the Omega test is competitive with approximate algorithms used in practice and suitable for use in production compilers. Experiments suggest that, for almost all programs, the average time required by the Omega test to determine the direction vectors for an array pair is less than 500 µsecs on a 12 MIPS workstation. The Omega test is based on an extension of Fourier-Motzkin variable elimination (a linear programming method) to integer programming, and has worst-case exponential time complexity. However, we show that for many situations in which ...
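The elimination step at the heart of the test is easiest to see over the rationals, which is the classical Fourier-Motzkin procedure the abstract refers to. Below is a minimal, illustrative Python sketch of that rational core only; the Omega test's additional machinery for exactness over the integers (tightening and case splitting) is deliberately omitted, and the example system is hypothetical.

from fractions import Fraction

def fm_eliminate(constraints, k):
    # Each constraint is (a, b), meaning a[0]*x0 + a[1]*x1 + ... <= b.
    lower, upper, rest = [], [], []
    for a, b in constraints:
        if a[k] > 0:
            upper.append((a, b))      # gives an upper bound on x_k
        elif a[k] < 0:
            lower.append((a, b))      # gives a lower bound on x_k
        else:
            rest.append((a, b))
    out = list(rest)
    for al, bl in lower:              # combine every lower/upper pair
        for au, bu in upper:
            cl, cu = -al[k], au[k]
            a = [Fraction(cu * l + cl * u) for l, u in zip(al, au)]
            out.append((a, Fraction(cu * bl + cl * bu)))
    return out

# Feasibility of: x - y <= 0,  y <= 4,  -x <= -2  (i.e. 2 <= x <= y <= 4).
sys0 = [([1, -1], 0), ([0, 1], 4), ([-1, 0], -2)]
sys1 = fm_eliminate(sys0, 1)   # eliminate y
sys2 = fm_eliminate(sys1, 0)   # eliminate x
# With every variable eliminated, each remaining row reads 0 <= b.
print(all(b >= 0 for _, b in sys2))   # True, e.g. x = 2, y = 3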
Some efficient solutions to the affine scheduling problem -- Part I One-dimensional Time
, 1996
"... Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In many cases, actions may be labelled by integral vectors in some iteration domain, and precedence constraints may be described by affine relations. A s ..."
Abstract - Cited by 266 (22 self)
Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In many cases, actions may be labelled by integral vectors in some iteration domain, and precedence constraints may be described by affine relations. A schedule for such a program is a function which assigns an execution date to each action. Knowledge of such a schedule allows one to estimate the intrinsic degree of parallelism of the program and to compile a parallel version for multiprocessor architectures or systolic arrays. This paper deals with the problem of finding closed form schedules as affine or piecewise affine functions of the iteration vector. An efficient algorithm is presented which reduces the scheduling problem to a parametric linear program of small size, which can be readily solved by an efficient algorithm.
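As an illustration of the causality condition an affine schedule must satisfy, here is a small Python sketch. The dependence distances, the domain bound N, and the candidate schedules are hypothetical; the paper solves for the schedule with a parametric linear program rather than checking candidates by enumeration.

def theta(a, p):
    # One-dimensional affine schedule theta(i, j) = a[0]*i + a[1]*j + a[2].
    return a[0] * p[0] + a[1] * p[1] + a[2]

def is_legal(a, deps, N=10):
    # Causality: every dependence source must execute strictly before its sink.
    for i in range(N):
        for j in range(N):
            for di, dj in deps:
                src, snk = (i, j), (i + di, j + dj)
                if theta(a, snk) < theta(a, src) + 1:
                    return False
    return True

deps = [(1, 0), (0, 1)]            # hypothetical uniform dependences
print(is_legal((1, 1, 0), deps))   # True:  theta = i + j is a valid time
print(is_legal((1, -1, 0), deps))  # False: theta = i - j breaks (0, 1)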
Global Optimizations for Parallelism and Locality on Scalable Parallel Machines
- In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation
, 1993
"... Data locality is critical to achieving high performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus the mapping, or decomposition, of the computation and data onto the processors of a scalable parallel machine is a key i ..."
Abstract - Cited by 256 (20 self)
Data locality is critical to achieving high performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus the mapping, or decomposition, of the computation and data onto the processors of a scalable parallel machine is a key issue in compiling programs for these architectures.
A Practical Algorithm for Exact Array Dependence Analysis
, 1992
"... A fundamental analysis step in advanced optimizing compiler (as well as many other software tools) is data dependence analysis for arrays. This means deciding if two references to an array can refer to the same element and if so, under what conditions. This information is used to deter-mine allowabl ..."
Abstract - Cited by 196 (0 self)
A fundamental analysis step in an advanced optimizing compiler (as well as in many other software tools) is data dependence analysis for arrays. This means deciding if two references to an array can refer to the same element and, if so, under what conditions. This information is used to determine allowable program transformations and optimizations. For example, we can determine that in the following code fragment, no location of the array is both read and written. Once we also verify that no location is written more than once, we know that the writes can be done in any order.

for i = 1 to 100 do
  for j = i to 100 do
    A[i, j+1] = A[100, j]

There has been extensive study of decision methods for array data dependences [1, 2, 5, 6, 8, 15, 18, 25]. Much of this work has focused on approximate methods that are guaranteed to be fast but only compute exact results in (commonly occurring) special cases. In other situations, approximate methods are conservative: they accurately report all actual dependences, but may also report spurious dependences. Data dependence problems are equivalent to deciding whether there exists an integer solution to a set of linear equalities and inequalities, a form of integer programming. The problem just shown would be formulated as an integer programming problem as in the next example, in which iw and jw refer to the values of the loop variables at the time the write is performed and ir and jr refer to the values of the loop variables at the time the read is performed.
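For this particular fragment the bounds are small enough that the integer programming question can be answered by exhaustive enumeration, which makes the formulation concrete. A Python sketch (the point of the paper is, of course, to decide such systems symbolically, without enumeration):

# Does some write iteration (iw, jw), which touches A[iw, jw+1], collide
# with the element A[100, jr] read by some iteration (ir, jr)?  That
# requires iw = 100 and jw + 1 = jr, subject to 1 <= iw <= 100,
# iw <= jw <= 100 and 1 <= ir <= jr <= 100.  Since ir only needs to
# exist (ir = 1 witnesses any jr), it is projected away below.
conflict = any(
    iw == 100 and jw + 1 == jr
    for iw in range(1, 101)
    for jw in range(iw, 101)
    for jr in range(1, 101)
)
print(conflict)   # False: iw = 100 forces jw = 100, but jr = 101 is out of range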
Code generation in the polyhedral model is easier than you think
- In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT’04)
, 2004
"... Many advances in automatic parallelization and optimization have been achieved through the polyhedral model. It has been extensively shown that this computational model provides convenient abstractions to reason about and apply program transformations. Nevertheless, the complexity of code generation ..."
Abstract - Cited by 167 (16 self)
Many advances in automatic parallelization and optimization have been achieved through the polyhedral model. It has been extensively shown that this computational model provides convenient abstractions to reason about and apply program transformations. Nevertheless, the complexity of code generation has long been a deterrent for using polyhedral representation in optimizing compilers. First, code generators have a hard time coping with generated code size and control overhead that may spoil theoretical benefits achieved by the transformations. Second, this step is usually time consuming, hampering the integration of the polyhedral framework in production compilers or feedback-directed, iterative optimization schemes. Moreover, current code generation algorithms only cover a restrictive set of possible transformation functions. This paper discusses a general transformation framework able to deal with non-unimodular, non-invertible, non-integral or even non-uniform functions. It presents several improvements to a state-of-the-art code generation algorithm. Two directions are explored: generated code size and code generator efficiency. Experimental evidence proves the ability of the improved method to handle real-life problems.
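A toy Python sketch of what "code generation" means here: producing a loop nest that scans the integer points of a transformed iteration domain in lexicographic order. The skewing transformation and the hand-derived bounds are hypothetical; a real generator such as the one the paper improves computes such bounds by polyhedral projection.

# Scan the square 0 <= i, j < N in the skewed order (t, p) = (i + j, j).
N = 4
visited = []
for t in range(0, 2 * N - 1):              # t = i + j
    for p in range(max(0, t - N + 1),      # bounds chosen so that
                   min(t, N - 1) + 1):     # 0 <= t - p < N and 0 <= p < N
        visited.append((t - p, p))         # recover (i, j) = (t - p, p)

# Every original iteration is visited exactly once, in wavefront order.
assert sorted(visited) == [(i, j) for i in range(N) for j in range(N)]
print(visited[:6])   # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]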
Maximizing Parallelism and Minimizing Synchronization with Affine Transforms
- Parallel Computing
, 1997
"... This paper presents the first algorithm to find the optimal affine transform that maximizes the degree of parallelism while minimizing the degree of synchronization in a program with arbitrary loop nestings and affine data accesses. The problem is formulated without the use of imprecise data depende ..."
Abstract - Cited by 148 (6 self)
This paper presents the first algorithm to find the optimal affine transform that maximizes the degree of parallelism while minimizing the degree of synchronization in a program with arbitrary loop nestings and affine data accesses. The problem is formulated without the use of imprecise data dependence abstractions such as data dependence vectors. The algorithm presented subsumes previously proposed program transformation algorithms that are based on unimodular transformations, loop fusion, fission, scaling, reindexing and/or statement reordering.

As multiprocessors become popular, it is important to develop compilers that can automatically translate sequential programs into efficient parallel code. Getting high performance on a multiprocessor requires not only finding parallelism in the program but also minimizing the synchronization overhead. Synchronization is expensive on a multiprocessor. The cost of synchronization goes far beyond just the operations that manipul...
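A small Python sketch of the synchronization-free case of affine partitioning: a space partition phi places the source and sink of every dependence on the same virtual processor. The dependence sets and the brute-force coefficient search are hypothetical stand-ins for the paper's exact formulation.

def sync_free_partitions(deps, bound=2):
    # phi(i, j) = c1*i + c2*j is synchronization-free when phi maps every
    # dependence distance to 0, i.e. source and sink share a processor.
    return [(c1, c2)
            for c1 in range(-bound, bound + 1)
            for c2 in range(-bound, bound + 1)
            if (c1, c2) != (0, 0)
            and all(c1 * di + c2 * dj == 0 for di, dj in deps)]

print(sync_free_partitions([(1, 0)]))          # (0, 1) etc.: partition by j
print(sync_free_partitions([(1, 0), (0, 1)]))  # []: synchronization needed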
A Singular Loop Transformation Framework Based on Non-singular Matrices
, 1992
"... In this paper, we discuss a loop transformation framework that is based on integer non-singular matrices. The transformations included in this framework are called -transformations and include permutation, skewing and reversal, as well as a transformation called loop scaling. This framework is mo ..."
Abstract - Cited by 130 (8 self)
In this paper, we discuss a loop transformation framework that is based on integer non-singular matrices. The transformations included in this framework are called Λ-transformations and include permutation, skewing and reversal, as well as a transformation called loop scaling. This framework is more general than existing ones; however, it is also more difficult to generate code in our framework. This paper shows how integer lattice theory can be used to generate efficient code. An added advantage of our framework over existing ones is that there is a simple completion algorithm which, given a partial transformation matrix, produces a full transformation matrix that satisfies all dependences. This completion procedure has applications in parallelization and in the generation of code for NUMA machines.
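The code generation difficulty the abstract mentions comes from non-unimodular matrices (|det T| > 1), whose image lattice has "holes". The Python sketch below makes the holes visible for loop scaling; the matrix and bounds are illustrative, and the stride computation the paper derives from lattice theory is not reproduced here.

def apply(T, p):
    # Image of iteration p under the integer matrix T.
    return (T[0][0] * p[0] + T[0][1] * p[1],
            T[1][0] * p[0] + T[1][1] * p[1])

T = [[1, 0],
     [0, 2]]        # loop scaling: (i, j) -> (i, 2j), det T = 2
N = 3
image = {apply(T, (i, j)) for i in range(N) for j in range(N)}

for jp in range(2 * (N - 1) + 1):   # naive scan of the bounding box
    print(" ".join("x" if (ip, jp) in image else "." for ip in range(N)))
# Rows of "x x x" alternate with all-hole rows ". . .": efficient code
# must scan j' with stride 2, the stride lattice theory computes in general.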
Automatic Program Parallelization
, 1993
"... This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do loop transformations, and parallelization of recursive routines. The last s ..."
Abstract - Cited by 124 (8 self)
This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do loop transformations, and parallelization of recursive routines. The last section of the paper surveys several experimental studies on the effectiveness of parallelizing compilers.
A practical automatic polyhedral parallelizer and locality optimizer
- In PLDI ’08: Proceedings of the ACM SIGPLAN 2008 conference on Programming language design and implementation
, 2008
"... We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical mod ..."
Abstract - Cited by 122 (8 self)
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model. Unlike previous polyhedral frameworks, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high performance for local and parallel execution on multi-cores, when compared with state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.
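As a minimal illustration of the tiling transformation at the core of such a framework, the Python sketch below blocks a 2-D loop nest. Tile size, bounds, and the claim about which loops are parallel are illustrative only; the tool derives legal tile shapes and parallelism from its ILP formulation.

N, B = 8, 4          # illustrative problem size and tile size
order = []
for it in range(0, N, B):                     # tile loops: candidates for
    for jt in range(0, N, B):                 # inter-tile parallelism
        for i in range(it, min(it + B, N)):   # point loops: scan one
            for j in range(jt, min(jt + B, N)):  # cache-sized block
                order.append((i, j))

# Same iterations, reordered so each block completes before the next starts.
assert sorted(order) == [(i, j) for i in range(N) for j in range(N)]
print(order[:6])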