• Documents
  • Authors
  • Tables
  • Log in
  • Sign up
  • MetaCart
  • DMCA
  • Donate

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations

Some efficient solutions to the affine scheduling problem, part II: multidimensional time (1992)

by P Feautrier
Venue:Intl. J. of Parallel Programming
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 266
Next 10 →

Code generation in the polyhedral model is easier than you think

by Cédric Bastoul - In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT’04 , 2004
"... Many advances in automatic parallelization and optimization have been achieved through the polyhedral model. It has been extensively shown that this computational model provides convenient abstractions to reason about and apply program transformations. Nevertheless, the complexity of code generation ..."
Abstract - Cited by 167 (16 self) - Add to MetaCart
Many advances in automatic parallelization and optimization have been achieved through the polyhedral model. It has been extensively shown that this computational model provides convenient abstractions to reason about and apply program transformations. Nevertheless, the complexity of code generation has long been a deterrent for using polyhedral representation in optimizing compilers. First, code generators have a hard time coping with generated code size and control overhead that may spoil theoretical benefits achieved by the transformations. Second, this step is usually time consuming, hampering the integration of the polyhedral framework in production compilers or feedback-directed, iterative optimization schemes. Moreover, current code generation algorithms only cover a restrictive set of possible transformation functions. This paper discusses a general transformation framework able to deal with non-unimodular, non-invertible, non-integral or even non-uniform functions. It presents several improvements to a state-of-the-art code generation algorithm. Two directions are explored: generated code size and code generator efficiency. Experimental evidence proves the ability of the improved method to handle real-life problems. 1.
(Show Context)

Citation Context

...ution order by using a reordering function (a schedule, or a placement, or a chunking function). Finding suitable execution orders has been the subject of most of the research on the polyhedral model =-=[4, 5, 9, 12, 13, 20, 22, 24, 27]-=-. Lastly the code generation step returns back to an abstract syntax tree or to a new source code implementing the execution order implied by the reordering function. Up to now, the polyhedral model f...

Maximizing Parallelism and Minimizing Synchronization with Affine Transforms

by Amy W. Lim, Monica S. Lam - Parallel Computing , 1997
"... This paper presents the first algorithm to find the optimal affine transform that maximizes the degree of parallelism while minimizing the degree of synchronization in a program with arbitrary loop nestings and affine data accesses. The problem is formulated without the use of imprecise data depende ..."
Abstract - Cited by 148 (6 self) - Add to MetaCart
This paper presents the first algorithm to find the optimal affine transform that maximizes the degree of parallelism while minimizing the degree of synchronization in a program with arbitrary loop nestings and affine data accesses. The problem is formulated without the use of imprecise data dependence abstractions such as data dependence vectors. The algorithm presented subsumes previously proposed program transformation algorithms that are based on unimodular transformations, loop fusion, fission, scaling, reindexing and/or statement reordering. 1 Introduction As multiprocessors become popular, it is important to develop compilers that can automatically translate sequential programs into efficient parallel code. Getting high performance on a multiprocessor requires not only finding parallelism in the program but also minimizing the synchronization overhead. Synchronization is expensive on a multiprocessor. The cost of synchronization goes far beyond just the operations that manipul...

A practical automatic polyhedral parallelizer and locality optimizer

by Uday Bondhugula, Albert Hartono, J. Ramanujam - In PLDI ’08: Proceedings of the ACM SIGPLAN 2008 conference on Programming language design and implementation , 2008
"... We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical mod ..."
Abstract - Cited by 122 (8 self) - Add to MetaCart
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model.Unlike previous polyhedral frameworks, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high performance for local and parallel execution on multi-cores, when compared with state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.
(Show Context)

Citation Context

...good ways to tile for parallelism and locality directly through an affine transformation framework is the central idea. Our approach is thus a departure from scheduling-based approaches in this field =-=[20, 21, 17, 24, 15]-=- as well as partitioning-based approaches [37, 36, 35] (due to incorporation of more concrete optimization criteria), however, is built on the same mathematical foundations and machinery. We show how ...

Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies.

by Sylvain Girbal , Nicolas Vasilache , Cédric Bastoul , Albert Cohen , David Parello , Marc Sigler , Olivier Temam - Int. J. Parallel Program., , 2006
"... Modern compilers are responsible for translating the idealistic operational semantics of the source program into a form that makes efficient use of a highly complex heterogeneous machine. Since optimization problems are associated with huge and unstructured search spaces, this combinational task is ..."
Abstract - Cited by 82 (24 self) - Add to MetaCart
Modern compilers are responsible for translating the idealistic operational semantics of the source program into a form that makes efficient use of a highly complex heterogeneous machine. Since optimization problems are associated with huge and unstructured search spaces, this combinational task is poorly achieved in general, resulting in weak scalability and disappointing sustained performance. We address this challenge by working on the program representation itself, using a semi-automatic optimization approach to demonstrate that current compilers offen suffer from unnecessary constraints and intricacies that can be avoided in a semantically richer transformation framework. Technically, the purpose of this paper is threefold: (1) to show that syntactic code representations close to the operational semantics lead to rigid phase ordering and cumbersome expression of architecture-aware loop transformations, (2) to illustrate how complex transformation sequences may be needed to achieve significant performance benefits, (3) to facilitate the automatic search for program transformation sequences, improving on classical polyhedral representations to better support operation research strategies in a simpler, structured search space. The proposed framework relies on a unified polyhedral representation of loops and statements, using normalization rules to allow flexible and expressive transformation sequencing. This representation allows to extend the scalability of polyhedral dependence analysis, and to delay the (automatic) legality checks until the end of a transformation sequence. Our work leverages on algorithmic advances in polyhedral code generation and has been implemented in a modern research compiler.
(Show Context)

Citation Context

...our representation lies in the simplicity of the formalism to compose non-unimodular transformations across long, flexible sequences. Existing formalisms have been designed for black-box optimization =-=(8, 11, 12)-=- , and applying a classical loop transformation within them — as proposed in (9, 10, 3) — requires a syntactic form of the program to anchor the transformation to existing statements. Up to now, the e...

Fuzzy Array Dataflow Analysis

by Denis Barthou, Jean-François Collard, Paul Feautrier , 1997
"... This paper deals with more general control structures, such as ifs and while loops, together with unrestricted array subscripting. Notice that we assume that unstructrured programs are preprocessed, and that for instance gotos are first converted in whiles. However, with such unpredictable, dynamic ..."
Abstract - Cited by 78 (18 self) - Add to MetaCart
This paper deals with more general control structures, such as ifs and while loops, together with unrestricted array subscripting. Notice that we assume that unstructrured programs are preprocessed, and that for instance gotos are first converted in whiles. However, with such unpredictable, dynamic control structures, no exact information can be hoped for in the general case. However, the aim of this paper is threefold. First, we aim at showing that even partial information can be automatically gathered thanks to Fuzzy Array Dataflow Analysis (FADA). This paper extends our previous work [4] on FADA to general, nonaffine array subscripts. The second purpose of this paper is to formalize and generalize these previous formalisms and to prove general results. Third, we will show that the precise, classical ADA is a special case of FADA.

A Framework for Unifying Reordering Transformations

by Wayne Kelly, William Pugh , 1993
"... We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iterat ..."
Abstract - Cited by 76 (10 self) - Add to MetaCart
We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. As part of the framework, we provide algorithms to assist in the building and use of schedules. In particular, we provide algorithms to test the legality of schedules, to align schedules and to generate optimized code for schedules. This work is supported by an NSF PYI grant CCR-9157384 and by a Packard Fellowship. 1 Introduction Optimizing compilers reorder iterations of statements to improve instruction scheduling, register use, and cache utilization, and to expose parallelism. Many different reordering transformations have been developed and studied, su...

Code Generation for Multiple Mappings

by Wayne Kelly, William Pugh, Evan Rosser - IN FRONTIERS '95: THE 5TH SYMPOSIUM ON THE FRONTIERS OF MASSIVELY PARALLEL COMPUTATION , 1994
"... There has been a great amount of recent work toward unifying iteration reordering transformations. Many of these approaches represent transformations as affine mappings from the original iteration space to a new iteration space. These approaches show a great deal of promise, but they all rely on the ..."
Abstract - Cited by 75 (2 self) - Add to MetaCart
There has been a great amount of recent work toward unifying iteration reordering transformations. Many of these approaches represent transformations as affine mappings from the original iteration space to a new iteration space. These approaches show a great deal of promise, but they all rely on the ability to generate code that iterates over the points in these new iteration spaces in the appropriate order. This problem has been fairly well-studied in the case where all statements use the same mapping. We have developed an algorithm for the less well-studied case where each statement uses a potentially different mapping. Unlike many other approaches, our algorithm can also generate code from mappings corresponding to loop blocking. We address the important trade-off between reducing control overhead and duplicating code.

Is Search Really Necessary to Generate High-Performance BLAS?

by Kamen Yotov, Xiaoming Li, Gang Ren, Maria Garzaran, David Padua, Keshav Pingali, Paul Stodghill , 2005
"... Abstract — A key step in program optimization is the estimation of optimal values for parameters such as tile sizes and loop unrolling factors. Traditional compilers use simple analytical models to compute these values. In contrast, library generators like ATLAS use global search over the space of p ..."
Abstract - Cited by 64 (13 self) - Add to MetaCart
Abstract — A key step in program optimization is the estimation of optimal values for parameters such as tile sizes and loop unrolling factors. Traditional compilers use simple analytical models to compute these values. In contrast, library generators like ATLAS use global search over the space of parameter values by generating programs with many different combinations of parameter values, and running them on the actual hardware to determine which values give the best performance. It is widely believed that traditional model-driven optimization cannot compete with search-based empirical optimization because tractable analytical models cannot capture all the complexities of modern high-performance architectures, but few quantitative comparisons have been done to date. To make such a comparison, we replaced the global search engine in ATLAS with a model-driven optimization engine, and measured the relative performance of the code produced by the two systems on a variety of architectures. Since both systems use the same code generator, any differences in the performance of the code produced by the two systems can come only from differences in optimization parameter values. Our experiments show that model-driven optimization can be surprisingly effective, and can generate code with performance comparable to that of code generated by ATLAS using global search. Index Terms — program optimization, empirical optimization, model-driven optimization, compilers, library generators, BLAS, high-performance computing

Synthesizing transformations for locality enhancement of imperfectly-nested loop nests

by Nawaaz Ahmed, Nikolay Mateev, Keshav Pingali - In Proceedings of the 2000 ACM International Conference on Supercomputing , 2000
"... We present an approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of every statement in a loop nest into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop ..."
Abstract - Cited by 64 (3 self) - Add to MetaCart
We present an approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of every statement in a loop nest into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques like code sinking and loop fusion that are used in ad hoc ways in current compilers to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques however, our embeddings are chosen carefully to enhance locality. The product space is then transformed further to enhance locality, after which fully permutable loops are tiled, and code is generated. We evaluate the effectiveness of this approach for dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks. 1. BACKGROUND AND PREVIOUSWORK Sophisticated algorithms based on polyhedral algebra have been developed for determining good sequences of linear loop transformations (permutation, skewing, reversal and scaling) for enhancing locality in perfectly-nested loops 1. Highlights of this technology are the following. The iterations of the loop nest are modeled as points in an integer lattice, and linear loop transformations are modeled as nonsingular matrices mapping one lattice to another. A sequence of loop transformations is modeled by the product of matrices representing the individual transformations; since the set of nonsingular matrices is closed under matrix product, this means that a sequence of linear loop transformations can be represented by a nonsingular matrix. The problem of finding an optimal sequence of linear loop transformations is thus reduced to the problem of finding an integer matrix that satisfies some desired property, permitting the full machinery of matrix methods and lattice theory to ¢ This work was supported by NSF grants CCR-9720211, EIA-9726388, ACI-9870687,EIA-9972853. £ A perfectly-nested loop is a set of loops in which all assignment statements are contained in the innermost loop.

Quantifying the Multi-Level Nature of Tiling Interactions

by Nicholas Mitchell , Larry Carter, Jeanne Ferrante, Karin Högstedt - INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING , 1997
"... Optimizations, including tiling, often target a single level of memory or parallelism, such as cache. These optimizations usually operate on a level-by-level basis, guided by a cost function parameterized by features of that single level. The benefit of optimizations guided by these one-level cost f ..."
Abstract - Cited by 64 (7 self) - Add to MetaCart
Optimizations, including tiling, often target a single level of memory or parallelism, such as cache. These optimizations usually operate on a level-by-level basis, guided by a cost function parameterized by features of that single level. The benefit of optimizations guided by these one-level cost functions decreases as architectures trend towards a hierarchy of memory and parallelism. We look at three common architectural scenarios. For each, we quantify the improvement a single tiling choice could realize by using information from multiple levels in concert. To do so, we derive multilevel cost functions which guide the optimal choice of tile size and shape. We give both analyses and simulation results to support our points.
Powered by: Apache Solr
  • About CiteSeerX
  • Submit and Index Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University