| R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. In International Conference on Parallel Architectures and Compilation Techniques, August 1994. |
....He then uses an affinity graph framework to determine block sizes. Finally, he allows at most two array dimensions to be distributed across the processors, and determines the proper aspect ratio by exhaustive enumeration. Kremer [12] shows that dynamic distribution is NP-complete. Bixby et al. [1] and Kremer et al. [13] present a partial, heuristic solution method. They assume that the user provides a decomposition of the program into phases, which are program fragments that are executed without changing distribution. Dynamic redistribution therefore only occurs between phases. Their ....
Robert Bixby, Ken Kennedy, and Ulrich Kremer. Automatic data layout using 0-1 integer programming. Technical Report CRPC-TR93349-S, Center for Research on Parallel Computation, Rice University, Houston, TX, November 1993.
....are solved separately by dedicated algorithms [5], making global optimization impossible. Work based on Integer Programming with Boolean variables led to a combinatorial explosion [21]. A lot of work has been done to optimise local criteria such as data and/or computation distribution locality [15, 6, 13], parallelism level, number of communications [2, 24]. In [11] the scheduling is computed w.r.t. a given partitioning. For a few years, THALES, in collaboration with Ecole des Mines de Paris, has been opening a radically new way by bringing up a concurrent model based approach to handle the problem as a ....
E. Bixby, K. Kennedy, and U. Kremer. Automatic Data Layout Using 0-1 Integer Programming. In Proc. of the International Conference on Parallel Architectures and Compilation Techniques, August 1994.
....memory. Keywords: multiprocessors, compilers, addressing, data partitioning, loop partitioning, pages, virtual memory, locality. 1 Introduction The problem of loop and data partitioning for distributed memory multiprocessors with global address spaces has been studied by many researchers [1, 3, 6, 18, 9, 8, 7, 13]. The goal of loop partitioning for applications with nested loops that access data arrays is to divide the iteration space among the processors to get maximum reuse of data in the cache, subject to the constraint ....
....cost compared to other approaches. Section 6 contains some experimental results on the locality addressing tradeoff. We conclude in Section 7. 2 Related work There has been surprisingly little related work in this area. While many works have addressed the problems of loop and data partitioning [1, 3, 6, 18, 9, 8, 7, 13], the problem of addressing data partitioned arrays, which is orthogonal, has been assumed to be system dependent. Specifically, systems with hardware virtual memory use the same, achieving the best locality they can, while systems without use some form of software address calculation. Examples of ....
R. Bixby, K. Kennedy, and U. Kremer. Automatic Data Layout Using 0-1 Integer Programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 111-122, Montreal, Canada, August 1994.
....and local memories. We optimize quantitatively for both cache and data locality. 5) Unlike in [4] we allow for hyperparallelepiped data tiles, important for achieving good locality in general affine function array accesses. Results on only one program were presented. Bixby, Kennedy and Kremer [6] present a formulation of the problem of finding data layout as a 0-1 integer programming problem. Though the problem is exponential time in the worst case, a case is made why for smaller problem sizes, the solution can be found in a reasonable amount of time. Formulating compiler problems as 0-1 ....
....is an exciting new approach, also used by the Stanford SUIF compiler [16]. However, a 0-1 integer programming approach is only as good as its formulation. In the case of finding loop and data partitions, 0-1 programming may not be the best answer for the following reasons: 1) The formulation in [6] solves for data partitions only, which is a simpler problem. It has been widely recognized [8] that for good performance, data and loop partitions need to be found simultaneously, not one following another. 2) For 0-1 formulations in general, to get one with few enough variables so as to avoid an ....
[Article contains additional citation context not shown here]
R. Bixby, K. Kennedy, and U. Kremer. Automatic Data Layout Using 0-1 Integer Programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 111-122, Montreal, Canada, August 1994.
....of realignment, DDT decides whether it is better to pay the realignment cost penalty and execute a loop or procedure with the appropriate alignment functions rather than execute it with less appropriate alignment functions. Other researchers are considering this topic, either between loops [3] and [17] or within nested loops [18]. 5 Acknowledgements This work has been partially supported by CONVEX Computer Corporation, CONVEX Supercomputers S.A.E, CEPBA (European Center for Parallelism of Barcelona) and by the Ministry of Education of Spain under contract TIC880 92. We gratefully acknowledge ....
R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. In International Conference on Parallel Architectures and Compilation Techniques, August 1994.
....the WCT problem without precedence constraints, $P \| \sum w_i C_i$, Belouadah and Potts [8] show that a branch and bound algorithm based on a Lagrangian relaxation can solve instances for n 30. Other studies have shown that ILP can be used to solve hard compilation problems, such as data layout [9], software pipelining [10, 11], register allocation [12, 13], and scheduling [14, 15, 16, 17]. On the other hand, approximation algorithms run efficiently and come with performance guarantees. More specifically, a δ-approximation algorithm is a polynomial time algorithm that for every problem ....
R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 111--122, 1994.
....case, a good solution is independently found for each phase, and realignment and/or redistribution statements are inserted where necessary. Data remapping is also one of the topics in this area subject of current research. Some of the proposals presented in the literature about array remapping [BKK94, CP93, CGSS94, PB95] are summarized in the rest of this section. The D system, currently under development at Rice University, considers the profitability of dynamic data remapping by exploring a search space of reasonable alignment and distribution spaces [BKK94]. In their work, each phase has a ....
....literature about array remapping [BKK94, CP93, CGSS94, PB95] are summarized in the rest of this section. The D system, currently under development at Rice University, considers the profitability of dynamic data remapping by exploring a search space of reasonable alignment and distribution spaces [BKK94]. In their work, each phase has a set of candidate mapping schemes. Selecting a mapping scheme for each phase in the entire program is done by representing the problem with the Data Layout Graph. Each possible mapping for a phase is represented with a node. Edges between two nodes in different ....
[Article contains additional citation context not shown here]
R. Bixby, K. Kennedy, and U. Kremer. Automatic Data Layout Using 0-1 Integer Programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Montreal, Canada, August 1994.
....along the execution of the whole program ([LC90], [KLS90], [LC91], [Gup92], [Who92], [CGSS94b], [AGG 94]). Our work focuses on dynamic mappings in which the mapping of an array may change over its lifetime. Data remapping is one of the topics in this area subject of current research ([CP93], [BKK94], [CGSS94a], [PB95]). The main objective of this work has been to devise an algorithm to automatically detect points in the code where to realign or redistribute arrays in order to reduce the total data movement and thus improve performance of the application. Deciding the granularity of the ....
....may be done is also one of the aspects to consider. This research was partially supported by Convex Computer Corporation, CONVEX Supercomputers S.A.E, CEPBA (European Center for Parallelism of Barcelona) and by the Ministry of Education of Spain under contracts TIC 880 92 and TIC 429 95. [BKK94] considers the profitability of data remapping between computational phases. Each phase has a set of candidate mapping schemes. Selecting a mapping scheme for each phase in the entire program is done by representing the problem with the Data Layout Graph. Each possible mapping for a phase is ....
[Article contains additional citation context not shown here]
R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. In International Conference on Parallel Architectures and Compilation Techniques, August 1994.
....that we used in our work; in that environment, however, the analysis guiding the compilation process was to be provided by a separate tool named Data Mapping Assistant. This tool would enable user interaction, and would evaluate candidate distributions using an integer programming framework [11]; the expected performance from each candidate distribution would be derived using the previously observed computation and communication behavior of training sets [7] consisting of small meta benchmarks with the various constructs that are common in data parallel programs. 2.5 Summary Most ....
Robert Bixby, Ken Kennedy, and Ulrich Kremer. Automatic data layout using 0-1 integer programming. Technical Report CRPC-TR93349-S, CRPC/Rice University, 1993.
....algorithm for selections based on preference graphs. Li and Chen [13] considered axis alignment by developing a heuristic to reduce it to weighted bipartite graph matching. Kennedy et al. [11] determine data layouts automatically on distributed memory environments by using 0-1 integer programming [3]. Gupta et al. [8] extend the work of Li and Chen by presenting a framework based on weighted graphs. Kandemir et al. [10] present a framework that can automatically determine data layouts with respect to loop transformations. Their work can find optimal data layouts for all arrays at compiler ....
R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. Technical Report CRPC-TR93349-S, Rice University, 1993.
....structure for the matrices automatically. Clearly this is not applicable to (existing) programs that use hard coded data structures for sparse matrices. While the problems of automatic parallelization for dense matrix computations are, meanwhile, well understood and sufficiently solved (e.g. [3, 4, 5]), these problems are, for sparse matrix computations, solved, if at all, in a very conservative way (e.g. by run time parallelization techniques such as the inspector executor method [6] or run time analysis of sparsity patterns for load balanced array distribution [7]). This is not ....
Robert Bixby, Ken Kennedy, and Ulrich Kremer. Automatic Data Layout Using 0--1 Integer Programming. Technical Report CRPC-TR93349-S, Center for Research on Parallel Computation, Rice University, Houston, TX, Nov. 1993.
....heuristics. Alternative data oriented approaches are based on automatic data decomposition techniques; they strive for a global partition to colocate data and computation. The computation partition may be computed simultaneously with the data partition [5] or derived from the data decomposition [7, 14] using the owner computes rule [18] where each processor is assigned work that results in values for its local data. For this paper we assume the compiler partitions computation using global automatic data decomposition techniques, since such methods have been extensively studied and are more ....
B. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), Montreal, Canada, August 1994.
....alignment and distribution spaces, represented with the Data Layout Graph. Kremer proved in [Kre93] that this exploration is NP-complete. Again, the problem is modeled as a linear 0-1 integer programming problem suitable to be solved by a state of the art general purpose integer programming solver [BKK94b]. Schreiber et al. [SSGC95], with the Excalibur project for compiling array oriented languages at the Research Institute for Advanced Computer Science and at Xerox PARC, define the Alignment Distribution Graph where nodes represent program operations and edges connect definitions of array objects ....
....In this case, a good solution is assumed independently for each phase, and realignment or redistribution actions are performed where necessary. Again, note that the best solution for a program could be one with a suboptimal mapping for each phase. Most dynamic data distribution methods [CP93, BKK94b, SSP 95, PB95, AGG 95] consider a set of suboptimal mappings for each phase in order to deal with this case. However, this is a heuristic and not all possibilities are taken into account, and therefore the solution may not be optimal. The work in Barcelona has focussed on ....
[Article contains additional citation context not shown here]
R. Bixby, K. Kennedy, and U. Kremer. Automatic Data Layout Using 0-1 Integer Programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Montreal, Canada, August 1994.
.... can be either static (they do not change along program execution) or dynamic (they can change between computational phases). Most current proposals solve the problem in independent steps: alignment and distribution for each computational phase [13, 9, 11, 3, 16] and their dynamic combination [5, 4, 17, 15, 2]. The alignment step tries to find appropriate alignments between all arrays in a phase, that is, to decide for each array the dimensions that will be aligned to the dimensions of another array called the template (inter dimensional alignment) and for each aligned dimension, to decide the offset ....
....code is composed of a single module (main program). Interprocedural data distribution with our model is outside the scope of this paper. The definition of a phase is a topic of current research, and can lead to different solutions. In our approach we have adopted the definition of phase made by [4], in which a phase is a loop nest such that for each induction variable occurring in a subscript position of an array reference in the loop body, the phase contains the surrounding loop that defines the induction variable. So the first step of our approach is to decompose the program into phases. ....
[Article contains additional citation context not shown here]
R. Bixby, K. Kennedy, and U. Kremer. Automatic Data Layout Using 0-1 Integer Programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Montreal, Canada, August 1994.
....read and write accesses will be local on each processor. This example demonstrates the importance of the proper data partitioning. The problem is not easy, especially when considered in combination with the choice of parallelization methods and loops to be parallelized. Traditional approaches [4] where the data partitioning was done before the parallelization may lead to suboptimal solutions. In the future we plan to extend our parallelization algorithm to simultaneously perform data partitioning. For the time being we implemented a simple communication overhead estimator, based on the ....
Robert Bixby, Ken Kennedy, and Ulrich Kremer. Automatic data layout using 0-1 integer programming. In Proceedings of PACT'94, Montreal, August 1994.
....In this case, a good solution is assumed independently for each phase, and realignment or redistribution actions are performed where necessary. Again, note that the best solution for a program could be one with a suboptimal mapping for each phase. Most dynamic data distribution methods [CP93, BKK94, SSP 95, PB95, AGG 95] consider a set of suboptimal mappings for each phase in order to deal with this case. However, this is a heuristic and not all possibilities are taken into account, and therefore the solution may not be optimal. This paper shows a novel approach towards ....
....as different arrays. This means that, although the structure of the graph is the same, the complexity of the problem is higher if there are arrays repeated in several phases. 4 Modelling the Minimal Path Problem Linear integer programming is a tool for solving optimization problems. As stated by [BKK94], data layout problems can be very efficiently solved using linear integer programming. In this case, the problem to solve is to find a path in the CPG that includes exactly one node of each column, so that the sum of weights of the edges (data movement edges) minus the sum of weights of the ....
[Article contains additional citation context not shown here]
R. Bixby, K. Kennedy, and U. Kremer. Automatic Data Layout Using 0-1 Integer Programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Montreal, Canada, August 1994.
....case, a good solution is found independently for each phase, and realignment and/or redistribution statements are inserted where necessary. Data remapping is also one of the topics in this area subject of current research. Some of the proposals presented in the literature about array remapping [13], [14], [15] are summarized in the rest of this section. The D system, currently under development at Rice University, considers the profitability of dynamic data remapping by exploring a search space of reasonable alignment and distribution spaces [13]. In their work, a phase is defined as a loop ....
....in the literature about array remapping [13], [14], [15] are summarized in the rest of this section. The D system, currently under development at Rice University, considers the profitability of dynamic data remapping by exploring a search space of reasonable alignment and distribution spaces [13]. In their work, a phase is defined as a loop nest such that for each induction variable occurring in a subscript position of a loop body array reference, the phase contains the outermost loop which defines the induction variable. Each phase has a set of candidate mapping schemes. Selecting a ....
[Article contains additional citation context not shown here]
R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. In International Conference on Parallel Architectures and Compilation Techniques, August 1994.
....into two subproblems which should be addressed independently: i) the alignment problem; ii) the distribution problem. The majority of researchers have focussed on stage i) [9], [4], [6]. In another line of research, the data distribution problem has been formulated as a 0-1 linear programming model [10], [7]. In the approach presented in [1], the goal is finding a static data iterations decomposition without communications; if that is not possible, then a redistribution of data is computed. In these approaches array reshaping is not allowed, triangular loops are not handled and the access ....
K. Kennedy and U. Kremer. Automatic data layout using 0-1 integer programming. In Int'l Conf. Parallel Architectures and Compilation Techniques, Montréal, Canada, Aug. 1994.
....this case, a reasonably good solution is assumed independently for each phase, and remapping actions ensure that each phase executes with its solution. Since the best solution for a program could be one with a suboptimal mapping for some of the phases, most dynamic data distribution methods [CP93, BKK94, SSP 95, PB95, AGG 95] consider a set of suboptimal mappings for each phase in order to perform the global analysis. However, this approach may not be optimal, as not all possible mappings are taken into account. [Kre93] demonstrates that the optimal selection of a mapping for each phase ....
....total execution time of a parallelized program is estimated as the sequential execution time plus the time overhead spent moving data across processors minus the time saved due to the parallel execution of loops. Linear integer programming is a tool for solving optimization problems. As stated in [BKK94], linear integer programming tools can be useful to efficiently solve data layout problems. In this case, the CPG is modeled as a minimal path problem with a set of additional constraints that ensures the correctness of the solution generated. The objective function to minimize is execution time. ....
R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Montreal, Canada, August 1994.
....the PCFG shown in the middle of the figure. A PCFG is a compacted version of a control flow graph where all CFG nodes associated with a phase are represented by a single node in the PCFG, and edges are annotated with control flow information such as branch probabilities or frequency of execution [BKK94b, Kre95]. A node in the DLG represents a candidate data layout in the data layout search space of a phase. The edges represent possible remappings between candidate layouts. Nodes and edges are weighted with their estimated execution times. A solution to the data layout selection problem picks ....
....only. The following discussion concentrates on the impact of read only arrays on the DLG construction and 0-1 integer programming formulation of the data layout selection problem. For arrays that are not read only, the standard DLG representation and 0-1 formulation is used as discussed in [BKK94b, Kre95]. 3.1.1 DLG Edges New edges for read only arrays are introduced into the DLG as follows: 1. If the read only region is restricted to a single iteration of the loop, edges are introduced between candidate layouts of a phase that references a read only array a and all candidate layouts ....
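As an illustration only, the selection problem described in the excerpts above (one candidate layout per phase, node weights for estimated phase execution time, edge weights for remapping between layouts) can be sketched for the simplest case: a straight-line sequence of phases. The sketch uses dynamic programming over that chain; it is not the 0-1 integer programming formulation of the cited paper. Every name and number in it (select_layouts, node_cost, remap_cost, the "row" and "col" layouts, the costs) is a hypothetical placeholder and is not taken from [BKK94b] or [Kre95].

```python
from typing import Callable, Dict, List, Tuple


def select_layouts(
    node_cost: List[Dict[str, int]],
    remap_cost: Callable[[int, str, str], int],
) -> Tuple[int, List[str]]:
    """Pick one candidate layout per phase along a linear chain of phases.

    node_cost[p][l]     -- assumed estimated time of phase p under layout l
    remap_cost(p, a, b) -- assumed cost of remapping from layout a (phase p)
                           to layout b (phase p + 1)
    Returns (total estimated time, chosen layout for each phase).
    """
    # best[l] = (cheapest cost of reaching the current phase in layout l, path)
    best = {l: (c, [l]) for l, c in node_cost[0].items()}
    for p in range(1, len(node_cost)):
        nxt = {}
        for layout, cost in node_cost[p].items():
            # Cheapest predecessor layout, including the remapping edge weight.
            prev_cost, prev_path = min(
                (
                    (best[k][0] + remap_cost(p - 1, k, layout), best[k][1])
                    for k in best
                ),
                key=lambda t: t[0],
            )
            nxt[layout] = (prev_cost + cost, prev_path + [layout])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Hypothetical three-phase program with two candidate layouts per phase.
costs = [
    {"row": 10, "col": 14},
    {"row": 20, "col": 8},
    {"row": 9, "col": 12},
]


def remap(p: int, a: str, b: str) -> int:
    # Assumed flat remapping penalty whenever the layout changes.
    return 0 if a == b else 5


total, path = select_layouts(costs, remap)
print(total, path)
```

Here the cheapest choice keeps the "col" layout throughout, even though "row" is the better layout for the first and last phases in isolation, which is the effect of weighing remapping costs against per-phase costs. For a general phase control flow graph with branches and loops, this chain recurrence does not directly apply.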
R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0--1 integer programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT94), pages 111--122, Montreal, Canada, August 1994.
....formulations of the data layout problem as a 0-1 problem. However, these other formulations turned out to be inferior to the presented formulation in terms of the time CPLEX needed to compute the optimal solution. A more detailed discussion of the different formulations can be found elsewhere [7]. 6 Experiments All of our experiments are based on Erlebacher, an 800-line benchmark program written by Thomas Eidson at the Institute for Computer Applications in Science and Engineering (ICASE). The program performs 3-dimensional tridiagonal solves using Alternating Direction Implicit (ADI) ....
R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0--1 integer programming. Technical Report CRPC-TR93-349-S, Center for Research on Parallel Computation, Rice University, November 1993.
No context found.
R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. In International Conference on Parallel Architectures and Compilation Techniques, August 1994.
No context found.
R. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Montreal, Canada, August 1994.
No context found.
Bixby R., Kennedy K. et al. Automatic data layout using 0-1 integer programming. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT94), Montreal, Canada, August 1994.
No context found.
R. Bixby, K. Kennedy, and U. Kremer. Automatic Data Layout Using 0-1 Integer Programming. In Proceedings of the 1994.