| J. Garcia, E. Ayguade, and J. Labarta. A novel approach towards automatic data distribution. In Proc. of the Workshop on Automatic Data Layout and Performance Prediction, 1995. |
....its MLG is the same as the NG of the said nest. Figure 2 shows the MLG for the program fragment shown in Figure 1(a). Note that the MLG, once built, contains all the memory access information for every array accessed in every loop nest. It is inspired by the graph structures used by Garcia et al. [7] and Kennedy and Kremer [15] to solve the automatic data distribution problem for distributed memory message passing architectures. A path in an MLG is defined as a series of connected paths in each NG. The LG visited by the path on a specific NG corresponds to the innermost loop in the nest in ....
J. Garcia, E. Ayguade, and J. Labarta. A novel approach towards automatic data distribution. In Proc. Supercomputing'95, San Diego, December 1995.
....an accurate form used to summarize array regions accessed in each node of the LCG. For this, we use the ID that is generated from the LMADs for a loop. As shown in Section 4, the LMAD can accurately represent array accesses with affine or non affine expressions. Thus, unlike most other techniques [2, 10, 14, 24, 26], our technique works whether subscripts and loop limits expressions are affine or non affine. Also, our technique is interprocedural since the LMAD can represent array access across procedure boundaries efficiently, as described in [22] 5.2.2 Intra phase Locality Let # # ### ## be the ID for ....
J. Garcia, E. Ayguade, and J. Labarta. A Novel Approach towards Automatic Data Distribution. Proceedings of Supercomputing '95, 1995.
....legality of the resulting code. For this, affine mappings have been used and simplified using the OMEGA library [72]. Code generation was performed using the OMEGA calculator. Communication Parallelism Graph: A graph based framework to minimise communication is presented by Garcia, Ayguadé and Labarta [31]. They build a weighted graph linking all consecutive array accesses to each other, such that a shortest path may be found, which represents minimal communication overheads. To this end, a two dimensional matrix of nodes is constructed, consisting of d rows, where d is the highest dimension of any ....
Jordi Garcia, Eduard Ayguade, and Jesus Labarta. A novel approach towards automatic data distribution. In Proceedings of the Conference on Supercomputing, San Diego, California, USA, December 1995. ACM Press.
....layout of data within the virtual address space is changed to make the data accessed by each processor local. For example, column wise distribution of a very large array which is stored row major in memory may require reshaped distribution. In theory, a compiler can use automatic static (e.g. [13, 35, 23, 11, 38, 27, 7]) or dynamic (e.g. [12, 32, 3]) data distribution methods developed for message passing architectures or non uniform memory access (NUMA) machines; once the optimal data distributions are determined, the appropriate distribution directives can be inserted in the code. We considered the C ....
....to [21, 22] and [8] the approach explained in this paper considers a much larger search space for memory layouts, and finds the optimal layouts in a single step. Finally, as mentioned earlier, there is a huge body of work on automatic data distribution on distributed memory machines (e.g. [7, 3, 11, 12, 13, 23, 27, 32, 38, 36]). The interplay between that work and ours has been discussed earlier in the paper. 10 Summary. In this paper, we presented an approach based on the theory of hyperplanes and the linear algebra framework used by parallelizing compilers for optimizing memory layouts of arrays. Our approach ....
J. Garcia, E. Ayguade, and J. Labarta. A novel approach towards automatic data distribution. In Proc. Supercomputing '95, San Diego, December 1995.
.... work done by Anderson and Lam at Stanford University [AL93]; Chatterjee, Gilbert, Schreiber, Sheffler, and Pugh at RIACS, Xerox Parc, and the University of Maryland [CGSS94, SSP 95]; Ayguadé, Garcia, Girones, Labarta, Torres and Valero at the University of Catalunya in Barcelona [AGG 94, GAL95]; and Ning, Van Dongen, and Gao at CRIM and McGill University [NDG95]. In contrast to previous work, we are the first to consider read only replication and memory constraints in a unified framework. More recently, other researchers have started to investigate the feasibility of 0-1 integer ....
....[NDG95]. In contrast to previous work, we are the first to consider read only replication and memory constraints in a unified framework. More recently, other researchers have started to investigate the feasibility of 0-1 integer programming techniques in the context of automatic data layout [GAL95, Phi95]. Using integer programming for instruction scheduling under resource constraints for super scalar machines has been discussed by Feautrier [Fea94] and Ning, Govindarajan, Altman and Gao [NG93, AG94, AGG95]. 6 Conclusion and Future Work. Read only replication is an important technique to ....
J. Garcia, E. Ayguadé, and J. Labarta. A novel approach towards automatic data distribution. In Proceedings of the Workshop on Automatic Data Layout and Performance Prediction (AP'95), Houston, TX, April 1995.
....node T is connected to each subscript position of the last column. Figure 1 also shows the LG for our example loop nest. The LG, once built, contains all the memory access information for all the arrays with respect to the innermost loop, and inspired by the graph structure used by Garcia et al. [6] to solve the automatic data distribution problem for message passing machines. In Figure 1 the five columns correspond to the five arrays referenced in the nest. The nodes in a specific column correspond to the subscript positions; e.g. the first node denotes the first subscript position and so ....
....positions; e.g. the first node denotes the first subscript position and so on. 3 Static memory layout detection. The 0-1 integer programming problem is a linear integer programming problem in which each variable is restricted to have a value from the set {0, 1} [17]. Like Garcia et al. [6], we use the notation EUV to denote all the dim(U) × dim(V) edges between U and V. EUV[i, j], on the other hand, denotes the edge between the i-th subscript position of U and the j-th subscript position of V. We also use EUV[i, j] to denote the 0-1 integer variable associated with the ....
J. Garcia, E. Ayguade, and J. Labarta. A novel approach towards automatic data distribution. In Proc. Supercomputing'95, San Diego, December 1995.
....local, since the unit of consideration is a loop nest rather than the entire program. Furthermore, they are restricted in their application by data dependences. Conversely, data transformations, such as alignment and partitioning, have received much attention in distributed memory compilation [6, 9, 12]. As data layout has program wide impact, these techniques have, by necessity, been more global in their consideration. They are unaffected by data dependences but there has been, until recently, difficulty in applying data transformations to reshaped arrays across procedure boundaries (in [19] a ....
J. Garcia, E. Ayguade and J. Labarta, A Novel Approach Towards Automatic Data Distribution, Proc. Automatic Data Layout and Performance Predictors, 1995.
....predefined standard distribution function for comparison. Instead of using heuristics to solve the alignment problem Kremer [14, 16] proposed to use 0 1 integer programming techniques. Their approach provides dynamic intraprocedural solutions of the alignment problem. The approach of Garcia et al. [11] introduces a new framework combining both alignment and data distribution analysis. Recently their work was extended to dynamic intraprocedural analysis taking control flow information into account [12] however their tool is currently restricted to intraprocedural analysis and one dimensional ....
....to be NP complete in [19] however Li and Chen presented an heuristic algorithm which reduced the problem to a bipartite graph matching problem. We make use of an adapted version of this algorithm as shown below. Other researchers tried to solve the problem using 0 1 integer programming algorithms [14, 11] neglecting the NP completeness of the problem. This requires them to accept a huge amount of computation time. Our heuristic algorithm begins by creating initial Alignment Sets: each Alignment Set contains a single element which is a dimension of an array in the code. The array selected must have ....
[Article contains additional citation context not shown here]
J. Garcia, E. Ayguadé, and J. Labarta. A Novel Approach Towards Automatic Data Distribution. In Workshop on Automatic Data Layout and Performance Prediction, Houston, April 1995. Center for Research on Parallel Computing, Rice University.
....problem using a heuristic which combines loop nests (with potentially different distributions) in such a way that the largest potential communication costs are eliminated first while still maintaining sufficient parallelism. Bixby, Kennedy and Kremer [26] as well as Garcia, Ayguadé, and Labarta [58], have formulated the dynamic data partitioning problem in the form of a 0-1 integer programming problem by selecting a number of candidate distributions for each loop nest and constructing constraints from the data relations. Also, Sheffler, Schreiber, Gilbert and Pugh [158] have applied graph ....
J. Garcia, E. Ayguadé, and J. Labarta, "A Novel Approach Towards Automatic Data Distribution," Proc. of the Workshop on Automatic Data Layout and Performance Prediction, Houston, TX, Apr. 1995.
....to automatically derive data and computation partitions by the compiler. In this paper, we describe an algorithm for automatically deriving data and computation partitions on SSMMs. The automatic derivation of data and computation partitions has been an active area of research in recent years [12, 13, 14, 15, 16]. This research targets distributed memory multiprocessors and derives partitions that minimize remote memory accesses (i.e. interprocessor communication) We show that an algorithm that derives data and computation partitions on SSMMs must take into consideration shared memory effects, including ....
....partitions of smaller program segments. They minimize interprocessor communication using the performance estimator developed by Balasundaram et al. [28]. The program segments can have different data partitions. Data repartitioning is introduced in the program if it is beneficial. Garcia et al. [16] propose a technique to relate parallelism and data alignment. A static data partition is derived for the program using 0-1 integer programming. Anderson and Lam [12] present an algebraic framework for both distributed memory multiprocessors and SSMMs, and determine data and computation partitions ....
J. Garcia, E. Ayguade, and J. Labarta. A novel approach towards automatic data distribution. In Proc. of the Workshop on Automatic Data Layout and Performance Prediction, 1995.
....of identifying split points is performed recursively until the cost of redistribution is greater than the cost of using a single static distribution. Ramanujam and Sadayappan [32] have used linear algebra methods to obtain a static distribution for a restricted class of programs. Garcia et al. [12,13] use a communication parallelism graph in which each node represents a dimension of an array while the weight of an edge reflects the cost of the alignment of the nodes it connects. They use hyper edges to represent the parallelism in the program. The communication parallelism graph for a program ....
J. Garcia, E. Ayguadé, and J. Labarta. A novel approach towards automatic data distribution. In Supercomputing, December 1995.
....of identifying split points is performed recursively until the cost of redistribution is greater than the cost of using a single static distribution. Ramanujam and Sadayappan [32] have used linear algebra methods to obtain a static distribution for a restricted class of programs. Garcia et al. [12,13] use a communication parallelism graph in which each node represents a dimension of an array while the weight of an edge reflects the cost of the alignment of the nodes it connects. They use hyper edges to represent the parallelism in the program. The communication parallelism graph for a program ....
J. Garcia, E. Ayguadé, and J. Labarta. A novel approach towards automatic data distribution. In Workshop on Automatic Data Layout and Performance Prediction, CRPC, April 1995.
....Insert synchronization for inter processor dependences Figure 3.1: Overview of the entire optimization process 3.2 Selecting Space Mappings 3.2. 1 Introduction The problem of automatically distributing computation has been addressed by a large number of authors [Gup92, Fea94, AL93a, BKK93, GAL95, SSP 95] My work improves on most previous work in the following ways: 1. I am not influenced by the order of the computation of the original program. I use methods to determine the parallelism inherent in the program rather than the parallelism that can be obtained using the computation ....
....two computations that could be run in parallel. The problem with this approach is that it is not possible to sacrifice some parallelism in a particular statement (by instead pipelining it or not distributing it) in order to reduce overall communication costs. Although some other systems [BKK93, GAL95] also use exact rather than greedy heuristic algorithms, the size of the problems and the methods used are very different. I consider a list of candidate distributions for each statement, whereas these systems consider a list of candidate distributions for each array in each phase. My search ....
[Article contains additional citation context not shown here]
Jordi Garcia, Eduard Ayguade, and Jesus Labarta. A novel approach towards automatic data distribution. In Workshop on Automatic Data Layout and Performance Prediction, April 1995.
....Data partitions are allowed to change only across these small program segments. The costs of data partitions are estimated using a performance estimator developed by Balasundaram et al. [20]. Garcia et al. also use 0-1 integer programming to determine static data partitions for arrays in a program [21], but use profiling to determine the frequency of execution of each loop nest, hence more accurately estimating the costs of data partitions. Anderson and Lam [12] present an algebraic framework for both distributed memory multiprocessors and SSMMs. They determine data and computation partitions ....
J. Garcia, E. Ayguade, and J. Labarta. A novel approach towards automatic data distribution. In Proc. of the Workshop on Automatic Data Layout and Performance Prediction, 1995.
No context found.
J. Garcia, E. Ayguadé, and J. Labarta. A novel approach towards automatic data distribution. In Proceedings of Supercomputing'95, San Diego, CA, December 1995.
....performed when this dimension is distributed. Several edges between a pair of nodes are replaced by a single edge with a weight equal to the sum of the original ones. Details about the matching of reference patterns to data movement routines and the estimation of their cost can be found elsewhere [11, 18]. After adding the data movement information, the CPG (without weights) that is obtained is shown in Figure 2.a. In this graph, the edges due to the assignment of array B to array A, the assignment of array C to array B, and the self assignment of array D are shown. The cost functions have been ....
....distribution tool, and with the results of some other authors or hand coded parallel versions. This approach is restricted to one dimensional array distributions which is a severe drawback in view of real applications. The extension to multi dimensional array distributions can be found elsewhere [11]. A lot of additional aspects should be considered in the problem formulation in order to improve the quality of the solutions generated, such as integrating communication optimizations (detection and elimination of redundant communication, overlapping of computation and communication, or ....
J. Garcia, E. Ayguadé, and J. Labarta, "A Novel Approach Towards Automatic Data Distribution", 2nd Workshop on Automatic Data Layout and Performance Prediction, April 1995 (also available as Research Report CEPBA/UPC RR95-04).
....of array B and the first dimension of array D, while the third hyperedge links the first dimension of array B and the second dimension of array D. The resulting CPG filled only with the parallelism hyperedges can be seen in Figure 2.b. More details about the CPG construction can be found in [GAL95]. 3.4 Alignment and Distribution. Once the CPG has been built, it contains all the required information for us to estimate the performance effects of any selected distribution. For instance, the behavior of the code example in Figure 1 when distributing the first dimension of each array is ....
....hyperedges) that connect nodes inside the selected path, is minimized. This problem is formulated as a linear 0 1 integer programming problem, that is, a linear integer programming problem where each variable has two possible values: 0 or 1. Details about this formulation can be found in [GAL95] 3.7 Experimental Results The main components of our tool are described next. The parsing of the code is performed using the parser module of DDT which obtains all reference patterns in Fortran programs after performing some well known optimizations, such as expression substitution, subscript ....
J. Garcia, E. Ayguadé, and J. Labarta. A Novel Approach Towards Automatic Data Distribution. In Proceedings of Supercomputing'95, San Diego, CA, December 1995.
....purpose linear 0 1 integer programming solver, which finds the optimal solution in a small amount of time, and avoids the use of heuristics while computing the solution. The rest of this paper is organized as follows. In the next section we summarize the basic ideas of our proposal presented in [7] for static data distributions. In Section 3 the model is extended to generate dynamic distributions when necessary. Section 4 gives an overview of how the problem is modeled in order to obtain the optimal solution. Section 5 summarizes our implementation and gives some experimental results. ....
J. Garcia, E. Ayguade, and J. Labarta. A Novel Approach Towards Automatic Data Distribution. In Proceedings of Supercomputing'95, San Diego, CA, December 1995.
....of array B and the first dimension of array D, while the third hyperedge links the first dimension of array B and the second dimension of array D. The resulting CPG filled only with the parallelism hyperedges can be seen in Figure 3.b. More details about the CPG construction can be found in [GAL95] 2.2 Alignment and Distribution Once the CPG has been built, it contains all the required information for us to estimate the performance effects of any selected distribution. For instance, the behavior of the code example in Figure 2 when distributing the first dimension of each array is ....
....specified: the sum of the weights of all selected edges, minus the sum of the weights of all selected hyperedges. The linear integer programming solver finds the optimal solution subject to the specified constraints. More details about the formulation of the minimal path problem can be found in [GAL95].

Program   Loops  Parall  Phases  Arrays  Dimens  Patterns
baro      98     86      24      38      2       428
shallow   39     38      27      14      2       282
tomcatv   18     9       11      7       2       77
x42       36     29      19      19      2       196
rhs       37     37      4       4       4       24
adi       15     10      9       3       2       48

Table 1: Characteristics of the selected programs.

5 Experimental Results. The main components of our ....
J. Garcia, E. Ayguadé, and J. Labarta. A Novel Approach Towards Automatic Data Distribution. In Proceedings of Supercomputing'95, San Diego, CA, December 1995.
....the access to data is done locally as much as possible. On the other side, we have explored the feasibility of using 0 1 integer programming models to solve in a global way the alignment and distribution problems, both static and dynamic. A description of this implementation can be found elsewhere [24]. 8 Acknowledgements This work has been partially supported by CONVEX Computer Corporation, CONVEX Supercomputers S.A.E, CEPBA (European Center for Parallelism of Barcelona) and by the Ministry of Education of Spain under contract TIC880 92. We gratefully acknowledge the helpful comments of ....
J. Garcia, E. Ayguade, and J. Labarta. A novel approach towards automatic data distribution. In Workshop on Automatic Data Layout and Performance Prediction. Center for Research on Parallel Computing, Rice University, April 1995.
....0 or 1. The solution is computed in a single step, and includes array alignment, distribution, redistribution, and the corresponding loop parallelization strategy. This approach avoids the use of heuristics while finding the optimal solution. Details about the formulation can be found elsewhere [GAL95] 3 Two dimensional Data Distribution In this section we describe how to enhance the CPG in order to support two dimensional distributions, assuming that the processors topology is constant and known at compilation time. Let p be the number of processors in the target architecture. Usually p is ....
J. Garcia, E. Ayguadé, and J. Labarta. A novel approach towards automatic data distribution. In Proceedings of Supercomputing'95, San Diego, CA, December 1995.
....whole description of the tool can be found in [AGG 97]. The techniques proposed for distributed memory systems are currently applied to cache coherent shared memory parallel systems, in which the large cache sizes can perform as a local memory system. This is described in [AGGL96]. In [GAL95b, GAL95a] we presented the CPG as a novel approach towards static automatic data distribution. The mappings supported at that point considered the whole program as a single phase. The minimal path problem was formally specified in [GAL96c]. This basic model was extended in [GAL96b] in order to support ....
J. Garcia, E. Ayguadé, and J. Labarta. A novel approach towards automatic data distribution. In Proceedings of Supercomputing'95, San Diego, CA, December 1995.
No context found.
J. Garcia, E. Ayguade, and J. Labarta. A novel approach towards automatic data distribution. In Proc. of the Workshop on Automatic Data Layout and Performance Prediction, 1995.
No context found.
Garcia J., Ayguadé E. et al. A novel approach towards automatic data distribution. Proceedings of the Workshop on Automatic Data Layout and Performance Prediction (AP'95), Houston, TX, April 1995.
No context found.
J. Garcia, E. Ayguade, and J. Labarta, "A novel approach towards automatic data distribution, " in Proceedings of the Workshop on Automatic Data Layout and Performance Prediction, Houston, TX, Apr. 1995.