C. Gong, R. Gupta, and R. Melhem, "Compilation techniques for optimizing communication on distributed-memory systems," in Proceedings of the 22nd International Conference on Parallel Processing, St. Charles, IL, Aug. 1993, pp. II:39--46.
....of message vectorization, message coalescing and message aggregation [7, 31, 57]. More recently, some researchers have proposed techniques based on data flow analysis in order to optimize communication across multiple loop nests for the two-way (send/receive) communication model. Several works [1, 2, 3, 10, 16, 20, 36, 55, 56] present similar frameworks to optimize the send/receive communications globally. Almost all these approaches (except for the work of Adve et al. [1, 2]) use a variant of Regular Section Descriptors (RSD) introduced by Callahan and Kennedy [9]. For each array referenced in the program, an RSD is ....
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proc. International Conference on Parallel Processing, Volume II, pages 39--46, St. Charles, IL, August 1993.
....Notice that, compared with the message-vectorized program in Figure 1(c), this version reduces both the number of messages and the communication volume. Recently, a number of authors have proposed techniques based on data flow analysis to optimize communication across multiple loop nests [18, 22, 35, 13, 49, 50]. Most of these approaches use a variant of Regular Section Descriptors (RSD) introduced by Callahan and Kennedy [12]. The two most notable representations are the Available Section Descriptor (ASD) [22] and the Section Communication Descriptor (SCD) [49, 50]. Associated with each array that is referenced in ....
....${}_2 = \{D \mid \exists D' \in DS_1,\ D'' \in DS_2 \text{ s.t. } N(D) = N(D') = N(D'') \text{ and } M(D) = M(D') \;{}_{c}\; M(D'')\}$. When there is no ambiguity, we also use $\cup_c$ and $\cup_d$ instead of ${}_c$ and ${}_d$, respectively. It should be noted that although these operations are similar to those given by Gong et al. [18], there is an important difference. Since we keep the communication sets accurately in terms of equalities and inequalities, we can optimize (e.g., coalesce) communication messages even if the messages do not have the same communication pattern (e.g., broadcast, point-to-point) or identical ....
[Article contains additional citation context not shown here]
C. GONG, R. GUPTA, and R. MELHEM. Compilation techniques for optimizing communication on distributed-memory systems. In Proc. International Conference on Parallel Processing, Volume II, pages 39--46, St. Charles, IL, August 1993.
....messages due to references to different arrays to the same destination processor into a single message. Many of these techniques are limited to a single loop nest. Recently, a number of authors have proposed techniques based on dataflow analysis to optimize communication across multiple loop nests [5, 6, 10]. Almost all of the approaches use a variant of Regular Section Descriptors (RSD) introduced by Callahan and Kennedy [4]. For each array referenced in the program, an RSD is defined which describes the portion of the array that is referenced. Although this representation is convenient for simple ....
....$DS_1 \;{}_{d}\; DS_2 = \{D \mid \exists D' \in DS_1,\ D'' \in DS_2 \text{ s.t. } N(D) = N(D') = N(D'') \text{ and } M(D) = M(D') \;{}_{c}\; M(D'')\}$. In this paper, we sometimes use $\cup_c$ and $\cup_d$ instead of ${}_c$ and ${}_d$, respectively. It should be noted that although these operations are similar to those presented by Gong et al. [5], there is an important difference. Since we keep the communication sets accurately in terms of equalities and inequalities, we can optimize (e.g., coalesce) communication messages even if the messages do not have the same communication pattern (e.g., broadcast, point-to-point) or identical ....
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proc. International Conference on Parallel Processing, St. Charles, IL, August 1993.
....messages due to references to different arrays to the same destination processor into a single message. Many of these techniques are limited to a single loop nest. Recently, a number of authors have proposed techniques based on dataflow analysis to optimize communication across multiple loop nests [5, 6, 10]. Almost all of the approaches use a variant of Regular Section Descriptors (RSD) introduced by Callahan and Kennedy [4]. For each array referenced in the program, an RSD is defined which describes the portion of the array that is referenced. Although this representation is convenient for simple ....
....$'')$ and $M(D) = M(D') \;{}_{c}\; M(D'')\}$. [In Proceedings of the 1998 International Parallel Processing Symposium.] In this paper, we sometimes use $\cup_c$ and $\cup_d$ instead of ${}_c$ and ${}_d$, respectively. It should be noted that although these operations are similar to those presented by Gong et al. [5], there is an important difference. Since we keep the communication sets accurately in terms of equalities and inequalities, we can optimize (e.g., coalesce) communication messages even if the messages do not have the same communication pattern (e.g., broadcast, point-to-point) or identical ....
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proc. International Conference on Parallel Processing, St. Charles, IL, August 1993.
....the value for the read reference. Communication optimizations based only on data dependence information usually result in redundant communications [14]. The more recently developed optimizations use data flow information to reduce redundant communication and perform other optimizations. In [24], a data flow framework which can integrate a number of communication optimizations is presented. However, the method can only apply to a very small subset of programs which are constrained in the forms of loop nests and array indices. In [32], a unified framework which uses global array data flow ....
C. Gong, R. Gupta and R. Melhem. "Compilation Techniques for Optimizing Communication on Distributed-Memory Systems". International Conference on Parallel Processing. Vol. II, pages 39-46, August 1993.
....communicated among processors [17]. Chen and Sheu [8], Huang and Sadayappan [15], Ramanujam and Sadayappan [35; 36], and Wolf and Lam [40; 41] determined the data distribution and/or degree of parallelism of a single nested loop based on the hyperplane method. In addition, Gong, Gupta, and Melhem [10] and Hudak and Abraham [16] developed compile-time techniques for optimizing communication overhead. Like previous works in [11; 32], we will deal with the whole source program altogether; however, unlike them, we will deal with each Do loop independently. Data distribution schema between two ....
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proc. International Conf. on Parallel Processing, pages II--39--46, St. Charles, IL, Aug. 1993.
....each approach has its own unique features, the general idea is to apply an appropriate combination of message vectorization, message coalescing and message aggregation [9, 3]. Recently, some researchers have proposed techniques for optimizing communication across multiple loop nests. The works in [5], [7], [12], [4], and [14] present similar frameworks to optimize send/recv communication globally and use variants of Regular Section Descriptors (RSD). Although this representation is convenient for simple array sections such as those found in block distributions, it is hard to embed alignment ....
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proc. International Conference on Parallel Processing, Volume II, pages 39--46, St. Charles, IL, August 1993.
....framework, to propagate this information across arbitrary control flow. The combined approach allows us to perform more extensive optimizations than either of the two components would do on its own. There have been several attempts to use data flow analysis in order to optimize communication [8, 7, 11]. Most of these efforts have focused on extending the existing data flow analysis methods to work with some form of array section descriptors. In contrast, our dataflow analysis uses bit vectors (with each bit representing an array portion) and is thus likely to be more efficient. While this ....
....portions of the same array are communicated and from which there is a control flow path to the statement under consideration. Using this information we could split the array portion communicated into parts corresponding to array portions in reaching communications. This is similar to splitting in [7], but we only perform splitting when initializing the data flow framework. Since we are only interested in control flow reachability without taking array kill information into account, this approach does not require array data flow analysis. In our example on the left side of Figure 3, ....
[Article contains additional citation context not shown here]
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proceedings of the 1993 International Conference on Parallel Processing, St. Charles, IL, August 1993.
....distribution using augmented data access descriptors [19]. Chen and Sheu [13], Huang and Sadayappan [20], Ramanujam and Sadayappan [37; 38], and Wolf and Lam [44; 45] determined data distribution and/or degree of parallelism based on the hyperplane method. Furthermore, Gong, Gupta, and Melhem [14] and Hudak and Abraham [21] developed compile-time techniques for optimizing communication overhead. The rest of this paper is organized as follows. In Section 2, we introduce the schema of data partition and distribution, and the communication primitives. In Section 3, we show an example of how ....
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proc. International Conf. on Parallel Processing, pages II--39--46, St. Charles, IL, Aug. 1993.
.... analysis to eliminate redundant monolithic global memory accesses across loop nests in the presence of conditionals [4]. Gong, Gupta, and Melhem present a dataflow framework to separate sends and receives by placing sends at the earliest point at which the communication can be performed [3]. However, their technique does not eliminate partially redundant communication, handles only singly nested loops and one-dimensional arrays, and does not provide balanced communication placement. The unified communication optimization framework developed by Gupta, Schonberg, and Srinivasan adapts ....
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proceedings of the 1993 International Conference on Parallel Processing, St. Charles, IL, August 1993.
....be achieved by initiating communication as early as possible. This exposes opportunities for overlapping communication with some intervening computation. There have been several efforts [5, 4, 7, 8] to use data flow analysis to determine communication placement that minimizes communication overhead. However, to reduce the complexity of the problem being solved, previous frameworks ignored most machine-dependent resource constraints. This paper introduces a new framework for resource-based ....
.... from n to post body node, Path(n), and the buffer required for a path from post body node to e, Total(Succs F (Header(n) .... [Figure: example loops over arrays x, y, and z with SEND placements and per-node [Buffer, Avail] annotations; total buffer = 1000] ....
[Article contains additional citation context not shown here]
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proceedings of the 1993 International Conference on Parallel Processing, St. Charles, IL, August 1993.
....Notice that, compared with the message-vectorized program in Figure 1(c), this last version reduces both the number of messages and the communication volume. Recently, a number of authors have proposed techniques based on dataflow analysis to optimize communication across multiple loop nests [10, 12, 15, 9, 21]. Almost all of the approaches use a variant of Regular Section Descriptors (RSD) introduced by Callahan and Kennedy [8]. The two most notable representations are the Available Section Descriptor (ASD) [12] and the Section Communication Descriptor (SCD) [21]. For each array referenced in the program, an RSD ....
....${}_d\; DS_2 = \{D \mid \exists D' \in DS_1,\ D'' \in DS_2 \text{ s.t. } N(D) = N(D') = N(D'') \text{ and } M(D) = M(D') \;{}_{c}\; M(D'')\}$. In this paper, we sometimes use $\cup_c$ and $\cup_d$ instead of ${}_c$ and ${}_d$, respectively. It should be noted that although these operations are similar to those presented by Gong et al. [10], there is an important difference. [Figure 3: An example program] ....
[Article contains additional citation context not shown here]
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proc. International Conference on Parallel Processing, St. Charles, IL, August 1993.
.... analysis to eliminate redundant monolithic global memory accesses across loop nests in the presence of conditionals [37]. Gong, Gupta, and Melhem present a data flow framework to separate sends and receives by placing sends at the earliest point at which the communication can be performed [36]. However, their technique does not eliminate partially redundant communication, handles only singly nested loops and one-dimensional arrays, and does not provide balanced communication placement. Gupta, Schonberg, and Srinivasan adapt PRE for communication placement and to develop a unified ....
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In Proceedings of the 1993 International Conference on Parallel Processing, St. Charles, IL, August 1993.
....to execute on distributed memory systems. Traditionally, data dependence analysis has been used to perform communication optimizations within a single loop nest [2, 10, 14]. Recently, data flow analysis techniques have been developed to obtain information for global communication optimizations [4, 7, 9, 12, 13]. One approach, which will be referred to as the array dependence approach, refines data flow analysis for scalars with data dependence analysis [4, 12, 13]. Another approach, which we will refer to as the array dataflow approach, performs global array data flow analysis [7, 9]. The array dataflow ....
.... [4, 7, 9, 12, 13]. One approach, which will be referred to as the array dependence approach, refines data flow analysis for scalars with data dependence analysis [4, 12, 13]. Another approach, which we will refer to as the array dataflow approach, performs global array data flow analysis [7, 9]. The array dataflow approach can obtain more accurate data flow information at a higher analysis cost than the array dependence approach. The high analysis cost in the array dataflow approach results from the complexity of the data flow descriptor [7, 9] and that operations on the descriptors ....
[Article contains additional citation context not shown here]
C. Gong, R. Gupta and R. Melhem "Compilation Techniques for Optimizing Communication on Distributed-Memory Systems" In International Conference on Parallel Processing, Vol II, pages 39-46, August 1993.
....loop nests). Von Hanxleden et al. [23, 22] have developed a data flow framework for generating communication in the presence of indirection arrays. Their work focuses on irregular subscripts, and therefore does not attempt to obtain more precise information about array sections. Gong et al. [7] describe a data flow procedure that unifies optimizations like vectorizing communication and removing partially redundant communication. They only handle programs with singly nested loops and unidimensional arrays, and with very simple subscripts. This paper presents a framework, based on global ....
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication in distributed-memory systems. In Proc. 1993 International Conference on Parallel Processing, St. Charles, IL, August 1993.
....loop nests). Von Hanxleden et al. [31, 30] have developed a data flow framework for generating communication in the presence of indirection arrays. Their work focuses on irregular subscripts, and therefore does not attempt to obtain more precise information about array sections. Gong et al. [11] describe a data flow procedure that unifies optimizations like vectorizing communication and removing partially redundant communication. They only handle programs with singly nested loops and unidimensional arrays, and with very simple subscripts. Suppression of partially redundant code is a ....
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication in distributed-memory systems. In Proc. 1993 International Conference on Parallel Processing, St. Charles, IL, August 1993.
....that global communication optimizations can greatly reduce communication costs [5, 14]. Two different approaches, one based on data dependence analysis [14] and the other using data flow analysis [11, 8], have been proposed. While the data dependence approach is more efficient in terms of its analysis cost, the data flow analysis technique has the advantage of better precision. However, the dataflow frameworks [11, 8] typically propagate information represented in some form of array section .... [Footnote: This research is supported in part by NSF award CCR9157371 and by AFOSR award F49620 93 1 0023DEF.]
....flow analysis [11, 8] have been proposed. While the data dependence approach is more efficient in terms of its analysis cost, the data flow analysis technique has the advantage of better precision. However, the dataflow frameworks [11, 8] typically propagate information represented in some form of array section descriptor. Due to the complexity of the array section descriptors, the propagation of data flow information can be expensive both in time and space. Furthermore, in traditional data flow approaches, obtaining data flow ....
C. Gong, R. Gupta and R. Melhem "Compilation Techniques for Optimizing Communication on Distributed-Memory Systems" In International Conference on Parallel Processing, Vol II, pages 39-46, August 1993.
....so that communication can be maximally overlapped with computation. By separating the send and receive points, blocking at the receives is minimized. Frameworks based upon global array data flow analysis that allow formulation of the above communication optimization problems have been proposed [33, 10, 62]. More sophisticated frameworks that handle complex situations have also been proposed. The Give-N-Take framework can generate communication in the presence of indirection arrays [76, 75]. This framework has also been extended to exploit information about array sections [99]. Agrawal's [3] ....
C. Gong, R. Gupta, and R. Melhem, "Compilation Techniques for Optimizing Communication in Distributed-memory Systems," Proc. 1993 International Conference on Parallel Processing, St. Charles, IL, August 1993.
....then discuss several different fault detection strategies. In Section 4, some experimental results are presented, and our conclusions are given in Section 5. 2 The SPMD Execution Model The SPMD execution model provides a powerful approach for programming distributed memory systems [5], [10], [11], [7], [9]. In this paper, we assume that the user writes a sequential program and uses directives to specify the distribution of data. The compiler then translates the program to execute in SPMD fashion according to the owner-computes rule. A processor examines the statements in the program ....
C. Gong, R. Gupta, and R. Melhem, "Compilation Techniques for Optimizing Communication in Distributed-Memory Systems," Proc. International Conference on Parallel Processing, 1993.
....optimizations such as message vectorization and redundant communication elimination. Program analysis must be performed to obtain information required for the optimizations. Two different analysis approaches, one based on data dependence analysis [10] and the other using array data flow analysis [7,6], have been proposed. While the data dependence approach is more efficient in terms of the analysis cost, the array data flow analysis approach has the advantage of better precision. Array data flow analysis propagates some form of array section descriptor [6,7]. Due to the complexity of the ....
....other using array data flow analysis [7,6] have been proposed. While the data dependence approach is more efficient in terms of the analysis cost, the array data flow analysis approach has the advantage of better precision. Array data flow analysis propagates some form of array section descriptor [6,7]. Due to the complexity of the array section descriptor, the propagation of data flow information can be expensive both in time and space. Furthermore, in traditional data flow analysis .... [Footnote: This research is supported in part by NSF award CCR 9157371 and by AFOSR award F4962093 1 0023DEF.]
C. Gong, R. Gupta and R. Melhem, Compilation techniques for optimizing communication on distributed-memory systems, in Proc. International Conf. on Parallel Processing, St. Charles, IL, 1993, Vol. II, 39--46.
No context found.