130 citations found.
J. Li and M. Chen. Compiling Communication-Efficient Programs for Massively Parallel Machines. Journal of Parallel and Distributed Computing, 2(3):361--376, 1991.

This paper is cited in the following contexts:

Massively Parallel Processing Using Optical Interconnections - Salisbury, Melhem

....to the time needed to transmit a message in a high bandwidth optical network. A more flexible implementation of TDM can be used to balance these two types of delay. In some cases, a compiler can use sophisticated algorithms to identify communication patterns similar to the ones described earlier [33] and multiplex these patterns together [47]. This use of compiled communication reduces establishment delays by providing the network with TDM to change connections rapidly. It also reduces access delay by establishing a sequence of connections tailored to the application's immediate needs. TDM can ....

J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, 2(3):361--375, 1991.


Communication Optimizations Used in the Paradigm.. - Palermo, Su, Chandy.. (1994)   (21 citations)

....a wide variety of machines. Summary When fully implemented, PARADIGM will be capable of performing all of the following tasks automatically: generation of data partitioning directives [9, 10], partitioning of computation and generation of communication [11], synthesis of high level communication [16], exploitation of functional parallelism [17], support of a multithreaded execution model [18], and support of irregular computations [19]. 3. COMMUNICATION OPTIMIZATIONS The first three communication optimizations examined in this paper, message coalescing, message vectorization, and message ....

J. Li and M. Chen, "Compiling Communication-Efficient Programs for Massively Parallel Machines," IEEE Transactions on Parallel and Distributed Systems, vol. 2, pp. 361--376, July 1991.


High-Performance All-Software Distributed Shared Memory - Johnson (1995)   (9 citations)

....this proposal uses the terms static software DSM and dynamic software DSM to refer to members of the first and second class, respectively. Static Approaches Static software DSM systems are typified by compilers for FORTRAN style scientific codes targeting message passing multicomputers [9, 41, 51, 64, 79, 87]. These are typically data parallel systems with a single thread of control; parallelism can only be expressed in the form of a large number of similar (possibly identical) operations applied in parallel to elements of large, dense arrays according to some user specified iteration space (parallel ....

J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, pages 361--376, July 1991.


A Linear Algebra Framework for Static HPF Code.. - Ancourt, Coelho.. (1995)   (63 citations)

.... Techniques to generate distributed code from sequential or parallel code using a uniform memory space have been extensively studied since 1988 [22, 70, 89]. Techniques and prototypes have been developed based on Fortran [38, 39, 47, 18, 69, 88, 19, 20], C [8, 63, 6, 60, 7, 61], or other languages [74, 75, 58, 66, 57]. The most obvious, most general and safest technique is called run time resolution [22, 70, 74]. Each instruction is guarded by a condition which is only true for processors that must execute it. Each memory address is checked before it is referenced to decide whether the address is local and ....

J. Li and Marina Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, 2(3):361--376, July 1991.


Compilation Techniques for Optimizing Communication on.. - Gong, Gupta, Melhem (1993)   (15 citations)

....sequentialization caused by communication; 2. Avoiding redundant communication; 3. Overlapping of communication and computation; and 4. Combining small messages sent to the same destination into larger messages. Significant work has been done in optimizing the communication [1], [6], [4], [11]. In this paper, we propose to perform all of the above optimizations in a unifying framework. This framework allows us to deal with the tradeoff between conflicting optimizations. We develop a data flow framework for collecting the information needed to perform the above optimizations. We focus ....

....in the program. This does not allow effective overlapping of communication and computation. Gallivan and Jalby [3] were one of the first to formally treat the problem of optimizing data transfers in distributed-memory systems by avoiding redundant communication and avoiding sequentialization. In [11], the authors defined some array reference patterns and use pre implemented routines to match those patterns and achieve communication optimization. However, only restricted classes of communication patterns can be optimized in this way. In [9] the authors proposed optimizations similar to the ....

J. K. Li, and M. Chen, "Compiling Communication-Efficient Programs for Massively Parallel Machines," IEEE trans. on Parallel and Distributed Sys., Vol. 2, July, 1991.


Detecting and Using Affinity in an Automatic Data.. - Ayguadé, Garcia..

....the memories of the processors. This decision is done so that communication among processors or the amount of remote accesses is reduced as much as possible, and the communication patterns become as uniform as possible. The data distribution algorithm is based on the ideas proposed in [7] and [8]. The tool is named DDT (Data Distribution Tool) and its implementation has been done on top of ParaScope [9]. The output of DDT is an HPF program in which the original program is annotated with a set of directives that specify how arrays are aligned to a set of virtual target arrays and how the ....

....For example, in MG3D, OCEAN and FPPPP reference patterns where an IV is used at one side and a different IV 0 or LCV is used at the other side of the reference pattern are the most frequent. On the average, this technique is useful in 12 of the affinity relations found. As proposed in [8], reference patterns derived directly from the input program can be optimized before building the DAG with the objective of achieving a more realistic characterization of the code in terms of data movement requirements. One of the most common optimization is to eliminate, when possible, identical ....

[Article contains additional citation context not shown here]

J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Trans. on Parallel and Distributed Systems, 2(3), July 1991.


A Framework for Integrating Data Alignment.. - Garcia..

....Other researchers propose the use of algorithms based on dynamic programming [8]; however, in [20] they find an optimal solution to their alignment problem by using 0-1 integer programming techniques. In order to solve the distribution problem, an exhaustive search is usually performed. In [25] the authors describe a model that exhaustively explores all distribution options, based on pattern matching between the reference pattern of an assignment statement and a predefined set of communication primitives. In [15] they use a constraint based approach assuming a default distribution. ....

....to distinguish the considered search space of possible data mappings. With a communication/no-communication [24] or a cheap/expensive [1] cost model it is quite simple to obtain reliable solutions in complex programs. Another option is to estimate performance through symbolic analysis of the code [25, 15, 3]; however, array data sizes have to be known at compile time, as well as the number of loop iterations and the probabilities of conditional statements. This information has to be provided by the user or obtained through profiling. In contrast, training sets [6] obtain good performance estimations ....

[Article contains additional citation context not shown here]

J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, Vol 2(3), July 1991.


DDT: A Research Tool for Automatic Data Distribution in .. - Ayguadé, Garcia..

....of processors assigned to each of them. A good distribution maximizes the potential parallelism of the code, and offers the possibility of further reducing data movement by serializing. This goal could be trivially satisfied by assigning a datum to each processor, which maximizes parallelism. [LC91] match the aligned reference patterns with a predefined set of data movement routines. Each routine has an architecture dependent cost parametrized in terms of number of processors involved in the data motion and amount of data being moved. The cost function for all the patterns is minimized by ....

....that may be too computationally expensive to be included in a final compiler; however, this allows us to explore a rich set of solutions. The static module is based on the CAG but extended with some information regarding parallelism. We have also modified the original algorithms in [LC90, LC91] to improve the quality of the mappings generated [AGG 94]. The current version of the static module generates both inter and intra dimensional alignments and BLOCK and CYCLIC distributions. The dynamic module explores a rich set of combinations; it is not exhaustive thanks to mechanisms ....

[Article contains additional citation context not shown here]

J. Li and M. Chen. Compiling Communication-efficient Programs for Massively Parallel Machines. IEEE Trans. on Parallel and Distributed Systems, 2(3), July 1991.


A Matrix-Based Approach to Global Locality Optimization - Kandemir, Choudhary.. (1999)   (16 citations)

....2000 also confirm this argument. Now an interesting question is that whether the programs that exhibit good processor locality need cache optimization techniques. After all, there are a number of powerful automatic data distribution techniques published in the literature (see for example [40, 5, 14, 21, 33, 47, 52] and the references therein) and for example, the SGI Origin gives the programmer fine grain control over data distribution, that can be optimized using any of the techniques mentioned. Our answer to this question, however, is no; that is, just ensuring good processor locality does not imply good ....

J. Li and M. Chen. Compiling communication efficient programs for massively parallel machines. Journal of Parallel and Distributed Computing, 2(3):361--376, 1991.


SPMD Execution of Programs with Pointer-based Dynamic Data.. - Gupta

.... applications using arrays includes the Fortran D system for distributing arrays [13, 1], techniques for identifying parallel loop iterations whose execution does not require communication [14], and identifying communication patterns for efficiently implementing interprocessor communication [21]. Work has also been done on the compilation of systolic applications for fine grained distributed memory systems [25, 7] and compilation techniques for reducing communication on SIMD and VLIW machines [17, 11]. In the next section we present language constructs which enable the user to ....

Li, J., and Chen, M. Compiling communication-efficient programs for massively parallel machines. IEEE Trans. Parallel Distrib. Systems. 2, 3 (July 1991), 361-376.


Data Redistribution in an Automatic Data Distribution.. - Ayguade, Garcia.. (1995)   (2 citations)

....be done according to the access patterns within computational intensive phases and parallelism exploitation out of them. There has been a significant amount of work concerning static mappings, where the mapping of each array remains fixed along the execution of the whole program ([LC90], [KLS90], [LC91], [Gup92], [Who92], [CGSS94b], [AGG 94]). Our work focuses on dynamic mappings in which the mapping of an array may change over its lifetime. Data remapping is one of the topics in this area subject of current research ([CP93], [BKK94], [CGSS94a], [PB95]). The main objective of this work has ....

J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Trans. on Parallel and Distributed Systems, 2(3), July 1991.


A Novel Approach Towards Automatic Data Distribution - Garcia, Ayguadé.. (1995)   (29 citations)

....patterns are matched with a set of predefined data movement routines. The routines that have been used are listed in Table 1, together with the reference patterns that match them. All the information in this table is machine dependent and should be tailored to the specific target machine [LiCh91]. In the example of Figure 1, when analyzing the first assignment statement, the following edges between arrays B and C are inserted: The communication primitive that is assigned as weight in the edge that connects the first dimension of array C to the second dimension of the array B (C[1] B[2] ....

J. Li and M. Chen, "Compiling Communication-efficient Programs for Massively Parallel Machines", IEEE Trans. on Parallel and Distributed Systems, vol. 2, no. 3, July 1991.


Compilation and Communication Strategies for Out-of-core.. - Bordawekar, Choudhary (1996)   (3 citations)

....memory). Hence, the overall time for an out of core program depends on the communication pattern, available memory and I/O access pattern. 3.3.1 Heuristic used Here we present a simple heuristic that can be used to decide which communication method to use. Fox and Li and Chen [GMG 88, LC91] showed that a compiler can take advantage of the highly regular communication patterns displayed by many computations and can match the pattern to collective communication routines such as shifts, broadcasts, all to all communication, transposes etc. We use the following technique to recognize ....

....be identified from a program's syntax [Tri77]. Another important area of investigation was use of iteration tiling for exploiting hierarchical memory [IT88, SD90, RS92]. Several researchers have analyzed communication patterns in parallel programs. Important work in this area includes [GMG 88, LC91, Gup92, HKT92]. There has been a lot of interest in developing runtime libraries for improving I/O performance of I/O intensive (not just out of core) parallel applications. Chameleon was the first runtime system which provided extensive support for parallel I/O [GGL93]. del Rosario et al. ....

Jingke Li and Marina Chen. Compiling Communication-Efficient Programs for Massively Parallel Machines. IEEE Transactions on Parallel and Distributed Systems, 2(3):361--376, July 1991.


Compiler Support for Machine-Independent Parallelization of.. - von Hanxleden (1994)   (7 citations)

....parallel languages such as Linda [CG89], Strand [FT90, FO90], and Delirium [LS91] have been defined. Several compilation systems for exploiting fine grained parallelism have been and are being built, which include Al [Tse90], Aspar [IFKF90], C Dataparallel C [HQL 91, RS87], Crystal [LC91], Dino [RSW91], Id Nouveau [RP89], Mimdizer [SWW92], Oxygen [RA90], P 3 C [GAY91], Pandore [APT90], Parafrase 2 [GB92], Paragon [CCRS91], Spot [SS90, Soc90], Superb [ZBG88], and Vienna Fortran [BCZ92]. While there is still much work to be done in this field in general, there has already been ....

J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, 2(3):361--376, July 1991.


Structured Parallel Programming: Theory meets Practice - Darlington, Guo, To (1995)   (2 citations)

....lead to considerable improvements in the cost of communication especially when communication can be completely removed. The algebraic axiomatisation of communication optimisation has been intensively studied in the context of developing an optimal compiler for conventional data parallel languages [9]. The commonly used approach is based on the analysis of index relations between two sides of an assignment. Since, using SCL, communication are explicitly specified in terms of a set of well defined communication operators, the index based analysis can be systematically replaced by transformation ....

Jingke Li and Marina Chen. Compiling communication efficient programs for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, 2(3):361--375, July 1991.


Modeling Communication Locality in Multiprocessors - Salisbury, Chen, Melhem (1999)   (2 citations)  Self-citation (Chen)

....of the network architecture. We will quantify some intuitive notions of locality, and see how broadly the results can be applied. In some cases, techniques called compiled communication can be performed on parallel programs to identify static communication patterns prior to program execution [3, 12, 14, 22]. For this reason, we will consider the locality of a predetermined sequence of communication requests that will be presented to an interconnection network. In particular, we define a quantitative measure that satisfies some intuitive properties of locality and can be applied to parallel ....

J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, 2(3):361--375, 1991.


Program Analysis and Transformations for Fast Data Sharing - Li, Hermannsson, Wittie   Self-citation (Li)

....more successful projects follow. Fortran D at Rice provides a data parallel programming environment for both message passing and SIMD computers [9]. Kali, at Purdue and ICASE, is a Pascal like programming environment which supports shared data structures for message passing systems [11]. Crystal [10] at Yale and Id Nouveau [16] at Cornell are functional languages. Crystal uses syntactic reference patterns of assignments and an initial data layout to select appropriate communication routines and data decompositions. In Id Nouveau, array values can be assigned only once. Id Nouveau generates ....

J. Li and M. Chen. Compiling Communication-Efficient Programs for Massively Parallel Machines. IEEE Trans. on Parallel and Distributed Systems, 2(3):361--376, July 1991.


Access Normalization: Loop Restructuring for NUMA Compilers - Li, Pingali (1992)   (34 citations)  Self-citation (Li)

.... is a contribution to the state of the art of compiling programs in languages like FORTRAN D that permit user defined data decomposition for parallel machines with a memory hierarchy, which is the goal of a number of projects including Parascope, Superb, Id Nouveau, Crystal, and other projects [7, 12, 14, 19, 23, 26, 31, 34, 39]. The emphasis in these projects has been on code generation mechanisms (such as the ownership rule discussed in Section 2) and on recognizing and exploiting special patterns of computation and communication such as reductions. Although it is well-known that loop restructuring before code ....

J. Li and M. Chen. Compiling communication efficient program for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, 2:361--376, July 1991.


Automatic Partitioning of Data and Computations - Sudarsan Tandri Ibm

No context found.

J. Li and M. Chen. Compiling Communication-Efficient Programs for Massively Parallel Machines. Journal of Parallel and Distributed Computing, 2(3):361--376, 1991.


Parallel Sparse Supports for Array Intrinsic Functions of.. - Chang, Chuang, Al. (2001)   (1 citation)

No context found.

J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines, IEEE Transactions on Parallel and Distributed Systems, 2:361--376, 1991.


Detecting Affinity For Automatic Data Distribution - Ayguadé, Labarta, Garcia.. (1994)   (1 citation)

No context found.

J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Trans. on Parallel and Distributed Systems, 2(3), July 1991.


Using A 0-1 Integer Programming Model For Automatic.. - Garcia.. (1996)

No context found.

J. Li and M. Chen, "Compiling Communication-efficient Programs for Massively Parallel Machines", IEEE Trans. on Parallel and Distributed Systems, vol. 2, no. 3, July 1991.

CiteSeer.IST - Copyright Penn State and NEC