115 citations found.
R. Das, M. Uysal, J. Saltz, and Y. Hwang, "Communication optimizations for irregular scientific computations on distributed memory architectures", Journal of Parallel and Distributed Computing, 1993.

This paper is cited in the following contexts:

Platforms for HPJava: Runtime Support for Scalable Programming in.. - Lim (2003)

....data structures and regular data access patterns problems, it is not particularly suitable for irregular data access. Some libraries deal directly with irregularly distributed data (DAGH [49], Kelp [27]), and other libraries support unstructured access to distributed arrays (CHAOS/PARTI [21] and Global Arrays [44]). While the library-based SPMD approach to data-parallel programming may address weaknesses of HPF, it loses good features such as the uniformity and elegance promised by HPF. There are no compile-time or compiler-generated run-time safety checks for the distributed arrays ....

....may be executed many times, repeating the same communication pattern. In this way, especially for iterative programs, the cost of computations and negotiations involved in constructing a schedule can often be amortized over many executions. This pattern was pioneered in the CHAOS/PARTI libraries [21]. If a communication pattern is to be executed only once, simple wrapper functions are made available to construct a schedule, execute it, then destroy it. The overhead of creating the schedule is essentially unavoidable, because even in the single-use case individual data movements generally ....
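The schedule reuse described in this excerpt can be illustrated with a minimal single-process Python sketch. It assumes a block distribution of one global array over simulated processors; the names `owner_of`, `build_schedule`, and `execute_gather` are hypothetical and are not the CHAOS/PARTI API. The inspector runs once, and the executor is rerun every iteration while the access pattern stays fixed.

```python
from collections import defaultdict


def owner_of(g, block):
    """Assumed block distribution: global index g lives on processor
    g // block at local offset g % block."""
    return g // block, g % block


def build_schedule(needed, block):
    """Inspector (hypothetical helper): for each requesting processor, group
    the distinct off-processor global indices it reads by owning processor.
    Returns {(requester, owner): [(global_index, owner_offset), ...]}."""
    sched = defaultdict(list)
    for p, indices in needed.items():
        for g in sorted(set(indices)):      # each remote element is fetched once
            q, off = owner_of(g, block)
            if q != p:                      # local data needs no communication
                sched[(p, q)].append((g, off))
    return dict(sched)


def execute_gather(sched, local_data):
    """Executor (hypothetical helper): move data as the prebuilt schedule
    says. Returns ghost buffers {requester: {global_index: value}}."""
    ghosts = defaultdict(dict)
    for (p, q), entries in sched.items():
        for g, off in entries:
            ghosts[p][g] = local_data[q][off]  # stands in for a message q -> p
    return dict(ghosts)


if __name__ == "__main__":
    block = 4
    needed = {0: [1, 5, 5, 6], 1: [4, 2, 7]}   # fixed irregular access pattern
    sched = build_schedule(needed, block)      # built once ...
    local_data = {0: [0, 10, 20, 30], 1: [40, 50, 60, 70]}
    for step in range(3):                      # ... reused every iteration
        ghosts = execute_gather(sched, local_data)
        print(step, ghosts)
        local_data = {p: [v + 1 for v in vals] for p, vals in local_data.items()}
```

A one-shot wrapper of the kind the excerpt mentions would simply call `build_schedule` and `execute_gather` back to back and discard the schedule, paying the inspector cost without amortizing it.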

R. Das, M. Uysal, J. H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Interfacing Global Arrays and ARMCI with the PCRC library Adlib - Carpenter, Nieplocha (1999)

....be executed many times, repeating the same communication pattern. In this way, especially for iterative programs, the cost of computations and negotiations involved in constructing a schedule can often be amortized over many executions. This paradigm was pioneered in the CHAOS/PARTI libraries [5]. If a communication pattern is to be executed only once, simple wrapper functions can be made available to construct a schedule, execute it, then destroy it. The overhead of creating the schedule is essentially unavoidable, because even in the single-use case individual data movements generally ....

R. Das, M. Uysal, J. H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Compile-time Composition of Run-time Data and Iteration.. - Strout, Carter, Ferrante (2003)   (4 citations)

....generating the run-time reordering functions can be reduced with various optimizations, such as moving the data to new locations only once and traversing fewer dependences. Run-time reordering transformations are implemented with inspectors and executors, originally developed for parallelization [6]. In this setting, the inspector traverses the index arrays that describe the data mappings and/or dependences. Based on the values in these arrays, it generates data and/or iteration reordering functions. The executor is a transformed version of the original loop that uses the reordered data ....
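As a concrete, hypothetical instance of the inspector/executor split described here, the Python sketch below uses a simple first-touch data reordering. The excerpt does not say which reordering functions the authors generate, so this heuristic and the names `first_touch_order`, `apply_reordering`, and `run_loop` are assumptions chosen for brevity. The inspector traverses the index array and produces a reordering function; the data are moved once, the index array is rewritten, and the executor runs the same loop body over the reordered data.

```python
def first_touch_order(index_array, n):
    """Inspector: build a data reordering sigma (old position -> new position)
    by visiting the index array in iteration order and numbering each element
    the first time it is touched; untouched elements are placed last."""
    sigma = [-1] * n
    nxt = 0
    for j in index_array:
        if sigma[j] < 0:
            sigma[j] = nxt
            nxt += 1
    for j in range(n):
        if sigma[j] < 0:
            sigma[j] = nxt
            nxt += 1
    return sigma


def apply_reordering(data, index_array, sigma):
    """Move every data element to its new location exactly once, and rewrite
    the index array so it refers to the new locations."""
    new_data = [None] * len(data)
    for old, new in enumerate(sigma):
        new_data[new] = data[old]
    new_index = [sigma[j] for j in index_array]
    return new_data, new_index


def run_loop(data, index_array):
    """Executor: the original loop body (here, a sum through the index array)."""
    return sum(data[j] for j in index_array)


if __name__ == "__main__":
    data, ia = [10, 11, 12, 13, 14], [3, 0, 3, 4]
    sigma = first_touch_order(ia, len(data))
    new_data, new_ia = apply_reordering(data, ia, sigma)
    print(sigma, new_data, new_ia)
    print(run_loop(data, ia), run_loop(new_data, new_ia))  # same result
```

After reordering, the elements the loop touches first sit at the front of the array, which is the kind of locality gain such transformations target.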

....complete automation of run-time reordering transformations. Initially, such transformations were incorporated into applications manually for parallelism [5]. Next, libraries with run-time transformation primitives were developed so that a programmer or compiler could insert calls to such primitives [6]. Currently, there are many run-time reordering transformations for which a compiler can automatically analyze and generate the inspectors [25, 7, 21, 12]. However, each transformation or composition of transformations is treated separately. Our framework provides a uniform representation for ....

R. Das, M. Uysal, J. Saltz, and Yuan-Shin S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, 1994.


Compiler Support for Scalable and Efficient Memory Systems - Barua, Lee, Amarasinghe.. (2001)   (2 citations)

....A[i 2j 3] 2j] is an affine access, but A[ij 4] and A[2i 1] are not. Low-order interleaving is the distribution of array elements in a round-robin manner across the memory banks. That is, for a low-order interleaved array A[], element A[i] is allocated on bank i mod N. ....
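The bank mapping just defined (element A[i] on bank i mod N) can be checked with a few lines of Python. The helper names `interleave` and `locate` are hypothetical, and placing A[i] at slot i div N within its bank is an assumption about how each bank stores its elements.

```python
def interleave(array, n_banks):
    """Low-order interleaving: element A[i] goes to bank i mod N.
    Elements are appended in increasing i, so A[i] lands in slot i // N."""
    banks = [[] for _ in range(n_banks)]
    for i, value in enumerate(array):
        banks[i % n_banks].append(value)
    return banks


def locate(i, n_banks):
    """Return (bank, slot) of element A[i] under low-order interleaving."""
    return i % n_banks, i // n_banks


if __name__ == "__main__":
    banks = interleave(list(range(12)), 4)
    print(banks)  # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
```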

....for a low-order interleaved array A[], element A[i] is allocated on bank i mod N. [Figure 6: Example of modulo unrolling. a) The original loop, for i = 0 to 99 do A[i] endfor. b) Array A interleaved across four banks: A[0], A[4], A[8], ... on bank 0; A[1], A[5], A[9], ... on bank 1; and so on. c) The loop after unrolling, for i = 0 to 99 step 4 do A[i+0] A[i+1] A[i+2] A[i+3] endfor.] ....
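The point of modulo unrolling can be seen by counting which banks each static array reference touches. In the original loop the single reference A[i] visits every bank, so its bank cannot be fixed at compile time; after unrolling by N, the reference A[i+k] always hits bank k. This Python sketch assumes a lower bound of 0 and a trip count divisible by N (a real compiler would also need a remainder loop), and its function names are hypothetical.

```python
def banks_touched_original(trip_count, n_banks):
    """Original loop 'for i = 0 to trip_count-1 do A[i]' has one static
    reference; collect every bank it touches across all iterations."""
    return {0: {i % n_banks for i in range(trip_count)}}


def banks_touched_unrolled(trip_count, n_banks):
    """After modulo unrolling by n_banks, the body holds n_banks static
    references A[i+0] .. A[i+n_banks-1]; collect the banks each one touches."""
    assert trip_count % n_banks == 0, "sketch assumes the trip count divides evenly"
    refs = {k: set() for k in range(n_banks)}
    for i in range(0, trip_count, n_banks):
        for k in range(n_banks):
            refs[k].add((i + k) % n_banks)
    return refs


if __name__ == "__main__":
    print(banks_touched_original(100, 4))  # one reference, all four banks
    print(banks_touched_unrolled(100, 4))  # each reference pinned to one bank
```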

[Article contains additional citation context not shown here]

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures. Journal of Parallel and Distributed Computing, 22(3), September 1994.


Object Based Concurrency for Data Parallel Applications.. - Diaconescu (2002)

....operator). The existing data-parallel frameworks aimed at parallelizing the iterations of a loop encounter problems for applications such as the one in Figure 8.1 because of statements like 3 and 6. The references jj and kk can only be found at run time. The existing run-time frameworks solve this problem [38] in the context of fine-grain parallelism, based on linear array representations and affine index expressions. Thus, they do not address the problem in statements 3 and 6. The work in [69] is an example of a unified treatment of regular, dense array computations with simple access patterns and ....

....cannot result in a completely connected graph. Additionally, it is not realistic to partition a tightly connected problem of size N onto N processors. Other data-parallel frameworks that address irregular applications use distributed schedules to keep information on irregularly distributed data [38]. In [38], the map of the global array data is block-distributed among processors and the location of an array element is computed symbolically. Accesses through an indirection array cannot be resolved until run time, and thus a regular partitioning cannot account for data locality. To improve ....
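A minimal Python sketch of the lookup scheme this excerpt describes follows. It assumes the map is a table of (owner, local offset) pairs, one per global element, split into equal contiguous blocks across processors: finding the block that holds element g's entry needs only arithmetic, while the entry itself records the irregular placement. All names are hypothetical, and in a real system reading a remote block of the table would itself require communication.

```python
def local_offsets(owners):
    """Given owners[g] for every global element g, number each processor's
    elements in global order to obtain (owner, local_offset) pairs."""
    counts = {}
    entries = []
    for p in owners:
        entries.append((p, counts.get(p, 0)))
        counts[p] = counts.get(p, 0) + 1
    return entries


def block_distribute_table(entries, n_procs):
    """Split the map into contiguous blocks so no processor stores all of it."""
    block = -(-len(entries) // n_procs)       # ceiling division
    pieces = [entries[p * block:(p + 1) * block] for p in range(n_procs)]
    return pieces, block


def dereference(g, pieces, block):
    """Locate element g: the table block holding g's entry is found
    arithmetically (g // block); the entry gives g's actual owner and offset."""
    table_proc, slot = divmod(g, block)
    return pieces[table_proc][slot]


if __name__ == "__main__":
    owners = [1, 0, 1, 0, 0, 1]               # irregular placement of 6 elements
    pieces, block = block_distribute_table(local_offsets(owners), 2)
    ia = [4, 0, 5]                            # indirection array, known only at run time
    print([dereference(g, pieces, block) for g in ia])
```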

[Article contains additional citation context not shown here]

R. Das, M. Uysal, J. Saltz, and Y. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Evaluating the Impact of Memory System Performance on Software.. - Badawy et al. (2001)   (5 citations)

....bring accesses to the same data closer together in time. Data layout can also be transformed so that data accesses are more likely to be to the same cache line. These compiler and run-time transformations can be automated using an inspector-executor approach developed for message-passing machines [12]. In this paper, we apply efficient partitioning algorithms to the input data to bring reuse closer together, then follow up by lexicographically sorting loop iterations based on data access patterns, using algorithms specified elsewhere [17, 18]. Our partitioning algorithm works by viewing data ....
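One plausible reading of "lexicographically sorting loop iterations based on data access patterns" is sketched below in Python. The cited algorithms [17, 18] are not reproduced here; this version simply assumes each iteration touches a small tuple of data indices (for example, the two endpoints of a mesh edge) and orders iterations by those indices, smallest first, so that iterations reusing the same data run close together. The function name is hypothetical.

```python
def lexsort_iterations(accesses):
    """Return a new iteration order: iterations sorted lexicographically by
    the data indices they touch (each iteration's indices sorted ascending)."""
    return sorted(range(len(accesses)), key=lambda i: tuple(sorted(accesses[i])))


if __name__ == "__main__":
    edges = [(5, 2), (0, 1), (3, 0), (1, 4)]   # iteration i reads x[a] and x[b]
    order = lexsort_iterations(edges)
    print(order)                               # [1, 2, 3, 0]
    print([edges[i] for i in order])           # [(0, 1), (3, 0), (1, 4), (5, 2)]
```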

....for linear algebra codes [23, 45, 11] and multiple loop nests across time-step loops [43]. In comparison, we apply tiling to 3D stencil codes, which cannot be tiled with existing methods. Researchers have examined irregular computations mostly in the context of parallel computing, using run-time [12] and compiler [24] support for accesses on message-passing multiprocessors. A few have also looked at techniques for improving locality [1, 13]. Few researchers have investigated data layout transformations for pointer-based data structures. Chilimbi et al. investigated allocation time and ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


A Scientific Data Management System for Irregular.. - No, Thakur, Kaushik..

....of Energy, under Contract W-31-109-Eng-38, and in part by a Work for Others Subaward No. 751 with the University of Illinois, under NSF Cooperative Agreement #ACI-9619019. ....ularly discretized mesh. The data accesses in those applications make extensive use of arrays, called indirection arrays [7, 24] or map arrays [10], in which each value of the array denotes the corresponding data position in memory or in the file. The data distribution in irregular applications can be done either by using compiler directives with the support of runtime preprocessing [11, 12] or by using a runtime library ....

....or map array [10], in which each value of the array denotes the corresponding data position in memory or in the file. The data distribution in irregular applications can be done either by using compiler directives with the support of runtime preprocessing [11, 12] or by using a runtime library [7, 24]. Most of the previous work in the area of unstructured-grid applications focuses mainly on computation and communication in such applications, not on I/O. We have developed a software system for large-scale scientific data management, called Scientific Data Manager (SDM) [23], that combines the ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Maps: A Compiler-Managed Memory System for Raw Machines - Barua, Lee, Amarasinghe.. (1998)   (17 citations)

....split-phase code generation and dependence inheritance; e) one possible outcome after partitioning. Figure 6: Example of modulo unrolling. a) shows the original code, for i = 0 to 99 do A[i] endfor; b) shows the distribution of array A on a 4-processor Raw machine, with A[0], A[4], A[8], ... on Tile 0, A[1], A[5], A[9], ... on Tile 1, and so on through Tile 3; c) shows the code after unrolling, for i = 0 to 99 step 4 do A[i+0] A[i+1] A[i+2] A[i+3] endfor. After ....

....researchers have parallelized some of the benchmarks in this paper. Automatic parallelization has been demonstrated to work well for dense-matrix scientific codes [8]. In addition, some irregular scientific applications can be parallelized on multiprocessors using the inspector-executor method [5]. Typically these techniques involve user-inserted calls to a runtime library such as CHAOS [12] and are not automatic. The programmer is responsible for recognizing cases amenable to such parallelization, namely those where the same communication pattern is repeated for the entire duration of ....

[Article contains additional citation context not shown here]

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures. Journal of Parallel and Distributed Computing, 22(3), September 1994.


Evaluating the Impact of Memory System Performance on.. - Badawy, Aggarwal.. (2001)   (5 citations)

....bring accesses to the same data closer together in time. Data layout can also be transformed so that data accesses are more likely to be to the same cache line. These compiler and run-time transformations can be automated using an inspector-executor approach developed for message-passing machines [12]. In this paper, we apply efficient partitioning algorithms to the input data to bring reuse closer together, then follow up by lexicographically sorting loop iterations based on data access patterns, using algorithms specified elsewhere [17, 18]. Our partitioning algorithm works by viewing data ....

....for linear algebra codes [23, 45, 11] and multiple loop nests across time-step loops [43]. In comparison, we apply tiling to 3D stencil codes, which cannot be tiled with existing methods. Researchers have examined irregular computations mostly in the context of parallel computing, using run-time [12] and compiler [24] support for accesses on message-passing multiprocessors. A few have also looked at techniques for improving locality [1, 13]. Few researchers have investigated data layout transformations for pointer-based data structures. Chilimbi, Hill, and Larus investigate ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


HPspmd: Data Parallel SPMD Programming Models from Fortran to.. - Carpenter, Fox (1998)

.... NWChem [3, 27]. While there remains a prejudice that HPF is best suited for problems with very regular data structures and regular data access patterns, SPMD frameworks like DAGH and Kelp have been designed to deal directly with irregularly distributed data, and other libraries like CHAOS/PARTI [35, 16] and Global Arrays support unstructured access to distributed arrays. These successes aside, the library-based SPMD approach to data-parallel programming certainly lacks the uniformity and elegance of HPF. All the environments referred to above have some idea of a distributed array, but they all ....

....applications, but of course they are far from exclusive. Many important problems involve data structures too irregular to express purely through HPF-style distributed arrays. Our third class of libraries therefore includes libraries designed to support irregular problems. These include CHAOS [35, 16] and DAGH [34]. We anticipate that irregular problems will still benefit from regular data-parallel language extensions (because, at some level, they usually resort to representations involving regular arrays). But lower-level SPMD programming, facilitated by specialized class libraries, is likely ....

R. Das, M. Uysal, J. H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462-479, September 1994.


Semantic Checking in HPJava - Carpenter, Fox, Zhang (2000)

.... Toolkit [11]. While there remains a prejudice that HPF is best suited for problems with very regular data structures and regular data access patterns, SPMD frameworks like DAGH and Kelp have been designed to deal directly with irregularly distributed data, and other libraries like CHAOS/PARTI [4] and Global Arrays support unstructured access to distributed arrays. These successes aside, the library-based SPMD approach to data-parallel programming lacks the uniformity and elegance of HPF. All the environments referred to above have some idea of a distributed array, but they all describe ....

....and v are N 2. The distribution group of b can be identified with the distribution group of the parent array a. But sections constructed using a scalar subscript, e.g. .... [Footnote 2: For a sequential dimension the result of rng(r) is a member of the subclass CollapsedRange.] [Figure: distribution of the elements a[0,0] through a[7,7] of an 8 × 8 array a.] ....

[Article contains additional citation context not shown here]

R. Das, M. Uysal, J. H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462-479, September 1994.


Towards a Java Environment for SPMD Programming - Carpenter, Zhang, Fox, Li, Li, ..   (8 citations)

....subscripts cannot be computed randomly in parallel loops without violating the fundamental SPMD restriction that all accesses be local. This is not regarded as a shortcoming: on the contrary, it forces explicit use of an appropriate library package for handling irregular accesses (such as CHAOS [6]). Of course, a suitable binding of such a package is needed in our language. A complementary approach to communication in a distributed array environment is the one-sided communication model of Global Arrays (GA) [9]. For task-parallel problems this approach is often more convenient than the ....

R. Das, M. Uysal, J. H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462-479, September 1994.


An HPspmd Programming Model (Extended Abstract) - Carpenter et al. (2000)

.... and NWChem [3]. While there remains a prejudice that HPF is best suited for problems with very regular data structures and regular data access patterns, SPMD frameworks like DAGH and Kelp have been designed to deal directly with irregularly distributed data, and other libraries like CHAOS/PARTI [8] and Global Arrays support unstructured access to distributed arrays. These successes aside, the library-based SPMD approach to data-parallel programming certainly lacks the uniformity and elegance of HPF. All the environments referred to above have some idea of a distributed array, but they all ....

....applications, but of course they are far from exclusive. Many important problems involve data structures too irregular to represent purely through HPF-style distributed arrays. Our third category of libraries therefore includes libraries designed to support irregular problems. These include CHAOS [8] and DAGH [19]. We anticipate that irregular problems will still benefit from regular data-parallel language extensions; at some level they usually resort to representations involving regular arrays. But lower-level SPMD programming, facilitated by specialized class libraries, is likely to take a ....

R. Das, M. Uysal, J. H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462-479, September 1994.


Language Bindings for a Data-Parallel Runtime - Carpenter, Fox, Leskiw, Li.. (1998)   (3 citations)

....expressivity comparable to full HPF. [Section 2, Background: runtime kernel] The kernel of the NPAC library is a C++ class library. It is most directly descended from the run-time library of an earlier research implementation of HPF [7], with influences from the Fortran 90D run-time and the CHAOS/PARTI libraries [1, 11, 5]. The kernel is currently implemented on top of MPI. The library design is solidly object-oriented, but efficiency is maintained as a primary goal. The overall architecture of the library is illustrated in Figure 1. At the top level there are several compiler-specific interfaces to a common run-time ....

....subscripts cannot be computed randomly in parallel loops without violating the fundamental SPMD restriction that all accesses be local. This is not regarded as a shortcoming: on the contrary, it forces explicit use of an appropriate library package for handling irregular accesses (such as CHAOS [5]). Of course, a suitable binding of such a package is needed in our language. A complementary approach to communication in a distributed array environment is the one-sided communication model of Global Arrays (GA) [8]. For task-parallel problems this approach is often more convenient than the ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462-479, Sept. 1994.


A Comparison of Parallelization Techniques for Irregular.. - Han, Tseng (2001)   (2 citations)

....compilers for distributed-memory multiprocessors. Since these compilers must generate explicit interprocessor communication for nonlocal accesses, the compiler first generates an inspector to preprocess the reduction, creating a list of data to be communicated as well as assigning their locations [4]. The reduction is then transformed into an executor, which gathers nonlocal data, performs the computation using local buffers, and scatters nonlocal results to other processors. Fortunately, this level of precision is not needed for shared-memory multiprocessors. In this section, we review a ....
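The inspector/executor treatment of an irregular reduction described above can be sketched as a single-process Python simulation. Several details are assumptions rather than anything stated in the excerpt: the arrays are block-distributed over processors with equal block sizes, each processor executes the edges assigned to it, and the reduction is y[a] += x[b], y[b] += x[a] over mesh edges (a, b). The names `inspect_edges` and `execute` are hypothetical. The inspector maps each processor's references to local indices and appends one ghost slot per distinct nonlocal element; the executor gathers nonlocal x values, computes into local buffers, and scatter-adds the ghost contributions back to their owners.

```python
def inspect_edges(edges_p, p, block):
    """Inspector for processor p: translate global edge endpoints to local
    indices; each distinct nonlocal element gets a ghost slot after the block.
    Returns (local_edges, ghosts), where ghosts[k] is the global index held
    in local slot block + k."""
    ghost_slot = {}

    def to_local(g):
        if g // block == p:
            return g % block
        if g not in ghost_slot:
            ghost_slot[g] = block + len(ghost_slot)
        return ghost_slot[g]

    local_edges = [(to_local(a), to_local(b)) for a, b in edges_p]
    return local_edges, list(ghost_slot)      # dicts keep insertion (= slot) order


def execute(x_parts, schedule, block):
    """Executor: y[a] += x[b]; y[b] += x[a] for every edge, using the schedule."""
    y_parts = [[0.0] * block for _ in x_parts]
    for p, (local_edges, ghosts) in enumerate(schedule):
        # Gather: append copies of the nonlocal x values after the owned block.
        x_local = x_parts[p] + [x_parts[g // block][g % block] for g in ghosts]
        # Compute entirely in local buffers (owned slots plus ghost slots).
        y_local = [0.0] * (block + len(ghosts))
        for a, b in local_edges:
            y_local[a] += x_local[b]
            y_local[b] += x_local[a]
        for k in range(block):
            y_parts[p][k] += y_local[k]
        # Scatter: add each ghost contribution into the owning processor's y.
        for k, g in enumerate(ghosts):
            y_parts[g // block][g % block] += y_local[block + k]
    return y_parts


if __name__ == "__main__":
    block = 2
    x_parts = [[1.0, 2.0], [3.0, 4.0]]        # x = [1, 2, 3, 4] on two processors
    edges_by_proc = [[(0, 1), (1, 2)], [(2, 3), (3, 0)]]
    schedule = [inspect_edges(e, p, block) for p, e in enumerate(edges_by_proc)]
    print(execute(x_parts, schedule, block))  # [[6.0, 4.0], [6.0, 4.0]]
```

Because the schedule depends only on the edge lists, it can be built once and passed to `execute` on every time step for as long as the mesh does not change.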

....applications is well established [13, 16]. Irregular reductions have been recognized as being particularly vital. The inspector-executor paradigm was first developed as a general scheme to optimize irregular applications for message-passing multiprocessors and used in the CHAOS run-time system [4, 14]. Compiler techniques were developed to automatically generate calls to the run-time routines [11]. Researchers have also developed techniques to improve the data locality of irregular computations, using dynamic copying of data elements [5] or partitioning computation and data [1, 15]. We ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Locality Optimizations For Adaptive Irregular Scientific Codes - Han, Tseng (2000)   (2 citations)

....run-time techniques first developed in the context of identifying interprocessor communication for message-passing machines. Saltz et al. designed a compiler that can generate calls to an inspector to process memory access patterns at run time to identify the non-local data needed by each processor [6]. Inspectors for locality optimizations are provided as run-time library calls. These locality inspectors are inserted by compilers and modify the data and computation order at run time. The inspectors are expensive, but their cost can also be amortized over many time steps. Researchers on data-parallel ....

.... ways to provide efficient run-time and compiler support as well as efficient implementations of parallel irregular reductions [8, 18]. The inspector-executor paradigm used by locality optimizations was first developed for message-passing multiprocessors and employed in the CHAOS run-time system [6]. Most research on improving data locality has focused on dense arrays with regular access patterns. Locality optimizations have also been developed in the context of sparse linear algebra [4]. Some researchers have investigated improving locality for irregular scientific applications. Das et al. ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Considerations in HPJava language design and implementation - Zhang, Carpenter, Fox, Li, .. (1998)   (5 citations)

....locality and parallelism are encouraged. It also dramatically simplifies the work of the compiler. Because the communication sector is considered an add-on to the basic language, HPJava should interoperate more smoothly than HPF with other successful SPMD libraries, including MPI [6], CHAOS [4], Global Arrays [8], and so on. [Section 4.3, Datatypes in HPJava] In a parallel language, it is desirable to have both local variables (like the ones in MPI programming) and global variables (like the ones in HPF programming). The former provide flexibility and are ideal for task-parallel programming; the ....

R. Das, M. Uysal, J. H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462-479, September 1994.


HPJava: data parallel extensions to Java - Carpenter, Zhang, Fox, Li, Wen (1997)   (28 citations)

....parallel computation, including shared-memory programming, explicit message passing, and array-parallel programming. Other paradigms (for example, Linda or coarse-grained dataflow) may come later, together with bindings to higher-level libraries and application-specific libraries such as CHAOS [7], ScaLAPACK [1], Global Arrays [8], or DAGH [9]. This is a large vision, and the current article only discusses some first steps towards a general framework. In particular, we will make specific proposals for the sector of HPJava most directly related to its namesake: High Performance Fortran. We ....

R. Das, M. Uysal, J. H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462-479, September 1994.


Software Support For Improving Locality in Advanced Scientific Codes - Tseng (2000)

....for both irregular meshes and pointer-based codes, and will be usable by either compilation systems or programmers who wish to tune their application performance by hand. We will model our software distribution along the lines used by Prof. Joel Saltz in distributing his CHAOS run-time library [21]. Finally, we intend to collaborate with Prof. Jeff Hollingsworth, who is working on user performance tools to isolate and display memory bottlenecks using hardware cache-miss counters [6]. Similar tools such as MemSpy have proven very useful in helping users identify problematic data layouts and ....

....techniques which can be automated in a compiler. Compared to existing research, we attempt to find compiler and run-time solutions to locality problems specific to 3D iterative solvers. We point out several difficult problems and propose solutions. Researchers have investigated both run-time [21, 35] and compiler [14, 33, 53, 52, 68] support for irregular computations, mostly in the context of message-passing multiprocessors. Researchers have also examined efficient support for irregular computations on software DSMs [5, 30, 53, 63], including hybrid regular-irregular codes [12, 50, 78] ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


A Comparison of Locality Transformations for Irregular Codes - Han, Tseng (2000)   (8 citations)

.... in a compiler [25]. Researchers have investigated compiler analyses [22, 23] and ways to provide efficient run-time and compiler support [4, 29]. Libraries that can efficiently communicate data between processors using an inspector-executor approach were key to achieving good performance [7, 16, 21]. Compiler techniques for automatically generating and inserting these inspectors were developed [16, 13, 14]. Most research on improving data locality has focused on dense arrays with regular access patterns, applying either loop [26, 33] or data layout [18, 30] transformations. Locality ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


A Parallel Software Infrastructure for Dynamic Block-Irregular.. - Kohn (1995)   (12 citations)

....mechanisms are inadequate for dynamic problems. Such applications will also require sophisticated run-time support to manage changing data distributions and communication patterns. A number of run-time support systems have already been developed, including CHAOS (formerly called PARTI) [60], multiblock PARTI [3], and Multipol [40]. Both CHAOS and multiblock PARTI have been used as run-time support for data-parallel Fortran compilers. CHAOS has been very successful in addressing unstructured problems such as sparse linear algebra and finite elements [58]. Multiblock PARTI has been ....

.... (an unstructured Region) to address other classes of irregular scientific applications, such as unstructured finite element problems and irregularly coupled regular meshes [45]. The goal of SA is to unify several previous domain-specific systems, including LPARX, multiblock PARTI [3], and CHAOS [60]. [Section 2.4.2, Parallel Languages] The parallel programming literature describes numerous languages, each of which provides facilities specialized for its own intended class of applications. In the following survey, we evaluate various parallel languages on their ability to solve the dynamic, ....

[Article contains additional citation context not shown here]

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang, Communication optimizations for irregular scientific computations on distributed memory architectures, Journal of Parallel and Distributed Computing, (to appear).


EXPLORER: Supporting Run-time Parallelization of DO-ACROSS Loops .. - Liu, King

....and NCHC 86 08 024. .... gain by running such programs on NOWs without special hardware or software support. It follows that only a few attempts have been made so far at runtime parallelization on distributed memory systems, not to mention on NOWs. Furthermore, only restricted cases have been solved [3, 4, 5]. This paper reports a new attempt at runtime parallelization, with an emphasis on NOWs without special hardware or system software support. Our approach, called EXPLORER, can handle loops with and without loop-carried dependences. To address the problem of high communication overhead in NOWs, ....

....array of f to indicate which element is to be written in which iteration. Given the inverse array, the loop in Fig. 1(b) can be handled in the same way as that in Fig. 1(a). Thus, we will mainly consider the latter in the following discussions. [Section 2.2, Previous Approaches] The CHAOS/PARTI system [3] provides a set of library routines to identify data access patterns at runtime and to optimize data distribution and communication based on the inspector-executor model. The routines hide low-level message-passing details of a distributed memory system from the intended irregular applications. ....

R. Das, M. Uysal, J. Saltz, and Y. S. Hwang, "Communication optimizations for irregular scientific computations on distributed memory architectures," Journal of Parallel and Distributed Computing, 22(3), pp.462-479, September 1994.
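The inspector-executor model mentioned in the EXPLORER context can be illustrated with a minimal sketch. It assumes a block distribution of one data array over simulated processors inside a single Python process, so "communication" is only a list copy; the function names (inspector, executor_gather, executor_loop) are hypothetical and are not CHAOS PARTI routines. The inspector scans the indirection array once, removes duplicate off-processor references, and groups them by owner; the executor reuses that schedule on every iteration.

```python
from collections import defaultdict


def owner(g, block):
    """Processor that owns global index g under a block distribution."""
    return g // block


def inspector(indices, me, block):
    """Scan the indirection array once and build a gather schedule:
    owner -> sorted, de-duplicated global indices that processor `me` needs."""
    needed = defaultdict(set)
    for g in indices:
        p = owner(g, block)
        if p != me:
            needed[p].add(g)
    return {p: sorted(gs) for p, gs in needed.items()}


def executor_gather(schedule, global_data):
    """Fetch each owner's requested elements as one aggregated 'message'.
    global_data stands in for remote memory in this single-process sketch."""
    ghosts = {}
    for p, gs in schedule.items():
        message = [global_data[g] for g in gs]  # one send/recv per owner
        ghosts.update(zip(gs, message))
    return ghosts


def executor_loop(indices, local, me, block, ghosts):
    """An example irregular loop (a sum over data[indices[i]]): owned
    elements come from `local`, off-processor ones from the ghost copies."""
    lo = me * block
    total = 0.0
    for g in indices:
        total += local[g - lo] if owner(g, block) == me else ghosts[g]
    return total


if __name__ == "__main__":
    block, me = 4, 0
    data = [float(v) for v in range(12)]      # 3 simulated processors
    local = data[me * block:(me + 1) * block]
    ind = [1, 5, 9, 5, 2, 9]
    sched = inspector(ind, me, block)          # built once
    for step in range(3):                      # reused every time step
        ghosts = executor_gather(sched, data)
        print(step, executor_loop(ind, local, me, block, ghosts))
```

Building the schedule once and reusing it across time steps is what amortizes the inspector's cost when the access pattern does not change between iterations.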


Flexible Communication Mechanisms for Dynamic Structured.. - Fink, Baden, Kohn (1996)   (19 citations)  (Correct)

....to implement due to elaborate, dynamic data structures. Since these structures give rise to unpredictable communication patterns, parallelization is difficult. To ease the programmer's burden, programming languages and libraries can hide many low-level details of a parallel implementation [1, 2, 3, 4, 5, 6]. We present Kernel Lattice Parallelism (KeLP), a C++ class library that provides high-level abstractions to manage data layout and data motion for dynamic, irregular, block-structured applications. KeLP supports data orchestration, a model which enables the programmer to express dependence ....

....1b, the irregularly shaped fine level communicates with the irregularly shaped coarse level over the points falling in the shadow cast by the fine level. KeLP represents a data motion pattern between XArrays using the MotionPlan abstraction. The MotionPlan is a first-class communication schedule [3, 4] object representing an atomic block copy between XArrays. The programmer builds and modifies a MotionPlan using Region calculus operations. This functionality gives the user powerful mechanisms to describe highly irregular data motion patterns. To understand how to build a MotionPlan we must ....

[Article contains additional citation context not shown here]

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang, "Communication optimizations for irregular scientific computations on distributed memory architectures," Journal of Parallel and Distributed Computing, vol. 22, no. 3, pp. 462--479, 1994.


Maps: A Compiler-Managed Memory System for Software-Exposed.. - Barua (2000)   (1 citation)  (Correct)

....have parallelized some of the benchmarks in this paper for multiprocessors. Automatic parallelization has been demonstrated to work well for dense-matrix scientific codes [22, 23]. In addition, some irregular scientific applications can be parallelized using the inspector-executor method [70]. Typically these techniques involve user-inserted calls to a runtime library such as CHAOS [71] and are not automatic. The programmer is responsible for recognizing cases amenable to such parallelization, namely those where the same communication pattern is repeated for the entire duration of ....

....the same communication pattern is repeated for the entire duration of the loop, and inserting several library calls. In contrast, the Maps approach is more general and requires no user intervention. Its generality stems from its exploitation of ILP rather than the coarse-grain parallelism targeted by [70] and [22]. Multiprocessors are mostly restricted to such coarse-grain parallelism because of their relatively high communication and synchronization costs. Unfortunately, finding coarse-grain parallelism often requires whole-program analysis by the compiler, which works well only in restricted ....

Raja Das, Mustafa Uysal, Joel Saltz, and Yuan-Shin Hwang. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures. Journal of Parallel and Distributed Computing, 22(3), September 1994.


Improving Locality For Adaptive Irregular Scientific Codes - Han, Tseng (1999)   (4 citations)  (Correct)

.... efficient run-time and compiler support [6, 39], as well as efficient implementations of parallel irregular reductions [14, 16, 47]. The inspector-executor paradigm used by locality optimizations was first developed for message-passing multiprocessors and employed in the CHAOS run-time system [10]. Compiler techniques were developed to automatically generate calls to the run-time routines [18, 19]. An algorithm for building incremental communication schedules was implemented to reduce processing overhead for adaptive irregular codes [22]. In comparison, locality optimizations do not need ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Design and Implementation of a Parallel I/O.. - No, Park.. (1998)   (1 citation)  (Correct)

....of these problems is normally irregular, for example, describing a physical structure, which is discretized. Figure 2(a) illustrates an example of irregular programs in which x and y are data arrays, and a(i), b(i), c(j), and d(j), which are used to index the data arrays, are called indirection arrays [4]. This problem can be abstracted into the one shown in Figure 2(b) and (c). We assume that data is distributed using some partitioning scheme (which may be application dependent). On each node, there is an indirection array which describes the location of the corresponding data element in a global ....

Raja Das, Mustafa Uysal, Joel Saltz, and Yuan-Shin Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Improving Fine-Grained Irregular Shared-Memory Benchmarks.. - Hu, Cox, Zwaenepoel (2000)   (9 citations)  (Correct)

....make the right choice. We have incorporated a data reordering library in a number of fine-grained irregular programs from these standard benchmark suites. In particular, we have used Barnes-Hut, FMM, and Water-Spatial from SPLASH-2 [33] and Moldyn and Unstructured from the Chaos benchmark suite [11]. We have evaluated the modified programs on a hardware shared-memory machine (a 16-processor SGI Origin 2000 [23]) and two software shared-memory systems (TreadMarks [1] and HLRC [35], on a cluster of 16 Pentium II-based computers). Our results show that data ordering during initialization ....

....time to obtain a full page is 1,308 microseconds. [Section 4.2: Applications] We used five irregular applications in this study: Barnes-Hut with sequential tree building, Fast Multipole Method, and Water-Spatial from the SPLASH-2 benchmark suite [33], and Moldyn and Unstructured from the Chaos benchmark suite [11]. [Footnote 2: In their original form, the Chaos benchmarks are message-passing programs that use the Chaos collective communication library. We ported ....] [Figure 6 (panels: Hilbert and row-major orderings of partitions P0 to P3): Boundary points using the Hilbert ordering are likely to be in more pages than using row ....]

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, Sept. 1994.
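The data reordering referred to in the Hu, Cox, and Zwaenepoel contexts can be illustrated with a Hilbert space-filling-curve ordering, one of the two orderings named in the Figure 6 caption. The sketch below assumes 2-D points on an n x n integer grid with n a power of two and uses the widely published xy2d conversion; hilbert_index and hilbert_reorder are hypothetical names, not the API of the authors' reordering library. Sorting elements by curve position tends to place spatially nearby points at nearby array positions.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has canonical orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve filling an n x n
    grid; n must be a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_reorder(points, n):
    """Permutation of point ids sorted by Hilbert position; applying it to
    the data arrays groups spatially close points in memory."""
    return sorted(range(len(points)), key=lambda k: hilbert_index(n, *points[k]))


if __name__ == "__main__":
    pts = [(3, 0), (0, 0), (1, 1), (0, 1)]
    print(hilbert_reorder(pts, 4))
```

A complete reordering pass would then permute the data arrays by this order and renumber the indirection arrays to match.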


An HPspmd Programming Model (Extended Abstract) - Carpenter, al. (1999)   (Correct)

.... and NWChem [3]. While there remains a prejudice that HPF is best suited for problems with very regular data structures and regular data access patterns, SPMD frameworks like DAGH and Kelp have been designed to deal directly with irregularly distributed data, and other libraries like CHAOS PARTI [8] and Global Arrays support unstructured access to distributed arrays. These successes aside, the library-based SPMD approach to data parallel programming certainly lacks the uniformity and elegance of HPF. All the environments referred to above have some idea of a distributed array, but they all ....

....applications, but of course they are far from exclusive. Many important problems involve data structures too irregular to represent purely through HPF-style distributed arrays. Our third category of libraries therefore includes libraries designed to support irregular problems. These include CHAOS [8] and DAGH [19]. We anticipate that irregular problems will still benefit from regular data-parallel language extensions; at some level they usually resort to representations involving regular arrays. But lower-level SPMD programming, facilitated by specialized class libraries, is likely to take a ....

R. Das, M. Uysal, J.H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Evaluating the Impact of Memory System Performance on .. - Aggarwal, Badawy.. (2000)   (5 citations)  (Correct)

....accesses to the same data closer together in time. Data layout can also be transformed so that data accesses are more likely to be to the same cache line. These compiler and run-time transformations can be automated using an inspector-executor approach developed for message-passing machines [13]. In many irregular scientific applications, each loop iteration tends to compute interactions for a pair of data. Such interactions tend to occur between nearby data items. Partitioning data based on either geometric coordinate data or the underlying interaction graph can thus increase the ....

....for linear algebra codes [24, 44, 12] and multiple loop nests across time-step loops [42]. In comparison, we apply tiling to 3D stencil codes which cannot be tiled with existing methods. Researchers have examined irregular computations mostly in the context of parallel computing, using runtime [13] and compiler [25] support for accesses on message-passing multiprocessors. A few have also looked at techniques for improving locality [1, 14]. Few researchers have investigated data layout transformations for pointer-based data structures. Chilimbi, Hill, and Larus investigate ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Semantic Checking in HPJava - Carpenter, Fox, Zhang   (Correct)

.... Toolkit [11]. While there remains a prejudice that HPF is best suited for problems with very regular data structures and regular data access patterns, SPMD frameworks like DAGH and Kelp have been designed to deal directly with irregularly distributed data, and other libraries like CHAOS PARTI [4] and Global Arrays support unstructured access to distributed arrays. These successes aside, the library-based SPMD approach to data parallel programming lacks the uniformity and elegance of HPF. All the environments referred to above have some idea of a distributed array, but they all describe ....

....and v are N 2. The distribution group of b can be identified with the distribution group of the parent array a. But sections constructed using a scalar subscript, e.g. .... [Footnote 2: For a sequential dimension, the result of rng(r) is a member of the subclass CollapsedRange.] [Figure: layout of the elements a[0,0] through a[7,7] of array a] ....

[Article contains additional citation context not shown here]

R. Das, M. Uysal, J.H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Evaluating Locality Optimizations For Adaptive Irregular.. - Han, Tseng   (Correct)

.... efficient run-time and compiler support [6, 40], as well as efficient implementations of parallel irregular reductions [14, 16, 50]. The inspector-executor paradigm used by locality optimizations was first developed for message-passing multiprocessors and employed in the CHAOS run-time system [10]. Compiler techniques were developed to automatically generate calls to the run-time routines [18, 19]. An algorithm for building incremental communication schedules was implemented to reduce processing overhead for adaptive irregular codes [23]. In comparison, locality optimizations do not need ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Java Data Parallel Extensions with Runtime System Support - Wen, Carpenter, Fox, Zhang (1998)   (1 citation)  (Correct)

....in the operation of distributed arrays are managed in the run-time library. With the SPMD programming model, we can handle the regular access in the distributed arrays efficiently. In the genuinely irregular case, the necessary subscripting cannot usually be directly expressed in our language [5]. We need to explicitly use an appropriate library package to handle it. The language extensions described were devised partly to provide a convenient interface to a distributed array library developed in the Parallel Compiler Runtime Consortium (PCRC) project [4]. The pre-compiler to translate ....

R. Das, M. Uysal, J.H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


An Evaluation of Computing Paradigms for N-body.. - McCurdy, Mellor-Crummey (1999)   (Correct)

....These nonlocal values need only be gathered and stored once on that processor. Figure 5 shows what the sample code might look like after transformation to inspector-executor style. The statements within would be translated to one or more calls to routines from a library such as CHAOS [4]. To use values collected by an inspector-based communication, the indirection array (INTER in Figure 4) must be updated to reflect the locations of the nonlocal elements of the indexed array (C) before use in the executor loop. In their standard usage, inspector-executors transform potentially ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, Sept. 1994.
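The indirection-array update described in the McCurdy and Mellor-Crummey context (often called localization) can be sketched as follows. This is an illustrative assumption, not the CHAOS interface: the processor owns global indices [lo, hi) of C, each distinct off-processor index receives one ghost slot appended after the owned block, and INTER is rewritten to point into that combined buffer. The names localize, lo, and hi are hypothetical; INTER and C follow the quoted text.

```python
def localize(inter, lo, hi):
    """Rewrite global indices in `inter` into offsets in a local buffer.

    Owned indices lo <= g < hi map to g - lo. Each distinct off-processor
    index gets one ghost slot after the owned block, in order of first use.
    Returns (localized_inter, ghost_globals), where ghost_globals lists the
    global index stored in each ghost slot (i.e. what must be gathered).
    """
    n_owned = hi - lo
    slot = {}  # global index -> ghost offset in the local buffer
    out = []
    for g in inter:
        if lo <= g < hi:
            out.append(g - lo)
        else:
            if g not in slot:
                slot[g] = n_owned + len(slot)
            out.append(slot[g])
    ghost_globals = sorted(slot, key=slot.get)
    return out, ghost_globals


if __name__ == "__main__":
    C = [10.0 * i for i in range(15)]   # global view, used here for checking
    lo, hi = 0, 5                        # this processor owns C[0:5]
    INTER = [3, 7, 1, 7, 12]
    local_inter, ghosts = localize(INTER, lo, hi)
    # Gather phase: append copies of the nonlocal values after the owned block.
    buffer = C[lo:hi] + [C[g] for g in ghosts]
    # The executor loop now indexes only the local buffer.
    print([buffer[k] for k in local_inter])
```

Because each nonlocal element gets a single slot, index 7, which INTER references twice, is gathered only once, in line with the quoted remark that nonlocal values "need only be gathered and stored once".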


Efficient Compiler and Run-Time Support for Parallel Irregular.. - Han, Tseng (2000)   (4 citations)  (Correct)

....multiprocessors which require explicit messages to access nonlocal data. One approach to parallelizing irregular reductions, taken by distributed-memory compilers which generate explicit interprocessor messages (e.g., Fortran D [20]), is to rely on sophisticated run-time systems (e.g., CHAOS [5], PILAR [26]) which can identify and gather nonlocal data. A second approach is to combine shared-memory compilers (e.g., SUIF [11]) with software DSM systems (e.g., TreadMarks [30], CVM [24]), which provide a shared-memory interface. Software DSMs are less efficient than explicit messages, but ....

....latency. At run time, the contents of the index array are analyzed and nonlocal data requests are aggregated. Reduction results are stored in a local buffer, then globally updated in parallel in a pipelined manner. Experiments show performance on an SP-2 is comparable to that achieved by CHAOS [5], which supports irregular reductions using GatherScatter [30]. [Section 1.3: Contributions] In this paper, we introduce LocalWrite, a new compiler and run-time parallelization technique which can improve performance for certain classes of irregular reductions. We evaluate the performance of different ....

[Article contains additional citation context not shown here]

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.
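The buffered reduction strategy described in the Han and Tseng context (nonlocal contributions accumulated in a local buffer, then applied to their owners) can be sketched in a single process. Assumptions: x is block-distributed, each interaction (i, j, w) adds w to both x[i] and x[j], and the global update is a plain loop rather than the pipelined exchange the text describes. This is neither LocalWrite nor the CHAOS GatherScatter interface; reduce_buffered and reduce_sequential are hypothetical names.

```python
from collections import defaultdict


def reduce_buffered(edges_per_proc, block):
    """Two-phase irregular reduction: x[i] += w and x[j] += w per edge.

    Phase 1: each simulated processor updates the elements it owns directly
    and combines contributions for non-owned elements in a local buffer.
    Phase 2: each buffer is applied to the owners (a plain loop here; a
    real system would overlap or pipeline these updates).
    """
    nprocs = len(edges_per_proc)
    owned = [[0.0] * block for _ in range(nprocs)]
    buffers = [defaultdict(float) for _ in range(nprocs)]
    for p, edges in enumerate(edges_per_proc):
        lo = p * block
        for i, j, w in edges:
            for g in (i, j):
                if lo <= g < lo + block:
                    owned[p][g - lo] += w
                else:
                    buffers[p][g] += w   # aggregated: one entry per index
    for buf in buffers:                  # global update phase
        for g, v in buf.items():
            owned[g // block][g % block] += v
    return [v for blk in owned for v in blk]


def reduce_sequential(edges, n):
    """Reference result computed without any distribution."""
    x = [0.0] * n
    for i, j, w in edges:
        x[i] += w
        x[j] += w
    return x


if __name__ == "__main__":
    per_proc = [[(0, 1, 1.0), (1, 5, 2.0)], [(4, 2, 3.0), (5, 7, 0.5)]]
    flat = [e for edges in per_proc for e in edges]
    print(reduce_buffered(per_proc, 4) == reduce_sequential(flat, 8))
```

Combining duplicate contributions in the buffer before the global update means each nonlocal element costs one update message entry, however many local interactions touch it.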


A Comparison of Locality Transformations for Irregular Codes - Han, Tseng (2000)   (8 citations)  (Correct)

.... in a compiler [23]. Researchers have investigated compiler analyses [20, 21] and ways to provide efficient run-time and compiler support [4, 27]. Libraries which can efficiently communicate data between processors using an inspector-executor approach were key to achieving good performance [7, 14, 19]. Compiler techniques for automatically generating and inserting these inspectors were developed [14, 11, 12]. Most research on improving data locality has focused on dense arrays with regular access patterns, applying either loop [24, 31] or data layout [16, 28] transformations. Locality ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


The Use of Java in High Performance Computing: A Data Mining.. - Walker, Rana   (Correct)

....and combines tools, class libraries, and language extensions to support parallel processing paradigms such as shared-memory programming, message passing, and array-parallel programming. Once such a framework is in place, bindings to higher-level libraries and application-specific codes such as CHAOS [7] and ScaLAPACK [6] may also be developed. The first step in developing extensions to Java for parallel programming is to introduce characteristic ideas of other high-performance languages, such as the distributed array model and the array intrinsic functions and libraries of HPF. The resulting ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, 1994.


An Overview of the RAPID Run-time System for Parallel.. - Tao Yang et al.   (Correct)

....factorizations. [Section 1: Introduction] Program transformation and parallelization techniques for structured codes have been shown to be successful in many application domains [WCD 95]. However, it is still difficult to parallelize unstructured codes, which can be found in many scientific applications. In [DUSH94] an important class of unstructured and sparse problems which involve iterative computations is identified. The CHAOS PARTI system has been developed to perform run-time preprocessing that extracts data access patterns through distributed indirection arrays, optimizes data distribution and ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Runtime and Language Support for Compiling Adaptive.. - Hwang, Moon, Sharma (1995)   (25 citations)  Self-citation (Das Saltz Hwang)   (Correct)

No context found.

Raja Das, Mustafa Uysal, Joel Saltz and Yuan-Shin Hwang, "Communication optimizations for irregular scientific computations on distributed memory architectures", Journal of Parallel and Distributed Computing, 22(3), 462--479 (1994). Also available as University of Maryland Technical Report CS-TR-3163 and UMIACS-TR-93-109.


Processing Large-Scale Multidimensional Data in.. - Beynon, Chang.. (2002)   (2 citations)  Self-citation (Saltz)   (Correct)

....the number of processors available for transparent copies of a filter. In such situations DD should perform well, outperforming the other write policies. [Section 5: Related Work] Reduction operations have long been recognized as an important source of parallelism for many scientific applications [26, 35, 36, 67]. Most techniques for optimizing parallel reductions have been developed for scenarios where data can fit into processor memory, and the main goal is to partition the iterations among processors to achieve good load balance with low induced interprocessor communication overhead. Brezany et al. ....

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, Sept. 1994.


Array Prefetching for Irregular Array Accesses in Titanium - Su, Yelick (2004)   (Correct)

No context found.

R. Das, M. Uysal, J. Saltz, and Y. Hwang, "Communication optimizations for irregular scientific computations on distributed memory architectures", Journal of Parallel and Distributed Computing, 1993.


A New Optimization Technique for the Inspector-Executor Method - Yokota, Chiba, Itano (2002)   (Correct)

No context found.

R. Das, M. Uysal, J. Saltz and Y. S. Hwang, Communication optimizations for irregular scientific computations on distributed memory architectures, Technical Report CS-TR-3163, University of Maryland, Oct., 1993.


Predicting Hierarchical Phases in Program Data Behavior - Shen, Zhong, Ding   (Correct)

No context found.

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Adaptive Data Partitioning Using Probability Distribution - Shen, Zhong, Ding (2003)   (Correct)

No context found.

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Automatic Parallelization of the Conjugate Gradient.. - Kotlyar, Pingali, Stodghill   (Correct)

No context found.

Raja Das, Mustafa Uysal, Joel Saltz, and Yuan-Shin Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994. Also available as University of Maryland Technical Report CS-TR-3163 and UMIACS-TR-93-109.


Array Prefetching for Irregular Array Accesses in Titanium - Su, Yelick (2004)   (Correct)

No context found.

R. Das, M. Uysal, J. Saltz, and Y. Hwang, "Communication optimizations for irregular scientific computations on distributed memory architectures", Journal of Parallel and Distributed Computing, 1993.


Predicting Whole-Program Locality Through Reuse Distance Analysis - Ding, Zhong (2003)   (6 citations)  (Correct)

No context found.

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, Sept. 1994.


Adaptive Data Partition Using Probability Distribution - Xipeng Shen et al.   (Correct)

No context found.

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, Sept. 1994.


Improving Effective Bandwidth through Compiler Enhancement of.. - Ding (2000)   (10 citations)  (Correct)

No context found.

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.


Optimization Techniques for Parallel Codes of Irregular.. - Guo, Chang, Pan (2003)   (Correct)

No context found.

Das, R., Uysal, M., Saltz, J. and Hwang, Y.-S.: Communication optimizations for irregular scientific computations on distributed memory architectures, Journal of Parallel and Distributed Computing, Vol. 22, No. 3, pp. 462--479 (Sept. 1994).



CiteSeer.IST - Copyright Penn State and NEC