| K. Knobe, J. Lukas, and G. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102--118, February 1990. |
....by Ballance et al. [18]. Their # and # functions are also similar to ADG transformer nodes. However, the motivations behind them are very different. 5.2 The preference graph Another representation that has been used in alignment analysis is the preference graph, which has several variants [11, 19, 20, 21]. The preference graph is an undirected, weighted graph constructed from the reference patterns in the program. The nodes of the preference graph correspond to dimensions of array occurrences in the program, the edges reflect beneficial alignment relations, and the weights reflect the relative ....
....graph, some alignment preferences may remain unhonored and will result in residual communication. In this case, we must decide which preferences to honor and which to break. There are two principal variants of this general framework. The Knobe-Lukas-Steele algorithm Knobe, Lukas, and Steele [19] call the nodes of the preference graph cells. They introduce an additional attribute of cells (called independence anti-preference) to model the constraint that a dimension occurrence has sufficient parallelism and should not be serialized. An alignment conflict can only occur within a cycle of ....
Kathleen Knobe, Joan D. Lukas, and Guy L. Steele Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102--118, February 1990.
....data locality is important when compiling array parallel languages for distributed memory parallel computers. The languages mentioned above require the user to provide data placement directives in the source code. There has also been considerable interest in automating the task of data placement [2, 5, 6, 16, 17, 18, 22]. This compiler optimization is important for insuring the portability of new scientific codes and for supporting old codes developed without a distributed memory model in mind. Completion time has two components: computation and communication. Communication can be separated into intrinsic and ....
....offsets, we also allow the components of f to be affine functions of loop induction variables. We call such alignments mobile. We use linear programming to determine static offset alignments, and extend this algorithm to determine good mobile alignments. 1.3 Related work Knobe, Lukas, and Steele [17] laid the foundation for data layout optimization. They addressed axis, stride, and offset alignment in a unified framework. Our algorithm for axis and stride alignment amplifies their claims of the importance of data layout optimization, and improves upon their methods in several ways. First, we ....
K. Knobe, J. D. Lukas, and G. L. Steele Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102--118, Feb. 1990.
....= A#k,1:100# V#k:k 99# enddo enddo (a) (b) Figure 1: (a) A Fortran 90 program fragment requiring mobile alignment. (b) A mobile alignment for the program fragment. the optimal replication strategy can be reduced to a network flow problem. Several other authors have considered static alignment [2, 9, 12, 13, 17]. Our earlier research [4, 5, 8] dealt with static alignment. We extend that work to handle mobile alignment here. Knobe, Lukas, and Steele [12] and Knobe, Lukas, and Dally [11] address the issue of dynamic alignment. Their notion of dynamic alignment is alignment depending on quantities whose ....
....fragment. the optimal replication strategy can be reduced to a network flow problem. Several other authors have considered static alignment [2, 9, 12, 13, 17]. Our earlier research [4, 5, 8] dealt with static alignment. We extend that work to handle mobile alignment here. Knobe, Lukas, and Steele [12] and Knobe, Lukas, and Dally [11] address the issue of dynamic alignment. Their notion of dynamic alignment is alignment depending on quantities whose values are known only at runtime, which may include loop induction variables as well as other arbitrary runtime values. This paper focuses on ....
K. Knobe, J. D. Lukas, and G. L. Steele Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102--118, Feb. 1990.
....In Sect. 2 we briefly review related works, then in Sect. 3 we describe the problem we consider and present its modelization in Sect. 4. We then detail the tools used to solve our problem along with an example in Sect. 5. And we finally conclude in Sect. 6. 2 Related Work Many researchers [1,10,11,14-16] have studied automatic data distribution. Estimating communication costs has been the key factor to determine the quality of a data distribution. Most of the previous works have studied compile time estimation of these communication costs. Indeed, they use the fact that most program parameters ....
Kathleen Knobe, Joan D. Lukas, and Guy L. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102--118, 1990.
....the WARP machine. The techniques used in the AL compiler need to be extended significantly to allow the distribution of multiple dimensions, and to recognize automatically which dimensions to distribute. Knobe, Lukas, and Steele have developed techniques for automatic data layout on SIMD machines [41, 42]. They use the concept of preferences between data references to guide the layout process, which is similar in spirit to our use of constraints to guide the choice of data distribution parameters. A significant feature unique to our approach is the analysis carried out to record the quality ....
....processor mesh) is one, the problem becomes trivial, as all array dimensions are mapped to the same mesh dimension. The problem in its general form was first discussed and formulated in graph theoretic terms by Li and Chen [48]. The idea of conformance preference, introduced by Knobe et al. [41] in the context of SIMD machines is also similar, but directed towards individual array elements instead of array dimensions. We use the Component Affinity Graph (CAG) framework [48] for the alignment problem. The CAG constructed for the program has nodes representing dimensions of arrays. For ....
K. Knobe, J. Lukas, and G. Steele Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102--118, 1990.
.... explicit mapping of the data onto the topology [16, 9, 14]; others are more abstract and offer either sets of directives for the compiler or interactive or knowledge based environments that help determine the alignment of array dimensions and mapping functions [3, 10, 7, 1, 2]. Recent work [5, 6, 4, 15, 8] focuses on static compile time analysis to automatically find a data decomposition that achieves both goals for vector and data parallel operations. Modula 2 [17] is designed for high level, problem oriented, and machine independent parallel programming. The programmer can focus on the problem ....
Kathleen Knobe, Joan D. Lukas, and Guy L. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102-118, February 1990.
....acute when the communication is synchronous, such as in the case of SIMD machines. In addition, different alignments of multidimensional arrays on a grid connected SIMD architecture result in different communication patterns during parallel program execution. The usual approach to this problem [28, 29] is to select the best alignment for each array in the program independently of other arrays. Hence, such an approach does not succeed when the independently found alignments conflict with each other. Similarly, the algorithm presented in [30] finds the minimum communication cost of evaluating an ....
K. Knobe, J. Lukas, and G. Steele Jr., "Data optimization: Allocation of arrays to reduce communication on SIMD machines," Journal of Parallel and Distributed Computing, 1990, vol. 8, pp. 102-118.
....an algorithm for partitioning doing a global analysis across loops. They allow simple index expression accesses of the form c i c2, but not general affine functions. They do not allow for the possibility of hyperparallelepiped data tiles, and do not account for caches. Knobe, Lucas and Steele [9] give a method of allocating arrays on SIMD machines. They align arrays to minimize communication for vector instructions, which access array regions specified by subranges on each dimension. Wolf and Lam [13] deal with the problem of taking sequential nested loops and applying transformations ....
Kathleen Knobe, Joan Lukas, and Guy Steele Jr. Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines. Journal of Parallel and Distributed Computing, 8(2): 102-118, August 1990.
....this proposal uses the terms static software DSM and dynamic software DSM to refer to members of the first and second class, respectively. Static Approaches Static software DSM systems are typified by compilers for FORTRAN style scientific codes targeting message passing multicomputers [9, 41, 51, 64, 79, 87]. These are typically data parallel systems with a single thread of control; parallelism can only be expressed in the form of a large number of similar (possibly identical) operations applied in parallel to elements of large, dense arrays according to some user specified iteration space (parallel ....
K. Knobe, J. Lukas, and G. Steele Jr. Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines. Journal of Parallel and Distributed Computing, pages 102--118, 1990.
....in such cases either by manual analysis of the design or using experiments on a simulator. Compiler Optimizations Most compiler optimizations are in essence choices between several ways of performing a set of operations. The optimizer tries to choose the fastest way. A great deal of research [27, 67, 75] has been done for example in the area of optimizing data distributions, alignment, and redistributions in data parallel languages such as High Performance Fortran [68, 59]. While some of that research uses fairly crude cost models, some of the research on performance modeling for high performance ....
K. Knobe, J. D. Lucas, and G. L. Steele Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102--118, 1990.
....where each processor has a separate address space, e.g. CM 5 or Intel iPSC. It is usually assumed that the programmer specifies how data is distributed and the compiler tries to optimize communication by grouping references to remote data so the high cost of remote accesses can be amortized [5, 10, 11, 15, 14, 16, 17, 19, 21]. These methods only work well when the granularity of the computation is large and regular. Some recent work has looked at compilation for machines with a shared address space, physically distributed memory and globally coherent caches [3, 6]. In these machines, each processor controls a local ....
Kathleen Knobe, Joan Lukas, and Guy Steele Jr. Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines. Journal of Parallel and Distributed Computing, 8(2): 102-118, August 1990.
....framework (as the simple abstractions) and still relatively easy to use. For simplicity, we currently leave the burden of distributing and migrating objects across processors' memories to the programmer, and simply provide constructs to allow this to be expressed in the language. Ongoing compiler [8, 16] and operating systems [2, 9] research has enjoyed some success in automatically distributing objects, and could reduce this burden on the programmer. 4.1 The Abstractions In CooL, information about the program is provided by identifying the objects that are important for locality for a task. ....
....better load balancing. 7 Related Work In this section we begin with a brief outline of automatic techniques to address data locality in programs. We then describe other approaches based on explicit programmer support, and contrast them with CooL. Automatic Techniques: Compiler based approaches [8, 16] primarily address loop level parallelism in regular numerical algorithms, and include optimizations such as loop transformations to improve reuse in the cache, and array alignment and distribution for better memory locality. However, these approaches focus on affine array accesses within a loop ....
K. Knobe, J. D. Lukas, and G. L. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102--118, 1990.
....an algorithm for partitioning doing a global analysis across loops. They allow simple index expression accesses of the form c i q c2, but not general affine functions. They do not allow for the possibility of hyperparallelepiped data tiles, and do not account for caches. Knobe, Lucas and Steele [12] give a method of allocating arrays on SIMD machines. They align arrays to minimize communication for vector instructions, which access array regions specified by subranges on each dimension. Wolf and Lam [17] deal with the problem of taking sequential nested loops and applying transformations to ....
Kathleen Knobe, Joan Lukas, and Guy Steele Jr. Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines. Journal of Parallel and Distributed Computing, 8(2): 102-118, August 1990.
....array [16]. Their algorithm works by first calculating the projection vector, which is similar to what we call the partition, of the computation mapping. Many projects have examined the problem of finding array alignments (what we call data orientations and displacements) for data parallel programs [8, 11, 21, 30, 35]. These approaches focus on element wise array operations, and try to eliminate the communication between consecutive loops. Li and Chen prove the problem of finding optimal orientations NP complete [28] and have developed a heuristic solution which is used to implement their functional language ....
K. Knobe, J. D. Lukas, and G. L. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102--118, 1990.
No context found.
K. Knobe, J. Lukas, and G. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102--118, February 1990.
No context found.
K. Knobe, J. Lukas, and G. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102--118, February 1990.
No context found.
K. Knobe, J.D. Lukas, and G.L. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8, February 1990.
No context found.
Knobe K., Lukas J. et al. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102-118, February 1990.
No context found.
K. Knobe, J. Lukas, and G. Steele, Jr., "Data optimization: Allocation of arrays to reduce communication on SIMD machines," Journal of Parallel and Distributed Computing, vol. 8, no. 2, pp. 102--118, Feb. 1990.
No context found.
Kathleen Knobe, Joan D. Lukas, and Guy L. Steele Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102--118, February 1990.
No context found.
K. Knobe, J. D. Lukas, and G. L. Steele. Data optimization: allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102--118, 1990.
No context found.
K. Knobe, J. D. Lukas, and G. L. Steele Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102--118, Feb. 1990.
No context found.
K. Knobe, J. D. Lukas, and G. L. Steele Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102--118, Feb. 1990.
No context found.
K. Knobe, J. D. Lukas, and G. L. Steele Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102--118, Feb. 1990.
No context found.
K. Knobe, J. D. Lukas, and G. L. Steele Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8(2):102--118, Feb. 1990.