42 citations found.
M. Wolfe, "Loop Skewing: The Wavefront Method Revisited", International Journal of Parallel Programming, Vol. 15, No. 4, 1986, pp. 279-293.

This paper is cited in the following contexts:

Increasing Temporal Locality with Skewing and Recursive.. - Jin, Mellor-Crummey.. (2001)   (3 citations)

....an inner spatial loop i w.r.t. an enclosing spatial loop j with a skew factor s i,j involves adding s i,j times the loop index variable of j to the upper and lower bounds of i and subtracting the same quantity from every use of the loop index variable of i inside the loop as Wolfe described [20]. Skewing a spatial loop i w.r.t. another spatial loop j in a sibling loop nest with skew factor s i,j is applied by adding s i,j to the upper and lower bounds of i and subtracting the same amount from each occurrence of the index variable of i. We call skew factors of both types of skewing ....

....the dependence tails. If the condition is not satisfied, no serial ordering of the partitions will be possible. To satisfy this condition, the distance vectors for all true and anti dependences must contain only non negative distances for loops whose index variables appear in array subscripts [20]. For perfect loop nests, this condition is equivalent to there being no interchange preventing dependences. Figure 3a shows a pictorial representation of a pattern of true and anti dependences within an iteration space. Figure 3b shows a transformed iteration space in which the ....

M. J. Wolfe. Loop skewing: The wavefront method revisited. International Journal of Parallel Programming, 15(4):279--293, Aug. 1986.
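
The skewing rule quoted in the first context above (add the skew factor times the outer index to both bounds of the inner loop, then subtract the same quantity from every use of the inner index) can be sketched as follows. This is only an illustration under stated assumptions, not code from Wolfe's paper or from the citing article: the recurrence, the array layout, and the names `run_original` and `run_skewed` are hypothetical and chosen for the example.

```python
def run_original(n, m):
    """Outer loop j, inner loop i, with a simple two-point recurrence."""
    # a[j][i]; row 0 and column 0 act as a boundary of ones (assumed setup).
    a = [[1] * (n + 1) for _ in range(m + 1)]
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            a[j][i] = a[j - 1][i] + a[j][i - 1]
    return a


def run_skewed(n, m, s=1):
    """Same computation with inner loop i skewed w.r.t. outer loop j by factor s."""
    a = [[1] * (n + 1) for _ in range(m + 1)]
    for j in range(1, m + 1):
        # Both bounds of the inner loop are shifted by s * j ...
        for i_skewed in range(1 + s * j, n + s * j + 1):
            # ... and every use of the inner index subtracts s * j again.
            i = i_skewed - s * j
            a[j][i] = a[j - 1][i] + a[j][i - 1]
    return a


if __name__ == "__main__":
    # Skewing alone does not change the execution order, so results match.
    print(run_original(5, 4) == run_skewed(5, 4, s=1))
```

Skewing by itself leaves the iteration order unchanged; it reshapes the iteration space so that a later interchange can expose the wavefront.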


Automatic Generation of Parallel Programs with Dynamic Load .. - May Cmu-Cs- School

....size and communication overhead, especially those that can be parameterized for control at run time. Grain size is increased by restructuring loops so that communication is moved out of inner loops. We do not discuss transformations such as loop interchange [2, 25, 43, 76, 78] and loop skewing [75, 78] because they are difficult to parameterize and are not always applicable; e.g., loop interchange can be parameterized by conditionally selecting different copies of the loop nest [76] but does not provide a continuum of grain size choices. ....

Michael Wolfe. Loop Skewing: The Wavefront Method Revisited. International Journal of Parallel Programming, 15(4):279--293, August, 1986.


Lock Coarsening: Eliminating Lock Overhead in Automatically.. - Diniz, Rinard (1996)   (9 citations)

....object containing the instance variable) can access the variable without synchronization. We believe this model of parallel computation is general enough to support the majority of parallel computations for shared memory machines. Exceptions include computations (such as wave front computations [22, 6]) with precedence constraints between different parallel threads, chaotic computations (such as chaotic relaxation algorithms [19]) that can access out of date copies of data without affecting the final result, and nondeterministic computations (such as LocusRoute [18]) that can tolerate ....

M. J. Wolfe. Loop skewing: The wavefront method revisited. International Journal of Parallel Programming, 15(4):279--293, August 1986.


Load Balancing of Parallel Affine Loops by Unimodular.. - O'Boyle, Hedayat (1992)   (1 citation)

....processors. They study the effect of unimodular transformations by giving a measure of parallelism, load imbalance and volume of communication which are again restricted to the two dimensional rectangular loop case. The unimodular transformations used within our paper are related to loop skewing [9] and loop interchange [10]. The polytope and related notation is based upon the work of [8]. The main concern of his thesis was the unification of the systolic framework based on uniform recurrences with data dependency analysis. Although the polytope notation was developed quite extensively, it ....

Wolfe M., Loop Skewing: The Wavefront Method Revisited, International Journal of Parallel Programming, Vol. 15, No. 4 pp 279-294, August 1986.


Unknown -

....and Distributed Systems, vol. 3, Jan. 1992. [30] A. Aiken and A. Nicolau, Perfect Pipelining: A New Loop Parallelization Technique, Proc. 1988 European Symp. Programming, 1988. [31] P.M. Kogge, The Microprogramming of Pipelined Processors, Proc. ACM IEEE Int'l Symp. Computer Architecture, 1977. [32] J.H. Patel and E.S. Davidson, Improving the Throughput of a Pipeline by Insertion of Delays, Proc. ACM IEEE Int'l Symp. Computer Architecture, 1976. [33] A. Zaky and P. Sadayappan, Optimal Static Scheduling of Sequential Loops on Multiprocessors, Proc. Int'l Conf. Parallel Processing, 1989. ....

....Programming and Network Flows, p. 270. Addison Wesley, 1969. [31] J. Wang and E. Eisenbeis, A New Approach to Software Pipelining for Complicated Loops with Branches, research report, Institut National de Recherche en Informatique et en Automatique (INRIA) Rocquencourt, France, Jan. 1993. [32] G. Gao and Q. Ning, Loop Storage Optimization for Dataflow Machines, Proc. Fourth Int'l Workshop Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds. Lecture Notes in Computer Science 589, pp. 359-373, Santa Clara, Calif. Intel Corp. ....

M. Wolfe, "Loop Skewing: The Wavefront Method Revisited," Int'l J. Parallel Programming, vol. 15, no. 4, pp. 284-294, Aug. 1986.


Multi-Dimensional Interleaving for Synchronous Circuit.. - Passos, Sha, Chao (1994)

....of the problems is the foundation for the high parallelism achievable, usually superior to results obtained through traditional methods based on one dimensional techniques. Recent studies have considered the optimization of nested loops, a software point of view of the multi dimensional problems [1, 2, 6, 18, 29, 30]. In general, these methods transform the loops in such a way to obtain a new sequence of execution characterized by a higher parallelism. This sequence of execution is commonly associated with a schedule vector, also called an ordering vector, that affects the order in which the iterations are ....

.... shows the required memory for our proposed method, chained shows the results of the chained MD retiming [23], the column hyp shows the requirements imposed by methods based solely on the selection of a new schedule or ordering vector, as in the unimodular transformations [2, 30], loop skewing [29], and other traditional wavefront methods [1, 14]. The last column, aff, presents results that could be obtained by modifying affine by statement methods developed for systolic arrays [16, 24, 25] and other methods focused on fine grain parallelism that also depend on the selection of a new ....

M. Wolfe, "Loop Skewing: the Wavefront Method Revisited," International Journal of Parallel Programming, Vol. 15, No. 4, pp. 284-294, August 1986.


Scanning Polyhedra with DO Loops - Ancourt, Irigoin (1991)   (161 citations)

....are usually compatible with the semantics of the initial program. Loop interchange [29][1], which lets the compiler move a parallel loop inwards to use a vector unit, is probably the simplest one. The hyperplane method [18][19] and its simplified version obtained by a combination of loop skewing [29][30] and loop interchange, as well as loop permutation [5], are based on more complex changes of basis. These transformations must map integer points onto integer points on a one to one basis to preserve the iteration set and the program semantics. The change of basis matrix U must be unimodular as is ....

M. Wolfe, Loop Skewing: The Wavefront Method Revisited, Int'l Journal of Parallel Programming, V. 15, n. 4, 1986, pp. 279-294


Analysis and Transformation in the ParaScope Editor - Kennedy, McKinley, Tseng (1991)   (9 citations)

....of loop interchange [4, 55]. Distance vectors, first used by Kuck and Muraoka [38, 43], are more precise versions of direction vectors that specify the actual number of loop iterations between two accesses to the same memory location. They are utilized by transformations to exploit parallelism [9, 39, 54, 56] and the memory hierarchy [12, 24]. 3 Work Model: Ped is designed to exploit loop level parallelism, which comprises most of the usable parallelism in scientific codes when synchronization costs are considered [18]. In the work model best supported by Ped, the user first selects a loop for ....

....the transformed source. Ped supports a large set of transformations that have proven useful for introducing, discovering, and exploiting parallelism. Ped also supports transformations for enhancing the use of the memory hierarchy. These transformations are described in detail in the literature [2, 4, 12, 32, 33, 36, 41, 56]. We classify the transformations in Ped as follows. Reordering Transformations: Loop Distribution, Loop Interchange, Loop Skewing, Loop Reversal, Statement Interchange, Loop Fusion. Dependence Breaking Transformations: Privatization, Scalar Expansion, Array Renaming, Loop Peeling, Loop Splitting ....

[Article contains additional citation context not shown here]

M. J. Wolfe. Loop skewing: The wavefront method revisited. International Journal of Parallel Programming, 15(4):279--293, August 1986.


Practical Dependence Testing - Goff, Kennedy, Tseng (1991)   (93 citations)

....and profitable [4, 25, 53]. Distance vectors, first used by Kuck and Muraoka [34, 42], are more precise versions of direction vectors that specify the actual distance in loop iterations between two accesses to the same memory location. They may be used to guide optimizations to exploit parallelism [23, 27, 36, 51, 54] or the memory hierarchy [11, 19, 43]. Dependence testing thus has two goals. First, it tries to disprove dependence between pairs of subscripted references to the same array variable. If dependences may exist, it tries to characterize them in some manner, usually as a minimal complete set of ....

....MIV subscript h2j; 2j Gamma 2i 5i. The GCD test can now detect independence since the GCD of the coefficients of all the indices is 2, which does not divide evenly into the constant term 5. Distance Vectors The Delta test is particularly useful for analyzing dependences in skewed loops [27, 36, 54], including upper triangular loops skewed by loop normalization [3, 53] Consider the following simplified kernel from the Livermore Loops [41] DO 10 i = 1, N DO 10 j = 1, N 10 A(i,j) A(i 1,j) A(i,j 1) A(i 1,j) A(i,j 1) Since all subscripts are separable, the strong SIV test can be ....

M. J. Wolfe. Loop skewing: The wavefront method revisited. International Journal of Parallel Programming, 15(4):279--293, August 1986.


Reshaping Access Patterns for Generating Sparse Codes - Bik, Knijnenburg, Wijshoff (1994)   (3 citations)

....For an extensive overview of the theory, consult [4, 5, 15, 25, 26]. 3.1 Loop Transformations: Every iteration level loop transformation on n perfectly nested loops with stride 1 and regular loop bounds, consisting of a combination of loop interchanging, loop skewing, or loop reversal (see e.g. [1, 22, 24, 27, 28, 29]) can be modeled by a mapping between the original and target iteration space, namely a linear transformation that is represented by a unimodular matrix U. A unimodular matrix is an n × n integer matrix, i.e. all elements are integers, for which |det(U)| = 1 holds. Each iteration in ....

Michael J. Wolfe. Loop skewing: The wavefront method revisited. International Journal of Parallel Programming, Volume 15:279--293, 1986.
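
The unimodularity condition quoted above (an n × n integer matrix with |det(U)| = 1) can be checked mechanically. The sketch below is illustrative only: the helper names `det`, `matmul`, and `is_unimodular`, and the particular 2 × 2 matrices chosen to stand for interchange, skewing, and reversal, are assumptions made for this example and are not taken from any of the cited papers.

```python
def det(m):
    """Integer determinant by Laplace expansion (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, value in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * value * det(minor)
    return total


def matmul(a, b):
    """Plain matrix product of two square integer matrices."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def is_unimodular(m):
    """True if m is a square integer matrix with |det(m)| == 1."""
    square = all(len(row) == len(m) for row in m)
    integral = all(isinstance(x, int) for row in m for x in row)
    return square and integral and abs(det(m)) == 1


# Assumed 2-D examples of the three elementary transformations.
INTERCHANGE = [[0, 1], [1, 0]]   # swap the two loop indices
SKEW = [[1, 0], [1, 1]]          # add the first index to the second
REVERSAL = [[1, 0], [0, -1]]     # negate the second index

if __name__ == "__main__":
    combined = matmul(INTERCHANGE, SKEW)
    print(combined, is_unimodular(combined))
```

Because determinants multiply, any product of such matrices is again unimodular, which is why a combination of the three transformations can be represented by a single matrix.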


Chain-Based Scheduling: Part I -- Loop Transformations and Code .. - Peiyi Tang (1992)

....parallel codes for the rectangular tiles as will be seen shortly. [Figure 3. Nested loop after skewing] Loop skewing is a loop transformation originally used to improve vectorization [8]. Loop skewing and two other loop transformations, loop interchange and loop reversing, have been recently unified into a single loop transformation called unimodular transformation [4, 9]. An n × n integer matrix T is unimodular if its determinant is ±1, i.e. |det(T)| = 1. A unimodular ....

M. Wolfe, "Loop Skewing: the Wavefront Method Revisited," Kuck and Associate, Inc., 1987.


Scheduling And Behavioral Transformations For Parallel Systems - Chao (1993)   (16 citations)

....among cells is explored. Data dependency graphs are drawn on points in the iteration space, which is composed of discrete points representing iterations. This iteration space approach modifies the order in which iterations are executed without changing the structure of an iteration. Loop skewing [80] and loop quantization [1] are two techniques of this class. Another approach tries to explore instruction level parallelism by looking at the structure of each cell. The technique of Doacross loop [17] belongs to this class. Doacross loop computes a time interval (initiation interval) for ....

Wolfe, M. Loop skewing: the wavefront method revisited. International Journal of Parallel Programming 15, 4 (Aug. 1986), 284--294.


The ParaScope Parallel Programming Environment - Cooper (1993)   (43 citations)

....of the transformation. Ped supports a large set of transformations that have proven useful for introducing, discovering, and exploiting parallelism. Ped also supports transformations for enhancing the use of the memory hierarchy. These transformations are described in detail in the literature [36, 54, 55, 56, 49, 57, 58, 59]. Figure 4 shows a taxonomy of the transformations supported in Ped. Reordering transformations change the order in which statements are executed, either within or across loop iterations. They are safe if all dependences in the original program are preserved. Reordering transformations are used ....

M. J. Wolfe, "Loop skewing: The wavefront method revisited," International Journal of Parallel Programming, vol. 15, pp. 279--293, Aug. 1986.


Implementation of Fourier-Motzkin Elimination - Bik, Wijshoff (1994)   (12 citations)

....applied. This problem is still an important research topic [WS90]. An important step forwards in solving the phase ordering problem has been accomplished by the observation that any combination of the iteration level loop transformations loop interchanging, loop skewing and loop reversal (see e.g. [AK87, Ban93, PW86, Pol88, Wol86, Wol88, Wol89, Zim90]) can be represented by a unimodular matrix [Ban90, Ban93, Dow90, WL91]. The advantage of this approach is that the order and validity of individual transformations becomes irrelevant, because a unimodular transformation can be constructed directly for a particular goal provided that dependence ....

Michael J. Wolfe. Loop skewing: The wavefront method revisited. International Journal of Parallel Programming, Volume 15:279--293, 1986.


Tiling of Iteration Spaces for Multicomputers - Ramanujam Sadayappan (1992)   (20 citations)

.... we reiterate a fundamental result due to Karp et al. [13] (dealing with parallel execution of computations expressed as uniform recurrence equations) and Lamport [19] (dealing with parallel execution of nested loops in FORTRAN); this result has been used by Moldovan [9], Miranker [21], Wolfe [29] and others. Lamport considered partitioning of the indices I = (I1, I2, ..., In) such that they lie on a family of parallel hyperplanes such that all indices lying on one hyperplane can be executed simultaneously. Let H be a vector (t1, t2, ..., tn). Then, t1 x1 + t2 x2 + ⋯ ....

M. Wolfe, "Loop Skewing: The Wavefront Method Revisited," International Journal of Parallel Programming, Vol. 15, No. 4, 1986, pp. 279-294.


Memory-Hierarchy Management - Carr (1992)   (29 citations)

....claim. He does not consider increasing the intelligence of the compiler to improve its effectiveness. Lam and Wolf present a framework for determining memory usage within loop nests and use that framework to apply loop interchange, loop skewing, loop reversal, tiling and unroll and jam [WL91, Wol86b]. Their method does not work on non perfectly nested loops and does not encompass a technique to determine unroll and jam amounts automatically. Additionally, they do not necessarily derive the best block algorithm with their technique, leaving the possibility that suboptimal performance is still ....

M. Wolfe. Loop skewing: The wavefront method revisited. Journal of Parallel Programming, 1986.


Interactive Parallel Programming Using the ParaScope Editor - Kennedy, McKinley, Tseng (1991)   (40 citations)

....• Loop skewing adjusts the iteration space of two perfectly nested loops by shifting the work per iteration in order to expose parallelism. When possible, Ped computes and suggests the optimal skew degree. Loop skewing may be used with loop interchange in Ped to perform the wavefront method [38, 54]. • Loop reversal reverses the order of execution of loop iterations. • Loop adjusting adjusts the upper and lower bounds of a loop by a constant. It is used in preparation for loop fusion. • Loop fusion can increase the granularity of parallel regions by fusing two contiguous loops when ....

M. J. Wolfe. Loop skewing: The wavefront method revisited. International Journal of Parallel Programming, 15(4):279--293, August 1986.


Automatic and Interactive Parallelization - McKinley (1994)   (14 citations)

....Both may be used to calculate loop carried dependences. Additionally, direction vectors are sufficient to determine the safety and profitability of loop interchange [AK87, Wol82]. Distance vectors are often required by other transformations that exploit parallelism [Ban90b, KMT91a, Lam74, WL90, Wol86] and improve data locality [CCK90, KMT91a, GJG87]. Data dependence also characterizes reuse of individual memory locations [CCK90]. 2.2 Interprocedural dependence analysis: The presence of procedure calls complicates the process of analyzing dependences. Without interprocedural analysis worst ....

....set of transformations that have proven useful for introducing, discovering, and exploiting parallelism. Ped also supports transformations for improving data locality. Each transformation is briefly introduced below. Many are found in the literature [AC72, AK87, CCK90, KM90, KMT91b, KKLW84, Lov77, Wol86]. In Ped, their novel aspect is the analysis of their applicability, safety, profitability and the incremental updates of source and dependence information. We classify the transformations implemented in Ped as follows. Reordering Transformations: Loop Distribution, Loop Interchange, Loop ....

[Article contains additional citation context not shown here]

M. J. Wolfe. Loop skewing: The wavefront method revisited. International Journal of Parallel Programming, 15(4):279--293, August 1986.


Iteration Space Tiling for Distributed Memory Machines - Ramanujam, Sadayappan (1992)   (1 citation)

....shown. atomically, the (1, 1) is automatically covered by the other two. Therefore, the tiles in the Tile Space Graph (TSG) have uniform dependence vectors (0, 1) and (1, 0). For scheduling of tiles, the scheduling hyperplane must satisfy the conditions of Lamport's result on wavefront execution [17, 26]. The scheduling hyperplane given by (1, 1), which is a line at an angle of 45 degrees, satisfies the conditions for a valid schedule. Figure 5 shows the wavefront schedule for 2 D tile spaces. If both tile space vectors are present, (1, 1) defines the optimal scheduling of tiles. The computations ....

M. Wolfe, "Loop Skewing: The Wavefront Method Revisited," International Journal of Parallel Programming, Vol. 15, No. 4, 1986, pp. 279-294.
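
The scheduling condition described in the context above can be illustrated with a short sketch. It assumes the usual reading of a valid wavefront schedule, namely that the hyperplane vector has a strictly positive dot product with every dependence vector, and groups tiles of a small rectangular tile space by the value of that dot product. The names `DEPS`, `H`, `is_valid_schedule`, and `wavefronts` are hypothetical and introduced only for this example.

```python
# Uniform tile dependence vectors and scheduling hyperplane from the excerpt.
DEPS = [(0, 1), (1, 0)]
H = (1, 1)


def is_valid_schedule(h, deps):
    """Assumed validity test: every dependence must advance the schedule time."""
    return all(h[0] * d[0] + h[1] * d[1] > 0 for d in deps)


def wavefronts(rows, cols, h=H):
    """Group tile coordinates (i, j) by schedule time t = h . (i, j)."""
    fronts = {}
    for i in range(rows):
        for j in range(cols):
            fronts.setdefault(h[0] * i + h[1] * j, []).append((i, j))
    # Tiles within one front have no dependences among them under DEPS.
    return [fronts[t] for t in sorted(fronts)]


if __name__ == "__main__":
    print(is_valid_schedule(H, DEPS))
    for step, front in enumerate(wavefronts(3, 4)):
        print(step, front)
```

For a 3 × 4 tile space this yields six fronts, each of which could in principle be executed in parallel once the previous front has finished.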


Beyond Induction Variables - Wolfe (1992)   (60 citations)  Self-citation (Wolfe)

.... when the lower limit contains other variables, as shown by the work on Parafrase [SLY90] For this reason, and because it can adversely affect the kinds of transformations allowed in programs (such as loop interchanging) this author has in the past argued against implementing loop normalization [Wol86] It is interesting to note that this formulation of induction variables essentially normalizes all loops. One example used to argue against normalization is: L23: for i = 1 to n loop L24: for j = i 1 to n loop A(i,j) A(i 1,j) Modern dependence analysis applied to this example will typically ....

....coarse [IT88]. A direction vector encodes the sign of the elements of the distance vector; while less precise in general, it can be used effectively when the distance vector is not constant. It may also force compilers to implement loop skewing and loop interchanging as a single transformation [Wol86]; this has been formulated in the past as a linear transformation on the index set [KMW67] and is currently in vogue as unimodular transformations [WL91, Ban91]. 7 Summary and Conclusions: We have shown a fast and simple algorithm for classifying all induction variables in a loop; this ....

Michael Wolfe. Loop skewing: The wavefront method revisited. International J. Parallel Programming, 15(4):279--294, August 1986.


Scalar vs. Parallel Optimizations - Wolfe (1990)   Self-citation (Wolfe)

....There has been a great deal of work on compiler optimizations for parallel computers. We chose a small set of four transformations to illustrate the problems that can arise when trying to interleave parallel and scalar optimizations. Other transformations that are also relevant are loop skewing [Wol86], loop collapsing and loop coalescing [Pol87], loop reversal, index set splitting, loop unrolling [DoH79], scalarization, alignment and replication [ACK87], and so on. Note that the goal here is not so much automatic detection of parallelism from sequential programs, but automatic generation of ....

M. Wolfe, Loop Skewing: The Wavefront Method Revisited, Intl J. Parallel Programming 15, 4 (August 1986), 279-294.


Auto-CFD: Efficiently Parallelizing CFD Applications on Clusters - Li Xiao Xiaodong

No context found.

W. Wolfe, "Loop Skewing: The Wavefront Method Revisited", International Journal on Parallel Programming, Vol. 15, No. 4, 1986, pp. 279-293.


Adaptable Parallel Components - For Grid Programming

No context found.

M. Wolfe. Loop skewing: the wavefront method revisited. In Journal of Parallel Programming, Volume 15, pages 279--293, 1986.


Using Code Parameters for Component Adaptations - Jan Dunnweber Sergei (2006)

No context found.

M. Wolfe. Loop skewing: the wavefront method revisited. In Journal of Parallel Programming, Volume 15, pages 279--293, 1986.


Improving Register Allocation for Subscripted Variables - Callahan, Carr, Kennedy (1990)   (120 citations)

No context found.

M. Wolfe. Loop skewing: The wavefront method revisited. Journal of Parallel Programming, 15(4):279--293, Aug. 1986.

CiteSeer.IST - Copyright Penn State and NEC