10 citations found.
Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew. An empirical study on array subscripts and data dependencies. In


This paper is cited in the following contexts:
Automatic Distribution of Data and Computations - Feautrier (2000)   (1 citation)

....There are reasons to believe, firstly, that this is the only class of programs which have a well defined compilation algorithm toward parallel computers. Less constrained programs can be handled either by approximate methods [CBF95] or by run time parallelization methods. Secondly, the authors of [SLY89] have shown that a large proportion of numerical programs (about 80%) belongs to the static control class. An important research domain deals with methods for converting some subclasses of non static control programs to static control. Relevant methods include elimination of GOTO [Amm92] ....

Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew. An empirical study on array subscripts and data dependencies. In


Performance Analysis of Multiprocessors Memory System - Temam

....l ) we want to find a vector $x$ such that $y = f_l(x)$. Note that, if $F$ is singular, there exist multiple such vectors $x$. Proposition 3.1 Let $y \in \mathrm{Im}(f_l)$ a vector of dimension $r$, then $\exists\, U'_1$ an $r \times n$ matrix, $\exists\, x'_1$ a vector of dimension $r$ such that $x'_1 = U'_1 y$. 1 As shown in [8], this type of indices encompass about 90% of the subscripts found in real numerical codes. Proof If y f 0 2 f(P ) then $y \in \mathrm{Im}(f_l)$. Let $y' = U^{-1} y$ and $x' = V x$, then $y' = (y'_1, y'_2)^T = (I_r x'_1, 0)^T$, i.e. $y'_1 = x'_1$, where $x'_1$ and $y$ ....

Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew. An Empirical Study on Array Subscripts and Data Dependencies. Technical Report 840, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, August 1989.


Dataflow Analysis of Array and Scalar References - Feautrier (1991)   (126 citations)

....these unsolved points have been noted where appropriate. Extending the technique to languages with fewer restrictions than we introduced in section 2.2 would be highly interesting. Some estimate of the applicability of our technique may be deduced from the statistics of Zhiyu Shen et al. [23]. The main difficulty is non linear indices. In this paper, which analyses more than 100 000 lines of code, about 53% of all indices are found to be linear, about 13% are partially linear, and the remaining 34% are non linear. An index is classified as partially linear as soon as it contains a ....

Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew. An empirical study on array subscripts and data dependencies. In 1989 Int. Conf. on Parallel Processing, pages II 145--152, 1989.


Execution of Regular DO Loops on Asynchronous Multiprocessors - Ouyang

....vectors $d_1 = [0, 2, 3]$, $d_2 = [1, -1, 2]$ and $d_3 = [3, 1, 1]$. Let $d_i$ be denoted by $[d_{i1}, d_{i2}, d_{i3}]$ for $1 \le i \le 3$. The algorithm recursively divides each dimension into regions until it reaches the last dimension. Initially, dimension 1 is divided into three regions $[1,1]$, $[2,3]$ and $[4,10]$ according to $d_{11}$, $d_{21}$ and $d_{31}$. By doing this way, we have • For each iteration in the subspace $[1,1] \times [1,10] \times [1,10]$, it may depend on other iterations only via $d_1$; • For each iteration in the subspace $[2,3] \times [1,10] \times [1,10]$, it may depend on other ....

....on other iterations only via $d_1$ or $d_2$; • For each iteration in the subspace $[4,10] \times [1,10] \times [1,10]$, it may depend on other iterations via $d_1$, $d_2$, or $d_3$. With dimension 1 restricted to the region $[1,1]$, dimension 2 will be divided, using $d_{12}$ only, into $[1,2]$ and $[3,10]$. By doing this way, we have • For each iteration in the subspace $[1,1] \times [1,2] \times [1,10]$, it may not depend on any other iterations; • For each iteration in the subspace $[1,1] \times [3,10] \times [1,10]$, it may depend on other iterations only via $d_1$. Finally, with ....
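
The region splitting described in the two excerpts above can be illustrated with a minimal Python sketch. It is not the cited paper's algorithm, only one reading of the excerpt: it assumes that an iteration at index i along a dimension can depend on another iteration via a vector d only when the source index i - d still lies inside the loop bounds. The function name `split_dimension` and the constant `VECTORS` are hypothetical names introduced for this illustration.

```python
from typing import Dict, List, Tuple

# Dependence vectors from the excerpt (the names d1..d3 are labels for this sketch).
VECTORS: Dict[str, Tuple[int, int, int]] = {
    "d1": (0, 2, 3),
    "d2": (1, -1, 2),
    "d3": (3, 1, 1),
}


def split_dimension(
    lo: int, hi: int, dim: int, vectors: Dict[str, Tuple[int, ...]]
) -> List[Tuple[Tuple[int, int], Tuple[str, ...]]]:
    """Split the index range [lo, hi] of one dimension into maximal regions
    in which the same set of dependence vectors can apply.

    Assumption (not stated in the excerpt): a vector d applies at index i
    only if the source index i - d[dim] is inside [lo, hi].
    """

    def active(i: int) -> Tuple[str, ...]:
        # Names of the vectors whose source index stays within bounds.
        return tuple(
            name for name, d in vectors.items() if lo <= i - d[dim] <= hi
        )

    regions = []
    start, current = lo, active(lo)
    for i in range(lo + 1, hi + 1):
        now = active(i)
        if now != current:
            # The applicable set changed: close the region that just ended.
            regions.append(((start, i - 1), current))
            start, current = i, now
    regions.append(((start, hi), current))
    return regions


if __name__ == "__main__":
    # Dimension 1 (index 0 here), all three vectors considered.
    print(split_dimension(1, 10, 0, VECTORS))
    # Dimension 2 (index 1 here) inside region [1,1] of dimension 1,
    # where only d1 can still apply.
    print(split_dimension(1, 10, 1, {"d1": VECTORS["d1"]}))
```

Under these assumptions the first call yields the regions [1,1], [2,3] and [4,10] for dimension 1, and the second yields [1,2] and [3,10] for dimension 2, matching the regions named in the excerpt. The final split of dimension 3 is elided in the excerpt and is not reproduced here.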

[Article contains additional citation context not shown here]

Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew, "An Empirical Study on Array Subscripts and Data Dependencies," Proc. Int. Conf. Parallel Processing, vol. II, pp. 145-152, 1989.


Cache Miss Equations: An Analytical Representation of Cache Misses - Ghosh (1997)   (70 citations)

....loops or DO loops. We limit subscript expressions of array references to affine or linear combinations of the loop indices. This restriction is not too stringent in practice as most of the array references commonly found in numerical or scientific codes have subscript expressions of the above type [ZLY89, Wol92]. Similarly, the bounds of a loop index are assumed to be linear combinations of the indices enclosing that loop. We have also assumed that the loops contain no conditional expressions. This implies that the entire loop body is executed in every iteration of the loop. Since virtually all loops ....

S. Zhiyu, Z. Li, and P. C. Yew. An Empirical Study on Array Subscripts and Data Dependencies. Technical Report 840, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, August 1989.


Cache Awareness in Blocking Techniques, Part II - Temam, Fricker, Jalby

....7 This technical report is not referenced here because the review process is anonymous. 8 Note that only simple dependences are considered; for instance the reuse associated with reference A(j1 j2) is not considered. The most frequently found dependences are simple ones, as mentioned in [8]. Definition 8.2.4 $\|TRS(R)\|$ is the size of the theoretical reuse set of $R$ expressed in array lines, i.e. the number of cache lines used by $TRS(R)$ assuming an infinite cache size. Definition 8.2.5 $\|ARS(R)\|$ is the size of the actual reuse set of $R$ expressed in cache lines. Example Consider ....

....corresponds to a 3 deep loop nest where reuse occurs on the third loop. For such a reuse set, the algorithm needs only to be applied once (except if the access stride is not equal to one) Reuse sets defined over three loops (i.e. 4 deep loop nests) are less common in primitives and real codes [8], or are less likely to be exploited (for instance, loop blocking is rarely performed over more than the two innermost loops) 8.3 Estimating the Optimal Block Size Section 8.3.1 describes a simple heuristic for evaluating TLB misses. Section 8.3.2 is the most important one: it is shown how to ....

Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew. An Empirical Study on Array Subscripts and Data Dependencies. Technical Report 840, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, August 1989.


Using Virtual Lines to Enhance Locality Exploitation, Part III - Temam, Jegou (1994)   (8 citations)

....exploiting the spatial locality of array D will generally outperform the cache pollution it induces. Cache line size as a performance bottleneck In numerical codes, the ability to use large cache lines could significantly increase performance. Indeed, array references are often stride 1 accesses [16], and besides, arrays without temporal locality are also commonly found (like a matrix access in a matrix vector multiply primitive) The only way to reduce the number of cache misses of such references (called cold start misses) is to use large line sizes. This is all the more true that ....

....used based on the address being currently referenced and the stride, and then prefetch the corresponding data. Though simulations proved the efficiency of such schemes, they require relatively heavy and complex implementations. Besides, array references within a loop nest are often stride 1 (see [16]) making complex stride detection mechanism less necessary. Combining the virtual line scheme with prefetching The virtual line scheme actually constitutes a convenient architecture base for introducing simple stride prefetching. Consider a physical line of a virtual line that has been stored in ....

Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew. An Empirical Study on Array Subscripts and Data Dependencies. Technical Report 840, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, August 1989.


Cache Interference Phenomena - Temam, Fricker, Jalby (1994)   (53 citations)

....if N 4, it is equal to $8 - N$ cache lines if 4 N 8, and it is empty if 8 N . 5 Note that only simple dependences are considered; for instance the reuse associated with reference A(j 1 j 2 ) is not considered. The most frequently found dependences are simple ones, as mentioned in [17]. 6 Floor and ceiling functions have often been omitted in this paper because experiments showed they generally don't have a significant impact on precision. 3.3 Interference Set The definition of the set of array elements that can interfere with a reuse set is very similar to the definition ....

....defined over two loop levels corresponds to a 3 deep loop nest where reuse occurs on the third loop. For such a reuse set, the algorithm needs only to be applied once (except if the access stride is not equal to one) Reuse sets defined over three loops are less common in primitives and real codes [17], or are less likely to be exploited (for instance, loop blocking is rarely performed over more than the two innermost loops) 3.5 Self Interferences Assume the reuse loop level is l for reference R. Proposition 3.4 The number of cache misses due to self interferences of R is equal to N 1 Theta ....

Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew. An Empirical Study on Array Subscripts and Data Dependencies. Technical Report 840, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, August 1989.


Cache Awareness in Blocking Techniques - Temam, Fricker, Jalby (1998)   (1 citation)

....of simplicity, a number of restrictive hypotheses have been adopted on the loop nests considered in this study. First, only uniformly generated dependences with constant dependence distances are considered. This excludes subscripts such as A(j 1 j 2 ) for instance. Note that previous studies [13] show that such hypotheses still encompass a large majority of the subscripts found in numerical codes. Second, the loop boundaries are assumed to be constant. The consequence of both hypotheses is to have constant dependence distances. Extending the model to linear dependence distances (the most ....

Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew. An Empirical Study on Array Subscripts and Data Dependencies. Technical Report 840, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, August 1989.


Using Virtual Lines to Enhance Locality Exploitation - Temam, Jegou (1994)   (8 citations)

....exploiting the spatial locality of array D will generally outperform the cache pollution it induces. Cache line size as a performance bottleneck In numerical codes, the ability to use large cache lines could significantly increase performance. Indeed, array references are often stride 1 accesses [17], and besides, arrays without temporal locality are also commonly found (like a matrix access in a matrix vector multiply primitive) The only way to reduce the number of cache misses of such references (called cold start misses) is to use large line sizes. This is all the more true that efficient ....

....used based on the address being currently referenced and the stride, and then prefetch the corresponding data. Though simulations proved the efficiency of such schemes, they require relatively heavy and complex implementations. Besides, array references within a loop nest are often stride 1 (see [17]) making complex stride detection mechanism less necessary. Combining the virtual line scheme with prefetching The virtual line scheme actually constitutes a convenient architecture base for introducing simple stride prefetching. Consider a physical line of a virtual line that has been stored ....

Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew. An Empirical Study on Array Subscripts and Data Dependencies. Technical Report 840, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, August 1989.

