L. Choi. Hardware and Compiler Support for Cache Coherence in Large-Scale Multiprocessors. PhD thesis, University of Illinois at Urbana-Champaign, Center for Supercomputing R & D, March 1996.
....to address this limitation. 4.1 Stale Reference Analysis Three main program analysis techniques are used in stale reference analysis: stale reference detection, array data flow analysis, and interprocedural analysis. Extensive algorithms for these techniques were previously developed [6, 7] and implemented using the Polaris parallelizing compiler [25]. We make use of these algorithms in the CCDP scheme. Since the detailed algorithms are described in [6], we will only state the functions of the stale reference analysis techniques. To find the potentially stale data references, it is necessary to detect the memory reference sequences which might violate cache coherence. The stale reference detection algorithm [6] accomplishes this by performing ....
L. Choi. Hardware and Compiler Support for Cache Coherence in Large-Scale Multiprocessors. PhD thesis, University of Illinois at Urbana-Champaign, Center for Supercomputing R & D, March 1996.
....for the CCDP scheme are designed to address this limitation. 4.1. Stale Reference Analysis Three main program analysis techniques are used in stale reference analysis: stale reference detection, array dataflow analysis, and interprocedural analysis. We make use of extensive algorithms [4, 5] which were previously developed for these techniques. Since the detailed algorithms are described in [4], we will only state the functions of the stale reference analysis techniques. To find the potentially stale data references, it is necessary to detect the memory reference sequences which might violate cache coherence. The stale reference detection algorithm [4] accomplishes this by performing ....
L. Choi. Hardware and Compiler Support for Cache Coherence in Large-Scale Multiprocessors. PhD thesis, University of Illinois at Urbana-Champaign, Center for Supercomputing R & D, Mar. 1996.
....stores multiple words, which might lead to implicit RAW (read-after-write) and WAW (write-after-write) data dependencies. As a result, a write reference followed by a read reference during a later epoch to data which are mapped on the same cache line is also a possible stale reference condition [5]. These stale reference sequences can be detected by the compiler. 3.1.3 Cache Coherence Enforcement by Data Prefetching 3.1.3.1 Cache coherence schemes Cache coherence schemes use a combination of techniques for the detection, prevention, and avoidance of stale references [6]. Most ....
....ensure correctness. 4 Compiler Support 4.1 Stale Reference Analysis Three main program analysis techniques are used in stale reference analysis: stale reference detection, array data flow analysis, and interprocedural analysis. Extensive algorithms for these techniques were previously developed [5, 6] and implemented using the Polaris parallelizing compiler [27]. We make use of these algorithms in the CCDP scheme. Since the detailed algorithms are described in [5], we will only discuss the main ideas and functions of the stale reference analysis techniques used. To find the potentially stale ....
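The stale reference condition described in the first excerpt above (a write followed by a read in a later epoch to data mapped on the same cache line) can be illustrated with a small Python sketch. It is a hypothetical, simplified model and not the detection algorithm of the cited thesis: references are assumed to arrive in program order tagged with an epoch number, the cache line size (`words_per_line`) is an assumed parameter, and processor assignment is ignored, so the result is conservative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    epoch: int  # epoch in which the reference executes (hypothetical model)
    op: str     # "R" for read, "W" for write
    addr: int   # word address


def potentially_stale_reads(refs, words_per_line=4):
    """Return indices of reads that follow, in a strictly later epoch, a write
    to any word on the same cache line. This covers the explicit (same word)
    and implicit (same line) dependences; processors are not distinguished,
    so every such read is flagged conservatively."""
    first_write_epoch = {}  # cache line -> earliest epoch with a write seen so far
    stale = []
    for i, r in enumerate(refs):
        line = r.addr // words_per_line
        if r.op == "W":
            first_write_epoch[line] = min(first_write_epoch.get(line, r.epoch), r.epoch)
        elif r.op == "R":
            w = first_write_epoch.get(line)
            if w is not None and w < r.epoch:
                stale.append(i)
    return stale
```

For instance, with four words per line, a write to word 0 in epoch 0 makes a read of word 2 in epoch 1 potentially stale (implicit dependence on the shared line), while a read of word 1 in the same epoch 0 or of word 5 (another line) is not flagged.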
L. Choi. Hardware and Compiler Support for Cache Coherence in Large-Scale Multiprocessors. PhD thesis, University of Illinois at Urbana-Champaign, Center for Supercomputing R & D, March 1996.
....stale reference analysis algorithm identifies the potentially stale data references in a program by detecting stale reference sequences. Three main program analysis techniques are used in stale reference analysis: stale reference detection, array data flow analysis, and interprocedural analysis [1]. 3.2 Prefetch Target Analysis Since prefetch operations introduce instruction execution and network traffic overheads, it is important to minimize the number of unnecessary prefetches. The prefetch target analysis algorithm determines which potentially stale references should be prefetched. The ....
L. Choi. Hardware and Compiler Support for Cache Coherence in Large-Scale Multiprocessors. PhD thesis, Center for Supercomputing R & D, UIUC, March 1996.
....4 Compiler Support 4.1 Stale Reference Analysis Three main program analysis techniques are used in stale reference analysis: stale reference detection, array dataflow analysis, and interprocedural analysis. Extensive algorithms for these techniques were previously developed by Choi and Yew [3], and they have been implemented using the Polaris parallelizing compiler [12]. We make use of these algorithms in the CCDP scheme. The interested reader can refer to [3] for the details of these algorithms. 4.2 Prefetch Target Analysis Our prefetch target analysis algorithm makes use of simple heuristics which are easy to implement and are likely to be effective. The algorithm starts by including all potentially stale references of the program in the target set ....
L. Choi. Hardware and Compiler Support for Cache Coherence in Large-Scale Multiprocessors. PhD thesis, University of Illinois at Urbana-Champaign, Center for Supercomputing R & D, Mar. 1996.
.... [Figure 2: Epochs and stale data references in a parallel program. Labels: data word prefetched on a cache miss; WAR; implicit WAW; (a) Potentially stale reference marked at compile time; Time Read W(i), 2; Time Read X(i), 3.] ....a modified def-use chain analysis used in standard data flow analysis techniques [15, 16]. Software cache bypass scheme (SC): Once all the potentially stale data references are identified by the compiler, cache coherence can be enforced if we guarantee that all such references access up-to-date data from main memory rather than from the stale cache copies. This can be enforced using a ....
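The software cache bypass (SC) idea in the excerpt above can be sketched as follows. This is a hypothetical single-processor model with assumed names (`BypassCache`, `potentially_stale`); it only shows that a compiler-marked read served from main memory avoids a stale cached copy, and it does not model the cited scheme's hardware or code generation.

```python
class BypassCache:
    """Hypothetical private cache with no hardware coherence: cached copies
    can go stale when other processors update shared main memory."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory: address -> value
        self.lines = {}       # private cache: address -> cached value

    def read(self, addr, potentially_stale=False):
        if potentially_stale:
            # Compiler-marked reference: bypass the cache and read main memory.
            return self.memory[addr]
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]  # ordinary miss: allocate the copy
        return self.lines[addr]

    def write(self, addr, value):
        # Write-through (an assumption here) keeps main memory current for others.
        self.lines[addr] = value
        self.memory[addr] = value
```

In a scenario where the processor first reads X = 1 and another processor then writes X = 2 to main memory, an unmarked read still returns the cached 1, while a read marked `potentially_stale=True` returns 2.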
....are identified by finding the first occurrence of upwardly exposed uses in an epoch in our algorithm. .... variables W and X, assuming they are modified 2 and 3 epochs before the procedure Q returns. The details of the compiler algorithms are beyond the scope of this paper and are described in [15, 16]. Compiler implementation: All the compiler algorithms explained earlier have been implemented on the Polaris parallelizing compiler [33]. Figure 6 shows the flowchart of our reference marking algorithm. First, we construct a procedure call graph. Then, based on the bottom-up scan of the call ....
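The first step of the reference marking flow quoted above, building a procedure call graph and scanning it bottom-up, can be sketched as a post-order depth-first traversal, so that each procedure is visited after the procedures it calls. The excerpt is cut off before saying what is computed per procedure, so this Python sketch only produces the visiting order; the dictionary representation of the graph and the function name are assumptions.

```python
def bottom_up_order(call_graph, root):
    """Post-order DFS from root over a call graph given as
    {procedure: [callees]}. Each procedure appears after every procedure it
    (transitively) calls, except along recursive back edges, which are skipped."""
    order, visited = [], set()

    def visit(proc):
        visited.add(proc)
        for callee in call_graph.get(proc, ()):
            if callee not in visited:
                visit(callee)
        order.append(proc)  # all reachable callees have already been appended

    visit(root)
    return order
```

Recursive calls (cycles in the call graph) would need extra handling, for example by collapsing strongly connected components first; the sketch simply skips procedures that are already being visited.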
Lynn Choi. Hardware and Compiler Support for Cache Coherence in Large-Scale Multiprocessors. Ph.D. thesis, Computer Science Dept., University of Illinois, Mar. 1996.
....interprocedural analysis. Time reads with offset 2 and 3 are generated for the last two read references to the variables W and X, assuming they are modified 2 and 3 epochs before the procedure Q returns. The details of the compiler algorithms are beyond the scope of this paper and are described in [8]. Compiler implementation: All the compiler algorithms explained earlier have been implemented on the Polaris parallelizing compiler [17]. Figure 2 shows the flowchart of our reference marking algorithm. First, we construct a procedure call graph. Then, based on the bottom-up scan of the call ....
....We also calculate more precise offsets. We then transform the program in the GSA form back to the original program with the reference marking information, and generate appropriate cache and memory operations. The guarded execution technique can be used to further optimize the code generation [8]. The compiler marking algorithms developed here are general enough to be applicable to other compiler-directed coherence schemes [5, 6]. 3 Hardware Implementation Issues [Figure: hardware organization. Labels: CPU; Epoch Counter; Secondary Data Cache with Tag, V, Timetag0, Word 0, V, Timetag1, Word 1; TIME READ ADDR, OFFSET; Address Bus; Data Bus.] ....
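The Time-Read operation with an offset (the excerpts above mention time reads with offsets 2 and 3 for W and X, plus TIME READ ADDR, OFFSET and Epoch Counter labels) can be sketched as a freshness check on a per-word timetag. This Python model is an assumption, not the cited hardware: the timetag is taken to be the epoch in which the copy was cached, and a copy is treated as fresh only if it was cached strictly after epoch `epoch_counter - offset`, the most recent epoch in which the data may have been written.

```python
from dataclasses import dataclass


@dataclass
class CachedWord:
    value: int
    timetag: int  # epoch in which this copy entered the cache (assumed meaning)


def time_read(cache, memory, addr, offset, epoch_counter):
    """Hypothetical Time-Read(addr, offset). Returns (value, hit).
    The data may have been written as recently as epoch
    epoch_counter - offset; only a copy cached after that epoch is used."""
    word = cache.get(addr)
    if word is not None and word.timetag > epoch_counter - offset:
        return word.value, True  # copy is newer than the last possible write
    # Possibly stale or absent: fetch from memory and retag with the current epoch.
    cache[addr] = CachedWord(memory[addr], epoch_counter)
    return memory[addr], False
```

With the current epoch at 7, a copy of W tagged 5 fails a time read with offset 2 (a write may have happened in epoch 5) and is re-fetched, whereas a copy of X tagged 6 passes a time read with offset 3.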
Lynn Choi. Hardware and Compiler Support for Cache Coherence in Large-Scale Multiprocessors. Ph.D. thesis, Computer Science Dept., University of Illinois, Mar. 1996.
....Feautrier [14] gave an algorithm to calculate them exactly. Pugh [21] developed some exact techniques that are substantially faster than Feautrier's. Our implementation is based on the regular section analysis [5], which is less accurate but allows large programs to be analyzed efficiently. In [10], we discuss how our algorithm can be applied to existing compiler-directed coherence schemes [6, 9]. 6. Conclusion Private caches can greatly improve the performance of large-scale shared-memory multiprocessors if they can be used to cache remote shared data. However, maintaining cache coherence ....
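The trade-off mentioned above, regular section analysis being less accurate than exact techniques but efficient, can be illustrated with a simplified bounded-section representation in Python. This is an assumed simplification, not the implementation cited as [5]: each array section is a per-dimension inclusive (lower, upper) range with strides omitted, and the union of two sections is approximated by their bounding box, which may report overlaps that do not exist.

```python
from typing import Tuple

# Per-dimension inclusive (lower, upper) bounds; strides omitted in this sketch.
Section = Tuple[Tuple[int, int], ...]


def union(a: Section, b: Section) -> Section:
    """Conservative union: the per-dimension bounding box of both sections."""
    return tuple((min(al, bl), max(au, bu)) for (al, au), (bl, bu) in zip(a, b))


def may_overlap(a: Section, b: Section) -> bool:
    """False only if some dimension's ranges are disjoint; otherwise the
    sections may share elements (exact for boxes, approximate after union)."""
    return all(al <= bu and bl <= au for (al, au), (bl, bu) in zip(a, b))
```

For example, the union of A(1:10) and A(21:30) is approximated as A(1:30), so a later read of A(15:18) is conservatively reported as overlapping even though neither original section contains it. Keeping only a fixed-size summary per array is what makes this cheap for large programs.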
L. Choi and P.-C. Yew. Hardware and compiler support for cache coherence in large-scale multiprocessors. Technical Report, University of Illinois, Computer Science Department, Ph.D. Thesis, Feb. 1996.
CiteSeer.IST - Copyright Penn State and NEC