C.-H. Chi. Compiler Optimization Technique for Data Cache Prefetching Using a Small CAM Array. In Proceedings of the


This paper is cited in the following contexts:
Evaluating the Impact of Memory System Performance on Software.. - Badawy, al. (2001)   (5 citations)  (Correct)

....in isolation. Software prefetching for affine array accesses has been studied in [31, 22, 4]. Hardware prefetching [7, 34, 15, 14, 20] is similarly limited to affine array accesses, but uses hardware to identify the access pattern automatically. Prefetch engines for affine array accesses [44, 6, 10, 8] provide hardware support for prefetching, but rely on the programmer or compiler to identify the access pattern. Prefetching for pointer-chasing traversals uses one of four approaches. The first approach inserts additional pointers, called jump pointers, into dynamic data structures to connect ....

C.-H. Chi. Compiler Optimization Technique for Data Cache Prefetching Using a Small CAM Array. In Proceedings of the


Evaluating the Impact of Memory System Performance on.. - Badawy, Aggarwal.. (2001)   (5 citations)  (Correct)

....in isolation. Software prefetching for affine array accesses has been studied in [30, 22, 4]. Hardware prefetching [7, 34, 15, 14, 20] is similarly limited to affine array accesses, but uses hardware to identify the access pattern automatically. Prefetch engines for affine array accesses [44, 6, 10, 8] provide hardware support for prefetching, but rely on the programmer or compiler to identify the access pattern. Prefetching for pointer-chasing traversals uses one of three approaches. The first approach inserts additional pointers, called jump pointers, into dynamic data structures to connect ....

Chi-Hung Chi. Compiler Optimization Technique for Data Cache Prefetching Using a Small CAM Array. In Proceedings of the 1994 International Conference on Parallel Processing, pages I--263--I--266, August 1994.


Evaluating the Impact of Memory System Performance on .. - Aggarwal, Badawy.. (2000)   (5 citations)  (Correct)

....result for higher memory latencies. 7 Related Work Conventional software prefetching [31, 22, 5] has investigated prefetching for arrays. Work in hardware prefetching [8, 34, 16, 15, 20] is similarly limited to arrays, but uses hardware to identify what to prefetch automatically. Prefetch engines [43, 7, 11, 9] provide hardware support for prefetching, but rely on the programmer or compiler to identify the access pattern. Like hardware and software prefetching, prefetch engines have focused on arrays as well. Recently, researchers have begun investigating novel prefetching techniques for ....

Chi-Hung Chi. Compiler Optimization Technique for Data Cache Prefetching Using a Small CAM Array. In Proceedings of the 1994 International Conference on Parallel Processing, pages I--263--I--266, August 1994.


Compiling Techniques for Improving Decoupled Virtual Shared Memory.. - Zhu   (Correct)

....47, 58, 76, 62, 39] 1.4.3 Integrated Prefetching To obtain satisfactory performance, it is usually required to combine both hardware and software based prefetching schemes in one system. When one fails, the other is hoped to work. Four of these prefetching schemes can be found in the literature [17, 74, 21, 40]. 1.5 Decoupled Virtual Shared Memory Architecture The key to efficient prefetching is to attempt to predict the future usage of data and prefetch operands into local memory before they are needed [84]. Although working quite well for a range of programs, software based prefetching has problems ....

C. H. Chi. Compiler optimization technique for data cache prefetching using a small cam array. In International Conference on Parallel Processing, 1994.


A Data Prefetch Mechanism for Accelerating General-Purpose.. - Luddy Harrison (1994)   (4 citations)  (Correct)

....an effective technique for tolerating the cache-miss latency in high performance processors. Hardware, software, and hybrid hardware software schemes have all been explored for prefetching instructions and data references, both in the context of uniprocessors and multiprocessors [CB94a, BL94, CR94, Chi94a, Chi94b, DS95, Gor95, PK94, SMH94, YGHH94, DDS93, EV93, JT93, TE93, CB92, FPJ92, SH92, McF92, MLG92, Sel92, Skl92, VS92, CMH 92, BC91, CKP91, FP91, KL91, MG91, CMCH91, GGV90, Jou90, SR88, LYL87, Smi78]. Unfortunately, most published data prefetching schemes are generally limited to generating ....

Chi-Hung Chi. Compiler Optimization Technique for Data Cache Prefetching Using a Small CAM Array. In Proceedings of the 1994 International Conference on Parallel Processing, volume I, pages 263--266, August 1994.


Data Prefetch Mechanisms For Accelerating Symbolic And Numeric.. - Mehrotra (1996)   (10 citations)  (Correct)

....of long latency memory operations. Prefetching has been researched heavily, and for a long time. Hardware, software, and hybrid hardware software schemes have all been extensively explored for prefetching instructions and data references, both in the context of uniprocessors and multiprocessors [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]. 2.2 Instruction prefetching and branch prediction The instruction prefetching and branch prediction problems are closely related. Efficient solutions for both are critical, because they affect the degree to which speculative execution is effective in current machines. In their research, ....

....compromises in the cache reuse analysis framework that is used for inserting prefetches into dense loops. This is because the cache reuse analysis requires dependence testing on explicit array index expressions to reduce prefetching overhead. Finally, work by Gornish [15], Yamada [19], Chi [13], Chiueh [14], and Chen [23] is representative of what we characterize as hybrid data prefetching schemes. Such schemes use compiler assistance to either identify candidate loads for which historical information is recorded in on-chip cache structures, or to perform data structure transformations ....

C.-H. Chi, "Compiler Optimization Technique for Data Cache Prefetching Using a Small CAM Array," in Proceedings of the 1994 International Conference on Parallel Processing, vol. I, pp. 263--266, August 1994.


Interconnection Networks And Data Prefetching For Large-Scale.. - Kim (1995)   (Correct)

....is not very useful. In addition to the overhead of the prefetch instruction execution, prefetches tend to compete with loads or stores for access to load/store units, which are a relatively scarce resource in most processor architectures. 2.2.3 Hybrid Data Prefetching In hybrid prefetching [67, 68], code is produced at compile time to generate information on memory access patterns. The information is stored in a separate table and is later used by hardware to prefetch data. Such code is inserted outside a loop and is executed once for the entire loop. When the loop is executed, the table is ....

C.-H. Chi, "Compiler optimization technique for data cache prefetching using a small CAM array," in International Conference on Parallel Processing, vol. I, pp. 263--266, 1994.


Quantifying the Performance Potential of a Data Prefetch.. - Mehrotra, Harrison (1995)   (5 citations)  (Correct)

....in high performance processors. It is also an extensively researched subject. Hardware, software, and hybrid hardware software schemes have all been extensively explored for prefetching instructions and data references, both in the context of uniprocessors and multiprocessors 1 [CB95, CR94, Chi94a, Chi94b, PK94, YGHH94, EV93, JT93, FPJ92, SH92, McF92, MLG92, Sel92, Skl92, VS92, CKP91, KL91, CMCH91, Jou90, SR88, Smi78]. Unfortunately, most published data prefetching schemes are generally limited to generating prefetches for arithmetic progressions of memory addresses in loop nests, ....

....for multiprocessors have been elided. int i, m, a[100]; for (i = 0; i < 100; i++) A: m = m + a[i]; (Figure 1: A linear array traversal) been studied in the literature [SH92, McF92, Jou90, Smi78]. Several software and hybrid hardware software prefetching schemes have also been reported [APS95, Chi94a, Chi94b, YGHH94, MLG92, Sel92, CKP91, KL91, CMCH91]. The hybrid schemes either populate a stride table using software support, or reschedule the code being generated by the compiler to assist the hardware being proposed. The software schemes, of course, attempt to analyze the program's cache ....

Chi-Hung Chi. Compiler Optimization Technique for Data Cache Prefetching Using a Small CAM Array. In Proceedings of the 1994 International Conference on Parallel Processing, volume I, pages 263--266, August 1994.


Adaptive And Integrated Data Cache Prefetching For Shared-Memory.. - Gornish (1995)   (11 citations)  (Correct)

....this basic method to allow for software directed prefetching. The key difference between their scheme and ours is that, in their scheme, the prefetching degree is fixed by the degree of interleaving in the cache. Another scheme similar to our integrated prefetching method is presented by Chi in [38]. This scheme is less flexible than ours because it cannot handle NCSASs, and it only uses a lookahead of one iteration. Chi suggests how a larger lookahead can be used, but does not fully describe such an implementation. In addition, no performance results are presented. We note that this ....

C.-H. Chi, "Compiler optimization technique for data cache prefetching using a small cam array," in International Conference on Parallel Processing, 1994.


