149 citations found.
D. Callahan, K. Kennedy, and A. K. Porterfield, "Software prefetching," Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 40-52, 1991.


This paper is cited in the following contexts:


Simple and Effective Array Prefetching in Java - Cahoon, McKinley (2002)   (Correct)

....software controlled data prefetching to improve memory performance by tolerating cache miss latency. The goal of prefetching is to bring data into the cache before the demand access to that data. Prior research shows that software controlled prefetching is effective in array based Fortran programs [8, 19, 5, 16] We describe a new data flow analysis to identify loop induction variables, and a method to schedule prefetches for array references that contain induction variables in the index expression. We rely on a simplified form of common subexpression elimination to remove redundant prefetches. Our new ....

....analysis and loop transformations. We focus on Java arrays that contain features that make code and data transformations challenging compared to Fortran arrays. Callahan, Kennedy, and Porterfield present and evaluate a simple algorithm for prefetching array references one loop iteration ahead [8]. Their paper illustrates the potential for software prefetching, but they present cache miss rate results only. Santhanam, Gornish, and Hsu evaluate software prefetching for Fortran programs on the HP PA 8000, a 4 way superscalar processor [21] The compiler creates symbolic address expressions ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, Santa Clara, CA, Apr. 1991.
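The excerpt above describes prefetching array references one loop iteration ahead. A minimal sketch of that idea in C follows. It is an illustration only, not the algorithm from the cited paper: the function name sum_with_prefetch is hypothetical, and it assumes a GCC- or Clang-compatible compiler that provides the __builtin_prefetch builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums an array while issuing a prefetch hint
 * for the element the next iteration will read. */
double sum_with_prefetch(const double *a, size_t n)
{
    assert(a != NULL || n == 0);

    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            /* GCC/Clang builtin (an assumption about the toolchain):
             * second argument 0 = prefetch for read,
             * third argument 3 = keep in all cache levels.
             * It is only a hint and never changes the result. */
            __builtin_prefetch(&a[i + 1], 0, 3);
        }
        total += a[i];
    }
    return total;
}
```

Because the prefetch is non-binding, the function returns the same sum with or without it; only memory timing may differ. The bounds check keeps the sketch from forming an address past the end of the array.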


Transparent Threads: Resource Sharing in SMT Processors for.. - Dorai, Yeung (2002)   (2 citations)  (Correct)

.... a new subordinate multithreading technique, called Transparent Software Prefetching (TSP) TSP performs software data prefetching by instrumenting the prefetch code in a separate prefetch thread rather than inlining it into the main computation code, as is done in conventional software prefetching [4, 10, 15]. Prefetch threads run as background threads, prefetching on behalf of the computation thread which runs as a foreground thread. Because they run transparently, prefetch threads incur near zero overhead, and thus never degrade the computation thread s performance. TSP solves a classic problem ....

D. Callahan, K. Kennedy, and A. Porterfield. Software Prefetching. In 4th International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991.


Data Locality Optimizations for Multigrid Methods on Structured.. - Weiß   (Correct)

....cycles every time it fetches data from main memory. This is a severe problem since the microprocessor could execute up to 200 floating point instructions during that time. This problem is called latency problem. Researchers have developed several techniques, like software and hardware prefetching [CKP91, MLG92, CB94], non blocking caches [Kro81, SF91], stream buffers [Jou90, PK94], multithread ....

....instruction is often handled as a hint for the processor to load a certain data item but the fulfillment of the prefetch is not guaranteed by the CPU. Prefetch instructions can be inserted into the code manually by the programmer or automatically by a compiler [Por89, KL91, CKP91, Mow94]. In both cases prefetching involves overhead. The prefetch instructions themselves have to be executed, i.e. pipeline slots will be filled with prefetch instructions instead of other instructions ready to be executed. Furthermore, the memory address of the prefetched data must be ....

D. Callahan, K. Kennedy, and A.K. Porterfield. Software Prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, April 1991.


Guided Region Prefetching: A Cooperative.. - Wang, Burger.. (2003)   (2 citations)  (Correct)

.... of the loads that will cause cache misses at runtime is complex, requiring both knowledge of hardware parameters (cache block size, capacity, and associativity) and sophisticated code analysis (e.g. to determine the volume of other data accessed between references to a particular block) [7, 17, 33, 45]. The compiler also faces the difficult challenge of issuing the prefetches sufficiently early to hide the memory latency, but not so early that useful data are needlessly evicted. To find that point, the compiler must estimate cache miss latencies and run time instruction execution rates [25] ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, Santa Clara, CA, Apr. 1991.


Hardware Support for Dynamic Access Ordering: Performance of Some.. - McKee (1993)   (1 citation)  (Correct)

....speeds. These include altering the placement of data to exploit concurrency [Gup88] reordering the computation to increase locality, as in blocking [Lam91] address transformations for conflict free access to interleaved memory [Har89, Rau91, Val91] software prefetching data to the cache [Cal91, Kla91, Soh91], and hardware prefetching vector data to cache [Bae91, Fu91, Jou90, Skl92] For a more detailed discussion of how these schemes relate to dynamic access ordering, see [McK93b] The main difference between these techniques and the complementary one we propose here is that we reorder stream ....

Callahan, D., et. al., "Software Prefetching", Fourth International Conference on Architectural Support for Programming Languages and Systems, April 1991.


Optimizing SMT Processors for High Single-Thread Performance - Dorai, Yeung, Choi (2003)   (Correct)

....However, as was seen for the baseline processor, this mechanism is not transparent and degrades foreground thread performance on the reduced processor as well (these results have been omitted to conserve space) 5. Transparent Software Prefetching: Design and Evaluation Software prefetching [17, 18, 15] is a promising technique to mitigate the memory latency bottleneck. It hides memory latency by scheduling non blocking loads (special prefetch instructions) early relative to when their results are consumed. While these techniques provide visible latency hiding benefits, they also incur limiting ....

....5.1 Implementation Conventional Software Prefetching. As mentioned before, software prefetching schedules prefetch instructions to bring data into the cache before they are consumed by the program. Conventional software prefetching inlines the prefetch code along with the main computation code [17, 18, 15]. The inlined prefetches are software pipelined via a loop peeling transformation. Figure 17(a) shows a simple code example, and Figure 17(b) illustrates the prefetch inlining and loop peeling transformations needed to instrument software prefetching for this code example. The transformations are ....

[Article contains additional citation context not shown here]

D. Callahan, K. Kennedy, and A. Porterfield, "Software Prefetching," in Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 40--52, April 1991.


Access Pattern based Local Memory Customization for Low.. - Grun, Dutt, Nicolau (2001)   (2 citations)  (Correct)

.... memory represents a major performance and power bottleneck [25] Increasingly, the performance gap between processor and memory has been addressed using faster memory modules (such as SDRAM, RAMBUS, DDRAM [7, 10] or by fetching more data into local memories (for instance through prefetching [3, 5, 19]) However, such techniques while aggressively targeting performance, rely on significantly increased bandwidth. For instance, the burst mode, page mode, pipelined accesses [24, 25] in SDRAM, RAMBUS, increase performance by increasing the memory bandwidth. Similarly, prefetching results in ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In ASPLOS, 1991.


Runahead Execution: An Alternative to Very Large.. - Mutlu, Stark.. (2003)   (9 citations)  (Correct)

....memory latency by exploiting the temporal and spatial reference locality of applications. Kroft [19] improved the latency tolerance of caches by allowing them to handle multiple outstanding misses and to service cache hits in the presence of pending misses. Software prefetching techniques [5, 22, 24] are effective for applications where the compiler can statically predict which memory references will cause cache misses. For many applications this is not a trivial task. These techniques also insert prefetch instructions into applications, increasing instruction bandwidth requirements. ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, 1991.


A Framework for Data Prefetching using Off-line Training of.. - Kim, Palem, Wong (2002)   (Correct)

....hardware approaches and hybrid approaches. We will briefly mention some representative work and refer the interested reader to a detailed survey on the matter that was recently published [25] In the field of software prefetching early work include that done by Callahan, Kennedy, and Porterfield [4], and Klaiber and Levy [16] The former proposed the insertion of data prefetch instructions in data intensive loops while the latter studied efficient architectural support mechanisms for data prefetch instructions. Mowry, Lam and Gupta [22] showed that careful analysis and selective prefetching ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, 1991.


Design and Evaluation of a Compiler Algorithm for Prefetching - Mowry, Lam, Gupta (1992)   (320 citations)  (Correct)

....Ideally, isolating the cache miss instances will not increase the instruction overhead. One of the advantages of having implemented the prefetching schemes in the compiler is that we can quantify this instruction overhead. Previous studies have only been able to estimate instruction overhead [4]. Table 4 shows the number of instructions required to issue each .... [Figure 5: Loop splitting effectiveness (N = no prefetching, C = selective prefetching with conditional ....]

....Work Several strategies for utilizing prefetching have been presented in the past. Some of these approaches use software support to issue prefetches, while others are strictly hardware based. In this section, we discuss previous work in both categories. 5.1 Software Prefetching Porterfield [4, 23] presented a compiler algorithm for inserting prefetches. He implemented it as a preprocessing pass that inserted prefetching into the source code. His initial algorithm prefetched all array references in inner loops one iteration ahead. He recognized that this scheme was issuing too many ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, April 1991.


Evaluating the Impact of Memory System Performance on Software.. - Badawy, al. (2001)   (5 citations)  (Correct)

....we use detailed execution driven simulation of a modern processor. Although relatively little work has compared software prefetching and locality optimizations, a large body of work has studied the techniques in isolation. Software prefetching for affine array accesses has been studied in [31, 22, 4]. Hardware prefetching [7, 34, 15, 14, 20] is similarly limited to affine array accesses, but uses hardware to identify the access pattern automatically. Prefetch engines for affine array accesses [44, 6, 10, 8] provide hardware support for prefetching, but rely on the programmer or compiler to ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), Santa Clara, CA, April 1991.


Conflict Miss Elimination by Time-stride Prefetch - Tang, Veidenbaum, Nicolau.. (2000)   (1 citation)  (Correct)

....5 presents the simulation results. The conclusions of this work are summarized in Section 6. 2 Related work A number of prefetch schemes for array and linked list prefetching have been proposed. All aim at identifying predicting next element to be accessed and prefetching it. Software prefetch [1, 8, 11] uses a special PREFETCH instruction to bring data into the cache in advance. It has the advantage of utilizing global information to identify memory addresses most likely to miss from the cache. One disadvantage is that it will incur an execution cost even if the data is already in the cache. ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Architectural Support for Programming Languages and Operating Systems-IV, pages 40--52, 1991.


Multi-Chain Prefetching: Effective Exploitation of.. - Kohout, Choi, Kim, Yeung (2001)   (6 citations)  (Correct)

....multi chain prefetching with prefetch arrays can potentially provide higher performance than either technique alone. This research was supported in part by NSF Computer Systems Architecture grant CCR 0093110 and NSF CAREER Award CCR 0000988. 1. Introduction Prefetching, whether using software [11, 7, 1], hardware [3, 13, 5] or hybrid [2, 4] techniques, has proven successful at hiding memory latency for applications that employ regular data structures (e.g. arrays) Unfortunately, these techniques are far less successful for applications that employ linked data structures (LDSs) due to the ....

D. Callahan, K. Kennedy, and A. Porterfield. Software Prefetching. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, April 1991.


Software and Hardware Techniques to Optimize.. - Zalamea, Llosa.. (2001)   (4 citations)  (Correct)

....total execution time. Section II.A overviews the main proposals in this direction. Other optimizations applied to loops (such as unrolling, common subexpression elimination, back substitution) [8], techniques oriented towards hiding the negative effects of cache misses (such as prefetching [12] or blocking), breaking the data dependences (such as data speculation), or breaking the control dependence flow (predication, control speculation) increase even more the register requirements. When a loop requires more registers than those available in the target architecture, register pressure ....

D. Callahan, K. Kennedy, and A. Porterfield, "Software prefetching," in Proc Fourth Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), April 1991, pp. 40--52.


Comparing and Combining Read Miss Clustering and Software.. - Pai, Adve (2001)   (1 citation)  (Correct)

....out of order instruction window (called read miss clustering) [22]. An alternate, widely used latency tolerance technique is software controlled non binding prefetching. Prefetching helps tolerate latencies by initiating (often multiple overlapping) data fetches ahead of expected demand misses [4]. On the surface, both techniques seem to target the same types of latencies and use the same system resources; therefore, without further analysis, it is unclear which technique is superior or if both techniques can be used together. Since prefetching is already widely used, such an analysis is ....

D. Callahan, K. Kennedy, and A. Porterfield. Software Prefetching. In Proc. of the 4th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems, pages 40--52, Apr. 1991.


Two-level Hierarchical Register File Organization.. - Zalamea, Llosa.. (2000)   (13 citations)  (Correct)

....more than 64 registers) they represent close to 40 of the total execution time. Other optimizations applied to loops (such as unrolling, common subexpression elimination, back substitution) [17], techniques oriented towards hiding the negative effects of cache misses (such as prefetching [2] or blocking), breaking the data dependences (such as data speculation), or breaking the control dependence flow (predication, control speculation) increase even more the register requirements. When a loop requires more registers than available, register pressure must be decreased by either ....

[Figure 9. Execution time when scheduling loops using hit latency or selective binding prefetching.] ....ever a dependent instruction needs the datum brought from memory (in case of lockup free caches). Binding prefetching can be used to tolerate the latency of these cache misses [2]. Binding prefetching consists in scheduling the load instructions assuming cache miss latency. Binding prefetching does not increase memory traffic but increases register pressure, as shown in Section 3.3. However, the higher capacity of the proposed TWO16 organization allows us to apply ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proc Fourth Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), pages 40--52, April 1991.


Modulo Scheduling with Integrated Register Spilling.. - Zalamea, Llosa..   (4 citations)  (Correct)

....performance. This generates a valid schedule that stalls the processor whenever a cache miss occurs or whenever a dependent instruction needs the datum to be brought up from memory (in case of lockup free caches) Binding prefetching can be used to tolerate the latency of these cache misses [3]. Binding prefetching consists in scheduling the load instructions assuming cache miss latency. Binding prefetching does not increase memory traffic but increases register pressure. Therefore, configurations based on clustering are able to offer higher capacity than non clustered organizations, ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proc Fourth Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), pages 40--52, April 1991.


Data Locality Analysis of the SPECfp95 - Sanchez, Gonzalez (1998)   (Correct)

....the different types of cache misses into the three commonly used categories (compulsory, capacity, conflict) can be important to choose different types of optimizations. Capacity misses could be best reduced by blocking [5][3], conflict misses by padding [13], and compulsory misses by prefetching [2][11], among other possibilities. Identifying the parts of the program that are responsible for most penalties may help to reduce the optimization effort by focusing on such cases. Conflict misses are the dominant type of misses for many numerical applications. Identifying which data ....

D. Callahan, K. Kennedy and A. Porterfield, "Software Prefetching", in Procs. of IV Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS'91), pp. 40-52, April 1991


Loop Optimization Techniques On Multi-Issue Architectures - Kaiser   (Correct)

....prefetch instructions is that speculative loads do not consume additional instruction bandwidth. Prefetch instructions are non blocking, non exceptional instructions which provide a hint to the memory system that a data item will be used soon. Callahan, et al. implement prefetch instructions in [23]. In this study, prefetch load instruction were provided along with standard loads. Both load instructions put data into a single unified cache. A compiler prepass was used to add prefetch instructions to the source code. A prefetch load was added for the following loop iteration to every simple ....

....hardware and resulting inefficiency. 41 Perhaps a combination of prefetch and speculative load instructions would provide a better means of hiding memory latency than a DAE architecture. The possibility of adding prefetch instruction to a VLIW architecture is raised by Callahan and Kennedy in [23]. They speculate that a VLIW implementation may reduce the overhead, making prefetch instructions profitable: Software prefetching should be particularly useful on high performance systems that can issue more than one instruction per cycles if the costs of issuing the prefetch instruction ....

[Article contains additional citation context not shown here]

D. Callahan, K. Kennedy, A. Porterfield, Software Prefetching, Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 1991, vol. 19, pp. 40-52.


Data Prefetching Using Offline Learning - Kim, Palem, Wong   (Correct)

....We will briefly mention some representative work that is relevant to our discussion. We refer the interested reader to a detailed survey on the matter that was recently published [26] In the field of software prefetching early works include that done by Callahan, Kennedy, and Porterfield [2], and Klaiber and Levy [15] The former proposed the insertion of data prefetch instructions in data intensive loops while the latter studied efficient architectural support mechanisms for data prefetch instructions. Mowry, Lam and Gupta [20] showed that careful analysis and selective prefetching ....

D. Callahan, K. Kennedy and A. Porterfield, "Software Prefetching," In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 40 -- 52, 1991.


Bridging Processor and Memory Performance in ILP Processors via .. - Palem, al. (2001)   (2 citations)  (Correct)

....memory access patterns (MAPs) for which current optimizations are mostly inadequate. The majority of strategies advocated to address the memory bottleneck either attempt to hide long access latencies or enhance data locality. Examples of latency masking optimizations include prefetching[24, 20, 6] and load sensitive scheduling algorithms[15, 30] However, such strategies are vulnerable to unpredictable memory reference patterns and may degrade performance. Specifically, prefetch strategies waste bandwidth and pollute caches when data is unnecessarily requested. Similarly, poor or ....

D. Callahan, K. Kennedy, and A. Porterfield. "Software prefetching". In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40-52, April 1991.


Data Flow Analysis for Software Prefetching Linked Data.. - Cahoon, McKinley (2001)   (9 citations)  (Correct)

....1. Introduction Software controlled data prefetching improves memory performance by hiding memory latency. Its goal is to bring data into the cache before the demand access to that data. Existing research shows the benefits of software prefetching techniques for array based, scientific programs [6, 21, 4, 19]. Given an array, the size of each element, and a regular access pattern, a compiler can compute the address of any element in the array and prefetch it. Prefetching in pointer based codes is difficult because separate dynamically allocated objects are disjoint, and the access patterns are thus ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, Apr. 1991.


Data-Specific Optimizations - Jinturkar (1996)   (1 citation)  (Correct)

....begun to appear. Second, high performance RISC processors are being used for numeric problems. Such problems do not show high degree of data reuse, and therefore render caches ineffective. Consequently, researchers have begun to focus on organizations and technologies like software assisted caches [Call91], speculative loads [Smit92] and stream memory controllers [McKe94] Most software approaches that tackle the memory bandwidth problem focus on reducing the memory bandwidth requirements of a program. One of the fundamental compiler optimizations for reducing a program s memory bandwidth ....

Callahan, D., Kennedy K., and Porterfield A., "Software Prefetching", Proceedings of the Fourth International Symposium on Architectural Support for Programming Languages and Operating Systems, Santa Clara, CA, April 1991, pp. 132-141.


Evaluating the Impact of Memory System Performance on.. - Badawy, Aggarwal.. (2001)   (5 citations)  (Correct)

....we use detailed execution driven simulation of a modern processor. Although relatively little work exists in comparing software prefetching and locality optimizations, a large body of work has studied the techniques in isolation. Software prefetching for affine array accesses has been studied in [30, 22, 4]. Hardware prefetching [7, 34, 15, 14, 20] is similarly limited to affine array accesses, but uses hardware to identify the access pattern automatically. Prefetch engines for affine array accesses [44, 6, 10, 8] provide hardware support for prefetching, but rely on the programmer or compiler to ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), Santa Clara, CA, April 1991.


Comparative Evaluation of Latency-Tolerating and -Reducing.. - Grahn, Stenström (2000)   (1 citation)  (Correct)

....We start with a qualitative discussion in Section 4.1 and then a quantitative evaluation of two prefetching schemes in Section 4.2. 4.1. Qualitative Evaluation Prefetching is a latency tolerating technique that appears in the literature as a software controlled [3, 26, 30] as well as a hardware based technique [6, 9, 10]. We start with the general characteristics of prefetching and then specifically discuss the schemes we use in this study. In this study we only consider nonbinding, read shared prefetching, i.e. the block is fetched in a shared mode. We will ....

....sequential prefetching, the coverage is between 4 and 88%, and the prefetch efficiency ranges from about 30 up to 92% in the best case. Continuing with stride prefetching, coverage values range from 2 up to 74% and prefetch efficiencies from 13 to 90%. In software controlled prefetching schemes [3, 26, 30, 31], the compiler inserts special prefetch instructions into the code based on static program information. For example, the software prefetching schemes presented in [31] have high coverage numbers (between 75 and 98%). Unfortunately, the prefetch efficiency varies from as low as 11 up to 85%. In ....

D. Callahan, K. Kennedy, and A. Porterfield, "Software prefetching," in Proc. Fourth Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), pp. 40-52, April 1991.


Relative Performance of Hardware and Software-Only Directory .. - Grahn, Stenström (1995)   (Correct)

....S hw = 7 3. 2 Prefetching A Technique that Increases the Protocol Execution Overhead Prefetching is a latency tolerating technique that have been proposed and evaluated, both as a software controlled technique [3, 19, 21] and as a hardware based technique [6, 10] We start with the general characteristics of prefetching and then specifically discuss the scheme used in this study. Further, in this study we only consider non binding, read shared prefetching, i.e. the block is fetched in a shared mode, but we will ....

....blocks we exploit most of the spatial locality in the applications, the gains are limited. Adaptive sequential prefetching also suffers from a too low prefetch efficiency to be really effective in a software only directory protocol. Therefore, we expect software controlled prefetching schemes [3, 19, 21, 22] to have a larger potential to increase the performance of software only directory protocols since they can exploit knowledge about the application behavior. We have also studied a stride prefetching scheme [9] as an alternative to adaptive sequential prefetching. Since stride prefetching is more ....

D. Callahan, K. Kennedy, and A. Porterfield, "Software Prefetching," In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), pages 40-52, April 1991.


Software Data Prefetching for Software Pipelined Loops - Sánchez, González (1999)   (Correct)

....due to true dependences with previous memory operations. The alternative of scheduling all loads using the cache miss latency requires considerable instruction level parallelism and increases register pressure ([1]). Software prefetching is an effective technique to tolerate memory latency ([4]). Software prefetching can be performed through two alternative schemes: binding and nonbinding prefetching. The first alternative, also known as early scheduling of memory operations, moves memory instructions away from those instructions that depend on them. The second alternative introduces in ....

....minimize the execution time of a software pipelined loop. Finally, we show that schemes based on binding prefetch are more effective than those based on nonbinding prefetch for software pipelined schedules. The use of binding and nonbinding prefetching has been previously studied in [13][1] and [4][9][14][18][3] respectively among others. However, there are very few works analyzing the interactions of these prefetching schemes with software pipelining techniques. The selective scheduling ([1]) schedules some operations with cache hit latency and others with cache miss latency, like the ....

D. Callahan, K. Kennedy and A. Porterfield, "Software Prefetching", in Procs. of IV-ASPLOS, pp. 40-52, April 1991


Software Support For Improving Locality in Advanced Scientific Codes - Tseng (2000)   (Correct)

....Software prefetching relies on the programmer or compiler to insert explicit prefetch instructions into the code. It has been shown to be effective in reducing memory stall cycles for both sequential and parallel applications, particularly for scientific programs making regular memory accesses [9, 62, 61]. Techniques have also been developed to apply prefetching to pointer based data structures [54] As support for software prefetching becomes more common, we wish to determine the usefulness of software prefetching combined with our locality optimizations, and determine how they interact. Working ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), Santa Clara, CA, April 1991.


A Prefetch Taxonomy - Srinivasan, Davidson, Tyson (2004)   (1 citation)  (Correct)

....now apply PTMT to variants of two common prefetching techniques and derive some insights into how to improve their performance. 3.1 Implementable Prefetching Algorithm Many software and hardware methods have been proposed to predict addresses and issue prefetches. Software prefetching techniques [3, 6, 7, 11, 12, 13, 15, 18] derive hints from global program analysis 13 and insert explicit prefetch instructions only when they are deemed likely to be useful. But existing software based selection methods are limited to the kind of data access patterns (constants or strides) that are easiest to recognize at compile ....

D. Callahan, K. Kennedy, A. Porterfield, "Software Prefetching," Proceedings of the 4th Symposium on Architectural Support for Programming Languages and Operating Systems, April 1991, pp 40-52.


Reducing Garbage Collector Cache Misses - Boehm (2000)   (8 citations)  (Correct)

....this may require that the prefetch instruction is issued hundreds of instructions ahead of the actual object reference, since the cache miss will probably require on the order of 100 machine cycles. Most commonly prefetch instructions are introduced by a compiler optimizing numerical code (cf. [3]). Array references can often be automatically predicted. In nonnumerical pointer chasing code, this is less feasible (but see for example [4]). Nonetheless, in this case, we can manually predict object references quite far ahead of time. As soon as we grey an object, i.e. push it onto the mark ....

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40-52. ACM, April 1991.


Memory Latency Reduction via Data Prefetching and Data Forwarding .. - Poulsen (1994)   (Correct)

No context found.

D. Callahan, K. Kennedy, and A. K. Porterfield, "Software prefetching," Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 40-52, 1991.


Software Methods to Improve Data Locality and Cache Behavior - Beyls (2004)   (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52. ACM Press, 1991.


Cache-conscious Frequent Pattern Mining on a Modern Processor - Amol Ghoting (2005)   (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, 1991.


Improving Software Pipelining with Hardware Support for.. - Carr, Sweany (1998)   (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, Santa Clara, California, 1991.


Compiler Orchestrated Prefetching via Speculation.. - Rabbah.. (2004)   (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, 1991.


Software Data Prefetching for Software Pipelined Loops - Sanchez, Gonzalez (1999)   (Correct)

No context found.

D. Callahan, K. Kennedy and A. Porterfield, "Software Prefetching", in Procs. of IV-ASPLOS, pp. 40-52, April 1991


Next-Generation Memory Systems - Wang (2004)   (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, Santa Clara, CA, April 1991.


Latency Tolerant Architectures - Bennett (1998)   (2 citations)  (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. SIGPLAN Notices, 26:40--52, April 1991.


Guided Region Prefetching: A Cooperative.. - Wang, Burger.. (2003)   (2 citations)  (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, Santa Clara, CA, Apr. 1991.


Optimizing Software Data Prefetches with Rotating Registers - Gautam Doshi Mission   (2 citations)  (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield, "Software Prefetching", in Proceedings of the Fourth International Conference on Architecture Support for Programming Languages and Operating Systems, 1991, 40-52.


Real-Time High-Throughput Sonar Beamforming Kernels Using Native Signal Processing (1999)   (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield, "Software Prefetching." Proc. IEEE Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 40-52, Apr. 1991.


Predictor-Directed Data Prefetching for Pointer-based Applications - Sair (2003)   (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV),April 1991.


Optimizing Communication and Data Distribution for.. - Palermo   (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield, "Software prefetching," in Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Santa Clara, CA, Apr. 1991, pp. 40--52.


Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994)   (Correct)

No context found.

Callahan, D., Kennedy, K., and Porterfield, A., "Software Prefetching", Proc. Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April, 1991.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)   (Correct)

No context found.

D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40--52, April 1991.


A Programmable Memory Hierarchy for Prefetching Linked Data.. - Yang, Lebeck   (Correct)

No context found.

Callahan, D., Kennedy, K., Porterfield, A.: Software prefetching. In: Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV). (1991) 40-52


Masking Memory Access Latency with a Compiler-Assisted Data.. - VanderWiel (1998)   (Correct)

No context found.

Callahan, D., K. Kennedy and A. Porterfield, "Software Prefetching," Proc. Fourth International Conf. on Architectural Support for Programming Languages and Operating Systems, Santa Clara, CA, April 1991, p. 40-52.


Principles and Applications of Continual Computation - Horvitz (2001)   (13 citations)  (Correct)

No context found.

D. Callahan, K. Kennedy, A. Porterfield, Software prefetching, in: Proc. 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Vol. 26 (4), ACM Press, New York, 1991, pp. 40--52.


Improved Configuration Prefetch for Single Context.. - Hauck, Li   (Correct)

No context found.

D. Callahan, K. Kennedy, A. Porterfield, "Software Prefetching", International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 40-52, 1991.
