14 citations found.
D.M. Tullsen and S.J. Eggers, "Effective Cache Prefetching on Bus-Based Multiprocessors." ACM Transactions on Computer Systems, Vol. 13, No. 1, February 1995, pp. 57-88.

This paper is cited in the following contexts:
Effective Compile-Time Analysis for Data Prefetching in Java - Cahoon (2002)

.... scheme for multiprocessors [41]. Zhang and Torrellas describe techniques for prefetching pointer-based programs on multiprocessors using a scheme that is similar to greedy prefetching [123]. Tullsen and Eggers evaluate compiler-assisted software prefetching on shared-memory multiprocessors [106]. Ranganathan, Pai, Abdel-Shafi, and Adve examine the effectiveness of software prefetching for scientific programs on a shared-memory multiprocessor built with modern ILP processors [87]. 2.6.4 Prefetching Linked Structures: Luk and Mowry. Luk and Mowry develop three prefetching schemes for ....

Dean M. Tullsen and Susan J. Eggers. Effective cache prefetching on bus-based multiprocessors. ACM Transactions on Computer Systems, 13(1):57--88, February 1995.


Improving Memory Hierarchy Performance for Irregular .. - Mellor-Crummey.. (2001) (6 citations)

....systems are being constructed with deeper hierarchies. Achieving high performance on such systems requires tailoring the reference behavior of applications to better match the characteristics of a machine's memory hierarchy. Techniques such as loop blocking [1, 2, 3, 4, 5, 6] and data prefetching [4, 7, 8] have significantly improved memory hierarchy utilization for regular applications. A limitation of these techniques is that they aren't as effective for irregular applications. Improving performance for irregular applications is extremely important since large-scale scientific and engineering ....

D. M. Tullsen and S. J. Eggers, "Effective cache prefetching on bus-based multiprocessors," ACM Transactions on Computer Systems 13(1) pp. 57-88 (Feb 1995).


General-Purpose Architectures for Media Processing.. - Parthasarathy..

.... at a later time, with the goal of bringing the location into the processor's cache before it issues a demand memory access [17]. Previous studies have shown that software-controlled non-binding prefetching can eliminate a large fraction of memory stall time in shared-memory multiprocessors [84, 124, 100]. However, these studies have been mainly limited to conventional scientific and engineering workloads. In this section, we study the effectiveness of software prefetching for our media processing workloads. As discussed in Section 2.3.3, we follow the well-known software prefetching compiler ....

D.M. Tullsen and S.J. Eggers. Effective Cache Prefetching on Bus-Based Multiprocessors. ACM Transactions on Computer Systems, 13(1):57--88, 1995.


Exploiting Instruction-Level Parallelism for Memory System.. - Pai (2000)

....processors. These observations suggest that ILP systems have a greater need for both conventional and novel additional techniques to tolerate or reduce memory latency. A commonly used technique for better latency tolerance is software-controlled non-binding prefetching [CKP91, MG91, MLG92, Mow94, TE95, LM96]. Chapter 3 evaluates the interaction of this technique with instruction-level parallelism. Our results also motivate a specific technique novel to ILP processors: application-level clustering of read misses. Chapter 4 proposes and evaluates code transformations that improve miss clustering ....

.... into the processor's cache before it issues a demand memory access for that location [CKP91]. Previous studies have shown that software-controlled non-binding prefetching can eliminate a large fraction of memory stall time in shared-memory multiprocessors and uniprocessors [MG91, MLG92, Mow94, TE95, LM96]. However, the multiprocessor studies used previous-generation processors with single issue, static scheduling, and blocking reads. Some of the uniprocessor studies modeled ILP processors, but did not specifically relate their benefits or limitations to ILP features. Consequently, such ....

[Article contains additional citation context not shown here]

D.M. Tullsen and S.J. Eggers. Effective Cache Prefetching on Bus-Based Multiprocessors. ACM Transactions on Computer Systems, 13(1):57--88, February 1995.


Design and Performance of Multithreaded Architectures - Thekkath (1995)

.... techniques have been proposed to eliminate or tolerate long memory latencies in multiprocessors, e.g., cache memories [Tan76, Smi82, Goo83, BW89, Lee87, WJ87, Jou90, Prz90, JW94], relaxed memory consistency models [CBZ91, GAG+92, KCZ92, ZB92], data prefetching [Por89, CMCH91, KL91, MLG92, TE95], and dataflow architectures [Ian88, NA89, CSS+91, PT91]. These techniques have been investigated in detail. Although they can successfully reduce memory latency in many instances, they do not eliminate it completely. A more extensive discussion of these techniques is in Chapter 6. ....

....dense matrix codes showed lowered cache miss rates and performance increases of up to two times [MLG92]. But benefits from prefetching are not guaranteed in shared-memory multiprocessors. Thread data sharing effects can interact negatively with prefetching, resulting in increased memory traffic [TE95]. For example, additional prefetching traffic could not be tolerated in machines where the processor-memory interconnections are already near saturation. Another potential problem arises when prefetched data is invalidated by another cache before it is used by the prefetched processor. This can ....

D. M. Tullsen and S. J. Eggers. Effective cache prefetching on bus-based multiprocessors. ACM Transactions on Computer Systems, to be published in 1995.


Latency Tolerance For Dynamic Processors - Bennett, Flynn (1996) (1 citation)

....fetching data that isn't referenced), the memory bus traffic increases. 4 Discussion. 4.1 Software techniques. This study has compared various hardware techniques for tolerating memory latency. There has also been work on software techniques for tolerating memory latency, such as prefetching [MLG92, TE95] and balanced scheduling [KE93]. Hardware and software techniques are compared in [BP92], [CCMH91], and [CB94]. In general, it appears that software and hardware techniques are complementary. Compile-time optimizations for memory latency tolerance can include large-scale code motion, such as loop ....

D. Tullsen and S. Eggers. Effective cache prefetching on bus-based multiprocessors. ACM Transactions on Computer Systems, 13:57--88, February 1995.


IPU/LTB: A Method for Reducing Effective Memory Latency - Jr., Appelbe, Das (1997)

....regular dense matrix code [10], where the pattern of data access is known statically. Various compiler algorithms have been studied in [8, 1, 6]. The overhead of additional instructions plus a dependence on sophisticated compile-time analysis limits the effectiveness of this type of prefetching [12]. While software prefetching is almost always a form of data prefetching, a form of instruction prefetching in software was proposed by Young et al. in [14]. In this method, instructions are inserted before branches to advise about prefetching from the branch target. This type of prefetching ....

Dean Tullsen and Susan J. Eggers. Effective cache prefetching on bus-based multiprocessors. ACM Transactions on Computer Systems, 13(1):57--88, February 1995.


Prefetching Techniques for Client/Server, Object-Oriented.. - Knafla

.... to reduce wrong prefetches is to defer the start of a prefetch operation behind a loop or branch [Rogers and Li, 1992]. Furthermore, it is not worthwhile to start the prefetch operation too early because the data could be invalidated or already replaced from the buffer [Tullsen and Eggers, 1993; Tullsen and Eggers, 1995]. A solution to this problem is to limit the deferment of a prefetch to a maximal distance [Chen and Baer, 1992; Chen and Baer, 1994]. Program-based techniques are not important for databases because they do not consider the content of the buffer pool for a prefetch decision. Nevertheless the ....

Tullsen, D. and Eggers, S. (1995). Effective cache prefetching on bus-based multiprocessors. ACM Transactions on Computer Systems, 13(1):57--88.


Compiler Support for Software Prefetching - McIntosh (1998) (10 citations)

.... data on the overall effectiveness of prefetching and demonstrated that exclusive-mode prefetching can be beneficial in terms of reducing network traffic on machines such as the DASH [65]. Tullsen and Eggers examined prefetching in the context of a bus-based, bandwidth-limited multiprocessor [94, 95]. Their study was based on off-line trace analysis, in which prefetch instructions were inserted into a previously generated program trace by consulting an oracle. Their results were in three general areas. First, they found that for multiprocessors with very limited memory bandwidth, software ....

Dean M. Tullsen and Susan J. Eggers. Effective cache prefetching on bus-based multiprocessors. ACM Transactions on Computer Systems, 13(1):57--88, February 1995.


Data Prefetch Mechanisms For Accelerating Symbolic And Numeric.. - Mehrotra (1996) (10 citations)

....of long-latency memory operations. Prefetching has been researched heavily, and for a long time. Hardware, software, and hybrid hardware/software schemes have all been extensively explored for prefetching instructions and data references, both in the context of uniprocessors and multiprocessors [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]. 2.2 Instruction prefetching and branch prediction. The instruction prefetching and branch prediction problems are closely related. Efficient solutions for both are critical, because they affect the degree to which speculative execution is effective in current machines. In their research, ....

....in attempting to prefetch pointer-linked data structures. However, the solution they propose for prefetching linked lists uses a table structure that remembers the target address for each link traversed, and therefore requires a large table to be effective. The work by Tullsen and Eggers [10], Mowry [17], Selvidge [27], Mowry and Gupta [31], Klaiber and Levy [30], Gornish [32], and Porterfield [34] is representative of research in software data prefetching. Of these, Selvidge's is the only scheme to focus on a compiler solution for prefetching pointer-linked data structures. It ....

D. M. Tullsen and S. J. Eggers, "Effective Cache Prefetching on Bus-Based Multiprocessors," ACM Transactions on Computer Systems, vol. 13, pp. 57--88, February 1995.


The Interaction of Software Prefetching with ILP.. - Ranganathan, Pai.. (1997) (3 citations)

.... processor at a later time, with the goal of bringing the location into the processor's cache before it issues a demand memory access [4]. Previous studies have shown that software-controlled non-binding prefetching can eliminate a large fraction of memory stall time in shared-memory multiprocessors [24, 33]. However, all such studies used previous-generation processors with single issue, static scheduling, and blocking reads. Consequently, such studies do not account for the interactions between software prefetching and the other latency-tolerating techniques already incorporated in ILP-based ....

....decrease in late prefetches occurs at the expense of an increase in early prefetches. Many prefetches arrive at the cache much before the demand access. These are vulnerable to cache replacements or invalidations for a longer time, as also observed in studies of previous-generation multiprocessors [33]. In Mp3d, these early prefetches hurt performance because they prematurely invalidate other processors' cache lines (due to false and true sharing). In Radix, most early prefetches are replaced prefetches that do not adversely affect other processors. Second, on applications that are ....

[Article contains additional citation context not shown here]

D. Tullsen and S. Eggers. Effective Cache Prefetching on Bus-Based Multiprocessors. ACM Transactions on Computer Systems, 13(1):57--88, 1995.



Characterization and Improvement of Load/Store Cache-Based Prefetching

No context found.

D.M. Tullsen and S.J. Eggers, "Effective Cache Prefetching on Bus-Based Multiprocessors." ACM Transactions on Computer Systems, Vol. 13, No. 1, February 1995, pp. 57-88.

CiteSeer.IST - Copyright Penn State and NEC