| A. C. Klaiber and H. M. Levy, "An architecture for software-controlled data prefetching," 1991. |
....The data dependence is a result of pointer indirection. Pointer dereferences are required to generate addresses for successive elements in a linked data structure (LDS). This is commonly called the pointer-chasing problem. This serialization hinders efforts to choose appropriate prefetch distances [1] so that memory latency can be fully overlapped with computation. We recently proposed a novel data movement model called push to overcome the above limitations [2]. The push model performs pointer dereferences at lower levels of the memory hierarchy and pushes data up to the processor. This ....
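The pointer-chasing serialization described in this excerpt can be sketched in C. This is only an illustration, not code from the citing paper: the `node` type and `sum_list` function are hypothetical names, and `__builtin_prefetch` is a GCC/Clang builtin assumed to be available. The prefetch for the next node cannot be issued until the current node has been loaded, so the prefetch distance is stuck at one node.

```c
#include <stddef.h>

/* Hypothetical list node used only for this illustration. */
typedef struct node {
    int value;
    struct node *next;
} node;

/* Sum a linked list, greedily prefetching the successor of each node. */
long sum_list(const node *head)
{
    long sum = 0;
    for (const node *p = head; p != NULL; p = p->next) {
        /* The address of the next node is known only after *p has been
         * loaded, so the prefetch can run at most one node ahead. */
        if (p->next != NULL) {
            __builtin_prefetch(p->next, 0, 3);
        }
        sum += p->value;
    }
    return sum;
}
```

With an array, the address of element `i + d` can be computed for any `d`; here there is no way to name the node `d` hops ahead without first walking the intervening pointers.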
Klaiber, A.C., Levy, H.M.: An architecture for software-controlled data prefetching. In: Proceedings of the 18th Annual International Symposium on Computer Architecture. (1991) 43-53
.... a new subordinate multithreading technique, called Transparent Software Prefetching (TSP). TSP performs software data prefetching by instrumenting the prefetch code in a separate prefetch thread rather than inlining it into the main computation code, as is done in conventional software prefetching [4, 10, 15]. Prefetch threads run as background threads, prefetching on behalf of the computation thread which runs as a foreground thread. Because they run transparently, prefetch threads incur near zero overhead, and thus never degrade the computation thread's performance. TSP solves a classic problem ....
A. C. Klaiber and H. M. Levy. An Architecture for Software-Controlled Data Prefetching. In 18th Annual International Symposium on Computer Architecture, May 1991.
....large to improve execution times. The authors suggest a few changes to reduce the overheads to make prefetching profitable. Klaiber and Levy describe an algorithm for software-controlled data prefetching that holds the prefetched data in a separate fully associative buffer instead of the cache [60]. The algorithm works by inserting a prefetch for an array element one or more iterations before the actual load of the datum. Results show that prefetching improves performance on the Livermore Loops benchmarks using average time per memory reference as the metric. The algorithm is most effective ....
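The idea of issuing a prefetch for an array element some iterations before its load can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the cited paper: `sum_with_prefetch` and `PREFETCH_AHEAD` are hypothetical names, the distance of 8 elements is arbitrary, and `__builtin_prefetch` (a GCC/Clang builtin) brings data into the ordinary cache rather than into a separate prefetch buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real value is machine-dependent. */
#define PREFETCH_AHEAD ((size_t)8)

/* Sum an array, prefetching the element needed PREFETCH_AHEAD iterations later. */
double sum_with_prefetch(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* A hint only: rw = 0 means read, locality = 3 means keep cached. */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```

The bounds check keeps the prefetch address inside the array. It also adds a branch to every iteration, which is one form of the instruction overhead that several of the excerpts on this page mention.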
Alexander C. Klaiber and Henry M. Levy. An architecture for software-controlled data prefetching. In Proceedings of the 18th Annual International Symposium on Computer Architecture, pages 43--53, Toronto, Canada, May 1991.
....= jj to min(jj + B - 1, n) do instruction is often handled as a hint for the processor to load a certain data item but the fulfillment of the prefetch is not guaranteed by the CPU. Prefetch instructions can be inserted into the code manually by the programmer or automatically by a compiler [Por89, KL91, CKP91, Mow94]. In both cases prefetching involves overhead. The prefetch instructions themselves have to be executed, i.e. pipeline slots will be filled with prefetch instructions instead of other instructions ready to be executed. Furthermore, the memory address of the prefetched data must be ....
A.C. Klaiber and H.M. Levy. Architecture for Software-controlled Data Prefetching. In Proceedings of the 18th Annual International Symposium on Computer Architecture, pages 43--53, Toronto, Canada, May 1991.
....VSCAP does not solely rely on overlapping communication with computation; it also overlaps communication operations with other communication operations to hide even more network latency. Research in prefetching can be divided into software [13], hardware [4,9,5], and hybrid prefetching [16,10,12]. VSCAP's prefetching approach is quite different: it does not address parallel architectures with cache-coherent memory but rather targets machines with distributed memory and explicit communication operations where data distribution is the responsibility of the programmer and not the system. ....
A. Klaiber and H. Levy. An architecture for software-controlled data prefetching. In Eighteenth Annual International Symposium on Computer Architecture, pages 43--53, Toronto, May 1991.
....33, 45]. The compiler also faces the difficult challenge of issuing the prefetches sufficiently early to hide the memory latency, but not so early that useful data are needlessly evicted. To find that point, the compiler must estimate cache miss latencies and run-time instruction execution rates [25]. The compiler is further constrained in that it cannot schedule a prefetch until it can compute the effective address. While this constraint is not significant for arrays [6, 33], it limits compiler-based greedy pointer prefetching [5, 30, 36]. Jump pointers bypass this limitation by identifying ....
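One common way to combine the two estimates named in this excerpt is to divide the expected miss latency by the time one loop iteration takes and round up. The excerpt itself does not give a formula, so the sketch below is an assumption, and the function and parameter names are hypothetical.

```c
#include <assert.h>

/* Number of iterations to prefetch ahead: ceil(miss latency / time per iteration).
 * Both inputs are estimates in cycles. */
unsigned prefetch_distance(unsigned miss_latency_cycles,
                           unsigned cycles_per_iteration)
{
    assert(cycles_per_iteration > 0);
    return (miss_latency_cycles + cycles_per_iteration - 1) / cycles_per_iteration;
}
```

For example, with an estimated 100-cycle miss and a 30-cycle loop body, the function returns 4, so the prefetch for iteration `i + 4` would be issued during iteration `i`. Overestimating the distance risks the early eviction the excerpt warns about; underestimating it leaves part of the latency exposed.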
A. C. Klaiber and H. M. Levy. An architecture for software-controlled data prefetching. In Proceedings of the 18th International Symposium on Computer Architecture, pages 43--53, Toronto, Canada, May 1991.
....speeds. These include altering the placement of data to exploit concurrency [Gup88], reordering the computation to increase locality, as in blocking [Lam91], address transformations for conflict-free access to interleaved memory [Har89, Rau91, Val91], software prefetching data to the cache [Cal91, Kla91, Soh91], and hardware prefetching vector data to cache [Bae91, Fu91, Jou90, Skl92]. For a more detailed discussion of how these schemes relate to dynamic access ordering, see [McK93b]. The main difference between these techniques and the complementary one we propose here is that we reorder stream ....
Klaiber, A., et al., "An Architecture for Software-Controlled Data Prefetching", 18th International Symposium on Computer Architecture, May 1991.
....However, as was seen for the baseline processor, this mechanism is not transparent and degrades foreground thread performance on the reduced processor as well (these results have been omitted to conserve space). 5. Transparent Software Prefetching: Design and Evaluation. Software prefetching [17, 18, 15] is a promising technique to mitigate the memory latency bottleneck. It hides memory latency by scheduling non-blocking loads (special prefetch instructions) early relative to when their results are consumed. While these techniques provide visible latency hiding benefits, they also incur limiting ....
....5.1 Implementation Conventional Software Prefetching. As mentioned before, software prefetching schedules prefetch instructions to bring data into the cache before they are consumed by the program. Conventional software prefetching inlines the prefetch code along with the main computation code [17, 18, 15]. The inlined prefetches are software pipelined via a loop peeling transformation. Figure 17(a) shows a simple code example, and Figure 17(b) illustrates the prefetch inlining and loop peeling transformations needed to instrument software prefetching for this code example. The transformations are ....
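The inlining and loop peeling transformations mentioned in this excerpt can be illustrated with a small C sketch. It is not the code from the Figure 17 the excerpt refers to, which is not reproduced on this page. The function name `sum_pipelined` and the distance `PD` are hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin assumed to be available. The loop is split into a prologue that only prefetches, a steady state that prefetches and computes, and an epilogue that only computes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in iterations. */
#define PD ((size_t)4)

/* Sum an array with software-pipelined prefetches. */
long sum_pipelined(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    size_t i;
    size_t steady_end = n > PD ? n - PD : 0;

    /* Prologue: start fetching the first PD elements before any computation. */
    for (i = 0; i < PD && i < n; i++) {
        __builtin_prefetch(&a[i], 0, 3);
    }

    /* Steady state: prefetch PD iterations ahead while consuming element i. */
    for (i = 0; i < steady_end; i++) {
        __builtin_prefetch(&a[i + PD], 0, 3);
        sum += a[i];
    }

    /* Epilogue: the last PD elements were already prefetched; just consume. */
    for (; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

Peeling removes the per-iteration bounds check that a single loop would need. The sketch issues one prefetch per element for simplicity; production code would usually issue one per cache line.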
A. C. Klaiber and H. M. Levy, "An Architecture for Software-Controlled Data Prefetching," in Proceedings of the 18th International Symposium on Computer Architecture, (Toronto, Canada), pp. 43--53, ACM, May 1991.
....approaches. We will briefly mention some representative work and refer the interested reader to a detailed survey on the matter that was recently published [25]. In the field of software prefetching, early work includes that done by Callahan, Kennedy, and Porterfield [4] and Klaiber and Levy [16]. The former proposed the insertion of data prefetch instructions in data-intensive loops while the latter studied efficient architectural support mechanisms for data prefetch instructions. Mowry, Lam and Gupta [22] showed that careful analysis and selective prefetching could provide significant ....
A. C. Klaiber and H. M. Levy. An architecture for software-controlled data prefetching. In Proceedings of the 18th International Symposium on Computer Architecture, pages 43--53, 1991.
....of their time stalled for memory accesses. 1.2 Memory Hierarchy Optimizations. Various hardware and software approaches to improve the memory performance have been proposed recently [15]. A promising technique to mitigate the impact of long cache miss penalties is software-controlled prefetching [5, 13, 16, 22, 23]. Software-controlled prefetching requires support from both hardware and software. The processor must provide a special prefetch instruction. The software uses this instruction to inform the hardware of its intent to use a particular data item; if the data is not currently in the cache, the ....
....into effective scientific engines. 1.3 An Overview This paper proposes a compiler algorithm to insert prefetch instructions in scientific code. In particular, we focus on those numerical algorithms that operate on dense matrices. Various algorithms have previously been proposed for this problem [13, 16, 23]. In this work, we improve upon previous algorithms and evaluate our algorithm in the context of a full optimizing compiler. We also study the interaction of prefetching with other data locality optimizations such as cache blocking. There are a few important concepts useful for developing ....
A. C. Klaiber and H. M. Levy. Architecture for software-controlled data prefetching. In Proceedings of the 18th Annual International Symposium on Computer Architecture, pages 43--53, May 1991.
....we use detailed execution-driven simulation of a modern processor. Although relatively little work has compared software prefetching and locality optimizations, a large body of work has studied the techniques in isolation. Software prefetching for affine array accesses has been studied in [31, 22, 4]. Hardware prefetching [7, 34, 15, 14, 20] is similarly limited to affine array accesses, but uses hardware to identify the access pattern automatically. Prefetch engines for affine array accesses [44, 6, 10, 8] provide hardware support for prefetching, but rely on the programmer or compiler to ....
A. Klaiber and H. Levy. An Architecture for Software-Controlled Data Prefetching. In Proceedings of the 18th International Symposium on Computer Architecture, pages 43--53, Toronto, Canada, May 1991. ACM.
....5 presents the simulation results. The conclusions of this work are summarized in Section 6. 2 Related work. A number of prefetch schemes for array and linked list prefetching have been proposed. All aim at identifying or predicting the next element to be accessed and prefetching it. Software prefetch [1, 8, 11] uses a special PREFETCH instruction to bring data into the cache in advance. It has the advantage of utilizing global information to identify memory addresses most likely to miss from the cache. One disadvantage is that it will incur an execution cost even if the data is already in the cache. ....
A. C. Klaiber and H. M. Levy. An architecture for software-controlled data prefetching. In Int'l Symp. Computer Architecture, pages 43--53, 1991.
....multi-chain prefetching with prefetch arrays can potentially provide higher performance than either technique alone. This research was supported in part by NSF Computer Systems Architecture grant CCR 0093110 and NSF CAREER Award CCR 0000988. 1. Introduction. Prefetching, whether using software [11, 7, 1], hardware [3, 13, 5] or hybrid [2, 4] techniques, has proven successful at hiding memory latency for applications that employ regular data structures (e.g. arrays). Unfortunately, these techniques are far less successful for applications that employ linked data structures (LDSs) due to the ....
A. C. Klaiber and H. M. Levy. An Architecture for Software-Controlled Data Prefetching. In Proceedings of the 18th International Symposium on Computer Architecture, pages 43--53, Toronto, Canada, May 1991. ACM.
....of the prefetch engine appears in Figure 5. The design requires three additions to a commodity microprocessor: a prefetch buffer, the prefetch engine itself, and two new instructions called SINIT and SYNC. Our prefetch buffer is similar to the prefetch buffers proposed for software prefetching [3]. It is a fully associative buffer that stages prefetched data prior to its access by the processor. Each time the prefetch engine issues a new prefetch request, the request address is checked in the processor's cache and in the prefetch buffer. If a miss occurs in both places, a new entry in the ....
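The lookup described in this excerpt, checking a request against both the cache and a fully associative prefetch buffer, can be modeled with a toy C structure. Everything here is a hypothetical sketch rather than the cited design: the type and function names, the 4-entry size, the 32-byte line, and the FIFO replacement policy are all assumptions. The excerpt is truncated before it says what happens on a double miss, so allocating a buffer entry is also an assumption. The cache lookup is abstracted as a boolean supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_ENTRIES 4   /* hypothetical buffer size */
#define LINE_BYTES 32   /* hypothetical line size */

typedef struct {
    uint64_t line[BUF_ENTRIES];   /* line address held by each entry */
    bool valid[BUF_ENTRIES];
    size_t next_victim;           /* FIFO replacement: an assumption */
} prefetch_buffer;

void pb_init(prefetch_buffer *pb)
{
    assert(pb != NULL);
    for (size_t i = 0; i < BUF_ENTRIES; i++) {
        pb->valid[i] = false;
    }
    pb->next_victim = 0;
}

/* Fully associative lookup: compare the line address against every entry. */
bool pb_contains(const prefetch_buffer *pb, uint64_t addr)
{
    uint64_t line = addr / LINE_BYTES;
    for (size_t i = 0; i < BUF_ENTRIES; i++) {
        if (pb->valid[i] && pb->line[i] == line) {
            return true;
        }
    }
    return false;
}

/* Handle one prefetch request; returns true only if a new entry was allocated,
 * which happens when the address misses in both the cache and the buffer. */
bool pb_request(prefetch_buffer *pb, uint64_t addr, bool hits_in_cache)
{
    if (hits_in_cache || pb_contains(pb, addr)) {
        return false;
    }
    pb->line[pb->next_victim] = addr / LINE_BYTES;
    pb->valid[pb->next_victim] = true;
    pb->next_victim = (pb->next_victim + 1) % BUF_ENTRIES;
    return true;
}
```

Because any line may occupy any entry, the lookup has to compare against all entries, which is why such buffers are typically kept small.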
Alexander C. Klaiber and Henry M. Levy. An Architecture for Software-Controlled Data Prefetching. In Proceedings of the 18th International Symposium on Computer Architecture, pages 43--53, Toronto, Canada, May 1991. ACM.
....[10, 5, 1, 9]. In software prefetching, it is the programmer or compiler who is responsible for deciding when and what is going to be brought to the cache or to a register. Most research on software prefetching has been devoted to regular access patterns as those found in numerical applications [7, 2, 11, 8], but lately there has also been research that tries to detect and prefetch recursive data structures [13, 17] which appear in non-numerical applications. Software prefetching can be classified as non-binding or binding, depending on whether the data is brought to L1 or to the register file. ....
A. Klaiber and H. Levy. An architecture for software controlled data prefetching. 18th Annual Symposium on Computer Architecture, May 1991.
....loop iteration to every simple array reference in an inner loop, i.e. references which make direct use of the loop induction variable. They report a 20% improvement for a 50-cycle memory, but with an estimated overhead of 28% for executing prefetch instructions and address calculation. In [85], Klaiber and Levy add prefetch instructions to loops using a simple algorithm. Their prefetch instruction loads into a prefetch cache, which is separate from the normal load cache. This prevents the prefetch from interfering with normal loads. In [28] Chen and Baer study a system which includes ....
....latencies. They show good results for intermediate-latency memory (20-cycle latency) even though only one store can be crossed. Systems employing memory prefetch instructions provide a non-blocking, non-exceptional load instruction to allow loads to be migrated to earlier than normal positions [85][23]. One of the ideas central to trace scheduling, allowing code to migrate across block boundaries with support to compensate for the effects of early instruction execution, has been incorporated into at least academic thought. Execution profiling to determine the most likely branch direction ....
A. C. Klaiber, H. M. Levy, An Architecture for Software-Controlled Data Prefetching, Proceedings of the 18th Annual International Symposium on Computer Architecture, 1991, vol. 19, pp. 43-53.
....some representative work that is relevant to our discussion. We refer the interested reader to a detailed survey on the matter that was recently published [26]. In the field of software prefetching, early works include that done by Callahan, Kennedy, and Porterfield [2] and Klaiber and Levy [15]. The former proposed the insertion of data prefetch instructions in data-intensive loops while the latter studied efficient architectural support mechanisms for data prefetch instructions. Mowry, Lam and Gupta [20] showed that careful analysis and selective prefetching could provide significant ....
A. C. Klaiber and H. M. Levy, "An architecture for software-controlled data prefetching," In Proceedings of the 18th Annual International Symposium on Computer Architecture, pp. 43--53, 1991.
....improvement over a typical critical path scheduler. Sánchez and González [6] propose a way to integrate software prefetching with software pipelining in VLIW architectures. Their approach is to insert prefetch instructions into modulo-scheduled loops. Others have also studied prefetch insertion [5][4][3][2]. Our CSS algorithm is designed with VLIW machines in mind, and produces code that maximizes ILP, and at the same time we do not treat all loads alike. A load that hits most of the time will not be scheduled in any special way, even if there is some ILP available. Since, there is no long ....
A.C. Klaiber and H.M. Levy, An Architecture for Software Controlled Data Prefetching, in 18th ISCA, 1991
A. C. Klaiber and H. M. Levy, "An architecture for software-controlled data prefetching," 1991.
Klaiber, A., Levy, H.: An Architecture for Software-Controlled Data Prefetching. Proceedings of the 18th Annual Intl. Symposium on Computer Architecture (1991) 43--53
A. Klaiber and H. Levy. An architecture for software-controlled data prefetching. In Proceedings of the pages 43--53, 1991.
A.C. Klaiber and H.M. Levy, "An Architecture for Software-Controlled Data Prefetching", in Procs. of 18th ISCA, pp.43-53, May 1991
A. C. Klaiber and H. M. Levy. An architecture for software-controlled data prefetching. pages 43--53, Toronto, Canada, May 1991.
A. C. Klaiber and H. M. Levy. An architecture for software-controlled data prefetching. In Proceedings of the 18th International Symposium on Computer Architecture, pages 43--53, Toronto, Canada, May 1991.
Alexander C. Klaiber and Henry M. Levy. An architecture for software-controlled data prefetching. In 18th Annual International Symposium on Computer Architecture, pages 43--53, Toronto, Ontario, Canada, May 1991. Association for Computing Machinery.