31 citations found.
R. L. Lee, P.-C. Yew, and D. H. Lawrie, "Data prefetching in shared memory multiprocessors," Proceedings of the International Conference on Parallel Processing, pp. 28-31, 1987.

This paper is cited in the following contexts:

Comparative Evaluation of Latency Reducing and.. - Gupta, Hennessy.. (1991)   (103 citations)

....load) is bound at the time when the prefetch completes. This places restrictions on when a binding prefetch can be issued, since the value will become stale if another processor modifies the same location during the interval between prefetch and reference. Binding prefetching studies done by Lee [17] reported significant performance loss due to such limitations. In contrast, non-binding prefetching also brings the data close to the processor, but the data remains visible to the cache coherence protocol and is thus kept consistent until the processor actually reads the value. ....

R. L. Lee, P.-C. Yew, and D. H. Lawrie. Data prefetching in shared memory multiprocessors. In Proc. Int. Conf. Paral. Proc., pages 28-31, August 1987.


Tolerating Latency Through Software-Controlled Prefetching in.. - Mowry, Gupta (1991)   (232 citations)

....processor. Prefetching in DASH is non-binding in the sense that prefetched data remains visible to the cache coherence protocol [18] to keep it consistent until the processor actually reads the value through a binding access (e.g. a register load operation). In contrast, with binding prefetching [9, 14], the value of a later reference is bound (e.g. a processor register is loaded) at the time the prefetch completes. As a result, there are restrictions placed on when a binding prefetch can be issued, since the prefetched value may become stale if another processor modifies the same location. For ....

....can be issued, since the prefetched value may become stale if another processor modifies the same location. For example, a binding prefetch cannot be issued if there is a synchronization reference between the prefetch and the subsequent binding reference. Binding prefetching studies done in [14] reported significant performance loss due to such interactions. Non-binding prefetching imposes no such restrictions on when a prefetch can be issued; the coherence protocol ensures that the value fetched by the final binding read will be correct. This flexibility considerably simplifies the task ....

[Article contains additional citation context not shown here]

Roland L. Lee, Pen-Chung Yew, and Duncan H. Lawrie. Data prefetching in shared memory multiprocessors. In Proceedings of the 1987.
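The binding versus non-binding distinction drawn in the Gupta et al. and Mowry and Gupta contexts can be illustrated with a toy model. This is a minimal sketch, not the scheme of Lee et al. or of DASH: it assumes a single shared location `x`, a prefetching processor P0, a writer P1, and an invalidation-based coherent cache. All class and function names (`CoherentCache`, `remote_write`, and so on) are hypothetical.

```python
"""Toy model contrasting binding and non-binding prefetch (illustrative sketch).

Assumptions: one shared location, one prefetching processor (P0), one writer
(P1), and an invalidation-based coherent cache. All names are hypothetical.
"""


class CoherentCache:
    """Per-processor cache whose copies are invalidated by remote writes."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def prefetch(self, addr):
        # Non-binding: the data is brought close but stays under coherence control.
        self.lines[addr] = self.memory[addr]

    def invalidate(self, addr):
        self.lines.pop(addr, None)

    def read(self, addr):
        # Binding access: hit if the copy is still valid, otherwise re-fetch.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]


def remote_write(memory, caches, addr, value):
    """P1 writes memory; the protocol invalidates every cached copy."""
    memory[addr] = value
    for cache in caches:
        cache.invalidate(addr)


def binding_prefetch_run():
    memory = {"x": 1}
    register = memory["x"]                 # value is bound when the prefetch completes
    remote_write(memory, [], "x", 2)       # P1 updates x afterwards
    return register                        # P0 consumes the stale value


def non_binding_prefetch_run():
    memory = {"x": 1}
    cache = CoherentCache(memory)
    cache.prefetch("x")                    # copy remains visible to coherence
    remote_write(memory, [cache], "x", 2)  # invalidates P0's prefetched copy
    return cache.read("x")                 # final binding read sees the new value


if __name__ == "__main__":
    print("binding prefetch returns", binding_prefetch_run())          # 1 (stale)
    print("non-binding prefetch returns", non_binding_prefetch_run())  # 2
```

The binding run returns the stale value 1 because the register was loaded before P1's write; the non-binding run returns 2 because the invalidation forces the final binding read to fetch again.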


Neighborhood Prefetching on Multiprocessors Using Instruction.. - Koppelman (2000)   (1 citation)

....by special software prefetch instructions (available, for example, in SPARC V9 and IA-64 [13,27]) inserted by the compiler or programmer. In contrast, hardware prefetching requires no code modification and suffers no code expansion. See [15,21] for software prefetching on serial systems and [10,19,22] for parallel systems. 7.1. Hardware Prefetching Work on hardware sequential prefetching for serial systems dates back to the 70s, when the number of instructions that could execute in a miss delay was much smaller than it is today. Sequential prefetching schemes were investigated by Bennett et ....

R.L. Lee, P.C. Yew, and D.H. Lawrie, "Data prefetching in shared memory multiprocessors". Proc. of the Intl. Conference on Parallel Processing, August 1987, pp. 28--31.


A Survey of Data Prefetching Techniques - VanderWiel, Lilja (1996)

....of K up to 32 due to the more informed prefetches that vector stride information affords. When such high-level information cannot be supplied by the programmer, prefetch opportunities can be detected by hardware which monitors the processor's instruction stream or addressing patterns. Lee, et al. [16] examined the possibility of looking ahead in the instruction stream to find memory references for which prefetches might be dispatched. This approach requires that instructions be brought into a buffer and decoded early so that the operand addresses may be calculated for data prefetching. ....

Lee, R.L., P.-C. Yew and D.H. Lawrie, "Data Prefetching in Shared Memory Multiprocessors," Proc. of the 1987 International Conference on Parallel Processing, University Park, PA, USA, Aug. 1987, p. 28-31
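The VanderWiel and Lilja context describes the Lee et al. approach as decoding instructions early from a lookahead buffer so that operand addresses can be computed for prefetching. The sketch below models only that address-generation step under stated assumptions: a hypothetical `Instr` format with base-plus-offset addressing, and the rule that an early address is trusted only when no earlier instruction in the window writes the load's base register. It is not a reconstruction of the 1987 hardware.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    """Hypothetical decoded instruction held in the lookahead buffer."""
    op: str                     # e.g. "load", "add"
    dest: Optional[str] = None  # register written by the instruction, if any
    base: Optional[str] = None  # base register of a load's address
    offset: int = 0             # displacement added to the base register


def lookahead_prefetches(window: List[Instr], regs: Dict[str, int]) -> List[int]:
    """Compute prefetch addresses for loads found in the lookahead window.

    An address is generated early only if no earlier instruction in the
    window writes the load's base register; otherwise the current register
    value is not the one the load will actually use.
    """
    written = set()
    prefetches = []
    for ins in window:
        if ins.op == "load" and ins.base not in written:
            prefetches.append(regs[ins.base] + ins.offset)
        if ins.dest is not None:
            written.add(ins.dest)
    return prefetches


if __name__ == "__main__":
    regs = {"r1": 1000, "r2": 2000}
    window = [
        Instr("load", dest="r3", base="r1", offset=0),
        Instr("add", dest="r1"),                        # modifies r1
        Instr("load", dest="r4", base="r1", offset=8),  # skipped: r1 changes first
        Instr("load", dest="r5", base="r2", offset=4),
    ]
    print(lookahead_prefetches(window, regs))  # [1000, 2004]
```

Skipping loads whose base register is pending keeps the sketch from issuing prefetches to wrong addresses; a real design could instead wait, or speculate and accept useless prefetches.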


Speculative Multiprocessor Cache Line Actions Using Instruction.. - Koppelman (1997)   (1 citation)

....discussion they will be split into two types: prefetching and cache management. 5.1 Prefetch Software prefetch schemes use prefetch instructions, inserted by the programmer or compiler, which bring data to a cache but otherwise have no effect. See [14,20] for prefetching on serial systems and [9,17] for parallel systems. Software prefetching works well when the data needed can be identified far enough in advance. SLID does not depend upon such identification by a compiler or programmer; no changes at all need be made to object code. Prefetch instructions are not needed in hardware ....

Lee, R.L., Yew, P.C., and Lawrie, D.H. Data prefetching in shared memory multiprocessors. Proc. of the Intl. Conference on Parallel Processing. August 1987, pp. 28--31.


Decoupled Pre-Fetching for Distributed Shared Memory - Watson, Rawsthorne (1995)   (4 citations)

....favour although recent studies [14] have produced optimistic results for decoupled architectures on a wide variety of programs. Decoupling is clearly a pre-fetching technique, although, to our knowledge, it has not been studied in the context of cache pre-fetching in multi-processors. Lee et al. [15] proposed a scheme where pre-fetches are initiated from an instruction lookahead buffer in a uni-processor environment but this does not involve decoupling the fetch and execute and is more an application of pipelining. Decoupling is relatively complex and, in most proposals, requires specialized ....

R.L. Lee, P.C. Yew, D.H. Lawrie, "Data Prefetching in Shared Memory Multiprocessors", Proceedings of the 1987 Conference on Parallel Processing, pp. 28-31.


Compiling Techniques for Improving Decoupled Virtual Shared Memory.. - Zhu

....Most prefetching approaches show limited adaptivity [38] 1.4.1 Hardware-Based Prefetching Hardware prefetching schemes are used in scalar and vector machines. A special hardware mechanism is used to decide how to prefetch data based on history information and simple lookahead information [23, 5, 16, 69, 32, 45, 60, 78, 50]. It is transparent to the programmer. But sometimes, it fails to adapt itself to application programs and results in low efficiency and high cost. 1.4.2 Software-Based Prefetching If a machine, such as DEC Alpha [24] provides an instruction suitable to be used as a prefetching instruction, data ....

R. Lee, P.C. Yew, and D. H. Lawrie. Data prefetching in shared memory multiprocessors. In Proceedings of 1987 International Conference on Parallel Processing, pages 28--31, August 1987.


Data Preload For Superscalar And VLIW Processors - Chen, Jr. (1993)   (16 citations)

....faster processor and a slower memory becomes more diverse, it is increasingly difficult to find independent instructions to perform useful computation. 2.3.5 Data cache prefetching Data prefetching is an effective means of reducing the penalty of long memory access time beyond the primary cache [16], [26]. Data prefetching is typically performed for scientific applications, in which the performance of caches is often inadequate. The idea of cache prefetching is to have the data available in the cache when the actual memory access occurs. Several prefetch strategies have been presented in the ....

....cache when the actual memory access occurs. Several prefetch strategies have been presented in the past. Some of these approaches use software support to issue prefetches, while others are strictly hardware-based. Hardware-based prefetch methods have been proposed to issue prefetches dynamically [16], [25], [24]. It can be as simple as implicit prefetching through a long cache block or as complicated as utilizing a separate data path for looking ahead in the instruction stream for potential prefetches. Two advantages are that hardware-based methods do not add instruction overhead to issue ....

[Article contains additional citation context not shown here]

R. L. Lee, P. C. Yew, and D. H. Lawrie, "Data prefetching in shared memory multiprocessors," in Proceedings of 16th International Conference on Parallel Processing, pp. 28--31, Aug. 1987.


Performance Modeling of Eagersharing Distributed Memory.. - Li, Hermannsson, Wittie

....the logically shared memory. A demand-driven DSM system [8, 2] momentarily halts processors whenever they need data that must be fetched across the network. This consumer-initiated data sharing approach minimizes network traffic, but introduces long delays for remote accesses. Prefetch techniques [4, 7] can lessen delays and improve system performance. However, a requestor cannot easily predict exactly when a needed remote datum will be ready for prefetching, so delays persist in demand-driven DSM systems. In an eagersharing DSM system [9, 11, 12] each processor sharing variable s has a local ....

R.L. Lee, P. Yew, and D.H. Lawrie. Data Prefetching in Shared Memory Multiprocessors. Proceedings of the 1987 Int. Conf. on Parallel Processing, pages 28--31, August 1987.


A Design of Performance-optimized Control-based Synchronization - Min, Hsu, Kim (1991)

....the disadvantage of data prefetching in the cache hits to irrelevant prefetching of data. Data prefetching reduces the latency of cache misses on needed data. Most previous studies on data prefetching in large-scale shared-memory multiprocessors have concentrated on prefetching within an epoch [9, 11]. In these studies, prefetching across different epochs is disallowed to prevent prefetching requests from accessing the contents of the memory locations that are to be written by the processors yet to be synchronized. In this section, we will discuss a technique that allows prefetching across ....

R. L. Lee, P. C. Yew, and D. H. Lawrie. Data prefetching in shared memory multiprocessors. In Proceedings of the 1987 International Conference on Parallel processing, pages 28--31. IEEE, August 1987.


A Data Prefetch Mechanism for Accelerating General-Purpose.. - Luddy Harrison (1994)   (4 citations)

.... and data references, both in the context of uniprocessors and multiprocessors [CB94a, BL94, CR94, Chi94a, Chi94b, DS95, Gor95, PK94, SMH94, YGHH94, DDS93, EV93, JT93, TE93, CB92, FPJ92, SH92, McF92, MLG92, Sel92, Skl92, VS92, CMH+92, BC91, CKP91, FP91, KL91, MG91, CMCH91, GGV90, Jou90, SR88, LYL87, Smi78]. Unfortunately, most published data prefetching schemes are generally limited to generating prefetches for arithmetic progressions of memory addresses in loop nests, typically those found in regular scientific codes. While scientific codes are an important class of applications, modern ....

Roland L. Lee, Pen-Chung Yew, and Duncan H. Lawrie. Data Prefetching in Shared Memory Multiprocessors. In Proceedings of the 1987 International Conference on Parallel Processing, pages 28--31, August 1987.


Data Prefetching for High-Performance Processors - Chen (1993)   (24 citations)

....the LA-PC is simply incremented by one. When the LA-PC finds an entry in the BPT, it indicates that the LA-PC points to a branch instruction. In that case, the prediction result of the branch entry in the BPT is provided to modify the LA-PC. Note that, unlike the instruction prefetch structure in [Lee et al. 87a] or decoupled architectures [Smith 82b], the system does not need to decode the predicted instruction stream. Instead, the lookahead mechanism is based on the history information of the execution stream. 3.3.1 Lookahead Program Counter (LA-PC) and RPT In the generic prediction scheme, prefetching ....

Lee, R. L., Yew, P.-C., and Lawrie, D. H. (1987a). Data prefetching in shared memory multiprocessors. In Proc. of the Int. Conf. on Parallel Processing, pages 28--31.


Data Prefetch Mechanisms For Accelerating Symbolic And Numeric.. - Mehrotra (1996)   (10 citations)

....of long-latency memory operations. Prefetching has been researched heavily, and for a long time. Hardware, software, and hybrid hardware-software schemes have all been extensively explored for prefetching instructions and data references, both in the context of uniprocessors and multiprocessors [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]. 2.2 Instruction prefetching and branch prediction The instruction prefetching and branch prediction problems are closely related. Efficient solutions for both are critical, because they affect the degree to which speculative execution is effective in current machines. In their research, ....

....as three separate categories. The work by Tse and Smith [7], Chen and Baer [8, 22], Dahlgren, Dubois, and Stenstrom [9], Charney and Reeves [12], Palacharla and Kessler [16], Eickemeyer and Vassiliadis [20], Temam and Jegou [21], Fu [24], Varma and Sinha [29], Jouppi [33], and Lee, Yew, and Lawrie [36] represents a broad cross section of research in hardware data prefetching. Most of this work can be divided broadly into two categories: schemes that react to patterns of cache misses, and schemes that detect linear strides in address sequences. Neither category is sufficient for handling complex ....

R. L. Lee, P.-C. Yew, and D. H. Lawrie, "Data Prefetching in Shared Memory Multiprocessors," in Proceedings of the 1987 International Conference on Parallel Processing, pp. 28--31, Aug. 1987.
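Mehrotra divides much of this hardware work into schemes that react to miss patterns and schemes that detect linear strides. As a hedged illustration of the second category only (not of the Lee et al. lookahead method), the sketch keeps a per-instruction table of the last address and last stride, loosely in the spirit of a reference prediction table, and issues a prefetch for `addr + stride` once the same nonzero stride is seen twice in a row. The function name `stride_prefetches`, the table layout, and the two-in-a-row confirmation rule are simplifying assumptions.

```python
from typing import Dict, List, Optional, Tuple


def stride_prefetches(trace: List[Tuple[int, int]]) -> List[int]:
    """Return prefetch addresses for a trace of (pc, address) pairs.

    Simplified stride detector (hypothetical): one entry per load PC holding
    the last address and the last observed stride. A prefetch for
    addr + stride is issued only when the current stride is nonzero and
    matches the previous one.
    """
    table: Dict[int, Tuple[int, Optional[int]]] = {}
    prefetches: List[int] = []
    for pc, addr in trace:
        if pc not in table:
            table[pc] = (addr, None)  # first sighting: no stride yet
            continue
        last_addr, last_stride = table[pc]
        stride = addr - last_addr
        if stride != 0 and stride == last_stride:
            prefetches.append(addr + stride)
        table[pc] = (addr, stride)
    return prefetches


if __name__ == "__main__":
    trace = [
        (0x40, 100), (0x80, 500),  # 0x40 walks an array; 0x80 is irregular
        (0x40, 108), (0x80, 37),
        (0x40, 116), (0x80, 900),
        (0x40, 124),
    ]
    print(stride_prefetches(trace))  # [124, 132]
```

Requiring a repeated stride before prefetching filters out the irregular load at `0x80`, at the cost of missing the first two elements of each regular stream.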


The Combined Effectiveness of Unimodular.. - Saavedra, Mao.. (1996)   (13 citations)

....Many hardware and software mechanisms have been proposed to eliminate and/or hide the large memory latencies required in interprocessor communication. Of the hardware mechanisms, coherent caches [Tang76, Cens78, YenW85], relaxed memory models [Dubo88, Sche88, Adve90, Ghar90], data prefetching [LeeR87, Port89, Mowr91], and multithreading [Smit78, Hals88, Agar90] are considered the most promising. Equally important are program transformations which make it possible to exploit the benefits offered by the hardware mechanisms. For example, it is known that without some amount of loop restructuring, such as ....

Lee, R.L., Yew, P.-C., and Lawrie, D.H., "Data Prefetching in Shared Memory Multiprocessors", Proc. Int. Conf. Par. Proc., August 1987, pp. 28-31.


Tango: a Hardware-based Data Prefetching Technique for.. - Pinter (1996)   (16 citations)

....bus for predictions with small strides whenever the relevant part is in some cache line (cache block) 2 Pre program counter. Even with good prediction, data can be prefetched too early or too late to be useful. One way for solving this problem is a lookahead scheme. The lookahead scheme in [10] is based on generating data prefetch for operands simultaneously with the decoding of the instruction. Prediction and lookahead are integrated by Baer and Chen in [1, 3] to support prefetch for scalar processors. In this on-chip scheme the stride prediction is calculated with a reference ....

R. L. Lee, P. C. Yew, and D. H. Lawrie. Data prefetching in shared memory multiprocessors. In International Conference on Parallel Processing, pages 28--31. CRC Press, Inc., 1987.


Memory Latency Reduction via Data Prefetching and Data Forwarding .. - Poulsen (1994)   Self-citation (Yew)

No context found.

R. L. Lee, P.-C. Yew, and D. H. Lawrie, "Data prefetching in shared memory multiprocessors," Proceedings of the International Conference on Parallel Processing, pp. 28-31, 1987.


Efficient Integration of Compiler-directed Cache Coherence And.. - Lim, Yew (2000)   (1 citation)  Self-citation (Yew)

....CCDP scheme. Although our Cray T3D implementation provided substantial performance improvements for the system [20, 21], the performance of the scheme can be further improved by optimizing the prefetch hardware support. Various researchers have developed sophisticated hardware prefetching schemes [4, 10, 11, 15, 18]. These schemes make use of hardware features to dynamically predict the data references to prefetch at run time. However, these prefetch hardware designs are not suitable for the CCDP scheme because they cannot distinguish between potentially stale and non-stale references and take proper actions ....

R. L. Lee, P.-C. Yew, and D. Lawrie. Data prefetching in shared memory multiprocessors. In Proceedings of the 1987.


Maintaining Cache Coherence through Compiler-Directed Data.. - Lim, Yew (1998)   Self-citation (Yew)

....the Cray T3D, thus providing the first implementation of a sophisticated compiler-directed cache coherence scheme on a commercial MPP system. 2.2 Data Prefetching Schemes Data prefetching is also an active research area. Various researchers have proposed hardware-controlled data prefetching schemes [3, 10, 11, 15, 19] and software-initiated data prefetching schemes [14, 25, 29] which can be implemented in multiprocessors. Recent efforts have focused on the design of prefetching schemes which can handle irregular data reference patterns [22] and also the implementation and performance evaluation of software ....

R. L. Lee, P.-C. Yew, and D. Lawrie. Data prefetching in shared memory multiprocessors. In Proceedings of the 1987 International Conference on Parallel Processing, pages 28--31, August 1987. Also available as CSRD Tech. Report 639, January 1987.


Data Prefetching And Data Forwarding In Shared Memory.. - Poulsen, Yew (1994)   (13 citations)  Self-citation (Yew)

....latency for shared accesses [5]. Accordingly, studies that compare data prefetching and data forwarding have not yet appeared in the literature. Other latency reduction techniques not considered in this paper include multithreading [27, 28], dynamic instruction scheduling and instruction lookahead [29, 30], and decoupled access/execute architectures [31, 32]. 5.1 Data Prefetching Many software-initiated prefetching schemes have been proposed for uniprocessor [2, 3, 33] and multiprocessor architectures [4, 6, 7, 20, 26, 34]. In addition, hardware-initiated prefetching schemes that have the ....

Lee, R. L., Yew, P.-C., and Lawrie, D. H., "Data Prefetching in Shared Memory Multiprocessors", Proceedings of ICPP, 1987, pp. 28-31.


Cache Filtering Techniques to Reduce the Negative Impact of .. - Onur Mutlu Hyesoon (2004)

No context found.

R. L. Lee, P.-C. Yew, and D. H. Lawrie. Data prefetching in shared memory multiprocessors. In Proceedings of the Intl. Conference on Parallel Processing, 1987.


Optimizing Communication and Data Distribution for.. - Palermo

No context found.

R. L. Lee, P.-C. Yew, and D. H. Lawrie, "Data prefetching in shared memory multiprocessors," in Proceedings of the 16th International Conference on Parallel Processing, St. Charles, IL, Aug. 1987, pp. 28--31.


Maximizing Memory Bandwidth for Streamed Computations - McKee (1995)   (7 citations)

No context found.

R.L. Lee, P.-C. Yew, and D.H. Lawrie, "Data Prefetching in Shared Memory Multiprocessors", Proceedings of the International Conference on Parallel Processing, pages 28-31, August 1987.


Evaluation of Hardware-Based Stride and Sequential.. - Dahlgren, Stenström   (14 citations)

No context found.

R. Lee, P-C. Yew, and D. Lawrie, "Data Prefetching in Shared-Memory Multiprocessors," in Proc. Int. Conf. Parallel Processing, 1987, pp. 28-31.


Conclusions - This Dissertation Is

No context found.

Lee, Roland L., Yew, Pen-Chung, and Lawrie, Duncan H., "Data Prefetching in Shared Memory Multiprocessors." In International Conference on Parallel Processing, IEEE, 1987, pp. 28-31.

CiteSeer.IST - Copyright Penn State and NEC