32 citations found.
Z. Zhang and J. Torrellas. Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 188--200, June 1995.

This paper is cited in the following contexts:

A Programmable Memory Hierarchy for Prefetching Linked Data.. - Yang, Lebeck

....research focuses on array based applications with regular access patterns [6 9]. Correlation based prefetching [10, 11] can capture complex access patterns from the address history, but the prediction accuracy relies on the size of the prediction table and stable access patterns. Several studies [12 14] explore data structure information to insert prefetch instructions at compile time for irregular applications. Chilimbi et al. [15, 16] seek to improve cache performance of pointer based applications by reorganizing data layouts. Mehrotra et al. [17] extend stride detection schemes to capture ....

Zhang, Z., Torrellas, J.: Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching. In: Proceedings of the 22nd Annual International Symposium on Computer Architecture. (1995) 188-200


Effective Compile-Time Analysis for Data Prefetching in Java - Cahoon (2002)

.... [33]. In his thesis, Gornish compares software and hardware prefetching, and presents an integrated prefetching scheme for multiprocessors [41]. Zhang and Torrellas describe techniques for prefetching pointer based programs on multiprocessors using a scheme that is similar to greedy prefetching [123]. Tullsen and Eggers evaluate compiler assisted software prefetching on shared memory multiprocessors [106]. Ranganathan, Pai, Abdel Shafi, and Adve examine the effectiveness of software prefetching for scientific programs on a shared memory multiprocessor built with modern ILP processors ....

Zheng Zhang and Josep Torrellas. Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 188--200, S. Margherita Ligure, Italy, June 1995.


Guided Region Prefetching: A Cooperative.. - Wang, Burger.. (2003)   (2 citations)

....to augment a reference prediction table. Skeppstedt and Dubois use a trap handler to trigger prefetching using similar information [39]. Karlsson et al. [23] use prefetch arrays to enable a hardware engine to perform a generalized variant of greedy and jump pointer prefetching. Zhang and Torrellas [47] use the compiler to mark blocks in memory as belonging to contiguous spatially local regions or containing indirection pointers. Their scheme requires additional bits in main memory and significant support in the memory controller. Finally, fully programmable prefetch engines provide flexibility ....

Z. Zhang and T. Torrellas. Speeding up irregular applications in shared memory multiprocessors: Memory binding and group prefetching. In Proceedings of the 22nd International Symposium on Computer Architecture, pages 1--19, Santa Margherita Ligure, Italy, June 1995.


MORPH: A System Architecture for Robust High Performance Using .. - Chien, Gupta (1996)   (5 citations)

....the critical paths and cost of runtime reconfiguration is amortized over long periods of time between which these updates take place. 4.3 Programmable Coherence to Reduce Communication Researchers have long recognized that a single data management granularity, and single cache consistency policies [25, 65, 56, 32] could not hope to serve all applications equally well. These observations are also borne by our experimental results in the following section. However, hard wired machines must be designed to handle a single common case, to simplify their implementation and as a compromise across a workload. In ....

Zhang, Z., and Torrellas, J. Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching. In Proceedings of the International Symposium on Computer Architecture (1995).


Using Cache as a Local Memory - Mandhani, Cook, Kremer

....microprocessors. With the increasing gap between processor and memory speed, high cache performance has become critical. However, poor reuse of data, conflicts between various references, and underutilization of cache capacity lead to poor cache performance for various commonly used applications [4, 23]. In this paper, we propose a new approach to improve the cache performance of programs with memory access patterns not well suited for conventional cache architectures. We propose to use the cache as a reconfigurable local memory, where the allocation/deallocation of data in the local memory is ....

Z. Zhang and J. Torrellas. Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching. In Proceedings of the 22nd International Symposium on Computer Architecture, pages 188--199, 1995.


Quantifying Load Stream Behavior - Sair, Sherwood, Calder (2002)

....parts of the same object can cause multiple cache misses. To eliminate these incidental misses, one could trigger the prefetch of the whole object once a miss occurs to data within the object. This could require the prefetching algorithm to know/predict the size of an object. Zhang and Torrellas [22] recognized the benefit of grouping together fields or objects that are used together, and prefetching these all together as a prefetch group of blocks. They examined using user added grouping instructions that allowed the user to group together fields/objects that should be prefetched together. ....

Z. Zhang and J. Torrellas. Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching. In 22nd Annual International Symposium on Computer Architecture, June 1995.


Memory-Side Prefetching for Linked Data Structures - Hughes, Adve (2001)   (5 citations)

....traversals involve a chain of inherently serial, dependent loads: the next address to be accessed is not known until the data from the previous load returns to the processor. To hide this latency, several researchers have proposed novel prefetching techniques initiated at the processor (e.g., [21, 27, 35, 36, 41]) that we refer to as processor side prefetching. More recently, Yang and Lebeck proposed prefetching initiated near memory [40] that we refer to as memory side prefetching. Their work compares the proposed memory side scheme with a processor side scheme, and finds the former to be very effective. ....

....not provided. We expect a memory side scheme to outperform the above scheme because prefetching can be initiated just as early for memory side prefetching and its prefetch engine has shorter round trip time to memory. Other processor side prefetching schemes include SPAID [26], group prefetching [41], greedy prefetching [27], LDS linearization [27], and dependence based prediction [29, 35]. These schemes are more limited than the jump pointer schemes (e.g., most serialize LDS prefetches or do not handle truly dynamic LDS). The only other memory side prefetching scheme of which we are aware ....

Z. Zhang and J. Torrellas. Speeding up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching. In Proc. of the 22th Intl. Symp. on Comp. Arch., 1995.


Limitations of Hardware Data Prefetching Techniques on.. - Karlsson, Ibáñez, Ramos

....prefetch this, we believe that most cases where data centric is successful but the load centric is not is covered by DBP. Also, in [12] it was found that a load centric technique performed better than a data centric one for a set of kernels on a shared memory multiprocessor. Group prefetching [31] is a technique where groups of cache blocks are bound together at compile time, and as soon as one block belonging to a group is referenced during the execution, the rest of the blocks belonging to the same group are prefetched. The main drawback of this approach is that the binding is static and ....

Z. Zhang and J. Torrellas. Speeding up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching. In Proceedings of the 22nd International Symposium on Computer Architecture, pages 188--199, June 1995.


Push vs. Pull: Data Movement for Linked Data Structures - Yang, Lebeck (2000)   (2 citations)

....The reorganization process happens at run time. This approach greatly improves the data spatial and temporal locality. The shortcoming of this approach is that it can incur high overhead for dynamically changing structures, and it cannot hide latency from capacity misses. Zhang et al. [23] proposed a hardware prefetching scheme for irregular applications in shared memory multiprocessors. This mechanism uses object information to guide prefetching. Programs are annotated to bind together groups of data (e.g. fields in a record or two records linked by a pointer) either by ....

Z. Zhang and J. Torrellas. Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 188--200, June 1995.


A Survey of Data Prefetching Techniques - VanderWiel, Lilja (1996)

....have proposed extensions to the RPT mechanism which allow for the prefetching of data objects connected via pointers. This approach adds fields to the RPT which enable the detection of indirect reference strides arising from structures such as linked lists and sparse matrices. Zheng and Torrellas [25] suggest tagging memory in such a way that a reference to one element of a data object initiates a prefetch of either other elements within the referenced object or objects pointed to by the referenced object. This approach relies upon some compiler support to initialize the tags in memory, but ....

Zhang, Z. and J. Torrellas, "Speeding up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching," Proc. 22th Annual International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 1995, p. 188-199.


Efficient Communication Using Message Prediction for.. - Afsahi, Dimopoulos (1999)

....their poor performance for short messages which is extremely important for parallel computing. Prediction techniques have been proposed in the past to predict the future accesses of sharing patterns and coherence activities in distributed shared memory (DSM) by looking at their observed behavior [27, 23, 20, 39, 11, 31]. These techniques assume that memory accesses and coherence activities in the near future will follow past patterns. Sakr and his colleagues have used time series and neural networks for the prediction of the next memory sharing requests [31]. Dahlgren and his colleagues devised hardware regular ....

.... of the next memory sharing requests [31]. Dahlgren and his colleagues devised hardware regular stride techniques to prefetch several blocks ahead of the current data block [11]. More elaborate hardware based irregular stride prefetching approaches have been proposed by Zhang and Torrellas [39]. Kaxiras and Goodman have recently proposed an instruction-based approach which maintains the history of load and store instructions in relation to cache misses and predicting their future behavior [20]. Mukherjee and Hill proposed a general pattern based predictor to learn and predict the ....

Z. Zhang and J. Torrellas, "Speeding Up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching", Proceedings of the 22nd Annual International Symposium on Computer Architectures, 1995, pp. 188-199.


A High-Level Abstraction of Shared Accesses - Keleher (2000)   (1 citation)

....variables. However, this technique could only be implemented with extensive cooperation from the underlying protocol, and so would not generally be implemented at the tapes level. Several papers have used predictive techniques to accelerate hardware coherence protocols directly. Zhang [35] described a technique similar to our producer consumer regions, but in the domain of cache lines. Their technique allows users and compilers to explicitly create arbitrary groups of cache lines, which are fetched as a group. This is useful even for regular applications (where the groups will ....

Z. Zhang and J. Torrellas, "Speeding up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching," in 22nd International Symposium on Computer Architecture (ISCA), June 1995.


Hiding Communication Latency in Reconfigurable.. - Afsahi, Dimopoulos (1999)

....have reported these in [1]. However, if the algorithm is not known, regular or simple, the approach mentioned above cannot be used. In the context of the shared memory programming, there are several works on hardware controlled and software controlled prefetching of the next shared data request [16,20,24]. T. Mowry and A. Gupta [16] have used prefetching, multithreading, and caching to hide/reduce the latency in shared memory multiprocessors. M. F. Sakr et al. [20] have used time series and neural networks for the pre ....

Z. Zhang and J. Torrellas, "Speeding Up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching," Proceedings of the 22nd Annual Symposium on Computer Architecture, 1995, pp. 188-199


Data Prefetch Mechanisms - VanderWiel, Lilja   (19 citations)

....address. The PD field of the newly prefetched block is then set to K b and the tag bit is set. This insures that the appropriate value of K is propagated through the reference stream. Prefetching for non sequential reference patterns is handled by ordinary fetch instructions. Zheng and Torrellas [39] suggest an integrated technique that enables prefetching for irregular data structures. This is accomplished by tagging memory locations in such a way that a reference to one element of a data object initiates a prefetch of either other elements within the referenced object or objects pointed to ....

Zhang, Z. and J. Torrellas, "Speeding up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching," Proc. 22th International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 1995, p. 188-199.


Using a User-Level Memory Thread for Correlation Prefetching - Solihin, Lee, Torrellas (2002)   (11 citations)  Self-citation (Torrellas)

No context found.

Z. Zhang and J. Torrellas. Speeding up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching. In the 22nd International Symposium on Computer Architecture, pages 188--199, June 1995.


Correlation Prefetching with a User-Level Memory Thread - Solihin, Lee, Torrellas (2003)   Self-citation (Torrellas)

....heterogeneous system. 1 INTRODUCTION Data prefetching is a popular technique to tolerate long memory access latencies. Most of the past work on data prefetching has focused on processor side prefetching [6], [7], [8], [15], [16], [17], [18], [23], [24], [28], [30], [32], [35], [36]. In this approach, the processor or an engine in its cache hierarchy issues the prefetch requests. An interesting alternative is memory side prefetching, where the engine that prefetches data for the processor is in the main memory system [1], [4], [9], [14], [27], [35]. Memory side prefetching ....

....on these schemes. There are many more proposals for processor side prefetching, often for irregular applications. A tiny, nonexhaustive list includes Choi et al. [8], Karlsson et al. [17], Lipasti et al. [23], Luk and Mowry [24], Mehrotra [25], Roth et al. [30], and Zhang and Torrellas [36]. Many of these schemes specifically target linked data structures. Many of them rely on program information that is available to the processor, like the addresses and sizes of data structures. Often, they need compiler support. Our scheme needs neither program information nor compiler support. ....

Z. Zhang and J. Torrellas, "Speeding up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching," Proc. 22nd Int'l Symp. Computer Architecture, pp. 188-199, June 1995.


Hardware Support for Thread-Level Speculation - Steffan (2003)

No context found.

Z. Zhang and J. Torrellas. Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 188--200, June 1995.


Hardware Prefetching in Bus-Based Multiprocessors.. - Garzaran, Briz..

No context found.

Z. Zhang and J. Torrellas. "Speeding up Irregular Applications in Shared-Memory Multiprocessors: memory Binding and Group Prefetching". Proc. 22nd ISCA, 1995: 188-199.


Managing Wire Delay in Large Chip-Multiprocessor Caches - Beckmann, Wood (2004)

No context found.

Z. Zhang and J. Torrellas. Speeding up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 188--199, June 1995.


Next-Generation Memory Systems - Wang (2004)

No context found.

Z. Zhang and T. Torrellas. Speeding up irregular applications in shared memory multiprocessors: Memory binding and group prefetching. In Proceedings of the 22nd International Symposium on Computer Architecture, pages 1--19, Santa Margherita Ligure, Italy, June 1995.


Predictor-Directed Data Prefetching for Pointer-based Applications - Sair (2003)

No context found.

Z. Zhang and J. Torrellas. Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching. In 22nd Annual International Symposium on Computer Architecture, June 1995.


Masking Memory Access Latency with a Compiler-Assisted Data.. - VanderWiel (1998)

No context found.

Zhang, Z. and J. Torrellas, "Speeding up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching," Proc. 22nd International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 1995, p. 188-199.


