J. Torrellas, M. S. Lam, and J. L. Hennessy, "Shared data placement optimizations to reduce multiprocessor cache miss rates," Proceedings of the International Conference on Parallel Processing, pp. 266-270, 1990.


This paper is cited in the following contexts:


Using Compiler Assistance to Reduce the Network Traffic - Requirements Of..

....in the reference marking scheme and mark i writes conservatively. All write references to the same cache block must be marked as i writes if any of them should be marked as an i write. Our scheme can benefit from any of the optimization techniques that reduce the effects of false sharing [6, 11, 39]. Using those techniques, the number of unnecessary i writes due to false sharing can be reduced. In the worst case, our scheme will suffer from false sharing no more than a conventional invalidation scheme. The compiler optimization will have no effect on the potential benefits that could be ....

J. Torrellas, M. S. Lam, and J. L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. International Conference on Parallel Processing, pages 266--270, 1990.


Data Locality Optimizations for Multigrid Methods on Structured.. - Weiß

....array padding: 3: double pad[x] A similar problem called self interference can occur if several rows of a multidimensional array are mapped to the same set of cache lines and the rows are accessed in an alternating fashion. For both cases of interference array padding [BFS89, TLH90] provides a means to reduce the amount of conflict misses. Inter array padding inserts unused variables (pads) between two arrays to avoid cross interference between two arrays by modifying the offset of the second array so that both arrays are mapped to different cache parts. Intra array ....

J. Torrellas, M. Lam, and J. Hennessy. Shared Data Placement Optimizations to Reduce Multiprocessor Cache Miss Rates. In Proceedings of the 1990 International Conference on Parallel Processing, volume 2, pages 266--270, Pennsylvania, USA, August 1990.
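
The inter array padding described in the excerpt above can be illustrated with a small address calculation. This is only a sketch under assumed parameters: the cache size, line size, array length, and the helper names `cache_line` and `conflicts` are hypothetical and do not come from any of the cited papers.

```python
# Hypothetical direct-mapped cache parameters, chosen only for illustration.
CACHE_SIZE = 1024                    # bytes
LINE_SIZE = 32                       # bytes per cache line
NUM_LINES = CACHE_SIZE // LINE_SIZE  # 32 lines


def cache_line(addr):
    """Index of the cache line that a byte address maps to."""
    return (addr // LINE_SIZE) % NUM_LINES


def conflicts(base_a, base_b, n, elem_size=8):
    """Count indices i for which a[i] and b[i] map to the same cache line."""
    return sum(
        cache_line(base_a + i * elem_size) == cache_line(base_b + i * elem_size)
        for i in range(n)
    )


N = 128                                       # 128 doubles = 1024 bytes per array
base_a = 0
base_b_unpadded = base_a + N * 8              # b placed directly after a
base_b_padded = base_b_unpadded + LINE_SIZE   # one cache line of unused pad

unpadded = conflicts(base_a, base_b_unpadded, N)
padded = conflicts(base_a, base_b_padded, N)
print(unpadded, padded)
```

With these assumed sizes the second array starts exactly one cache size after the first, so every pair `a[i]`, `b[i]` competes for the same line; shifting the base address of the second array by one line removes the overlap.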


A Dynamic Cache Sub-block Design to Reduce False Sharing - Kadiyala, Bhuyan (1995)   (2 citations)

.... which utilize the knowledge of the cache organization, achieve significant improvement in the cache miss ratio for large block sizes [5]. For situations that exhibit fine grain sharing, grouping objects according to sharing patterns [4] and properly allocating objects to separate cache lines [6] show substantial improvement in cache miss ratios. Hardware schemes that minimize false sharing miss rates do so by adjusting the effective block size or by using large address blocks and smaller transfer blocks. Dubnicki and LeBlanc [7] propose a scheme in which the block size is dynamically ....

J. Torrellas, M.S. Lam and J.L. Hennessy, "Shared Data Placement Optimizations to Reduce Multiprocessor Cache Misses," Proc. of the 1990.


Design Memory Mapping - Lindenmaier (2000)

....dummy locations in memory that are never used. Two strategies are distinguished: inter array padding and intra array padding. Inter array padding adds additional pad arrays between the arrays of the program. This controls the difference between the base addresses of the arrays. Intra array padding ([TLH90]) adds dummy elements within an array. Algorithms that insert pads between columns or rows ([BCC 94, RT98]) are relatively simple as only the size of the array dimensions is changed. Other algorithms insert pads within columns or rows ([PNDN97] give an example). This means that array accesses ....

J. Torrellas, M. Lam, and J. L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the


Parallel Data Mining for Association Rules on.. - Parthasarathy, Zaki.. (2000)   (5 citations)

....based on access patterns to further enhance locality. Our experimental results indicate increased locality gains when grouping related data structures, rather than linearizing a single class. 7.3. Reducing False Sharing Five techniques mainly directed at reducing false sharing were proposed in [41]. They include padding, aligning, and allocation of memory requested by different processors from different heap regions. Those optimizations result in a good performance improvement for the applications they considered. In our case study padding and aligning was not found to be very beneficial. ....

J. Torrellas, M. S. Lam, and J. L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. Intl. Conf. Parallel Processing, pages II:266-270, August 1990.


Parallel Data Mining for Association Rules on Shared-memory.. - Zaki, al. (1998)   (30 citations)

....locality. Our experimental results indicate increased locality gains when grouping related data structures, rather than linearizing a single class. 7.3. Reducing False Sharing Five techniques mainly directed at reducing false sharing were proposed in (Torrellas, Lam, Hennessy 1990). They include padding, aligning, and allocation of memory requested by different processors from different heap regions. Those optimizations result in a good performance improvement for the applications they considered. In our case study padding and aligning was not found to be very beneficial. ....

Torrellas, J.; Lam, M. S.; and Hennessy, J. L. 1990. Shared data placement optimizations to reduce multiprocessor cache miss rates. Intl. Conf. Parallel Processing II:266--270.


Cache Miss Equations: A Compiler Framework for Analyzing.. - Ghosh, Martonosi, Malik (1998)   (57 citations)

....of a direct mapped cache with strided data access to estimate the cache efficiency. Subsequently, cache efficiency is used by the compiler to detect unfavorable strides and determine automatically the necessary padding for array dimensions, thereby reducing cache set conflicts. In the parallel domain, [Torrellas et al. 1990] have suggested alignment of arrays to cache line boundaries to reduce false sharing. Later, [Bacon et al. 1994] developed a padding algorithm for selecting efficient padding amounts, which takes into account both cache and TLB (Translation Lookaside Buffer) effects collectively within a single ....

Torrellas, J., Lam, M. S., and Hennessy, J. L. 1990. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the 1990 International Conference on Parallel Processing.


Cache Miss Equations: A Compiler Framework for Analyzing.. - Ghosh, Martonosi, Malik (1998)   (57 citations)

....direct mapped cache with strided data access to estimate the cache efficiency. Subsequently, cache efficiency is used by the compiler to detect unfavorable strides and determine automatically the necessary padding for array dimensions, thereby reducing cache set conflicts. In the parallel domain, [Torrellas et al. 1990] have suggested alignment of arrays to cache line boundaries to reduce false sharing. Later, [Bacon et al. 1994] developed a padding algorithm for selecting efficient padding amounts, which takes into account both cache and TLB (Translation Lookaside Buffer) effects collectively within a single ....

Torrellas, J., Lam, M. S., and Hennessy, J. L. 1990. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proc. Int'l Conf. on Parallel Processing (Aug.).
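
The alignment idea mentioned in the excerpt above (starting data on cache line boundaries so that items used by different processors do not straddle a common block) can be sketched as an offset calculation. The item sizes, the 32-byte line size, and the helper names `align_up`, `layout`, and `falsely_shared_blocks` are assumptions made for this illustration, not taken from the cited work.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(sizes, line_size=None):
    """Starting offsets for consecutive items; aligned when line_size is given."""
    offsets = []
    cur = 0
    for size in sizes:
        if line_size:
            cur = align_up(cur, line_size)
        offsets.append(cur)
        cur += size
    return offsets


def falsely_shared_blocks(offsets, sizes, line_size):
    """Number of cache blocks that hold bytes of more than one item."""
    holders = {}
    for item, (off, size) in enumerate(zip(offsets, sizes)):
        first = off // line_size
        last = (off + size - 1) // line_size
        for block in range(first, last + 1):
            holders.setdefault(block, set()).add(item)
    return sum(1 for items in holders.values() if len(items) > 1)


LINE_SIZE = 32                 # hypothetical cache line size in bytes
sizes = [40, 40, 40, 40]       # one 40-byte item per processor (assumed)

packed = layout(sizes)
aligned = layout(sizes, LINE_SIZE)
shared_packed = falsely_shared_blocks(packed, sizes, LINE_SIZE)
shared_aligned = falsely_shared_blocks(aligned, sizes, LINE_SIZE)
print(packed, shared_packed)
print(aligned, shared_aligned)
```

The aligned layout spends extra memory on the gaps between items, which is the usual cost of this kind of placement.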


An Innovative Implementation for Directory-based Cache.. - Shi, Hu, Zhu (1997)

....is the sharing of cache blocks without actual sharing of data. It occurs because cache blocks contain more than one data item. Whenever in write invalidate or write update protocols, more traffic and miss rates are caused by false sharing so that the whole system performance degrade significantly [18]. However, in software DSM systems, such as TreadMarks [1], Munin [3], etc., the false sharing problem are solved greatly because of using lazy release consistency and multiple writer protocol [1][3]. Therefore, in this paper, we propose a new cache coherence protocol which is based on the scope ....

J. Torrellas, Monica S. Lam and John Hennessy. Shared Data Placement Optimization to Reduce Multiprocessor Cache Miss Rates. In 1990 International Conference on Parallel Processing. pp. II-266-II-270.


Reducing False Sharing on Shared Memory Multiprocessors.. - Tor Jeremiassen (1994)   (79 citations)

....cache configuration and cache miss penalty, the gains were more modest. 1 Introduction On bus based, shared memory multiprocessors, much of the unnecessary bus traffic, i.e. that which could be eliminated with better processor locality [AG88], is coherency overhead caused by false sharing [TLH90, EJ91]. False sharing occurs when multiple processors access (both read and write) different words in the same cache block. Although they do not actually share data, they incur its costs, because coherency operations are often cache block based. In a write invalidate coherency protocol the ....

....the memory layout of write shared data and the cross processor memory reference pattern to it. Manually changing the placement of this data to better conform to the memory reference pattern (based on profiles derived from trace driven simulations) can reduce false sharing misses by up to 75% [TLH90, EJ91]. However, manual restructuring requires that the programmer pinpoint the data structures that suffer from false sharing in a particular memory (cache) architecture. Simulation profiles are generally not available during the development cycle. To identify these data structures it is ....

J. Torrellas, M. S. Lam, and J. L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the 1990 International Conference on Parallel Processing, volume II, pages 266--270, August 1990.
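
The false sharing effect defined in the excerpts above can be made concrete with a toy model of a write invalidate protocol. This is a deliberately simplified sketch: it tracks only the last writer of each block, ignores reads and cold misses, and the function name `coherence_misses`, the 32-byte line size, and the access trace are all assumptions for illustration.

```python
LINE_SIZE = 32  # hypothetical cache block size in bytes


def coherence_misses(writes, line_size=LINE_SIZE):
    """Count writes that find their block last written by another processor.

    writes is a sequence of (processor_id, byte_address) pairs. In this
    simplified model a write invalidates every other copy of the block,
    so the next write by a different processor misses.
    """
    last_writer = {}  # block number -> processor that wrote it last
    misses = 0
    for proc, addr in writes:
        block = addr // line_size
        prev = last_writer.get(block)
        if prev is not None and prev != proc:
            misses += 1
        last_writer[block] = proc
    return misses


# Two processors alternately update their own word, never the other's.
together = [(0, 0), (1, 8)] * 100        # both words in the same block
apart = [(0, 0), (1, LINE_SIZE)] * 100   # second word moved to its own block

misses_together = coherence_misses(together)
misses_apart = coherence_misses(apart)
print(misses_together, misses_apart)
```

Neither processor ever touches the other's word, yet in the first layout nearly every write misses; placing the words in separate blocks removes those misses in this model.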


A Distributed Shared Memory System With Self-Adjusting.. - Wang, Chang (1994)

....Coherence can then be assured. Write frequently data objects: These objects have the write request operations more frequent than read request operations. If we apply above two coherence schemes for write frequently data, network contention occurs, which is similar to that of false sharing [25]. The false sharing problem arises when two processes residing on different processors write different addresses which fall in the same page. This type of data objects is not suitable for supporting data sharing. Migration algorithm [22] is then used. Only one copy for write frequently data exists ....

J. Torrellas, M. S. Lam and J. L. Hennessy, Shared data placement optimizations to reduce multiprocessor cache miss rates, Proc. 1990 International Conference on Parallel Processing (Aug. 1990) II-266-II-270.


Delayed Consistency And Its Effects On The Miss Rate .. - Dubois, Wang.. (1991)   (24 citations)

....statically or dynamically and different processes work on different partitions of the structures. In general, partition boundaries do not coincide with cache block boundaries. As a result, cache blocks are shared while no data is actually shared. This gives rise to false sharing transitions [26], which create coherence or miss activity which would not happen if each cache block contained a single data item. To demonstrate occurrences of false sharing ....

....to false sharing and because the algorithm does not require any synchronization during most of its execution. 6.0 Other Approaches 6.1 False Sharing There have been several other proposed solutions to the problem of false sharing and the relatively low spatial locality of shared data accesses [16][26]. The first one consists of not caching shared writable data. This solution usually requires a T.L.B. (Translation Lookaside Buffer) to discriminate dynamically between cacheable and non cacheable blocks. The problem with this solution is that entire data structures must be deemed non cacheable if ....

[Article contains additional citation context not shown here]

J. Torrellas, M.S. Lam, and J.L. Hennessy, "Shared Data Placement Optimizations to Reduce Multiprocessor Cache Misses," Proc. of the 1990 Int. Conf. on Parallel Proc., Aug 1990, pp. 266-270.


Effective Cache Prefetching on Bus-Based Multiprocessors - Tullsen, Eggers (1995)   (14 citations)

....over half of the invalidation misses could be attributed to false sharing, even for the SPLASH benchmarks, which have been hand tuned for processor locality, although the total amount of false sharing in those benchmarks is rather low. We show results for a 32 byte cache line; previous work[28, 11] demonstrates that false sharing goes up significantly with larger block sizes. In [14] and [15] an algorithm is presented for restructuring shared data to reduce false sharing. While the technique has promise for improving overall performance, for the purpose of this study we are only interested ....

J. Torrellas, M.S. Lam, and J.L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In International Conference on Parallel Processing, volume II, pages 266--270, August 1990.


Using Compile-Time Analysis and Transformations to Reduce False .. - Jeremiassen (1995)   (6 citations)

.... have shown that large cache blocks may often increase coherency overhead, even to the point where it more than negates any benefit of the prefetching provided by the larger cache block size [LYL87, CGBG88, EK89a]. This additional coherency overhead is caused by a phenomenon known as false sharing [TLH90, EJ91]. False sharing in multiprocessor caches occurs when the cache block is larger than a single word, and different processors access (read and write) different words in the same cache block. It is called false sharing because, although the words in the cache block are not individually shared, ....

....their prefetching benefits can be exploited, as on uniprocessors. Manually changing the placement of data in coarse grained parallel programs to better conform to the memory reference pattern (based on profiles derived from trace driven simulations) can reduce false sharing misses by up to 75% [TLH90, EJ91]. However, manual restructuring requires that the programmer pinpoint the data structures that suffer from false sharing in a particular memory (cache) architecture. Simulation profiles are generally not available during the development cycle. To identify these data structures it is ....

J. Torrellas, M.S. Lam, and J.L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the 1990 International Conference on Parallel Processing, volume II, pages 266--270, August 1990.


Directions in Parallel Programming: HPF, Shared.. - Bodin, Priol..

.... memory (and so decreases the size of the memory that can be allocated to the cache) and also that this may increase the amount of communication (unused data are loaded when accessing useful data). More generally, data layout optimization tries to store data so that it minimizes false sharing [61, 18]. Optimizing Data Locality Optimizing data locality relies on changing the access order to data structure so that it increases the spatial locality of a loop or it exploits better temporal locality. Loop transformations like loop interchanging, blocking, unimodular transformation may be used. ....

J. Torrellas, M. S. Lam, and J. L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In International Conference on Parallel Processing, pages 266--270, August 1990.


Data Transformations for Eliminating Conflict Misses - Rivera, Tseng (1998)   (64 citations)

....programs, particularly within loop nests [18]. We believe compiler transformations can be very effective in eliminating conflict misses for scientific programs with regular access patterns. We evaluate two compiler transformations to eliminate conflict misses: inter and intravariable padding [3, 21]. Unlike standard compiler transfor.... [Footnote in the citing paper: This research was supported in part by NSF grant #CCR9711514 and NSF CAREER Award #ASC9625531 in New Technologies. In Proceedings of the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 98), Montreal, Canada, June 1998.] ....

J. Torrellas, M. Lam, and J. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, IL, August 1990.


Compiler Optimizations for High Performance Architectures - Han, Rivera, Tseng

....can be very effective in eliminating conflict misses for scientific programs with regular access patterns. We evaluate two compiler transformations to eliminate conflict misses: 1) inter variable padding (modify variable base address) 2) intra variable padding (increase array dimension size) [2, 24]. Unlike compiler transformations that restructure the computation performed by the program, these two techniques modify the program's data layout. Two motivating examples of these transformations are shown in Figures 7 and 8. In Figure 7, unit stride references to B(i) and C(i) provide spatial ....

J. Torrellas, M. Lam, and J. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, IL, August 1990.


Static Analysis of Barrier Synchronization in Explicitly.. - Tor Jeremiassen (1994)   (5 citations)

....D.1.3; I.1.3 Keywords: Concurrent Programming; Languages and Systems 1 Introduction On cache coherent shared memory multiprocessors, much of the unnecessary communication, i.e. that which could be eliminated with locality enhancing optimizations, is coherency overhead caused by false sharing [22, 10]. False sharing occurs when multiple processors access different words in the same cache block. Although they do not actually share data, they incur its costs, because coherency operations are cache block based. In a write invalidate coherency protocol the overhead of false sharing takes the form ....

....256 bytes) [10]. False sharing is caused by a mismatch between the memory layout of write shared data and the cross processor memory reference pattern to it. Manually changing the placement of this data to better conform to the memory reference pattern reduced false sharing misses by 40 to 75% [22, 10]. However, manual restructuring requires that the programmer pinpoint the data structures that suffer from false sharing in a particular memory (cache) architecture. This is hard to determine; knowledge of how each data object is shared is often non intuitive, and each application must be tailored ....

J. Torrellas, M.S. Lam, and J.L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the 1990 International Conference on Parallel Processing, volume II, pages 266--270, August 1990.


Efficient Machine-Independent Programming of High-Performance.. - Tseng (1995)

....so that they begin on separate coherence unit boundaries. Eliminating false sharing and conflict misses for portions of a single array is a more complex task that may require extensive modification to the program. In the simplest case, array dimensions may be padded to eliminate false sharing [11, 72]. Another alternative is to transpose array dimensions. This optimization is simple to apply, but only works for certain cases of one dimensional data distributions. A more powerful solution can be derived by looking at distributed memory compilers. These compilers explicitly manage the address ....

J. Torrellas, M. Lam, and J. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, IL, August 1990.


Data Layout Optimizations for High-Performance Architectures - Tseng

....is being consistently accessed, applying split and scatter can avoid fetching data not needed. 2.4 Array Padding Array padding differs from modifying variable base addresses in that it increases internal array dimension sizes, changing the relative layout for higher dimensions of the array [5, 45]. A simple example is shown in Figure 6. Unlike changing the base or field address of variables, array padding can eliminate conflict misses between different sections of the same array. Disadvantages include extra memory for pads within the array. Array padding must be performed at compile time. ....

J. Torrellas, M. Lam, and J. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, IL, August 1990.
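
Array padding as described in the excerpt above (increasing an internal dimension so that different sections of the same array stop colliding) can be sketched with a column walk over a row-major array. The cache geometry, the array shape, and the helper name `lines_used_by_column` are hypothetical values chosen so that the effect is visible; they are not drawn from the cited papers.

```python
# Hypothetical direct-mapped cache: 1024 bytes, 32-byte lines, 32 lines.
CACHE_SIZE = 1024
LINE_SIZE = 32
NUM_LINES = CACHE_SIZE // LINE_SIZE


def lines_used_by_column(rows, cols, elem_size=8):
    """Number of distinct cache lines touched when walking column 0
    of a row-major rows x cols array that starts at address 0."""
    used = set()
    for r in range(rows):
        addr = r * cols * elem_size          # address of element [r][0]
        used.add((addr // LINE_SIZE) % NUM_LINES)
    return len(used)


ROWS = 32
COLS = 128          # row length of 1024 bytes equals the cache size
PAD = 4             # four unused doubles (one cache line) added to each row

lines_unpadded = lines_used_by_column(ROWS, COLS)
lines_padded = lines_used_by_column(ROWS, COLS + PAD)
print(lines_unpadded, lines_padded)
```

In the unpadded shape every element of the column maps to the same cache line, so successive accesses evict each other; with the padded row length the column spreads across all lines, at the cost of the extra memory the excerpt mentions.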


Custom Memory Placement for Parallel Data Mining - Parthasarathy, Zaki, Li (1997)   (1 citation)

....based on access patterns to further enhance locality. Our experimental results indicate increased locality gains when grouping related data structures, rather than linearizing a single class. 6.3 Reducing False Sharing Five techniques mainly directed at reducing false sharing were proposed in [28]. They include padding, aligning, and allocation of memory requested by different processors from different heap regions. Those optimizations result in a good performance improvement for the applications they considered. In our case study padding and aligning was not found to be very beneficial. ....

J. Torrellas, M. S. Lam, and J. L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. Intl. Conf. Parallel Processing, pages II:266--270, August 1990.


False Sharing Elimination by Selection of Runtime Scheduling.. - Jyh-Herng Chow (1997)   (1 citation)

....performance can be degraded due to cache line ownership becoming a serial bottleneck thus increasing the miss penalty by the waiting time for ownership. The increase in memory traffic caused by ping ponging of cache lines due to false sharing has been studied by several researchers, e.g. [18, 8, 5]. False sharing can also lead to an anomaly in which increasing the cache line size leads to an increase in the number of cache misses observed in parallel programs, even though the programs may have good spatial locality [6] In most currently implemented memory consistency mechanisms, the false ....

....sharing by considering read references in conjunction with write references, though doing so may decrease the set of loops for which false sharing elimination is guaranteed. The techniques that have been proposed in the past for reducing false sharing fall into one of the following approaches [18, 8, 5]: Changing loop structures : Transform program loops, e.g. by blocking, alignment, or peeling, so that iterations in a parallel loop access disjoint cache lines [19, 8, 7] Changing data structures : Change the layout of data structures, e.g. by array alignment and padding [18, 1] Array ....

[Article contains additional citation context not shown here]

Josep Torrellas, Monica S. Lam, and John L. Hennessy. Shared Data Placement Optimizations to Reduce Multiprocessor Cache Miss Rates. In International Conference on Parallel Processing, pages II.266--II.270, 1990.


An Empirical Comparison of the Kendall Square Research.. - Singh, Joe, Gupta.. (1993)   (37 citations)  Self-citation (Hennessy)

....latencies than on CC NUMA machines. Second, the coherence protocol is more complex because it needs to ensure that at least one copy of a data item remains in some attraction memory. The attraction memory is also more complex to design than the main memory of a CC NUMA machine. A previous paper [16] described the relative advantages and disadvantages of COMA and CC NUMA architectures in detail (in the context of running a single parallel application at a time) and discussed the application characteristics that are 2 The term attraction memory is used by the DDM designers to refer to the ....

Josep Torrellas, Monica S. Lam, and John L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the International Conference on Parallel Processing, pages 266-270, 1990. Vol. II.


Memory Latency Reduction via Data Prefetching and Data Forwarding .. - Poulsen (1994)

No context found.

J. Torrellas, M. S. Lam, and J. L. Hennessy, "Shared data placement optimizations to reduce multiprocessor cache misses," Proceedings of the International Conference on Parallel Processing, pp. 266-270, 1990.


An Overview of Cache Optimization Techniques and Cache-Aware .. - Kowarschik, Weiß (2003)

No context found.

J. Torrellas, M. Lam, and J. Hennessy. Shared Data Placement Optimizations to Reduce Multiprocessor Cache Miss Rates. In Proc. of the Int. Conference on Parallel Processing, volume 2, pages 266-270, Pennsylvania, USA, 1990.
