16 citations found.
R. Bianchini and T. LeBlanc. Software caching on cache-coherent multiprocessors. In Proc. Symp. on Parallel and Distributed Processing, SPDP, pages 521--526. IEEE, May 1992.

This paper is cited in the following contexts:
Parallel Data Mining for Association Rules on.. - Parthasarathy, Zaki.. (2000)   (5 citations)

....sharing. There are three resulting strategies, L SPP, L LPP, and L GPP, which corresponding to the Simple Placement Policy, Localized Placement Policy, and Global Placement Policy, respectively. Privatize (and Reduce) Another technique for eliminating false sharing is called Software Caching [9] or Privatization. It involves making a private copy of the data that will be used locally, so that operations on that data do not cause false sharing. For association mining, we can utilize that fact that the support counter increment is a simple addition operation and one that is associative and ....

....heap regions. Those optimizations result in a good performance improvement for the applications they considered. In our case study padding and aligning was not found to be very beneficial. Other techniques to reduce false sharing on array based programs include indirection [17], software caching [9], and data remapping [6]. Our placement policies have utilized some of these techniques. 7.4. Memory Allocation General purpose algorithms for dynamic storage have been proposed and honed for several years [25, 24, 42]. An evaluation of the performance of contemporary memory allocators on 5 ....

Ricardo Bianchini and Thomas J. LeBlanc. Software caching on cache-coherent multiprocessors. 4th Symp. Parallel Distributed Processing, pages 521-526, December 1992.
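
The "Privatize (and Reduce)" idea quoted in the context above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the function names, the use of Python threads, and the toy transactions are all hypothetical, and Python does not expose cache lines, so the sketch shows only the structure (one private counter per worker, then a single reduction), not the performance effect.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_local(transactions, candidates):
    """Count support for each candidate itemset in a private counter.

    The counter is local to this call, so no other worker writes to it.
    """
    local = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                local[candidate] += 1
    return local


def privatize_and_reduce(transactions, candidates, workers=2):
    """Hypothetical helper: privatize the counters, then reduce by addition."""
    # Split the transactions among workers (round-robin, an arbitrary choice).
    chunks = [transactions[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: count_local(chunk, candidates), chunks))
    # Reduce: sum the private counters into one shared result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because the increment is a simple addition, which is associative and commutative, the merged totals do not depend on how the transactions are split among workers.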


Parallel Data Mining for Association Rules on Shared-memory.. - Zaki, al. (1998)   (30 citations)

....sharing. There are three resulting strategies, L SPP, L LPP, and L GPP, which corresponding to the Simple Placement Policy, Localized Placement Policy, and Global Placement Policy, respectively. Privatize (and Reduce) Another technique for eliminating false sharing is called Software Caching (Bianchini LeBlanc 1992) or Privatization. It involves making a private copy of the data that will be used locally, so that operations on that data do not cause false sharing. For association mining, we can utilize that fact that the support counter increment is a simple addition operation and one that is associative and ....

....result in a good performance improvement for the applications they considered. In our case study padding and aligning was not found to be very beneficial. Other techniques to reduce false sharing on array based programs include indirection (Eggers Jeremiassen 1991), software caching (Bianchini LeBlanc 1992), and data remapping (Anderson, Amarasinghe, Lam 1995). Our placement policies have utilized some of these techniques. 7.4. Memory Allocation General purpose algorithms for dynamic storage have been proposed and honed for several years (Knuth 1973; Kingsley 1982; Weinstock Wulf 1988). An ....

Bianchini, R., and LeBlanc, T. J. 1992. Software caching on cache-coherent multiprocessors. 4th Symp. Parallel Distributed Processing 521--526.


Impacts of Network Latency on Parallel Virtual Memory Management - Reis, Scherson (1998)

.... cache or disk) Management policies must define: Page Placement (where to put it) Page Identification (how to find it) Page 1 It is assumed, for simplicity, that a compiler or some preprocessing tool, such as Chaos [P 95], the compiler directives of HPF [Lov93], or software caching [BL92], divides PD in blocks in an optimized way, exploring data locality as much as possible. Replacement; Write Strategies (write through, write back etc. Write Miss Strategies (to allocate or not to allocate) These questions also apply to traditional VM, and some have been already analyzed ....

R. Bianchini and T. J. LeBlanc. Software Caching on Cache-Coherent Multiprocessors. In Proceedings of the 4th IEEE Symposium on Parallel and Distributed Processing, pages 521 -- 526, 1992.


A Cost-Comparison Approach for Adaptive Distributed Shared Memory - Kim, Vaidya (1996)   (2 citations)

....proposed in [7] where several categories of shared data objects are identified: conventional, read-only, migratory, write shared, and synchronization. But, with their approach, the programmer needs to know the memory access behaviors on each shared variable to specify a protocol used for the variable. [5, 11, 18] also present other schemes to reduce coherency overhead. IRG (Inter Reference Gap) model for the time interval between successive references to the same address was presented in [24]. It estimates the future IRG values by using prediction based algorithm and can be used for memory replacement algorithm, ....

R. Bianchini and T. LeBlanc, "Software caching on cache-coherent multiprocessors," in Proceedings of International Conference on Parallel and Distributed Processing, pp. 521--526, 1992.


Compiler Optimizations for Cache Locality and Coherence - Li (1994)   (8 citations)

....variables, it may cause the block to ping-pong between processors under the write invalidate protocol or to be updated unnecessarily under the write update protocol. This problem is called false sharing, which has been identified by many researchers as a major obstacle to high performance [12, 3, 4, 10]. Therefore, in a multiprocessor system, we need to avoid false sharing and exploit cache locality. In this paper, we make the following contributions: • We develop a new data reuse model and an algorithm called height reduction to improve cache locality. The advantage of this algorithm is that ....

....= k 1, i A(i,j) A(i,j) A(i,k)A(j,k) end for end for end for [Figure 3: Dense Cholesky with False Sharing on KSR Multiprocessors; panels (a) and (b); axis label: Processors] 1.4 False Sharing. Many researchers have identified false sharing as a major obstacle to high performance [12, 3, 4, 10]. False sharing occurs when two or more processes access non-overlapping portions of the same coherence block (at least one of them with writes). A block is ping-ponging when it has been moved back and forth between two processors due to false sharing. For example, if variable x is used repeatedly ....

[Article contains additional citation context not shown here]

R. Bianchini and T. J. LeBlanc. Software caching on cache-coherent multiprocessors. In Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, pages 521--526, December 1992.


Using Compile-Time Analysis and Transformations to Reduce False .. - Jeremiassen (1995)   (6 citations)

....exhibited better time complexity (O(n^2) vs. exponential in loop nest depth). Instead of searching the space of possible loop permutations, they used an algorithm that directly computed the preferred loop nest structure. A different approach altogether was taken by Bianchini and LeBlanc [BL92]. They studied the effectiveness of using software caching techniques to eliminate false sharing on cache-coherent multiprocessors. Their workload consisted of one program, parallel shellsort, an algorithm with a dynamically changing cross processor memory reference pattern. Their application ....

R. Bianchini and T.J. LeBlanc. Software caching on cache-coherent multiprocessors. In Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, pages 521--526, June 1992.


Unifying Data and Control Transformations for Distributed.. - Cierniak, Li (1994)   (92 citations)

....as follows: two or more processes access non overlapping portions of the same coherence block (at least one of them with writes) causing unnecessary coherence traffic and data movement. False sharing has been found to be a serious obstacle to high performance on distributed shared memory machines [12, 4, 6]. Previous work on compiler algorithms for cache locality has been on loop transformations, i.e. control transformations. Wolf and Lam [32] focus on loop tiling of the innermost loops as a means of achieving cache locality. They try all possible subsets of the loops in the loop nest. The subset ....

....the number of cache lines for uniprocessor machines, when the cache line size is 1. Li [24] developed a new reuse model and algorithms that can optimize for not only dense matrix algorithms but also banded matrix algorithms. The work by Eggers and Jeremiassen [12] and by Bianchini and LeBlanc [4] showed that for some programs, program restructuring and data restructuring can eliminate or reduce false sharing so that performance can be improved. However, these transformation techniques are all performed by hand on specific application programs. Dubois et al. describe a hardware mechanism ....

R. Bianchini and T. J. LeBlanc. Software caching on cache-coherent multiprocessors. In Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, pages 521--526, December 1992.


Compiler Cache Optimizations for Banded Matrix Problems - Li (1995)   (14 citations)

....variables, it may cause the block to ping-pong between processors under the write invalidate protocol or to be updated unnecessarily under the write update protocol. This problem is called false sharing, which has been identified by many researchers as a major obstacle to high performance [12, 4, 5, 10]. Therefore, in a multiprocessor system, we need to exploit cache locality and avoid false sharing. In this paper, we make the following contributions: • We develop a new data reuse model and a compiler algorithm called height reduction to improve cache locality. The advantage of this algorithm ....

....parallel machines. As a by product, the technique can also improve cache locality by generating stride one accesses. A recent work by Cierniak and Li [8] unifies both data and control transformations for improving data locality. The work by Eggers and Jeremiassen [12] and by Bianchini and LeBlanc [4] shows that for some programs, program restructuring and data restructuring can eliminate or reduce false sharing so that the performance can be improved. However, these transformation techniques are all performed by hand on specific application programs. No systematic compiler transformation ....

R. Bianchini and T. J. LeBlanc. Software caching on cache-coherent multiprocessors. In Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, pages 521--526, December 1992.


Custom Memory Placement for Parallel Data Mining - Parthasarathy, Zaki, Li (1997)   (1 citation)

....sharing. There are three resulting strategies: 1) Lock region Simple Placement Policy (L SPP), 2) Lock region Localized Placement Policy (L LPP), and 3) Lock region Global Placement Policy (L GPP). Privatize (and Reduce) Another technique for eliminating false sharing is called Software Caching [8] or Privatization. It involves making a private copy of the data that will be used locally, so that operations on that data do not cause false sharing. For association mining, we can utilize that fact that the support counter increment is a simple addition operation and one which is associative ....

....heap regions. Those optimizations result in a good performance improvement for the applications they considered. In our case study padding and aligning was not found to be very beneficial. Other techniques to reduce false sharing on array based programs include indirection [12], software caching [8], and data remapping [6]. Our placement policies have utilized some of these techniques. 6.4 Memory Allocation General purpose algorithms for dynamic storage have been proposed and honed for several years [19, 18, 29]. An evaluation of the performance of contemporary memory allocators on 5 ....

Ricardo Bianchini and Thomas J. LeBlanc. Software caching on cache-coherent multiprocessors. 4th Symp. Parallel Distributed Processing, pages 521--526, December 1992.


Towards an Adaptive Distributed Shared Memory - Kim, Vaidya (1995)   (3 citations)

....[Figure 15: Experiment 8: Performance Comparison: Total costs over the entire application; axis label: Total Amount of Data (Bytes)] 6 Related Work. Many schemes have been proposed to reduce overhead by adapting to memory access patterns [2, 33, 11, 8, 29, 24, 9, 1, 14, 10, 4, 7, 20, 5, 26, 27, 3, 21, 16, 6]: • The approach proposed in this paper is related to the work by Veenstra and Fowler [31]. [31] evaluates the performance of three types of off line algorithms: i) an algorithm that chooses statically, at the beginning of the program, either invalidate or update protocols on a per page basis, ....

....the shared pages are updated with the local images by invoking the define global system call. • Bianchini and LeBlanc address software caching which can adapt to changes in memory reference behavior by making a new copy of data and repartitioning the data as needed for each phase of execution [5]. • [21] presents a flexible communication mechanism. Their scheme uses a programmable node controller, called MAGIC. MAGIC is responsible for implementing the cache-coherence and message passing protocols. • Hybrid protocol is more appropriate than a pure protocol for a DSM, if the access ....

R. Bianchini and T. LeBlanc, "Software caching on cache-coherent multiprocessors," in Proceedings of International Conference on Parallel and Distributed Processing, pp. 521-- 526, 1992.


Unifying Data and Control Transformations for Distributed Shared .. - Cierniak (1994)   (92 citations)

....as follows: two or more processes access non overlapping portions of the same coherence block (at least one of them with writes) causing unnecessary coherence traffic and data movement. False sharing has been found to be a serious obstacle to high performance on distributed shared memory machines [12, 4, 6]. Previous work on compiler algorithms for cache locality has been on loop transformations, i.e. control transformations. Wolf and Lam [32] focus on loop tiling of the innermost loops as a means of achieving cache locality. They try all possible subsets of the loops in the loop nest. The subset ....

....the number of cache lines for uniprocessor machines, when the cache line size is 1. Li [23] developed a new reuse model and algorithms that can optimize for not only dense matrix algorithms but also banded matrix algorithms. The work by Eggers and Jeremiassen [12] and by Bianchini and LeBlanc [4] showed that for some programs, program restructuring and data restructuring can eliminate or reduce false sharing so that performance can be improved. However, these transformation techniques are all performed by hand on specific application programs. Dubois et al. describe a hardware mechanism ....

R. Bianchini and T. J. LeBlanc. Software caching on cache-coherent multiprocessors. In Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, pages 521--526, December 1992.


Eliminating Useless Messages in Write-Update Protocols .. - Bianchini, LeBlanc.. (1994)   (3 citations)  Self-citation (Bianchini Leblanc)

....[Figure 23: Running Time of SC Blocked LU Under Pure WU Protocol. Series: WI and WU at Infinite, High, and Medium BW.] ....so that the processor caches can be more efficiently utilized. We experiment with software caching [Bianchini and LeBlanc, 1992], a technique originally proposed to reduce false sharing in WI protocols that, in effect, allows for a reorganization of the data. Software caching consists basically of copying a range of virtual addresses to a different range of virtual addresses, allowing the application to determine when ....

R. Bianchini and T. J. LeBlanc, "Software Caching on Cache-Coherent Multiprocessors," In Proceedings of the 4th Symposium on Parallel and Distributed Processing, Dallas, TX, December 1992.


Categorizing Network Traffic in Update-Based Protocols .. - Bianchini, LeBlanc.. (1996)   (1 citation)  Self-citation (Bianchini Leblanc)

....by the fact that processors access columns of the row-major allocated shared matrix during parts of the computation. We can eliminate many of these updates by changing the data layout dynamically, so that the processor caches are more efficiently utilized. We experiment with software caching [3], a technique originally proposed to reduce false sharing in WI protocols that, in effect, allows for a reorganization of the data. In the software caching version of Blocked LU, each processor makes a local, reorganized, copy of the data it needs, modifies the data as appropriate, and copies the ....

R. Bianchini and T. J. LeBlanc. Software caching on cache-coherent multiprocessors. In Proceedings of the 4th Symposium on Parallel and Distributed Processing, Dallas, TX, December 1992.
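
The copy, modify, copy pattern described in the context above can be sketched as follows. This is a rough illustration only, not the Blocked LU code from the cited paper: the function name, the flat row-major list, and the per-element update are hypothetical, and the final copy-back step is an assumption, since the quoted excerpt is cut off at that point.

```python
def update_column_with_software_cache(matrix, n_cols, col, fn):
    """Apply fn to one column of a row-major matrix through a private copy.

    matrix is a flat list in row-major order with n_cols entries per row.
    """
    n_rows = len(matrix) // n_cols

    # Copy in: gather the strided column into a contiguous private buffer.
    local = [matrix[row * n_cols + col] for row in range(n_rows)]

    # Modify: all updates touch only the private, reorganized copy.
    local = [fn(value) for value in local]

    # Copy back (assumed step): write the results to the shared layout.
    for row, value in enumerate(local):
        matrix[row * n_cols + col] = value
    return matrix
```

For example, doubling column 1 of the 2x3 matrix [1, 2, 3, 4, 5, 6] leaves every other entry unchanged and yields [1, 4, 3, 4, 10, 6].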


Eliminating Useless Messages in Write-Update Protocols .. - Bianchini, LeBlanc.. (1994)   (3 citations)  Self-citation (Bianchini Leblanc)

.... data items to the size of a cache line [Torrellas et al. 1990], aligning data structures on cache line boundaries, and reorganizing data structures so as to combine data items with similar sharing patterns [Eggers and Jeremiassen, 1991]. We experiment with two other techniques, software caching [Bianchini and LeBlanc, 1992] and indirection [Eggers and Jeremiassen, 1991], to reduce false sharing in Blocked LU. We call the resulting programs SC Blocked LU and Indirect Blocked LU, respectively. Software caching consists basically of copying a range of virtual addresses to a different range of virtual addresses, ....

R. Bianchini and T. J. LeBlanc, "Software Caching on Cache-Coherent Multiprocessors," In Proceedings of the 4th Symposium on Parallel and Distributed Processing, Dallas, TX, December 1992.


New Parallel Algorithms for Frequent Itemset.. - Veloso.. (2003)

No context found.

R. Bianchini and T. LeBlanc. Software caching on cache-coherent multiprocessors. In Proc. Symp. on Parallel and Distributed Processing, SPDP, pages 521--526. IEEE, May 1992.


Identification And Optimization Of Sharing Patterns For Scalable.. - Kaxiras (1998)   (4 citations)

No context found.

R. Bianchini and T.J. LeBlanc, "Software Caching on Cache-Coherent Multiprocessors." In Proc. of the 4th Symposium on Parallel and Distributed Processing, Dec. 1992.