16 citations found.
A. Gonzalez, M. Valero, N. Topham, J. Parcerisa, "Eliminating Cache Conflict Misses Through XOR-Based Placement Functions," 11th Int'l Conf. Supercomputing, 1997, pp. 76--83.

This paper is cited in the following contexts:
SUDS: Automatic Parallelization for Raw Processors - Frank (2003)

....node wishes to make a memory request from the logically shared memory it injects a message into the on-chip interconnect directed at the memory node that owns the corresponding memory address. The owner is determined by a simple xor-based hash of the address, similar to that used in some L1 caches [46]. Thus, if there are 64 tiles dedicated as memory nodes, the logically shared memory can be viewed as being banked 64 ways. After the request is injected, it travels through Raw's on-chip interconnect at one machine cycle per hop (except when the message turns, which takes two machine cycles) ....

A. Gonzalez, M. Valero, N. Topham, and J.M. Parcerisa. Eliminating cache conflict misses through XOR-based placement functions. In Eleventh International Conference on Supercomputing, 1997.
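
The owner selection described in the excerpt above can be illustrated with a minimal sketch. This is not the SUDS implementation: the excerpt only says that a simple xor-based hash of the address picks one of 64 memory nodes. The 32-byte line size, the name `owner_bank`, and the choice to fold the line address in 6-bit fields are all assumptions made for illustration.

```python
NUM_BANKS = 64   # from the excerpt: 64 tiles dedicated as memory nodes
BANK_BITS = 6    # log2(NUM_BANKS)
LINE_BITS = 5    # assumption: 32-byte lines (not stated in the excerpt)


def owner_bank(addr: int) -> int:
    """Hypothetical XOR fold: XOR successive 6-bit fields of the line address."""
    line = addr >> LINE_BITS          # drop the byte-within-line offset
    bank = 0
    while line:
        bank ^= line & (NUM_BANKS - 1)  # mix in the next 6-bit field
        line >>= BANK_BITS
    return bank


if __name__ == "__main__":
    # Addresses 64 lines apart all land in bank 0 under plain modulo banking.
    stride_bytes = 64 * 32
    banks = {owner_bank(i * stride_bytes) for i in range(64)}
    print(len(banks), "distinct banks used")
```

Under these assumptions, a stride of 64 lines, which would direct every request to a single bank with plain modulo banking, is spread over all 64 banks.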


Optimizing Graph Algorithms for Improved Cache Performance - Joon-Sang Park Michael (2002)

....A number of groups have done research in the area of cache performance analysis in recent years. Detailed cache models have been developed by Weikle, McKee, and Wulf in [23] and Sen and Chatterjee in [21]. XOR-based data layouts to eliminate cache misses have been explored by Valero and others in [10]. A number of papers have discussed the optimization of specific dense linear algebra problems with respect to cache performance. Whaley and others discuss optimizing the widely used Basic Linear Algebra Subroutines (BLAS) in [24]. Chatterjee and Sen discuss a cache-efficient matrix transpose ....

A. Gonzalez, M. Valero, N. Topham, and J. M. Parcerisa. Eliminating Cache Conflict Misses through XOR-Based Placement Functions. In Proc. of 1997.


Evaluation of the Performance of Polynomial Set Index.. - Vandierendonck, De.. (2002)   (1 citation)

....function when we mean randomising set index function. Although some work has shown the benefits of randomisation functions, architects are left with the implausible task of selecting just one randomisation function to implement in a processor. Few papers provide help on this task. In [6, 7, 14, 15], the class of functions based on division of polynomials over GF(2), originally proposed for interleaved memories in vector processors [10], is promoted for use as set index functions for data caches. The performance of these functions in the presence of stride-based access patterns can be shown ....

....over GF(2), originally proposed for interleaved memories in vector processors [10], is promoted for use as set index functions for data caches. The performance of these functions in the presence of stride-based access patterns can be shown to be predictable. Except for a few functions in [7], no comparisons between these functions, called polynomial randomisation functions, and other XOR-based randomisation functions are made. Furthermore, it is claimed that polynomials that are irreducible, which means that they cannot be written as the product of other polynomials, will lead to ....

[Article contains additional citation context not shown here]

A. Gonzalez, M. Valero, N. Topham, and J. Parcerisa. Eliminating cache conflict misses through XOR-based placement functions. In ICS'97. Proceedings of the 1997.
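
The idea of a set index obtained by dividing polynomials over GF(2), as mentioned in the excerpts above, can be sketched as follows. This is an illustrative sketch only, not code from any of the cited papers. It assumes that bit i of the block address is the coefficient of x^i, and it uses x^3 + x + 1 (an irreducible polynomial of degree 3, chosen purely as an example) to index a hypothetical 8-set cache. The names `gf2_mod` and `poly_set_index` are made up here.

```python
def gf2_mod(value: int, poly: int) -> int:
    """Remainder of value(x) divided by poly(x), with coefficients in GF(2).

    Both polynomials are encoded as integers: bit i is the coefficient of x^i.
    Subtraction in GF(2) is XOR, so long division is a series of shifted XORs.
    """
    if poly < 2:
        raise ValueError("poly must have degree >= 1")
    deg = poly.bit_length() - 1
    while value.bit_length() > deg:
        shift = value.bit_length() - 1 - deg
        value ^= poly << shift  # cancel the current leading term
    return value


EXAMPLE_POLY = 0b1011  # x^3 + x + 1, an illustrative choice


def poly_set_index(block_addr: int, poly: int = EXAMPLE_POLY) -> int:
    """Hypothetical set index: block address reduced modulo poly over GF(2)."""
    return gf2_mod(block_addr, poly)


if __name__ == "__main__":
    for block in (0b1000, 0b10000, 0b1011):
        print(bin(block), "->", poly_set_index(block))
```

Each output bit of such a function is an XOR of a fixed subset of address bits, which is why these functions fall within the broader family of XOR-based placement functions.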


Efficient Profile-Based Evaluation of Randomising Set.. - Vandierendonck, De.. (2001)   (1 citation)

....the past decade, extrapolations show that future processors will be able to access only small direct-mapped caches (4 to 8 kB) in one or two clock cycles [2, 5]. Such caches usually suffer from many conflict misses. By using a randomising set index function, conflict miss ratios can be reduced [6, 16, 21]. It was shown that the average miss ratio of the SPEC95 benchmark suite can be halved this way [21]. There are many set index functions one can think of. Typically, one requires that the functions be cheap to evaluate, because they are in the critical path of the cache access. This condition ....

A. Gonzalez, M. Valero, N. Topham, and J. Parcerisa. Eliminating cache conflict misses through XOR-based placement functions. In ICS'97. Proceedings of the 1997.


Capturing Dynamic Memory Reference Behavior with Adaptive.. - Peir, Lee, Hsu (1998)   (9 citations)

....[13]. In [19, 8], a small fully associative buffer is proposed for holding the lines that exhibit poor temporal locality so as to prevent them from entering and polluting the primary direct-mapped cache. Another approach to reducing conflict misses is to use better hashing or mapping functions [23, 20, 6]. 7 Conclusions. In this paper, we observe that the direct-mapped cache, instead of faithfully maintaining the lines that have been referenced recently, retains a large number of less recently used lines that are not likely to be re-referenced before they are replaced. Based on this observation, ....

A. Gonzalez, M. Valero, N. Topham and J.M. Parcerisa, "Eliminating Cache Conflict Misses Through XOR-Based Placement Functions," Proc. 11th Int'l Conference Supercomputing, Vienna, Austria, 1997, pp. 76--83.


Microarchitectural and Compile-Time Optimizations for.. - Kalamatianos (2000)   (1 citation)

.... 5 are: i) preventing one code segment from being stored in the cache (cache bypassing) [48]; ii) finding an alternative place to store the conflicting code module in the cache [49]; iii) implementing a mapping function in hardware so that fewer code modules map to the same cache location [50, 51]; and iv) reordering the code modules in the main memory address space at compile time so that fewer conflicts may occur at run time [52, 53, 54]. We pursue the last method. We first study the temporal interaction among procedures since accurate temporal information has not been used in the context ....

A. Gonzalez, M. Valero, N. Topham, and J. Parcerisa. Eliminating Cache Conflict Misses through XOR-based Placement Functions. In Proceedings of International Conference on Supercomputing, pages 76--83, July 1997.


Improving Cache Performance Via Active Management - Tam (1999)   (4 citations)

.... non-blocking caches, which overlap multiple load misses while fulfilling other requests that hit in the cache [8][9]; 3) hardware and software prefetching methodologies that attempt to preload data from memory to the cache before it is needed [10][11][12][13][14]; and 4) XOR mapping functions [15][16] and column-associative caches [17], which reduce the effect of conflicts in cache block allocation. While these schemes do contribute to reducing or tolerating average data access time, this dissertation approaches the access latency problem from the premise that the average data access time can ....

A. Gonzalez, M. Valero, N. Topham, and J. M. Parcerisa, "Eliminating Cache Conflict Misses through XOR-based Placement Functions," Proceedings of the 1997 ICS, pp. 76--83, July 1997.


A New Case for . . . - Seznec

....with Not Recently Used replacement policy and the Enhanced Not Recently Used full 4-way skewed-associative cache is not very high. 7 Summary. It was known that the use of skewed-associative caches may significantly reduce conflict misses in numerical applications on dense structures [1, 2]. In particular, on those applications, poor data layout and/or stride distribution may lead to catastrophic and unpredictable behavior when using set-associative caches [4, 5, 1]. Nevertheless, for most non-numeric workloads, four-way set-associative caches capture the most significant part of conflict ....

A. Gonzalez, M. Valero, N. Topham, J. Parcerisa, "Eliminating Cache Conflict Misses Through XOR-Based Placement Functions", Proceedings of the 11th International Conference on Supercomputing, July 1997.


A New Case for Skewed-Associativity - Seznec (1997)   (4 citations)

.... [Figure 7: 4-way skewed versus set-associative caches; panel (c) L2 caches] 7 Summary. It was known that the use of skewed-associative caches may significantly reduce conflict misses in numerical applications on dense structures [1, 2]. In particular, on those applications, poor data layout and/or stride distribution may lead to catastrophic and unpredictable behavior when using set-associative caches [4, 5, 1]. Nevertheless, for most non-numeric workloads, four-way set-associative caches capture the most significant part of conflict ....

A. Gonzalez, M. Valero, N. Topham, J. Parcerisa, "Eliminating Cache Conflict Misses Through XOR-Based Placement Functions", Proceedings of the 11th International Conference on Supercomputing, July 1997.


Data Transformations for Eliminating Conflict Misses - Rivera, Tseng (1998)   (64 citations)

.... Euclidean algorithm [7]. McKinley and Temam perform a study of loop-nest-oriented cache behavior for scientific programs and conclude that conflict misses cause half of all cache misses and most intra-nest misses [18]. Researchers have examined methods to eliminate conflict misses using hardware [11, 13] or [Figure: miss rate improvement plot] ....

A. Gonzalez, M. Valero, N. Topham, and J. Parcerisa. Eliminating cache conflict misses through XOR-based placement functions. In Proceedings of the 1997 ACM International Conference on Supercomputing, Vienna, Austria, July 1997.


Eliminating Conflict Misses for High Performance Architectures - Rivera, Tseng (1998)   (27 citations)

....reuse. 5 Related Work. Data locality has been recognized as a significant performance issue for both scalar and parallel architectures. In particular, conflict misses can cause half of all cache misses and most intra-nest misses in scientific codes [18]. Conflicts may be eliminated with hardware [9, 11] or operating systems support [2, 3]. For many scientific codes we can achieve similar or better results through inexpensive data layout transformations. Data layout transformations applied by hand have been shown to reduce conflict misses in the SPEC benchmarks [15]. Researchers working on ....

A. Gonzalez, M. Valero, N. Topham, and J. Parcerisa. Eliminating cache conflict misses through XOR-based placement functions. In Proceedings of the 1997 ACM International Conference on Supercomputing, Vienna, Austria, July 1997.


The Use of Prediction for Accelerating Upgrade Misses in.. - Multiprocessors Manuel..   Self-citation (Gonzalez)

No context found.

A. Gonzalez, M. Valero, N. Topham and J. M. Parcerisa. "Eliminating Cache Conflict Misses through XOR-Based Placement Functions". Proc. of the Int'l Conference on Supercomputing, pp. 76--83, 1997.


Owner Prediction for Accelerating Cache-to-Cache.. - Acacio, Gonzalez, .. (2002)   (2 citations)  Self-citation (Gonzalez)

....table is indexed directly using ten least significant bits of the PC of the instruction missing in the L2 cache. The access to the second-level table is carried out from the result of computing the XOR between bits from 5 to 18 of the missing address and bits from 2 to 15 of the PC. As in [8], we use XOR-based placement to optimize the use of the entries in the second-level table. Note that both prediction tables are non-tagged and aliasing can occur. Finally, due to its small number of entries, the NPT is organized as a totally associative buffer structure. ....

A. Gonzalez, M. Valero, N. Topham and J. M. Parcerisa. "Eliminating Cache Conflict Misses through XOR-Based Placement Functions". Proc. of the Int'l Conference on Supercomputing (ICS'97), pp. 76--83, 1997.
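
The two table indices described in the excerpt above can be sketched as follows. This is a sketch under stated assumptions, not the authors' hardware: it assumes bit 0 is the least significant bit and that the ranges "5 to 18" and "2 to 15" are inclusive, which gives 14 bits each. The helper names `bits`, `first_level_index`, and `second_level_index` are hypothetical.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Extract bits lo..hi inclusive (assumption: bit 0 is least significant)."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def first_level_index(pc: int) -> int:
    """Ten least significant bits of the PC, as stated in the excerpt."""
    return pc & 0x3FF


def second_level_index(miss_addr: int, pc: int) -> int:
    """XOR of address bits 5..18 with PC bits 2..15, giving a 14-bit index."""
    return bits(miss_addr, 5, 18) ^ bits(pc, 2, 15)


if __name__ == "__main__":
    pc, addr = 0x0040_1234, 0x7FFF_0A40
    print(hex(first_level_index(pc)), hex(second_level_index(addr, pc)))
```

Because the tables are non-tagged, two different (address, PC) pairs that produce the same XOR result share an entry, which is the aliasing the excerpt mentions.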


Data Caches for Multithreaded Processors - Montse García José   Self-citation (González)

....In this section the different cache configurations are described and compared with conventional data caches. First, we study conventional caches using two families of placement functions: modulus functions and XOR-based placement functions. We have chosen the bitwise XOR mapping function [6][7][8][9] because of its simplicity and its potential to reduce many conflict misses. After analyzing the results, we propose several data cache architectures to further decrease the miss ratio. The data cache architectures that we have considered differ in the number of indexing functions, the ....

....overall percentage of inter-thread misses (from 75% for DMm to 65% of 4wAm). As pointed out before, inter-thread misses are one of the major bottlenecks in data caches for multithreaded processors. On the other hand, XOR-based placement functions are a powerful mechanism for reducing conflict misses [6][7], so we applied them to the data cache in order to evaluate their potential for reducing these critical misses. Figure 3 shows the miss ratio as a function of the placement function used to access cache. We have evaluated direct-mapped and two-way set-associative caches. Figure 3 shows that the ....

A. González, M. Valero, N. Topham and J.M. Parcerisa, "Eliminating Cache Conflict Misses Through XOR-Based Placement Functions," in 11th ACM Int. Conference on Supercomputing, 1997.


Data Caches for Multithreaded Processors - García, González, González (2000)   Self-citation (González)

....In this section the different cache configurations are described and compared with conventional data caches. First, we study conventional caches using two families of placement functions: modulus functions and XOR-based placement functions. We have chosen the bitwise XOR mapping function [6][7][8][9] due to its simplicity and its potential to reduce many conflict misses. After analyzing the results, we propose several data cache architectures to further decrease the miss ratio. The data cache architectures that we have considered differ in the number of indexing functions, the ....

....of inter-thread misses (from 75% for DMm to 65% of 4wAm). As pointed out before, inter-thread misses become one of the major bottlenecks in data caches for multithreaded processors. On the other hand, XOR-based placement functions have been shown to be a powerful mechanism to reduce conflict misses [6][7], so we have applied this mapping function to the data cache in order to evaluate its potential to reduce these critical misses. Figure 3 shows the miss ratio as a function of the placement function used to access cache. We have evaluated direct-mapped and two-way set-associative caches. ....

A. González, M. Valero, N. Topham and J.M. Parcerisa, "Eliminating Cache Conflict Misses Through XOR-Based Placement Functions," in 11th ACM Int. Conference on Supercomputing, 1997.
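
The contrast between the two families of placement functions named in the excerpts above (modulus and bitwise XOR) can be illustrated with a toy direct-mapped cache model. This is a sketch only and does not reproduce the paper's configurations or results: the 16-set cache, the access trace, and the names `modulo_index`, `xor_index`, and `count_misses` are assumptions chosen for the example.

```python
def modulo_index(block: int, index_bits: int) -> int:
    """Conventional placement: low-order bits of the block address."""
    return block & ((1 << index_bits) - 1)


def xor_index(block: int, index_bits: int) -> int:
    """Bitwise XOR placement: low index bits XORed with the next-higher bits."""
    mask = (1 << index_bits) - 1
    return (block ^ (block >> index_bits)) & mask


def count_misses(blocks, index_fn, index_bits: int) -> int:
    """Count misses in a direct-mapped cache using the given index function."""
    stored = {}  # set index -> block address currently held in that set
    misses = 0
    for block in blocks:
        s = index_fn(block, index_bits)
        if stored.get(s) != block:
            misses += 1
            stored[s] = block
    return misses


if __name__ == "__main__":
    # Four blocks exactly 16 apart, revisited 10 times, in a 16-set cache.
    trace = [0, 16, 32, 48] * 10
    print("modulo:", count_misses(trace, modulo_index, 4))
    print("xor:   ", count_misses(trace, xor_index, 4))
```

In this toy trace the four blocks share one set under modulo placement and evict each other on every access, while the XOR function maps them to four different sets, so only the first touch of each block misses.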


Implementation Issues in Modern Cache Memory - Peir, Hsu, Smith (1998)

No context found.

A. Gonzalez, M. Valero, N. Topham, J. Parcerisa, "Eliminating Cache Conflict Misses Through XOR-Based Placement Functions," 11th Int'l Conf. Supercomputing, 1997, pp. 76--83.
