Josep Torrellas, Monica S. Lam, and John L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. Technical report, Stanford University, February 1990. Report No. CSL-TR-90-412.
....in program accesses and has the advantage that it requires no software intervention. Large cache lines also successfully utilize the block transfer capabilities of modern memory systems. Although the utility of multiword cache lines is almost universally accepted nowadays [23, 21], recent data [5, 15, 25] show that cache lines should be kept small for multiprocessors. The primary reason is that the spatial locality is considerably reduced in the process of parallelizing applications for multiprocessors, and large cache lines can result in a significant increase in memory system traffic. Also, ....
J. Torrellas, M. S. Lam, and J. L. Hennessy. Measurement, analysis, and improvement of the cache behavior of shared data in cache coherent multiprocessors. Technical Report CSL-TR-90-412, Stanford University, February 1990.
....of benefit due to prefetching on a per page basis. These calculations are done off line since there is a considerable amount of state information necessary, and in some cases we need to know what the future references are. 3.2.1 Invalidate Coherency The technique used is similar to the one used in [25], but we calculate the bytes transferred due to false sharing rather than just counting the false sharing misses. Two concurrent trace driven simulations are run: one is identical to the simulation that generated the trace (multi word line size) and the other tracks the location and status of ....
Josep Torrellas, Monica S. Lam, and John L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. Technical Report CSL-TR-90-412, Computer Systems Laboratory, Stanford University, Stanford, CA 94305, February 1990.
....An efficient communication system allows the micro kernel approach to be used without a significant performance penalty. Large scale memory systems can hurt communication performance since the cost of copying data in these machines increases, due to generally poor cache behavior of copying [17] and the increasing processor speed to average memory access ratio. The cost of remapping data in multiprocessor systems also appears to be significantly greater than in uniprocessors [4, 14] because of the need to update or invalidate the TLBs, or page tables, in all processors. Moreover, the ....
J. Torrellas, M.S. Lam, and J.L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. In Workshop on Scalable Shared-Memory Architectures. Seattle, May 1990.
....the whole machine. Unfortunately, the performance of communication systems implemented using standard shared memory techniques decreases with larger scale memory systems for several reasons. First, the cost of copying data in these machines increases because of the poor cache behavior of copying [24] and the increasing ratio of processor speed to average memory access time. Second, the cost of remapping data in multiprocessor systems, as an alternative to copying, is greater than in uniprocessors [4, 21] because of the need to update or invalidate the TLB or page table in each processor. ....
J. Torrellas, M.S. Lam, and J.L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. In Workshop on Scalable Shared-Memory Architectures. Seattle, May 1990.
....and is thus kept consistent until the processor actually reads the value. Hardware controlled prefetching includes schemes such as long cache lines and instruction look ahead [16]. The effectiveness of long cache lines is limited by the reduced spatial locality in multiprocessor applications [6, 28], while instruction look ahead is limited by branches and the finite look ahead buffer size. With software controlled prefetching, explicit prefetch instructions are issued. Software control allows the prefetching to be done selectively (thus reducing bandwidth requirements) and extends the ....
J. Torrellas, M. S. Lam, and J. L. Hennessy. Measurement, analysis, and improvement of the cache behavior of shared data in cache coherent multiprocessors. Technical Report CSL-TR-90-412, Stanford University, Feb. 1990.
....sharing costs as well as the amount of benefit due to prefetching on a per page basis. These calculations are done off line since there is a lot of state tracked and in some cases we need to know what the future references are. 4.2.1 Invalidate Coherency The technique used is the same as used in [14], but we calculate the bytes transferred due to false sharing rather than just counting the false sharing misses. Two concurrent trace driven simulations are run: one is identical to the simulation that generated the trace, and the other just tracks the location and status of each individual ....
Josep Torrellas, Monica S. Lam, and John L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. Technical Report CSL-TR-90-412, Computer Systems Laboratory, Stanford University, Stanford, CA 94305, February 1990.
No context found.
Josep Torrellas, Monica S. Lam, and John L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. Technical report, Stanford University, February 1990. Report No. CSL-TR-90-412.