7 citations found.
Josep Torrellas, Monica S. Lam, and John L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. Technical report, Stanford University, February 1990. Report No. CSL-TR-90-412.


This paper is cited in the following contexts:
Tolerating Latency Through Software-Controlled Prefetching in.. - Mowry, Gupta (1991)   (232 citations)

....in program accesses and has the advantage that it requires no software intervention. Large cache lines also successfully utilize the block transfer capabilities of modern memory systems. Although the utility of multiword cache lines is almost universally accepted nowadays [23, 21], recent data [5, 15, 25] show that cache lines should be kept small for multiprocessors. The primary reason is that the spatial locality is considerably reduced in the process of parallelizing applications for multiprocessors, and large cache lines can result in a significant increase in memory system traffic. Also, ....

J. Torrellas, M. S. Lam, and J. L. Hennessy. Measurement, analysis, and improvement of the cache behavior of shared data in cache coherent multiprocessors. Technical Report CSL-TR-90-412, Stanford University, February 1990.
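The traffic argument in the Mowry and Gupta context above (reduced spatial locality plus large lines means more memory traffic) can be made concrete with a small calculation. This is a minimal sketch, assuming an infinite cache that starts empty, so only cold misses occur and every miss transfers one whole line; it deliberately ignores coherence misses and false sharing, which also grow with line size on a multiprocessor. The function name `traffic_bytes`, the 4-byte word, and both access patterns are hypothetical.

```python
def traffic_bytes(word_addrs, line_bytes, word_bytes=4):
    """Bytes fetched from memory by an infinite, initially empty cache.

    Assumption: each distinct line touched costs exactly one miss and each
    miss transfers a whole line, so traffic = distinct lines * line size.
    """
    lines = {(w * word_bytes) // line_bytes for w in word_addrs}
    return len(lines) * line_bytes


# Hypothetical patterns: 64 consecutive words (good spatial locality)
# vs. 8 words spaced 32 words apart (poor spatial locality).
contiguous = range(64)
strided = range(0, 256, 32)

for line_bytes in (16, 128):
    print(line_bytes, traffic_bytes(contiguous, line_bytes), traffic_bytes(strided, line_bytes))
```

For the contiguous pattern both line sizes move 256 bytes, but for the strided pattern the 128-byte line moves 1024 bytes to deliver 32 useful bytes, versus 128 bytes with 16-byte lines.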


Factors Affecting False Sharing on Page-Granularity Cache-Coherent .. - Khera (1994)   (2 citations)

....of benefit due to prefetching on a per page basis. These calculations are done off line since there is a considerable amount of state information necessary, and in some cases we need to know what the future references are. 3.2.1 Invalidate Coherency: The technique used is similar to the one used in [25], but we calculate the bytes transferred due to false sharing rather than just counting the false sharing misses. Two concurrent trace driven simulations are run: one is identical to the simulation that generated the trace (multiword line size) and the other tracks the location and status of ....

Josep Torrellas, Monica S. Lam, and John L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. Technical Report CSL-TR-90-412, Computer Systems Laboratory, Stanford University, Stanford, CA 94305, February 1990.
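The Khera (1994) context above describes running two trace-driven simulations side by side to separate false sharing from true sharing. The following is a simplified sketch of that idea, not the exact procedure of either paper. It assumes infinite caches, a write-invalidate protocol, a trace of `(proc, op, word_addr)` tuples, and that the second simulation tracks per-word validity. Under those assumptions, a line miss on a line the processor held before counts as false sharing when the word being accessed was never written by another processor since the last fetch. Charging one full line of bytes per false-sharing miss is also an assumption. All names (`classify_misses`, `Stats`, `WORD_BYTES`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

WORD_BYTES = 4  # assumed word size


@dataclass
class Stats:
    cold: int = 0
    true_sharing: int = 0
    false_sharing: int = 0
    false_sharing_bytes: int = 0


def classify_misses(trace, words_per_line):
    """Classify line misses in a trace of (proc, op, word_addr), op in {"R", "W"}."""
    line_valid = defaultdict(set)  # line sim: line -> procs holding a valid copy
    word_valid = defaultdict(set)  # word sim: word -> procs whose copy of that word is current
    fetched = defaultdict(set)     # line -> procs that have fetched the line at least once
    stats = Stats()
    line_bytes = words_per_line * WORD_BYTES

    for proc, op, word in trace:
        line = word // words_per_line
        if proc not in line_valid[line]:  # miss in the multiword-line simulation
            if proc not in fetched[line]:
                stats.cold += 1
            elif proc in word_valid[word]:
                # This word is still current, so the invalidation came from a
                # write to a *different* word in the same line.
                stats.false_sharing += 1
                stats.false_sharing_bytes += line_bytes
            else:
                stats.true_sharing += 1
            # Fetching the line makes every word in it current for proc.
            fetched[line].add(proc)
            line_valid[line].add(proc)
            base = line * words_per_line
            for w in range(base, base + words_per_line):
                word_valid[w].add(proc)
        if op == "W":
            # Write-invalidate: others lose the whole line in the line sim,
            # but only the written word becomes stale in the word sim.
            line_valid[line] = {proc}
            word_valid[word] = {proc}
    return stats


# Hypothetical trace: words 0 and 1 share a 4-word line.
trace = [(0, "R", 0), (1, "W", 1), (0, "R", 0), (1, "W", 1), (0, "R", 1)]
print(classify_misses(trace, words_per_line=4))
```

In this trace, processor 0's second read of word 0 is a false-sharing miss (only word 1 was written), while its final read of word 1 is a true-sharing miss.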


Optimizing Memory-Based Messaging for Scalable Shared Memory.. - David Cheriton (1993)   (3 citations)

....An efficient communication system allows the micro kernel approach to be used without a significant performance penalty. Large scale memory systems can hurt communication performance since the cost of copying data in these machines increases, due to generally poor cache behavior of copying [17] and the increasing processor speed to average memory access ratio. The cost of remapping data in multiprocessor systems also appears to be significantly greater than in uniprocessors [4, 14] because of the need to update or invalidate the TLBs, or page tables, in all processors. Moreover, the ....

J. Torrellas, M.S. Lam, and J.L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. In Workshop on Scalable Shared-Memory Architectures. Seattle, May 1990.


Optimized Memory-Based Messaging: Leveraging the Memory.. - David Cheriton (1994)   (4 citations)

....the whole machine. Unfortunately, the performance of communication systems implemented using standard shared memory techniques decreases with larger scale memory systems for several reasons. First, the cost of copying data in these machines increases because of the poor cache behavior of copying [24] and the increasing ratio of processor speed to average memory access time. Second, the cost of remapping data in multiprocessor systems, as an alternative to copying, is greater than in uniprocessors [4, 21] because of the need to update or invalidate the TLB or page table in each processor. ....

J. Torrellas, M.S. Lam, and J.L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. In Workshop on Scalable Shared-Memory Architectures. Seattle, May 1990.


Comparative Evaluation of Latency Reducing and.. - Gupta, Hennessy.. (1991)   (103 citations)  Self-citation (Hennessy)

....and is thus kept consistent until the processor actually reads the value. Hardware controlled prefetching includes schemes such as long cache lines and instruction look ahead [16]. The effectiveness of long cache lines is limited by the reduced spatial locality in multiprocessor applications [6, 28], while instruction look ahead is limited by branches and the finite look ahead buffer size. With software controlled prefetching, explicit prefetch instructions are issued. Software control allows the prefetching to be done selectively (thus reducing bandwidth requirements) and extends the ....

J. Torrellas, M. S. Lam, and J. L. Hennessy. Measurement, analysis, and improvement of the cache behavior of shared data in cache coherent multiprocessors. Technical Report CSL-TR-90-412, Stanford University, Feb. 1990.


An Architecture-Independent Analysis of False Sharing - Khera, LaRowe, Jr., Ellis (1993)   (4 citations)  Self-citation (Analysis)

....sharing costs as well as the amount of benefit due to prefetching on a per page basis. These calculations are done off line since there is a lot of state tracked and in some cases we need to know what the future references are. 4.2.1 Invalidate Coherency: The technique used is the same as used in [14], but we calculate the bytes transferred due to false sharing rather than just counting the false sharing misses. Two concurrent trace driven simulations are run: one is identical to the simulation that generated the trace, and the other just tracks the location and status of each individual ....

Josep Torrellas, Monica S. Lam, and John L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. Technical Report CSL-TR-90-412, Computer Systems Laboratory, Stanford University, Stanford, CA 94305, February 1990.


Minerva: An Adaptive Subblock Coherence Protocol for Improved .. - Rothman, Smith

No context found.

Josep Torrellas, Monica S. Lam, and John L. Hennessy. Measurement, Analysis, and Improvement of the Cache Behavior of Shared Data in Cache Coherent Multiprocessors. Technical report, Stanford University, February 1990. Report No. CSL-TR-90-412.


CiteSeer.IST - Copyright Penn State and NEC