B. R. Buck and J. K. Hollingsworth. Using hardware performance monitors to isolate memory bottlenecks. In ACM, editor, Supercomputing, pages 64-65, 2000.
....with lines in the source code for user feedback. Consider the example with a row-major layout shown in Figure 2. For the sake of simplicity, we assume an offset of one per array element. The read references to array B occur at offsets n 1, n 2, n 3 (corresponding to references B[1,1], B[1,2], and B[1,3], respectively) for the first iteration of the outer loop and a length of n 1 accesses. The starting sequence id for the first access of the B array is 3, since the first three events (sequence ids start from 0) are the two enter-scope events for scopes 1 and 2 and the read event for A[i]. For one ....
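The descriptor walked through in the passage above (a starting offset, a per-access offset increment, an access count, and a starting sequence id) can be sketched as a small data structure that expands back into the access stream it summarizes. This is a minimal illustration under assumptions: the class name `RSD`, its field names, and the `seq_stride` field (the gap in sequence ids between consecutive accesses of the same reference) are hypothetical and are not taken from the cited tool's actual trace format.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class RSD:
    """Hypothetical regular section descriptor for one access stream.

    Field names are illustrative assumptions, not the cited tool's format.
    """
    start_offset: int  # offset of the first access (in array elements)
    stride: int        # offset increment between consecutive accesses
    length: int        # number of accesses the descriptor covers
    start_seq: int     # sequence id of the first access in the event trace
    seq_stride: int    # sequence-id increment between consecutive accesses

    def expand(self) -> Iterator[Tuple[int, int]]:
        """Yield (sequence id, offset) for every access the descriptor covers."""
        for k in range(self.length):
            yield (self.start_seq + k * self.seq_stride,
                   self.start_offset + k * self.stride)


if __name__ == "__main__":
    # Arbitrary illustrative values: first access at offset 100 with
    # sequence id 3, unit stride, and one other event between accesses.
    r = RSD(start_offset=100, stride=1, length=3, start_seq=3, seq_stride=2)
    print(list(r.expand()))
```

For instance, `RSD(start_offset=100, stride=1, length=3, start_seq=3, seq_stride=2)` expands to `(3, 100), (5, 101), (7, 102)`; the offset 100 and sequence stride 2 are arbitrary values chosen only to show how a handful of integers stands in for an arbitrarily long run of accesses.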
....references generated by a processor. The TSpec notation is more complex than RSDs since it is also the object on which the cache filter operates. Buck and Hollingsworth performed a simulation study to pinpoint the hot spots of cache misses based on hardware support for data trace generation [3]. Hardware counter support in conjunction with interrupt support on overflow for a cache miss counter was compared to miss counting in selected memory regions. The former approach is based on probing to capture data misses at a certain frequency (e.g., one out of 50,000 misses). The latter ....
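The two approaches compared in the passage above can be contrasted with a small software simulation. This is a sketch under stated assumptions: the input is a pre-recorded list of miss addresses rather than live hardware events, and the function names `sample_misses` and `count_region_misses` are hypothetical; it does not model the actual hardware interface studied by Buck and Hollingsworth.

```python
from typing import Iterable, List


def sample_misses(miss_addrs: Iterable[int], period: int) -> List[int]:
    """Record the address of every `period`-th miss.

    Mimics a miss counter armed to overflow after `period` misses, whose
    overflow interrupt handler records the address of the triggering miss.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    samples: List[int] = []
    remaining = period  # misses left until the simulated counter overflows
    for addr in miss_addrs:
        remaining -= 1
        if remaining == 0:        # simulated overflow: the handler runs
            samples.append(addr)  # handler sees the address of this miss
            remaining = period    # re-arm the counter
    return samples


def count_region_misses(miss_addrs: Iterable[int], lo: int, hi: int) -> int:
    """Count only misses whose address lies in the half-open range [lo, hi)."""
    return sum(1 for addr in miss_addrs if lo <= addr < hi)


if __name__ == "__main__":
    # Ten hypothetical misses, 8 bytes apart, starting at 0x1000.
    misses = [0x1000 + 8 * i for i in range(10)]
    print([hex(a) for a in sample_misses(misses, 4)])
    print(count_region_misses(misses, 0x1000, 0x1020))
```

With these ten hypothetical misses, a period of 4 samples the 4th and 8th misses (`0x1018` and `0x1038`), while the region count over `[0x1000, 0x1020)` returns 4. A period of 50,000 would correspond to the one-out-of-50,000 rate mentioned in the passage: sampling yields individual addresses at low cost, whereas region counting yields exact totals but only for the ranges chosen in advance.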
B. R. Buck and J. K. Hollingsworth. Using hardware performance monitors to isolate memory bottlenecks. In ACM, editor, Supercomputing, pages 64-65, 2000.
....associated with a cache miss. It also provides a way to limit cache miss counting to misses associated with a user-determined area of memory. These facilities could enable presentation of data about cache behavior in terms of program data structures at the source code level. Work reported in [2] has shown that such information can be extremely useful in identifying performance bottlenecks caused by bad cache behavior. In [2], the data were obtained through use of a cache simulator which runs considerably slower than the original application (e.g., by a couple of orders of magnitude) and ....
....area of memory. These facilities could enable presentation of data about cache behavior in terms of program data structures at the source code level. Work reported in [2] has shown that such information can be extremely useful in identifying performance bottlenecks caused by bad cache behavior. In [2], the data were obtained through use of a cache simulator which runs considerably slower than the original application (e.g., by a couple of orders of magnitude) and does not model details such as pipelining and multiple instruction issue. Through use of appropriate hardware support (e.g., as on ....
Buck, B., Hollingsworth, J.K.: Using Hardware Performance Monitors to Isolate Memory Bottlenecks. SC'2000. Dallas, Texas. November 2000.
....tools for automatic trace expansion or synthetic trace generation, and can be used to represent different levels of abstraction in benchmark analysis. Buck and Hollingsworth performed a simulation study to pinpoint the hot spots of cache misses based on hardware support for data trace generation [4]. Hardware counter support in conjunction with interrupt support on overflow for a cache miss counter was compared to miss counting in selected memory regions. The former approach is based on probing to capture data misses at a certain frequency (e.g., one out of 50,000 misses). The latter approach ....
Bryan R. Buck and Jeffrey K. Hollingsworth. Using hardware performance monitors to isolate memory bottlenecks. In ACM, editor, SC
....software distribution along the lines used by Prof. Joel Saltz in distributing his CHAOS runtime library [21]. Finally, we intend to collaborate with Prof. Jeff Hollingsworth, who is working on user performance tools to isolate and display memory bottlenecks using hardware cache miss counters [6]. Similar tools such as MemSpy have proven very useful in helping users identify problematic data layouts and computation patterns with poor cache performance [56]. We intend both to use their tools to identify optimization targets and to integrate our analyses with their tool to provide advice to ....
B. Buck and J. Hollingsworth. Using hardware performance monitors to isolate memory bottlenecks. In Proceedings of SC'00, Dallas, TX, November 2000.
....ability to change the microcode in some processors to collect memory reference information. The FlashPoint [17] system uses the fact that the Stanford FLASH multiprocessor [18] implements its coherence protocols in software, allowing instrumentation to be added at this level. Buck and Hollingsworth [19] proposed using interrupt on overflow to sample the addresses of data cache misses, but this approach does not provide the level of detail provided by SIGMA. Mtool [20] provides information about the amount of performance lost due to the memory hierarchy, but only relates this information back to ....
B. Buck and J. K. Hollingsworth, "Using Hardware Performance Monitors to Isolate Memory Bottlenecks", In Proceedings of Supercomputing'00, November 2000.