Results 1 - 10
of
49
Trace-Driven Memory Simulation: A Survey
- ACM Computing Surveys
, 2004
"... This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show t ..."
Abstract
-
Cited by 163 (0 self)
- Add to MetaCart
This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use are considered. In a concluding section, we examine fundamental limitations to trace-driven simulation, and survey some recent developments in memory simulation that may overcome these bottlenecks
Quantifying behavioral differences between C and C++ programs
- JOURNAL OF PROGRAMMING LANGUAGES
, 1994
"... Improving the performance of C programs has been a topic of great interest for many years. Both hardware technology and compiler optimization research has been applied in an effort to make C programs execute faster. In many application domains, the C++ language is replacing C as the programming lang ..."
Abstract
-
Cited by 91 (15 self)
- Add to MetaCart
(Show Context)
Improving the performance of C programs has been a topic of great interest for many years. Both hardware technology and compiler optimization research has been applied in an effort to make C programs execute faster. In many application domains, the C++ language is replacing C as the programming language of choice. In this paper, we measure the empirical behavior of a group of significant C and C++ programs and attempt to identify and quantify behavioral differences between them. Our goal is to determine whether optimization technology that has been successful for C programs will also be successful in C++ programs. We furthermore identify behavioral characteristics of C++ programs that suggest optimizations that should be applied in those programs. Our results show that C++ programs exhibit behavior that is significantly different than C programs. These results should be of interest to compiler writers and architecture designers who are designing systems to execute object-oriented programs.
SPAID: Software Prefetching in Pointer and Call Intensive Environments. In
- Proc. 28th International Symposium on Microarchitecture,
, 1995
"... ..."
(Show Context)
Memory Behavior of the SPEC2000 Benchmark Suite
, 2000
"... The SPEC CPU benchmarks are frequently used in computer architecture research. The newly released SPEC'2000 benchmarks consist of fourteen floating point and twelve integer applications. In this paper we present measurements of number of cache misses for all the applications for a variety of ..."
Abstract
-
Cited by 46 (0 self)
- Add to MetaCart
(Show Context)
The SPEC CPU benchmarks are frequently used in computer architecture research. The newly released SPEC'2000 benchmarks consist of fourteen floating point and twelve integer applications. In this paper we present measurements of number of cache misses for all the applications for a variety of cache configurations. Prior studies have shown that SPEC benchmarks do not put much stress on the memory system. Our simulation results demonstrate that SPEC'2000 places only modest pressure on the first level caches confirming the results of similar experiments. 1 Introduction SPEC CPU benchmarks have long been used to gauge the performance of uniprocessor systems as well as microarchitectural enhancements. The newly released SPEC'2000 benchmark suite replaced the previous release, SPEC'95. Many studies [1, 3, 4]showed that only a few applications place more than modest stress on the memory system. The purpose of this study is to examine the memory behavior of the SPEC'2000 benchmark suite an...
Quantifying Loop Nest Locality Using SPEC'95 and the Perfect Benchmarks
, 1999
"... This paper analyzes and quantifies the locality characteristics of numerical loop nests in order to suggest future directions for architecture and software cache optimizations. Since most programs spend the majority of their time in nests, the vast majority of cache optimization techniques target lo ..."
Abstract
-
Cited by 39 (6 self)
- Add to MetaCart
This paper analyzes and quantifies the locality characteristics of numerical loop nests in order to suggest future directions for architecture and software cache optimizations. Since most programs spend the majority of their time in nests, the vast majority of cache optimization techniques target loop nests. In contrast, the locality characteristics that drive these optimizations are usually collected across the entire application rather than the nest level. Researchers have studied numerical codes for so long that a number of commonly held assertions have emerged on their locality characteristics. In light of these assertions, we use the SPEC'95 and Perfect Benchmarks to take a new look at measuring locality on numerical codes based on references, loop nests, and program locality properties. Our results show that several popular assertions are at best overstatements. For example, although most reuse is within a loop nest, in line with popular assertions, most misses are inter-nest cap...
Increasing Cache Port Efficiency for Dynamic Superscalar Microprocessors
, 1996
"... The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose techniques for improving the bandwidth of a single cache port by using additional buffering in th ..."
Abstract
-
Cited by 37 (2 self)
- Add to MetaCart
The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose techniques for improving the bandwidth of a single cache port by using additional buffering in the processor, and by taking maximum advantage of a wider cache port. We evaluate these techniques using realistic applications that include the operating system. Our techniques using a single-ported cache achieve 91 % of the performance of a dual-ported cache. 1.
Cache Performance for Multimedia Applications
- In Proceedings of the 15th IEEE International Conference on Supercomputing
, 2001
"... The caching behavior of multimedia applications has been described as having high instruction reference locality within small loops, very large working sets, and poor data cache performance due to non-locality of data references. Despite this, there is no published research deriving or measuring the ..."
Abstract
-
Cited by 26 (1 self)
- Add to MetaCart
(Show Context)
The caching behavior of multimedia applications has been described as having high instruction reference locality within small loops, very large working sets, and poor data cache performance due to non-locality of data references. Despite this, there is no published research deriving or measuring these qualities. Utilizing the previously developed Berkeley Multimedia Workload, we present the results of execution driven cache simulations with the goal of aiding future media processing architecture design. Our analysis examines the differences between multimedia and traditional applications in cache behavior. We find that multimedia applications actually exhibit lower instruction miss ratios and comparable data miss ratios when contrasted with other widely studied workloads. In addition, we find that longer data cache line sizes than are currently used would benefit multimedia processing.
A fast and accurate framework to analyze and optimize cache memory behavior
- ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 2004
"... The gap between processor and main memory performance increases every year. In order to overcome this problem, cache memories are widely used. However, they are only effective when programs exhibit sufficient data locality. Compile-time program transformations can significantly improve the performan ..."
Abstract
-
Cited by 21 (2 self)
- Add to MetaCart
The gap between processor and main memory performance increases every year. In order to overcome this problem, cache memories are widely used. However, they are only effective when programs exhibit sufficient data locality. Compile-time program transformations can significantly improve the performance of the cache. To apply most of these transformations, the compiler requires a precise knowledge of the locality of the different sections of the code, both before and after being transformed. Cache miss equations (CMEs) allow us to obtain an analytical and precise description of the cache memory behavior for loop-oriented codes. Unfortunately, a direct solution of the CMEs is computationally intractable due to its NP-complete nature. This article proposes a fast and accurate approach to estimate the solution of the CMEs. We use sampling techniques to approximate the absolute miss ratio of each reference by analyzing a small subset of the iteration space. The size of the subset, and therefore the analysis time, is determined by the accuracy selected by the user. In order to reduce the complexity of the algorithm to solve CMEs, effective mathematical techniques have been developed to analyze the subset of the iteration space that is being considered. These techniques exploit some properties of the particular polyhedra represented by CMEs.
A.Varma “Selective victim caching: A method to improve the performance of direct-mapped caches”,
- IEEE Transaction on Computers ,
, 1997
"... ..."