Results 1 - 10 of 115
Low-overhead memory leak detection using adaptive statistical profiling
- In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004
"... Sampling has been successfully used to identify performance optimization opportunities. We would like to apply similar techniques to check program correctness. Unfortunately, sampling provides poor coverage of infrequently executed code, where bugs often lurk. We describe an adaptive profiling schem ..."
Abstract - Cited by 118 (3 self)
Sampling has been successfully used to identify performance optimization opportunities. We would like to apply similar techniques to check program correctness. Unfortunately, sampling provides poor coverage of infrequently executed code, where bugs often lurk. We describe an adaptive profiling scheme that addresses this by sampling executions of code segments at a rate inversely proportional to their execution frequency. To validate our ideas, we have implemented SWAT, a novel memory leak detection tool. SWAT traces program allocations/frees to construct a heap model and uses our adaptive profiling infrastructure to monitor loads/stores to these objects with low overhead. SWAT reports 'stale' objects that have not been accessed for a 'long' time as leaks. This allows it to find all leaks that manifest during the current program execution. Since SWAT has low runtime overhead (<5%) and low space overhead (<10% in most cases, and often less than 5%), it can be used to track leaks in production code that take days to manifest. In addition to identifying the allocations that leak memory, SWAT exposes where the program last accessed the leaked data, which facilitates debugging and fixing the leak. SWAT has been used by several product groups at Microsoft for the past 18 months and has proved effective at detecting leaks with a low false positive rate (<10%).
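As a rough illustration of the adaptive sampling idea described above, the sketch below samples each code site with a probability inversely proportional to how often it has executed and reports heap objects that have not been touched for a 'long' time; the class names, rate formula, and staleness threshold are invented for this sketch and are not taken from SWAT.

```python
import random
import time

class AdaptiveSampler:
    """Sample a code site at a rate inversely proportional to its execution count.

    Hypothetical sketch: cold sites are almost always sampled, hot sites rarely.
    """
    def __init__(self, base_rate=1.0, min_rate=1e-4):
        self.counts = {}             # site -> number of times the site has executed
        self.base_rate = base_rate
        self.min_rate = min_rate

    def should_sample(self, site):
        n = self.counts.get(site, 0) + 1
        self.counts[site] = n
        # Sampling probability ~ 1/n, floored so hot sites still get occasional samples.
        return random.random() < max(self.base_rate / n, self.min_rate)

class StalenessLeakDetector:
    """Toy heap model: report objects not accessed for `stale_after` seconds as leaks."""
    def __init__(self, stale_after=60.0):
        self.stale_after = stale_after
        self.last_access = {}        # object id -> time of last sampled access
        self.alloc_site = {}         # object id -> allocation site (for the report)

    def on_alloc(self, obj_id, site):
        self.alloc_site[obj_id] = site
        self.last_access[obj_id] = time.time()

    def on_free(self, obj_id):
        self.alloc_site.pop(obj_id, None)
        self.last_access.pop(obj_id, None)

    def on_sampled_access(self, obj_id):
        self.last_access[obj_id] = time.time()

    def report_leaks(self):
        now = time.time()
        return [(obj_id, self.alloc_site[obj_id])
                for obj_id, last in self.last_access.items()
                if now - last > self.stale_after]
```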
Predicting Whole-Program Locality Through Reuse Distance Analysis
2003
"... Profiling can accurately analyze program behavior for select data inputs. We show that profiling can also predict program locality for inputs other than profiled ones. Here locality is defined by the distance of data reuse. Studying whole-program data reuse may reveal global patterns not apparent in ..."
Abstract - Cited by 79 (0 self)
Profiling can accurately analyze program behavior for select data inputs. We show that profiling can also predict program locality for inputs other than the profiled ones. Here locality is defined by the distance of data reuse. Studying whole-program data reuse may reveal global patterns not apparent in short-distance reuses or local control flow. However, the analysis must meet two requirements to be useful. The first is efficiency: it needs to analyze all accesses to all data elements in full-size benchmarks and to measure distances of any length to any required precision. The second is prediction: based on a few training runs, it needs to classify patterns as regular or irregular and, for regular ones, it should predict their (changing) behavior for other inputs. In this paper, we show that these goals are attainable through three techniques: approximate analysis of reuse distance (originally called LRU stack distance), pattern recognition, and distance-based sampling. When tested on 15 integer and floating-point programs from SPEC and other benchmark suites, our techniques predict with 94% accuracy on average for data inputs up to hundreds of times larger than the training inputs. Based on these results, the paper discusses possible uses of this analysis.
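To make the reuse-distance metric concrete, here is a minimal, exact computation over a toy access trace: the reuse distance of an access is the number of distinct elements touched since the previous access to the same element. The paper's own analysis is approximate and tree-based so it scales to full benchmark runs; nothing below reproduces it.

```python
def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    The reuse distance of an access to element x is the number of *distinct*
    elements touched since the previous access to x (infinity on first use).
    This is the simple exact version; a scalable analysis would use an
    approximate, tree-based structure instead.
    """
    last_pos = {}           # element -> index of its previous access
    distances = []
    for i, x in enumerate(trace):
        if x in last_pos:
            # Count distinct elements seen strictly between the two accesses.
            window = trace[last_pos[x] + 1:i]
            distances.append(len(set(window)))
        else:
            distances.append(float("inf"))
        last_pos[x] = i
    return distances

# Example: the second access to 'a' has distance 2 (b and c were touched in between).
print(reuse_distances(["a", "b", "c", "a", "b"]))  # [inf, inf, inf, 2, 2]
```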
Online Feedback-Directed Optimization of Java
2002
"... This paper describes the implementation of an online feedback-directed optimization system. The system is fully automatic; it requires no prior... ..."
Abstract - Cited by 60 (5 self)
This paper describes the implementation of an online feedback-directed optimization system. The system is fully automatic; it requires no prior...
Software Behavior Oriented Parallelization
2007
"... Many sequential applications are difficult to parallelize because of unpredictable control flow, indirect data access, and inputdependent parallelism. These difficulties led us to build a software system for behavior oriented parallelization (BOP), which allows a program to be parallelized based on ..."
Abstract - Cited by 60 (8 self)
Many sequential applications are difficult to parallelize because of unpredictable control flow, indirect data access, and input-dependent parallelism. These difficulties led us to build a software system for behavior-oriented parallelization (BOP), which allows a program to be parallelized based on partial information about program behavior, for example, a user reading just part of the source code, or a profiling tool examining merely one or a few executions. The basis of BOP is programmable software speculation, where a user or an analysis tool marks possibly parallel regions in the code, and the run-time system executes these regions speculatively. It is imperative to protect the entire address space during speculation. The main goal of the paper is to demonstrate that this general protection can be made cost-effective by three novel techniques: programmable speculation, critical-path minimization, and value-based correctness checking. On a recently acquired multicore, multiprocessor PC, the BOP system reduced the end-to-end execution time by integer factors for a Lisp interpreter, a data compressor, a language parser, and a scientific library, with no change to the underlying hardware or operating system.
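A toy sketch of value-based correctness checking for a speculatively executed region, assuming the region's reads are snapshotted up front and re-validated at commit time. BOP itself speculates at the process level with page protection; the helper below and its names are purely illustrative.

```python
import copy

def run_speculatively(region, state, read_keys):
    """Execute `region` on a private copy of `state`, then try to commit.

    Value-based check (toy version): the speculation is correct if the values
    the region read are unchanged in the shared state at commit time, even if
    other code wrote to them in between, as long as it wrote the same values.
    """
    snapshot = {k: copy.deepcopy(state[k]) for k in read_keys}   # values read
    private = copy.deepcopy(state)                               # isolated copy
    region(private)                                              # speculative run

    if all(state[k] == snapshot[k] for k in read_keys):
        state.update(private)        # commit: publish speculative writes
        return True
    return False                     # abort: caller re-executes sequentially

# Usage: speculate on a region that reads 'x' and writes 'y'.
state = {"x": 3, "y": 0}
ok = run_speculatively(lambda s: s.update(y=s["x"] * 2), state, read_keys=["x"])
print(ok, state)   # True {'x': 3, 'y': 6}
```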
A Survey of Adaptive Optimization in Virtual Machines
- Proceedings of the IEEE, 93(2), 2005. Special Issue on Program Generation, Optimization, and Adaptation
"... Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimiza ..."
Abstract - Cited by 59 (6 self)
Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimization overhead. Second, modular ...
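As a hedged, generic illustration of the selective optimization that such virtual machines defer to runtime, the sketch below promotes a method to a higher optimization level once an invocation counter crosses a threshold; the class, threshold, and policy are invented and not drawn from the survey or any particular VM.

```python
class SelectiveOptimizer:
    """Toy adaptive-optimization driver: recompile a method once it proves hot.

    Hypothetical sketch: count invocations per method and promote it to a
    higher optimization level when the count crosses a threshold, so that
    expensive optimization is only paid for code that actually runs a lot.
    """
    def __init__(self, threshold=1000):
        self.threshold = threshold
        self.counters = {}      # method name -> invocation count
        self.opt_level = {}     # method name -> current optimization level

    def on_invoke(self, method):
        n = self.counters.get(method, 0) + 1
        self.counters[method] = n
        if n == self.threshold:
            self.recompile(method, level=self.opt_level.get(method, 0) + 1)

    def recompile(self, method, level):
        # Stand-in for invoking the runtime optimizing compiler.
        self.opt_level[method] = level
        print(f"recompiling {method} at opt level {level}")

vm = SelectiveOptimizer(threshold=3)
for _ in range(5):
    vm.on_invoke("Foo.bar")     # triggers recompilation on the 3rd call
```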
Online performance auditing: using hot optimizations without getting burned
- In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, 2006
"... As hardware complexity increases and virtualization is added at more layers of the execution stack, predicting the performance impact of optimizations becomes increasingly difficult. Production compilers and virtual machines invest substantial development effort in performance tuning to achieve good ..."
Abstract - Cited by 38 (3 self)
As hardware complexity increases and virtualization is added at more layers of the execution stack, predicting the performance impact of optimizations becomes increasingly difficult. Production compilers and virtual machines invest substantial development effort in performance tuning to achieve good performance for a range of benchmarks. Although optimizations typically perform well on average, they often have unpredictable impact on running time, sometimes degrading performance significantly. Today’s VMs perform sophisticated feedback-directed optimizations, but these techniques do not address performance degradations, and they actually make the situation worse by making the system more unpredictable. This paper presents an online framework for evaluating the effectiveness of optimizations, enabling an online system to automatically identify and correct performance anomalies that occur at runtime. This work opens the door for a fundamental shift in the way optimizations are developed and tuned for online systems, and may allow the body of work in offline empirical optimization search to be applied automatically at runtime. We present our implementation and evaluation of this system in a product Java VM.
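A minimal sketch of the auditing idea, assuming we can time alternating invocations of a baseline and an 'optimized' version of the same routine and keep whichever is actually faster; the real system works inside a Java VM and uses statistical tests, whereas the names and the simple mean comparison below are invented.

```python
import random
import time

class OnlineAuditor:
    """Toy online performance audit: A/B-time two versions of the same routine.

    Invocations alternate randomly between the baseline and the 'optimized'
    version; once enough timings are collected, keep the optimization only if
    its mean running time is actually lower.
    """
    def __init__(self, baseline, optimized, trials=20):
        self.versions = {"baseline": baseline, "optimized": optimized}
        self.samples = {"baseline": [], "optimized": []}
        self.trials = trials
        self.choice = None          # version selected once the audit finishes

    def call(self, *args):
        if self.choice is not None:
            return self.versions[self.choice](*args)
        name = random.choice(["baseline", "optimized"])
        start = time.perf_counter()
        result = self.versions[name](*args)
        self.samples[name].append(time.perf_counter() - start)
        if all(len(s) >= self.trials for s in self.samples.values()):
            means = {n: sum(s) / len(s) for n, s in self.samples.items()}
            # Revert the optimization if it degrades performance.
            self.choice = min(means, key=means.get)
        return result

def slow_sum(xs):  return sum(x for x in xs)
def fast_sum(xs):  return sum(xs)

audit = OnlineAuditor(slow_sum, fast_sum, trials=5)
data = list(range(10000))
for _ in range(30):
    audit.call(data)
print("auditor selected:", audit.choice)
```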
Temporal Streaming of Shared Memory
- In Proceedings of the 32nd Annual International Symposium on Computer Architecture, 2005
"... Coherent read misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. We propose Temporal Streaming, to eliminate coherent read misses by streaming data to a processor in advance of the corresponding memory a ..."
Abstract - Cited by 38 (12 self)
Coherent read misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. We propose Temporal Streaming to eliminate coherent read misses by streaming data to a processor in advance of the corresponding memory accesses. Temporal streaming dynamically identifies address sequences to be streamed by exploiting two common phenomena in shared-memory access patterns: (1) temporal address correlation, where groups of shared addresses tend to be accessed together and in the same order, and (2) temporal stream locality, where recently accessed address streams are likely to recur. We present a practical design for temporal streaming. We evaluate our design using a combination of trace-driven and cycle-accurate full-system simulation of a cache-coherent distributed shared-memory system. We show that temporal streaming can eliminate 98% of coherent read misses in scientific applications, and between 43% and 60% in database and web server workloads. Our design yields speedups of 1.07 to 3.29 in scientific applications, and 1.06 to 1.21 in commercial workloads.
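A simplified, software-level sketch of temporal address correlation, assuming a log of past miss addresses and a lookup that, on a repeated miss, replays the addresses that followed it last time; the actual design operates in hardware over coherence misses, and the structure below is only illustrative.

```python
from collections import defaultdict

class TemporalStreamer:
    """Toy temporal-correlation predictor over a miss-address history.

    Records the global sequence of miss addresses. When an address misses
    again, it replays ('streams') the next few addresses that followed its
    previous occurrence, exploiting the tendency of shared addresses to be
    accessed together and in the same order.
    """
    def __init__(self, stream_len=4):
        self.history = []                     # global ordered miss log
        self.index = defaultdict(list)        # address -> positions in history
        self.stream_len = stream_len

    def on_miss(self, addr):
        predicted = []
        if self.index[addr]:
            last = self.index[addr][-1]       # most recent earlier occurrence
            predicted = self.history[last + 1:last + 1 + self.stream_len]
        self.index[addr].append(len(self.history))
        self.history.append(addr)
        return predicted                      # addresses to prefetch

s = TemporalStreamer()
for a in [1, 2, 3, 4, 9, 1]:                  # first pass records the stream
    s.on_miss(a)
print(s.on_miss(2))                           # -> [3, 4, 9, 1]: replay what followed '2'
```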
Online phase detection algorithms
- In The International Symposium on Code Generation and Optimization, 2006
"... Today’s virtual machines (VMs) dynamically optimize an application as it is executing, often employing optimizations that are specialized for the current execution profile. An online phase detector determines when an executing program is in a stable period of program execution (a phase) or is in tra ..."
Abstract - Cited by 38 (4 self)
Today’s virtual machines (VMs) dynamically optimize an application as it is executing, often employing optimizations that are specialized for the current execution profile. An online phase detector determines when an executing program is in a stable period of program execution (a phase) or is in transition. A VM using an online phase detector can apply specialized optimizations during a phase or reconsider optimization decisions between phases. Unfortunately, extant approaches to detecting phase behavior rely on offline profiling or hardware support, or are targeted toward a particular optimization. In this work, we focus on the enabling technology of online phase detection. More specifically, we contribute (a) a novel framework for online phase detection, (b) multiple instantiations of the framework that produce novel online phase detection algorithms, (c) a novel client- and machine-independent baseline methodology for evaluating the accuracy of an online phase detector, (d) a metric to compare online detectors to this baseline, and (e) a detailed empirical evaluation, using Java applications, of the accuracy of the numerous phase detectors.
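One hedged way to picture an instantiation of such a framework: compare consecutive fixed-size windows of a profile signal and report a phase while adjacent windows stay similar. The window size, similarity function, and threshold below are invented for illustration and are not the paper's algorithms.

```python
from collections import Counter

def similarity(window_a, window_b):
    """Similarity in [0, 1] between two profile windows (e.g. lists of method IDs)."""
    a, b = Counter(window_a), Counter(window_b)
    overlap = sum((a & b).values())               # elements common to both windows
    return overlap / max(len(window_a), len(window_b), 1)

class OnlinePhaseDetector:
    """Toy online phase detector: compare consecutive fixed-size profile windows.

    The program is reported to be in a 'phase' while adjacent windows are
    sufficiently similar, and in 'transition' otherwise.
    """
    def __init__(self, window_size=8, threshold=0.75):
        self.window_size = window_size
        self.threshold = threshold
        self.current = []
        self.previous = None

    def observe(self, event):
        self.current.append(event)
        if len(self.current) < self.window_size:
            return None                            # not enough data yet
        verdict = None
        if self.previous is not None:
            sim = similarity(self.previous, self.current)
            verdict = "phase" if sim >= self.threshold else "transition"
        self.previous, self.current = self.current, []
        return verdict

det = OnlinePhaseDetector(window_size=4, threshold=0.75)
events = ["A", "B", "A", "B"] * 3 + ["C", "D", "C", "D"]
print([v for v in (det.observe(e) for e in events) if v])  # ['phase', 'phase', 'transition']
```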
Whole Execution Traces
2004
"... Different types of program profiles (control flow, value, address, and dependence) have been collected and extensively studied by researchers to identify program characteristics that can then be exploited to develop more effective compilers and architectures. Due to the large amounts of profile data ..."
Abstract - Cited by 32 (5 self)
Different types of program profiles (control flow, value, address, and dependence) have been collected and extensively studied by researchers to identify program characteristics that can then be exploited to develop more effective compilers and architectures. Due to the large amounts of profile data produced by realistic program runs, most work has focused on separately collecting and compressing different types of profiles. In this paper we present a unified representation of profiles called the Whole Execution Trace (WET), which includes the complete information contained in each of the above types of traces. Thus WETs provide a basis for a next-generation software tool that will enable mining of program profiles to identify program characteristics whose understanding requires relating various types of profiles. The key features of our WET representation are: WET is constructed by labeling a static program representation with profile information such that relevant and related profile information can be directly accessed by analysis algorithms as they traverse the representation; a highly effective two-tier strategy is used to significantly compress the WET; and the compression techniques are designed such that they do not adversely affect the ability to rapidly traverse the WET for extracting subsets of information corresponding to individual profile types as well as combinations of profile types (e.g., in the form of dynamic slices of WETs). Our experiments show that, on average, execution traces resulting from the execution of 647 million statements can be stored in 331 megabytes of storage after compression. The compression factors range from 16 to 83. Moreover, the rates at which different types of profiles can be individually or simultaneously extracted are high.
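A hedged sketch of the core idea of labeling a static representation with dynamic (timestamped) profile information and compressing the result; the node layout and the run-length 'first tier' below are invented stand-ins and are far simpler than the paper's two-tier compression.

```python
from collections import defaultdict

def rle(values):
    """First-tier compression stand-in: simple run-length encoding of a value stream."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

class WholeExecutionTrace:
    """Toy whole-execution trace: label each static statement with its dynamic history.

    For every static statement we keep the timestamps at which it executed and
    the values it produced, so control-flow, value, and (via timestamps)
    ordering information can be traversed together from the static program.
    """
    def __init__(self):
        self.clock = 0
        self.timestamps = defaultdict(list)   # statement id -> execution timestamps
        self.values = defaultdict(list)       # statement id -> produced values

    def record(self, stmt_id, value):
        self.clock += 1
        self.timestamps[stmt_id].append(self.clock)
        self.values[stmt_id].append(value)

    def compress(self):
        # Return a compressed view: RLE of the per-statement value streams.
        return {s: rle(vs) for s, vs in self.values.items()}

wet = WholeExecutionTrace()
for i in range(6):
    wet.record("s1: x = x + 1", i)           # changing values
    wet.record("s2: flag = True", True)      # constant value compresses well
print(wet.compress()["s2: flag = True"])     # [[True, 6]]
```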