Results 1 - 10 of 415
Performance Debugging for Distributed Systems of Black Boxes
, 2003
Cited by 316 (3 self)
Many interesting large-scale systems are distributed systems of multiple communicating components. Such systems can be very hard to debug, especially when they exhibit poor performance. The problem becomes much harder when systems are composed of "black-box" components: software from many different (perhaps competing) vendors, usually without source code available. Typical solutions-provider employees are not always skilled or experienced enough to debug these systems efficiently. Our goal is to design tools that enable modestly-skilled programmers (and experts, too) to isolate performance bottlenecks in distributed systems composed of black-box nodes.
The Tau Parallel Performance System
- The International Journal of High Performance Computing Applications
, 2006
Cited by 242 (21 self)
The ability of performance technology to keep pace with the growing complexity of parallel and distributed systems depends on robust performance frameworks that can at once provide system-specific performance capabilities and support high-level performance problem solving. Flexibility and portability in empirical methods and processes are influenced primarily by the strategies available for instrumentation and measurement, and how effectively they are integrated and composed. This paper presents the TAU (Tuning and Analysis Utilities) parallel performance system and describes how it addresses diverse requirements for performance observation and analysis.
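To make the instrumentation-and-measurement idea above concrete, here is a minimal timer-based sketch in Python. It is not TAU's API or implementation; the decorator, the example functions, and the reporting format are all invented to illustrate how enter/exit probes accumulate inclusive time per routine.

```python
# Illustrative only: a minimal timer-based instrumentation sketch, not TAU's API.
import time
from collections import defaultdict
from functools import wraps

inclusive_time = defaultdict(float)   # seconds spent in each function, including callees
call_count = defaultdict(int)

def instrument(fn):
    """Wrap a function with enter/exit timers, the basic idea behind source instrumentation."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            inclusive_time[fn.__name__] += time.perf_counter() - start
            call_count[fn.__name__] += 1
    return wrapper

@instrument
def solver_step(n):
    return sum(i * i for i in range(n))

@instrument
def run(iterations=100):
    for _ in range(iterations):
        solver_step(10_000)

if __name__ == "__main__":
    run()
    for name in sorted(inclusive_time, key=inclusive_time.get, reverse=True):
        print(f"{name:12s} calls={call_count[name]:5d} inclusive={inclusive_time[name]:.4f}s")
```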
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
- Computer Architecture Letters
, 2002
Cited by 226 (15 self)
Computer architects must determine how to most effectively use finite computational resources when running simulations to evaluate new architectural ideas. To facilitate efficient simulations with a range of benchmark programs, we have developed the MinneSPEC input set for the SPEC CPU 2000 benchmark suite. This new workload allows computer architects to obtain simulation results in a reasonable time using existing simulators. While the MinneSPEC workload is derived from the standard SPEC CPU 2000 workload, it is a valid benchmark suite in and of itself for simulation-based research. MinneSPEC also may be used to run large numbers of simulations to find "sweet spots" in the evaluation parameter space; the small number of promising design points identified this way may subsequently be investigated in more detail with the full SPEC reference workload. In the process of developing the MinneSPEC datasets, we quantify their differences from the SPEC programs executed with the reference inputs in terms of function-level execution patterns, instruction mixes, and memory behaviors. We find that for some programs, the MinneSPEC profiles match the SPEC reference dataset program behavior very closely. For other programs, however, the MinneSPEC inputs produce significantly different program behavior. The MinneSPEC workload has been recognized by SPEC and is distributed with Version 1.2 and higher of the SPEC CPU 2000 benchmark suite.
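As a toy illustration of the kind of comparison described above (how closely a reduced input's profile tracks the reference input), the sketch below computes a simple distance between two instruction-mix vectors. The instruction classes and fractions are invented, not measurements from the paper.

```python
# Hypothetical instruction-mix fractions for one benchmark; illustrative numbers only.
reference_mix = {"load": 0.26, "store": 0.11, "branch": 0.17, "int_alu": 0.38, "fp": 0.08}
reduced_mix   = {"load": 0.27, "store": 0.10, "branch": 0.18, "int_alu": 0.37, "fp": 0.08}

def mix_distance(a, b):
    """Sum of absolute differences across instruction classes (0 = identical mixes)."""
    classes = set(a) | set(b)
    return sum(abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in classes)

print(f"instruction-mix distance: {mix_distance(reference_mix, reduced_mix):.3f}")
```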
The Packet Filter: An Efficient Mechanism for User-level Network Code
- In Proceedings of the Eleventh ACM Symposium on Operating Systems Principles
, 1987
Cited by 222 (7 self)
Code to implement network protocols can be either inside the kernel of an operating system or in user-level processes. Kernel-resident code is hard to develop, debug, and maintain, but user-level implementations typically incur significant overhead and perform poorly. The performance of user-level network code depends on the mechanism used to demultiplex received packets. Demultiplexing in a user-level process increases the rate of context switches and system calls, resulting in poor performance. Demultiplexing in the kernel eliminates unnecessary overhead. This paper describes the packet filter, a kernel-resident, protocol-independent packet demultiplexer. Individual user processes have great flexibility in selecting which packets they will receive. Protocol implementations using the packet filter perform quite well, and have been in production use for several years.
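The core idea, letting a user process hand the kernel a small predicate that decides which packets it should receive, can be sketched as a tiny stack-machine filter. The opcode names, packet layout, and Python setting below are invented for illustration and do not reflect the paper's actual filter language.

```python
# A toy stack-machine packet filter, loosely in the spirit of the predicate language
# described above; opcode names and packet layout are invented for illustration.
def run_filter(program, packet):
    """Return True if the packet should be delivered to the process that installed the filter."""
    stack = []
    for op, arg in program:
        if op == "PUSH_LIT":          # push an immediate value
            stack.append(arg)
        elif op == "PUSH_WORD":       # push the 16-bit word at byte offset arg
            stack.append(int.from_bytes(packet[arg:arg + 2], "big"))
        elif op == "EQ":              # pop two values, push 1 if equal else 0
            stack.append(1 if stack.pop() == stack.pop() else 0)
        elif op == "AND":             # pop two flags, push their logical AND
            stack.append(1 if (stack.pop() and stack.pop()) else 0)
    return bool(stack and stack[-1])

# Accept packets whose bytes 12-13 (an "ethertype"-like field) equal 0x0800.
accept_ip = [("PUSH_WORD", 12), ("PUSH_LIT", 0x0800), ("EQ", None)]
packet = bytes(12) + (0x0800).to_bytes(2, "big") + bytes(32)
print(run_filter(accept_ip, packet))   # True
```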
Path-Based Failure and Evolution Management
- In Proceedings of the International Symposium on Networked Systems Design and Implementation (NSDI '04)
, 2004
Cited by 139 (5 self)
We present a new approach to managing failures and evolution in large, complex distributed systems using runtime paths. We use the paths that requests follow as they move through the system as our core abstraction, and our "macro" approach focuses on component interactions rather than the details of the components themselves. Paths record component performance and interactions, are user- and request-centric, and occur in sufficient volume to enable statistical analysis, all in a way that is easily reusable across applications. Automated statistical analysis of multiple paths allows for the detection and diagnosis of complex failures and the assessment of evolution issues. In particular, our approach enables significantly stronger capabilities in failure detection, failure diagnosis, impact analysis, and understanding system evolution. We explore these capabilities with three real implementations, two of which service millions of requests per day. Our contributions include the approach; the maintainable, extensible, and reusable architecture; the various statistical analysis engines; and the discussion of our experience with a high-volume production service over several years.
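A minimal sketch of the statistical flavor of this approach: compare where request paths terminate now against a historical baseline and flag components where requests suddenly stop. The component names, request mix, and alert threshold are invented; the analyses in the paper are considerably richer.

```python
# Toy path-based failure detection: where do request paths end, compared to normal?
from collections import Counter

def end_fractions(paths):
    """Fraction of requests whose path terminates at each component."""
    ends = Counter(path[-1] for path in paths)
    return {c: n / len(paths) for c, n in ends.items()}

baseline_paths = [["web", "app", "db"]] * 100                           # normally every request reaches db
recent_paths   = [["web", "app", "db"]] * 60 + [["web", "app"]] * 40    # 40% now stop at app

baseline, recent = end_fractions(baseline_paths), end_fractions(recent_paths)
for component in set(baseline) | set(recent):
    delta = recent.get(component, 0.0) - baseline.get(component, 0.0)
    if delta > 0.10:   # arbitrary alert threshold
        print(f"requests are now terminating at '{component}' "
              f"({recent[component]:.0%} vs {baseline.get(component, 0.0):.0%} before)")
```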
Using Profile Information to Assist Classic Code Optimizations
- Software: Practice and Experience
, 1991
Cited by 135 (13 self)
This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new components, an execution profiler and a profile-based code optimizer, which are not commonly found in traditional optimizing compilers. The execution profiler inserts probes into the input program, executes the input program for several inputs, accumulates profile information and supplies this information to the optimizer. The profile-based code optimizer uses the profile information to expose new optimization opportunities that are not visible to traditional global optimization methods. Experimental results show that the profile-based code optimizer significantly improves the performance of production C programs that have already been optimized by a high-quality global code optimizer.
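One small, hedged example of how profile counts can steer a classic code-layout decision: greedily follow the most frequently executed successor of each basic block so the hot path falls through. The control-flow graph and edge counts below are invented and are not taken from the paper.

```python
# Profile-guided trace layout sketch: follow hot edges from the entry block.
edge_counts = {               # (block, successor) -> execution count from profiling
    ("entry", "check"): 1000,
    ("check", "error"): 3,
    ("check", "work"): 997,
    ("work", "check"): 900,
    ("work", "exit"): 97,
}

def hot_successor(block):
    """Pick the successor with the highest profiled count, if any."""
    succs = [(count, dst) for (src, dst), count in edge_counts.items() if src == block]
    return max(succs)[1] if succs else None

layout, block = [], "entry"
while block and block not in layout:
    layout.append(block)
    block = hot_successor(block)
print(" -> ".join(layout))    # entry -> check -> work  (the cold 'error' block is placed elsewhere)
```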
IPS-2: The Second Generation of a Parallel Program Measurement System
- IEEE Transactions on Parallel and Distributed Systems
, 1990
Cited by 122 (15 self)
IPS is a performance measurement system for parallel and distributed programs. IPS's model of parallel programs uses knowledge about the semantics of a program's structure to provide two important features. First, IPS provides a large amount of performance data about the execution of a parallel program, and this information is organized so that access to it is easy and intuitive. Second, IPS provides performance analysis techniques that help to automatically guide the programmer to the location of program bottlenecks. IPS is currently running on its second implementation. The first implementation was a testbed for the basic design concepts, providing experience with a hierarchical program and measurement model, interactive program analysis, and automatic guidance techniques. This implementation was built on the Charlotte Distributed Operating System. The second implementation, IPS-2, extends the basic system with new instrumentation techniques, an interactive and graphical user interface...
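A toy sketch of the hierarchical guidance idea described above: aggregate measured time at each level (process, then procedure) and drill down to the largest contributor. The process names, procedure names, and timings are invented; IPS-2's actual model and analyses are far more detailed.

```python
# Hypothetical per-process, per-procedure timings (seconds) for a two-process run.
profile = {
    "worker-0": {"compute": 4.1, "send": 0.6, "recv": 0.3},
    "worker-1": {"compute": 1.2, "send": 0.5, "recv": 3.9},   # mostly waiting on messages
}

def heaviest(d):
    """Return the entry that accounts for the most time at this level of the hierarchy."""
    key = max(d, key=lambda k: sum(d[k].values()) if isinstance(d[k], dict) else d[k])
    return key, d[key]

process, procedures = heaviest(profile)
procedure, seconds = heaviest(procedures)
print(f"largest share of time: process {process}, procedure '{procedure}' ({seconds:.1f}s)")
```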
Where is the Energy Spent Inside My App?: Fine Grained Energy Accounting on Smartphones With Eprof
- In Proceedings of the 7th ACM European Conference on Computer Systems, 2012
Cited by 108 (4 self)
Where is the energy spent inside my app? Despite the immense popularity of smartphones and the fact that energy is the most crucial aspect in smartphone programming, the answer to the above question remains elusive. This paper first presents eprof, the first fine-grained energy profiler for smartphone apps. Compared to profiling the runtime of applications running on conventional computers, profiling the energy consumption of applications running on smartphones faces a unique challenge, asynchronous power behavior, where the effect on a component's power state due to a program entity lasts beyond the end of that program entity. We present the design, implementation and evaluation of eprof on two mobile OSes, Android and Windows Mobile. We then present an in-depth case study, the first of its kind, of six popular smartphone apps (including Angry Birds, Facebook and Browser). Eprof sheds light on the internal energy dissipation of these apps and exposes surprising findings, such as 65%-75% of the energy in free apps being spent in third-party advertisement modules. Eprof also reveals several "wakelock bugs", a family of "energy bugs" in smartphone apps, and effectively pinpoints their location in the source code. The case study highlights the fact that most of the energy in smartphone apps is spent in I/O, and that I/O events are clustered, often due to a few routines. This motivates us to propose bundles, a new accounting presentation of app I/O energy, which helps the developer to quickly understand and optimize the energy drain of her app. Using the bundle presentation, we reduced the energy consumption of four apps by 20% to 65%.
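To illustrate the asynchronous power behavior ("tail energy") problem the abstract describes, here is a minimal accounting sketch that charges both the active transfer time and a fixed power tail to the routine that triggered each I/O burst. The power numbers, tail duration, and routine names are invented, overlapping tails are ignored, and this is not eprof's actual model.

```python
# Toy energy accounting with a fixed post-I/O power tail (illustrative numbers only).
TAIL_SECONDS = 5.0          # how long the component stays in its high-power state after I/O
ACTIVE_WATTS = 0.6          # power draw while transmitting and during the tail

def charge_tail_energy(io_events):
    """Attribute active + tail energy of each I/O burst to the routine that triggered it."""
    energy = {}
    for routine, duration in io_events:                 # duration of the transfer itself
        joules = ACTIVE_WATTS * (duration + TAIL_SECONDS)
        energy[routine] = energy.get(routine, 0.0) + joules
    return energy

events = [("fetch_ads", 0.4), ("sync_feed", 1.1), ("fetch_ads", 0.3)]
for routine, joules in charge_tail_energy(events).items():
    print(f"{routine}: {joules:.2f} J")
```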
Encoding program executions
- In ICSE
, 2001
Cited by 101 (2 self)
Dynamic analysis is based on collecting data as the program runs. However, raw traces tend to be too voluminous and too unstructured to be used directly for visualization and understanding. We address this problem in two phases: the first phase selects subsets of the data and then compacts it, while the second phase encodes the data in an attempt to infer its structure. Our major compaction/selection techniques include gprof-style N-depth call sequences, selection based on class, compaction based on time intervals, and encoding the whole execution as a directed acyclic graph. Our structure inference techniques include run-length encoding, context-free grammar encoding, and the building of finite state automata.
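As a small, concrete example of the simplest compaction technique listed above, the sketch below run-length encodes a call trace. The trace itself is invented.

```python
# Run-length encoding of an event trace: collapse consecutive repeats into (event, count) pairs.
from itertools import groupby

trace = ["init", "read", "parse", "parse", "parse", "read", "parse", "parse", "close"]

def run_length_encode(events):
    return [(event, sum(1 for _ in group)) for event, group in groupby(events)]

print(run_length_encode(trace))
# [('init', 1), ('read', 1), ('parse', 3), ('read', 1), ('parse', 2), ('close', 1)]
```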
Systems for Late Code Modification
- WRL Research Report 91/5
, 1991
Cited by 97 (5 self)
Modifying code after the compiler has generated it can be useful for both optimization and instrumentation. This paper compares the code modification systems of Mahler and pixie, and describes two new systems we have built that are hybrids of the two. This paper covers material presented at the CODE '91 International Workshop on Code Generation, Schloss Dagstuhl, Germany, May 20-24, 1991. Late code modification is the process of modifying the output of a compiler after the compiler has generated it. The reasons one might want to do this fall into two categories: optimization and instrumentation. Some forms of optimization must be performed on assembly-level or machine-level code. The oldest is peephole optimization [11], which acts to tidy up code that a compiler has generated; it has since been generalized to include transformations on more machine-independent code [2,3]. Reordering of code to avoid pipeline stalls [4,7,18] is most often done after the code is gene...
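A toy peephole pass over made-up textual pseudo-assembly, illustrating the kind of post-compiler rewriting discussed above. The instruction syntax and the store/load pattern are invented for illustration and are much simpler than what Mahler or pixie actually handle.

```python
# Toy peephole optimizer: drop a load that immediately re-reads the value just stored.
def peephole(lines):
    out, i = [], 0
    while i < len(lines):
        if (i + 1 < len(lines)
                and lines[i].startswith("store ")
                and lines[i + 1].startswith("load ")):
            _, src, addr_s = lines[i].split()
            _, dst, addr_l = lines[i + 1].split()
            if addr_s == addr_l and src == dst:      # load reads back the value just stored
                out.append(lines[i])                 # keep the store, drop the redundant load
                i += 2
                continue
        out.append(lines[i])
        i += 1
    return out

code = ["store r1 8(sp)", "load r1 8(sp)", "add r2 r1 r3"]
print(peephole(code))   # ['store r1 8(sp)', 'add r2 r1 r3']
```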