Results 1 - 3 of 3
Harmony: Collection and Analysis of Parallel Block Vectors
Cited by 5 (2 self)
Efficient execution of well-parallelized applications is central to performance in the multicore era. Program analysis tools support the hardware and software sides of this effort by exposing relevant features of multithreaded applications. This paper describes parallel block vectors, which uncover previously unseen characteristics of parallel programs. Parallel block vectors provide block execution profiles per concurrency phase (e.g., the block execution profile of all serial regions of a program). This information provides a direct and fine-grained mapping between an application’s runtime parallel phases and the static code that makes up those phases. This paper also demonstrates how to collect parallel block vectors with minimal application perturbation using Harmony. Harmony is an instrumentation pass for the LLVM compiler that introduces just 16-21% overhead on average across eight Parsec benchmarks. We apply parallel block vectors to uncover several novel insights about parallel applications with direct consequences for architectural design. First, the serial and parallel phases of execution used in Amdahl’s Law are often composed of many of the same basic blocks. Second, program features, such as instruction mix, vary based on the degree of parallelism, with serial phases in particular displaying different instruction mixes from the program as a whole. Third, dynamic execution frequencies do not necessarily correlate with a block’s parallelism.
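The core idea of a parallel block vector can be sketched as a per-block histogram keyed by the number of threads active when the block executed. The class and method names below are illustrative assumptions, not Harmony's actual interface:

```python
from collections import defaultdict

class ParallelBlockVector:
    """Minimal sketch: per basic block, count executions at each
    degree of concurrency (number of simultaneously active threads)."""

    def __init__(self):
        # counts[block_id][active_threads] -> execution count
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, block_id, active_threads):
        # In a real tool this would be driven by compiler-inserted
        # instrumentation; here it is called manually for illustration.
        self.counts[block_id][active_threads] += 1

    def serial_profile(self):
        # Block execution profile restricted to serial phases
        # (exactly one active thread), as described in the abstract.
        return {b: phases[1] for b, phases in self.counts.items()
                if 1 in phases}

pbv = ParallelBlockVector()
pbv.record("bb_loop", 4)
pbv.record("bb_loop", 4)
pbv.record("bb_init", 1)
print(pbv.serial_profile())  # → {'bb_init': 1}
```

This captures the mapping the abstract describes: each static block is tied to the runtime concurrency phases in which it actually ran.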
ParaShares: Finding the Important Basic Blocks in Multithreaded Programs
Abstract. Understanding and optimizing multithreaded execution is a significant challenge. Numerous research and industrial tools debug parallel performance by combing through program source or thread traces for pathologies including communication overheads, data dependencies, and load imbalances. This work takes a new approach: it ignores any underlying pathologies, and focuses instead on pinpointing the exact locations in source code that consume the largest share of execution. Our new metric, ParaShares, scores and ranks all basic blocks in a program based on their share of parallel execution. For the eight benchmarks examined in this paper, ParaShare rankings point to just a few important blocks per application. The paper demonstrates two uses of this information, exploring how the important blocks vary across thread counts and input sizes, and making modest source code changes (fewer than 10 lines of code) that result in 14-92% savings in parallel program runtime.
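A share-of-execution ranking along these lines can be sketched as follows. The weighting scheme (dividing a block's dynamic count by the thread count of its phase, so a block is credited with its per-thread share of concurrent work) is an assumption for illustration, not necessarily the paper's exact ParaShare formula:

```python
def parashare_scores(block_counts):
    """Rank basic blocks by their share of weighted parallel execution.

    block_counts: {block_id: {threads: dynamic_count}}
    Returns [(block_id, share)] sorted by descending share.
    """
    raw = {
        block: sum(count / threads for threads, count in phases.items())
        for block, phases in block_counts.items()
    }
    total = sum(raw.values()) or 1.0  # avoid division by zero
    return sorted(((b, raw[b] / total) for b in raw),
                  key=lambda kv: -kv[1])

# Hypothetical profile: a hot parallel loop and a serial init block.
ranked = parashare_scores({
    "bb_hot":  {4: 8000},   # 8000 executions during 4-thread phases
    "bb_init": {1: 500},    # 500 executions during serial phases
})
print(ranked)  # → [('bb_hot', 0.8), ('bb_init', 0.2)]
```

Even this toy weighting shows the abstract's point: a handful of blocks dominate the ranking, making them natural targets for the small source changes described above.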