Production-Run Software Failure Diagnosis via Hardware Performance Counters

by Joy Arulraj , Po-chun Chang , Guoliang Jin , Shan Lu
Citations:7 - 2 self

BibTeX

@MISC{Arulraj_production-runsoftware,
    author = {Joy Arulraj and Po-chun Chang and Guoliang Jin and Shan Lu},
    title = {Production-Run Software Failure Diagnosis via Hardware Performance Counters},
    year = {}
}


Abstract

Sequential and concurrency bugs are widespread in deployed software. They cause severe failures and huge financial loss during production runs, so tools that diagnose production-run failures with low overhead are needed. State-of-the-art diagnosis techniques use software instrumentation to sample program properties at run time and off-line statistical analysis to identify the properties most correlated with failures. Although promising, these techniques suffer from high run-time overhead for concurrency-bug failure diagnosis, sometimes over 100%, and hence are not suitable for production-run use. We present PBI, a system that uses existing hardware performance counters to diagnose production-run failures caused by sequential and concurrency bugs with low overhead. PBI's design rests on three key observations. First, a few widely supported performance counter events can reflect a wide variety of common software bugs and can be monitored by hardware with almost no overhead. Second, the counter overflow interrupt supported by existing hardware and operating systems provides a natural and effective mechanism for event sampling at user level. Third, the noise and non-determinism in interrupt delivery complement statistical processing well. We evaluate PBI using 13 real-world concurrency and sequential bugs from representative open-source server, client, and utility programs, and 10 bugs from a widely used software-testing benchmark. Quantitatively, PBI can effectively diagnose failures caused by these bugs with a small overhead that is never higher than 10%. Qualitatively, PBI does not require any change to software and presents a novel use of existing hardware performance counters.
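
The off-line statistical step mentioned in the abstract can be illustrated with a minimal sketch. This is not PBI's actual analysis, which the abstract does not specify. It assumes each run reports the set of sampled (event, instruction address) pairs it observed plus a pass/fail flag, and it ranks each pair by how much more often it appears in failing runs than in successful ones. The function name `rank_predicates`, the scoring formula, and the predicate labels are all hypothetical.

```python
from collections import Counter


def rank_predicates(runs):
    """Rank predicates by (share of failing runs) - (share of successful runs).

    runs: list of (observed_predicates, failed) pairs, where
    observed_predicates is a set of strings and failed is a bool.
    The scoring rule is an illustrative assumption, not PBI's formula.
    """
    n_fail = sum(1 for _, failed in runs if failed)
    n_ok = len(runs) - n_fail
    if n_fail == 0 or n_ok == 0:
        raise ValueError("need at least one failing and one successful run")

    in_fail, in_ok = Counter(), Counter()
    for observed, failed in runs:
        # Count each predicate at most once per run.
        (in_fail if failed else in_ok).update(set(observed))

    scores = {
        p: in_fail[p] / n_fail - in_ok[p] / n_ok
        for p in set(in_fail) | set(in_ok)
    }
    # Highest score first; ties broken by predicate name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sampled data: two failing runs and two successful runs.
runs = [
    ({"event_A@0x4005d0", "event_B@0x400610"}, True),
    ({"event_A@0x4005d0"}, True),
    ({"event_B@0x400610"}, False),
    ({"event_B@0x400610", "event_C@0x4007a4"}, False),
]
ranking = rank_predicates(runs)
for predicate, score in ranking:
    print(f"{score:+.2f}  {predicate}")
```

In this toy data the pair seen only in failing runs ranks first. With real sampled data, many runs would be needed before such a ranking becomes meaningful, since each run observes only a small sample of events.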

