  An Analytic Performance Model Of Parallel Systems That Perform N tasks Using P Processors That Can Fail (2001) [5 citations — 4 self]

by Gehan Weerasinghe, Imad Antonios, Lester Lipsky
IEEE NCA 01 International Symposium on Network Computing and Applications
http://www.engr.uconn.edu/~lester/papers/nca01.ps

Abstract:

We present a family of Markov models for analyzing the performance of parallel/distributed processors that execute a job consisting of N independent tasks in parallel using P processors. The model is a Markov chain with states representing service and failure rates with k (0 ≤ k ≤ P) active processors. The task-times and processor failures are both exponentially distributed. We derive a number of expressions to determine the mean execution time, probability of success, work, and other measurable quantities, all conditioned on the job finishing successfully. A prototype, implemented using an extended version of ACMPI, is used for actual experiments that are based on simulated task-times and processor failures. We present our results comparing the analytic model with the prototype for a range of values of processor failure rates. We then discuss extensions of the model and issues related to communication costs, approximations, and the effect of task-time distributions.
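
As a rough illustration of the kind of model the abstract describes, the following Python sketch computes the probability of success and the mean execution time conditioned on success for a simple Markov chain. It is not the authors' formulation. The state space (n tasks remaining, k active processors), the rule that min(n, k) processors are busy, the assumption that every active processor can fail, that failed processors are never repaired, and that a task interrupted by a failure returns to the pool are all assumptions made for this example. The names `job_metrics`, `mu` (task completion rate), and `lam` (processor failure rate) are hypothetical.

```python
from functools import lru_cache


def job_metrics(n_tasks, n_procs, mu, lam):
    """Return (P(success), E[time | success]) for an assumed Markov model.

    Assumed state: (n, k) = (tasks remaining, active processors).
    Assumed transitions: (n, k) -> (n-1, k) at rate min(n, k) * mu,
                         (n, k) -> (n, k-1) at rate k * lam.
    The job succeeds at n == 0 and fails at k == 0 with n > 0.
    """

    @lru_cache(maxsize=None)
    def solve(n, k):
        # Returns (s, t): s = P(success), t = E[time * 1{success}].
        if n == 0:
            return 1.0, 0.0
        if k == 0:
            return 0.0, 0.0
        done_rate = min(n, k) * mu   # a busy processor finishes a task
        fail_rate = k * lam          # any active processor fails
        total = done_rate + fail_rate
        s_done, t_done = solve(n - 1, k)
        s_fail, t_fail = solve(n, k - 1)
        s = (done_rate * s_done + fail_rate * s_fail) / total
        # The holding time (mean 1/total) is independent of the next state,
        # so it contributes s / total to E[time * 1{success}].
        t = s / total + (done_rate * t_done + fail_rate * t_fail) / total
        return s, t

    s, t = solve(n_tasks, n_procs)
    return s, (t / s if s > 0 else float("nan"))


if __name__ == "__main__":
    p_success, mean_time = job_metrics(n_tasks=10, n_procs=4, mu=1.0, lam=0.05)
    print(p_success, mean_time)
```

With no failures (`lam = 0`) the sketch reduces to the expected completion time of a pure parallel job, which gives a quick sanity check on the recursion.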
