Dong Tang and Ravishankar K. Iyer. Analysis and modeling of correlated failures in multicomputer systems. IEEE Trans. Comput., 41(5):567--577, 1992.
....that failures are not always independent [22, 35], as well as anecdotal evidence that baroque, complex failures are not uncommon [14]. These observations imply that the independence assumptions in our model will result in optimistic predictions for the frequency of multi-fault scenarios [33]. Unfortunately, there is no study that quantifies such correlations for cluster-based Internet services. In the future, we may extend our methodology for designers to test their service's sensitivity to sets of potentially correlated faults. 2.3 Performability Metric Despite much work that ....
D. Tang and R. K. Iyer. Analysis and Modeling of Correlated Failures in Multicomputer Systems. IEEE Transactions on Computers, 41(5):567--577, May 1992.
....not general enough to describe all cluster-based servers, we believe that it is representative of a large class of servers, such as front-end servers (including PRESS) and other read-only servers. Our model also does not consider correlated failures, which may lead to optimistic predictions [33]. However, since there is little quantitative data on how faults are correlated, we have made the explicit decision of keeping our model .... [Footnote: We refer the reader to [26] for a discussion of why the denominator is correctly MTTF rather than MTTF + MTTR.] [Figure 3: Basic PRESS architecture.]
D. Tang and R. K. Iyer. Analysis and Modeling of Correlated Failures in Multicomputer Systems. IEEE Transactions on Computers, 41(5):567--577, May 1992.
....of how correlated their failures are. Therefore, this way of calculating correlation is not suitable for nodes with high availabilities. A better approach is to calculate the average conditional probability. A similar approach was used in modeling the correlated failures in multicomputer systems [Tang1992]. Average conditional probability is calculated as follows. For every pair of nodes (X, Y) in the system, where X ≠ Y, calculate P(Node X is down | Node Y is down). The average of all these probabilities is the correlation level of the system. Note that for two nodes A and B, P(Node A down | Node B ....
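The averaging procedure described in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the citing authors' implementation: the function name `average_conditional_probability`, the input layout (one list of down/up samples per node, all of equal length), and the choice to skip pairs in which node Y is never down are all hypothetical.

```python
from itertools import permutations

def average_conditional_probability(down):
    """Average of P(X down | Y down) over all ordered node pairs with X != Y.

    `down` maps a node name to a list of booleans, one per time sample
    (True means the node was down at that sample). Assumed input format.
    """
    probs = []
    for x, y in permutations(down, 2):  # every ordered pair, X != Y
        y_down = sum(down[y])
        if y_down == 0:
            continue  # assumption: skip pairs where the condition never occurs
        both_down = sum(1 for a, b in zip(down[x], down[y]) if a and b)
        probs.append(both_down / y_down)
    return sum(probs) / len(probs) if probs else 0.0

# Hypothetical trace: three nodes observed at four sample times.
samples = {
    "A": [True, True, False, False],
    "B": [True, False, False, False],
    "C": [False, False, False, True],
}
correlation_level = average_conditional_probability(samples)
```

Ordered pairs are used because the conditional probability is not symmetric: in the hypothetical trace, P(A down | B down) and P(B down | A down) differ.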
....having a state that represents the case when more than one machine is unavailable. Their model is completely empirical. Such an approach is impractical for cases where there are hundreds of servers since it would be impossible to identify all of the states and the relationships between states. In [Tang1992], they present an approach to correlation modeling that is similar to the model based on conditional probabilities. They also use conditional probabilities as a measure of correlation among failures of machines. They only model 2-way correlations, whereas in our case, to determine the ....
D. Tang and R. K. Iyer, "Analysis and Modeling of Correlated Failures in Multicomputer Systems," IEEE Transactions on Computers, vol. 41, no. 5, May 1992.
....of time (in percent). Predicted fraction of time assuming independent failures, using the individual crash probabilities of Table 1. Table 2. Concurrent crash distributions. ....crashes and network partitions). However, they are roughly 10 times higher than the crash probabilities reported in [38] for VAXclusters. This can perhaps be attributed to the fact that planned shutdowns are not considered to be crashes in [38], whereas we count any period of time during which the Spread daemon was unable to function as a crash. A closer look at Table 1 indicates that our machines generally behaved ....
....of Table 1. Table 2. Concurrent crash distributions. ....crashes and network partitions). However, they are roughly 10 times higher than the crash probabilities reported in [38] for VAXclusters. This can perhaps be attributed to the fact that planned shutdowns are not considered to be crashes in [38], whereas we count any period of time during which the Spread daemon was unable to function as a crash. A closer look at Table 1 indicates that our machines generally behaved according to one of three patterns: • The HU lab machines, with the exception of hal, had relatively few crashes during ....
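The excerpt above mentions a predicted fraction of time with a given number of concurrent crashes, assuming independent failures and per-machine crash probabilities. One way such a prediction could be computed is sketched below; this is an assumption about the calculation, not the cited paper's method. The function name `concurrent_crash_distribution` and the example probabilities are hypothetical and are not taken from Table 1.

```python
def concurrent_crash_distribution(crash_probs):
    """Return dist where dist[k] is the probability that exactly k machines
    are down at the same time, assuming machines fail independently.

    `crash_probs` holds each machine's probability of being down at a
    random instant (assumed input; values below are made up).
    """
    dist = [1.0]  # with zero machines, zero are down with probability 1
    for p in crash_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this machine is up
            nxt[k + 1] += mass * p       # this machine is down
        dist = nxt
    return dist

# Hypothetical per-machine crash probabilities.
predicted = concurrent_crash_distribution([0.5, 0.5])
```

Comparing such a prediction with the measured distribution of concurrent crashes is one way to see whether the independence assumption underestimates multi-machine failures.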
[Article contains additional citation context not shown here]
D. Tang and R. K. Iyer. Analysis and modeling of correlated failures in multicomputer systems. IEEE Trans. Comput., 41(5):567--577, 1992.
....are likely to follow from errors that affect the operating system. Reliability prediction: Many of the papers discussed above, especially those written or advised by R. K. Iyer, provide mathematical models that assist in predicting reliability from failure data. Notable examples include [LeI92, LeI93, TaI92b, TaI93]. [ChC96] considers how to design experiments so that reliability prediction based on fault injection studies will be statistically valid, while [EIP 91] uses data from system test of a UNIX-based AT&T network management system to analyze the validity of various mathematical models. Finally, ....
D. Tang and R.K. Iyer. "Analysis and modeling of correlated failures in multicomputer systems." IEEE Transactions on Computers, vol. 41, no. 5, pp. 567--577, May 1992.
....disadvantage of piping: it tends to work poorly when the piped programs are interactive. 4. While operating systems are built with the intent of robust operation, I/O operations seem particularly resistant to robustness and therefore subject to problems (e.g. Tang and Iyer's Tables I and III [31]). For example, command (1) above causes each line of output to be preceded by M . While this may be tolerable, less textfile.txt learn (2) differs substantively from command (1) only in the order of the programs in the pipeline, yet works as though less had an infinite page length under the ....
Tang, D. and R. K. Iyer, Analysis and Modeling of Correlated Failures in Multicomputer Systems, IEEE Transactions on Computers 41 (5) (May 1992) 567--577.
No context found.
Dong Tang and Ravishankar K. Iyer. Analysis and modeling of correlated failures in multicomputer systems. IEEE Trans. Comput., 41(5):567--577, 1992.
No context found.
D. Tang and R.K. Iyer, "Analysis and Modeling of Correlated Failures in Multicomputer Systems," IEEE Trans. Computers, vol. 41, no. 5, pp. 567-577, May 1992.