
## An introduction to ROC analysis. (2006)



Venue: | Pattern Recognition Letters, |

Citations: | 1063 - 1 self |

### Citations

5963 |
Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context ...positive instance higher than a randomly chosen negative instance. This is equivalent to the Wilcoxon test of ranks (Hanley and McNeil, 1982). The AUC is also closely related to the Gini coefficient (Breiman et al., 1984), which is twice the area between the diagonal and the ROC curve. Hand and Till (2001) point out that Gini + 1 = 2 · AUC. Fig. 8a shows the areas under two ROC curves, A and B. Classifier B has great... |
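
The ranking interpretation of the AUC and its relation to the Gini coefficient can be checked numerically. The following is a minimal sketch, not code from the paper: the function name `auc_by_ranking` and the score lists are hypothetical, and counting ties as one half is an assumption.

```python
from itertools import product


def auc_by_ranking(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive
    instance receives the higher score. Ties count as 1/2 (assumption)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores, for illustration only.
pos = [0.9, 0.8, 0.55, 0.4]  # scores given to positive instances
neg = [0.7, 0.5, 0.3, 0.1]   # scores given to negative instances

auc = auc_by_ranking(pos, neg)
gini = 2 * auc - 1  # rearranged from Gini + 1 = 2 * AUC

print(auc, gini)
```

With these scores, 13 of the 16 positive-negative pairs are ranked correctly, so the pairwise estimate is 13/16.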

1073 |
The meaning and use of the area under a receiver operating characteristic curve.
- Hanley, McNeil
- 1982
Citation Context ...sifiers we may want to reduce ROC performance to a single scalar value representing expected performance. A common method is to calculate the area under the ROC curve, abbreviated AUC (Bradley, 1997; Hanley and McNeil, 1982). Since the AUC is a portion of the area of the unit square, its value will always be between 0 and 1.0. However, because random guessing produces the diagonal line between (0,0) and (1,1), which has... |

684 | The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition
- Bradley
- 1997
Citation Context ...To compare classifiers we may want to reduce ROC performance to a single scalar value representing expected performance. A common method is to calculate the area under the ROC curve, abbreviated AUC (Bradley, 1997; Hanley and McNeil, 1982). Since the AUC is a portion of the area of the unit square, its value will always be between 0 and 1.0. However, because random guessing produces the diagonal line between (0,0) and (1,1), which has... |
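
Because the AUC is an area inside the unit square, it can be approximated by summing trapezoids between consecutive ROC points. The sketch below assumes a piecewise-linear curve; `trapezoid_auc` and the example points are hypothetical, not taken from the paper.

```python
def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve.

    points: iterable of (fpr, tpr) pairs; assumed to include (0, 0)
    and (1, 1) and to be monotone once sorted by fpr.
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # one trapezoid per segment
    return area


# The diagonal produced by random guessing.
diagonal = [(0.0, 0.0), (1.0, 1.0)]
# A hypothetical ROC curve lying above the diagonal.
curve = [(0.0, 0.0), (0.1, 0.6), (0.4, 0.9), (1.0, 1.0)]

diagonal_area = trapezoid_auc(diagonal)
curve_area = trapezoid_auc(curve)
print(diagonal_area, curve_area)
```

The diagonal encloses half of the unit square, which is why an AUC of 0.5 corresponds to random guessing.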

566 |
Measuring the accuracy of diagnostic systems.
- Swets
- 1988
Citation Context ...to ROC graphs and as a guide for using them in research. © 2005 Elsevier B.V. All rights reserved. Keywords: ROC analysis; Classifier evaluation; Evaluation metrics. 1. Introduction. A receiver operating characteristics (ROC) graph is a technique for visualizing, organizing and selecting classifiers based on their performance. ROC graphs have long been used in signal detection theory to depict the tradeoff between hit rates and false alarm rates of classifiers (Egan, 1975; Swets et al., 2000). ROC analysis has been extended for use in visualizing and analyzing the behavior of diagnostic systems (Swets, 1988). The medical decision making community has an extensive literature on the use of ROC graphs for diagnostic testing (Zou, 2002). Swets et al. (2000) brought ROC curves to the attention of the wider public with their Scientific American article. One of the earliest adopters of ROC graphs in machine learning was Spackman (1989), who demonstrated the value of ROC curves in evaluating and comparing algorithms. Recent years have seen an increase in the use of ROC graphs in the machine learning community, due in part to the realization that simple classification accuracy is often a poor metric for m... |

415 | Metacost: A general method for making classifiers cost-sensitive. In:
- Domingos
- 1999
Citation Context ...e confidence of a rule matching an instance can be used as a score (Fawcett, 2001). Even if a classifier only produces a class label, an aggregation of them may be used to generate a score. MetaCost (Domingos, 1999) employs bagging to generate an ensemble of discrete classifiers, each of which produces a vote. The set of votes could be used to generate a score. Finally, some combination of scoring and voting c... |
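
One simple way to turn a set of votes into a score is to take the fraction of ensemble members voting for the positive class. The sketch below illustrates only that aggregation step; it is not MetaCost, and `vote_score` and the vote lists are hypothetical.

```python
def vote_score(votes):
    """Fraction of ensemble members that voted 'Y' (positive)."""
    return sum(1 for v in votes if v == "Y") / len(votes)


# Hypothetical votes from five discrete classifiers on three instances.
votes_per_instance = [
    ["Y", "Y", "Y", "N", "Y"],
    ["N", "Y", "N", "N", "Y"],
    ["N", "N", "N", "N", "N"],
]

scores = [vote_score(votes) for votes in votes_per_instance]
print(scores)
```

Scores obtained this way can be thresholded at several values, so an ensemble of label-only classifiers can trace out more than one point in ROC space.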

414 | The Case against Accuracy Estimation for Comparing Induction Algorithms.
- Provost, Fawcett, et al.
- 1998
Citation Context ...he use of ROC graphs in the machine learning community, due in part to the realization that simple classification accuracy is often a poor metric for measuring performance (Provost and Fawcett, 1997; Provost et al., 1998). In addition to being a generally useful performance graphing method, they have properties that make them especially useful for domains with skewed class distribution and unequal classification erro... |

341 | Robust classification for imprecise environments. - Provost, Fawcett - 2001 |

313 | Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions
- Provost, Fawcett
- 1997
Citation Context ... have seen an increase in the use of ROC graphs in the machine learning community, due in part to the realization that simple classification accuracy is often a poor metric for measuring performance (Provost and Fawcett, 1997; Provost et al., 1998). In addition to being a generally useful performance graphing method, they have properties that make them especially useful for domains with skewed class distribution and unequ... |

221 | Adaptive fraud detection
- Fawcett, Provost
- 1997
Citation Context ... in medical decision making epidemics may cause the incidence of a disease to increase over time. In fraud detection, proportions of fraud varied significantly from month to month and place to place (Fawcett and Provost, 1997). Changes in a manufacturing practice may cause the proportion of defective units ... [Figure: ROC plot annotated with score thresholds; tick values omitted.] |

169 | Machine learning for the detection of oil spills in satellite radar images. Machine Learning - Special issue on applications of machine learning and the knowledge discovery process,
- Kubat, Holte, et al.
- 1998
Citation Context ... unrealistic. However, class skews of $10^1$ and $10^2$ are very common in real world domains, and skews up to $10^6$ have been observed in some domains (Clearwater and Stern, 1991; Fawcett and Provost, 1996; Kubat et al., 1998; Saitta and Neri, 1998). Substantial changes in class distributions are not unrealistic either. For example, in medical decision making epidemics may cause the incidence of a disease to increase over... |

112 | Evaluating text categorization. - Lewis - 1991 |

64 | Combining Data Mining and Machine Learning for Effective User Profiling.
- Fawcett, Provost
- 1996
Citation Context ...ions may seem contrived and unrealistic. However, class skews of $10^1$ and $10^2$ are very common in real world domains, and skews up to $10^6$ have been observed in some domains (Clearwater and Stern, 1991; Fawcett and Provost, 1996; Kubat et al., 1998; Saitta and Neri, 1998). Substantial changes in class distributions are not unrealistic either. For example, in medical decision making epidemics may cause the incidence of a dise... |

59 | Robust classification systems for imprecise environments.
- Provost, Fawcett
- 2001
Citation Context ...ds for doing this (Zadrozny and Elkan, 2001). Another approach is to use an ROC method that chooses operating points based on their relative performance, and there are methods for doing this as well (Provost and Fawcett, 1998, 2001). These latter methods are discussed briefly in Section 6. A consequence of relative scoring is that classifier scores should not be compared across model classes. One model class may be design... |

53 | Well-trained PETs: Improving probability estimation trees,
- Provost, Domingos
- 2001
Citation Context ...ecision tree determines a class label of a leaf node from the proportion of instances at the node; the class decision is simply the most prevalent class. These class proportions may serve as a score (Provost and Domingos, 2001). A rule learner keeps similar statistics on rule confidence, and the confidence of a rule matching an instance can be used as a score (Fawcett, 2001). Even if a classifier only produces a class labe... |
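
The difference between a leaf's class decision and its class proportion can be made concrete with a small sketch. The leaf names and counts below are hypothetical, and breaking ties toward the positive class is an assumption.

```python
def leaf_label(n_pos, n_neg):
    """Class decision at a leaf: the most prevalent class.
    Ties go to 'Y' here (assumption)."""
    return "Y" if n_pos >= n_neg else "N"


def leaf_score(n_pos, n_neg):
    """Proportion of positive training instances at the leaf,
    usable as a score for any instance that reaches it."""
    return n_pos / (n_pos + n_neg)


# Hypothetical (positive, negative) training counts per leaf.
leaves = {"leaf_1": (8, 2), "leaf_2": (3, 7), "leaf_3": (6, 4)}

labels = {name: leaf_label(*counts) for name, counts in leaves.items()}
scores = {name: leaf_score(*counts) for name, counts in leaves.items()}
print(labels)
print(scores)
```

Here `leaf_1` and `leaf_3` both yield the label `Y`, but their scores differ, so thresholding the score distinguishes leaves that the label alone treats as identical.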

53 |
Signal detection theory: valuable tools for evaluating inductive learning.
- Spackman
- 1989
Citation Context .... ROC graphs have long been used in signal detection theory to depict the tradeoff between hit rates and false alarm rates of classifiers (Egan, 1975; Swets et al., 2000). ROC analysis has been extended for use in visualizing and analyzing the behavior of diagnostic systems (Swets, 1988). The medical decision making community has an extensive literature on the use of ROC graphs for diagnostic testing (Zou, 2002). Swets et al. (2000) brought ROC curves to the attention of the wider public with their Scientific American article. One of the earliest adopters of ROC graphs in machine learning was Spackman (1989), who demonstrated the value of ROC curves in evaluating and comparing algorithms. Recent years have seen an increase in the use of ROC graphs in the machine learning community, due in part to the realization that simple classification accuracy is often a poor metric for measuring performance (Provost and Fawcett, 1997; Provost et al., 1998). In addition to being a generally useful performance graphing method, they have properties that make them especially useful for ... [Page footer: 0167-8655/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2005.10.010 E-mail addresses: tfawc...] |

46 | Using rule sets to maximize ROC performance.
- Fawcett
- 2001
Citation Context ...ss proportions may serve as a score (Provost and Domingos, 2001). A rule learner keeps similar statistics on rule confidence, and the confidence of a rule matching an instance can be used as a score (Fawcett, 2001). Even if a classifier only produces a class label, an aggregation of them may be used to generate a score. MetaCost (Domingos, 1999) employs bagging to generate an ensemble of discrete classifiers, ... |

38 |
A simple generalization of the area under the ROC curve to multiple class classification problems
- Hand, Till
Citation Context ...ier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance. This is equivalent to the Wilcoxon test of ranks (Hanley and McNeil, 1982). [Fig. 8. Two ROC graphs. The graph on the left shows the area under two R... discrete classifier (A) and a probabilistic classifier (B). Axes: false positive rate (x), true positive rate (y).] The AUC is also closely related to the Gini coefficient (Breiman et al., 1984), which is twice the area between the diagonal and the ROC curve. Hand and Till (2001) point out that Gini + 1 = 2 · AUC. Fig. 8a shows the areas under two ROC curves, A and B. Classifier B has greater area and therefore better average performance. Fig. 8b shows the area under the curve of a binary classifier A and a scoring classifier B. Classifier A represents the performance of B when B is used with a single, fixed threshold. Though the performance of the two is equal at the fixed point (A's threshold), A's performance becomes inferior to B further from this point. It is possible for a high-AUC classifier to perform worse in a specific region of ROC space than a low-AUC clas... |

36 | Note on the location of optimal classifiers in n-dimensional ROC space.
- Srinivasan
- 1999
Citation Context ..., which is easy to visualize. 9.1. Multi-class ROC graphs. With more than two classes the situation becomes much more complex if the entire space is to be managed. With $n$ classes the confusion matrix becomes an $n \times n$ matrix containing the $n$ correct classifications (the major diagonal entries) and $n^2 - n$ possible errors (the off-diagonal entries). Instead of managing trade-offs between TP and FP, we have $n$ benefits and $n^2 - n$ errors. With only three classes, the surface becomes a $3^2 - 3 = 6$-dimensional polytope. Lane (2000) has outlined the issues involved and the prospects for addressing them. Srinivasan (1999) has shown that the analysis behind the ROC convex hull extends to multiple classes and multi-dimensional convex hulls. [Page header: T. Fawcett / Pattern Recognition Letters 27 (2006) 861–874, p. 872.] One method for handling $n$ classes is to produce $n$ different ROC graphs, one for each class. Call this the class reference formulation. Specifically, if $C$ is the set of all classes, ROC graph $i$ plots the classification performance using class $c_i$ as the positive class and all other classes as the negative class, i.e.

$$P_i = c_i \qquad (2)$$

$$N_i = \bigcup_{j \neq i} c_j \in C \qquad (3)$$

While this is a convenient formulation, it compromises one of th... |
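
The class reference formulation amounts to relabeling the data once per class, with that class as positive and the union of all others as negative. The sketch below is an illustration under that reading; the function names and the toy labels are hypothetical.

```python
def class_reference_labels(labels, positive_class):
    """Binary view of a multi-class labeling: 1 for the reference
    class, 0 for every other class."""
    return [1 if y == positive_class else 0 for y in labels]


def error_cell_count(n_classes):
    """Number of off-diagonal (error) cells in an n x n confusion matrix."""
    return n_classes ** 2 - n_classes


# Hypothetical true labels over three classes.
labels = ["a", "b", "c", "a", "c"]
classes = sorted(set(labels))

# One binary labeling per class, each usable for its own ROC graph.
binary_views = {c: class_reference_labels(labels, c) for c in classes}

print(binary_views)
print(error_cell_count(len(classes)))
```

Each of the three binary views can then be evaluated with ordinary two-class ROC tools, at the cost of working with three separate graphs.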

33 |
Confidence bands for ROC curves: Methods and an empirical study. In:
- Macskassy, Provost
- 2004
Citation Context ...do
        p ← ROC_POINT_AT_THRESHOLD(ROCS[i], npts[i], T[tidx])
        fprsum ← fprsum + p.fpr
        tprsum ← tprsum + p.tpr
    end for
    Avg[s] ← (fprsum/nrocs, tprsum/nrocs)
    s ← s + 1
end for
end

function ROC_POINT_AT_THRESHOLD(ROC, npts, thresh)
    i ← 1
    while i ≤ npts and ROC[i].score > thresh do
        i ← i + 1
    end while
    return ROC[i]
end function

...should not be compared across model classes. Because of this, ROC curves averaged from different model classes may be misleading because the scores may be incommensurate. Finally, Macskassy and Provost (2004) have investigated different techniques for generating confidence bands for ROC curves. They investigate confidence intervals from vertical and threshold averaging, as well as three methods from the medical field for generating bands (simultaneous joint confidence regions, Working-Hotelling based bands, and fixed-width confidence bands). The reader is referred to their paper for a much more detailed discussion of the techniques, their assumptions, and empirical studies. 9. Decision problems with more than two classes. Discussions up to this point have dealt with only two classes, and much of the... |
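
The pseudocode fragment above can be sketched in Python. Only the inner loop and `ROC_POINT_AT_THRESHOLD` are visible in the excerpt, so the way thresholds are collected and sampled below is an assumption, as are the `RocPoint` type and the clamping of the index when every score exceeds the threshold.

```python
from dataclasses import dataclass


@dataclass
class RocPoint:
    fpr: float
    tpr: float
    score: float  # threshold that produced this point


def roc_point_at_threshold(roc, thresh):
    """roc is sorted by descending score. Skip points whose score
    exceeds thresh and return the first one that does not."""
    i = 0
    while i < len(roc) and roc[i].score > thresh:
        i += 1
    # Assumption: fall back to the last point instead of indexing past the end.
    return roc[min(i, len(roc) - 1)]


def threshold_average(rocs, samples):
    """Average several ROC curves at a common set of thresholds."""
    # Assumption: thresholds are all scores from all curves, sampled evenly.
    thresholds = sorted((p.score for roc in rocs for p in roc), reverse=True)
    step = max(1, len(thresholds) // samples)
    averaged = []
    for tidx in range(0, len(thresholds), step):
        fprsum = tprsum = 0.0
        for roc in rocs:
            p = roc_point_at_threshold(roc, thresholds[tidx])
            fprsum += p.fpr
            tprsum += p.tpr
        averaged.append((fprsum / len(rocs), tprsum / len(rocs)))
    return averaged


# Two hypothetical ROC curves, each sorted by descending score.
roc_a = [RocPoint(0.0, 0.5, 0.9), RocPoint(0.5, 1.0, 0.4)]
roc_b = [RocPoint(0.0, 0.0, 0.8), RocPoint(1.0, 1.0, 0.2)]

avg = threshold_average([roc_a, roc_b], samples=4)
print(avg)
```

Because the curves are aligned by score rather than by false positive rate, this kind of averaging is only meaningful when the scores of the curves are comparable, which is the caveat about model classes raised in the excerpt.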

27 | Repairing concavities in ROC curves. In
- Flach, Wu
- 2003
Citation Context ... classifier on the diagonal may be said to have no information about the class. A classifier below the diagonal may be said to have useful information, but it is applying the information incorrectly (Flach and Wu, 2003). Given an ROC graph in which a classifier's performance appears to be slightly better than random, it is natural to ask: “is this classifier's performance truly significant or is it only better tha... |

19 |
Signal detection theory and ROC analysis, Series in Cognition and Perception.
- Egan
- 1975
Citation Context ...ganizing and selecting classifiers based on their performance. ROC graphs have long been used in signal detection theory to depict the tradeoff between hit rates and false alarm rates of classifiers (Egan, 1975; Swets et al., 2000). ROC analysis has been extended for use in visualizing and analyzing the behavior of diagnostic systems (Swets, 1988). The medical decision making community has an extensive lite... |

13 |
A rule-learning program in high energy physics event classification.
- Clearwater, Stern
- 1991
Citation Context ...e changes in class distributions may seem contrived and unrealistic. However, class skews of $10^1$ and $10^2$ are very common in real world domains, and skews up to $10^6$ have been observed in some domains (Clearwater and Stern, 1991; Fawcett and Provost, 1996; Kubat et al., 1998; Saitta and Neri, 1998). Substantial changes in class distributions are not unrealistic either. For example, in medical decision making epidemics may ca... |

11 | Representation quality in text classification: An introduction and experiment. In:
- Lewis
- 1990
Citation Context ...without altering the fundamental characteristic of the class, i.e., the target concept. Precision and recall are common in information retrieval for evaluating retrieval (classification) performance (Lewis, 1990, 1991). Precision-recall graphs are commonly used where static document sets can sometimes be assumed; however, they are also used in dynamic environments such as web page retrieval, where the number... |

9 |
Extensions of ROC analysis to multi-class domains. In: Dietterich,
- Lane
- 2000
Citation Context ...in the two-class problem. The resulting performance can be graphed in two dimensions, which is easy to visualize. 9.1. Multi-class ROC graphs. With more than two classes the situation becomes much more complex if the entire space is to be managed. With $n$ classes the confusion matrix becomes an $n \times n$ matrix containing the $n$ correct classifications (the major diagonal entries) and $n^2 - n$ possible errors (the off-diagonal entries). Instead of managing trade-offs between TP and FP, we have $n$ benefits and $n^2 - n$ errors. With only three classes, the surface becomes a $3^2 - 3 = 6$-dimensional polytope. Lane (2000) has outlined the issues involved and the prospects for addressing them. Srinivasan (1999) has shown that the analysis behind the ROC convex hull extends to multiple classes and multi-dimensional convex hulls. One method for handling $n$ classes is to produce $n$ different ROC graphs, one for each class. Call this the class reference formulation. Specifically, if $C$ is the set of all classes, ROC graph $i$ plots the classification performance using class $c_i$ as the positive class and all other classes as the negative class, i.e. $P_i = c_i$ ... |

9 |
Receiver operating characteristic (ROC) literature research. On-line bibliography available from:
- Zou
- 2002
Citation Context ...fier evaluation; Evaluation metrics. 1. Introduction. A receiver operating characteristics (ROC) graph is a technique for visualizing, organizing and selecting classifiers based on their performance. ROC graphs have long been used in signal detection theory to depict the tradeoff between hit rates and false alarm rates of classifiers (Egan, 1975; Swets et al., 2000). ROC analysis has been extended for use in visualizing and analyzing the behavior of diagnostic systems (Swets, 1988). The medical decision making community has an extensive literature on the use of ROC graphs for diagnostic testing (Zou, 2002). Swets et al. (2000) brought ROC curves to the attention of the wider public with their Scientific American article. One of the earliest adopters of ROC graphs in machine learning was Spackman (1989), who demonstrated the value of ROC curves in evaluating and comparing algorithms. Recent years have seen an increase in the use of ROC graphs in the machine learning community, due in part to the realization that simple classification accuracy is often a poor metric for measuring performance (Provost and Fawcett, 1997; Provost et al., 1998). In addition to being a generally useful performance gra... |

7 |
A method for discovering the insignificance of one's best classifier and the unlearnability of a classification task. In: Lavrac,
- Forman
- 2002
Citation Context ... point in the upper left triangle. In Fig. 2, E performs much worse than random, and is in fact the negation of B. Any classifier on the diagonal may be said to have no information about the class. A classifier below the diagonal may be said to have useful information, but it is applying the information incorrectly (Flach and Wu, 2003). Given an ROC graph in which a classifier's performance appears to be slightly better than random, it is natural to ask: “is this classifier's performance truly significant or is it only better than random by chance?” There is no conclusive test for this, but Forman (2002) has shown a methodology that addresses this question with ROC curves. 4. Curves in ROC space. Many classifiers, such as decision trees or rule sets, are designed to produce only a class decision, i.e., a Y or N on each instance. When such a discrete classifier is applied to a test set, it yields a single confusion matrix, which in turn corresponds to one ROC point. Thus, a discrete classifier produces only a single point in ROC space. Some classifiers, such as a Naive Bayes classifier or a neural network, naturally yield an instance probability or score, a numeric value that represents the degr... |
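
Two ideas in this excerpt can be illustrated together: a single confusion matrix maps to one ROC point, and negating a classifier that sits below the diagonal moves it above the diagonal. The sketch below uses hypothetical counts; `roc_point` and `negate` are illustrative names, not functions from the paper.

```python
def roc_point(tp, fn, fp, tn):
    """Map one confusion matrix to its (fpr, tpr) point in ROC space."""
    tpr = tp / (tp + fn)  # true positive rate (hit rate)
    fpr = fp / (fp + tn)  # false positive rate (false alarm rate)
    return fpr, tpr


def negate(tp, fn, fp, tn):
    """Confusion matrix after reversing every Y/N decision:
    old hits become misses and old false alarms become correct rejections."""
    return fn, tp, tn, fp  # new (tp, fn, fp, tn)


# Hypothetical counts for a discrete classifier that is worse than random.
below = (20, 80, 70, 30)  # tp, fn, fp, tn

fpr, tpr = roc_point(*below)
neg_fpr, neg_tpr = roc_point(*negate(*below))

print((fpr, tpr))          # a point below the diagonal
print((neg_fpr, neg_tpr))  # its negation, above the diagonal
```

Negation maps the point (fpr, tpr) to (1 - fpr, 1 - tpr), which is a reflection through the centre of ROC space at (0.5, 0.5).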

5 | Learning in the “real world”.
- Saitta, Neri
- 1998
Citation Context ...r, class skews of $10^1$ and $10^2$ are very common in real world domains, and skews up to $10^6$ have been observed in some domains (Clearwater and Stern, 1991; Fawcett and Provost, 1996; Kubat et al., 1998; Saitta and Neri, 1998). Substantial changes in class distributions are not unrealistic either. For example, in medical decision making epidemics may cause the incidence of a disease to increase over time. In fraud detecti... |

1 | Fawcett / - unknown authors - 2006 |