  DRAFT Robust Classification for Imprecise Environments

by Foster Provost, Tom Fawcett
http://www.hpl.hp.com/personal/Tom_Fawcett/papers/ROCCH-MLJ.ps.gz

Abstract:

In real-world environments it is usually difficult to specify target operating conditions precisely. This uncertainty makes building robust classification systems problematic. We present a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses. We then show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and workforce utilization. In some cases, the performance of the hybrid can actually surpass that of the best known classifier. The hybrid is also efficient to build, to store, and to update. Finally, we point to empirical evidence that a robust hybrid classifier is needed for many real-world problems.
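The core idea of the abstract can be illustrated with a small sketch. Each classifier is summarized as a point (false positive rate, true positive rate) in ROC space; only points on the upper convex hull can be optimal for some combination of class distribution and misclassification costs. The code below is a minimal illustration, not the authors' implementation: the function names `roc_convex_hull` and `best_for_conditions` and the sample points are hypothetical, the hull is computed with a simple monotone-chain scan rather than Quickhull, and the hybrid classifier itself is not built. It assumes the usual expected-cost formula, p(pos) * (1 - TPR) * cost(FN) + p(neg) * FPR * cost(FP).

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of ROC points, left to right.

    The trivial classifiers (0, 0) "always negative" and (1, 1)
    "always positive" are added, since they are always available.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the
        # segment from its predecessor to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def best_for_conditions(hull: List[Point], p_pos: float,
                        cost_fp: float = 1.0, cost_fn: float = 1.0) -> Point:
    """Hull point with the lowest expected cost for the given conditions."""
    p_neg = 1.0 - p_pos

    def expected_cost(pt: Point) -> float:
        fpr, tpr = pt
        return p_pos * (1.0 - tpr) * cost_fn + p_neg * fpr * cost_fp

    return min(hull, key=expected_cost)


# Hypothetical classifiers, for illustration only.
classifiers = [(0.1, 0.6), (0.3, 0.7), (0.5, 0.9), (0.6, 0.7)]
hull = roc_convex_hull(classifiers)
print(hull)                                         # points that can ever be optimal
print(best_for_conditions(hull, p_pos=0.5))         # balanced classes, equal costs
print(best_for_conditions(hull, p_pos=0.5, cost_fn=3.0))  # misses cost 3x more
```

In this sketch the classifiers at (0.3, 0.7) and (0.6, 0.7) fall below the hull, so no choice of class distribution or costs makes them the best option, while the preferred hull point shifts as the assumed conditions change.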

Citations

3215 C4.5: Programs for Machine Learning – Quinlan - 1993
2438 Classification and Regression Trees – Breiman, Friedman, et al. - 1984
1453 Bagging Predictors – Breiman - 1996
338 A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection – Kohavi - 1995
314 Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms – Dietterich - 1998
301 Supervised and unsupervised discretization of continuous features – Dougherty, Kohavi, et al. - 1995
234 Beyond independence: Conditions for the optimality of the simple bayesian classifier – Domingos, Pazzani - 1996
224 The Quickhull algorithm for convex hulls – Barber, Dobkin, et al. - 1996
215 The meaning and use of the area under a receiver operating characteristic (ROC) curve – Hanley, McNeil - 1982
204 The case against accuracy estimation for comparing induction algorithms – Provost, Fawcett, et al. - 1998
203 The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms – Bradley - 1997
129 Data Mining Techniques for – Berry, Linoff - 1997
129 Data mining using MLC++: a machine learning library in C – Kohavi, Sommerfield, et al. - 1996
120 Adaptive fraud detection – Fawcett, Provost - 1997
114 Measuring the accuracy of diagnostic systems – Swets - 1988
108 Error reduction through learning multiple descriptions – Ali, Pazzani - 1996
89 On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Mining and Knowledge Discovery 1:3 – Salzberg - 1997
79 Reducing Misclassification Costs – Pazzani, Merz, et al. - 1994
54 Machine learning for the detection of oil spills in satellite radar images – Kubat, Holte, et al. - 1998
52 Theory of games and statistical decisions – Blackwell, Girshick - 1954
21 Building robust learning systems by combining induction and optimization – Tcheng, Lambert, et al. - 1989
18 Learning Goal Oriented Bayesian Networks for Telecommunications Risk Management – Ezawa, K, et al. - 1996
12 A rule-learning program in high energy physics event classification – Clearwater, Stern - 1991
10 Cost-sensitive learning bibliography. http://ai.iit.nrc.ca/bibliographies/cost-sensitive.html – Turney - 1996
7 The use of ROC curves in test performance evaluation – Beck, Shultz - 1986
6 Multicriteria Optimization in Engineering and – Stadler - 1988
5 Learning in the "Real World" – Saitta, Neri - 1998
4 Tailoring rulesets to misclassification costs – Catlett - 1995
1 Signal Detection Theory and ROC Analysis – Egan - 1975