by Foster Provost, Tom Fawcett
Download: http://www.hpl.hp.com/personal/Tom_Fawcett/papers/ROCCH-MLJ.ps.gz
Abstract:
In real-world environments it is usually difficult to specify target operating conditions precisely. This uncertainty makes building robust classification systems problematic. We present a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses. We then show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and workforce utilization. In some cases, the performance of the hybrid can actually surpass that of the best known classifier. The hybrid is also efficient to build, to store, and to update. Finally, we point to empirical evidence that a robust hybrid classifier is needed for many real-world problems.
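The core idea of the abstract can be illustrated with a minimal sketch. It assumes each classifier is summarized by a single ROC point (false positive rate, true positive rate), and that target conditions are given as a positive-class prior plus the two misclassification costs. The function names, the example points, and the use of a monotone-chain hull (rather than any specific algorithm from the paper) are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of ROC points, running from (0, 0) to (1, 1)."""
    # The trivial classifiers "always negative" (0, 0) and "always positive"
    # (1, 1) are always available, so they are added to the candidate set.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the chord
        # from the point before it to the new point p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def expected_cost(p: Point, pos_rate: float, cost_fp: float, cost_fn: float) -> float:
    """Expected misclassification cost per example at ROC point p."""
    fpr, tpr = p
    return (1.0 - pos_rate) * fpr * cost_fp + pos_rate * (1.0 - tpr) * cost_fn


def best_operating_point(hull: List[Point], pos_rate: float,
                         cost_fp: float, cost_fn: float) -> Point:
    """Hull vertex with the lowest expected cost under the given conditions."""
    return min(hull, key=lambda p: expected_cost(p, pos_rate, cost_fp, cost_fn))


# Hypothetical ROC points for four classifiers (made-up values).
classifiers = [(0.1, 0.5), (0.3, 0.8), (0.4, 0.6), (0.6, 0.95)]
hull = roc_convex_hull(classifiers)

# Balanced classes, equal costs.
choice_equal = best_operating_point(hull, pos_rate=0.5, cost_fp=1.0, cost_fn=1.0)
# Same class balance, but a false negative costs five times a false positive.
choice_fn_heavy = best_operating_point(hull, pos_rate=0.5, cost_fp=1.0, cost_fn=5.0)
```

In this sketch the point (0.4, 0.6) falls below the hull, so it is never selected for any combination of priors and costs, while the selected vertex shifts along the hull as the conditions change. Only expected cost is handled here; the other metrics named in the abstract would need their own selection rule.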