In this companion paper, we formally introduce STRAT, a stratification-centric methodology for the empirical evaluation of classification systems. We discuss the criteria that motivated STRAT’s development, as well as the potential consequences of departing from some of the common statistical assumptions made when applying more traditional methods. STRAT uses an established replicate statistical technique called balanced repeated replication (BRR), which does not require the i.i.d. assumption needed for bootstrapping, jackknifing, or binomial techniques.
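To make the BRR idea concrete, the following is a minimal sketch of the textbook form of balanced repeated replication applied to a classifier's accuracy. It is not taken from the paper and is not STRAT itself. It assumes each stratum has been split into exactly two half-samples, each summarized as a (correct, total) pair, and it builds the balanced replicate pattern from a Sylvester-type Hadamard matrix. The function names `sylvester_hadamard` and `brr_accuracy` and the input layout are hypothetical choices made for illustration.

```python
def sylvester_hadamard(order):
    """Build a Hadamard matrix of the given order (must be a power of two)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        # Sylvester doubling: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def brr_accuracy(strata):
    """Estimate accuracy and its BRR variance.

    strata: list of ((correct_a, total_a), (correct_b, total_b)),
            one pair of half-samples per stratum (an assumed layout).
    Returns (full_sample_accuracy, brr_variance).
    """
    n_strata = len(strata)
    # Smallest power of two strictly greater than the number of strata,
    # so that the all-ones first column can be skipped.
    order = 1
    while order <= n_strata:
        order *= 2
    hadamard = sylvester_hadamard(order)

    all_correct = sum(c for pair in strata for c, _ in pair)
    all_total = sum(t for pair in strata for _, t in pair)
    full = all_correct / all_total

    replicates = []
    for row in hadamard:
        correct = total = 0
        for h, pair in enumerate(strata):
            # +1 selects the first half-sample, -1 the second.
            c, t = pair[0] if row[h + 1] > 0 else pair[1]
            correct += c
            total += t
        replicates.append(correct / total)

    # Mean squared deviation of replicate estimates from the full estimate.
    variance = sum((r - full) ** 2 for r in replicates) / len(replicates)
    return full, variance


if __name__ == "__main__":
    # Hypothetical counts: three strata, two half-samples each.
    example = [((8, 10), (6, 10)), ((9, 10), (9, 10)), ((7, 10), (8, 10))]
    accuracy, variance = brr_accuracy(example)
    print(accuracy, variance)
```

Each replicate uses one half-sample from every stratum, and the Hadamard rows keep the selections balanced across strata, so the variance estimate reflects between-half variation within strata rather than an assumption that individual test samples are i.i.d.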