Abstract: An important component of many data mining projects is finding a good classification
algorithm, a process that requires very careful thought about experimental design. If not done very
carefully, comparative studies of classification and other types of algorithms can easily result in
statistically invalid conclusions. This is especially true when one is using data mining techniques to
analyze very large databases, which inevitably contain some statistically unlikely data. This paper
describes...
Context of citations to this paper:
...how well those algorithms perform on standard data sets (e.g. UC Irvine repository [20]) or in a particular domain. As pointed out in [28], such comparisons of algorithms may be misleading since the performance of the classifiers they produce depends strongly on the specific...
...a simple, easily understood, and well-studied distribution. Its use for evaluation has been discussed for classifiers [1] [10] [15] [41] [42] and for evaluation and tuning in a variety of other domains, including inductive learning systems [18] and speech recognition...
Cited by:
Diversity in Ensemble Feature Selection - Tsymbal, Pechenizkiy, Cunningham (2003)
The PNC 2 Cluster Algorithm - An integrated learning algorithm.. - Haendel (2003)
A Stratified Methodology for Classifier and Recognizer.. - Micheals, Boult
Similar documents (at the sentence level):
79.8%: On Comparing Classifiers: Pitfalls to Avoid and a Recommended.. - Salzberg (1997)
Active bibliography (related documents):
0.3: Multiple Comparisons in Induction Algorithms - Jensen, Cohen (1998)
0.3: Theory Refinement of Bayesian Networks with Hidden Variables - Ramachandran (1998)
0.3: A Radial Basis Function Approach to Financial Time Series Analysis - Hutchinson (1994)
Similar documents based on text:
0.1: Prediction of Transcription Terminators in Bacterial.. - Ermolaeva, Khalak.. (2000)
0.1: Comment on "Setting Confidence Intervals for - Bounded Parameters By
0.1: A Weighted Nearest Neighbor Algorithm for Learning with.. - Cost, Salzberg (1993)
Related documents from co-citation:
5: A study of cross-validation and bootstrap for accuracy estimation and model sele.. - Kohavi - 1995
3: A dilemma for fitness sharing with a scaling function - Darwen, Yao - 1995
3: Multi-Interval Discretization of Continuous-Valued Attributes for Classification.. - Fayyad, Irani - 1993
BibTeX entry:
S. L. Salzberg. On comparing classifiers: A critique of current research and methods. Technical Report CS-1995-06, Johns Hopkins University, 1995. http://citeseer.ist.psu.edu/salzberg95comparing.html
@techreport{ salzbergsalzbergcomparing,
author = "Steven Salzberg",
title = "On Comparing Classifiers: {A} Critique of Current Research and Methods",
institution = "Johns Hopkins University",
number = "CS-1995-06",
year = "1995",
url = "http://citeseer.ist.psu.edu/salzberg95comparing.html" }
Citations (may not include all citations):
256: Parallel networks that learn to pronounce English text - Sejnowski, Rosenberg - 1987
216: Very simple classification rules perform well on most common.. - Holte - 1993
203: Multi-interval discretization of continuous valued attribute.. - Fayyad, Irani - 1993
84: A conservation law for generalization performance - Schaffer - 1994
82: UCI repository of machine learning databases -- a machine-re.. - Murphy - 1995
70: Predicting the secondary structure of globular proteins usin.. - Qian, Sejnowski - 1988
56: Generalizing from case studies: A case study - Aha - 1992
55: Symbolic and neural learning algorithms: An experimental com.. - Shavlik, Mooney et al. - 1991
45: An experimental comparison of the nearest-neighbor and neare.. - Wettschereck, Dietterich - 1995
45: the connection between in-sample testing and generalization .. - Wolpert - 1992
43: Bayesian model selection in social research - Raftery - 1995
41: Machine learning as an experimental science - Kibler, Langley - 1988
20: Experimental Designs - Cochran, Cox - 1957
19: A study of experimental evaluations of neural network algori.. - Prechelt - 1995
14: Which method learns most from the data - Feelders, Verkooijen - 1995
7: Knowledge discovery through induction with randomization tes.. - Jensen - 1991
6: Statistical significance in inductive learning - Gascuel, Caraux - 1992
3: Statistical tests for comparing supervised learning algorith.. - Dietterich - 1996
3: Review of Economics and Statistics - Denton, as - 1985
2: Statistical Thinking for Behavioral Scientists - Hildebrand - 1986
2: Labeling space: A tool for thinking about significance testi.. - Jensen - 1995
1: Overfitting in inductive learning algorithms: Why it occurs .. - Jensen, Cohen - 1996
Documents on the same site (http://www.tigr.org/~salzberg/):
Best-Case Results for Nearest Neighbor Learning - Salzberg, Delcher, Heath, Kasif (1995)
Towards a Better Understanding of Memory-Based Reasoning Systems - Rachlin (1994)
Book Review: "C4.5: Programs for Machine Learning" by J. Ross.. - Salzberg (1994)
CiteSeer.IST - Copyright Penn State and NEC