Results 1 - 10 of 81
An empirical comparison of voting classification algorithms: Bagging, boosting, and variants.
Machine Learning, 1999
"... Abstract. Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several vari ..."
Abstract
-
Cited by 707 (2 self)
- Add to MetaCart
Abstract. Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data was backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows. We use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only "hard" areas but also outliers and noise.
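The Wagging variant mentioned above replaces Bagging's bootstrap resampling with random perturbation of instance weights over the full training set. The following is a minimal illustrative sketch, not the paper's code, assuming scikit-learn decision trees and an exponential noise distribution (the paper's exact perturbation scheme and voting details may differ).

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def wagging_fit(X, y, n_estimators=25, noise_scale=1.0, random_state=0):
    # Train each tree on the full data with randomly perturbed instance weights.
    rng = np.random.default_rng(random_state)
    models = []
    for _ in range(n_estimators):
        weights = rng.exponential(noise_scale, size=len(y))
        models.append(DecisionTreeClassifier().fit(X, y, sample_weight=weights))
    return models

def wagging_predict(models, X):
    # Majority vote over the ensemble (integer class labels assumed).
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)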
Cost-sensitive boosting for classification of imbalanced data
2007
"... Classification of data with imbalanced class distribution has posed a significant drawback of the performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. The significant difficulty and frequent o ..."
Abstract
-
Cited by 77 (1 self)
- Add to MetaCart
(Show Context)
Classification of data with imbalanced class distribution has significantly limited the performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. The significant difficulty and frequent occurrence of the class imbalance problem indicate the need for extra research efforts. The objective of this paper is to investigate meta-techniques applicable to most classifier learning algorithms, with the aim of advancing the classification of imbalanced data. The AdaBoost algorithm is reported as a successful meta-technique for improving classification accuracy. The insight gained from a comprehensive analysis of the AdaBoost algorithm in terms of its advantages and shortcomings in tackling the class imbalance problem leads to the exploration of three cost-sensitive boosting algorithms, which are developed by introducing cost items into the learning framework of AdaBoost. Further analysis shows that one of the proposed algorithms tallies with the stagewise additive modelling in statistics to minimize the cost exponential loss. These boosting algorithms are also studied with respect to their weighting strategies towards different types of samples, and their effectiveness in identifying rare cases through experiments on several real-world medical data sets, where the class imbalance problem prevails.
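A minimal sketch of one way to introduce cost items into AdaBoost's weight update, loosely in the spirit of the cost-sensitive variants described above (their exact placement of the cost term differs between algorithms). It assumes labels in {-1, +1}, a per-example cost vector, and scikit-learn decision stumps; it is illustrative, not the authors' implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cost_sensitive_boost(X, y, costs, n_rounds=50):
    # y must be in {-1, +1}; costs[i] > 1 makes example i more expensive to misclassify.
    n = len(y)
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y))
        if err <= 0 or err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        # The cost item scales the update so expensive (e.g. minority-class)
        # mistakes gain weight faster than in plain AdaBoost.
        w = costs * w * np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def boosted_predict(stumps, alphas, X):
    # Sign of the weighted vote across boosting rounds.
    return np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))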
Minority Report in Fraud Detection: Classification of Skewed Data
ACM SIGKDD Explorations, 2004
"... This paper proposes an innovative fraud detection method, built upon existing fraud detection research and Minority Report, to deal with the data mining problem of skewed data distributions. This method uses backpropagation (BP), together with naive Bayesian (NB) and C4.5 algorithms, on data partiti ..."
Abstract
-
Cited by 64 (0 self)
- Add to MetaCart
This paper proposes an innovative fraud detection method, built upon existing fraud detection research and Minority Report, to deal with the data mining problem of skewed data distributions. This method uses backpropagation (BP), together with naive Bayesian (NB) and C4.5 algorithms, on data partitions derived from minority oversampling with replacement. Its originality lies in the use of a single meta-classifier (stacking) to choose the best base classifiers, and then combine these base classifiers' predictions (bagging) to improve cost savings (stacking-bagging). Results from a publicly available automobile insurance fraud detection data set demonstrate that stacking-bagging performs slightly better than the best performing bagged algorithm, C4.5, and its best classifier, C4.5 (2), in terms of cost savings. Stacking-bagging also outperforms the common technique used in industry (BP without sampling or partitioning). Subsequently, this paper compares the new fraud detection method (meta-learning approach) against C4.5 trained using undersampling, oversampling, and SMOTEing without partitioning (sampling approach). Results show that, given a fixed decision threshold and cost matrix, the partitioning and multiple-algorithms approach achieves marginally higher cost savings than varying the entire training data set with different class distributions. The most interesting finding is that the combination of classifiers producing the best cost savings draws contributions from all three algorithms.
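A minimal sketch, under assumed NumPy arrays and illustrative parameter names, of the preprocessing step described above: forming data partitions by splitting the majority class and oversampling the minority class with replacement so that each partition is balanced. It is a sketch of the idea, not the authors' pipeline.

import numpy as np

def balanced_partitions(X, y, minority_label, n_partitions=11, random_state=0):
    rng = np.random.default_rng(random_state)
    minority = np.where(y == minority_label)[0]
    majority = np.where(y != minority_label)[0]
    rng.shuffle(majority)
    partitions = []
    for chunk in np.array_split(majority, n_partitions):
        # Oversample the minority class with replacement to match the chunk size.
        sampled = rng.choice(minority, size=len(chunk), replace=True)
        idx = np.concatenate([chunk, sampled])
        partitions.append((X[idx], y[idx]))
    return partitions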
Bayesian approaches to failure prediction for disk drives
In Proc. 18th ICML, 2001
"... Hard disk drive failures are rare but are often costly. The ability to predict failures is important to consumers, drive manufacturers, and computer system manufacturers alike. In this paper we investigate the abilities of two Bayesian methods to predict disk drive failures based on measurements of ..."
Abstract
-
Cited by 47 (2 self)
- Add to MetaCart
(Show Context)
Hard disk drive failures are rare but are often costly. The ability to predict failures is important to consumers, drive manufacturers, and computer system manufacturers alike. In this paper we investigate the abilities of two Bayesian methods to predict disk drive failures based on measurements of drive internal conditions. We first view the problem from an anomaly detection stance. We introduce a mixture model of naive Bayes submodels (i.e. clusters) that is trained using expectation-maximization. The second method is a naive Bayes classifier, a supervised learning approach. Both methods are tested on real-world data concerning 1936 drives. The predictive accuracy of both algorithms is far higher than the accuracy of thresholding methods used in the disk drive industry today.
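The anomaly-detection view described above can be sketched with a diagonal-covariance Gaussian mixture fit to healthy-drive measurements only, flagging drives whose likelihood falls below a threshold. This is a stand-in for the paper's mixture of naive Bayes submodels (both assume feature independence within each cluster); the paper's actual model and EM details may differ.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_healthy_model(X_healthy, n_clusters=5):
    # EM-trained mixture over measurements from drives that did not fail.
    return GaussianMixture(n_components=n_clusters, covariance_type="diag",
                           random_state=0).fit(X_healthy)

def predict_failure(model, X, log_likelihood_threshold):
    # score_samples gives per-drive log-likelihood; unusually low values are
    # treated as warnings of impending failure.
    return model.score_samples(X) < log_likelihood_threshold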
Using SQL to Build New Aggregates and Extenders for Object-Relational Systems
In Proceedings of the 26th International Conference on Very Large Data Bases, 2000
"... User-defined Aggregates (UDAs) provide a versatile mechanism for extending the power and applicability of Object-Relational Databases (O-R DBs). In this paper, we describe the AXL system that supports an SQLbased language for introducing new UDAs. AXL is easy to learn and use for database prog ..."
Abstract
-
Cited by 44 (11 self)
- Add to MetaCart
(Show Context)
User-defined Aggregates (UDAs) provide a versatile mechanism for extending the power and applicability of Object-Relational Databases (O-R DBs). In this paper, we describe the AXL system that supports an SQL-based language for introducing new UDAs. AXL is easy to learn and use for database programmers because it preserves the constructs, programming paradigm and data types of SQL (whereas there is an 'impedance mismatch' between SQL and the procedural languages of user-defined functions currently used in O-R DBs). AXL will also inherit the benefits of database query languages, such as scalability, data independence and parallelizability. In this paper, we show that, while adding only minimal extensions to SQL, AXL is very powerful and capable of expressing complex algorithms efficiently. We demonstrate this by coding data mining functions and other advanced applications that, previously, had been a major problem for SQL databases. Due to its flexibility, SQL-compati...
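AXL itself extends SQL, but the initialize/iterate/terminate lifecycle of a user-defined aggregate can be illustrated with Python's sqlite3 API; this is a loose analogue chosen for brevity, not AXL syntax.

import math
import sqlite3

class GeometricMean:
    def __init__(self):       # initialize the aggregate's state
        self.log_sum = 0.0
        self.count = 0

    def step(self, value):    # iterate: consume one input row
        if value is not None and value > 0:
            self.log_sum += math.log(value)
            self.count += 1

    def finalize(self):       # terminate: emit the aggregate result
        return math.exp(self.log_sum / self.count) if self.count else None

conn = sqlite3.connect(":memory:")
conn.create_aggregate("geomean", 1, GeometricMean)
conn.execute("CREATE TABLE t(x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(1.0,), (2.0,), (8.0,)])
print(conn.execute("SELECT geomean(x) FROM t").fetchone()[0])  # about 2.52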
Ensemble Feature Selection with the Simple Bayesian Classification
"... A popular method for creating an accurate classifier from a set of training data is to build several classifiers, and then to combine their predictions. The ensembles of simple Bayesian classifiers have traditionally not been a focus of research. One way to generate an ensemble of accurate and di ..."
Abstract
-
Cited by 41 (8 self)
- Add to MetaCart
A popular method for creating an accurate classifier from a set of training data is to build several classifiers, and then to combine their predictions. Ensembles of simple Bayesian classifiers have traditionally not been a focus of research. One way to generate an ensemble of accurate and diverse simple Bayesian classifiers is to use different feature subsets generated with the random subspace method. In this case, the ensemble consists of multiple classifiers constructed by randomly selecting feature subsets, that is, classifiers constructed in randomly chosen subspaces. In this paper, we present an algorithm for building ensembles of simple Bayesian classifiers in random subspaces...
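A minimal sketch of the random subspace idea described above, assuming scikit-learn's Gaussian naive Bayes and majority voting; the authors' feature-selection refinements are not reproduced here.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def fit_random_subspace_nb(X, y, n_members=10, subspace_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    k = max(1, int(subspace_frac * n_features))
    ensemble = []
    for _ in range(n_members):
        feats = rng.choice(n_features, size=k, replace=False)
        ensemble.append((feats, GaussianNB().fit(X[:, feats], y)))
    return ensemble

def predict_vote(ensemble, X):
    # Majority vote across members, each seeing only its own feature subset.
    votes = np.stack([nb.predict(X[:, feats]) for feats, nb in ensemble]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)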
Magical Thinking in Data Mining: Lessons From CoIL Challenge 2000
In Knowledge Discovery and Data Mining, 2001
"... CoIL challenge 2000 was a supervised learning contest that attracted 43 entries. The authors of 29 entries later wrote explanations of their work. This paper discusses these reports and reaches three main conclusions. First, naive Bayesian classifiers remain competitive in practice: they were used b ..."
Abstract
-
Cited by 35 (0 self)
- Add to MetaCart
(Show Context)
CoIL challenge 2000 was a supervised learning contest that attracted 43 entries. The authors of 29 entries later wrote explanations of their work. This paper discusses these reports and reaches three main conclusions. First, naive Bayesian classifiers remain competitive in practice: they were used by both the winning entry and the next best entry. Second, identifying feature interactions correctly is important for maximizing predictive accuracy: this was the difference between the winning classifier and all others. Third and most important, too many researchers and practitioners in data mining do not properly appreciate the issue of statistical significance and the danger of overfitting. Given a dataset such as the one for the CoIL contest, it is pointless to apply a very complicated learning algorithm, or to perform a very time-consuming model search. In either case, one is likely to overfit the training data and to fool oneself in estimating predictive accuracy and in discovering useful correlations.
Estimating campaign benefits and modeling lift
In Proc. of the 5th SIGKDD International Conference on Knowledge Discovery and Data Mining
"... In assessing the potential of data mining based marketing campaigns one needs to estimate the payoff of applying modeling to the problem of predicting behavior of some target population (e.g. attriters, people likely to buy product X, people likely to default on a loan, etc). This assessment has two ..."
Abstract
-
Cited by 27 (0 self)
- Add to MetaCart
(Show Context)
In assessing the potential of data-mining-based marketing campaigns one needs to estimate the payoff of applying modeling to the problem of predicting the behavior of some target population (e.g. attriters, people likely to buy product X, people likely to default on a loan, etc.). This assessment has two components: a) the financial estimate of the campaign profitability, based on cost/benefit analysis, and b) estimation of model accuracy in the targeted population using measures such as lift. We present a methodology for initial cost/benefit analysis and present surprising empirical results, based on actual business data from several domains, on achievable model accuracy. We conjecture that lift at T (where T is the target frequency) is usually about sqrt(1/T) for a good model. We also present formulae for estimating the entire lift curve and estimating expected profits.
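A minimal sketch of computing lift at the target frequency T from model scores, next to the conjectured rule of thumb lift(T) ~ sqrt(1/T) and a simple cost/benefit estimate. Function and variable names here are illustrative assumptions, not the paper's formulae.

import numpy as np

def lift_at_target_frequency(scores, y):
    # T is the base rate of positives; lift at T is the positive rate among the
    # top T fraction of scored examples, divided by T.
    T = y.mean()
    k = max(1, int(round(T * len(y))))
    top = np.argsort(scores)[::-1][:k]
    return y[top].mean() / T

def conjectured_lift(T):
    # The paper's empirical rule of thumb for a good model.
    return np.sqrt(1.0 / T)

def expected_profit(n_targeted, response_rate, benefit_per_response, cost_per_contact):
    # Simple campaign cost/benefit estimate for contacting n_targeted people.
    return n_targeted * (response_rate * benefit_per_response - cost_per_contact)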
Reducing Multiclass to Binary By Coupling Probability Estimates
2001
"... This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise cou ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension, for arbitrary code matrices, of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method produces calibrated class membership probability estimates, while having classification accuracy similar to that of loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.
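For context, a minimal sketch of the pairwise coupling iteration of Hastie and Tibshirani that this paper generalizes to arbitrary code matrices; it assumes uniform pairwise counts and is illustrative only, not the paper's extended method.

import numpy as np

def pairwise_coupling(r, n_iter=1000, tol=1e-8):
    # r[i, j] is a binary classifier's estimate of P(class i | class i or j),
    # with r[j, i] = 1 - r[i, j]; diagonal entries are ignored.
    k = r.shape[0]
    p = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        p_old = p.copy()
        for i in range(k):
            mu_i = p[i] / (p[i] + p)            # implied pairwise probabilities
            num = r[i].sum() - r[i, i]
            den = mu_i.sum() - mu_i[i]
            p[i] *= num / den
            p /= p.sum()                        # renormalize after each update
        if np.abs(p - p_old).max() < tol:
            break
    return p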
A case study of applying boosting Naive Bayes to claim fraud diagnosis
IEEE Transactions on Knowledge and Data Engineering, 2004
"... In this paper, we apply the weight of evidence reformulation of AdaBoosted naive Bayes scoring due to Ridgeway et al. [38] to the problem of diagnosing insurance claim fraud. The method effectively combines the advantages of boosting and the explanatory power of the weight of evidence scoring framew ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
In this paper, we apply the weight of evidence reformulation of AdaBoosted naive Bayes scoring due to Ridgeway et al. [38] to the problem of diagnosing insurance claim fraud. The method effectively combines the advantages of boosting and the explanatory power of the weight of evidence scoring framework. We present the results of an experimental evaluation with an emphasis on discriminatory power, ranking ability, and calibration of probability estimates. The data to which we apply the method consists of closed personal injury protection (PIP) automobile insurance claims from accidents that occurred in Massachusetts during 1993 and were previously investigated for suspicion of fraud by domain experts. The data mimic the most commonly occurring data configuration—that is, claim records consisting of information pertaining to several binary fraud indicators. The findings of the study reveal the method to be a valuable contribution to the design of intelligible, accountable, and efficient fraud detection support.
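A minimal sketch of plain (un-boosted) weight-of-evidence scoring for binary fraud indicators under a naive Bayes model, to illustrate the scoring framework the abstract refers to; the paper's method additionally reweights it via AdaBoost, and the smoothing and names below are illustrative choices.

import numpy as np

def make_woe_scorer(X, y):
    # X: (n, d) matrix of binary fraud indicators; y: 1 = fraud, 0 = legitimate.
    fraud, legit = X[y == 1], X[y == 0]
    p_fraud = (fraud.sum(axis=0) + 1) / (len(fraud) + 2)   # P(indicator=1 | fraud), Laplace-smoothed
    p_legit = (legit.sum(axis=0) + 1) / (len(legit) + 2)   # P(indicator=1 | legit)
    prior_log_odds = np.log((y == 1).mean() / (y == 0).mean())

    def score(x):
        # Sum of per-indicator weights of evidence plus the prior log-odds
        # gives the naive Bayes log posterior odds of fraud.
        woe = np.where(x == 1,
                       np.log(p_fraud / p_legit),
                       np.log((1 - p_fraud) / (1 - p_legit)))
        return prior_log_odds + woe.sum()

    return score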