Results 1–10 of 14
Top 10 algorithms in data mining, 2007
Cited by 113 (2 self)
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss its impact, and review current and further research on it. These 10 algorithms cover classification, …
Nearest prototype classifier designs: an experimental study
 International Journal of Intelligent Systems
Cited by 32 (2 self)
We compare eleven methods for finding prototypes upon which to base the nearest prototype classifier. Four methods for prototype selection are discussed: Wilson + Hart (a condensation + error-editing method), and three types of combinatorial search: random search, genetic algorithm, and tabu search. Seven methods for prototype extraction are discussed: unsupervised vector quantization, supervised learning vector quantization (with and without training counters), decision surface mapping, a fuzzy version of vector quantization, c-means clustering, and bootstrap editing. These eleven methods can be usefully divided two other ways: by whether they employ pre- or post-supervision, and by whether the number of prototypes found is user-defined or "automatic." Generalization error rates of the 11 methods are estimated on two synthetic and two real data sets. Offering the usual disclaimer that these are just a limited set of experiments, we feel confident in asserting that pre-supervised extraction methods offer a better chance of success to the casual user than post-supervised selection schemes. Finally, our calculations do not suggest that methods which find the "best" number of prototypes "automatically" are superior to methods for which the user simply specifies the number of prototypes. © 2001 John Wiley & Sons, Inc.
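The pre-supervised extraction idea above can be made concrete with a small sketch: run a tiny k-means separately inside each class to extract prototypes (supervision happens before extraction), then classify by the nearest prototype. This is an illustrative toy, not any of the eleven methods from the study; the dataset and parameter choices are made up.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm returning k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # leave empty clusters' centroids untouched
                centroids[i] = tuple(sum(c) / len(cl) for c in zip(*cl))
    return centroids

def extract_prototypes(X, y, per_class=2):
    """Pre-supervised extraction: k-means is run inside each class."""
    protos = []
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        for c in kmeans(pts, min(per_class, len(pts))):
            protos.append((c, label))
    return protos

def nearest_prototype(protos, x):
    """1-NN rule over the reduced prototype set."""
    return min(protos, key=lambda pl: math.dist(pl[0], x))[1]
```

With one prototype per class this degenerates to a nearest class-mean classifier; increasing `per_class` lets the prototype set follow multimodal classes.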
An immune-inspired instance selection mechanism for supervised classification
RRS + LS-SVM: a new strategy for 'a priori' sample selection
We present in this work a new sparse hybrid classifier that combines a reduced remaining subset (RRS) with a least squares support vector machine (LS-SVM). RRS is a sample selection technique based on a modified nearest neighbor rule; it is used to choose the best samples to represent each class of a given database. LS-SVM then uses the samples selected by RRS as support vectors to find the decision surface between the classes by solving a system of linear equations. This hybrid classifier is considered sparse because it is able to detect support vectors, which is not possible when using LS-SVM on its own. Experiments are presented comparing the proposed approach with two existing methods that also aim to impose sparseness on LS-SVMs, called LS2SVM and AdaPinv.
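The "system of linear equations" step the abstract refers to can be sketched as a single linear solve. The sample selection stage is omitted here (all training points act as support vectors), so this shows only the LS-SVM half; the linear kernel and the regularization value `gamma` are arbitrary choices, not taken from the paper.

```python
import numpy as np

def lssvm_train(X, y, gamma=10.0):
    """Solve the LS-SVM dual system  [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y]."""
    n = len(y)
    K = X @ X.T                        # linear kernel matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                     # sum of multipliers constraint
    A[1:, 0] = 1.0                     # bias column
    A[1:, 1:] = K + np.eye(n) / gamma  # regularized kernel block
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]             # bias b, multipliers alpha

def lssvm_predict(X_sv, b, alpha, x):
    """Sign of  sum_i alpha_i k(x_i, x) + b."""
    return float(np.sign(alpha @ (X_sv @ x) + b))
```

Because every training point receives a (generally nonzero) multiplier, plain LS-SVM is dense; that is exactly the motivation for selecting samples first, as the abstract describes.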
Stratified prototype selection based on a steady-state memetic algorithm: a study of scalability
Prototype selection (PS) is a suitable data reduction process for refining the training set of a data mining algorithm. Performing PS over existing datasets can be an inefficient task, especially as the size of the problem increases. In recent years, however, some techniques have been developed to overcome the drawbacks caused by the lack of scalability of classical PS approaches. One of these techniques is known as stratification. In this study, we test the combination of stratification with a previously published steady-state memetic algorithm for PS on various problems, ranging from 50,000 to more than 1 million instances. We compare against some well-known PS methods and study in depth the effect of stratification on the behavior of the selected method, focusing on its time complexity, accuracy, and convergence capabilities. Furthermore, the trade-off between accuracy and efficiency of the proposed combination is analyzed, concluding that it is a very suitable option for PS tasks when the size of the problem exceeds the capabilities of classical PS methods.
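The stratification scheme described above can be sketched in a few lines: partition the training set into disjoint strata, run a PS method independently on each small stratum, and take the union of the selections. The inner PS rule below (keep an instance only if its nearest neighbor in the stratum shares its class) is a simple stand-in, not the steady-state memetic algorithm from the paper.

```python
import math

def nn_agree_ps(stratum):
    """Toy PS rule: keep (x, y) only if its nearest neighbor has the same label."""
    kept = []
    for i, (x, y) in enumerate(stratum):
        others = [p for j, p in enumerate(stratum) if j != i]
        _, nn_label = min(others, key=lambda p: math.dist(p[0], x))
        if nn_label == y:
            kept.append((x, y))
    return kept

def stratified_ps(data, n_strata, ps=nn_agree_ps):
    """Partition data round-robin into strata, apply PS per stratum, join results."""
    strata = [data[i::n_strata] for i in range(n_strata)]
    selected = []
    for s in strata:
        selected.extend(ps(s))
    return selected
```

The point of the construction is that each PS run only ever sees `len(data) / n_strata` instances, so a method that is quadratic (or worse) in the training-set size stays tractable on the million-instance problems the study targets.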
Pattern Analysis and Learning Group
It has been observed that class imbalance (that is, significant differences in class prior probabilities) can seriously degrade the performance achieved by existing learning and classification systems. This situation is often found in real-world data describing an infrequent but important case. In the present work, we review the most important research lines on this topic and point out several directions for further investigation.
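One of the simplest baselines in this research line is to rebalance the class priors by randomly undersampling the majority class until all classes are equally frequent. A minimal sketch (this is a generic baseline, not a method proposed in the review; the seed is arbitrary):

```python
import random
from collections import Counter

def undersample(data, seed=0):
    """Randomly drop majority-class instances until all classes are balanced.

    data: list of (features, label) pairs.
    """
    rng = random.Random(seed)
    counts = Counter(y for _, y in data)
    target = min(counts.values())            # size of the rarest class
    kept, seen = [], Counter()
    for x, y in rng.sample(data, len(data)):  # shuffle, then cap each class
        if seen[y] < target:
            kept.append((x, y))
            seen[y] += 1
    return kept
```

Undersampling equalizes priors at the cost of discarding majority-class information, which is why the literature the review surveys also considers oversampling and cost-sensitive alternatives.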
DOI 10.1007/s10115-011-0465-6
SMOTE-RSB∗: a hybrid preprocessing approach based on oversampling and undersampling for highly imbalanced datasets using SMOTE and rough sets theory
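The oversampling half named in the title is SMOTE, which creates each synthetic minority instance at a random point on the segment between a minority example and one of its k nearest minority neighbors; the rough-set undersampling half is not sketched here. The values of k and the number of synthetic points below are illustrative.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating minority neighbors.

    minority: list of feature tuples from the minority class.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbors = sorted((p for p in minority if p != x),
                           key=lambda p: math.dist(p, x))[:k]
        nb = rng.choice(neighbors)
        t = rng.random()  # gap along the segment, in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

Because each synthetic point is a convex combination of two real minority instances, SMOTE never generates samples outside the convex hull of the minority class, which is one reason hybrid schemes follow it with a cleaning/undersampling step.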
DOI 10.1007/s10044-008-0142-x
A new fast prototype selection method based on clustering
In supervised classification, a training set T is given to a classifier for classifying new prototypes. In practice, not all of the information in T is useful to a classifier; it is therefore convenient to discard irrelevant prototypes from T. This process is known as prototype selection, an important task because it can reduce classification or training time. In this work, we propose a new fast prototype selection method for large datasets, based on clustering, which selects border prototypes and some interior prototypes. Experimental results are reported showing the performance of our method and comparing its accuracy and runtime against other prototype selection methods.
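The border/interior idea described above can be sketched as follows: cluster T ignoring labels, keep every instance of a cluster that mixes classes (a border region), and collapse each single-class cluster to its most central instance. The grid-bucket "clustering" below is a deliberately crude stand-in for a real clustering step, chosen only to keep the sketch short.

```python
from collections import defaultdict

def select_prototypes(data, cell=2.0):
    """data: list of ((x1, x2), label). Returns the reduced training set."""
    clusters = defaultdict(list)
    for (x1, x2), y in data:                 # toy "clustering": grid cells
        clusters[(int(x1 // cell), int(x2 // cell))].append(((x1, x2), y))
    selected = []
    for members in clusters.values():
        labels = {y for _, y in members}
        if len(labels) > 1:                  # mixed cluster: border region
            selected.extend(members)         # keep all border prototypes
        else:                                # homogeneous: interior region
            cx = sum(p[0][0] for p in members) / len(members)
            cy = sum(p[0][1] for p in members) / len(members)
            selected.append(min(members,     # keep one central prototype
                key=lambda p: (p[0][0] - cx) ** 2 + (p[0][1] - cy) ** 2))
    return selected
```

The speed of such schemes comes from replacing the pairwise comparisons of classical PS with one clustering pass plus per-cluster work, which is what makes them attractive for large datasets.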
DOI 10.1007/s10115-007-0114-2
Top 10 algorithms in data mining, 2007