Results 1 - 10 of 202,515
Empirical evaluation of feature subset selection based on a real world data set
- In Proc. PKDD-2000 (LNAI 1910), 2000
"... Abstract. Selecting the right set of features for classification is one of the most important problems in designing a good classifier. Decision tree induction algorithms such as C4.5 have incorporated in their learning phase an automatic feature selection strategy while some other statistical classi ..."
Cited by 4 (0 self)
classification algorithms require the feature subset to be selected in a preprocessing phase. It is well known that correlated and irrelevant features may degrade the performance of the C4.5 algorithm. In our study, we evaluated the influence of feature pre-selection on the prediction accuracy of C4.5 using a real-world
Comparing ICP Variants on Real-World Data Sets
- Autonomous Robots (manuscript)
"... protocol. ..."
Parallel Evolutionary Algorithms with SOM-Like Migration and their Application to Real World Data Sets
"... We introduce a multiple subpopulation approach for parallel evolutionary algorithms the migration scheme of which follows a SOM-like dynamics. We successfully apply this approach to clustering in both VLSI-design and psychotherapy research. The advantages of the approach are shown which consist in a ..."
We introduce a multiple subpopulation approach for parallel evolutionary algorithms the migration scheme of which follows a SOM-like dynamics. We successfully apply this approach to clustering in both VLSI-design and psychotherapy research. The advantages of the approach are shown which consist in a reduced communication overhead between the subpopulations preserving a non-vanishing information flow.
Based Clustering Over Data Stream
"... • Real-world data set • Sensitivity to parameters • Scalability and complexity ..."
K.B.: Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In:
- IJCAI, 1993
"... Abstract Since most real-world applications of classification learning involve continuous-valued attributes, properly addressing the discretization process is an important problem. This paper addresses the use of the entropy minimization heuristic for discretizing the range of a continuous-valued a ..."
Cited by 832 (7 self)
formally derive a criterion based on the minimum description length principle for deciding the partitioning of intervals. We demonstrate via empirical evaluation on several real-world data sets that better decision trees are obtained using the new multi-interval algorithm.
Power-law distributions in empirical data
- ISSN 0036-1445. doi: 10.1137/070710111. URL http://dx.doi.org/10.1137/070710111 (2009)
"... Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the t ..."
Cited by 607 (7 self)
demonstrate these methods by applying them to twenty-four real-world data sets from a range of different disciplines. Each of the data sets has been conjectured previously to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law
SMOTE: Synthetic Minority Over-sampling Technique
- Journal of Artificial Intelligence Research, 2002
"... An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentag ..."
Cited by 634 (27 self)
Toward Optimal Active Learning through Sampling Estimation of Error Reduction
- In Proc. 18th International Conf. on Machine Learning, 2001
"... This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. These other methods are popular because for many learning models, closed form calculation of the expec ..."
Cited by 353 (2 self)
of the expected future error is intractable. Our approach is made feasible by taking a sampling approach to estimating the expected reduction in error due to the labeling of a query. In experimental results on two real-world data sets we reach high accuracy very quickly, sometimes with four times fewer
Rough Sets.
- Int. J. of Information and Computer Sciences, 1982
"... Abstract. This article presents some general remarks on rough sets and their place in general picture of research on vagueness and uncertainty - concepts of utmost interest, for many years, for philosophers, mathematicians, logicians and recently also for computer scientists and engineers particular ..."
Cited by 793 (13 self)
particularly those working in such areas as AI, computational intelligence, intelligent systems, cognitive science, data mining and machine learning. Thus this article is intended to present some philosophical observations rather than to consider technical details or applications of rough set theory. Therefore
Statistical Comparisons of Classifiers over Multiple Data Sets
- 2006
"... While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but igno ..."
Cited by 744 (0 self)