Abstract: Classification for very large datasets has many practical applications in data mining. Techniques such as discretization and dataset sampling can be used to scale up decision tree classifiers to large datasets. Unfortunately, both of these techniques can cause a significant loss in accuracy. We present a novel decision tree classifier called CLOUDS, which samples the splitting points for numeric attributes followed by an estimation step to narrow the search space of the best split. CLOUDS reduces...
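As an illustration of the sampling idea the abstract describes, the following is a minimal Python sketch, not the authors' implementation. It evaluates a split criterion only at approximately equal-frequency boundaries of one numeric attribute instead of at every distinct value. The function names, the choice of the gini index as the criterion, the `num_intervals` parameter, and the use of a full sort are all assumptions made for brevity; the estimation step that CLOUDS adds to narrow the search is not reproduced here.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class-count mapping covering `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Size-weighted gini impurity of a binary partition."""
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * gini(left, n_left) + (n_right / n) * gini(right, n_right)


def sampled_best_split(values, labels, num_intervals):
    """Return (threshold, impurity) for the best split `value <= threshold`,
    testing only sampled candidate thresholds (hypothetical helper).

    Assumption: sorting is used here only to keep the sketch short; a
    large-dataset implementation would estimate the boundaries without
    a full sort.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    if n == 0:
        return None
    # Candidate thresholds: attribute values at equal-frequency boundaries.
    candidates = sorted({
        pairs[max(0, (i * n) // num_intervals - 1)][0]
        for i in range(1, num_intervals)
    })
    best = None
    for t in candidates:
        left = Counter(lab for v, lab in pairs if v <= t)
        right = Counter(lab for v, lab in pairs if v > t)
        if not right:
            continue  # threshold does not separate any records
        score = weighted_gini(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

For example, `sampled_best_split([1, 2, 3, 4, 5, 6, 7, 8], list("aaaabbbb"), 4)` checks only three candidate thresholds rather than the seven possible split points between the eight distinct values.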
Context of citations to this paper:
...directly from this both approaches. Other approaches consider approximation techniques for scaling up the classification, e.g. sampling [1] and discretization, as well as permitting the user to specify constraints on tree size [7]. Particularly, approximation techniques...
...classifiers. In Section 1.1, we introduce and compare several state-of-the-art classification algorithms, including SPRINT [17] and CLOUDS [14]. We present our CMP family classifiers in detail in Section 2, including the data structure, techniques to avoid accuracy loss, the...
Cited by:
Mining Data Streams Using Option Trees - Holmes, Kirkby, Pfahringer (2004)
Decision Trees: More Theoretical Justification for Practical.. - Pechyony (2004)
XRules: An Effective Structural Classifier for XML Data - Zaki, Aggarwal (2003)
Active bibliography (related documents):
0.7: Tree-based Incremental Classification for Large Datasets - Yoon, Alsabti, Ranka (1999)
0.5: SPRINT: A Scalable Parallel Classifier for Data Mining - Shafer, Agrawal, Mehta (1996)
0.2: Predictive Modeling Based On Classification And Pattern Matching.. - Wang (1999)
Similar documents based on text:
0.6: A One-Pass Algorithm for Accurately Estimating Quantiles.. - Alsabti, Ranka, Singh (1997)
0.5: An Efficient Parallel Algorithm for High Dimensional.. - Alsabti, Ranka, Singh (1997)
0.4: An Efficient K-Means Clustering Algorithm - Alsabti, Ranka, Singh (1998)
Related documents from co-citation:
8: SPRINT: A scalable parallel classifier for data mining - Shafer, Agrawal et al. - 1996
7: SLIQ: A fast scalable classifier for data mining - Mehta, Agrawal et al. - 1996
5: Programs for machine learning - Quinlan - 1993
BibTeX entry:
K. Alsabti, S. Ranka, and V. Singh. CLOUDS: A decision tree classifier for large datasets. In 4th Intl. Conf. on Knowledge Discovery and Data Mining, Aug 1998. http://citeseer.ist.psu.edu/alsabti98clouds.html
@inproceedings{ alsabti98clouds,
author = "Khaled Alsabti and Sanjay Ranka and Vineet Singh",
title = "{CLOUDS}: A Decision Tree Classifier for Large Datasets",
booktitle = "Knowledge Discovery and Data Mining",
pages = "2-8",
year = "1998",
url = "citeseer.ist.psu.edu/alsabti98clouds.html" }
Citations (may not include all citations):
1051: Optimization and Machine Learning - Goldberg - 1989
667: UCI Repository of Machine Learning Databases - Murphy, Aha - 1994
417: Stochastic Complexity in Statistical Inquiry - Rissanen - 1989
281: Programs for Machine Learning - Quinlan - 1993
121: Classification and Regression Trees - Breiman, Friedman et al. - 1984
85: ChiMerge: Discretization of Numeric Attributes - Kerber - 1992
62: Megainduction: Machine Learning on Very Large Databases - Catlett - 1991
54: Meta-Learning for Multistrategy and Parallel Learning - Chan, Stolfo - 1993
35: MDL-Based Decision Tree Pruning - Mehta, Rissanen et al. - 1995
22: Neural and Statistical Classification - Michie, Spiegelhalter et al. - 1994
18: A One-Pass Algorithm for Accurately Estimating Quantiles for.. - Alsabti, Ranka et al. - 1997
17: SPRINT: A Scalable Parallel Classifier for Data Mining - Shafer, Agrawal et al. - 1996
10: SLIQ: A Fast Scalable Classifier for Data Mining - Mehta, Agrawal et al. - 1996
6: IEEE Transactions on Knowledge and Data Engineering - Liu, Setiono et al. - 1997
5: Computer Systems that Learn: Classification and Prediction Met.. - Weiss, Kulikowski - 1991
4: MINI: A Heuristic Algorithm for Generating Minimal Rules fro.. - Hong - 1994
2: Classification Algorithms - James - 1985
2: Introduction to IND Version - Research - 1992
1: Department of Computer Science and Engineering - Srivastava, Singh et al. - 1996
Documents on the same site (http://www.cise.ufl.edu/~ranka/dm.html):
Integer Sorting Algorithms for Coarse-Grained Parallel Machines - Alsabti, Ranka (1997)
Skew-Insensitive Parallel Algorithms for Relational Join - Alsabti, Ranka
An Efficient Parallel Algorithm for High Dimensional.. - Alsabti, Ranka, Singh (1997)
CiteSeer.IST - Copyright Penn State and NEC