CLOUDS: A Decision Tree Classifier for Large Datasets (1998)  (9 citations)
Khaled Alsabti, Sanjay Ranka, Vineet Singh
Knowledge Discovery and Data Mining

Abstract: Classification for very large datasets has many practical applications in data mining. Techniques such as discretization and dataset sampling can be used to scale up decision tree classifiers to large datasets. Unfortunately, both of these techniques can cause a significant loss in accuracy. We present a novel decision tree classifier called CLOUDS, which samples the splitting points for numeric attributes followed by an estimation step to narrow the search space of the best split. CLOUDS reduces...
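The abstract describes CLOUDS as sampling candidate splitting points for a numeric attribute and then running an estimation step so that only part of the search space is examined exactly. The Python sketch below illustrates that idea under our own assumptions and is not the paper's algorithm: boundaries are taken as quantiles of a random sample, the Gini index is the split criterion, and the estimation step is a corner-enumeration lower bound chosen here because the size-weighted Gini index is concave in the left-side class counts (so its minimum over a box of counts lies at a corner). The names `best_split_sse`, `gini_lower_bound`, `weighted_gini`, `q`, and `sample_size` are hypothetical, and points are kept in memory per interval for simplicity, whereas a large-dataset implementation would rescan the data.

```python
import bisect
import random
from itertools import product


def weighted_gini(left, total):
    """Size-weighted Gini index of the split (left, total - left)."""
    n = sum(total)
    right = [t - l for t, l in zip(total, left)]
    g = 0.0
    for part in (left, right):
        m = sum(part)
        if m > 0:  # an empty side contributes nothing
            g += (m / n) * (1.0 - sum((c / m) ** 2 for c in part))
    return g


def gini_lower_bound(before, interval, total):
    """Lower bound on weighted Gini for any split inside one interval.

    Assumption of this sketch (not necessarily the paper's estimator):
    the left-side counts lie in the box [before, before + interval],
    weighted Gini is concave there, so the minimum is at one of the
    2**k corners (k = number of classes).
    """
    best = float("inf")
    for corner in product((0, 1), repeat=len(total)):
        left = [b + z * c for b, z, c in zip(before, corner, interval)]
        best = min(best, weighted_gini(left, total))
    return best


def best_split_sse(values, labels, q=4, sample_size=100, seed=0):
    """Hypothetical sampling-plus-estimation split search on one attribute.

    Returns (threshold, gini) for the best split "value <= threshold",
    or None if no threshold separates the data.
    """
    classes = sorted(set(labels))
    cidx = {c: i for i, c in enumerate(classes)}
    k, n = len(classes), len(values)

    # 1. Sample splitting points: q-quantiles of a random sample.
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, n)))
    bounds = sorted({sample[(j * len(sample)) // q] for j in range(1, q)})

    # 2. One pass: class counts per interval (bounds[i-1], bounds[i]].
    counts = [[0] * k for _ in range(len(bounds) + 1)]
    members = [[] for _ in range(len(bounds) + 1)]
    for v, y in zip(values, labels):
        i = bisect.bisect_left(bounds, v)
        counts[i][cidx[y]] += 1
        members[i].append((v, cidx[y]))
    total = [sum(col) for col in zip(*counts)]

    # 3. Exact Gini at every sampled boundary.
    best = None
    left = [0] * k
    for i, b in enumerate(bounds):
        left = [l + c for l, c in zip(left, counts[i])]
        if 0 < sum(left) < n:
            g = weighted_gini(left, total)
            if best is None or g < best[1]:
                best = (b, g)

    # 4. Estimation: scan only "alive" intervals whose bound beats best.
    before = [0] * k
    for i in range(len(counts)):
        if best is None or gini_lower_bound(before, counts[i], total) < best[1]:
            left = list(before)
            pts = sorted(members[i])
            for j, (v, c) in enumerate(pts):
                left[c] += 1
                last_of_value = j + 1 == len(pts) or pts[j + 1][0] != v
                if last_of_value and 0 < sum(left) < n:
                    g = weighted_gini(left, total)
                    if best is None or g < best[1]:
                        best = (v, g)
        before = [b + c for b, c in zip(before, counts[i])]
    return best
```

For example, `best_split_sse([1, 2, 3, 4, 5, 6, 7, 8], ["a"] * 4 + ["b"] * 4)` returns `(4, 0.0)`: the sampled boundaries 3, 5, and 7 each give a Gini of about 0.2 or more, and only the interval (3, 5] survives the estimation step and yields the pure split. Because the pruning test in this sketch uses a valid lower bound, it returns the same minimum Gini value as an exhaustive scan; it saves work only when many intervals are pruned.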

Context of citations to this paper:

...directly from this both approaches. Other approaches consider approximation techniques for scaling up the classification, e.g. sampling [1] and discretization, as well as permitting the user to specify constraints on tree size [7]. Particularly, approximation techniques...

...classifiers. In Section 1.1. we introduce and compare several state of the art classification algorithms, including SPRINT[17] and CLOUDS[14]. We present our CMP family classifiers in detail in Section 2. including the data structure, techniques to avoid accuracy loss, the...

Cited by:
Mining Data Streams Using Option Trees - Holmes, Kirkby, Pfahringer (2004)
Decision Trees: More Theoretical Justification for Practical.. - Pechyony (2004)
XRules: An Effective Structural Classifier for XML Data - Zaki, Aggarwal (2003)

Active bibliography (related documents):
0.7:   Tree-based Incremental Classification for Large Datasets - Yoon, Alsabti, Ranka (1999)
0.5:   SPRINT: A Scalable Parallel Classifier for Data Mining - Shafer, Agrawal, Mehta (1996)
0.2:   Predictive Modeling Based On Classification And Pattern Matching.. - Wang (1999)

Similar documents based on text:
0.6:   A One-Pass Algorithm for Accurately Estimating Quantiles.. - Alsabti, Ranka, Singh (1997)
0.5:   An Efficient Parallel Algorithm for High Dimensional.. - Alsabti, Ranka, Singh (1997)
0.4:   An Efficient K-Means Clustering Algorithm - Alsabti, Ranka, Singh (1998)

Related documents from co-citation:
8:   SPRINT: A scalable parallel classifier for data mining - Shafer, Agrawal et al. - 1996
7:   SLIQ: A fast scalable classifier for data mining - Mehta, Agrawal et al. - 1996
5:   Programs for machine learning - Quinlan - 1993

BibTeX entry:

K. Alsabti, S. Ranka, and V. Singh. CLOUDS: A decision tree classifier for large datasets. In 4th Intl. Conf. on Knowledge Discovery and Data Mining, Aug 1998. http://citeseer.ist.psu.edu/alsabti98clouds.html

@inproceedings{ alsabti98clouds,
    author = "Khaled Alsabti and Sanjay Ranka and Vineet Singh",
    title = "{CLOUDS}: A Decision Tree Classifier for Large Datasets",
    booktitle = "Knowledge Discovery and Data Mining",
    pages = "2-8",
    year = "1998",
    url = "citeseer.ist.psu.edu/alsabti98clouds.html" }
Citations (may not include all citations):
1051   Optimization and Machine Learning - Goldberg, in - 1989
667   UCI Repository of Machine Learning Databases - Murphy, Aha - 1994
417   Stochastic Complexity in Statistical Inquiry - Rissanen - 1989
281   Programs for Machine Learning - Quinlan - 1993
121   Classification and Regression Trees - Breiman, Friedman et al. - 1984
85   ChiMerge: Discretization of Numeric Attributes - Kerber - 1992
62   Megainduction: Machine Learning on Very Large Databases - Catlett - 1991
54   Meta-Learning for Multistrategy and Parallel Learning - Chan, Stolfo - 1993
35   MDL-Based Decision Tree Pruning - Mehta, Rissanen et al. - 1995
22   Neural and Statistical Classification - Michie, Spiegelhalter et al. - 1994
18   A One-Pass Algorithm for Accurately Estimating Quantiles for.. - Alsabti, Ranka et al. - 1997
17   SPRINT: A Scalable Parallel Classifier for Data Mining - Shafer, Agrawal et al. - 1996
10   SLIQ: A Fast Scalable Classifier for Data Mining - Mehta, Agrawal et al. - 1996
6   IEEE Transactions on Knowledge and Data Engineering - Liu, Setiono et al. - 1997
5   Computer Systems that Learn: Classification and Prediction Met.. - Weiss, Kulikowski - 1991
4   MINI: A Heuristic Algorithm for Generating Minimal Rules fro.. - Hong - 1994
2   Classification Algorithms - James - 1985
2   Introduction to IND Version - Research - 1992
1   Department of Computer Science and Engineering - Srivastava, Singh et al. - 1996

Documents on the same site (http://www.cise.ufl.edu/~ranka/dm.html):
Integer Sorting Algorithms for Coarse-Grained Parallel Machines - Alsabti, Ranka (1997)
Skew-Insensitive Parallel Algorithms for Relational Join - Alsabti, Ranka
An Efficient Parallel Algorithm for High Dimensional.. - Alsabti, Ranka, Singh (1997)

CiteSeer.IST - Copyright Penn State and NEC