
RainForest - A Framework for Fast Decision Tree Construction of Large Datasets (1998)  (31 citations)
Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti
Data Mining and Knowledge Discovery




Abstract: Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework for decision tree classifiers that separates the scalability aspects of algorithms for constructing a decision tree from the central features that determine the quality of the tree. This...

Cited by:
CrossMine: Efficient Classification Across Multiple Database .. - Xiaoxin Yin Uiuc (2004)
Mining Data Streams Using Option Trees - Holmes, Kirkby, Pfahringer (2004)
Trends in Data Mining and Knowledge Discovery - Kurgan (2005)

Similar documents (at the sentence level):
68.1%:   RainForest - A Framework for Fast Decision Tree.. - Gehrke, Ramakrishnan.. (1998)

Active bibliography (related documents):
0.9:   Automated Modeling and Nonlinear Axis Scaling - Leejay Wu (2005)
0.6:   Constructing Classification Trees with Exception Annotations for.. - Li (1999)
0.3:   Feature Selection for High-Dimensional Data: A.. - Biesiada, Duch

Similar documents based on text:
0.7:   DEMON: Mining and Monitoring Evolving Data - Ganti, Gehrke, Ramakrishnan (2000)
0.4:   Clustering Large Datasets in Arbitrary Metric Spaces - Venkatesh Ganti Raghu (1999)
0.3:   ICICLES: Self-tuning Samples for Approximate Query Answering - Ganti, Lee, Ramakrishnan (2000)

Related documents from co-citation:
21:   SPRINT: A scalable parallel classifier for data mining - Shafer, Agrawal et al. - 1996
17:   SLIQ: A fast scalable classifier for data mining - Mehta, Agrawal et al. - 1996
15:   Classification and Regression Trees - Breiman, Friedman et al. - 1984

BibTeX entry:

J. Gehrke, R. Ramakrishnan, and V. Ganti. RainForest - A framework for fast decision tree construction of large datasets. VLDB 1998. http://citeseer.ist.psu.edu/gehrke98rainforest.html

@article{ gehrke00rainforest,
    author = "Johannes Gehrke and Raghu Ramakrishnan and Venkatesh Ganti",
    title = "RainForest - A Framework for Fast Decision Tree Construction of Large Datasets",
    journal = "Data Mining and Knowledge Discovery",
    volume = "4",
    number = "2/3",
    pages = "127--162",
    year = "2000",
    url = "http://citeseer.ist.psu.edu/gehrke98rainforest.html" }
Citations (may not include all citations):
2177   C4.5: Programs for Machine Learning - Quinlan - 1993
1359   Induction of decision trees - Quinlan - 1986
1262   Classification and Regression Trees - Breiman, Friedman et al. - 1984
572   Computers and Intractability - Garey, Johnson - 1979
417   Stochastic Complexity in Statistical Inquiry - Rissanen - 1989
200   Machine Learning, Neural and Statistical Classification - Michie, Spiegelhalter et al. - 1994
171   Supervised and unsupervised discretization of continuous feat.. - Dougherty, Kohavi et al. - 1995
163   Learning efficient classification procedures - Quinlan - 1983
145   SPRINT: A scalable parallel classifier for data mining - Shafer, Agrawal et al. - 1996
112   Split selection methods for classification trees - Loh, Shih - 1997
111   SLIQ: A fast scalable classifier for data mining - Mehta, Agrawal et al. - 1996
102   Computer Systems that Learn: Classification and Prediction M.. - Weiss, Kulikowski - 1991
100   Database mining: A performance perspective - Agrawal, Imielinski et al. - 1993
95   An interval classifier for database mining applications - Agrawal, Ghosh et al. - 1992
62   Megainduction: Machine Learning on Very Large Databases - Catlett - 1991
61   Construction and Assessment of Classification Rules - Hand - 1997
60   Sampling-based estimation of the number of distinct values o.. - Haas, Naughton et al. - 1995
54   Meta-learning for multistrategy and parallel learning - Chan, Stolfo - 1993
45   Experiments on multistrategy learning by meta-learning - Chan, Stolfo - 1993
36   Discovering rules by induction from large collections of exa.. - Quinlan - 1979
35   MDL-based decision tree pruning - Mehta, Rissanen et al. - 1995
29   On the induction of decision trees for multiple concept learnin.. - Fayyad - 1991
27   Multivariate versus univariate decision trees - Brodley, Utgoff - 1992
25   PUBLIC: A Decision Tree Classifier that Integrates Pruning a.. - Rastogi, Shim - 1998
21   Efficient agnostic PAC-learning with simple hypotheses - Maass - 1994
21   Improved decision trees: A generalized version of ID3 - Cheng, Fayyad et al. - 1988
18   Tree-structured classification via generalized discriminant a.. - Loh, Vanichsetakul - 1988
14   Algorithms for Mining Association Rules for Binary Segmentat.. - Morimoto, Fukuda et al. - 1998
13   Constructing efficient decision trees by using optimized num.. - Fukuda, Morimoto et al. - 1996
10   Symbolic and neural learning algorithms: an empirical compar.. - Shavlik, Mooney et al. - 1991
9   Approximating the number of unique values of an attribute wi.. - Astrahan, Schkolnick et al. - 1987
8   Neural networks, decision tree induction and discriminant analysis: an empiri.. - Curram, Mingers - 1994
5   An empirical comparison of decision trees and other classifi.. - Lim, Loh et al. - 1997
3   A comparison of decision tree classifiers with backpropagation ne.. - Corruble, Brown et al. - 1993
3   Multi-interval discretization of continuous-valued attributes.. - Fayyad, Irani - 1993
2   The CHAID approach to segmentation modeling - Magidson - 1993





Documents on the same site (http://www.cs.toronto.edu/~miller/csc2525/papers/reading.html):
Query Flocks: A Generalization of Association-Rule Mining - Tsur, Ullman.. (1998)
Scaling EM (Expectation-Maximization) Clustering to Large.. - Bradley, Fayyad, Reina (1999)
STING+: An Approach to Active Spatial Data Mining - Wang, Yang, Muntz (1999)
