Khaled Alsabti, Sanjay Ranka, and Vineet Singh. CLOUDS: A decision tree classifier for large datasets. In Knowledge Discovery and Data Mining, pages 2--8, 1998.
....the feature values are known while the class value is unknown. The training model is used to predict the class variable for such test instances. The classification problem has been widely studied by the database, data mining, and machine learning communities [1, 4, 7, 10, 11, 12, 14, 15, 16]. However, most such methods have been developed for general multidimensional records. For a particular data domain such as strings or text [1, 17], classification models specific to these domains turn out to be most effective. In recent years, XML has become a popular way of storing many data ....
....bound is the position (l) of node n_l, and the upper bound is the position (r) of node n_r. Figure 1 shows a database of 3 trees with 2 classes; for each tree it shows the node number n_i, the node scope [l, u], and the node label (inside the circle). [Figure 1: per-node numbers, scopes, and labels not recoverable from extraction] ....
[Article contains additional citation context not shown here]
K. Alsabti, S. Ranka, V. Singh. CLOUDS: A Decision Tree Classifier for Large Datasets. SIGKDD, 1998.
....other methods is given in Section 3. The rest of the paper is organized as follows. In the next subsection we provide a simple description of decision tree classifiers. In Section 1.1 we introduce and compare several state-of-the-art classification algorithms, including SPRINT [17] and CLOUDS [14]. We present our CMP family of classifiers in detail in Section 2, including the data structure, techniques to avoid accuracy loss, the prediction method, and multivariate splitting criteria. A comparison of performance in various aspects is given in Section 3. Section 4 concludes our work. 1.1. ....
....[fragment of Formula 4 not recoverable from extraction] (4) is estimated using the value of the gradient in a hill-climbing manner: at each point, starting from the left boundary of the interval, we always choose the class i that makes Formula 4 minimal. It is proved in [14] that we do not need to evaluate Formula 4 at each point in the interval. In fact, if class i is chosen at the left boundary, then the next point at which we need to evaluate the gradient is the c_i-th point, where c_i is the number of points of class i in that interval. Then, if class j ....
Khaled Alsabti, Sanjay Ranka, Vineet Singh. "CLOUDS: A Decision Tree Classifier for Large Datasets." KDD 1998: 2-8
....staging. Whereas the RainForest framework does not address SQL databases, the middleware is implemented on a commercial DBMS. Our COMPUTENODESTATICTICS is derived directly from both of these approaches. Other approaches consider approximation techniques for scaling up the classification, e.g. sampling [1] and discretization, as well as permitting the user to specify constraints on tree size [7]. In particular, approximation techniques could be supported very well by database systems and could thus lead to further primitives. The integration of data mining and database systems resulting in ....
K. Alsabti, S. Ranka, and V. Singh. CLOUDS: A Decision Tree Classifier for Large Datasets. In R. Agrawal, P. Stolorz, and G. Piatetsky-Shapiro, editors, Proc. KDD-98, New York City, New York, 1998.
....node: finding the best split point and performing the split. 2.1 Finding the Best Split Point The best split point at a tree node is the one that best separates the class labels in the portion of the training set associated with that tree node. Many classification algorithms use the Gini index [2, 16] to calculate the goodness value of a split point. At a leaf node, the algorithm scans the portion of the attribute lists associated with the tree node and finds a split point for each attribute list that minimizes the Gini index for that attribute. After scanning all the attribute lists, the ....
K. Alsabti, S. Ranka, and V. Singh. CLOUDS: A decision tree classifier for large datasets. In The 4th International Conference on Knowledge Discovery and Data Mining, 1998.
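The split search described in the context above (scan an attribute list and keep the split point that minimizes the Gini index) can be sketched roughly as follows. This is a minimal illustration of an exhaustive scan over one numeric attribute, not the sampling and estimation method of CLOUDS or the citing paper's algorithm. The function names `gini` and `best_split`, the `(value, class_label)` record layout, and the choice of the midpoint between adjacent values as the threshold are all assumptions made for the example.

```python
from collections import Counter


def gini(counts, total):
    """Gini index 1 - sum(p_i^2) for a mapping of class label -> count."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(attribute_list):
    """Scan one numeric attribute list of (value, class_label) pairs.

    Returns (threshold, weighted_gini) for the test 'value <= threshold'
    with the smallest weighted Gini index, or (None, inf) if every record
    has the same attribute value and no split is possible.
    """
    records = sorted(attribute_list, key=lambda r: r[0])
    n = len(records)
    left = Counter()                                  # class counts left of the split
    right = Counter(label for _, label in records)    # class counts right of the split
    best_threshold, best_gini = None, float("inf")

    for i in range(n - 1):
        value, label = records[i]
        left[label] += 1
        right[label] -= 1
        next_value = records[i + 1][0]
        if value == next_value:
            continue  # cannot split between two equal attribute values
        n_left, n_right = i + 1, n - (i + 1)
        weighted = (n_left / n) * gini(left, n_left) + (n_right / n) * gini(right, n_right)
        if weighted < best_gini:
            # Assumption: place the threshold halfway between adjacent values.
            best_threshold, best_gini = (value + next_value) / 2, weighted
    return best_threshold, best_gini


if __name__ == "__main__":
    data = [(1, "a"), (2, "a"), (3, "b"), (4, "b")]
    print(best_split(data))
```

Running the same scan over every attribute list at a node and keeping the attribute with the smallest result would give the node's split under these assumptions.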
....[remainder of Formula 4 not recoverable from extraction] (4) The lower bound gini^est is estimated using the value of the gradient in a hill-climbing manner: at each point, starting from the left boundary of the interval, we always choose the class i that makes Formula 4 minimal. It is proved in [14] that we do not need to evaluate Formula 4 at each point in the interval. In fact, if class i is chosen at the left boundary, then the next point at which we need to evaluate the gradient is the c_i-th point, where c_i is the number of points of class i in that interval. Then, if ....
[Article contains additional citation context not shown here]
Khaled Alsabti, Sanjay Ranka, Vineet Singh. "CLOUDS: A Decision Tree Classifier for Large Datasets." KDD 1998: 2-8
....An object is classified by selecting the best rule according to user-defined accuracy and statistical significance criteria. Later examples of classification techniques from the literature include Zhang and Michalski's FCLS [71], Gur Ali and Wallace's PrIL [24], Mehta et al.'s SLIQ [42], and CLOUDS [7]. FCLS induces a weighted threshold rule. The threshold determines the number of conditions which must be satisfied in a valid rule. An object is classified by generalizing and specializing examples until the number of incorrectly classified examples is below some user-defined error rate. PrIL ....
K. Alsabti, S. Ranka, and V. Singh. Clouds: a decision tree classifier for large datasets. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD'98), pages 2--8, New York, New York, August 1998.
....massive training sets usual in data mining. Developing classification models using larger training sets can enable the development of higher accuracy models. Various studies have confirmed this [5, 6]. Recent classifiers that can handle disk-resident data include SLIQ [9], SPRINT [12], and CLOUDS [3]. As data continue to grow in size and complexity, high-performance scalable data mining tools must necessarily rely on parallel computing techniques. Past research on parallel classification has focussed on distributed memory (also called shared-nothing) machines. Examples include parallel ....
K. Alsabti, S. Ranka, and V. Singh. CLOUDS: A decision tree classifier for large datasets. In 4th Intl. Conf. on Knowledge Discovery and Data Mining, Aug 1998.
No context found.
Khaled Alsabti, Sanjay Ranka, and Vineet Singh. CLOUDS: A decision tree classifier for large datasets. In Knowledge Discovery and Data Mining, pages 2--8, 1998.
No context found.
K. Alsabti, S. Ranka and V. Singh. CLOUDS: A Decision Tree Classifier for Large Datasets. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pages 2-8, 1998.