| M.J. Zaki, C.-T. Ho, and R. Agrawal, "Parallel Classification for Data Mining on Shared-Memory Multiprocessors," Proc. IEEE Int'l Conf. Data Eng., pp. 198-205, May 1999. |
.... machines, including association mining [1, 11, 12, 29], k-means clustering [9], and decision tree classifiers [3, 10, 15, 24, 26]. Recent efforts have also focused on shared memory parallelization of data mining algorithms, including association mining [28, 19, 20] and decision tree construction [27]. Our work is significantly different, because we offer an interface and runtime support to parallelize a number of data mining algorithms. Our shared memory parallelization techniques are also significantly different, because we focus on a common framework for parallelization of a number of ....
M. J. Zaki, C.-T. Ho, and R. Agrawal. Parallel classification for data mining on shared-memory multiprocessors. IEEE International Conference on Data Engineering, pages 198--205, May 1999.
.... have been recently considered for tasks such as association rules and classification, see, for example, [Agrawal and Shafer 1996], [Chattratichat et al. 1997], [Cheung and Xiao 1999], [Han et al. 1997], [Joshi et al. 1998], [Kargupta et al. 1997], [Shafer et al. 1996], [Srivastava et al. 1998], [Zaki et al. 1998], and [Zaki et al. 1997]. Also, see [Stolorz and Musick 1997] and [Freitas and Lavington 1998] for recent books on scalable and parallel data mining. In this paper, we consider parallel clustering. Clustering or grouping of similar objects is one of the most widely used procedures in data mining ....
M. J. Zaki, C. T. Ho, and R. Agrawal. Parallel classification for data mining on shared-memory multiprocessors. Technical report, IBM Almaden Research Center, 1998.
....classification model [14, 16]. Because of large memory and computation requirements, parallelization is a viable approach for handling large datasets. Several parallel decision tree algorithms have been proposed for distributed memory [6, 8, 14, 16, 19, 18, 21] and shared-memory parallel machines [11, 14, 22, 23]. Clusters of SMPs, in which shared memory nodes with small number of processors (e.g. 2-8 processors) are connected together with a high speed interconnect, have become a popular alternative to distributed memory and shared memory machines. An increasing number of SMP clusters are being ....
....can be employed. In this paper, we present a parallel decision tree algorithm designed for an implementation on a cluster of SMPs. The proposed algorithm employs a hybrid approach, based on the SPRINT distributed memory algorithm [16] and the BASIC shared-memory algorithm [23]. The training dataset is statically partitioned across the SMP nodes so that each SMP node carries out tree construction using a subset of the records in the dataset. Within each SMP node, on the other hand, tasks associated with an attribute are dynamically scheduled to the lightweight threads ....
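The hybrid scheme described in the excerpt above (static partitioning of records across SMP nodes, dynamic scheduling of per-attribute tasks to threads within a node) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the cited paper's implementation: the function names `partition_records`, `class_counts_for_attribute`, and `process_node` are hypothetical, the per-attribute task is reduced to simple class counting, and a Python thread pool stands in for the lightweight threads.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_records(records, num_nodes):
    """Static partitioning: each node gets a fixed, contiguous slice of the records."""
    chunk = (len(records) + num_nodes - 1) // num_nodes  # ceiling division
    return [records[i * chunk:(i + 1) * chunk] for i in range(num_nodes)]


def class_counts_for_attribute(local_records, attr_index):
    """Hypothetical per-attribute task: count (attribute value, class label) pairs."""
    counts = {}
    for row in local_records:
        key = (row[attr_index], row[-1])  # assumption: last column is the class label
        counts[key] = counts.get(key, 0) + 1
    return counts


def process_node(local_records, num_attributes, num_threads):
    """Dynamic scheduling: idle threads pull the next attribute task from the pool's queue."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(class_counts_for_attribute, local_records, a)
            for a in range(num_attributes)
        ]
        return [f.result() for f in futures]  # results in attribute order


# Toy dataset: two attributes followed by a class label (made up for illustration).
records = [(1, 0, "yes"), (1, 1, "no"), (0, 1, "no"), (0, 0, "yes")]
parts = partition_records(records, num_nodes=2)
per_node = [process_node(p, num_attributes=2, num_threads=2) for p in parts]
```

In a real cluster the nodes would run concurrently and exchange their partial counts; here they are processed one after another to keep the sketch self-contained.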
M. J. Zaki, C.-T. Ho, and R. Agrawal. Parallel classification for data mining on shared-memory multiprocessors. In IEEE International Conference on Data Engineering, pages 198--205, Sydney, Australia, Mar 1999.
.... rules and classification, see, for example, Agrawal and Shafer [1], Chattratichat et al. [6], Cheung and Xiao [8], Han, Karypis, and Kumar [22], Joshi, Karypis, and Kumar [24], Kargupta, Hamzaoglu, and Stafford [25], Shafer, Agrawal, and Mehta [32], Srivastava et al. [38], Zaki, Ho, and Agrawal [41], and Zaki et al. [42]. Also, see Stolorz and Musick [39] and Freitas and Lavington [17] for recent books on scalable and parallel data mining. In this paper, we consider parallel clustering. Clustering or grouping of similar objects [23] is one of the most widely used procedures in data mining ....
M. J. Zaki, C. T. Ho, and R. Agrawal. Parallel classification for data mining on shared-memory multiprocessors. Technical report, IBM Almaden Research Center, 1998.
....The same program ran for an input size of 100K in 155 seconds on a 80MHz PowerPC processor. Note that these numbers are competitive with previous decision tree building implementations [33, 22] on data generated by a function slightly simpler than pred7. For example, Zaki et al. [39] report a running time of around 1750 seconds for 1000K input instances of a synthetic dataset generated using the simpler function on a 112MHz PowerPC. Recall that we switch to serial recursion in our code once the number of examples falls below 4000. This ensures that thread overheads are under ....
....parallel version of SPRINT for distributed memory machines [33], processors synchronously construct each node of the tree. Each of the P processors begins with 1/P of the data instances, but the partitioning can soon become highly unbalanced. Explicit load balancing involves significant complexity [39]. Further, SPRINT uses per-processor hash tables, and these hash tables are large near the top of the tree; therefore, the memory requirement of SPRINT increases with the number of processors. Parallel algorithms based on SPRINT have also been proposed for shared memory SMPs [39]. One of these ....
M. J. Zaki, C. T. Ho, and R. Agrawal. Parallel classification for data mining on shared-memory multiprocessors. Technical report, IBM Research Report, 1998.
M. J. Zaki, C.-T. Ho, and R. Agrawal. Parallel classification for data mining on shared-memory multiprocessors. IEEE International Conference on Data Engineering, pages 198--205, May 1999.
....with. Fortunately, novel applications of parallel computing techniques should assist in solving these large problems in a timely fashion. Parallel KDD (PKDD) techniques are not currently that common, though recent algorithmic advances seek to address these problems (Freitas and Lavington 1998; Zaki 1999; Zaki and Ho 2000; Kargupta and Chan 2000). However, there has been no work in designing and implementing large scale parallel KDD systems, which must not only support the mining algorithms, but also the entire KDD process, including the pre-processing and post-processing steps (in fact, it has ....
.... the benefits of parallelism for many of the common data mining tasks including association rules (Agrawal and Shafer 1996; Cheung et al. 1996; Han et al. 1997; Zaki et al. 1997), sequential patterns (Shintani and Kitsuregawa 1998; Zaki 2000), classification (Shafer et al. 1996; Joshi et al. 1998; Zaki et al. 1999; Sreenivas et al. 1999), regression (Williams et al. 2000), and clustering (Judd et al. 1996; Dhillon and Modha 2000; S. Goil and Choudhary 1999). The typical trend in parallel mining is to start with a sequential method and pose various parallel formulations, implement them, and conduct a ....
M. J. Zaki, C.-T. Ho, and R. Agrawal. Parallel classification for data mining on shared-memory multiprocessors. In Int'l Conf. on Data Engineering, March 1999.
....specifically SPRINT, is built on a uniprocessor machine. Section 3 describes our new SMP algorithms based on various data and task parallelization schemes. We give experimental results in Section 4 and conclude with a summary in Section 5. A more detailed version of this paper appears in [14]. 2 Serial Classification Each node in a decision tree classifier is either a leaf, indicating a class, or a decision node, specifying some test on one or more attributes, with one branch or subtree for each of the possible outcomes of the split test. Decision trees successively divide the set ....
M. Zaki, C-T. Ho, R. Agrawal. Parallel Classification for Data Mining on Shared-Memory Multiprocessors. IBM Technical Report, 1998. Available from www.almaden.ibm.com/cs/quest/publications.html.
Zaki M. J., Ho C.T., Agrawal R.: "Parallel classification for data mining on shared-memory multiprocessors." Technical report, IBM Almaden Research Center (1998)