| D. W.-L. Cheung, V. Ng, A. W.-C. Fu, and Y. Fu, "Efficient mining of association rules in distributed databases," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 911--922, Dec. 1996. |
....support the rule, and the total count of all transactions at the site. From this, we can compute the global support of each rule, and (from the lemma) be certain that all rules with support at least k have been found. More thorough studies of distributed association rule mining can be found in [2, 3]. The above approach protects individual data privacy, but it does require that each site disclose what rules it supports, and how much it supports each potential global rule. What if this information is sensitive? For example, suppose the Centers for Disease Control (CDC), a public agency, would ....
D. W.-L. Cheung, V. Ng, A. W.-C. Fu, and Y. Fu, "Efficient mining of association rules in distributed databases," IEEE Trans. Knowledge Data Eng., vol. 8, no. 6, pp. 911--922, Dec. 1996. [Online]. Available: http://ieeexplore.ieee.org
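The aggregation step described in the excerpt above (each site reports how many of its transactions support a rule, together with its total transaction count) can be sketched as follows. This is a minimal illustration, not the cited authors' protocol: the function name `global_support`, the report layout, and the sample numbers are assumptions made for the example.

```python
from fractions import Fraction


def global_support(site_reports):
    """Combine per-site reports into a global support fraction.

    site_reports: list of (local_support_count, local_transaction_count)
    pairs, one per site (hypothetical layout chosen for this sketch).
    """
    total_support = sum(count for count, _ in site_reports)
    total_transactions = sum(size for _, size in site_reports)
    if total_transactions == 0:
        raise ValueError("no transactions reported")
    # Exact rational arithmetic avoids rounding issues at the threshold.
    return Fraction(total_support, total_transactions)


# Three hypothetical sites reporting on the same rule.
reports = [(30, 100), (5, 50), (25, 100)]
support = global_support(reports)
min_support = Fraction(20, 100)  # assumed threshold, for illustration only
print(support, support >= min_support)
```

Note that the second site supports the rule in only 10% of its transactions, yet the rule still clears the assumed 20% threshold globally, which is why the counts rather than local yes/no decisions have to be shared.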
....local; $G_{F_2}$; $Processors_i$) Algorithm 16 Redistribute DB 1: for all $X \in T_{local}$ do 3: $P_i \leftarrow Parts[i]$ 4: $X_i \leftarrow X \cap P_i$ 5: if $|X_i| \geq 2$ then 6: $D(pid, i) \leftarrow D(pid, i) \cup X_i$ 4.4.3 Distributed Graph for $G_{F_2}$ and Local Pruning Local pruning, which is proposed in [18], can dramatically reduce the number of candidate patterns and the communication volume in Candidate Distribution-based parallel frequency mining algorithms. It has been used in the FDM algorithm for designing an algorithm suitable for a distributed system. The main observation in local pruning may be ....
D. W. Cheung, V. T. Ng, A. W. Fu, and Y. J. Fu. Efficient mining of association rules in distributed databases. IEEE Trans. on Knowledge and Data Engineering, 8:911-922, 1996.
....Distributed and parallel data mining systems are more often experimental systems than commercial products. Because of this, most of them focus on only one particular data mining problem. Papyrus and JAM are distributed classifiers. Other problems, such as association calculation [1, 3, 7] or sequential patterns [12, 13], also have distributed algorithms. Each of these algorithms deals with different factors of: i) data distribution (horizontal or vertical) ii) communication schemas (broadcast or unicast) iii) synchronization and coordination (collaborative, centralized control, ....
D. W. Cheung, V. T. Ng, A. W. Fu, and Y. J. Fu. Efficient mining of association rules in distributed databases. IEEE Transactions on Knowledge and Data Engineering, 8:911-922, December 1996.
....for constructing and searching hash trees, please refer to [14] (note: the second iteration is optimized to directly use arrays for counting the support of 2-sequences, instead of using hash trees). 3.2. Parallel Algorithms While parallel association mining has attracted wide attention [1, 3, 4, 10, 12, 18, 19], there has been relatively less work on parallel mining of sequential patterns. Three parallel algorithms based on GSP were presented in [13]. All three approaches partition the datasets into equal-sized blocks among the nodes. In NPSPM, the candidate sequences are replicated on all the ....
D. Cheung, V. Ng, A. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. In IEEE Trans. on Knowledge and Data Engg., pages 8(6):911-922, 1996.
....and prediction, university enrollments, etc. Due to the huge size of data and amount of computation involved in data mining, parallel computing is a crucial component for any successful large-scale data mining application. Past research on parallel association mining [34, 3, 14, 20, 49] has focused on distributed-memory (also called shared-nothing) parallel machines. In such a machine, each processor has private memory and local disks, and communicates with other processors only via passing messages. Parallel distributed memory machines are essential for scalable massive ....
....local database portion is still scanned in every iteration. Count Distribution was shown to have superior performance among these three algorithms [3]. Other parallel algorithms improving upon these ideas in terms of communication efficiency or aggregate memory utilization have also been proposed [14, 12, 20]. The PDM algorithm [34] presents a parallelization of the DHP algorithm [33]. The hash-based parallel algorithms NPA, SPA, HPA, and HPA-ELD, proposed in [37], are similar to those in [3]. Essentially, NPA corresponds to Count Distribution, SPA to Data Distribution, and HPA to Candidate ....
D. Cheung, V. Ng, A. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. In IEEE Trans. on Knowledge and Data Engg., pages 8(6):911-922, 1996.
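The Count Distribution idea named in the excerpt above, in which every processor holds the full candidate set, counts it against its own database partition, and the local counts are then summed, might be simulated in a single process as below. This is a rough sketch under stated assumptions: the loop over `partitions` stands in for separate processors and for the count exchange, and the names `local_counts` and `count_distribution_pass` are hypothetical, not taken from the cited papers.

```python
from collections import Counter


def local_counts(candidates, partition):
    """Count each candidate itemset against one processor's local partition."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # subset test: candidate is contained
                counts[candidate] += 1
    return counts


def count_distribution_pass(candidates, partitions, min_count):
    """One pass: sum the per-partition counts, keep candidates meeting min_count."""
    total = Counter()
    for partition in partitions:  # each iteration stands in for one processor
        total.update(local_counts(candidates, partition))
    return {c: n for c, n in total.items() if n >= min_count}


candidates = [frozenset("ab"), frozenset("ac")]
partitions = [
    [{"a", "b"}, {"a", "b", "c"}],  # hypothetical data on processor 0
    [{"a", "b"}, {"b", "c"}],       # hypothetical data on processor 1
]
frequent = count_distribution_pass(candidates, partitions, min_count=2)
print(frequent)
```

Only counts cross processor boundaries in this scheme; the transactions themselves stay local, which is consistent with the excerpt's remark that each local database portion is scanned in every iteration.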
....their statistical representations. A k-means clustering algorithm for distributed environments was reported in [10]. This algorithm notes the inherent data parallelism in the k-means algorithm and asymptotically approaches near-optimal performance. The Fast Distributed Mining (FDM) algorithm [8] can be used for mining association rules from distributed, homogeneous data sets. FDM notes that in a distributed environment, every globally large itemset must be locally large at one or more sites. It explores this relationship between locally and globally large itemsets in order to minimize ....
D. Cheung, V. Ng, A. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. IEEE Transactions on Knowledge and Data Engineering, 8(6):911-922, 1996.
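The property quoted in the excerpt above (every globally large itemset must be locally large at one or more sites) suggests restricting global candidates to the union of the locally large itemsets. The sketch below illustrates only that property by brute force; it is not the FDM algorithm itself, and the function names, the use of one shared fractional threshold `min_support` at every site, and the sample data are assumptions made for the example.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def locally_large(transactions, k, min_support):
    """All k-itemsets whose local support fraction is at least min_support."""
    items = sorted(set().union(*transactions))
    threshold = min_support * len(transactions)
    return {
        frozenset(c)
        for c in combinations(items, k)
        if support_count(frozenset(c), transactions) >= threshold
    }


def globally_large(sites, k, min_support):
    """Check only itemsets that are locally large at some site."""
    candidates = set().union(*(locally_large(db, k, min_support) for db in sites))
    total = sum(len(db) for db in sites)
    return {
        c
        for c in candidates
        if sum(support_count(c, db) for db in sites) >= min_support * total
    }


sites = [
    [{"a", "b"}, {"a", "b"}, {"c"}],       # hypothetical site 1
    [{"a", "c"}, {"b", "c"}, {"a", "b"}],  # hypothetical site 2
]
print(globally_large(sites, k=2, min_support=0.5))
```

The restriction loses nothing: if an itemset fell below the threshold fraction at every site, its summed count would fall below the same fraction of the combined database, so it could not be globally large.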
No context found.
D. W.-L. Cheung, V. Ng, A. W.-C. Fu, and Y. Fu, "Efficient mining of association rules in distributed databases," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 911--922, Dec. 1996.