by David W. Cheung, S. D. Lee, Yongqiao Xiao
Download: http://www.csis.hku.hk/~sdlee/publications/TKDE02.ps.gz
Abstract:
To mine association rules efficiently, we have developed a new parallel mining algorithm, FPM, for a distributed share-nothing parallel system in which data are partitioned across the processors. FPM is an enhancement of the FDM algorithm, which we proposed previously for distributed mining of association rules [8]. FPM requires fewer rounds of message exchange than FDM and hence has a better response time in a parallel environment. The algorithm has been experimentally found to outperform CD, a representative parallel algorithm for the same goal [2]. The efficiency of FPM is attributed to the incorporation of two powerful candidate set pruning techniques: distributed pruning and global pruning. The two techniques are sensitive to two data distribution characteristics, data skewness and workload balance. Metrics based on entropy are proposed for these two characteristics. The prunings are very effective when both the skewness and balance are high. In order to increase the efficiency of FPM, we have developed methods to partition a database so that the resulting partitions have high balance and skewness. Experiments have shown that our partitioning algorithms achieve these aims very well; in particular, the results are consistently better than a random partitioning. Moreover, the partitioning algorithms incur little overhead. So, using our partitioning algorithms and FPM together, we can mine association rules from a database efficiently.
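As a rough illustration of what an entropy-based skewness measure could look like, the sketch below scores a single itemset by how unevenly its support counts are spread across the partitions. The normalization (one minus entropy over maximum entropy) and the function name `itemset_skewness` are assumptions made for illustration only; the abstract does not state the paper's actual definitions.

```python
import math

def itemset_skewness(local_counts):
    """Return a score in [0, 1] for one itemset (hypothetical helper).

    local_counts[i] is the itemset's support count in partition i.
    0 means the support is spread evenly over all partitions;
    1 means all of it sits in a single partition.
    Assumption: this normalization is illustrative, not the paper's metric.
    """
    n = len(local_counts)
    total = sum(local_counts)
    if n < 2 or total == 0:
        return 0.0  # skewness is not meaningful here; report "no skew"
    # Fraction of the itemset's support held by each non-empty partition.
    probs = [c / total for c in local_counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(n)  # reached when every partition holds 1/n
    return 1.0 - entropy / max_entropy

even = itemset_skewness([25, 25, 25, 25])       # evenly spread support
halved = itemset_skewness([50, 50, 0, 0])       # support in two of four partitions
concentrated = itemset_skewness([100, 0, 0, 0]) # support in one partition
```

Under this assumed definition, a higher score means the itemset is concentrated in fewer partitions, which is the situation in which the abstract says the prunings are most effective.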
Citations
4592 | Elements of Information Theory – Cover, Thomas, 1991
1673 | Fast algorithms for mining association rules – Agrawal, Srikant
1505 | Mining Association rules between sets of items in large databases – Agrawal, Imielinski, et al., 1993
702 | Finding Groups in Data: an Introduction to Cluster Analysis – Kaufman, Rousseeuw, 1990
365 | Mining Generalized Association Rules – Srikant
350 | Dynamic itemset counting and implication rules for market basket data – Brin, Motwani, et al., 1997
314 | Mining sequential patterns: Generalizations and performance improvements – Srikant, Agrawal, 1996
303 | Discovery of Multiple-Level Association Rules from Large Databases – Han, Fu, 1995
296 | An efficient algorithm for mining association rules in large databases – Savasere, Omiecinski, et al., 1995
263 | Mining quantitative association rules in large relational tables – Srikant, Agrawal, 1996
182 | An effective hash-based algorithm for mining association rules – Park, Chen, et al., 1995
158 | Efficient algorithms for discovering association rules – Mannila, Toivonen, et al., 1994
151 | Maintenance of discovered association rules in large databases: An incremental updating technique – Cheung, Han, et al., 1996
131 | Scalable parallel data mining for association rules – Han, Karypis, et al., 1997
74 | A fast distributed algorithm for mining association rules – Cheung, Ng, et al., 1996
63 | Efficient Mining of Association Rules in Distributed Databases – Cheung, Ng, et al., 1996
55 | Parallel mining of association rules: Design, implementation, and experience – Agrawal, Shafer, 1996
54 | Parallel data mining for association rules on shared-memory systems – Parthasarathy, Zaki, et al., 2001
49 | Set-oriented Mining for Association Rules in Relational Databases – Houtsma, Swami, 1995
29 | Hash based parallel algorithms for mining association rules – Shintani, Kitsuregawa, 1996
10 | Efficient parallel mining for association rules – Park, Chen, et al., 1995
1 | Linear Programming and Network Models, Affiliated East-West Press Private Limited – Gupta, 1994
1 | Sampling large databases for mining association rules – Toivonen