Jong Soo Park, Ming-Syan Chen, and Philip S. Yu. Efficient parallel data mining for association rules. In CIKM '95: Proc. of the fourth international conference on Information and knowledge management, pages 31--36, New York, NY, USA, 1995. ACM Press.
....generated (l-1)-length frequent itemsets. The GSP [27] algorithm extended the Apriori-like level-wise mining method to find frequent patterns in sequential datasets. The basic level-wise algorithm has been extended in a number of different ways, leading to more efficient algorithms such as DHP [19, 18], Partition [22], SEAR and Spear [16], and DIC [4]. An entirely different approach for finding frequent itemsets and sequences are the equivalence-class-based algorithms Eclat [32] and SPADE [31], which break the large search space of frequent patterns into small and independent chunks and use a ....
J.S. Park, M.S. Chen, and P.S. Yu. Efficient parallel data mining for association rules. In Proceedings of the 4th Int'l Conf. on Information and Knowledge Management, 1995.
....a substantial amount of time. A number of efficient and scalable parallel formulations have been developed for finding frequent itemsets and sequences that are based on the candidate generation and counting framework [3, 18, 22, 16, 4], both for shared and distributed memory parallel computers [2, 22, 17, 8, 25, 29, 20]. However, the problem of parallelizing equivalence-class-based and projection-based algorithms has received relatively little attention, and existing parallel formulations for them have been targeted only toward shared memory architectures [30, 28]. However, the irregular and unstructured nature ....
....original database. The basic ideas in this algorithm were recently used to develop a similar algorithm for finding sequential patterns [21]. A number of parallel frequent itemset discovery algorithms have been developed that focus on parallelizing the various serial algorithms for that problem [2, 22, 17, 8, 25, 29, 20, 28]. Depending on the nature of the underlying serial algorithm, many of these approaches follow the same parallelization strategy. Providing exact details on all of these algorithms is beyond the scope of this section, and for this reason we only focus on the various issues involved in parallelizing ....
J. Park, M. Chen, and P. Yu. An efficient parallel data mining for association rules. In Proceedings of the 4th International Conference on Information and Knowledge Management, 1995.
....generated (l-1)-length frequent itemsets. The GSP [20] algorithm extended the Apriori-like level-wise mining method to find frequent patterns in sequential databases. The basic level-wise algorithm has been extended in a number of different ways, leading to more efficient algorithms such as DHP [14, 13], Partition [19], SEAR and Spear [12], and DIC [5]. An entirely different approach for finding frequent itemsets and sequences are the equivalence-class-based algorithms Eclat [26] and SPADE [24], which break the large search space of frequent patterns into small and independent chunks and use a ....
J.S. Park, M.S. Chen, and P.S. Yu. Efficient parallel data mining for association rules. In Proceedings of the 4th Int'l Conf. on Information and Knowledge Management, 1995.
....per node decrease, the producer thread consumes fewer cycles, resulting in more substantial performance gains with the use of the fourth consumer thread. 5 Related Work We now compare our work with related research efforts. Parallelization of association mining techniques is a well studied area [1, 10, 13, 14, 19, 20, 21, 24, 31, 29]. Our work is unique in considering hierarchical parallelism on disk resident datasets, and using a middleware to implement the algorithm without low level parallel programming. Our method for shared memory parallelization is significantly different from the existing parallel association mining ....
J. S. Park, M. Chen, and P. S. Yu. Efficient parallel data mining for association rules. In ACM Intl. Conf. Information and Knowledge Management, November 1995.
....common sense knowledge; however, there could be a lot of associations that cannot be deduced from common knowledge [11]. As databases grow larger, a better way is to mine association rules in parallel. Several parallel association rule mining algorithms have been proposed [3, 8]; Count Distribution and Data Distribution are two of them. In the Count Distribution algorithm, local counts are calculated by individual processors first. To calculate the global counts, each processor needs to communicate with every other processor. In the Data Distribution algorithm, each processor ....
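The Count Distribution step described in this context can be illustrated with a small single-process sketch. It is only an approximation under stated assumptions: the partitions stand in for processors, summing the per-partition counters stands in for the processor-to-processor communication, and the names `local_counts` and `count_distribution` are hypothetical rather than taken from the cited papers.

```python
from collections import Counter


def local_counts(transactions, candidates):
    """Count, within one partition, how many transactions contain each candidate."""
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def count_distribution(partitions, candidates, min_support):
    """Simulate one Count Distribution pass over already-partitioned data.

    Every simulated processor counts the full candidate set on its own
    partition; the global count is the sum of the local counts.
    """
    per_processor = [local_counts(part, candidates) for part in partitions]
    global_counts = Counter()
    for counts in per_processor:
        # Stand-in for the exchange of local counts between processors.
        global_counts.update(counts)
    return {c: n for c, n in global_counts.items() if n >= min_support}
```

Note that every simulated processor holds the complete candidate set, which is the property that the hash-partitioned approaches mentioned elsewhere on this page try to avoid.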
J. S. Park, M.-S. Chen, and P. S. Yu, "Efficient Parallel Data Mining for Association Rules," Proc. of Int'l Conf. of Information and Knowledge Management, Baltimore, Nov. 1995.
....data mining enables us to find out useful and invaluable information from huge databases. Mining of association rules is a research topic that has received much attention among the various data mining problems. Many interesting works have been published recently on this problem and its variations [10, 1, 4, 3, 6, 7, 8, 9, 11]. The retail industry provides a classic example application. Typically, a sales database of a supermarket stores, for each transaction, all the items that are bought in that transaction, together with other information such as the transaction time, customer id, etc. The association rule mining ....
J. S. Park, M. S. Chen, and P. S. Yu. Efficient parallel data mining for association rules. In Proc. 1995 International Conference on Information and Knowledge Management, Baltimore, MD, November 1995.
....association rules has received a great deal of attention in the past several years, and a number [2, 6, 9, 11] of efficient algorithms have been proposed to approach this problem. As the size of a database to be mined can be very large, parallel computation techniques have also been explored [1, 10]. Consider that in a distributed organization, the database may be allocated through a computer network. This leads to a real demand for developing distributed computation techniques in data mining. In this paper, we shall restrict ourselves to an investigation of mining association rules in a ....
....for developing distributed computation techniques in data mining. In this paper, we shall restrict ourselves to an investigation of mining association rules in a distributed database. In [9] an efficient distributed algorithm is proposed. It should be clear that the parallel algorithms developed in [1, 10] can be immediately used as distributed algorithms. Comparing the distributed algorithm DMA [4] with an implementation of the parallel algorithm CD [1] in a distributed environment, DMA is more efficient than CD because a reduction of both network communication and local processing costs has been ....
J.S. Park, M.S. Chen, and P.S. Yu, Efficient Parallel Data Mining for Association Rules, Proc. Int'l Conf. Information and Knowledge Management, 1995.
....at most two passes over the database and uses a probabilistic sample (subset) of transactions to produce association rules which are then verified to hold for all transactions. There have been some efforts to use parallel algorithms for finding frequent itemsets [AS96, CHN 96, CNFF96, HKK97, PCY95b]. 2.2 Discovering Association Rules. Only a few proposals can be found in the literature that deal with the creation of association rules, because the performance of the itemset generation step dominates the performance of the overall algorithm. The assumption is that the size of transaction data ....
Jong Soo Park, Ming-Syan Chen, and Philip S. Yu. Efficient Parallel Data Mining for Association Rules. In Proceedings of the 1995 International Conference on Information and Knowledge Management, Baltimore, Maryland, USA, November 1995.
....HPA (Hash Partitioned Apriori) [4]. Association rules are the rules about what items are bought together within a transaction. HPA partitions not only the transaction database but also the candidate itemsets among the nodes. Several other parallel algorithms for mining association rules have been introduced [5 8]. However, most of these algorithms do not partition the candidate itemsets. Among them, Data Distribution [7], IDD (Intelligent Data Distribution), and HD (Hybrid Distribution) [9] partition the candidate sequences. Since these algorithms exchange all the transaction data among the nodes, a large ....
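The idea of partitioning candidate itemsets among nodes by a hash function, as this context attributes to HPA, can be sketched as follows. This is an illustrative single-process simulation and not the published algorithm: the CRC32-based `owner` function, the in-memory "send" of itemsets, and the function name `hash_partitioned_pass` are all assumptions made for the example.

```python
import zlib
from collections import Counter
from itertools import combinations


def owner(itemset, num_nodes):
    """Map an itemset (a sorted tuple of item names) to the node that owns it."""
    key = ",".join(itemset).encode()
    return zlib.crc32(key) % num_nodes  # deterministic, unlike built-in hash()


def hash_partitioned_pass(partitions, candidates, k):
    """Simulate one counting pass with hash-partitioned candidates.

    Each node stores only the candidates that hash to it. Every k-subset
    generated from a local transaction is routed to its owner node, which
    increments the count if the subset is one of its candidates.
    """
    num_nodes = len(partitions)
    owned = [set() for _ in range(num_nodes)]
    for candidate in candidates:
        owned[owner(candidate, num_nodes)].add(candidate)

    tables = [Counter() for _ in range(num_nodes)]
    for local_transactions in partitions:
        for transaction in local_transactions:
            for subset in combinations(sorted(set(transaction)), k):
                dest = owner(subset, num_nodes)  # stand-in for a network send
                if subset in owned[dest]:
                    tables[dest][subset] += 1
    return tables
```

In this sketch no node ever holds the full candidate set; the trade-off is that itemsets, rather than counts, cross node boundaries.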
J.S.Park, M.-S.Chen, and P.S.Yu. Efficient parallel data mining for association rules. In Proc. of the 4th Conf. on Information and Knowledge Management, pages 31--36, November 1995.
....typically consist of a transaction identifier and the bought items per transaction. By analyzing transaction data, we can extract an association rule such as "90% of the customers who buy both A and B also buy C". Several algorithms have been proposed to solve the above problem [1][2][3][4][5][6][7]. However, most of these are sequential algorithms. Finding association rules requires scanning the transaction database repeatedly. In order to improve the quality of the rule, we have to handle very large amounts of transaction data, which requires incredibly long computation time. In general, ....
....of the rule, we have to handle very large amounts of transaction data, which requires incredibly long computation time. In general, it is difficult for a single processor to provide reasonable response time. In [7] we examined the feasibility of parallelization of association rule mining. In [6], a parallel algorithm called PDM for mining association rules was proposed. PDM copies the candidate itemsets among all the processors. As we will explain later, in the second pass of the Apriori algorithm, introduced by R. Agrawal and R. Srikant [2], the candidate itemset becomes too large to fit ....
J.S.Park, M.-S.Chen, and P.S.Yu: "Efficient Parallel Data Mining for Association Rules", In Proc. of the 4th International Conference on Information and Knowledge Management, pp. 31-36, November 1995.
....enable development of a new breed of decision support applications. Discovering association rules is an important data mining problem [1]. Recently, there has been considerable research in designing fast algorithms for this task [1][3][5][6][8][12][9][11]. However, with the exception of [10], the work so far has been concentrated on designing serial algorithms. Since the databases to be mined are often very large (measured in gigabytes and even terabytes), parallel algorithms are required. We present in this paper three parallel algorithms for mining association rules. In order to ....
....in [9] attempts to improve the performance of Apriori by using a hash filter. However, as we will see in Section 4.3, this optimization actually slows down the Apriori algorithm. Concurrent to our work, that algorithm has been parallelized and was recently presented with a simulation study in [10]. It too suffers from the use of a hash filter, despite the use of a special communication operator to build it. We discuss this further in Section 4.3. Our three parallel algorithms have all been implemented on an IBM POWERparallel System SP2 (henceforth referred to simply as SP2) a ....
Jong Soo Park, Ming-Syan Chen, and Philip S. Yu. Efficient parallel data mining for association rules. In Fourth Int'l Conference on Information and Knowledge Management, Baltimore, Maryland, November 1995.
....for discovering large itemsets were presented. Candidate Distribution, Data Distribution, and Count Distribution [AS96] are the parallelized versions of Apriori, and Count Distribution was shown to be superior to the others. DMA [CNFF96] attempted to parallelize the Partition algorithm, and PDM [PCY95b] is a parallelization of DHP. Finally, Par Eclat, Par MaxEclat, Par Clique, and Par MaxClique [ZPOL97b] are the parallel versions of the four algorithms in [ZPOL97a]. 2.6 Variations of Association Rules. As we pointed out in Section 2.2, the ....
Jong Soo Park, Ming-Syan Chen, and Philip S. Yu. Efficient parallel data mining for association rules. In Proceedings of 4th Intl. Conf. on Information and Knowledge Management (CIKM'95), pages 31--36, Baltimore, Maryland, USA, November 1995.
....itemsets. Experiments on a 16-node IBM SP2 distributed memory machine showed that PEAR always outperformed PPAR. This is because PEAR uses pass bundling, while PPAR might generate unnecessarily many candidates that turn out to be infrequent. 4.1.2 DHP-based. The PDM algorithm by Park et al. [7] is based on their DHP [2] algorithm. PDM works as follows. Each processor i generates the local supports of 1-itemsets and approximate counts for the 2-itemsets via a hash table. The global counts for 1-itemsets are obtained by an all-to-all broadcast of local counts. Since the 2-itemset hash ....
....the algorithms may not be linear in the number of dimensions. New parallel methods are needed that scale with the dimensions. Some possible solutions include methods that only enumerate maximal patterns, those that use hash-based pruning to reduce the candidate itemsets (especially 2-itemsets) [7], or those that use global pruning [11]. Large Size. Databases continue to increase in size. Current methods are able to handle data in the tens of gigabytes range. It seems that current ARM algorithms will not be suitable for the terabyte range. Even a single scan for these databases is ....
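The first pass described in this context (local 1-itemset supports plus approximate 2-itemset counts in a hash table) can be sketched as below. The sketch is a simplification under explicit assumptions: partitions simulate processors, all hash buckets are summed globally (the quoted text is cut off before it says how PDM actually exchanges the 2-itemset hash table), and `NUM_BUCKETS`, `bucket`, `local_pass`, and `candidate_pairs` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

NUM_BUCKETS = 7  # hypothetical table size; real tables would be far larger


def bucket(pair):
    """Toy deterministic hash of a pair of item names into a bucket index."""
    first, second = pair
    return (sum(map(ord, first)) + sum(map(ord, second))) % NUM_BUCKETS


def local_pass(transactions):
    """Count 1-itemsets exactly and 2-itemsets approximately (per bucket)."""
    item_counts = Counter()
    bucket_counts = [0] * NUM_BUCKETS
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket(pair)] += 1
    return item_counts, bucket_counts


def candidate_pairs(partitions, min_support):
    """Combine local results and keep only pairs that pass the hash filter."""
    global_items = Counter()
    global_buckets = [0] * NUM_BUCKETS
    for item_counts, bucket_counts in map(local_pass, partitions):
        global_items.update(item_counts)  # stand-in for the all-to-all broadcast
        global_buckets = [g + b for g, b in zip(global_buckets, bucket_counts)]
    frequent = sorted(i for i, n in global_items.items() if n >= min_support)
    return [
        pair
        for pair in combinations(frequent, 2)
        if global_buckets[bucket(pair)] >= min_support
    ]
```

A bucket count is an upper bound on the true count of every pair hashed to it, so this filter can admit infrequent pairs (after collisions) but does not discard a frequent one.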
J. S. Park, M. Chen, and P. S. Yu. Efficient parallel data mining for association rules. In ACM Intl. Conf. Information and Knowledge Management, November 1995.
....has wide applicability and a promising future. Mining of association rules is an important research topic among the various data mining problems. Many interesting works have been recently published on this problem and its variants [AIS93, AS94, CHNW96, CHN 96, HF95, HKMT95, KMR 94, PCY95b, PCY95a, SA96]. A classical example is about the retail industry or supermarket sales database. Typically, a record in the sales database describes all the items that are bought in a single transaction, together with other information such as the transaction time, customer id, etc. The classical ....
J. S. Park, M. S. Chen, and P. S. Yu. Efficient parallel data mining for association rules. In Proc. 1995 International Conference on Information and Knowledge Management, Baltimore, MD, November 1995.
....research community has observed that data mining, together with data warehousing and data repositories, are three new uses of database technology, which are considered as important areas in database research [20]. Many interesting and efficient data mining algorithms have been proposed (e.g. see [2, 3, 4, 5, 6, 7, 8, 10, 12, 13, 15, 16, 17, 19, 21]). These database-oriented mining algorithms can be classified into two categories: concept generalization based discovery and discovery at the primitive concept levels. [Footnote: The research of the authors was supported in part by RGC (the Hong Kong Research Grants Council) grant 338 065 0026.] The ....
....without concept generalization. Association rule [4, 6, 16] is an important type of rule in the latter approach. Most of the algorithms for mining association rules proposed so far are sequential algorithms. An algorithm PDM has been proposed recently for parallel mining of association rules [17]. It is an adaptation of the DHP algorithm in the parallel environment [16]. Another algorithm, Count Distribution (CD), which is an adaptation of the Apriori algorithm, has also been proposed for the same parallel mining environment with an implementation on the IBM SP2 [5]. To the best of our ....
J.S. Park, M.S. Chen, and P.S. Yu, "Efficient Parallel Data Mining for Association Rules," Proc. 1995 Int. Conf. on Information and Knowledge Management, Baltimore, MD, Nov 1995.
....to develop and implement customized marketing programs and strategies. Recently, many interesting works have been published in association rules mining including mining of quantitative association rules and multi level association rules, and parallel and distributed mining of association rules [2, 3, 4, 5, 8, 10, 12, 13, 14]. A feature of data mining problems is that in order to have stable and reliable results, a giant amount (often of the order of gigabytes) of data has to be collected and analyzed. The large amount of input data and mining results poses a maintenance problem. While new transactions are being ....
J. S. Park, M. S. Chen, and P. S. Yu, Efficient Parallel Data Mining for Association Rules. In Proc. 1995 International Conference on Information and Knowledge Management, Baltimore, MD, Nov 1995.
....in order to generate the processor-specific large itemsets may be the bottleneck. Other methods have been proposed, such as using a shared hash tree among the multiprocessors in order to generate the large itemsets [28]. More work on the parallelization of the itemset method may be found in [7, 17, 20, 28, 29]. A common feature of most of the algorithms reviewed above and proposed in the literature is that they are variations on the bottom-up theme proposed by the Apriori algorithm [4, 5]. For databases in which the itemsets may be long, these algorithms may require substantial ....
Park J. S., Chen M. S. and Yu P. S. "Efficient Parallel Data Mining for Association Rules." Fourth International Conference on Information and Knowledge Management, Baltimore, Maryland, November 1995, pages 31-36. Technical Report RC20156, IBM T. J. Watson Research Center, August 1995.
J. S. Park, M.-S. Chen, and P. S. Yu. Efficient parallel data mining for association rules. In Proc. Fourth Intl. Conf. Information and Knowledge Management, pages 31--36, 1995.
J. S. Park, M.-S. Chen, and P. S. Yu. Efficient parallel data mining for association rules. In Proc. of ACM Int'l. Conference on Information and Knowledge Management, pages 31 -- 36, Baltimore, MD, November 1995.
J.S. Park, M. Chen, and P.S. Yu, "Efficient Parallel Data Mining for Association Rules," Proc. ACM Int'l Conf. Information and Knowledge Management, ACM Press, New York, 1995, pp. 31--36.
Park, J. S.; Chen, M.; and Yu, P. S. 1995b. Efficient parallel data mining for association rules. In ACM Intl. Conf. Information and Knowledge Management.