48 citations found.
R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE Trans. Knowledge and Data Engineering, 8:962--969, 1996.

This paper is cited in the following contexts:
Effect of Data Skewness and Workload Balance in Parallel Data .. - Cheung, Lee, Xiao   (3 citations)  (Correct)

....algorithms. The prime activity in finding large itemsets is the computation of support counts of candidate itemsets. Two different paradigms have been proposed for parallel system with distributed memory for this purpose. The first one is count distribution and the second one is data distribution [3]. Algorithms that use the count distribution paradigm include CD (Count Distribution) [3] and PDM (Parallel Data Mining) [18]. Algorithms which adopt the data distribution paradigm include DD (Data Distribution) [3], IDD (Intelligent Data Distribution) [10], and HPA (Hash Based Parallel) [23]. In ....

....counts of candidate itemsets. Two different paradigms have been proposed for parallel system with distributed memory for this purpose. The first one is count distribution and the second one is data distribution [3]. Algorithms that use the count distribution paradigm include CD (Count Distribution) [3] and PDM (Parallel Data Mining) [18]. Algorithms which adopt the data distribution paradigm include DD (Data Distribution) [3], IDD (Intelligent Data Distribution) [10], and HPA (Hash Based Parallel) [23]. In the count distribution paradigm, each processor is responsible for computing the local ....

[Article contains additional citation context not shown here]

R. Agrawal and J.C. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report RJ10004, IBM Research Division, Almaden Research Center, 1996.


Parallel and Distributed Search for Structure in.. - Oates, Schmill, Cohen (1996)   (3 citations)  (Correct)

....features in relational databases [2]. The newly constructed features are then passed to a standard (serial) inductive learning algorithm. While parallelism speeds the search for new features, it does not affect the speed with which rules using those features can be learned. Agrawal and Shafer [1] explore several parallel algorithms for mining association rules from very large databases, and Dehaspe and De Raedt [4] present a parallel implementation of the claudien clausal discovery system. Berndt and Clifford describe a dynamic programming algorithm for finding recurring patterns in ....

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report RJ 10004, IBM, 1996.


Exploratory Mining and Pruning Optimizations of.. - Ng, Lakshmanan, Pang.. (1998)   (118 citations)  (Correct)

....rules from large databases has been the subject of numerous studies. These studies cover a broad spectrum of topics including: (i) fast algorithms based on the levelwise Apriori framework [3, 13], partitioning [19, 18], and sampling [24]; (ii) incremental updating and parallel algorithms [6, 2, 8]; (iii) mining of generalized and multi level rules [21, 9]; (iv) mining of quantitative rules [22, 16]; (v) mining of multidimensional rules [7, 14, 12]; (vi) mining rules with item constraints [23]; and (vii) association rule based query languages [15, 4]. However, from the standpoint of the ....

....wants to focus the generation of rules to a specific, small subset of candidates, based on properties of the data. Such a black box model would be tolerable if the turnaround time of the computation were small, e.g. a few seconds. However, despite the development of many efficient algorithms [2, 3, 6, 8, 13, 18, 19, 24], association mining remains a process typically taking hours to complete. Before a new invocation of the black box, the user is not allowed to preempt the process and needs to wait for hours. Furthermore, typically only a small fraction of the computed rules might be what the user was looking ....

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE TKDE, 8, pp 962-969, Dec 1996.


A Fast Distributed Algorithm for Mining Association Rules - Cheung, Han, Ng, Fu, Fu (1996)   (39 citations)  (Correct)

....in recent data mining research [5]. Mining association rules may require iterative scanning of large transaction or relational databases which is quite costly in processing. Therefore, efficient mining of association rules in transaction and/or relational databases has been studied substantially [1, 2, 4, 8, 10, 11, 12, 14, 15]. The research of the first author was supported in part by RGC (the Hong Kong Research Grants Council) grant 338 065 0026. The research of the second author was supported in part by the research grant NSERC A3723 from the Natural Sciences and Engineering Research Council of Canada, the re ....

....rules [8], quantitative association rules [15], etc. Maintenance of discovered association rules by incremental updating has been studied in [4]. Although these studies are on sequential data mining techniques, algorithms for parallel mining of association rules have been proposed recently [11, 1]. We feel that the development of distributed algorithms for efficient mining of association rules has its unique importance, based on the following reasoning. 1) Databases or data warehouses [13] may store a huge amount of data. Mining association rules in such databases may require substantial ....

[Article contains additional citation context not shown here]

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. In IBM Research Report, 1996.


Efficient Parallel Frequency Mining Based On A Novel Top-Down.. - Özkural (2002)   (Correct)

....the iteration the database is redistributed according to item set partitioning. The partitioning algorithm considers a lexicographical ordering of L k and L k 1 . The item sets X in L k 1 with the same k 1 length prefixes as item sets Y in L k are sufficient to compute the candidates and results of Y [6]. The load balanced partition of item sets is achieved by distributing the connected components in a weighted dependency graph which represents candidate generation dependencies among k 1 length prefixes of L k . After iteration l, each processor proceeds independently only using pruning ....

....employs two improved item set partitioning schemes for task parallelism in the design of these algorithms [78]. Equivalence class clustering uses the same idea as the partitioning Candidate Distribution described in the previous section. Here we shall dwell on this scheme with an example from [6]. L k in this scheme have their item sets represented as lexicographically ordered strings. Let L 3 = {ABC; ABD; ABE; ACD; ACE; BCD; BCE; BDE; CDE}; L 4 = {ABCD; ABCDE; ABDE; ACDE; BCDE}; L 5 = {ABCDE}. Consider a part in L 3 = {ABC; ABD; ABE} with the common prefix AB. Computation of candidates ....

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical report, IBM Almaden Research Center, IBM Corp, Almaden Res Ctr, 650 Harry Rd, San Jose, Ca, 95120, 1996.


Fast Parallel Association Rule Mining Without Candidacy.. - Osmar Zaane Mohammad   (1 citation)  (Correct)

....only two full I/O scans for the dataset. Our approach presented in this paper is based on this idea. In spite of the significance of the association rule mining and in particular the generation of frequent itemsets, few advances have been made on parallelizing association rule mining algorithms [6, 2]. Most of the work on parallelizing association rules mining on Shared-memory MultiProcessor (SMP) architecture was based on apriori-like algorithms. Parthasarathy et al. [10] have written an excellent recent survey on parallel association rule mining with shared-memory architecture covering most ....

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE Trans. Knowledge and Data Engineering, 8:962--969, 1996.


Mining of Association Rules in Very Large Databases: a.. - Becuzzi, Coppola.. (1999)   (6 citations)  (Correct)

....L 1 is often roughly the same as I Theta I. Of course the algorithm requires temporary data structures holding information about L k and C k : so the amount of memory needed can be quite large. We regard the parallel solutions of Apriori as belonging to one of three main classes outlined in [1]. Count Distribution solutions replicate the candidate set at each node, and partition the database among the processors. The itemset counting is distributed, global communications are needed at each step of Apriori. Data Distribution parallelizations spread the candidate set over the nodes, ....

....efforts has no comparison within the literature, with the parallel programs being architecture tailored (not portable) or written using communication libraries. See for instance how much attention is due to implementation details, like communication cost analysis or synchronizations, in [1] or [8]. Besides, the good results maintained across different architectures are also an evidence that structured parallel environments can promote restructuring of existing programs instead of development of new ones from scratch. We plan to complete the test set for the algorithm with much ....

R. Agrawal and J.C. Shafer. Parallel mining of association rules: Design, implementation and experience. IEEE Transactions on Knowledge and Data Engineering, 8(6), December


High-Performance Data Mining with Skeleton-based Structured .. - Coppola, Vanneschi (2001)   (Correct)

....The essential limits of the partitioned scheme is that both the intermediate solution and the data have to fit in memory, and that too small a block size causes data skew. The clear advantage for the parallelisation is that almost all work is done independently on each partition. Following [22], we can classify the parallel implementations of Apriori into three main classes, Count, Data and Candidate Distribution, according to the interplay of the partitioning schemes for the input and the C k sets. We have applied the two phase partitioned scheme without the vertical representation ....

R. Agrawal, J. Shafer, Parallel mining of association rules: Design, implementation and experience, IEEE Transactions on Knowledge and Data Engineering 8 (6).


Parallel Inductive Logic in Data Mining - Wang (2000)   (1 citation)  (Correct)

....systems benefit from using the results of computational logic. There is already a wide range of data mining applications using ILP algorithms. Problem with inductive logic in data mining. There exist workable sequential algorithms for data mining, e.g. neural networks [27], association rules [1], decision trees [16], and inductive logic programming [7], that have already been applied to a wide range of real world applications. However, exploring useful information from a huge amount of data will require efficient parallel algorithms running on high performance computing systems. The most ....

R. Agrawal and J. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report RJ10004, IBM Research Report, February 1996.


PKDD'98 Tutorial on Scalable, High-Performance Data Mining with.. - Freitas (1998)   (Correct)

....a satisfactory set of rules is found. When mining very large databases, the bottleneck is step (3); this should be the main target of parallelization. Parallel discovery of association rules. Core operation: compute support counts for all candidate itemsets of a given size. Parallel Apriori [Agrawal Shafer 96]: (a) data parallel version (Count distribution): each processor computes local support counts for all candidate itemsets of a given size, by accessing only tuples in its local memory; (b) control parallel version (Data distribution): each processor computes support counts for its local candidate ....

R. Agrawal and J.C. Shafer. Parallel mining of association rules: design, implementation and experience. IBM Research Report RJ 10004. 1996. To appear in IEEE Trans. Knowledge & Data Engineering.


A Data-Clustering Algorithm On Distributed Memory Multiprocessors - Dhillon, Modha (2000)   (39 citations)  (Correct)

....submitted to IEEE Transactions on Knowledge and Data Engineering. appealing to employ parallel computing and to exploit the main memory of all the processors. Parallel data mining algorithms have been recently considered for tasks such as association rules and classification, see, for example, [Agrawal and Shafer 1996], [Chattratichat et al. 1997], [Cheung and Xiao 1999], [Han et al. 1997], [Joshi et al. 1998], [Kargupta et al. 1997], [Shafer et al. 1996], [Srivastava et al. 1998], [Zaki et al. 1998], and [Zaki et al. 1997]. Also, see [Stolorz and Musick 1997] and [Freitas and Lavington 1998] for recent books on ....

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE Trans. Knowledge and Data Eng., 8(6):962--969, 1996.


An Efficient Distributed Algorithm for Computing Association.. - Yijun Li Xuemin   (Correct)

....association rules has received a great deal of attention in the past several years; and a number [2, 6, 9, 11] of efficient algorithms have been proposed to approach this problem. As the size of a database to be mined can be very large, parallel computation techniques have also been explored [1, 10]. Consider that in a distributed organization, the database may be allocated through a computer network. This leads to a real demand for developing distributed computation techniques in data mining. In this paper, we shall restrict ourself to an investigation of mining association rules in a ....

....for developing distributed computation techniques in data mining. In this paper, we shall restrict ourself to an investigation of mining association rules in a distributed database. In [9] an efficient distributed algorithm is proposed. It should be clear that the parallel algorithms developed in [1, 10] can be immediately used as distributed algorithms. Comparing the distributed algorithm DMA [4] with an implementation of the parallel algorithm CD [1] in a distributed environment, DMA is more efficient than CD because a reduction of both network communication and local processing costs has been ....

[Article contains additional citation context not shown here]

R. Agrawal and J. C. Shafer, Parallel Mining of Association Rules: Design, Implementation, and Experience, IEEE Transactions on Knowledge and Data Engineering, 8(6), 962-969, 1996.


Effect of Data Skewness in Parallel Mining of Association Rules - Cheung, Xiao (1998)   (4 citations)  (Correct)

....machine. The performance studies confirm our observation on the relationship between the effectiveness of the two pruning techniques and data skewness. It has also shown that FPM outperforms CD (Count Distribution) consistently, which is a parallel version of the popular Apriori algorithm [2, 3]. Furthermore, FPM has nice parallelism of speedup, scaleup and sizeup. Keywords: Association Rules, Data Mining, Data Skewness, Parallel Computing. 1 Introduction. Mining association rules in large databases has attracted a lot of attention in data mining research [1, 2, 4, 6, 9, 13, 15]. The ....

....The problem can be reduced to finding all large itemsets with respect to a given support threshold [1, 2]. Two different approaches for this problem have been proposed for parallel systems with distributed memory. One is the count distribution approach and the other the data distribution approach [3]. In the count distribution approach, each processor is responsible for computing the local support counts, which are the support counts in its local partition, of all the candidates. These local support counts are then exchanged among all the processors to compute the global support counts, ....

[Article contains additional citation context not shown here]

R. Agrawal and J.C. Shafer. Parallel mining of association rules: Design, implementation and experience. Special Issue in Data Mining, IEEE Trans. on Knowledge and Data Engineering, IEEE Computer Society, V8, N6, December 1996, pp. 962--969.
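
The count distribution approach described in the Cheung and Xiao context above can be illustrated with a small sketch. It is not the CD implementation from the cited paper: the processors are simulated sequentially in one process, the exchange of local counts is replaced by a plain sum, and the names local_counts and count_distribution_pass are hypothetical.

```python
from collections import Counter


def local_counts(partition, candidates):
    """Count, within one processor's partition, the transactions containing each candidate."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:  # candidate itemset is a subset of the transaction
                counts[cand] += 1
    return counts


def count_distribution_pass(partitions, candidates, min_support):
    """One pass: every simulated processor counts all candidates on its own partition."""
    per_processor = [local_counts(p, candidates) for p in partitions]
    # Assumption: summing the local counters stands in for the exchange among processors.
    global_counts = Counter()
    for counts in per_processor:
        global_counts.update(counts)
    # Keep only the candidates whose global support reaches the threshold.
    return {cand: n for cand, n in global_counts.items() if n >= min_support}


# Two simulated processors, each holding a partition of the transactions.
partitions = [
    [["A", "B", "C"], ["A", "B"]],
    [["A", "C"], ["B", "C"], ["A", "B"]],
]
candidates = [frozenset(pair) for pair in (("A", "B"), ("A", "C"), ("B", "C"))]
large = count_distribution_pass(partitions, candidates, min_support=3)
```

Every simulated processor holds the full candidate set and only its own share of the transactions, which is the property the quoted context attributes to count distribution.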


Asynchronous Parallel Algorithm for Mining Association Rules.. - Cheung, Hu, Xia (1998)   (10 citations)  (Correct)

....itemsets. This computation requires multiple scannings of the database, and is very costly in general. Hence, there is a practical need to develop parallel algorithms for this task. Many algorithms for this purpose have been proposed for parallel machines with distributed shared nothing memory [3, 5, 6, 8, 9]. The research of the first author are supported in part by RGC (the Hong Kong Research Grants Council) grant 338 065 0032. In this work, we propose to study the mining problem on a shared-memory multiprocessor (SMP) parallel machine. In a distributed shared nothing memory parallel system, the ....

....and asynchronous algorithm on the shared memory model. A direct extension of the serial association rules mining algorithm Apriori to the distributed memory model has been developed. Its implementation on the IBM SP2 system is called CD (Count Distribution) which is a synchronous algorithm [3]. A variant of CD adopted to the SMP model has been proposed in [10]. This work concentrates in parallel candidate generation. However, it suffers heavily from I/O contention. From what we know, no asynchronous parallel algorithm has been proposed for mining association rules. In this paper, we ....

[Article contains additional citation context not shown here]

R. Agrawal and J.C. Shafer. Parallel mining of association rules: Design, implementation and experience. Special Issue in Data Mining, IEEE Trans. on Knowledge and Data Engineering, IEEE Computer Society, V8, N6, December 1996, pp. 962--969.


Extended Concepts for Association Rule Discovery - Rantzau (1997)   (Correct)

....presented in [Toi96] requires at most two passes over the database and uses a probabilistic sample (subset) of transactions to produce association rules which are then verified to hold for all transactions. There have been some efforts to use parallel algorithms for finding frequent itemsets [AS96, CHN 96, CNFF96, HKK97, PCY95b]. 2.2 Discovering Association Rules. Only few proposals can be found in the literature that deal with the creation of association rules because the performance of the itemset generation step dominates the performance of the overall algorithm. The assumption is ....

Rakesh Agrawal and John C. Shafer. Parallel Mining of Association Rules: Design, Implementation and Experience. IEEE Transactions on Knowledge and Data Engineering, 8(6):962--969, December 1996.


FlexiMine - A Flexible Platform for KDD Research and .. - Ben-Eliyahu-Zohary..   (Correct)

....given minimum (frequent large itemsets) and generating the desired rules from these itemsets. In FlexiMine we have decided to implement the Apriori algorithm [3] for the following 10 reasons: • Other algorithms are harder to parallelize or have slower performances in their parallel versions [2]. • Special filtering of rules based on interestingness [15, 35] was developed (section 4.1.2) and the Apriori algorithm framework provides useful intermediate information for this purpose [9]. • On line generation of the BKB probabilistic graphical model (section 4.2) has a natural ....

....into account automatically. 4.1.1 Distributed Induction of Association Rules. Inducing association rules (ARs) from a set of transactions is an important, compute-intensive task. As such, efficient parallelization of AR induction is of interest, and has been accomplished for supercomputer systems [2]. While parallelized versions of several AR induction algorithms exist, and have been profiled for parallel computers, much less is known about achieving effective parallelism for this task in distributed environments, which are far more accessible to most users than supercomputers. Our goal was ....

[Article contains additional citation context not shown here]

R. Agrawal and J. C. Shafer. Parallel Mining of Association Rules: Design, Implementation and Experience. IBM Research Report, RJ 10004, 1996.


Parallel Frequent Set Counting - Skillicorn (1999)   (Correct)

....accesses to the data that limit the performance of the algorithm. In some applications, the large number of potentially frequent sets of small size (2 or 3) also creates a storage management performance bottleneck. Sequential levelwise algorithms for finding frequent sets have been well studied [1, 9], although performance increase of two orders of magnitude have been recently reported by clever strategies for representing and generating large candidates from smaller ones [7]. These algorithms are relatively straightforward to parallelize, although a number of different strategies have been ....

R. Agrawal and J. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report RJ10004, IBM Research Report, February 1996.


Large Scale Data Mining: The Challenges and The Solutions - Chattratichat.. (1997)   (1 citation)  (Correct)

....is described by Singer [16]. A similar approach is used in Section 3 of this paper. Data parallel approaches have been proposed by Hung and Adeli [8] for a Cray Y-MP. A similar approach has been implemented for a 3-layer neural network on a Thinking Machine CM-2 by Farber [5]. Agrawal and Shafer [1] investigated three different parallel algorithms for mining association rules. The first two approaches, count distribution and data distribution, aim to exploit data parallelism and require a barrier synchronisation of all the processors at the end of every pass. The candidate distribution ....

Rakesh Agrawal and John C. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical report.


Mining Multiple-Level Association Rules in Large Databases - Han, Fu (1997)   (6 citations)  (Correct)

....rules from large databases. Index terms. Data mining, knowledge discovery in databases, association rules, multiple level association rules, algorithms, performance. 1 Introduction. Mining of association rules from large data sets has been a focused topic in recent data mining research [1, 3, 4, 2, 9, 8, 10, 11, 14, 22, 15, 16, 18, 17, 19, 20, 21, 24]. Many applications at mining associations requires that mining be performed at multiple levels of abstraction. For example, besides finding 80% of customers that purchase milk may also purchase bread, it is interesting to allow users to drill down and show that 75% of people buy wheat bread if ....

....and filtering out those whose accumulated support count is lower than the minimum support. L[1,1] is then used to filter out (1) any item which is not frequent in a transaction, and (2) the transactions in T [1] which contain only infrequent items. This results in a filtered transaction table T [2] of Figure 2. Moreover, since there are only two entries in L[1,1], the level 1 frequent 2 itemset table L[1,2] may contain only 1 candidate item {1, 2}, which is supported by 4 transactions in T [2]. Level 1 minsup = 4. Level 1 frequent 1 itemsets: L[1,1] (Itemset, Support): {1}, 5; {2}, 5. Level 1 ....

[Article contains additional citation context not shown here]

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE Trans. Knowledge and Data Engineering, 8:962--969, 1996.


Quantifying the Utility of the Past in Mining Large Databases - Pudi, Haritsa (2000)   (1 citation)  (Correct)

....are changed in this manner, d number of transactions are produced with the new modified set of potentially large itemsets. 5.3 Itemset Data Structures In our implementation of the algorithms, we generally use the hashtree data structure [3] as a container for itemsets. However, as suggested in [2], the 2 itemsets are not stored in hashtrees but instead in a 2 dimensional triangular array which is indexed by the large 1 itemsets. It has been reported (and also confirmed in our study) that adding this optimization results in a considerable improvement in performance. All the algorithms in ....

R. Agrawal and J. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report RJ10004, IBM Almaden Research Center, San Jose, CA 95120, January 1996.
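
The triangular array mentioned in the Pudi and Haritsa context above can be sketched as follows. This is only an illustration under stated assumptions, not code from either paper: the class name PairCounter, its methods, and the flat row-major layout of the upper triangle are all hypothetical choices.

```python
from itertools import combinations


class PairCounter:
    """Support counts for 2-itemsets in a flat upper-triangular array
    indexed by the positions of the large 1-itemsets."""

    def __init__(self, large_items):
        # Assumption: each large 1-itemset gets a dense index 0..n-1.
        self.index = {item: i for i, item in enumerate(sorted(large_items))}
        self.n = len(self.index)
        self.counts = [0] * (self.n * (self.n - 1) // 2)

    def _slot(self, i, j):
        # Row-major position of the pair (i, j) with i < j in the upper triangle.
        return i * self.n - i * (i + 1) // 2 + (j - i - 1)

    def add_transaction(self, transaction):
        # Items that are not large 1-itemsets cannot form a large 2-itemset; skip them.
        ids = sorted({self.index[x] for x in transaction if x in self.index})
        for i, j in combinations(ids, 2):
            self.counts[self._slot(i, j)] += 1

    def support(self, a, b):
        # a and b must be two distinct large items.
        i, j = sorted((self.index[a], self.index[b]))
        return self.counts[self._slot(i, j)]


pc = PairCounter(["A", "B", "C", "D"])
for t in (["A", "B", "E"], ["A", "B", "C"], ["C", "D"]):
    pc.add_transaction(t)
```

With n large 1-itemsets the array holds n(n-1)/2 counters and a pair is located by arithmetic on two indices, so no tree traversal is needed for 2-itemsets.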


Quantifying the Utility of the Past in Mining Large Databases - Pudi, Haritsa (2000)   (1 citation)  (Correct)

....are changed in this manner, d number of transactions are produced with the new modified set of potentially large itemsets. 5.3. Itemset Data Structures In our implementation of the algorithms, we generally use the hashtree data structure [3] as a container for itemsets. However, as suggested in [2], the 2 itemsets are not stored in hashtrees but instead in a 2 dimensional triangular array which is indexed by the large 1 itemsets. It has been reported (and also confirmed in our study) that adding this optimization results in a considerable improvement in performance. All the algorithms in ....

R. Agrawal and J. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report RJ10004, IBM Almaden Research Center, San Jose, CA 95120 (1996).


A Data-Clustering Algorithm On Distributed Memory Multiprocessors - Dhillon, Modha   (39 citations)  (Correct)

....which is likely to be considerably slower, it is appealing to employ parallel computing and to exploit the main memory of all the processors. Parallel data mining algorithms have been recently considered for tasks such as association rules and classification, see, for example, Agrawal and Shafer [1], Chattratichat et al. [2], Cheung and Xiao [3], Han, Karypis, and Kumar [4], Joshi, Karypis, and Kumar [5], Kargupta, Hamzaoglu, and Stafford [6], Shafer, Agrawal, and Mehta [7], Srivastava, et al. [8], Zaki, Ho, and Agrawal [9], and Zaki et al. [10]. Also, see Stolorz and ....

Agrawal, R., Shafer, J.C.: Parallel mining of association rules: Design, implementation, and experience. IEEE Trans. Knowledge and Data Eng. 8 (1996) 962-969


SkIE: a heterogeneous environment for HPC applications - Bacci, Danelutto.. (1999)   (5 citations)  (Correct)

....results and writes them on disk. Even if stages 1 and 4 are sequential, they do not constitute a bottleneck for the application, as reading the database takes a very small amount of time if compared the rest of the application. The partitioning strategy adopted is akin to count parallelizations [4], hence it scales well with respect to increasing database size, and can effectively exploit a high degree of parallelism. Moreover, this strategy doesn't force us to use Apriori inside module 2, as any other sequential or parallel algorithm will work. In this application, the percentage of ....

R. Agrawal and J.C. Shafer, Parallel mining of association rules: Design, implementation and experience, IBM Research Report RJ 10004, January, 1996. Also appeared in IEEE Transactions on Knowledge and Data Engineering 8 (1996).


A Lazy Model-Based Approach to On-Line Classification - Melli (1998)   (Correct)

....to make the classification algorithm more database aware by fetching each record from the database as required. This loosely coupled approach however, often encounters poor performance due to the copying of records over a network into the application's address space [2]. A tightly coupled approach, instead, pushes some of the processing directly to the database management system (DBMS). This approach benefits in part from the extensive research into database query optimization. In the case of a database system with SQL support, the use of the GROUP BY operation ....

....LazyDT concludes by returning the tree path just generated and also returns the class distribution of the records in D that also reach the leaf node of this path. Below is a sample result which shows a path of a tree (in brackets) and its leaf node (far right) # (A 1 # = a 1x ) # (A 2 # =[2,4]) # (A 1 # = a 1y ) # # (A 4 # [3, 0, 0] 3.6) The result can be interpreted to mean that because #e 1 in not equal to a 1x or a 1y and because #e 2 is not in the range [2,4] LazyDT predicts a class of A 4 = c 1 for event vector #e. The range [2,4] for attribute A 2 would have been generated ....

[Article contains additional citation context not shown here]

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE Trans. Knowledge and Data Engineering, 8:962--969, 1996.


Mining Multiple-Level Association Rules in Large Databases - Han, Fu (1997)   (6 citations)  (Correct)

....association rules. Index Terms: Data mining, knowledge discovery in databases, association rules, multiple level association rules, algorithms, performance. 1 INTRODUCTION. Mining of association rules from large data sets has been a focused topic in recent data mining research [1], [3], [4], [2], [9], [8], [10], [11], [14], [22], [15], [16], [18], [17], [19], [20], [21], [24]. Many applications at mining associations require that mining be performed at multiple levels of abstraction. For example, besides finding 80 percent of customers that purchase milk may also purchase bread, it is ....

R. Agrawal and J.C. Shafer, Parallel Mining of Association Rules: Design, Implementation, and Experience, IEEE Trans. Knowledge and Data Eng., vol. 8, pp. 962-969, 1996.


Pincer-Search: A New Algorithm for Discovering the Maximum.. - Lin, Kedem (1997)   (51 citations)  (Correct)

....to search for them bottom up. If all maximal frequent itemsets are expected to be long (close to n in size) it seems efficient to search for them top down. In a pure bottom up approach, only Observation 1 above is used to prune candidates. This is the technique that existing algorithms ([3], [4], [6], [9], [11], [12], [15], [16], [17]) use to decrease the number of candidates. In a pure top down approach, only Observation 2 is used to prune candidates. We will show that by combining both approaches we will be able to make use of the information gathered in one direction to prune more ....

....expect even greater improvements when the average size of the maximal frequent itemsets is further increased. 5 Related Work. There has been extensive research on designing association rule mining algorithms ([1], [3], [7], [9], [12], [16], [18]). Some papers concentrate on parallel algorithms ([4], [9], [16]). Other papers focus on mining generalized association rules ([6], [15]). Another direction, as described in [2] and [11], is to provide a unified model for the process of classification, association, and sequence rules discovery. The discovery of frequent set is an important process in ....

R. Agrawal and J. C. Shafer. Parallel Mining of association rules: Design, Implementation and Experience. IBM Research Report RJ10004, Feb. 1996.


FlexiMine - A Flexible Platform for KDD Research.. -.. (1998)   (Correct)

....the given minimum (frequent large itemsets) and generating the desired rules from these itemsets. In FlexiMine we have decided to implement the Apriori algorithm [3] for the following reasons: • Other algorithms are harder to parallelize or have slower performances in their parallel versions [2]. • Special filtering of rules based on interestingness [15, 35] was developed (section 4.1.2) and the Apriori algorithm framework provides useful intermediate information for this purpose [9]. • On line generation of the BKB probabilistic graphical model (section 4.2) has a natural ....

....into account automatically. 4.1.1 Distributed Induction of Association Rules. Inducing association rules (ARs) from a set of transactions is an important, compute-intensive task. As such, efficient parallelization of AR induction is of interest, and has been accomplished for supercomputer systems [2]. While parallelized versions of several AR induction algorithms exist, and have been profiled for parallel computers, much less is known about achieving effective parallelism for this task in distributed environments, which are far more accessible to most users than supercomputers. Our goal was ....

[Article contains additional citation context not shown here]

R. Agrawal and J. C. Shafer. Parallel Mining of Association Rules: Design, Implementation and Experience. IBM Research Report, RJ 10004, 1996.


Mining Inter-Transaction Associations with Templates - Hongjun Jeffrey Xu   (Correct)

....latter as inter-transaction associations. In [LHF98] we introduced multi-dimensional inter-transaction association rules mining, and discussed its related properties in comparison with [BWJ96, BWJ98, DLM 98, MT96, MTV97]. A preliminary performance study was conducted by extension of Apriori [AS96]. From the initial results, we found that multi-dimensional inter-transaction association rules do bring us more comprehensive knowledge than traditional intra-transaction association rules, but this is at the expense of higher computational cost. In order to make inter-transaction association ....

R. Agrawal and J.C. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report IBM Research Report RJ 10004, 1996.


A Data-Clustering Algorithm On Distributed Memory Multiprocessors - Dhillon, Modha (1999)   (39 citations)  (Correct)

....which is likely to be considerably slower, it is appealing to employ parallel computing and to exploit the main memory of all the processors. Parallel data mining algorithms have been recently considered for tasks such as association rules and classification, see, for example, Agrawal and Shafer [1], Chattratichat et al. [6], Cheung and Xiao [8], Han, Karypis, and Kumar [22], Joshi, Karypis, and Kumar [24], Kargupta, Hamzaoglu, and Stafford [25], Shafer, Agrawal, and Mehta [32], Srivastava, et al. [38], Zaki, Ho, and Agrawal [41], and Zaki et al. [42]. Also, see Stolorz and Musick [39] and ....

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE Trans. Knowledge and Data Eng., 8(6):962--969, 1996.


An Adaptive Algorithm for Mining Association Rules on.. - Cheung, Hu, Xia   (Correct)

....faster the algorithm would be. Since computing large itemsets is very costly in terms of CPU and I/O resources, there is a practical need to develop parallel algorithms for this task. Many algorithms for this purpose have been proposed for parallel systems with distributed shared-nothing memory [3, 5, 7, 13, 15]. With its popularity and cost effectiveness, the shared-memory multiprocessor (SMP) is another important parallel computing paradigm. Mining association rules requires the storage of a large amount of intermediate values; with its large aggregated memory, an SMP is particularly good for this mining task. ....

....less I/O compared with the level-wise approach. A direct extension of the serial association rules mining algorithm Apriori to the distributed memory model has been developed. Its implementation on the IBM SP2 system is called CD (Count Distribution), which is a synchronous level-wise algorithm [3]. A variant of CD was adapted to SMP in [17], which parallelizes the candidate generation. However, it suffers heavily from I/O contention. From what we know, no asynchronous parallel algorithm has been proposed for mining association rules. In this paper, we will propose a parallel algorithm APM ....

[Article contains additional citation context not shown here]

R. Agrawal and J.C. Shafer. Parallel mining of association rules: Design, implementation and experience. Special Issue in Data Mining, IEEE Trans. on Knowledge and Data Engineering, IEEE Computer Society, V8, N6, December 1996, pp. 962--969.


Strategies for Parallel Data Mining - Skillicorn (1999)   (11 citations)  (Correct)

....cost expressions for each strategy can be easily instantiated for particular data mining techniques to give a more refined basis for decision. For example, several of these strategies have been implemented for computing association rules. The replicated approach is called Count Distribution [1]. It has been shown to outperform two other techniques provided that the candidate sets can be kept in memory: Data Distribution, in which the candidate set is partitioned, and the dataset circulated among the processors; and Candidate Distribution, in which the candidate set and dataset are both ....

R. Agrawal and J. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report RJ10004, IBM Research Report, February 1996.


Weighted Association Rules: Model and Algorithm - Ramkumar Sanjay   (Correct)

....earned by the store. We have developed a generalization of the itemset support problem which can incorporate item weights as well as transaction weights. However, the algorithm required to generate these weighted itemsets is more difficult than the DIC algorithm of [5] or the Apriori Algorithm [2, 4, 1, 3]. If only transaction weights are required for an application, the solution is considerably simpler. In the following, we briefly describe some potential ways weights can be attached to items and/or transactions and explore the algorithms required to compute the weighted sets. While we do not know ....

....in the case where each transaction is assigned a weight. If a given transaction t ∈ T contains an itemset J ⊂ I, then all subsets of the itemset J are also contained in the transaction t. Hence, if an itemset J is large then all its subsets K ⊂ J are also large. Hence, the algorithms of [5] and [2, 4, 1, 3] can be applied with minor modifications to obtain all the large weighted itemsets. The only change required is to update the support of an itemset using the transaction weight, as opposed to incrementing it by 1 as earlier. The complexity issues, the running time improvements, and the scalability ....
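
The modification described in this excerpt, adding the transaction weight to an itemset's support instead of incrementing it by 1, can be illustrated with a minimal Python sketch. The function name `weighted_support`, the (items, weight) pair layout, and the sample data are assumptions made for illustration and are not taken from the cited paper.

```python
def weighted_support(transactions, itemset):
    """Sum the weights of all transactions that contain every item of `itemset`.

    `transactions` is an iterable of (items, weight) pairs; this layout is an
    assumption of the sketch. With all weights equal to 1 the result is the
    ordinary support count.
    """
    target = frozenset(itemset)
    return sum(weight for items, weight in transactions
               if target <= frozenset(items))


# Hypothetical weighted transactions.
transactions = [
    ({"bread", "milk"}, 2.0),
    ({"bread", "butter"}, 1.0),
    ({"bread", "milk", "butter"}, 0.5),
]

pair_support = weighted_support(transactions, {"bread", "milk"})
single_support = weighted_support(transactions, {"bread"})
```

With non-negative weights, a subset never has lower weighted support than its superset (here the support of {bread} is at least that of {bread, milk}), which is the property that lets level-wise pruning carry over.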

R. Agrawal and J.C. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report RJ10004, IBM Research Division, Almaden Research Center, 1996.


Strategies for Parallelizing Data Mining - Skillicorn   (3 citations)  (Correct)

....Typical examples in this class are naive parallelizations of association rules and decision trees, and parallel genetic algorithms. The common requirement is a need to examine all of the data to compute global frequencies or scores. The data distribution technique for association rules [1] performs relatively poorly for essentially the same reason. Type 2 represents the majority of approaches being investigated at present. On each pass, a processor is responsible for 1/p-th of the data, and exchanges information gleaned from the data at the end of each pass. This information may ....

....in order of increasing sophistication, data based on frequencies or counts, partial conceptual data, or complete concepts. The critical property is that the volume of information exchanged is much smaller than that of the data itself. For example, the count distribution association rule technique [1] exchanges the counts of local support for each itemset. Similarly, the SPRINT [8] and ScalParC [5] decision tree techniques exchange split points in attribute tables. While it is important that the volume of data to be exchanged is minimized, this is less critical than it seems for the ....
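
As a rough illustration of the count distribution idea mentioned in this excerpt, the following Python sketch simulates one counting pass: each data partition produces local support counts for the same candidate itemsets, and only those counts are summed. The sequential loop stands in for the inter-processor exchange; the function names and the sample data are hypothetical, and the sketch is not the implementation from the cited paper.

```python
from collections import Counter


def local_counts(partition, candidates):
    """Count, within one data partition, how many transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        items = frozenset(transaction)
        for candidate in candidates:
            if candidate <= items:
                counts[candidate] += 1
    return counts


def count_distribution_pass(partitions, candidates, min_support):
    """Sum per-partition counts and keep the candidates meeting the global minimum.

    The summation loop is a stand-in for the exchange of counts between
    processors; only counts, never transactions, cross partition boundaries.
    """
    global_counts = Counter()
    for partition in partitions:
        global_counts.update(local_counts(partition, candidates))
    return {c: n for c, n in global_counts.items() if n >= min_support}


# Hypothetical data split across two "processors".
partitions = [
    [{"a", "b"}, {"a", "c"}],
    [{"a", "b", "c"}, {"b", "c"}, {"a", "b"}],
]
candidates = [frozenset("ab"), frozenset("ac"), frozenset("bc")]
large = count_distribution_pass(partitions, candidates, min_support=3)
```

The volume exchanged here is one integer per candidate per partition, independent of how many transactions each partition holds, which is the property the excerpt highlights.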

[Article contains additional citation context not shown here]

R. Agrawal and J. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report RJ10004, IBM Research Report, February 1996.


Parallel Data Mining for Association Rules on.. - Zaki, Ogihara.. (1996)   (30 citations)  (Correct)

....of association algorithms. In [11] a parallel implementation of the DHP algorithm [10] is presented. However, only simulation results on a shared-nothing or distributed-memory machine like IBM SP2 were presented. Parallel implementations of the Apriori algorithm on the IBM SP2 were presented in [4]. There has been no study on shared-everything or shared-memory machines to date. 1.2 Contribution. In this paper we present parallel implementations of the Apriori algorithm on the SGI Power Challenge shared-memory multi-processor. We study the degree of parallelism, synchronization, and data ....

R. Agrawal and J. Shafer. Parallel mining of association rules: design, implementation, and experience. Technical Report RJ10004, IBM Almaden Research Center, San Jose, CA 95120, Jan. 1996.


Knowledge Discovery with FlexiMine - Domshlak, Rozen, Schiller, Shimony (1998)   (2 citations)  (Correct)

....finding all sets of items that have support above the given minimum (frequent itemsets) and generating the desired rules from these itemsets. 2.1.1 Parallel Association Rule Algorithms. Three parallel algorithms for mining association rules, based on the Apriori algorithm, were suggested in [2]. These algorithms have been designed to investigate the performance trade-offs between computation, memory usage, communication, and the use of problem-specific information in parallel data mining. Specifically, 1. The focus of the Count Distribution algorithm was on minimizing communication, even ....

....more efficiently, at the expense of extra communication. 3. The Candidate Distribution algorithm exploits the semantics of the particular problem at hand to reduce synchronization between the processes and has load balancing built into it. An empirical comparison of these algorithms was made [2], implemented on an IBM POWERparallel System SP2, a shared-nothing machine [25]. The communication primitives used by the implementation are part of the MPI (Message Passing Interface). The candidates for message-passing communication standards were under discussion [12]. The algorithms, as they ....

[Article contains additional citation context not shown here]

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation and experience. IBM Research Report, RJ 10004, 1996.


Exploratory Mining and Pruning Optimizations of.. - Ng, Lakshmanan, Han.. (1998)   (118 citations)  (Correct)

....rules from large databases has been the subject of numerous studies. These studies cover a broad spectrum of topics including: (i) fast algorithms based on the levelwise Apriori framework [3, 13], partitioning [20, 18], and sampling [25]; (ii) incremental updating and parallel algorithms [6, 19, 2, 8]; (iii) mining of generalized and multi-level rules [22, 9]; (iv) mining of quantitative rules [23, 16]; (v) mining of multi-dimensional rules [7, 14, 12]; (vi) mining rules with item constraints [24]; and (vii) association-rule-based query languages [15, 4]. However, from the standpoint of the ....

....wants to focus the generation of rules to a specific, small subset of candidates, based on properties of the data. Such a black box model would be tolerable if the turnaround time of the computation were small, e.g. a few seconds. However, despite the development of many efficient algorithms [2, 3, 6, 8, 13, 18, 19, 20, 25], association mining remains a process typically taking hours to complete. Before a new invocation of the black box, the user is not allowed to preempt the process and needs to wait for hours. Furthermore, typically only a small fraction of the computed rules might be what the user was looking ....

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE TKDE, 8, pp 962--969, Dec 1996.


Exploratory Mining and Pruning Optimizations of Constrained.. - Ng (1998)   (118 citations)  (Correct)

....rules from large databases has been the subject of numerous studies. These studies cover a broad spectrum of topics including: (i) fast algorithms based on the levelwise Apriori framework [3, 13], partitioning [19, 18], and sampling [24]; (ii) incremental updating and parallel algorithms [6, 2, 8]; (iii) mining of generalized and multi-level rules [21, 9]; (iv) mining of quantitative rules [22, 16]; (v) mining of multi-dimensional rules [7, 14, 12]; (vi) mining rules with item constraints [23]; and (vii) association-rule-based query languages [15, 4]. However, from the standpoint of the ....

....wants to focus the generation of rules to a specific, small subset of candidates, based on properties of the data. Such a black box model would be tolerable if the turnaround time of the computation were small, e.g. a few seconds. However, despite the development of many efficient algorithms [2, 3, 6, 8, 13, 18, 19, 24], association mining remains a process typically taking hours to complete. Before a new invocation of the black box, the user is not allowed to preempt the process and needs to wait for hours. Furthermore, typically only a small fraction of the computed rules might be what the user was looking ....

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE TKDE, 8, pp 962--969, Dec 1996.


A Fast Distributed Algorithm for Mining Association Rules - Cheung, Han, Ng, Fu, Fu (1996)   (39 citations)  (Correct)

....in recent data mining research [5]. Mining association rules may require iterative scanning of large transaction or relational databases, which is quite costly in processing. Therefore, efficient mining of association rules in transaction and/or relational databases has been studied substantially [1, 2, 4, 8, 10, 11, 12, 14, 15]. The research of the first author was supported in part by RGC (the Hong Kong Research Grants Council) grant 338 065 0026. The research of the second author was supported in part by the research grant NSERC A3723 from the Natural Sciences and Engineering Research Council of Canada, the ....

....rules [8], quantitative association rules [15], etc. Maintenance of discovered association rules by incremental updating has been studied in [4]. Although these studies are on sequential data mining techniques, algorithms for parallel mining of association rules have been proposed recently [11, 1]. We feel that the development of distributed algorithms for efficient mining of association rules has its unique importance, based on the following reasoning. 1) Databases or data warehouses [13] may store a huge amount of data. Mining association rules in such databases may require substantial ....

[Article contains additional citation context not shown here]

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. In IBM Research Report, 1996.


Parallel and Distributed Search for Structure in Multivariate.. - Tim Oates (1996)   (3 citations)  (Correct)

....features in relational databases [2]. The newly constructed features are then passed to a standard (serial) inductive learning algorithm. While parallelism speeds the search for new features, it does not affect the speed with which rules using those features can be learned. Agrawal and Shafer [1] explore several parallel algorithms for mining association rules from very large databases, and Dehaspe and De Raedt [4] present a parallel implementation of the claudien clausal discovery system. Berndt and Clifford describe a dynamic programming algorithm for finding recurring patterns in ....

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical Report RJ 10004, IBM, 1996.


Parallel Data Mining for Association Rules on.. - Zaki, Ogihara.. (1996)   (30 citations)  (Correct)

....of association algorithms. In [9] a parallel implementation of the DHP algorithm [8] is presented. However, only simulation results on a shared-nothing or distributed-memory machine like IBM SP2 were presented. Parallel implementations of the Apriori algorithm on the IBM SP2 were presented in [3]. There has been no study on shared-everything or shared-memory machines to date. In this paper we present parallel implementations of the Apriori algorithm on the SGI Power Challenge shared-memory multi-processor. We study the degree of parallelism, synchronization, and data locality issues in ....

R. Agrawal and J. Shafer. Parallel mining of association rules: design, implementation, and experience. Technical Report RJ10004, IBM Almaden Research Center, San Jose, CA 95120, Jan. 1996.


FlexiMine - A Flexible Platform for KDD Research.. - Domshlak.. (1998)   (3 citations)  (Correct)

....above the given minimum (frequent itemsets) and generating the desired rules from these itemsets. In FlexiMine we have decided to implement the Apriori algorithm [21] for the following reasons:
• Other algorithms are harder to parallelize or have slower performances in their parallel versions [20].
• We are currently working on filtering of rules based on interestingness [13, 30] and the Apriori algorithm framework provides useful intermediate information for this purpose [31].
• On-line generation of the BKB probabilistic graphical model (see 3.2) has a natural integration with ....

....into account automatically. 4.1.1 Distributed Induction of Association Rules. Inducing association rules (ARs) from a set of transactions is an important, compute-intensive task. As such, efficient parallelization of AR induction is of interest, and has been accomplished for supercomputer systems [20]. While parallelized versions of several AR induction algorithms exist, and have been profiled for parallel computers, much less is known about achieving effective parallelism for this task in distributed environments, which are far more accessible to most users than supercomputers. Our goal was ....

[Article contains additional citation context not shown here]

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation and experience. IBM Research Report, RJ 10004, 1996.


Parallel Mining of Association Rules - Agrawal, Shafer (1996)   (53 citations)  Self-citation (Agrawal Shafer)   (Correct)

....algorithm [3] on which the proposed parallel algorithms are based. Section 3 gives the description of the parallel algorithms. Section 4 presents the results of the performance measurements of these algorithms. Section 5 contains conclusions. A more detailed version of this paper can be found in [2]. 2 Overview of the Serial Algorithm. 2.1 Association Rules. The basic problem of finding association rules as introduced in [1] is as follows. Let I = {i_1, i_2, ..., i_m} be a set of literals, called items. Let D be a set of transactions, where each transaction T is an itemset such that T ....

....values and can be done in O(log(n)) communication steps. It also avoids any time-consuming logic that would otherwise be needed to assure that we only combine counts that belong to the same candidate. The full details of this process, including the MPI communication primitives used, are described in [2]. 3.2 Algorithm 2: Data Distribution. The attractive feature of the Count Distribution algorithm is that no data tuples are exchanged between processors; only counts are exchanged. Thus, processors can operate independently and asynchronously while reading the data. However, the disadvantage ....

[Article contains additional citation context not shown here]

Rakesh Agrawal and John Shafer. Parallel mining of association rules: Design, implementation and experience. Research Report RJ 10004, IBM Almaden Research Center, San Jose, California, February 1996. Available from http://www.almaden.ibm.com/cs/quest.


The Quest Data Mining System - Agrawal, Mehta, Shafer, Srikant.. (1996)   (43 citations)  Self-citation (Agrawal Shafer)   (Correct)

....the values of the attribute and then combining adjacent partitions as necessary. We also have measures of partial completeness that quantify the information loss due to partitioning. This generalization and the algorithm for finding such rules used in Quest are presented in (Srikant & Agrawal 1996a). One potential problem that users experience in applying association rules to real problems is that many uninteresting or redundant rules may be generated along with the interesting rules. In (Srikant & Agrawal 1995), further generalized in (Srikant & Agrawal 1996a), a ....

....are presented in (Srikant & Agrawal 1996a). One potential problem that users experience in applying association rules to real problems is that many uninteresting or redundant rules may be generated along with the interesting rules. In (Srikant & Agrawal 1995), further generalized in (Srikant & Agrawal 1996a), a greater-than-expected-value interest measure was introduced, which is used in Quest to prune redundant rules. Sequential Patterns. We introduced the problem of discovering sequential patterns in (Agrawal & Srikant 1995). The input data is a set of sequences, called data sequences. Each ....

[Article contains additional citation context not shown here]

Agrawal, R., and Shafer, J. 1996. Parallel mining of association rules: Design, implementation and experience. Research Report RJ 10004, IBM Almaden Research Center, San Jose, California. To appear in IEEE Transactions on Knowledge and Data Engineering.


Fast Parallel Association Rule Mining without Candidacy.. - Zaïane, El-Hajj, Lu (2001)   (Correct)

No context found.

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE Trans. Knowledge and Data Engineering, 8:962--969, 1996.


CONQUEST: A Distributed Tool for Constructing Summaries of.. - Chi, Koyuturk, Grama (2004)   (Correct)

No context found.

R. Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation and experience. Technical report, IBM Almaden Research Center, IBM Corp, 1996.


Strategies for Parallel Data Mining - These Explanations Can   (Correct)

No context found.

R. Agrawal and J. Shafer, Parallel Mining of Association Rules: Design, Implementation and Experience, IBM Research Report RJ10004, IBM Almaden Research Center, San Jose, Calif., Feb. 1996.


An Efficient Parallel Algorithm for Mining Association Rules in.. - Chen (1998)   (1 citation)  (Correct)

No context found.

Rakesh Agrawal and John C. Shafer, Parallel Mining of Association Rules: Design, Implementation and Experience, 1996.

CiteSeer.IST - Copyright Penn State and NEC