115 citations found.
E. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. In ACM SIGMOD Conference, pages 277--288. ACM Press, 1997.

This paper is cited in the following contexts:

Effect of Data Skewness and Workload Balance in Parallel Data .. - Cheung, Lee, Xiao   (3 citations)  (Correct)

.... and the second one is data distribution [3]. Algorithms that use the count distribution paradigm include CD (Count Distribution) [3] and PDM (Parallel Data Mining) [18]. Algorithms which adopt the data distribution paradigm include DD (Data Distribution) [3], IDD (Intelligent Data Distribution) [10], and HPA (Hash Based Parallel) [23]. In the count distribution paradigm, each processor is responsible for computing the local support counts of all the candidates, which are the support counts in its partition. By exchanging the local support counts, all processors then compute the global ....
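
The count distribution paradigm described in this excerpt can be illustrated with a minimal single-process sketch. It assumes that each "processor" is simulated by one list of transactions and that the exchange step is a plain sum of the local counts; the function names and the sample data are hypothetical and do not come from the cited papers.

```python
from collections import Counter
from itertools import combinations


def local_support_counts(partition, candidates):
    """Count, within one partition, how many transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:
                counts[candidate] += 1
    return counts


def global_support_counts(partitions, candidates):
    """Simulate the exchange step by summing the local counts of every partition."""
    total = Counter()
    for partition in partitions:  # one partition per simulated processor
        total.update(local_support_counts(partition, candidates))
    return total


# Hypothetical data: two simulated processors, each holding part of the database.
partitions = [
    [["a", "b", "c"], ["a", "c"]],
    [["a", "b"], ["b", "c"], ["a", "b", "c"]],
]
# Every processor counts the same, complete candidate set (here: all 2-itemsets).
candidates = [frozenset(pair) for pair in combinations("abc", 2)]
support = global_support_counts(partitions, candidates)
```

In a real distributed setting the summation would be a collective communication step rather than a loop, but the division of work is the same: all candidates on every processor, a different slice of the data on each.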

....computation, because every transaction is processed as many times as the number of processors. In addition, it requires a lot of communication, and its performance is worse than CD [3]. IDD (Intelligent Data Distribution) and its variant HD (Hybrid Distribution) are important improvements on DD [10]. They partition the candidates across the processors based on the first item of a candidate. Therefore, each processor only needs to handle the subsets of a transaction which begin with the items assigned to the processor. This significantly reduces the redundant computation in DD. HPA (Hash ....
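
The first-item partitioning described in this excerpt can be sketched as follows. This is an illustration under stated assumptions, not the IDD implementation: candidates are represented as sorted tuples, `owner_of_item` is a hypothetical mapping from an item to a processor id, and every simulated processor is assumed to see all transactions.

```python
from itertools import combinations


def assign_candidates(candidates, owner_of_item):
    """Group candidates by the processor that owns their first item."""
    assigned = {}
    for candidate in candidates:
        first = candidate[0]  # candidates are sorted tuples
        assigned.setdefault(owner_of_item[first], set()).add(candidate)
    return assigned


def count_on_processor(transactions, my_candidates, my_items, k):
    """Count candidates on one processor, skipping subsets that start with foreign items."""
    counts = dict.fromkeys(my_candidates, 0)
    for transaction in transactions:
        items = sorted(set(transaction))
        for subset in combinations(items, k):
            if subset[0] not in my_items:
                continue  # another processor is responsible for this subset
            if subset in counts:
                counts[subset] += 1
    return counts


# Hypothetical assignment: processor 0 owns item "a", processor 1 owns "b" and "c".
owner_of_item = {"a": 0, "b": 1, "c": 1}
candidates = [("a", "b"), ("a", "c"), ("b", "c")]
transactions = [["a", "b", "c"], ["a", "c"], ["b", "c"]]

assigned = assign_candidates(candidates, owner_of_item)
counts_p0 = count_on_processor(transactions, assigned[0], {"a"}, 2)
counts_p1 = count_on_processor(transactions, assigned[1], {"b", "c"}, 2)
```

Because a processor discards every subset whose first item it does not own, no subset of a transaction is examined by more than one processor.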

E. Han, G. Karypis and V. Kumar. Scalable parallel data mining for association rules. In Proc. of 1997.


An Efficient Parallel and Distributed Algorithm.. - Orlando.. (2002)   (Correct)

....that contain at least one infrequent (k-1)-itemset are not included in C_k. Several variations to the original Apriori algorithm, as well as many parallel implementations, have been proposed in the last years. We can recognize two main methods for determining itemset supports: a counting based [1,3,5,8,13] and an intersection based [14,16] one. The former, also adopted by Apriori, exploits a horizontal dataset, where the transactions are stored sequentially. The method is based on counting how many times each candidate k-itemset occurs in every transaction. The intersection based method, on ....
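
The two ways of determining itemset supports named in this excerpt can be contrasted in a short sketch. The counting based function follows the description above. The excerpt breaks off before describing the intersection based method, so the vertical layout used here (one set of transaction ids per item, often called a tidlist) is an assumption based on how such methods are commonly formulated. All names and data are hypothetical.

```python
def support_by_counting(transactions, candidate):
    """Counting based: scan the horizontal dataset and count containing transactions."""
    wanted = set(candidate)
    return sum(1 for t in transactions if wanted <= set(t))


def to_tidlists(transactions):
    """Build a vertical layout: item -> set of ids of the transactions containing it."""
    tidlists = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def support_by_intersection(tidlists, candidate):
    """Intersection based (assumed form): intersect the tidlists of the candidate's items."""
    lists = [tidlists.get(item, set()) for item in candidate]
    return len(set.intersection(*lists)) if lists else 0


# Hypothetical horizontal dataset: one list of items per transaction.
transactions = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
tidlists = to_tidlists(transactions)
```

Both functions return the same support for any candidate; they differ only in which layout of the database they traverse.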

....In Section 3 we describe ParDCI in depth, while Section 4 presents and discusses the results of the experiments conducted. Finally in Section 5 we draw future works and some conclusions. 2 Related Work Several parallel algorithms for solving the FSC problem have been proposed in the last years [2,8]. Zaki authored a survey on ARM algorithms and relative parallelization schemas [15]. Most proposals can be considered parallelizations of the well known Apriori algorithm. Agrawal et al. in [2] propose a broad taxonomy of the parallelization strategies that can be adopted for Apriori on ....

E. H. Han, G. Karypis, and Kumar V. Scalable Parallel Data Mining for Association Rules. IEEE Transactions on Knowledge and Data Engineering, 12(3):337-- 352, May/June 2000.


Association Rule Mining on Remotely Sensed Imagery Using P-Trees - Ding (2002)   (3 citations)  (Correct)

....all nodes have to broadcast the hashing result. A technique is proposed to decrease the number of messages. Among all the hash buckets, only those in which the total count is larger than a threshold are selected for bucket count exchange. Therefore, it avoids broadcasting all the buckets. In [HKK97], two algorithms were proposed for parallel association rule mining. One is the Intelligent Data Distribution algorithm; the other is the Hybrid Distribution algorithm. The Intelligent Data Distribution algorithm efficiently uses aggregate memory of the parallel computer by employing the ....

....mining algorithms are similar to one of these three algorithms, except some are special parallel algorithms from their serial ones. For example, PEAR [Mue95], PDM [PCY95b], NPA [SK96], FDM [CHN 96a], FPM [CX99], and CCPD [ZOP 96] are all similar to Count Distribution. Likewise, SPA [SK96], IDD [HKK97], and PCCD [ZOP 96, ZPO 97] are similar to Data Distribution, whereas HPA [SK96] and HPA-ELD [SK96] are similar to Candidate Distribution. The Count Distribution algorithm is a simple parallelization of Apriori. In this algorithm, all processors generate the entire candidate set, thus each ....

E. H. Han, G. Karypis, and V. Kumar, "Scalable Parallel Data Mining for Association Rules," Proceedings of the ACM SIGMOD, Tucson, AZ, May 1997, pp. 277-288.


Supporting User Interaction for the Exploratory Mining of.. - Mah   (Correct)

....could be easily found by partitioning and enumerating the domain of each categorical attribute in the database and then applying Apriori, using the partitioned categories as items. Another popular topic in association mining is developing parallel mining algorithms for finding association rules [12] [21]. Other researchers are concerned with different issues; one recent debate is the appropriateness of using confidence to assess relationship or association. Brin, Motwani and Silverstein in [9] suggested that the dependence ratio or correlation between two sets are more appropriate to ....

E. Han and G. Karypis. Scalable Parallel Data Mining for Association Rules. Proc. ACM SIGMOD Conference, pages 277-288, 1997.


On the Optimality of Association-rule Mining Algorithms - Pudi, Haritsa (2001)   (Correct)

.... Contact Author: haritsa dsl.serc.iisc.ernet.in 1 Introduction The problem of efficiently mining association rules from large historical market basket databases was introduced almost a decade ago, in [4]. Since then, a whole host of algorithms for addressing this problem have been proposed [4, 6, 20, 17, 11, 13, 12, 21, 3]. The latest include FP-growth [12], which utilizes a prefix tree structure for compactly representing and processing pattern information, and VIPER [21], which organizes and processes the database on a vertical (column) basis as opposed to the more traditional horizontal (row) basis. While the ....

....dynamically change the database layout during the mining process, we assume that the initial database is always provided in the horizontal item list (IL) format. 2.3 System Characteristics While there has been significant work in designing algorithms for the parallel mining of association rules [5, 11, 29, 18], in this study we focus on single processor environments. We also assume that the database is much larger than the available main memory. [Figure residue: tables with TID, ItemID, and ItemIDs columns] ....

E-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. In Proc. of ACM SIGMOD Intl. Conf. on Management of Data, May 1997.


DCI: a Hybrid Algorithm for Frequent Set Counting - Orlando, Palmerini, Perego   (Correct)

....is exponential in m = |M|, effective pruning techniques exist for reducing it. Unfortunately, for small support thresholds, pruning becomes less effective, thus making the FSC problem very expensive to solve both in time and space. A lot of proposals regard the efficient solution of the FSC problem [3, 4, 5, 12, 14, 15, 16, 18, 20, 21, 22]. The main goals of these algorithms are to efficiently prune or partition P(M) and to provide effective strategies for traversing it. The capability of effectively pruning the search space derives from the intuitive observation that none of the supersets of an infrequent itemset can be frequent. The ....

....(or larger) is present in the database. Several variations to the original Apriori algorithm, as well as many parallel implementations, have been proposed in the last years. We can recognize two main methods for determining the supports of the various itemsets present in P(M): a counting based [4, 5, 12, 18, 7, 1] and an intersection based [20, 9, 22] one. The former, also adopted by Apriori, exploits a horizontal dataset, where the various transactions, containing information about the items included, are stored sequentially. The method is based on counting how many times each candidate k-itemset ....

[Article contains additional citation context not shown here]

E. H. Han, G. Karypis, and Kumar V. Scalable Parallel Data Mining for Association Rules. IEEE Transactions on Knowledge and Data Engineering, 12(3):337-352, May/June 2000.


Enhancing the Apriori algorithm for Frequent Set Counting - Orlando, Palmerini, Perego (2001)   (3 citations)  (Correct)

....k itemsets; Ck: candidate k itemsets; Hk: hash table used by DHP at iteration k; Dk: pruned transaction database read at iteration k (D1 = D); Mk: set of the significative items in Dk; mk: cardinality of Mk. Table I: Symbols used in the paper. 2 Related work The algorithms of the Apriori class [2, 3, 8, 11] are based on the same simple observation: if a given itemset is not frequent then none of its supersets can be frequent. They have a level-wise behavior: they start with k = 1 by evaluating singleton itemsets, and base the computations performed at step k on the results of the previous iteration ....
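
The level-wise behavior and the pruning observation in this excerpt can be sketched in a few lines. This is a simplified illustration, not the algorithm of any cited paper: the function name `apriori`, the absolute `min_support` threshold, and the sample data are assumptions, and candidates are generated by a naive pairwise union rather than an optimized join.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]  # k = 1: singletons
    frequent = {}
    k = 1
    while candidates:
        # Count the support of every candidate k-itemset in one pass per candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Build (k+1)-candidates from the frequent k-itemsets, then prune any
        # candidate that has an infrequent k-subset.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        ]
        k += 1
    return frequent


# Hypothetical dataset with four transactions.
transactions = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result_low = apriori(transactions, min_support=2)
result_high = apriori(transactions, min_support=3)
```

With the higher threshold, the itemset {a, b} is infrequent, so {a, b, c} is pruned before its support is ever counted.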

E. H. Han, G. Karypis, and Kumar V. Scalable Parallel Data Mining for Association Rules. IEEE Transactions on Knowledge and Data Engineering, 12(3):337-352, May/June 2000.


BAMBOO: Accelerating Closed Itemset Mining by Deeply Pushing.. - Wang, Karypis   (3 citations)  Self-citation (Karypis)   (Correct)

No context found.

E. Han, G. Karypis, V. Kumar, Scalable Parallel Data Mining for Association Rules, SIGMOD'97, May 1997.


CLOSET+: Searching for the Best Strategies for Mining.. - Wang, Han, Pei (2003)   Self-citation (Han)   (Correct)

....specific permission and/or a fee. SIGKDD '03, August 24-27, 2003, Washington, DC, USA. Copyright 2003 ACM 1-58113-737-0/03/0008 ...$5.00. 1. INTRODUCTION Since the introduction of association rule mining [1], there have been extensive studies on efficient frequent itemset mining methods, such as [2, 11, 16, 6, 4, 3, 8, 7, 18, 10]. Most of the well studied frequent pattern mining algorithms, including Apriori [2], FP-growth [8], H-mine [13], and OP [10], mine the complete set of frequent itemsets. These algorithms may have good performance when the support threshold is high and the pattern space is sparse. However, when ....

E. Han, G. Karypis, V. Kumar. Scalable Parallel Data Mining for Association Rules. In TKDE 12(2), 2000.


Parallel Formulations of Tree-Projection-Based Sequence.. - Guralnik, Karypis   Self-citation (Karypis)   (Correct)

....a substantial amount of time. A number of efficient and scalable parallel formulations have been developed for finding frequent itemsets and sequences that are based on the candidate generation and counting framework [3, 18, 22, 16, 4], both for shared and distributed memory parallel computers [2, 22, 17, 8, 25, 29, 20]. However, the problem of parallelizing equivalence class based and projection based algorithms has received relatively little attention and existing parallel formulations for them have been targeted only toward shared memory architectures [30, 28]. However, the irregular and unstructured nature ....

....original database. The basic ideas in this algorithm were recently used to develop a similar algorithm for finding sequential patterns [21]. A number of parallel frequent itemset discovery algorithms have been developed that focus on parallelizing the various serial algorithms for that problem [2, 22, 17, 8, 25, 29, 20, 28]. Depending on the nature of the underlying serial algorithm, many of these approaches follow the same parallelization strategy. Providing exact details on all of these algorithms is beyond the scope of this section, and for this reason we only focus on the various issues involved in parallelizing ....

[Article contains additional citation context not shown here]

E.H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. IEEE Transactions on Knowledge and Data Eng., 12(3):337--352, 2000.


Exploratory Mining and Pruning Optimizations of.. - Ng, Lakshmanan, Pang.. (1998)   (118 citations)  Self-citation (Han)   (Correct)

....rules from large databases has been the subject of numerous studies. These studies cover a broad spectrum of topics including: (i) fast algorithms based on the levelwise Apriori framework [3, 13], partitioning [19, 18], and sampling [24]; (ii) incremental updating and parallel algorithms [6, 2, 8]; (iii) mining of generalized and multi level rules [21, 9]; (iv) mining of quantitative rules [22, 16]; (v) mining of multidimensional rules [7, 14, 12]; (vi) mining rules with item constraints [23]; and (vii) association rule based query languages [15, 4]. However, from the standpoint of the ....

....wants to focus the generation of rules to a specific, small subset of candidates, based on properties of the data. Such a black box model would be tolerable if the turnaround time of the computation were small, e.g. a few seconds. However, despite the development of many efficient algorithms [2, 3, 6, 8, 13, 18, 19, 24], association mining remains a process typically taking hours to complete. Before a new invocation of the black box, the user is not allowed to preempt the process and needs to wait for hours. Furthermore, typically only a small fraction of the computed rules might be what the user was looking ....

E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. SIGMOD 97, pp 277-288.


Algorithms for Clustering High Dimensional and - Tao   (Correct)

No context found.

E. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. In ACM SIGMOD Conference, pages 277--288. ACM Press, 1997.


Association-Based Similarity Testing and Its Applications - Tao Li Department   (Correct)

No context found.

E. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. In ACM SIGMOD Conference, pages 277--288. ACM Press, 1997.


Estimating Joint Probabilities without - Combinatory Counting April   (Correct)

No context found.

Han, E., Karypis, G., & Kumar, V. (1997). Scalable parallel data mining for association rules. Proc. of ACM SIGMOD.


March 2002 - Un Vers Ty   (Correct)

No context found.

E. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. In Proc. of ACM SIGMOD, 1997.


Parallel Mining for Frequent Fragments on a.. - Meinl, Fischer.. (2005)   (Correct)

No context found.

Eui-Hong Han, George Karypis, and Vipin Kumar. Scalable Parallel Data Mining for Association Rules. In Proc. 1997.


Efficient Hardware Data Mining with the Apriori Algorithm on.. - Baker, Prasanna   (Correct)

No context found.

E.(Sam) Han, G. Karypis, and V. Kumar. Scalable Parallel Data Mining for Association Rules. IEEE Transactions on Knowledge and Data Engineering, 12(3), 2000.


Efficient Parallel Data Mining with the Apriori Algorithm on.. - Baker, Prasanna   (Correct)

No context found.

E.(Sam) Han, G. Karypis, and V. Kumar. Scalable Parallel Data Mining for Association Rules. IEEE Transactions on Knowledge and Data Engineering, 12(3), 2000.


Multi-Database Mining - Shichao Zhang Xindong (2003)   (1 citation)  (Correct)

No context found.

E. Han, G. Karypis and V. Kumar, Scalable Parallel Data Mining for association rules. In: Proceedings of ACM SIGMOD, 1997: 277-288.


Identifying Global Exceptional Patterns in - Multi-Database Mining Chengqi (2004)   (Correct)

No context found.

E. Han, G. Karypis and V. Kumar, Scalable Parallel Data Mining for association rules. In: Proceedings of the ACM SIGMOD Conference on Management of Data, 1997: 277-288.


A Scalable Multi-Strategy Algorithm for Counting.. - Orlando, Palmerini.. (2002)   (Correct)

No context found.

Eui-Hong (Sam) Han, George Karypis, and Vipin Kumar. Scalable Parallel Data Mining for Association Rules. IEEE Transaction on Knowledge and Data Engineering, 12(3):337--352, may/june 2000.


A Scalable Multi-Strategy Algorithm for Counting.. - Orlando, Palmerini.. (2002)   (Correct)

No context found.

E. H. Han, G. Karypis, and Kumar V. Scalable Parallel Data Mining for Association Rules. IEEE Transactions on Knowledge and Data Engineering, 12(3):337--352, May/June 2000.


Adaptive and Resource-Aware Mining of Frequent Sets - Orlando Palmerini Perego (2002)   (7 citations)  (Correct)

No context found.

Eui-Hong (Sam) Han, George Karypis, and Vipin Kumar. Scalable Parallel Data Mining for Association Rules. IEEE Transaction on Knowledge and Data Engineering, 12(3):337--352, may/june 2000.


Adaptive and Resource-Aware Mining of Frequent Sets - Orlando Palmerini Perego (2002)   (7 citations)  (Correct)

No context found.

E. H. Han, G. Karypis, and Kumar V. Scalable Parallel Data Mining for Association Rules. IEEE Transactions on Knowledge and Data Engineering, 12(3):337--352, May/June 2000.


Shared Memory Parallelization of Data Mining Algorithms.. - Jin, Yang, Agrawal (2004)   (1 citation)  (Correct)

No context found.

E-H. Han, G. Karypis, and V. Kumar, "Scalable Parallel Datamining for Association Rules," IEEE Trans. Data and Knowledge Eng., vol. 12, no. 3, May/June 2000.


Shared Memory Parallelization of Data Mining Algorithms.. - Jin, Yang, Agrawal (2004)   (1 citation)  (Correct)

No context found.

E.-H. Han, G. Karypis, and V. Kumar, "Scalable Parallel Datamining for Association Rules," Proc. ACM SIGMOD 1997.


Compiler and Middleware Support for Scalable Data Mining - Agrawal, Jin, Li   (Correct)

No context found.

E-H. Han, G. Karypis, and V. Kumar. Scalable parallel datamining for association rules. IEEE Transactions on Data and Knowledge Engineering, 12(3), May / June 2000.


Compiler and Middleware Support for Scalable Data Mining - Agrawal, Jin, Li   (Correct)

No context found.

E-H. Han, G. Karypis, and V. Kumar. Scalable parallel datamining for association rules. In Proceedings of ACM SIGMOD 1997.


Asynchronous and Anticipatory Filter-Stream Based.. - Veloso.. (2004)   (Correct)

No context found.

E. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. Transactions on Knowledge and Data Engineering, 12(3):728--737, 2000.


An Algorithm for In-Core Frequent Itemset Mining on Streaming.. - Jin, Agrawal (2003)   (Correct)

No context found.

E-H. Han, G. Karypis, and V. Kumar. Scalable parallel datamining for association rules. IEEE Transactions on Data and Knowledge Engineering, 12(3), May / June 2000.


Multi-Database Mining - Zhang, Wu, Zhang (2003)   (1 citation)  (Correct)

No context found.

E. Han, G. Karypis and V. Kumar, Scalable Parallel Data Mining for association rules. In: Proceedings of ACM SIGMOD, 1997: 277-288.


New Parallel Algorithms for Frequent Itemset.. - Veloso.. (2003)   (Correct)

No context found.

E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. Transactions on Knowledge and Data Engineering, 12(3):728--737, 2000.


Shared Memory Parallelization of Data Mining Algorithms.. - Jin, Agrawal (2002)   (1 citation)  (Correct)

No context found.

E-H. Han, G. Karypis, and V. Kumar. Scalable parallel datamining for association rules. IEEE Transactions on Data and Knowledge Engineering, 12(3), May / June 2000.


Shared Memory Parallelization of Data Mining Algorithms.. - Jin, Agrawal (2002)   (1 citation)  (Correct)

No context found.

E-H. Han, G. Karypis, and V. Kumar. Scalable parallel datamining for association rules. In Proceedings of ACM SIGMOD 1997.


CONQUEST: A Distributed Tool for Constructing Summaries of.. - Chi, Koyuturk, Grama (2004)   (Correct)

No context found.

E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. pages 277--288, 1997.


Towards Effective and Efficient Distributed Clustering - Januzaj, Kriegel, Pfeifle (2003)   (Correct)

No context found.

Han E. H., Karypis G., Kumar V.: "Scalable parallel data mining for association rules" In: SIGMOD Record: Proceedings of the 1997 ACM-SIGMOD Conference on Management of Data, Tucson, AZ, USA. (1997) 277-288


Load Balancing on PC Clusters with the Super-Programming Model - Jin, Ziavras (2003)   (Correct)

No context found.

E. Han, G. Karypis, and V. Kumar, "Scalable Parallel Data Mining for Association Rules," Proc. ACM Special Interest Group on Management of Data (SIGMOD), p277-88, May 1997.


Association Rule Mining in Peer-to-Peer Systems - Wolff, Schuster (2003)   (Correct)

No context found.

E.-H. S. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. IEEE Transactions on Knowledge and Data Engineering, 12(3):352 -- 377, 2000.


Parallel and Distributed Frequent Itemset Mining on.. - Veloso, Otey.. (2003)   (Correct)

No context found.

E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. In ACM SIGMOD Conf. Management of Data, 1997.


Thesis Proposal - Ruoming Jin Department   (Correct)

No context found.

E-H. Han, G. Karypis, and V. Kumar. Scalable parallel datamining for association rules. IEEE Transactions on Data and Knowledge Engineering, 12(3), May / June 2000.


Non Recursive Generation of Frequent K-itemsets from Frequent .. - El-Hajj, Zaiane   (Correct)

No context found.

E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. Transactions on Knowledge and Data Engineering, 12(3):337--352, May-June 2000.


Adaptive and Resource-Aware Mining of Frequent Sets - Orlando Palmerini Perego (2002)   (7 citations)  (Correct)

No context found.

E. H. Han, G. Karypis, and Kumar V. Scalable Parallel Data Mining for Association Rules. IEEE TKDE, 12(3):337--352, May/June 2000.


Similarity Testing Between Heterogeneous Basket Datasets - Li, al. (2002)   (Correct)

No context found.

E. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. In Proc. of ACM SIGMOD, 1997.


MINTO: A Software Tool for Mining Manufacturing Databases - Haritsa   (Correct)

No context found.

E. Han, G. Karypis and V. Kumar, "Scalable Parallel Data Mining for Association Rules", Proc. of 26th SIGMOD Conf., June 1997.
