Efficiently mining maximal frequent itemsets. In: ICDM (2001)

by K. Gouda, M. J. Zaki

Results 1 - 10 of 161

CHARM: An efficient algorithm for closed itemset mining

by Mohammed J. Zaki, Ching-Jui Hsiao, 2002
Cited by 320 (14 self)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally it uses a fast hash-based approach to remove any “non-closed” sets found during computation. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM significantly outperforms previous methods. It is also linearly scalable in the number of transactions.
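
The claim that closed itemsets determine the frequency of every frequent itemset can be illustrated with a minimal brute-force sketch. This is not the CHARM algorithm: it simply enumerates all itemsets on a hypothetical four-transaction database, keeps the frequent ones that have no proper superset of equal support (the closed ones), and then recovers the support of any frequent itemset as the largest support among its closed supersets. The database `DB`, the threshold, and all function names are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy database: transaction id -> set of items.
DB = {1: {"a", "b", "c"}, 2: {"a", "b", "c"}, 3: {"a", "b"}, 4: {"c"}}


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def frequent_and_closed(db, min_sup):
    """Brute-force enumeration (exponential; for illustration only)."""
    items = sorted(set().union(*db.values()))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = support(candidate, db)
            if sup >= min_sup:
                frequent[candidate] = sup
    # An itemset is closed if no proper superset has the same support.
    closed = {
        s: sup
        for s, sup in frequent.items()
        if not any(s < t and sup == t_sup for t, t_sup in frequent.items())
    }
    return frequent, closed


def support_from_closed(itemset, closed):
    """Support of a frequent itemset, derived from the closed sets alone."""
    sups = [sup for c, sup in closed.items() if itemset <= c]
    return max(sups) if sups else 0


frequent, closed = frequent_and_closed(DB, min_sup=2)
print(len(frequent), "frequent itemsets,", len(closed), "closed itemsets")
```

On this toy input there are seven frequent itemsets but only three closed ones, and `support_from_closed` reproduces the support of all seven.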

Citation Context

...gram grant EIA-0103708. † The work was done while the author was at RPI. There are two current solutions to the long pattern mining problem. The first one is to mine only the maximal frequent itemsets [2, 4, 6, 10, 12], which are typically orders of magnitude fewer than all frequent patterns. While mining maximal sets help understand the long patterns in dense domains, they lead to a loss of information; since subs...

Efficiently Using Prefix-trees in Mining Frequent Itemsets

by Gösta Grahne, Jianfei Zhu, 2003
Cited by 180 (1 self)
Efficient algorithms for mining frequent itemsets are crucial for mining association rules. Methods for mining frequent itemsets and for iceberg data cube computation have been implemented using a prefix-tree structure, known as an FP-tree, for storing compressed information about frequent itemsets. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this paper we present a novel array-based technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree based algorithms. Our technique works especially well for sparse datasets. Furthermore,

Fast Vertical Mining Using Diffsets

by Mohammed J. Zaki, Karam Gouda, 2001
Cited by 153 (5 self)
A number of vertical mining algorithms have been proposed recently for association mining, which have shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches is when intermediate results of vertical tid lists become too large for memory, thus affecting the algorithm scalability.
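
The vertical format described above can be sketched in a few lines: each item maps to the set of transaction ids (its tidset) that contain it, and the support of an itemset is the size of the intersection of its items' tidsets. The sketch also shows the diffset idea named in the title, under the assumption that the diffset of an extension is the set of tids lost relative to its prefix; that definition is not stated in the abstract excerpt. The database `DB` and the function names are hypothetical.

```python
# Hypothetical toy database: transaction id -> set of items.
DB = {1: {"a", "b", "c"}, 2: {"a", "b", "c"}, 3: {"a", "b"}, 4: {"c"}}


def to_vertical(db):
    """Convert a horizontal database into item -> tidset form."""
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support_via_tidsets(tidsets, items):
    """Support as the size of the intersection of the items' tidsets."""
    return len(set.intersection(*(tidsets[i] for i in items)))


def support_via_diffsets(tidsets, p, x, y):
    """Support of {p, x, y} computed from diffsets relative to prefix p.

    Assumed definitions: d(px) = t(p) - t(x), d(pxy) = d(py) - d(px),
    and support(pxy) = support(px) - |d(pxy)|.
    """
    d_px = tidsets[p] - tidsets[x]
    d_py = tidsets[p] - tidsets[y]
    d_pxy = d_py - d_px
    sup_px = len(tidsets[p]) - len(d_px)
    return sup_px - len(d_pxy)


T = to_vertical(DB)
print(support_via_tidsets(T, ["a", "b", "c"]))
print(support_via_diffsets(T, "a", "b", "c"))
```

Both calls give the same support for {a, b, c}. The diffsets involved here hold at most one tid each, whereas the tidsets hold two or three, which is the kind of memory saving the approach targets when tidsets grow large.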

Spin: Mining maximal frequent subgraphs from graph databases

by Jun Huan, Wei Wang, Jan Prins, Jiong Yang - In KDD, 2004
Cited by 99 (12 self)
One fundamental challenge for mining recurring subgraphs from semi-structured data sets is the overwhelming abundance of such patterns. In large graph databases, the total number of frequent subgraphs can become too large to allow a full enumeration using reasonable computational resources. In this paper, we propose a new algorithm that mines only maximal frequent subgraphs, i.e. subgraphs that are not a part of any other frequent subgraphs. This may exponentially decrease the size of the output set in the best case; in our experiments on practical data sets, mining maximal frequent subgraphs reduces the total number of mined patterns by two to three orders of magnitude. Our method first mines all frequent trees from a general graph database and then reconstructs all maximal subgraphs from the mined trees. Using two chemical structure benchmarks and a set of synthetic graph data sets, we demonstrate that, in addition to decreasing the output size, our algorithm can achieve a five-fold speed up over the current state-of-the-art subgraph mining algorithms.

Citation Context

... permits the subgraph test to be performed incrementally at successive levels during the search [10]. Our current work benefits extensively from existing algorithms for maximal itemset mining such as [3, 7] and frequent subtree mining algorithms [1, 22]. 5. CONCLUSION AND FUTURE WORK In this paper we present SPIN, an algorithm to mine maximal frequent subgraphs from a graph database. A new framework, wh...

Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure

by Mohammed J. Zaki, Ching-Jui Hsiao - IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 85 (7 self)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "nonclosed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.

Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window

by Yun Chi, Haixun Wang, Philip S. Yu, Richard R. Muntz - In ICDM, 2004
Cited by 77 (4 self)
This paper considers the problem of mining closed frequent itemsets over a sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets. However, monitoring only frequent itemsets will make it impossible to detect new itemsets when they become frequent. In this paper, we introduce a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding-window. The selected itemsets consist of a boundary between closed frequent itemsets and the rest of the itemsets. Concept drifts in a data stream are reflected by boundary movements in the CET. In other words, a status change of any itemset (e.g., from non-frequent to frequent) must occur through the boundary. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET. Our experiments show that our algorithm performs much better than previous approaches.

Citation Context

...hus, algorithms that mine all frequent itemsets often suffer from the problem of combinatorial explosion. Two solutions have been proposed to alleviate this problem. In the first solution (e.g., [4], [11]), only maximal frequent itemsets are discovered. A frequent itemset is maximal if none of its proper supersets is frequent. The total number of maximal frequent itemsets M is much smaller than that o...
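
The definition quoted above (a frequent itemset is maximal if none of its proper supersets is frequent) can be checked directly with a brute-force sketch. This is only an illustration of the definition on a hypothetical five-transaction database; it is not GenMax or any other algorithm from the cited papers, and the names `DB` and `maximal_frequent_itemsets` are assumptions.

```python
from itertools import combinations

# Hypothetical toy database: transaction id -> set of items.
DB = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"b", "c"}, 4: {"b", "c"}, 5: {"a", "c"}}


def maximal_frequent_itemsets(db, min_sup):
    """Return the frequent itemsets that have no frequent proper superset.

    Brute force over all itemsets (exponential; for illustration only).
    """
    items = sorted(set().union(*db.values()))
    frequent = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = sum(1 for t in db.values() if candidate <= t)
            if sup >= min_sup:
                frequent.add(candidate)
    # Since every subset of a frequent itemset is frequent, it suffices
    # to look for proper supersets among the frequent itemsets themselves.
    maximal = {s for s in frequent if not any(s < t for t in frequent)}
    return frequent, maximal


frequent, maximal = maximal_frequent_itemsets(DB, min_sup=2)
print(sorted("".join(sorted(s)) for s in maximal))
```

With a minimum support of 2, five itemsets are frequent here but only {a, b} and {b, c} are maximal. Their supports alone do not reveal the supports of their subsets, which is the loss of information that motivates closed itemsets.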

Discovering All Most Specific Sentences

by Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Ram Sewak Sharma - ACM Transactions on Database Systems, 2003
Cited by 73 (4 self)
... this article, we show how the problems of finding frequent sets in relations and of finding minimal keys in databases can be reduced to this formulation. Using this theory extraction formulation [Mannila 1995, 1996; Mannila and Toivonen 1997], one can formulate general results about the complexity of algorithms for these data mining tasks

Fast algorithms for frequent itemset mining using FP-trees

by Gösta Grahne, Jianfei Zhu - IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 64 (0 self)
Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a prefix-tree structure, known as an FP-tree, for storing compressed information about frequent itemsets. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this paper, we present a novel FP-array technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree-based algorithms. Our technique works especially well for sparse data sets. Furthermore, we present new algorithms for mining all, maximal, and closed frequent itemsets. Our algorithms use the FP-tree data structure in combination with the FP-array technique efficiently and incorporate various optimization techniques. We also present experimental results comparing our methods with existing algorithms. The results show that our methods are the fastest for many cases. Even though the algorithms consume much memory when the data sets are sparse, they are still the fastest ones when the minimum support is low. Moreover, they are always among the fastest algorithms and consume less memory than other methods when the data sets are dense.

Citation Context

...ndidate frequent itemset generation can be done by bitvector and-operations. Pruning techniques are also used in the MAFIA algorithm. GenMax, another depth-first algorithm, proposed by Gouda and Zaki [11], takes an approach called progressive focusing to do maximality testing. This technique, instead of comparing a newly found frequent itemset with all maximal frequent itemsets found so far, maintains...

Association Mining

by Aaron Ceglar, John F. Roddick, 2006
Cited by 61 (1 self)
The task of finding correlations between items in a dataset, association mining, has received considerable attention over the last decade. This article presents a survey of association mining fundamentals, detailing the evolution of association mining algorithms from the seminal to the state-of-the-art. This survey focuses on the fundamental principles of association mining, that is, itemset identification, rule generation, and their generic optimizations.

The Complexity of Mining Maximal Frequent Itemsets and Maximal Frequent Patterns

by Guizhen Yang - In KDD ’04: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004
Cited by 50 (0 self)
Mining maximal frequent itemsets is one of the most fundamental problems in data mining. In this paper we study the complexity-theoretic aspects of maximal frequent itemset mining, from the perspective of counting the number of solutions. We present the first formal proof that the problem of counting the number of distinct maximal frequent itemsets in a database of transactions, given an arbitrary support threshold, is #P-complete, thereby providing strong theoretical evidence that the problem of mining maximal frequent itemsets is NP-hard. This result is of particular interest since the associated decision problem of checking the existence of a maximal frequent itemset is in P. We also extend our complexity analysis to other similar data mining problems dealing with complex data structures, such as sequences, trees, and graphs, which have attracted intensive research interests in recent years. Normally, in these problems a partial order among frequent patterns can be defined in such a way as to preserve the downward closure property, with maximal frequent patterns being those without any successor with respect to this partial order. We investigate several variants of these mining problems in which the patterns of interest are subsequences, subtrees, or subgraphs, and show that the associated problems of counting the number of maximal frequent patterns are all either #P-complete or #P-hard.