Survey on Frequent Pattern Mining

by Bart Goethals

Results 1 - 10 of 66

An efficient algorithm for discovering frequent subgraphs

by Michihiro Kuramochi, George Karypis - IEEE Transactions on Knowledge and Data Engineering, 2002
Cited by 120 (7 self)
Abstract: Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to non-traditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the datasets in these domains. An alternate way of modeling the objects in these datasets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is as that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph datasets. We experimentally evaluate the performance of FSG using a variety of real and synthetic datasets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in datasets containing over 200,000 graph transactions and scales linearly with respect to the size of the dataset. Index Terms: Data mining, scientific datasets, frequent pattern discovery, chemical compound datasets.

Citation Context

...s effective pruning, it achieves comparable performance with that achieved by the various depth-first-based approaches, as long as the data set is not dense or the support value is not extremely small [18], [22]. The overall flow of our algorithm, called FSG, is similar to that of Apriori, and works as follows. FSG starts by enumerating all frequent single- and double-edge subgraphs. Then, it enters it...

A fast apriori implementation

by Ferenc Bodon - Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), volume 90 of Workshop Proceedings, 2003
Cited by 71 (2 self)
The efficiency of frequent itemset mining algorithms is determined mainly by three factors: the way candidates are generated, the data structure that is used, and the implementation details. Most papers focus on the first factor, some describe the underlying data structures, but implementation details are almost always neglected. In this paper we show that the effect of implementation can be more important than the selection of the algorithm. Ideas that seem quite promising may turn out to be ineffective once we descend to the implementation level. We theoretically and experimentally analyze APRIORI, which is the most established algorithm for frequent itemset mining. Several implementations of the algorithm have been put forward in the last decade. Although they are implementations of the very same algorithm, they display large differences in running time and memory need. In this paper we describe an implementation of APRIORI that outperforms all implementations known to us. We analyze, theoretically and experimentally, the principal data structure of our solution. This data structure is the main factor in the efficiency of our implementation. Moreover, we present a simple modification of APRIORI that appears to be faster than the original algorithm.
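
For orientation, the level-wise search that APRIORI performs (generate candidates from the previous level, then count their support in one pass over the data) can be sketched as follows. This is a minimal illustration, not Bodon's implementation: it uses plain Python sets instead of the specialized data structure the abstract credits for its speed, and the names `apriori`, `transactions`, and `min_support` are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the support of every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets and keep a
        # k-itemset only if all of its (k-1)-subsets are frequent.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Support counting: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

The inner loop that tests every candidate against every transaction is exactly the part that a trie or hash tree would replace in a tuned implementation.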

Citation Context

...s through fewer nodes, which means fewer recursive steps, a slow operation (a subroutine call with at least five parameters in our implementation) compared to finding proper edges at a node. In [12] it was shown that it is advantageous to recode frequent items according to ascending order of their frequencies (i.e., the inverse of the frequency codes) because candidate generation will be faster...
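
The recoding step mentioned in this excerpt can be illustrated with a short sketch. It assumes that "ascending order of frequencies" means the least frequent item receives code 0; the function name `recode_by_ascending_frequency` and the tie-breaking rule (by item value) are hypothetical choices for the example, not details taken from the paper.

```python
from collections import Counter


def recode_by_ascending_frequency(transactions, min_support):
    """Map frequent items to integer codes, least frequent first, and recode the data."""
    # Count each item once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [item for item, c in counts.items() if c >= min_support]

    # Least frequent item gets code 0; ties are broken by item value (an assumption).
    frequent.sort(key=lambda item: (counts[item], item))
    code = {item: n for n, item in enumerate(frequent)}

    # Drop infrequent items and store each transaction as a sorted list of codes.
    recoded = [sorted(code[i] for i in set(t) if i in code) for t in transactions]
    return code, recoded
```

For example, with transactions `[["a","b","c"], ["a","b"], ["a"], ["d"]]` and `min_support=2`, item `b` (support 2) is coded 0 and item `a` (support 3) is coded 1, while `c` and `d` are dropped.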

Discovering Frequent Geometric Subgraphs

by Michihiro Kuramochi, George Karypis - In IEEE Intl. Conference on Data Mining ’02, 2002
Cited by 38 (1 self)
As data mining techniques are being increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used, as they cannot model the requirements of these domains. An alternate way of modeling the objects in these data sets is to use a graph to model the database objects. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm for finding frequent geometric subgraphs in a large collection of geometric graphs. Our algorithm is able to discover geometric subgraphs that can be rotation, scaling and translation invariant, and it can accommodate inherent errors on the coordinates of the vertices. We evaluated the performance of the algorithm using a large database of over 20,000 real two-dimensional chemical structures, and our experimental results show that our algorithm requires relatively little time, can accommodate low support values, and scales linearly with the number of transactions.

Citation Context

...s effective pruning, it achieves comparable performance with that achieved by the various depth-first-based approaches, as long as the data set is not dense or the support value is not extremely small [21,18]. At the same time, the relatively simple algorithmic structure of this approach allows us to focus on the non-trivial aspects of operating on geometric graphs. To ensure that gFSG can correctly oper...

Discovering frequent patterns in sensitive data

by Raghav Bhaskar, Srivatsan Laxman, Abhradeep Thakurta, Adam Smith
Cited by 37 (1 self)
Discovering frequent patterns from data is a popular exploratory technique in data mining. However, if the data are sensitive (e.g. patient health records, user behavior records) releasing information about significant patterns or trends carries significant risk to privacy. This paper shows how one can accurately discover and release the most significant patterns along with their frequencies in a data set containing sensitive information, while providing rigorous guarantees of privacy for the individuals whose information is stored there. We present two efficient algorithms for discovering the K most frequent patterns in a data set of sensitive records. Our algorithms satisfy differential privacy, a recently introduced definition that provides meaningful privacy guarantees in the presence of arbitrary external information. Differentially private algorithms require a degree of uncertainty in their output to preserve privacy. Our algorithms handle this by returning ‘noisy’ lists of patterns that are close to the actual list of K most frequent patterns in the data. We define a new notion of utility that quantifies the output accuracy of private top-K pattern mining algorithms. In typical data sets, our utility criterion implies low false positive and false negative rates in the reported lists. We prove that our methods meet the new utility criterion; we also demonstrate the performance of our algorithms through extensive experiments on the transaction data sets from the FIMI repository. While the paper focuses on frequent pattern mining, the techniques developed here are relevant whenever the data mining output is a list of elements ordered according to an appropriately ‘robust’ measure of interest.
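
To make the idea of a ‘noisy’ top-K list concrete, here is a toy sketch that perturbs each pattern count with Laplace-distributed noise before ranking. It is not either of the paper's algorithms: the noise scale is a free parameter that is not calibrated to any privacy budget, so the sketch makes no privacy guarantee. The names `noisy_top_k`, `counts`, and `scale` are assumptions for illustration only.

```python
import random


def noisy_top_k(counts, k, scale, rng=None):
    """Rank patterns by count plus Laplace noise and return the top k with noisy counts."""
    rng = rng or random.Random()

    def laplace():
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and scale `scale`.
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    noisy = {pattern: count + laplace() for pattern, count in counts.items()}
    ranked = sorted(noisy, key=noisy.get, reverse=True)[:k]
    return [(pattern, noisy[pattern]) for pattern in ranked]
```

With a small `scale` the returned list matches the true top K; as `scale` grows, patterns with similar counts can swap places or drop out, which is the kind of output uncertainty the abstract describes.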

Efficient Hardware Data Mining with the Apriori Algorithm on FPGAs

by Zachary K. Baker, Viktor K. Prasanna
Cited by 22 (1 self)
The Apriori algorithm is a popular correlation-based data mining kernel. However, it is a computationally expensive algorithm and the running times can stretch up to days for large databases, as database sizes can extend to gigabytes. Through the use of a new extension to the systolic array architecture, the time required for processing can be significantly reduced. Our array architecture implementation on a Xilinx Virtex-II Pro 100 provides a performance improvement that can be orders of magnitude faster than the state-of-the-art software implementations. The system is easily scalable and introduces an efficient "systolic injection" method for intelligently reporting unpredictably generated mid-array results to a controller without any chance of collision or excessive stalling.

Fast Algorithm for Mining Association Rules

by M. H. Margahny, A. A. Mitwaly
Cited by 15 (1 self)
One of the important problems in data mining is discovering association rules from databases of transactions where each transaction consists of a set of items. The most time-consuming operation in this discovery process is the computation of the frequency of the occurrences of interesting subsets of items (called candidates) in the database of transactions. Can one develop a method that may avoid or reduce candidate generation and test, and utilize some novel data structures to reduce the cost in frequent pattern mining? This is the motivation of this study. A fast algorithm has been proposed for solving this problem. Our algorithm uses the "TreeMap", which is a structure in the Java language. We also present an "ArrayList" technique that greatly reduces the need to traverse the database. Moreover, we present experimental results which show our structure outperforms all existing available algorithms in all common data mining problems. Keywords: data mining, association rules, TreeMap, ArrayList.

Citation Context

...d, the closed frequent itemsets form a lossless representation of all frequent itemsets since the support of those itemsets that are not closed is uniquely determined by the closed frequent itemsets. [2] Through our study of the pattern discovery problem, we can divide algorithms into two types: those with and those without candidate generation. Any Apriori-like instance belongs to the first type. ...
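
The claim that closed frequent itemsets determine the support of every other frequent itemset follows from a standard fact: the support of an itemset equals the largest support among the closed itemsets that contain it. A minimal sketch, with the hypothetical helper name `support_from_closed` and a hand-built example:

```python
def support_from_closed(itemset, closed):
    """Recover an itemset's support from {frozenset: support} of closed frequent itemsets.

    Returns None when no closed frequent itemset contains the itemset,
    i.e. when the itemset is not frequent.
    """
    itemset = frozenset(itemset)
    supports = [s for c, s in closed.items() if itemset <= c]
    return max(supports) if supports else None


# Transactions: {a,b,c}, {a,b,c}, {a,b}, {a}; minimum support 2.
# The closed frequent itemsets are {a}:4, {a,b}:3, and {a,b,c}:2.
closed_example = {
    frozenset("a"): 4,
    frozenset("ab"): 3,
    frozenset("abc"): 2,
}
```

Here `support_from_closed("b", closed_example)` gives 3 and `support_from_closed("bc", closed_example)` gives 2, matching a direct count over the four transactions, even though neither `{b}` nor `{b,c}` is stored.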

nonordfp: An FP-Growth Variation without Rebuilding the FP-Tree

by Balázs Rácz
Cited by 12 (0 self)
We describe a frequent itemset mining algorithm and implementation based on the well-known algorithm FP-growth. The theoretical difference is the main data structure (tree), which is more compact and which we do not need to rebuild for each conditional step. We thoroughly deal with implementation issues, data structures, memory layout, I/O and library functions we use to achieve performance comparable to that of the best implementations of the 1st Frequent Itemset Mining Implementations (FIMI) Workshop.

Memory Issues in Frequent Itemset Mining

by Bart Goethals - Proceedings of the 2004 ACM Symposium on Applied Computing, 2004
Cited by 11 (1 self)
During the past decade, many algorithms have been proposed to solve the frequent itemset mining problem, i.e. find all sets of items that frequently occur together in a given database of transactions. Although very efficient techniques have been presented, they still suffer from the same problem. That is, they are all inherently dependent on the amount of main memory available. Moreover, if this amount is not enough, the presented techniques are simply not applicable anymore, or significantly need to pay in performance. In this paper, we give a rigorous comparison between current state of the art techniques and present a new and simple technique, based on sorting the transaction database, resulting in a sometimes more efficient algorithm for frequent itemset mining using less memory.

Citation Context

..., we performed several experiments on a wide variety of different datasets. Due to space limitations, we will only consider 2 of them in this paper. For more results, we refer the interested reader to [4]. Here, we present our results on a synthetic data set generated by the program provided by the Quest research group at IBM Almaden [3], which is a dense dataset that contains 100 000 transactions ove...

An architecture for efficient hardware data mining using reconfigurable computing systems

by Zachary K. Baker, Viktor K. Prasanna - Brigham Young University, 2006
Cited by 10 (0 self)
The Apriori algorithm is a fundamental correlation-based data mining kernel used in a variety of fields. The innovation in this paper is a highly parallel custom architecture implemented on a reconfigurable computing system. Using this “bitmapped CAM,” the time and area required for executing the subset operations fundamental to data mining can be significantly reduced. The bitmapped CAM architecture implementation on an FPGA-accelerated high performance workstation provides a performance acceleration of orders of magnitude over software-based systems. The bitmapped CAM utilizes redundancy within the candidate data to efficiently store and process many subset operations simultaneously. The efficiency of this operation allows 140 units to process about 2,240 subset operations simultaneously. Using industry-standard benchmarking databases, we have tested the bitmapped CAM architecture on the SRC-6E reconfigurable hardware system. The platform provides a minimum of 24x (and often much higher) time performance advantage over the fastest software Apriori implementations.
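
The "subset operation" at the heart of Apriori support counting asks whether a candidate itemset is contained in a transaction. A software analogy of a bitmapped test is sketched below; it only illustrates the bitwise idea and says nothing about how the paper's hardware is organized. The names `to_bitmap`, `count_support`, and `index` (an item-to-bit-position map) are assumptions for the example.

```python
def to_bitmap(itemset, index):
    """Encode an itemset as an integer with one bit set per item."""
    bits = 0
    for item in itemset:
        bits |= 1 << index[item]
    return bits


def count_support(candidates, transactions, index):
    """Count, for each candidate, the transactions that contain it."""
    cand_bits = [to_bitmap(c, index) for c in candidates]
    counts = [0] * len(candidates)
    for t in transactions:
        t_bits = to_bitmap(t, index)
        for n, c_bits in enumerate(cand_bits):
            # The candidate is a subset exactly when no candidate bit
            # is missing from the transaction's bitmap.
            if (c_bits & t_bits) == c_bits:
                counts[n] += 1
    return counts
```

Each subset test is a single AND and a comparison, independent of the others, which is the property that lets many such tests be evaluated in parallel.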

Citation Context

...[2] is a popular approach for progressively grouping together frequent itemsets in large databases given a particular cutoff of occurrence frequency. Software implementations of the Apriori algorithm [6, 7, 10] utilize various tree structures, hashing methods, or approaches such as vertical mining [15] that radically change data structures to handle the support and candidate generation operations. This pape...

Apriori, a depth first implementation

by Walter A. Kosters, Wim Pijls - Proc. of the Workshop on Frequent Itemset Mining Implementations, 2003
Cited by 9 (1 self)
We will discuss DF, the depth first implementation of APRIORI as devised in 1999 (see [8]). Given a database, this algorithm builds a trie in memory that contains all frequent itemsets, i.e., all sets that are contained in at least minsup transactions from the original database. Here minsup is a threshold value given in advance. In the trie, which is constructed by adding one item at a time, every path corresponds to a unique frequent itemset. We describe the algorithm in detail, derive theoretical formulas, and provide experiments.
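
The statement that every path in the trie corresponds to a unique frequent itemset can be illustrated with a small sketch. It shows only the data structure, using nested dictionaries, and does not reproduce DF's item-at-a-time construction order; the names `build_trie` and `paths` are hypothetical.

```python
def build_trie(frequent_itemsets):
    """Store itemsets in a nested-dict trie, with the items of each set in sorted order."""
    root = {}
    for itemset in frequent_itemsets:
        node = root
        for item in sorted(itemset):
            node = node.setdefault(item, {})
    return root


def paths(node, prefix=()):
    """Yield every path from the root as a tuple of items, in depth-first order."""
    for item in sorted(node):
        path = prefix + (item,)
        yield path
        yield from paths(node[item], path)
```

Because every subset of a frequent itemset is itself frequent, each prefix of a stored path is also a frequent itemset, so no trie node is wasted on an infrequent set. For the frequent itemsets `{a}`, `{b}`, `{c}`, `{a,b}`, `{a,c}`, the paths are `(a)`, `(a, b)`, `(a, c)`, `(b)`, `(c)`.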