Results 1 - 4 of 4
An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets
Abstract

Cited by 8 (0 self)
As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. Our methodology hinges on a Poisson approximation to ...
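The null model in this abstract (a random dataset with the same number of transactions and the same individual item frequencies) can be illustrated with a small calculation. The sketch below is not the authors' procedure; it assumes that items are placed in transactions independently, so the support of an itemset follows a binomial distribution, and it sums the tail probabilities to get the expected number of k-itemsets with support at least s. The names `binomial_tail` and `expected_frequent_itemsets` are hypothetical, and the brute-force enumeration is only practical for a handful of items.

```python
from itertools import combinations
from math import comb, prod


def binomial_tail(t, p, s):
    """Return Pr[Binomial(t, p) >= s] by summing the probability mass function."""
    return sum(comb(t, j) * p**j * (1 - p) ** (t - j) for j in range(s, t + 1))


def expected_frequent_itemsets(freqs, t, k, s):
    """Expected number of k-itemsets with support >= s in a random dataset.

    freqs maps each item to its individual frequency (fraction of transactions).
    Assumption: items occur independently, so a given itemset appears in a
    transaction with probability equal to the product of its item frequencies.
    """
    return sum(
        binomial_tail(t, prod(freqs[item] for item in itemset), s)
        for itemset in combinations(sorted(freqs), k)
    )


if __name__ == "__main__":
    # Hypothetical item frequencies, chosen only for illustration.
    freqs = {"bread": 0.4, "milk": 0.3, "eggs": 0.2, "tea": 0.1}
    for s in (5, 10, 20):
        print(s, expected_frequent_itemsets(freqs, t=100, k=2, s=s))
```

Comparing the observed number of itemsets at a given support against this expectation gives a first impression of how unusual the real data is; a rigorous threshold with a bounded false discovery rate needs the full analysis described in the paper.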
A probability analysis for candidate-based frequent itemset algorithms
In Proceedings of the 2006 ACM Symposium on Applied Computing, DM track, volume 1 of 2, 2006
Abstract

Cited by 5 (1 self)
This paper explores the generation of candidates, which is an important step in frequent itemset mining algorithms, from a theoretical point of view. Important notions in our probabilistic analysis are success (a candidate that is frequent), and failure (a candidate that is infrequent). For a selection of candidate-based frequent itemset mining algorithms, the probabilities of these events are studied for the shopping model where all the shoppers are independent and each combination of items has its own probability, so any correlation between items is possible. The Apriori Algorithm is considered in detail; for AIS, Eclat, FP-growth and the Fast Completion Apriori Algorithm, the main principles are sketched. The results of the analysis are used to compare the behaviour of the algorithms for a variety of data distributions.
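The notions of success and failure in this abstract can be made concrete with one level of textbook Apriori candidate generation. The sketch below is a minimal illustration, not the paper's analysis: it joins frequent (k-1)-itemsets, prunes candidates that have an infrequent subset, and then splits the remaining candidates into successes (frequent) and failures (infrequent). The function names `support` and `apriori_level` and the toy transactions are assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori_level(frequent_prev, transactions, min_support):
    """Generate size-k candidates from frequent (k-1)-itemsets and classify them.

    Returns (successes, failures): candidates that turn out to be frequent,
    and candidates that turn out to be infrequent.
    """
    if not frequent_prev:
        return set(), set()
    k = len(next(iter(frequent_prev))) + 1
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            union = a | b
            # Keep the union only if it has size k and all of its
            # (k-1)-subsets are frequent (the Apriori pruning rule).
            if len(union) == k and all(
                frozenset(sub) in frequent_prev
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    successes = {c for c in candidates if support(c, transactions) >= min_support}
    return successes, candidates - successes


if __name__ == "__main__":
    # Toy data, chosen only for illustration.
    transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    level1 = {frozenset(x) for x in "abc"}  # every single item has support >= 2
    successes, failures = apriori_level(level1, transactions, min_support=2)
    print("successes:", successes)
    print("failures:", failures)
```

The ratio of failures to candidates at each level is the kind of quantity a probabilistic analysis of candidate generation would study; here it is simply counted on a fixed toy dataset.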
Peak-Jumping Frequent Itemset Mining Algorithms
Abstract

Cited by 2 (0 self)
We analyze algorithms that, under the right circumstances, permit efficient mining for frequent itemsets in data with tall peaks (large frequent itemsets). We develop a family of level-by-level peak-jumping algorithms, and study them using a simple probability model. The analysis clarifies why the jumping idea sometimes works well, and which properties the data needs to have for this to be the case. The link with Max-Miner arises in a natural way and the analysis makes clear the role and importance of each major idea used in this algorithm.