Results 1-10 of 52
Mining All Non-Derivable Frequent Itemsets
, 2002
Cited by 127 (12 self)
Recent studies on frequent itemset mining algorithms resulted in significant performance improvements. However, if the minimal support threshold is set too low, or the data is highly correlated, the number of frequent itemsets itself can be prohibitively large. To overcome this problem, recently several proposals have been made to construct a concise representation of the frequent itemsets, instead of mining all frequent itemsets. The main goal of this paper is to identify redundancies in the set of all frequent itemsets and to exploit these redundancies in order to reduce the result of a mining operation. We present deduction rules to derive tight bounds on the support of candidate itemsets. We show how the deduction rules allow for constructing a minimal representation for all frequent itemsets. We also present connections between our proposal and recent proposals for concise representations and we give the results of experiments on real-life datasets that show the effectiveness of the deduction rules. In fact, the experiments even show that in many cases, first mining the concise representation, and then creating the frequent itemsets from this representation outperforms existing frequent set mining algorithms.
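The abstract above mentions deduction rules that bound the support of a candidate itemset. The abstract does not spell the rules out; the following is a minimal sketch that assumes they are the standard inclusion-exclusion bounds computed from the supports of all proper subsets. The function names and the dictionary layout are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def support_bounds(itemset, supp):
    """Return (lower, upper) bounds on supp(itemset) from its proper subsets.

    `supp` maps frozensets to supports and must contain every proper
    subset of `itemset`, including the empty set (total transaction count).
    Assumption: bounds follow the inclusion-exclusion form
    sigma_X(I) = sum over X <= J < I of (-1)^(|I \\ J| + 1) * supp(J).
    """
    items = frozenset(itemset)
    proper = [frozenset(c)
              for r in range(len(items))
              for c in combinations(sorted(items), r)]
    lower, upper = 0, float("inf")
    for x in proper:
        sigma = 0
        for j in proper:
            if x <= j:
                sign = 1 if (len(items) - len(j)) % 2 == 1 else -1
                sigma += sign * supp[j]
        # Odd |I \ X| yields an upper bound, even |I \ X| a lower bound.
        if (len(items) - len(x)) % 2 == 1:
            upper = min(upper, sigma)
        else:
            lower = max(lower, sigma)
    return lower, upper


def is_derivable(itemset, supp):
    """An itemset is derivable when the two bounds coincide."""
    lower, upper = support_bounds(itemset, supp)
    return lower == upper
```

For example, with 10 transactions, supp(a) = 6 and supp(b) = 5, the bounds for {a, b} are (1, 5), so its support cannot be deduced. If instead supp(a) = 10, both bounds equal 5 and the itemset is derivable.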
Free-sets: a condensed representation of Boolean data for the approximation of frequency queries
 Data Mining and Knowledge Discovery
, 2003
Cited by 105 (20 self)
Abstract. Given a large collection of transactions containing items, a basic common data mining problem is to extract the so-called frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called free-sets, from which we can approximate any itemset support (i.e., the number of transactions containing the itemset) and we formalize this notion in the framework of ε-adequate representations (H. Mannila and H. Toivonen, 1996. In Proc. of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 189–194). We show that frequent free-sets can be efficiently extracted using pruning strategies developed for frequent itemset discovery, and that they can be used to approximate the support of any frequent itemset. Experiments on real dense data sets show a significant reduction of the size of the output when compared with standard frequent itemset extraction. Furthermore, the experiments show that the extraction of frequent free-sets is still possible when the extraction of frequent itemsets becomes intractable, and that the supports of the frequent free-sets can be used to approximate very closely the supports of the frequent itemsets. Finally, we consider the effect of this approximation on association rules (a popular kind of patterns that can be derived from frequent itemsets) and show that the corresponding errors remain very low in practice.
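The abstract does not define free-sets. As a rough illustration, the sketch below assumes the common δ-free definition: an itemset is δ-free when no rule between its items holds with at most δ exceptions, which is equivalent to saying that dropping any single item raises the support by more than δ. The helper names are hypothetical.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def is_delta_free(itemset, transactions, delta=0):
    """Check delta-freeness under the assumed definition.

    X is delta-free iff supp(X \\ {a}) - supp(X) > delta for every a in X,
    i.e. no rule (X \\ {a}) => a has delta or fewer exceptions.
    """
    items = set(itemset)
    full = support(items, transactions)
    return all(support(items - {a}, transactions) - full > delta
               for a in items)
```

On the toy data `["abc", "abc", "ab", "a", "c"]`, the set {a, b} is not 0-free because every transaction containing b also contains a, while {a, c} is 0-free but not 1-free.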
On private scalar product computation for privacy-preserving data mining
 In Proceedings of the 7th Annual International Conference in Information Security and Cryptology
, 2004
Cited by 75 (4 self)
Abstract. In mining and integrating data from multiple sources, there are many privacy and security issues. In several different contexts, the security of the full privacy-preserving data mining protocol depends on the security of the underlying private scalar product protocol. We show that two of the private scalar product protocols, one of which was proposed in a leading data mining conference, are insecure. We then describe a provably private scalar product protocol that is based on homomorphic encryption and improve its efficiency so that it can also be used on massive datasets. Keywords: Privacy-preserving data mining, private scalar product protocol, vertically partitioned frequent pattern mining
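To make the idea of a scalar product over homomorphic encryption concrete, here is a toy sketch. It is not the protocol from the paper: the abstract does not name a cryptosystem, so Paillier encryption is an assumption, the primes are far too small to be secure, and all function names are hypothetical. Alice holds x, Bob holds y, and they end up with additive shares s_a and s_b of x·y.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair with generator g = n + 1 (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m mod n as (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover m from c using L(u) = (u - 1) // n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def bob_respond(n, ciphertexts, y, s_b):
    """Bob combines Alice's ciphertexts into an encryption of x.y - s_b."""
    n2 = n * n
    acc = encrypt(n, -s_b)
    for c, yi in zip(ciphertexts, y):
        # Multiplying ciphertexts adds plaintexts; powering scales them.
        acc = acc * pow(c, yi, n2) % n2
    return acc
```

Alice would send `[encrypt(n, xi) for xi in x]`, Bob would pick a random `s_b` and reply with `bob_respond(...)`, and Alice would decrypt the reply to obtain `s_a`, so that `(s_a + s_b) % n` equals the scalar product as long as it is smaller than n.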
Approximation of frequency queries by means of free-sets
, 2000
Cited by 72 (27 self)
Abstract. Given a large collection of transactions containing items, a basic common data mining problem is to extract the so-called frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called free-sets, from which we can approximate any itemset support (i.e., the number of transactions containing the itemset) and we formalize this notion in the framework of ε-adequate representation [10]. We show that frequent free-sets can be efficiently extracted using pruning strategies developed for frequent itemset discovery, and that they can be used to approximate the support of any frequent itemset. Experiments run on real dense data sets show a significant reduction of the size of the output when compared with standard frequent itemset extraction. Furthermore, the experiments show that the extraction of frequent free-sets is still possible when the extraction of frequent itemsets becomes intractable. Finally, we show that the error made when approximating frequent itemset support remains very low in practice.
Depth-first non-derivable itemset mining
 In SIAM Int. Conf. on Data Mining (SDM’05)
, 2005
Cited by 57 (7 self)
Mining frequent itemsets is one of the main problems in data mining. Much effort went into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collection of interesting itemsets, i.e., a condensed representation. Recently, in this context, the non-derivable itemsets were proposed as an important class of itemsets. An itemset is called derivable when its support is completely determined by the support of its subsets. As such, derivable itemsets represent redundant information and can be pruned from the collection of frequent itemsets. It was shown both theoretically and experimentally that the collection of non-derivable frequent itemsets is in general much smaller than the complete set of frequent itemsets. A breadth-first, Apriori-based algorithm, called NDI, to find all non-derivable itemsets was proposed. In this paper we present a depth-first algorithm, dfNDI, that is based on Eclat for mining the non-derivable itemsets. dfNDI is evaluated on real-life datasets, and experiments show that dfNDI outperforms NDI by an order of magnitude.
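For readers unfamiliar with Eclat, the depth-first scheme that dfNDI is said to build on, the following is a minimal sketch of plain Eclat over a vertical (item to transaction-id set) layout. It is not dfNDI: the derivability pruning described in the abstract is not included, and the function name is hypothetical.

```python
def eclat(transactions, min_support):
    """Return {itemset tuple: support} for all frequent itemsets.

    Plain Eclat sketch: supports are obtained by intersecting tid-sets
    while extending prefixes depth-first.
    """
    # Build the vertical layout: item -> set of transaction ids.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    result = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            result[itemset] = len(tids)
            # Keep only the extensions that remain frequent.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    frequent_items = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), frequent_items)
    return result
```

On `["abc", "abc", "ab", "a", "c"]` with `min_support=2` this yields seven frequent itemsets, among them `("a", "b")` with support 3 and `("a", "b", "c")` with support 2.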
A survey on condensed representations for frequent sets
 In: Constraint Based Mining and Inductive Databases, Springer-Verlag, LNAI
, 2005
Cited by 37 (4 self)
Abstract. Solving inductive queries which have to return complete collections of patterns satisfying a given predicate has been studied extensively over the last few years. The specific problem of frequent set mining from potentially huge boolean matrices has given rise to tens of efficient solvers. Frequent sets are indeed useful for many data mining tasks, including the popular association rule mining task but also feature construction, association-based classification, clustering, etc. The research in this area has been boosted by the fascinating concept of condensed representations w.r.t. frequency queries. Such representations can be used to support the discovery of every frequent set and its support without looking back at the data. Interestingly, the size of condensed representations can be several orders of magnitude smaller than the size of frequent set collections. Most of the proposals concern exact representations while it is also possible to consider approximated ones, i.e., to trade computational complexity with a bounded approximation on the computed support values. This paper surveys the core concepts used in the recent works on condensed representation for frequent sets.
Constraint-based concept mining and its application to microarray data analysis
 Intell. Data Anal
, 2005
Minimal k-Free Representations of Frequent Sets
Cited by 30 (8 self)
Due to the potentially immense number of frequent sets that can be generated from transactional databases, recent studies have demonstrated the need for concise representations of all frequent sets.
TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets
 IEEE Trans. on Knowledge and Data Engineering
, 2005
Cited by 30 (2 self)
Abstract. Frequent itemset mining has been studied extensively in the literature. Most previous studies require the specification of a min_support threshold and aim at mining a complete set of frequent itemsets satisfying min_support. However, in practice, it is difficult for users to provide an appropriate min_support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining top-k frequent closed itemsets of length no less than min_l, where k is the desired number of frequent closed itemsets to be mined, and min_l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without min_support. Starting at min_support = 0 and by making use of the length constraint and the properties of top-k frequent closed itemsets, min_support can be raised effectively and the FP-tree can be pruned dynamically both during and after the construction of the tree using our two proposed methods: the closed node count and descendant_sum. Moreover, mining is further sped up by employing a combined top-down and bottom-up FP-tree traversal strategy, a set of search space pruning methods, a fast two-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size. Index Terms: Data mining, frequent itemset, association rules, mining methods and algorithms.
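The mining task defined in the abstract (top-k frequent closed itemsets of length at least min_l) can be illustrated with a brute-force sketch. This is not TFP and uses no FP-tree; it enumerates every itemset, so it is only usable on tiny inputs. The function name and the tie-breaking order are assumptions made for the example.

```python
from itertools import combinations


def top_k_closed(transactions, k, min_l):
    """Return the k most frequent closed itemsets with at least min_l items.

    Brute-force sketch: the closure of an itemset is the intersection of
    all transactions that contain it, and every closure is a closed set.
    """
    rows = [frozenset(t) for t in transactions]
    items = sorted(set().union(*rows))
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            covering = [row for row in rows if row.issuperset(combo)]
            if not covering:
                continue
            closure = frozenset.intersection(*covering)
            closed[closure] = len(covering)
    # Rank by descending support; ties are broken alphabetically (assumed).
    ranked = sorted(
        (pair for pair in closed.items() if len(pair[0]) >= min_l),
        key=lambda pair: (-pair[1], sorted(pair[0])),
    )
    return ranked[:k]
```

On `["abc", "abc", "ab", "a", "c"]` with `k=2` and `min_l=2`, the result is {a, b} with support 3 followed by {a, b, c} with support 2. No support threshold has to be supplied by the caller.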
Simplest Rules Characterizing Classes Generated by δ-Free Sets
, 2002
Cited by 24 (13 self)
We present a new approach that provides the simplest rules characterizing classes with respect to their left-hand sides. This approach is based on a condensed representation (δ-free sets) of data which is efficiently computed. Produced rules have a minimal body (i.e., no proper subset of the left-hand side of a rule is enough to conclude on the same class value). We show a sensible sufficient condition that avoids important classification conflicts. Experiments show that the number of rules characterizing classes drastically decreases. The technique is operational for large data sets and can be used even in the difficult context of highly correlated data where other algorithms fail.