Results 11–20 of 410
Mining All Non-Derivable Frequent Itemsets
, 2002
Abstract

Cited by 127 (12 self)
Recent studies on frequent itemset mining algorithms resulted in significant performance improvements. However, if the minimal support threshold is set too low, or the data is highly correlated, the number of frequent itemsets itself can be prohibitively large. To overcome this problem, several proposals have recently been made to construct a concise representation of the frequent itemsets, instead of mining all frequent itemsets. The main goal of this paper is to identify redundancies in the set of all frequent itemsets and to exploit these redundancies in order to reduce the result of a mining operation. We present deduction rules to derive tight bounds on the support of candidate itemsets. We show how the deduction rules allow for constructing a minimal representation of all frequent itemsets. We also present connections between our proposal and recent proposals for concise representations, and we give the results of experiments on real-life datasets that show the effectiveness of the deduction rules. In fact, the experiments even show that in many cases, first mining the concise representation and then creating the frequent itemsets from this representation outperforms existing frequent-itemset mining algorithms.
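The deduction rules described in this abstract bound the support of a candidate itemset by inclusion-exclusion over its proper subsets: each subset J of I yields one rule, an upper bound when |I \ J| is odd and a lower bound when it is even. A minimal brute-force sketch, assuming a hypothetical toy database (the `deduction_bounds` helper is illustrative, not the paper's implementation):

```python
from itertools import combinations

# Toy transaction database (hypothetical data, not from the paper).
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
                frozenset("bc"), frozenset("a"), frozenset("abc")]

def supp(itemset):
    """Absolute support: number of transactions containing itemset."""
    return sum(1 for t in transactions if itemset <= t)

def deduction_bounds(I):
    """Tight (lower, upper) bounds on supp(I), deduced from the supports
    of I's proper subsets by inclusion-exclusion: one rule per subset J,
    an upper bound when |I \\ J| is odd, a lower bound when it is even."""
    I = frozenset(I)
    lowers, uppers = [], []
    for r in range(len(I)):                       # every proper subset J of I
        for J in map(frozenset, combinations(sorted(I), r)):
            s = 0
            rest = sorted(I - J)
            for k in range(len(rest)):            # every X with J <= X < I
                for extra in combinations(rest, k):
                    X = J | frozenset(extra)
                    s += (-1) ** (len(I - X) + 1) * supp(X)
            (uppers if len(I - J) % 2 else lowers).append(s)
    return max(lowers), min(uppers)

# When the bounds coincide, supp(I) is fully derivable from its subsets,
# so I need not be stored in the concise representation.
print(deduction_bounds("abc"))   # -> (2, 2), matching supp({a,b,c}) = 2
```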
Mining Frequent Patterns with Counting Inference
 SIGKDD Explorations
, 2000
Abstract

Cited by 113 (10 self)
We propose the algorithm PASCAL, which introduces a novel optimization of the well-known algorithm Apriori. This optimization is based on a new strategy called pattern counting inference that relies on the concept of key patterns. We show that the support of frequent non-key patterns can be inferred from frequent key patterns without accessing the database. Experiments comparing PASCAL to the three algorithms Apriori, Close and Max-Miner show that PASCAL is among the most efficient algorithms for mining frequent patterns.
Computing Iceberg Concept Lattices with TITANIC
, 2002
Abstract

Cited by 112 (15 self)
We introduce the notion of iceberg concept lattices...
Mining Minimal Non-Redundant Association Rules using Frequent Closed Itemsets
, 2000
Abstract

Cited by 109 (11 self)
The problem of the relevance and the usefulness of extracted association rules is of primary importance because, in the majority of cases, real-life databases lead to several thousand association rules with high confidence, among which are many redundancies. Using the closure of the Galois connection, we define two new bases for association rules whose union is a generating set for all valid association rules with support and confidence. These bases are characterized using frequent closed itemsets and their generators; they consist of the non-redundant exact and approximate association rules having minimal antecedents and maximal consequents, i.e. the most relevant association rules. Algorithms for extracting these bases are presented, and results of experiments carried out on real-life databases show that the proposed bases are useful and that their generation is not time consuming.
Catching the best views of skyline: A semantic approach based on decisive subspaces
 In VLDB
, 2005
Abstract

Cited by 87 (12 self)
The skyline operator is important for multi-criteria decision-making applications. Although many recent studies have developed efficient methods to compute skyline objects in a specific space, the fundamental problem of the semantics of skylines remains open: Why and in which subspaces is (or is not) an object in the skyline? Practically, users may also be interested in the skylines in any subspaces. Then, what is the relationship between the skylines in the subspaces and those in the super-spaces? How can we effectively analyze the subspace skylines? Can we efficiently compute skylines in various subspaces? In this paper, we investigate the semantics of skylines, propose subspace skyline analysis, and extend full-space skyline computation to subspace skyline computation. We introduce a novel notion of skyline group, which essentially is a group of objects that coincide in the skylines of some subspaces. We identify the decisive subspaces that qualify skyline groups in the subspace skylines. The new notions concisely capture the semantics and the structures of skylines in various subspaces. Multidimensional roll-up and drill-down analysis is introduced. We also develop
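The dominance relation behind skylines, and the way a subspace skyline can differ from the full-space one, can be sketched in a few lines. The points, dimension indices, and the "smaller is better" convention below are hypothetical; the brute-force `skyline` helper is not the paper's algorithm:

```python
# Hypothetical points over (price, distance, age), minimizing every
# dimension; the skyline holds the points not dominated in the chosen
# subspace of dimension indices.
points = {"p1": (100, 5, 3), "p2": (120, 2, 4),
          "p3": (100, 5, 1), "p4": (150, 6, 5)}

def dominates(a, b, dims):
    """a dominates b in subspace `dims`: no worse in every dimension,
    strictly better in at least one."""
    return (all(a[d] <= b[d] for d in dims)
            and any(a[d] < b[d] for d in dims))

def skyline(dims):
    return sorted(n for n, v in points.items()
                  if not any(dominates(w, v, dims)
                             for m, w in points.items() if m != n))

print(skyline([0, 1, 2]))   # full-space skyline: ['p2', 'p3']
print(skyline([0, 1]))      # in subspace (price, distance), p1 and p3
                            # coincide, so both join the subspace skyline
```

Here p1 and p3 project to the same values in the (price, distance) subspace, a tiny instance of the "skyline group" idea the abstract introduces.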
CHARM: An Efficient Algorithm for Closed Association Rule Mining
 COMPUTER SCIENCE, RENSSELAER POLYTECHNIC INSTITUTE
, 1999
Abstract

Cited by 86 (7 self)
The task of mining association rules consists of two main steps. The first involves finding the set of all frequent itemsets. The second step involves testing and generating all high-confidence rules among itemsets. In this paper we show that it is not necessary to mine all frequent itemsets in the first step; instead, it is sufficient to mine the set of closed frequent itemsets, which is much smaller than the set of all frequent itemsets. It is also not necessary to mine the set of all possible rules. We show that any rule between itemsets is equivalent to some rule between closed itemsets; thus many redundant rules can be eliminated. Furthermore, we present CHARM, an efficient algorithm for mining all closed frequent itemsets. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM outperforms previous methods by an order of magnitude or more. It is also linearly scalable in the number of transactions and in the number of closed itemsets found.
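The closed-itemset notion this abstract relies on — an itemset is closed if no superset occurs in exactly the same transactions — can be illustrated by naive enumeration. The toy database and helper names are assumptions for illustration; CHARM itself avoids this exhaustive search entirely:

```python
from itertools import combinations

# Hypothetical toy database; minsup is an absolute support threshold.
transactions = [frozenset("acd"), frozenset("bce"), frozenset("abce"),
                frozenset("be"), frozenset("abce")]
items = sorted(set().union(*transactions))
minsup = 2

def supp(s):
    return sum(1 for t in transactions if s <= t)

def closure(s):
    """Intersection of all transactions containing s; s is closed iff
    closure(s) == s, i.e. no superset has the same support."""
    covers = [t for t in transactions if s <= t]
    return frozenset.intersection(*covers) if covers else frozenset(items)

frequent = [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if supp(frozenset(c)) >= minsup]
closed = [s for s in frequent if closure(s) == s]

# The closed sets losslessly summarize the frequent sets: supp(X) is the
# support of closure(X), the smallest closed superset of X.
print(len(frequent), len(closed))   # -> 15 5
```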
Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2005
Abstract

Cited by 85 (7 self)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "non-closed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
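The diffset technique mentioned in the abstract stores, at each node of the search tree, only the transaction ids lost relative to the prefix, so supports follow by subtraction rather than by intersecting full tidsets. A small sketch under assumed toy tidsets (the variable names are illustrative, not CHARM's actual code):

```python
# Hypothetical tidsets: item -> set of transaction ids containing it.
tid = {"a": {1, 3, 4, 5}, "b": {2, 3, 4, 5}, "c": {1, 2, 3, 5}}

# Extending prefix P = {a} with item b: the diffset is what a's tidset
# loses, and supp(PX) = supp(P) - |diffset(PX)|.
t_a, t_b = tid["a"], tid["b"]
d_ab = t_a - t_b                    # diffset of ab w.r.t. prefix a
supp_ab = len(t_a) - len(d_ab)      # equals |t(a) & t(b)| = 3

# Deeper extensions need only diffsets: d(PXY) = d(PY) - d(PX).
d_ac = t_a - tid["c"]
d_abc = d_ac - d_ab
supp_abc = supp_ab - len(d_abc)     # equals |t(a) & t(b) & t(c)| = 2

print(supp_ab, supp_abc)            # -> 3 2
```

Diffsets shrink as the tree deepens, which is why they cut the memory footprint of the intermediate computations.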
Mining Top-K Frequent Closed Patterns without Minimum Support
 In Proceedings of ICDM’02
, 2002
Abstract

Cited by 79 (14 self)
In this paper, we propose a new mining task: mining top-k frequent closed patterns of length no less than min_l, where k is the desired number of frequent closed patterns to be mined and min_l is the minimal length of each pattern. An efficient algorithm, called TFP, is developed for mining such patterns without minimum support. Two methods, closed_node_count and descendant_sum, are proposed to effectively raise the support threshold and prune the FP-tree both during and after its construction. During the mining process, a novel combined top-down and bottom-up FP-tree mining strategy is developed to speed up support raising and closed frequent pattern discovery. In addition, a fast hash-based closed-pattern verification scheme is employed to check efficiently whether a potential closed pattern is really closed. Our...
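The mining task itself — the k most frequent closed patterns of length at least min_l, with no minimum support — can be stated as a brute-force specification. The toy database and the support-then-lexicographic tie-break below are assumptions; TFP's FP-tree machinery is not reproduced here:

```python
from itertools import combinations

# Hypothetical toy database.
transactions = [frozenset("abcd"), frozenset("abc"), frozenset("abd"),
                frozenset("cd"), frozenset("abcd")]
items = sorted(set().union(*transactions))

def supp(s):
    return sum(1 for t in transactions if s <= t)

def closed(s):
    """s is closed iff no one-item extension keeps the same support
    (sufficient, since support is anti-monotone)."""
    return all(supp(s | {i}) < supp(s) for i in items if i not in s)

def top_k_closed(k, min_l):
    cands = [frozenset(c) for r in range(min_l, len(items) + 1)
             for c in combinations(items, r)]
    cands = [s for s in cands if supp(s) > 0 and closed(s)]
    cands.sort(key=lambda s: (-supp(s), sorted(s)))   # assumed tie-break
    return cands[:k]

for s in top_k_closed(k=3, min_l=2):
    print(sorted(s), supp(s))
```

In effect, k and min_l replace the minimum-support knob: TFP's contribution is raising an internal support threshold on the fly so that this answer set is found without enumerating everything, as the brute force above does.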
CARPENTER: Finding Closed Patterns in Long Biological Datasets
, 2003
Abstract

Cited by 75 (8 self)
new characteristics. These datasets typically contain a large number of columns and a small number of rows. For example, many gene expression datasets may contain 10,000–100,000 columns but only 100–1,000 rows.
Summarizing itemset patterns: a profile-based approach
 In KDD
, 2005
Abstract

Cited by 67 (9 self)
Frequent-pattern mining has been studied extensively, with scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining is not efficiency but interpretability, due to the huge number of patterns generated by the mining process. In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a small number of patterns that a user can handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of the parameter K. Polynomial-time algorithms are developed, together with several optimization heuristics, for efficiency improvement. Empirical studies indicate that we can obtain compact summarization in real datasets.