Results 1  10
of
153
Frequent Subgraph Discovery
, 2001
"... Over the years, frequent itemset discovery algorithms have been used to solve various interesting problems. As data mining techniques are being increasingly applied to nontraditional domains, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of th ..."
Abstract

Cited by 406 (10 self)
 Add to MetaCart
(Show Context)
Over the years, frequent itemset discovery algorithms have been used to solve various interesting problems. As data mining techniques are being increasingly applied to nontraditional domains, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of these domains. An alternate way of modeling the objects in these data sets, is to use a graph to model the database objects. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm for finding all frequent subgraphs in large graph databases. We evaluated the performance of the algorithm by experiments with synthetic datasets as well as a chemical compound dataset. The empirical results show that our algorithm scales linearly with the number of input transactions and it is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though we have to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism which are not necessary for traditional frequent itemset discovery.
CHARM: An efficient algorithm for closed itemset mining
, 2002
"... The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets usin ..."
Abstract

Cited by 320 (14 self)
 Add to MetaCart
(Show Context)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemsettidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally it uses a fast hashbased approach to remove any “nonclosed” sets found during computation. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM significantly outperforms previous methods. It is also linearly scalable in the number of transactions.
Graph Indexing: A Frequent Structurebased Approach
, 2004
"... Graph has become increasingly important in modelling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graphbased indices. In this paper, we investigate the is ..."
Abstract

Cited by 201 (25 self)
 Add to MetaCart
Graph has become increasingly important in modelling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graphbased indices. In this paper, we investigate the issues of indexing graphs and propose a novel solution by applying a graph mining technique. Di#erent from the existing pathbased methods, our approach, called gIndex, makes use of frequent substructure as the basic indexing feature. Frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable to database updates. To reduce the size of index structure, two techniques, sizeincreasing support constraint and discriminative fragments, are introduced. Our performance study shows that gIndex has 10 times smaller index size, but achieves 310 times better performance in comparison with a typical pathbased method, GraphGrep. The gIndex approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be applied to indexing sequences, trees, and other complicated structures as well.
Efficiently Using Prefixtrees in Mining Frequent Itemsets
, 2003
"... Efficient algorithms for mining frequent itemsets are crucial for mining association rules. Methods for mining frequent itemsets and for iceberg data cube computation have been implemented using a prefixtree structure, known as an FPtree, for storing compressed information about frequent itemsets. ..."
Abstract

Cited by 180 (1 self)
 Add to MetaCart
Efficient algorithms for mining frequent itemsets are crucial for mining association rules. Methods for mining frequent itemsets and for iceberg data cube computation have been implemented using a prefixtree structure, known as an FPtree, for storing compressed information about frequent itemsets. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this paper we present a novel arraybased technique that greatly reduces the need to traverse FPtrees, thus obtaining significantly improved performance for FPtree based algorithms. Our technique works especially well for sparse datasets. Furthermore,
Efficiently mining maximal frequent itemsets
 In ICDM
, 2001
"... We present GenMax, a backtrack search based algorithm for mining maximal frequent itemsets. GenMax uses a number of optimizations to prune the search space. It uses a novel technique called progressive focusing to perform maximality checking, and diffset propagation to perform fast frequency computa ..."
Abstract

Cited by 161 (11 self)
 Add to MetaCart
(Show Context)
We present GenMax, a backtrack search based algorithm for mining maximal frequent itemsets. GenMax uses a number of optimizations to prune the search space. It uses a novel technique called progressive focusing to perform maximality checking, and diffset propagation to perform fast frequency computation. Systematic experimental comparison with previous work indicates that different methods have varying strengths and weaknesses based on dataset characteristics. We found GenMax to be a highly efficient method to mine the exact set of maximal patterns. 1
Frequent SubStructureBased Approaches for Classifying Chemical Compounds
 In Proceedings of ICDM’03
, 2003
"... In this paper we study the problem of classifying chemical compound datasets. We present a substructurebased classification algorithm that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topologi ..."
Abstract

Cited by 140 (6 self)
 Add to MetaCart
In this paper we study the problem of classifying chemical compound datasets. We present a substructurebased classification algorithm that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the dataset. The advantage of our approach is that during classification model construction, all relevant substructures are available allowing the classifier to intelligently select the most discriminating ones. The computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and outperforms existing schemes by 10% to 35%, on the average.
Finding frequent patterns in a large sparse graph
 SIAM Data Mining Conference
, 2004
"... This paper presents two algorithms based on the horizontal and vertical pattern discovery paradigms that find the connected subgraphs that have a sufficient number of edgedisjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods to determine ..."
Abstract

Cited by 130 (4 self)
 Add to MetaCart
(Show Context)
This paper presents two algorithms based on the horizontal and vertical pattern discovery paradigms that find the connected subgraphs that have a sufficient number of edgedisjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods to determine the number of the edgedisjoint embeddings of a subgraph that are based on approximate and exact maximum independent set computations and use it to prune infrequent subgraphs. Experimental evaluation on real datasets from various domains show that both algorithms achieve good performance, scale well to sparse input graphs with more than 100,000 vertices, and significantly outperform a previously developed algorithm.
An efficient algorithm for discovering frequent subgraphs
 IEEE Transactions on Knowledge and Data Engineering
, 2002
"... Abstract — Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approach cannot be used. This i ..."
Abstract

Cited by 120 (7 self)
 Add to MetaCart
(Show Context)
Abstract — Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approach cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the datasets in these domains. An alternate way of modeling the objects in these datasets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is as that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph datasets. We experimentally evaluate the performance of FSG using a variety of real and synthetic datasets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in datasets containing over 200,000 graph transactions and scales linearly with respect to the size of the dataset. Index Terms — Data mining, scientific datasets, frequent pattern discovery, chemical compound datasets.
Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2005
"... The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets u ..."
Abstract

Cited by 85 (7 self)
 Add to MetaCart
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemsettidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hashbased approach to remove any "nonclosed" sets found during computation. We also present CHARML, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a stateoftheart algorithm that outperforms previous methods. Further, CHARML explicitly generates the frequent closed itemset lattice.
Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window
 In ICDM
, 2004
"... This paper considers the problem of mining closed frequent itemsets over a sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory const ..."
Abstract

Cited by 77 (4 self)
 Add to MetaCart
(Show Context)
This paper considers the problem of mining closed frequent itemsets over a sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets. However, monitoring only frequent itemsets will make it impossible to detect new itemsets when they become frequent. In this paper, we introduce a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a slidingwindow. The selected itemsets consist of a boundary between closed frequent itemsets and the rest of the itemsets. Concept drifts in a data stream are reflected by boundary movements in the CET. In other words, a status change of any itemset (e.g., from nonfrequent to frequent) must occur through the boundary. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET. Our experiments show that our algorithm performs much better than previous approaches.