Frequent Subgraph Discovery, 2001
Cited by 226 (8 self)
Over the years, frequent itemset discovery algorithms have been used to solve various interesting problems. As data mining techniques are increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used because they cannot model the requirements of these domains. An alternative way of modeling the objects in these datasets is to represent them as graphs. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm for finding all frequent subgraphs in large graph databases. We evaluated the performance of the algorithm by experiments with synthetic datasets as well as a chemical compound dataset. The empirical results show that our algorithm scales linearly with the number of input transactions and is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though it has to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism, which are not necessary for traditional frequent itemset discovery.
An efficient algorithm for discovering frequent subgraphs
IEEE Transactions on Knowledge and Data Engineering, 2002
Cited by 68 (5 self)
Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are increasingly applied to non-traditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework assumed by these algorithms cannot effectively model the datasets in these domains. An alternative way of modeling the objects in these datasets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is as that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph datasets. We experimentally evaluate the performance of FSG using a variety of real and synthetic datasets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in datasets containing over 200,000 graph transactions and scales linearly with respect to the size of the dataset.
Index Terms: data mining, scientific datasets, frequent pattern discovery, chemical compound datasets.
A performance comparison of five algorithms for graph isomorphism
In Proceedings of the 3rd IAPR TC-15 Workshop on Graph-based Representations in Pattern Recognition, 2001
Cited by 26 (2 self)
Despite the significant number of isomorphism algorithms presented in the literature, no effort has so far been made to characterize their performance. Consequently, it is not clear how the behavior of those algorithms varies with the type and size of the graphs to be matched in real applications. In this paper we present a benchmarking activity for characterizing the performance of a set of algorithms for exact graph isomorphism. For this purpose we use a large database containing 10,000 pairs of isomorphic graphs with different topologies (regular graphs, randomly connected graphs, bounded valence graphs), enriched with suitably modified versions of them to simulate distortions occurring in real cases. The size of the considered graphs ranges from a few nodes to about 1000 nodes.
A PCA Approach for Fast Retrieval of Structural Patterns in Attributed Graphs
Humboldt University Berlin, 2001
Cited by 4 (1 self)
The attributed graph (AG) is a useful data structure for representing complex patterns in a wide range of applications such as computer vision, image database retrieval, and other knowledge representation tasks where similar or exactly corresponding structural patterns must be found. Existing methods for attributed graph matching (AGM) often suffer from a combinatorial problem: the execution cost of finding an exact or similar match grows exponentially with the number of nodes the AG contains. In this paper, the squared matching error of two AGs subject to permutations is approximately relaxed to the squared matching error of two AGs subject to orthogonal transformations. Hence, the principal component analysis (PCA) algorithm can be used for fast computation of the approximate matching error, with considerably reduced execution complexity. Experiments demonstrate that this method works well and is robust against noise and other simple types of transformations.

