by Yun Chi, Yirong Yang, Yi Xia, Richard R. Muntz
In The Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’04)
http://www.cs.ucla.edu/~ychi/./publication/cmtreeminer_pakdd04.pdf
Abstract:
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, and computer networks. One important problem in mining databases of trees is to find frequently occurring subtrees. However, because of combinatorial explosion, the number of frequent subtrees usually grows exponentially with the size of the subtrees. In this paper, we present CMTreeMiner, a computationally efficient algorithm that discovers all closed and maximal frequent subtrees in a database of rooted unordered trees. The algorithm mines both kinds of subtrees by traversing an enumeration tree that systematically enumerates all subtrees, while using an enumeration DAG to prune the branches of the enumeration tree that do not correspond to closed or maximal frequent subtrees. The enumeration tree and the enumeration DAG are both defined in terms of a canonical form for rooted unordered trees: the depth-first canonical form (DFCF). We compare the performance of our algorithm with that of PathJoin, a recently published algorithm that mines maximal frequent subtrees.
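To illustrate why a canonical form matters for rooted unordered trees, here is a minimal sketch, not the paper's DFCF definition. It assumes a hypothetical `Node` class and a simple encoding in which each node's child encodings are sorted before being concatenated, so that two trees differing only in sibling order get the same string. Labels are assumed not to contain parentheses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical labeled tree node; not taken from the paper."""
    label: str
    children: List["Node"] = field(default_factory=list)


def canonical(node: Node) -> str:
    """Encode a rooted unordered tree so that sibling order does not matter.

    Child encodings are sorted lexicographically before concatenation.
    This is an illustrative stand-in for a canonical form, not DFCF itself.
    """
    parts = sorted(canonical(child) for child in node.children)
    return node.label + "(" + "".join(parts) + ")"


# Two trees that differ only in the order of the root's children.
t1 = Node("A", [Node("B", [Node("D")]), Node("C")])
t2 = Node("A", [Node("C"), Node("B", [Node("D")])])

# A tree with a genuinely different structure (D hangs under C, not B).
t3 = Node("A", [Node("B"), Node("C", [Node("D")])])

print(canonical(t1) == canonical(t2))  # same unordered tree
print(canonical(t1) == canonical(t3))  # different unordered tree
```

With an encoding like this, an enumeration procedure can recognize that two candidate subtrees are the same unordered tree by comparing strings, which is the role a canonical form plays when each subtree should be enumerated only once.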