## Mining Frequent Closed Graphs on Evolving Data Streams

Citations: | 8 - 0 self |

### Citations

630 | gSpan: Graph-based substructure pattern mining
- Yan, Han
- 2002
Citation Context ... in this paper, as it does not use coresets, or weighted frequent mining techniques. In terms of graphs, two main algorithms exist for mining frequent closed graphs: • CloseGraph [30]: based on gSpan =-=[29]-=-, a miner for finding frequent subgraphs, based on depth-first search (DFS) • MoSS [13]: an extension to MoFa [11] based on breadth-first search (BFS). Aggarwal et al. [3] present a mining methodology ... |

266 | Maintaining stream statistics over sliding windows
- Datar, Gionis, et al.
- 2002
Citation Context ...t to all algorithms dealing with random processes. It is important to note that ADWIN does not maintain the window explicitly, but compresses it using a variant of the exponential histogram technique =-=[20]-=-. This means that it keeps a window of length W using only O(log W) memory and O(log W) processing time per item, rather than the O(W) one expects from a naïve implementation. The main technical re... |

251 | CloseGraph: Mining Closed Frequent Graph Patterns
- Yan, Han
Citation Context ...all frequent subgraphs. There are many methods for computing frequent closed itemsets (see [21]), frequent closed sequences [31, 28], and trees [18, 6, 26, 5], but only two for frequent closed graphs =-=[30, 13]-=-, and none for frequent closed graphs on data streams. We propose the first general methodology to identify closed graphs in a data stream. We develop three closed graph algorithms: IncGraphMiner, an ... |

215 | CloSpan: Mining closed sequential patterns in large datasets
- Yan, Han, et al.
- 2003
Citation Context ...t of frequent closed subgraphs maintains the same information as the set of all frequent subgraphs. There are many methods for computing frequent closed itemsets (see [21]), frequent closed sequences =-=[31, 28]-=-, and trees [18, 6, 26, 5], but only two for frequent closed graphs [30, 13], and none for frequent closed graphs on data streams. We propose the first general methodology to identify closed graphs in... |

175 | Frequent pattern mining: current status and future directions
- Han, Cheng, et al.
- 2007
Citation Context ...l and useful knowledge from graph data [19, 4]. Due to novel applications in social networks, chemical informatics, bioinformatics, communication networks, computer vision, video indexing and the Web =-=[21]-=-, more and more large-scale graphs and sets of graphs are becoming available for analysis. Frequent pattern mining on graphs is one of the ways to obtain ... |

158 | Mining molecular fragments: Finding relevant substructures of molecules
- Borgelt
- 2002
Citation Context ...algorithms exist for mining frequent closed graphs: • CloseGraph [30]: based on gSpan [29], a miner for finding frequent subgraphs, based on depth-first search (DFS) • MoSS [13]: an extension to MoFa =-=[11]-=- based on breadth-first search (BFS). Aggarwal et al. [3] present a mining methodology to find frequent and dense patterns in graph streams. Their notion of density is based both on node-occurrence and... |

149 | BIDE: efficient mining of frequent closed sequences
- Wang, Han
- 2004
Citation Context ...t of frequent closed subgraphs maintains the same information as the set of all frequent subgraphs. There are many methods for computing frequent closed itemsets (see [21]), frequent closed sequences =-=[31, 28]-=-, and trees [18, 6, 26, 5], but only two for frequent closed graphs [30, 13], and none for frequent closed graphs on data streams. We propose the first general methodology to identify closed graphs in... |

83 | Geometric approximation via coresets
- Agarwal, Har-Peled, et al.
- 2005
Citation Context ... can observe that when adding to or removing from the weights of transactions, we are not modifying the number of closed patterns provided they remain frequent. 4. CORESETS OF CLOSED GRAPHS A coreset =-=[2]-=- of a set P with respect to some problem is a small subset that approximates the original set P, in the sense that solving the problem for the coreset provides an approximate solution for the problem... |

78 | Moment: Maintaining closed frequent itemsets over a stream sliding window
- Chi, Wang, et al.
- 2004
Citation Context ... classes depending on whether they use a landmark window, containing all the examples seen so far, or a sliding window. Only a small fraction of these methods deal with frequent closed mining. Moment =-=[17]-=-, CFI-Stream [24] and IncMine [22] are state-of-the-art algorithms for mining frequent closed itemsets over a sliding window. CFI-Stream stores only closed itemsets in memory, but maintains all closed... |

75 | FG-index: towards verification-free query processing on graph databases
- Cheng, Ke, et al.
- 2007
Citation Context ...ighted support as it has. A graph g is maximal if none of its proper supergraphs is frequent. All maximal graphs are closed but not necessarily otherwise. We define a graph g to be δ-tolerance closed =-=[16, 25]-=- if none of its proper frequent supergraphs has a weighted support larger than or equal to (1 − δ) · support(g). Note that a maximal graph is a 1-tolerance closed graph, and a closed graph is a 0-tole... |
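The δ-tolerance definition quoted in this context can be made concrete with a short check. The following is only a minimal sketch, not code from the cited papers: the function name `is_delta_tolerance_closed` and its arguments are hypothetical, and it assumes the weighted supports of all proper frequent supergraphs of g have already been computed elsewhere.

```python
from typing import Iterable


def is_delta_tolerance_closed(
    support: float,
    frequent_supergraph_supports: Iterable[float],
    delta: float,
) -> bool:
    """Return True if g is delta-tolerance closed.

    Hypothetical helper: `support` is the weighted support of g, and
    `frequent_supergraph_supports` holds the weighted supports of its
    proper frequent supergraphs (assumed to be given).
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    # g fails the test as soon as one proper frequent supergraph has
    # support >= (1 - delta) * support(g).
    threshold = (1.0 - delta) * support
    return all(s < threshold for s in frequent_supergraph_supports)
```

With `delta = 0` the check reduces to ordinary closedness (no proper frequent supergraph with the same support), and with `delta = 1` it holds only when g has no frequent proper supergraph at all, which is the maximal case described in the excerpt.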

70 | Managing and Mining Graph Data
- Aggarwal, Wang
- 2010
Citation Context ...Mining General Terms Algorithms Keywords Data streams, closed mining, graphs, concept drift 1. INTRODUCTION Graph mining is a challenging task that extracts novel and useful knowledge from graph data =-=[19, 4]-=-. Due to novel applications in social networks, chemical informatics, bioinformatics, communication networks, computer vision, video indexing and the Web [21], more and more large-scale graphs and set... |

69 | New ensemble methods for evolving data streams
- Bifet, Holmes, et al.
- 2009
Citation Context ...l data stream with exactly 15 frequent closed graphs generated artificially. Concept drift occurs three times, after 250,000, 500,000, and 750,000 examples, respectively. We follow the methodology in =-=[9]-=- to combine two data streams into one in order to create artificial drift. We observe that the incremental method is the slowest to adapt, as it does not have any forgetting mechanism. Comparing the a... |

66 | Learning from time-changing data with adaptive windowing
- Bifet, Gavaldà
- 2007
Citation Context ... an option, however, it has the cost of maintaining it in memory. In this paper, we propose to use ADWIN to estimate frequencies of graphs with theoretical guarantees. ADWIN (ADaptive sliding WINdow) =-=[7]-=- is a change detector and estimation algorithm. It solves, in a well-specified way, the problem of tracking the average of a stream of bits or real-valued numbers. ADWIN keeps a variable-length window... |

38 | Mining closed and maximal frequent subtrees from databases of labeled rooted trees
- Chi, Xia, et al.
Citation Context ... subgraphs maintains the same information as the set of all frequent subgraphs. There are many methods for computing frequent closed itemsets (see [21]), frequent closed sequences [31, 28], and trees =-=[18, 6, 26, 5]-=-, but only two for frequent closed graphs [30, 13], and none for frequent closed graphs on data streams. We propose the first general methodology to identify closed graphs in a data stream. We develop... |

33 | ChemDB: a public database of small molecules and related chemoinformatics resources
- Chen, Swamidass, et al.
- 2005
Citation Context ...S/MoFa algorithm, this framework also contains the gSpan [29] and CloseGraph [30] algorithms as special processing modes. For our experiments we use the following real datasets: ChemDB dataset ChemDB =-=[14, 15]-=- is a public dataset of approximately 4 million molecules built using the digital catalogs of over a hundred vendors and other public ... |

29 | CFI-Stream: Mining Closed Frequent Itemsets in Data Streams
- Jiang, Gruenwald
- 2006
Citation Context ...g on whether they use a landmark window, containing all the examples seen so far, or a sliding window. Only a small fraction of these methods deal with frequent closed mining. Moment [17], CFI-Stream =-=[24]-=- and IncMine [22] are state-of-the-art algorithms for mining frequent closed itemsets over a sliding window. CFI-Stream stores only closed itemsets in memory, but maintains all closed itemsets as it d... |

28 | Comparison of the NCI open database with seven large chemical structural databases
- Voigt, Bienfait, et al.
- 2001
Citation Context ...ically whenever possible. It is maintained by the Institute for Genomics and Bioinformatics at the University of California, Irvine. Open NCI Database The open National Cancer Institute (NCI) dataset =-=[27]-=- consists of approximately 250,000 structures. It is based on a large NCI database, built using samples from organic synthesis submitted to NCI for testing. While about half of the NCI database is not... |

25 | Massive Online Analysis, a Framework for Stream Classification
- Bifet, Holmes, et al.
- 2010
Citation Context ...w the benefits in memory and time. Second, we perform experiments on data streams with synthetic concept drift to show the performance of our adaptive strategy. We run our experiments extending MOA =-=[10]-=- using MoSS [12]. All experiments were performed on a 2.66 GHz Core 2 Duo machine with 64 GB of main memory, running CentOS 5.5. Massive Online Analysis (MOA) [10] is a framework for online lea... |

23 | A survey on algorithms for mining frequent itemsets over data streams
- Cheng, Ke, et al.
- 2008
Citation Context ...ning framework. Experimental results are given in Section 7, and conclusions are drawn in Section 8. 1.1 Related Work There is a large body of work on itemset mining from data streams; see the survey =-=[23]-=- and the references therein. We can divide these data stream methods into two different classes depending on whether they use a landmark window, containing all the examples seen so far, or a sliding w... |

21 | StreamKM++: A clustering algorithm for data streams
- Ackermann, Lammersen, et al.
- 2012
Citation Context ...t P . This notion was introduced in computational geometry to denote a small subset of points that can be used to obtain approximate solutions with theoretical guarantees. For example, for clustering =-=[1]-=-, a coreset for a set is a small set, such that for any set of cluster centers the clustering cost of the coreset is an approximation for the clustering cost of the original set with small relative er... |

20 | An output-polynomial time algorithm for mining frequent closed attribute trees
- Arimura, Uno
- 2005
Citation Context ... subgraphs maintains the same information as the set of all frequent subgraphs. There are many methods for computing frequent closed itemsets (see [21]), frequent closed sequences [31, 28], and trees =-=[18, 6, 26, 5]-=-, but only two for frequent closed graphs [30, 13], and none for frequent closed graphs on data streams. We propose the first general methodology to identify closed graphs in a data stream. We develop... |

13 | On dense pattern mining in graph streams
- Aggarwal, Li, et al.
Citation Context ...seGraph [30]: based on gSpan [29], a miner for finding frequent subgraphs, based on depth-first search (DFS) • MoSS [13]: an extension to MoFa [11] based on breadth-first search (BFS). Aggarwal et al. =-=[3]-=- present a mining methodology to find frequent and dense patterns in graph streams. Their notion of density is based both on node-occurrence and edge density, and they present an approach based on fin... |

13 | Mining adaptively frequent closed unlabeled rooted trees in data streams
- Bifet, Gavaldà
- 2008
Citation Context ... reduce the number of patterns found. IncMine proposes a notion of semi-FCIs that increases the minimum support threshold for an itemset as it is retained longer in the window. For trees, the work in =-=[8]-=- shows a general methodology to identify closed patterns in a data stream, using Galois Lattice Theory. This approach is based on an efficient representation of trees and a low complexity notion of re... |

9 | Advanced Pruning Strategies to Speed Up Mining Closed Molecular Fragments
- Borgelt, Meinl, et al.
- 2004
Citation Context ...all frequent subgraphs. There are many methods for computing frequent closed itemsets (see [21]), frequent closed sequences [31, 28], and trees [18, 6, 26, 5], but only two for frequent closed graphs =-=[30, 13]-=-, and none for frequent closed graphs on data streams. We propose the first general methodology to identify closed graphs in a data stream. We develop three closed graph algorithms: IncGraphMiner, an ... |

9 | Maintaining frequent closed itemsets over a sliding window
- Cheng, Ng
- 2008
Citation Context ... use a landmark window, containing all the examples seen so far, or a sliding window. Only a small fraction of these methods deal with frequent closed mining. Moment [17], CFI-Stream [24] and IncMine =-=[22]-=- are state-of-the-art algorithms for mining frequent closed itemsets over a sliding window. CFI-Stream stores only closed itemsets in memory, but maintains all closed itemsets as it does not apply a m... |

7 | DryadeParent, an efficient and robust closed attribute tree mining algorithm
Citation Context ... subgraphs maintains the same information as the set of all frequent subgraphs. There are many methods for computing frequent closed itemsets (see [21]), frequent closed sequences [31, 28], and trees =-=[18, 6, 26, 5]-=-, but only two for frequent closed graphs [30, 13], and none for frequent closed graphs on data streams. We propose the first general methodology to identify closed graphs in a data stream. We develop... |

5 | Mining Frequent Closed Rooted Trees
- Balcázar, Bifet, et al.
- 2010

4 | MoSS: a program for molecular substructure mining
- Borgelt, Meinl, et al.
- 2005
Citation Context ... all possible graphs are reached, but reduces the generation of duplicate graphs. CloseGraph selects the minimum DFS code based on a DFS lexicographical order as the canonical representative. In MoSS =-=[12]-=-, Borgelt et al. present a different method to perform frequent closed graph mining using breadth-first search instead of using the right-most extension approach. CloseGraph(g, D, min sup, S) Input: A... |

4 | ChemDB update – Full-text search and virtual chemical space
- Chen, Linstead, et al.
- 2007
Citation Context ...S/MoFa algorithm, this framework also contains the gSpan [29] and CloseGraph [30] algorithms as special processing modes. For our experiments we use the following real datasets: ChemDB dataset ChemDB =-=[14, 15]-=- is a public dataset of approximately 4 million molecules built using the digital catalogs of over a hundred vendors and other public ... |

4 | Efficiently mining δ-tolerance closed frequent subgraphs
- Takigawa, Mamitsuka
- 2011
Citation Context ...ighted support as it has. A graph g is maximal if none of its proper supergraphs is frequent. All maximal graphs are closed but not necessarily otherwise. We define a graph g to be δ-tolerance closed =-=[16, 25]-=- if none of its proper frequent supergraphs has a weighted support larger than or equal to (1 − δ) · support(g). Note that a maximal graph is a 1-tolerance closed graph, and a closed graph is a 0-tole... |