Discovering informative connection subgraphs in multirelational graphs
 SIGKDD Explorations
, 2005
Cited by 36 (7 self)
Discovering patterns in graphs has long been an area of interest. In most approaches to such pattern discovery either quantitative anomalies, frequency of substructure or maximum flow is used to measure the interestingness of a pattern. In this paper we introduce heuristics that guide a subgraph discovery algorithm away from banal paths towards more “informative” ones. Given an RDF graph, a user might pose a question of the form: “What are the most relevant ways in which entity X is related to entity Y?” the response to which is a subgraph connecting X to Y. We use our heuristics to discover informative subgraphs within RDF graphs. Our heuristics are based on weighting mechanisms derived from edge semantics suggested by the RDF schema. We present an analysis of the quality of the subgraphs generated with respect to path ranking metrics. We then conclude by presenting intuitions about which of our weighting schemes and heuristics produce higher quality subgraphs.
Pattern mining in frequent dynamic subgraphs
 IN ICDM
, 2006
Cited by 36 (2 self)
Graph-structured data is becoming increasingly abundant in many application domains. Graph mining aims at finding interesting patterns within this data that represent novel knowledge. While current data mining deals with static graphs that do not change over time, coming years will see the advent of an increasing number of time series of graphs. In this article, we investigate how pattern mining on static graphs can be extended to time series of graphs. In particular, we are considering dynamic graphs with edge insertions and edge deletions over time. We define frequency in this setting and provide algorithmic solutions for finding frequent dynamic subgraph patterns. Existing subgraph mining algorithms can be easily integrated into our framework to make them handle dynamic graphs. Experimental results on real-world data confirm the practical feasibility of our approach.
A Survey of Frequent Subgraph Mining Algorithms
 THE KNOWLEDGE ENGINEERING REVIEW
, 2004
Cited by 29 (1 self)
Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining, and proposed solutions to address the main research issues.
Maximal Biclique Subgraphs and Closed Pattern Pairs of the Adjacency Matrix: A One-to-one Correspondence and Mining Algorithms
, 2007
Cited by 24 (8 self)
Maximal biclique (also known as complete bipartite) subgraphs can model many applications in web mining, business, and bioinformatics. Enumerating maximal biclique subgraphs from a graph is a computationally challenging problem, as the size of the output can become exponentially large with respect to the vertex number when the graph grows. In this paper, we efficiently enumerate them through the use of closed patterns of the adjacency matrix of the graph. For an undirected graph G without self-loops, we prove that: (i) the number of closed patterns in the adjacency matrix of G is even; (ii) the number of the closed patterns is precisely double the number of maximal biclique subgraphs of G; and (iii) for every maximal biclique subgraph, there always exists a unique pair of closed patterns that matches the two vertex sets of the subgraph. Therefore, the problem of enumerating maximal bicliques can be solved by using efficient algorithms for mining closed patterns, which are algorithms extensively studied in the data mining field. However, this direct use of existing algorithms causes a duplicated enumeration. To achieve high efficiency, we propose an O(mn) time delay algorithm for a non-duplicated enumeration, in particular for enumerating those maximal bicliques with a large size, where m and n are the number of edges and vertices of the graph respectively. We evaluate the high efficiency of our algorithm by comparing it to state-of-the-art algorithms on three categories of graphs: randomly generated graphs, benchmarks, and a real-life protein interaction network. In this paper, we also prove that if self-loops are allowed in a graph, then the number of closed patterns in the adjacency matrix is not necessarily even; but the maximal bicliques are exactly the same as those of the graph after removing all the self-loops.
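The correspondence stated in (i) to (iii) can be checked by brute force on a tiny graph. The sketch below is not the paper's O(mn) delay algorithm; it is an exponential-time illustration under one assumption about the definition: a vertex set X counts as a closed pattern when its set of common neighbours is non-empty and the common neighbours of that set are exactly X. The function names and the example graph are hypothetical.

```python
from itertools import combinations


def common_neighbors(adj, vertices):
    """Return the vertices adjacent to every vertex in a non-empty set."""
    return frozenset(set.intersection(*(adj[v] for v in vertices)))


def closed_patterns(adj):
    """Brute-force all closed patterns of the adjacency matrix.

    Assumed definition: X is closed if its common-neighbour set is
    non-empty and the common neighbours of that set equal X again.
    """
    closed = set()
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        for combo in combinations(nodes, size):
            x = frozenset(combo)
            gx = common_neighbors(adj, x)
            if gx and common_neighbors(adj, gx) == x:
                closed.add(x)
    return closed


def maximal_bicliques(adj):
    """Pair each closed pattern with its common neighbours (unordered pair)."""
    return {frozenset({x, common_neighbors(adj, x)}) for x in closed_patterns(adj)}


# Hypothetical example: triangle 0-1-2 with a pendant vertex 3 attached to 2.
triangle_with_tail = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
patterns = closed_patterns(triangle_with_tail)
bicliques = maximal_bicliques(triangle_with_tail)
print(len(patterns), len(bicliques))
```

On this graph the closed patterns pair up as ({0}, {1, 2}), ({1}, {0, 2}) and ({2}, {0, 1, 3}), so the pattern count is twice the biclique count, as the abstract claims for graphs without self-loops.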
Mining Top-K Large Structural Patterns in a Massive Network
 PVLDB
Cited by 9 (3 self)
With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose SpiderMine, a novel algorithm to efficiently mine top-K largest frequent patterns from a single massive network with any user-specified probability of 1 − ϵ. Deviating from the existing edge-by-edge (i.e., incremental) pattern-growth framework, SpiderMine achieves its efficiency by unleashing the power of small patterns of a bounded diameter, which we call “spiders”. With the spider structure, our approach adopts a probabilistic mining framework to find the top-K largest patterns by (i) identifying an affordable set of promising growth paths toward large patterns, (ii) generating large patterns with much lower combinatorial complexity, and finally (iii) greatly reducing the cost of graph isomorphism tests with a new graph pattern representation by a multiset of spiders. Extensive experimental studies on both synthetic and real data sets show that our algorithm outperforms existing methods.
GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph
Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions in bioinformatics, are modeled as a single large graph. In this paper we present GRAMI, a novel framework for frequent subgraph mining in a single large graph. GRAMI undertakes a novel approach that only finds the minimal set of instances to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches. We accompany our approach with a heuristic and optimizations that significantly improve performance. Additionally, we present an extension of GRAMI that mines frequent patterns. Compared to subgraphs, patterns offer a more powerful version of matching that captures transitive interactions between graph nodes (like friend of a friend) which are very common in modern applications. Finally, we present CGRAMI, a version supporting structural and semantic constraints, and AGRAMI, an approximate version producing results with no false positives. Our experiments on real data demonstrate that our framework is up to 2 orders of magnitude faster and discovers more interesting patterns than existing approaches.
Constructing Decision Trees for Graph-Structured Data by Chunkingless Graph-Based Induction
 Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, Volume 3918
, 2006
Cited by 4 (1 self)
A decision tree is an effective means of data classification from which one can obtain rules that are easy to understand. However, decision trees cannot be conventionally constructed for data which are not explicitly expressed with attribute-value pairs, such as graph-structured data. We have proposed a novel algorithm, named Chunkingless Graph-Based Induction (Cl-GBI), for extracting typical patterns from graph-structured data. Cl-GBI is an improved version of Graph-Based Induction (GBI), which employs stepwise pair expansion (pairwise chunking) to extract typical patterns from graph-structured data, and can find overlapping patterns that cannot be found by GBI. In this paper, we further propose an algorithm for constructing decision trees for graph-structured data using Cl-GBI. This decision tree construction algorithm, now called Decision Tree Chunkingless Graph-Based Induction (DT-ClGBI), can construct a decision tree from a graph-structured dataset while simultaneously constructing attributes useful for classification using Cl-GBI internally. Since patterns (subgraphs) extracted by Cl-GBI are considered as attributes of a graph, and their existence/non-existence are used as attribute values in DT-ClGBI, DT-ClGBI can be conceived as a tree generator equipped with feature construction capability. Experiments were conducted on both synthetic and real-world graph-structured datasets showing the usefulness and effectiveness of the algorithm.
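The idea of using the existence or non-existence of a pattern as an attribute value can be illustrated with a single decision-tree split. The sketch below is not Cl-GBI or DT-ClGBI: it assumes, purely for illustration, that each graph is reduced to a set of labelled edges, that a pattern is "present" when its edges form a subset of the graph's edges, and that the split criterion is information gain (the abstract does not name the criterion). All names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(graphs, classes, pattern):
    """Gain from splitting the graphs on presence/absence of `pattern`.

    Simplifying assumption: graphs and patterns are sets of labelled
    edges, and presence is a plain subset test.
    """
    present = [c for g, c in zip(graphs, classes) if pattern <= g]
    absent = [c for g, c in zip(graphs, classes) if not pattern <= g]
    n = len(classes)
    remainder = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(classes) - remainder


# Hypothetical toy dataset: four graphs described by their labelled edges.
graphs = [
    frozenset({"C-O", "C-C"}),
    frozenset({"C-O", "C-N"}),
    frozenset({"C-C", "C-N"}),
    frozenset({"C-C"}),
]
classes = ["pos", "pos", "neg", "neg"]

gain_co = information_gain(graphs, classes, frozenset({"C-O"}))
gain_cn = information_gain(graphs, classes, frozenset({"C-N"}))
best_pattern = max(["C-O", "C-N"], key=lambda e: information_gain(graphs, classes, frozenset({e})))
```

Here the edge C-O separates the two classes perfectly, while C-N occurs once in each class and gives no gain, so a tree builder using this criterion would test for C-O at the root.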
FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATION
, 2012
Cited by 2 (0 self)
Data mining algorithms are facing the challenge of dealing with an increasing number of complex objects. A graph is a natural data structure for modeling complex objects. Frequent subgraph mining is another active research topic in data mining. A graph is a general model to represent data and has been used in many domains like cheminformatics and bioinformatics. Mining patterns from graph databases is challenging since graph-related operations, such as subgraph testing, generally have higher time complexity than the corresponding operations on itemsets, sequences, and trees. Many frequent subgraph mining algorithms have been proposed; SPIN, SUBDUE, gSpan, FFSM, and GREW are a few to mention. In this paper we present a detailed survey of frequent subgraph mining algorithms, which are used for knowledge discovery in complex objects, and also propose a framework for classifying these algorithms. The purpose is to help users apply the techniques in a task-specific manner in various application domains and to pave the way for further research.
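To make the cost of subgraph testing concrete, the following sketch counts the support of a pattern in a small graph database by trying every injective vertex mapping. It is a naive baseline for illustration only, not any of the algorithms named above (SPIN, SUBDUE, gSpan, FFSM, GREW); the graph representation, helper names, and toy data are assumptions.

```python
from itertools import permutations


def make_graph(labels, edges):
    """Build an undirected labelled graph as (vertex->label dict, edge set)."""
    return labels, {frozenset(e) for e in edges}


def contains(graph, pattern):
    """Brute-force test: does `graph` contain a subgraph matching `pattern`?

    Tries every injective mapping of pattern vertices to graph vertices,
    which is factorial in the worst case.
    """
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if any(p_labels[v] != g_labels[mapping[v]] for v in p_nodes):
            continue  # vertex labels must agree
        if all(frozenset(mapping[v] for v in e) in g_edges for e in p_edges):
            return True  # every pattern edge is present in the graph
    return False


def support(database, pattern):
    """Number of database graphs that contain the pattern."""
    return sum(contains(g, pattern) for g in database)


# Hypothetical database of three small labelled graphs.
database = [
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),  # C-C-O
    make_graph({0: "C", 1: "O"}, [(0, 1)]),                  # C-O
    make_graph({0: "C", 1: "C"}, [(0, 1)]),                  # C-C
]
c_o_edge = make_graph({"a": "C", "b": "O"}, [("a", "b")])

min_support = 2
c_o_support = support(database, c_o_edge)
is_frequent = c_o_support >= min_support
```

The permutation loop is what practical miners try to avoid through canonical labelling and candidate pruning; on graphs with more than a handful of vertices this baseline quickly becomes infeasible.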
Motif Mining in Weighted Networks
Cited by 1 (0 self)
Unexpectedly frequent subgraphs, known as motifs, can help in characterizing the structure of complex networks. Most of the existing methods for finding motifs are designed for unweighted networks, where only the existence of a connection between nodes is considered, and not its strength or capacity. However, in many real-world networks, edges contain more information than just simple node connectivity. In this paper, we propose a new method to incorporate edge weight information in motif mining. We think of a motif as a subgraph that contains unexpected information, and we define a new significance measurement to assess this subgraph exceptionality. The proposed metric embeds the weight distribution in subgraphs and it is based on weight entropy. We use the g-trie data structure to find instances of k-sized subgraphs and to calculate their significance scores. Following a statistical approach, the random entropy of subgraphs is then calculated, avoiding the time-consuming step of random network generation. The discrimination power of the derived motif profile by the proposed method is assessed against the results of the traditional unweighted motifs through a graph classification problem. We use a set of labeled ego networks of co-authorship in the biology and mathematics fields. The new proposed method is shown to be feasible, achieving even slightly better accuracy. Since it does not require the generation of random networks, it is also computationally faster, and because we are able to use the weight information in computing the motif importance, we can avoid converting weighted networks into unweighted ones.
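The abstract does not give the formula behind "weight entropy". One plausible reading, shown here only as an assumption, is the Shannon entropy of a subgraph's edge weights after normalising them to sum to one. The function name and inputs are hypothetical, and the paper's significance score may differ.

```python
import math


def weight_entropy(weights):
    """Shannon entropy (bits) of edge weights normalised to a distribution.

    Assumption: weights are non-negative and at least one is positive.
    Zero weights contribute nothing to the sum.
    """
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


# Four equally weighted edges spread the weight evenly (maximal entropy);
# one dominant edge concentrates it (entropy closer to zero).
even = weight_entropy([1, 1, 1, 1])
skewed = weight_entropy([97, 1, 1, 1])
```

Under this reading, a subgraph occurrence whose weight is concentrated on one edge scores lower than one with evenly spread weights, which is the kind of distinction an unweighted motif count cannot make.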
Baptiste Jeudy Maître de Conférences, Université Jean Monnet de Saint-Étienne
, 2014
"... for the award of the degree of ..."