Results 1-10 of 25
On Graph Query Optimization in Large Networks
Cited by 34 (3 self)
The dramatic proliferation of sophisticated networks has resulted in a growing need for supporting effective querying and mining methods over such large-scale graph-structured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search graph structures efficiently within a large network? Unfortunately, the graph query is hard due to the NP-complete nature of subgraph isomorphism. It becomes even more challenging when the network examined is large and diverse. In this paper, we present a high-performance graph indexing mechanism, SPath, to address the graph query problem on large networks. SPath leverages decomposed shortest paths around vertex neighborhood as basic indexing units, which prove to be both effective in graph search space pruning and highly scalable in index construction and deployment. Via SPath, a graph query is processed and optimized beyond the traditional vertex-at-a-time fashion to a more efficient path-at-a-time way: the query is first decomposed to a set of shortest paths, among which a subset of candidates with good selectivity is picked by a query plan optimizer; candidate paths are further joined together to help recover the query graph to finalize the graph query processing. We evaluate SPath against the state-of-the-art GraphQL on both real and synthetic data sets. Our experimental studies demonstrate the effectiveness and scalability of SPath, which proves to be a more practical and efficient indexing method in addressing graph queries on large networks.
Efficient subgraph matching on billion node graphs
 In PVLDB
, 2012
Cited by 33 (5 self)
The ability to handle large-scale graph data is crucial to an increasing number of applications. Much work has been dedicated to supporting basic graph operations such as subgraph matching, reachability, regular expression matching, etc. In many cases, graph indices are employed to speed up query processing. Typically, most indices require either super-linear indexing time or super-linear indexing space. Unfortunately, for very large graphs, super-linear approaches are almost always infeasible. In this paper, we study the problem of subgraph matching on billion-node graphs. We present a novel algorithm that supports efficient subgraph matching for graphs deployed on a distributed memory store. Instead of relying on super-linear indices, we use efficient graph exploration and massive parallel computing for query processing. Our experimental results demonstrate the feasibility of performing subgraph matching on web-scale graph data.
Sapper: Subgraph indexing and approximate matching in large graphs
PVLDB
Cited by 23 (0 self)
With the emergence of new applications, e.g., computational biology, new software engineering techniques, social networks, etc., more data is in the form of graphs. Locating occurrences of a query graph in a large database graph is an important research topic. Due to the existence of noise (e.g., missing edges) in the large database graph, we investigate the problem of approximate subgraph indexing, i.e., finding the occurrences of a query graph in a large database graph with (possible) missing edges. The SAPPER method is proposed to solve this problem. Utilizing the hybrid neighborhood unit structures in the index, SAPPER takes advantage of pre-generated random spanning trees and a carefully designed graph enumeration order. Real and synthetic data sets are employed to demonstrate the efficiency and scalability of our approximate subgraph indexing method.
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs
 In CIKM
, 2012
Cited by 18 (2 self)
We propose a SPARQL-like language, G-SPARQL, for querying attributed graphs. The language expresses types of queries which are of large interest for applications which model their data as large graphs, such as pattern matching, reachability and shortest path queries. Each query can combine both structural predicates and value-based predicates (on the attributes of the graph nodes and edges). We describe an algebraic compilation mechanism for our proposed query language which is extended from the relational algebra and based on the basic construct of building SPARQL queries, the Triple Pattern. We describe a hybrid Memory/Disk representation of large attributed graphs where only the topology of the graph is maintained in memory while the data of the graph is stored in a relational database. The execution engine of our proposed query language splits parts of the query plan to be pushed inside the relational database while other parts of the query plan are processed using memory-based algorithms, as necessary. Experimental results on real datasets demonstrate the efficiency and the scalability of our approach and show that our approach outperforms native graph databases by several factors.
An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases
Cited by 5 (0 self)
Finding subgraph isomorphisms is an important problem in many applications which deal with data modeled as graphs. While this problem is NP-hard, in recent years, many algorithms have been proposed to solve it in a reasonable time for real datasets using different join orders, pruning rules, and auxiliary neighborhood information. However, since they have not been empirically compared with one another in most research work, it is not clear whether the later work outperforms the earlier work. Another problem is that reported comparisons were often done using the original authors' binaries, which were written in different programming environments. In this paper, we address these serious problems by re-implementing five state-of-the-art subgraph isomorphism algorithms in a common code base and by comparing them using many real-world datasets and their query loads. Through our in-depth analysis of experimental results, we report surprising empirical findings.
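To make the terms "join order" and "pruning rule" in this abstract concrete, here is a minimal sketch of a generic backtracking matcher. It is not one of the five algorithms the paper re-implements; the function name `subgraph_matches`, the adjacency-dict representation, and the simple label and degree filters are all assumptions chosen for illustration.

```python
def subgraph_matches(query, data, q_labels, d_labels):
    """Enumerate injective mappings of query vertices onto data vertices
    that preserve labels and every query edge (non-induced matching).

    query, data: dict mapping vertex -> set of adjacent vertices.
    q_labels, d_labels: dict mapping vertex -> label.
    """
    order = list(query)  # hypothetical "join order": plain insertion order
    results = []

    def extend(mapping):
        if len(mapping) == len(order):
            results.append(dict(mapping))
            return
        u = order[len(mapping)]
        used = set(mapping.values())
        for v in data:
            # Pruning rule 1: injectivity and label equality.
            if v in used or d_labels[v] != q_labels[u]:
                continue
            # Pruning rule 2: a data vertex needs at least the query degree.
            if len(data[v]) < len(query[u]):
                continue
            # Every already-mapped query neighbor must be adjacent in data.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                extend(mapping)
                del mapping[u]

    extend({})
    return results
```

Real systems differ mainly in how they choose `order` and how aggressively they filter candidates before the recursive step; this sketch fixes both to the simplest possible choice.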
Efficient SimRank-based Similarity Join Over Large Graphs
, 2013
Cited by 4 (0 self)
Graphs have been widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs is meaningful and interesting, which can benefit friend recommendation in social networks and link prediction, etc. In this paper, we adopt "SimRank" to evaluate the similarity of two vertices in a large graph because of its generality. Note that "SimRank" is purely structure dependent and it does not rely on the domain knowledge. Specifically, we define a SimRank-based join (SRJ) query to find all the vertex pairs satisfying the threshold in a data graph G. In order to reduce the search space, we propose an estimated shortest-path-distance-based upper bound for SimRank scores to prune unpromising vertex pairs. In the verification, we propose a novel index, called h-go cover, to efficiently compute the SimRank score of a single vertex pair. Given a graph G, we only materialize the SimRank scores of a small proportion of vertex pairs (called h-go covers), based on which the SimRank score of any vertex pair can be computed easily. In order to handle large graphs, we extend our technique to the partition-based framework. Thorough theoretical analysis and extensive experiments over both real and synthetic datasets confirm the efficiency and effectiveness of our solution.
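For readers unfamiliar with the query being defined, the following is a brute-force sketch of a SimRank-based join using the standard iterative SimRank definition. It deliberately omits the paper's distance-based upper bound and its cover index; the names `simrank` and `simrank_join`, the decay value 0.8, and the iteration count are assumptions for illustration only.

```python
from itertools import combinations


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank.

    in_neighbors: dict mapping each vertex -> list of its in-neighbors.
    Returns a nested dict sim[a][b] of similarity scores.
    """
    nodes = list(in_neighbors)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
        for a, b in combinations(nodes, 2):
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                continue  # score stays 0 when either vertex has no in-neighbors
            total = sum(sim[i][j] for i in ia for j in ib)
            new[a][b] = new[b][a] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim


def simrank_join(in_neighbors, threshold, **kwargs):
    """Return all unordered vertex pairs whose SimRank score meets the threshold."""
    sim = simrank(in_neighbors, **kwargs)
    nodes = list(in_neighbors)
    return [(a, b) for a, b in combinations(nodes, 2) if sim[a][b] >= threshold]
```

This baseline scores every vertex pair on every iteration, which is exactly the cost that pruning and indexing techniques of the kind the abstract describes aim to avoid.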
Egocentric Graph Pattern Census
Cited by 3 (1 self)
There is increasing interest in analyzing networks of all types including social, biological, sensor, computer, and transportation networks. Broadly speaking, we may be interested in global network-wide analysis (e.g., centrality analysis, community detection) where the properties of the entire network are of interest, or local egocentric analysis where the focus is on studying the properties of nodes (egos) by analyzing their neighborhood subgraphs. In this paper we propose and study egocentric pattern census queries, a new type of graph analysis query, where a given structural pattern is searched for in every node's neighborhood and the counts are reported or used in further analysis. This kind of analysis is useful in many domains in social network analysis including opinion leader identification, node classification, link prediction, and role identification. We propose an SQL-based declarative language to support this class of queries, and develop a series of efficient query evaluation algorithms for it. We evaluate our algorithms on a variety of synthetically generated graphs. We also show an application of our language in a real-world scenario for predicting future collaborations from DBLP data.
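As a rough illustration of what an egocentric pattern census computes, the sketch below counts one fixed pattern (a triangle) inside each node's 1-hop neighborhood subgraph. It is a brute-force Python stand-in, not the SQL-based language or the evaluation algorithms the paper proposes; the helper names and the choice of a 1-hop neighborhood that includes the ego are assumptions.

```python
from itertools import combinations


def ego_network(adj, ego):
    """Induced subgraph on the ego and its direct neighbors.

    adj: dict mapping vertex -> set of adjacent vertices (undirected).
    """
    members = set(adj[ego]) | {ego}
    return {v: adj[v] & members for v in members}


def count_triangles(adj):
    """Count triangles by checking every vertex triple (fine for small graphs)."""
    count = 0
    for a, b, c in combinations(sorted(adj), 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            count += 1
    return count


def triangle_census(adj):
    """Map every vertex to the number of triangles in its ego network."""
    return {v: count_triangles(ego_network(adj, v)) for v in adj}
```

The per-node counts can then feed a downstream task such as node classification or link prediction, which is the kind of use the abstract mentions.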
Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty
, 2014
Cited by 2 (0 self)
There is a growing need for methods that can represent and query uncertain graphs. These uncertain graphs are often the result of an information extraction and integration system that attempts to extract an entity graph or a knowledge graph from multiple unstructured sources [25], [7]. Such an integration typically leads to identity uncertainty, as different data sources may use different references to the same underlying real-world entities. Integration usually also introduces additional uncertainty on node attributes and edge existence. In this paper, we propose the notion of a probabilistic entity graph (PEG), a formal model that uniformly and systematically addresses these three types of uncertainty. A PEG is a probabilistic graph model that defines a distribution over possible graphs at the entity level. We introduce a general framework for constructing a PEG given uncertain data at the reference level and develop efficient algorithms to answer subgraph pattern matching queries in this setting. Our algorithms are based on two novel ideas: context-aware path indexing and reduction by join-candidates, which drastically reduce the query search space. A comprehensive experimental evaluation shows that our approach outperforms baseline implementations by orders of magnitude.
Detecting and Forecasting Domestic Political Crises: A Graph-based Approach
Cited by 2 (0 self)
Forecasting a domestic political crisis (DPC) in a country of interest is a very useful tool for social scientists and policy makers. A wealth of event data is now available for historical as well as prospective analysis. Using the publicly available GDELT dataset, we illustrate the use of frequent subgraph mining to identify signatures preceding DPCs, and the predictive utility of these signatures through both qualitative and quantitative results.