Results 1-6 of 6
Efficient subgraph matching on billion node graphs
In PVLDB, 2012
Cited by 33 (5 self)
The ability to handle large-scale graph data is crucial to an increasing number of applications. Much work has been dedicated to supporting basic graph operations such as subgraph matching, reachability, regular expression matching, etc. In many cases, graph indices are employed to speed up query processing. Typically, most indices require either superlinear indexing time or superlinear indexing space. Unfortunately, for very large graphs, superlinear approaches are almost always infeasible. In this paper, we study the problem of subgraph matching on billion-node graphs. We present a novel algorithm that supports efficient subgraph matching for graphs deployed on a distributed memory store. Instead of relying on superlinear indices, we use efficient graph exploration and massive parallel computing for query processing. Our experimental results demonstrate the feasibility of performing subgraph matching on web-scale graph data.
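To make the idea of index-free matching by graph exploration concrete, here is a minimal single-machine sketch. It is not the algorithm from the paper, which is distributed and not reproduced here; it is a plain backtracking search that grows a partial match by exploring the neighbours of already-matched nodes. The function name `match_subgraph` and the dictionary-based graph layout are assumptions made for illustration.

```python
def match_subgraph(graph, labels, q_edges, q_labels):
    """Enumerate label-preserving embeddings of a query graph by exploration.

    graph:    dict data node -> set of neighbouring data nodes (undirected)
    labels:   dict data node -> label
    q_edges:  dict query node -> set of neighbouring query nodes
    q_labels: dict query node -> label
    Returns a list of dicts mapping query node -> data node.
    (Hypothetical helper for illustration; no index is built.)
    """
    order = list(q_edges)  # fixed matching order over query nodes
    results = []

    def extend(assign):
        if len(assign) == len(order):
            results.append(dict(assign))
            return
        q = order[len(assign)]
        # Data nodes already matched to query neighbours of q.
        matched_nbrs = [assign[p] for p in q_edges[q] if p in assign]
        if matched_nbrs:
            # Explore only the common neighbourhood of those data nodes.
            candidates = set.intersection(*(graph[v] for v in matched_nbrs))
        else:
            candidates = set(graph)
        used = set(assign.values())
        for v in sorted(candidates):
            if v in used or labels[v] != q_labels[q]:
                continue
            assign[q] = v
            extend(assign)
            del assign[q]

    extend({})
    return results


# Toy data graph: triangle a-b-c with an extra node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
labels = {"a": "A", "b": "B", "c": "C", "d": "A"}
# Query: one edge between a node labelled A and a node labelled C.
q_edges = {"x": {"y"}, "y": {"x"}}
q_labels = {"x": "A", "y": "C"}
matches = match_subgraph(graph, labels, q_edges, q_labels)
```

The search touches only neighbourhoods of matched nodes, so its setup cost is independent of graph size; its worst-case running time is still exponential in the query size.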
The Pursuit of a Good Possible World: Extracting Representative Instances of Uncertain Graphs
Cited by 4 (1 self)
Data in several applications can be represented as an uncertain graph, whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an exponential number of instantiations. Even approximate processing based on sampling is usually extremely expensive, since it requires a vast number of samples to achieve reasonable quality guarantees. To overcome these problems, we propose algorithms for creating deterministic representative instances of uncertain graphs that maintain the underlying graph properties. Specifically, our algorithms aim at preserving the expected vertex degrees, because these capture the graph topology well. Conventional processing techniques can then be applied to these instances to closely approximate the result on the uncertain graph. We experimentally demonstrate, with real and synthetic uncertain graphs, that the representative instances can indeed be used to answer, efficiently and accurately, queries based on several properties such as shortest path distance, clustering coefficient and betweenness centrality.
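The quantity this abstract aims to preserve can be written down directly: by linearity of expectation, the expected degree of a vertex is the sum of the existence probabilities of its incident edges. The sketch below computes that and measures how far a candidate deterministic instance deviates from it. It does not reproduce the paper's extraction algorithms; the function names and the 0.5-threshold baseline are assumptions chosen only to have something to compare against.

```python
from collections import defaultdict


def expected_degrees(edges):
    """edges: dict (u, v) -> existence probability, undirected.
    Expected degree of a vertex = sum of its incident edge probabilities."""
    deg = defaultdict(float)
    for (u, v), p in edges.items():
        deg[u] += p
        deg[v] += p
    return dict(deg)


def degree_discrepancy(edges, instance):
    """Per-vertex difference: degree in the deterministic instance
    minus expected degree in the uncertain graph."""
    expected = expected_degrees(edges)
    actual = defaultdict(int)
    for u, v in instance:
        actual[u] += 1
        actual[v] += 1
    return {n: actual[n] - expected[n] for n in expected}


# Toy uncertain graph (probabilities are made up for illustration).
edges = {("a", "b"): 0.9, ("a", "c"): 0.4, ("b", "c"): 0.4}

# Naive baseline (an assumption, not the paper's method):
# keep every edge whose probability is at least 0.5.
threshold_instance = [e for e, p in edges.items() if p >= 0.5]
discrepancy = degree_discrepancy(edges, threshold_instance)
```

On this toy input the threshold baseline keeps only the edge (a, b), so vertex c ends up with degree 0 against an expected degree of 0.8, which illustrates why a simple threshold can distort degrees.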
Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty
2014
Cited by 2 (0 self)
There is a growing need for methods that can represent and query uncertain graphs. These uncertain graphs are often the result of an information extraction and integration system that attempts to extract an entity graph or a knowledge graph from multiple unstructured sources [25], [7]. Such an integration typically leads to identity uncertainty, as different data sources may use different references to the same underlying real-world entities. Integration usually also introduces additional uncertainty on node attributes and edge existence. In this paper, we propose the notion of a probabilistic entity graph (PEG), a formal model that uniformly and systematically addresses these three types of uncertainty. A PEG is a probabilistic graph model that defines a distribution over possible graphs at the entity level. We introduce a general framework for constructing a PEG given uncertain data at the reference level and develop efficient algorithms to answer subgraph pattern matching queries in this setting. Our algorithms are based on two novel ideas: context-aware path indexing and reduction by join-candidates, which drastically reduce the query search space. A comprehensive experimental evaluation shows that our approach outperforms baseline implementations by orders of magnitude.
On Uncertain Graphs Modeling and Queries
Large-scale, highly interconnected networks pervade both our society and the natural world around us. Uncertainty, on the other hand, is inherent in the underlying data due to a variety of reasons, such as noisy measurements, lack of precise information needs, inference and prediction models, or explicit manipulation, e.g., for privacy purposes. Therefore, uncertain, or probabilistic, graphs are increasingly used to represent noisy linked data in many emerging application scenarios, and they have recently become a hot topic in the database research community. Many classical graph problems, such as reachability and shortest path queries, become #P-complete, and hence more expensive, in uncertain graphs. At the same time, various complex queries are emerging over uncertain networks, such as pattern matching, information diffusion, and influence maximization queries. In this tutorial, we discuss the sources of uncertain graphs and their applications, uncertainty modeling, as well as the complexities and algorithmic advances in uncertain graph processing in the context of both classical and emerging graph queries. We emphasize the current challenges and highlight some future research directions.
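Because exact reachability on an uncertain graph is #P-complete, a common fallback is Monte Carlo estimation: sample possible worlds by keeping each edge independently with its probability, then report the fraction of worlds in which the target is reachable. The sketch below is a generic illustration of that idea, not a method taken from the tutorial; the function names, the directed-edge convention, and the independence assumption are choices made here.

```python
import random
from collections import deque


def sample_world(edges, rng):
    """Keep each directed edge independently with its existence probability."""
    return [(u, v) for (u, v), p in edges.items() if rng.random() < p]


def reachable(world, source, target):
    """Breadth-first search over one sampled deterministic graph."""
    adj = {}
    for u, v in world:
        adj.setdefault(u, []).append(v)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def estimate_reachability(edges, source, target, samples=1000, seed=0):
    """Fraction of sampled worlds in which target is reachable from source."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    hits = sum(
        reachable(sample_world(edges, rng), source, target)
        for _ in range(samples)
    )
    return hits / samples


# Toy directed uncertain graph (probabilities are made up for illustration).
edges = {("s", "a"): 0.5, ("a", "t"): 1.0}
estimate = estimate_reachability(edges, "s", "t", samples=2000)
```

For this toy graph the true reachability probability is 0.5, and the estimate approaches it as the number of samples grows; the error shrinks roughly with the square root of the sample count.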
OUTLIER DETECTION FOR INFORMATION NETWORKS
2013
The study of networks has emerged in diverse disciplines as a means of analyzing complex relationship data. There has been a significant amount of work in network science which studies properties of networks, querying over networks, link analysis, influence propagation, network optimization, and many other forms of network analysis. Only recently has there been some work in the area of outlier detection for information network data. Outlier (or anomaly) detection is a very broad field and has been studied in the context of a large number of application domains. Many algorithms have been proposed for outlier detection in high-dimensional data, uncertain data, stream data and time series data. By its inherent nature, network data poses very different challenges that need to be addressed in a special way. Network data is gigantic, contains nodes of different types, rich nodes with associated attribute data, noisy attribute data, and noisy link data, and is dynamically evolving in multiple ways. This thesis focuses on outlier detection for such networks from two perspectives: (1) community-based outliers and (2) query-based outliers. For community-based outliers, we discuss the problem in both static and dynamic settings.
A Survey on Efficient Clustering Methods with Effective Pruning Techniques for Probabilistic Graphs
This paper provides a survey of KNN queries, DCR queries, agglomerative complete-linkage clustering, an extension of the edit-distance-based definition graph algorithm, and solving decision problems under uncertainty. Graph agglomeration (clustering) aims to divide information into clusters according to their similarities, and a variety of algorithms have been proposed for agglomerating graphs, such as the pKwik Cluster algorithm, spectral agglomeration, and k-path agglomeration. However, very little analysis has been performed to develop efficient agglomeration algorithms for probabilistic graphs. Finally, the survey considers how graph mining can be done efficiently. It sets out to design an algorithm for searching and to evaluate that algorithm through analysis.