Results 1 - 8 of 8
Facilitating Real-Time Graph Mining
Cited by 5 (0 self)
Real-time data processing is increasingly gaining momentum as the preferred method for analytical applications. Many of these applications are built on top of large graphs with hundreds of millions of vertices and edges. A fundamental requirement for real-time processing is the ability to do incremental processing. However, graph algorithms are inherently difficult to compute incrementally due to data dependencies. At the same time, devising incremental graph algorithms is a challenging programming task. This paper introduces GraphInc, a system that builds on top of the Pregel model and provides efficient incremental processing of graphs. Importantly, GraphInc supports incremental computations automatically, hiding the complexity from the programmers. Programmers write graph analytics in the Pregel model without worrying about the continuous nature of the data. GraphInc integrates new data in real time in a transparent manner, by automatically identifying opportunities for incremental processing. We discuss the basic mechanisms of GraphInc and report on the initial evaluation of our approach.
Distributed Graph Simulation: Impossibility and Possibility
Cited by 5 (2 self)
This paper studies fundamental problems for distributed graph simulation. Given a pattern query Q and a graph G that is fragmented and distributed, a graph simulation algorithm A is to compute the matches Q(G) of Q in G. We say that A is parallel scalable in (a) response time if its parallel computational cost is determined by the largest fragment Fm of G and the size |Q| of query Q, and (b) data shipment if its total amount of data shipped is determined by |Q| and the number of fragments of G, independent of the size of graph G. (1) We prove an impossibility theorem: there exists no distributed graph simulation algorithm that is parallel scalable in either response time or data shipment. (2) However, we show that distributed graph simulation is partition bounded, i.e., its response time depends only on |Q|, |Fm| and the number |Vf| of nodes in G with edges across different fragments; and its data shipment depends on |Q| and the number |Ef| of crossing edges only. We provide the first algorithms with these performance guarantees. (3) We also identify special cases of patterns and graphs when parallel scalability is possible. (4) We experimentally verify the scalability and efficiency of our algorithms.
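For readers unfamiliar with the matching semantics behind Q(G), the following is a minimal, centralized sketch of plain graph simulation computed as a fixpoint. It is not the distributed algorithm from the paper. The function name `graph_simulation` and the dictionary and edge-list input encoding are assumptions made for illustration only.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return the maximum simulation of pattern Q in graph G.

    q_labels / g_labels: dict mapping node id -> label (hypothetical encoding).
    q_edges / g_edges: iterables of directed (source, target) pairs.
    The result maps each pattern node to its set of matching graph nodes;
    every set is empty if some pattern node has no match.
    """
    # Successor sets for the data graph and the pattern.
    g_succ = {v: set() for v in g_labels}
    for a, b in g_edges:
        g_succ[a].add(b)
    q_succ = {u: set() for u in q_labels}
    for a, b in q_edges:
        q_succ[a].add(b)

    # Start from all label-compatible candidates.
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
           for u in q_labels}

    # Refine: v stays a match of u only if, for every pattern edge (u, u2),
    # v has some successor that is still a match of u2.
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for u2 in q_succ[u]:
                keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True

    # By convention the match is empty if any pattern node is unmatched.
    if any(not matches for matches in sim.values()):
        return {u: set() for u in q_labels}
    return sim


# Pattern: A -> B. Graph: 1(A) -> 2(B), 3(A) -> 4(C).
result = graph_simulation(
    {"a": "A", "b": "B"}, [("a", "b")],
    {1: "A", 2: "B", 3: "A", 4: "C"}, [(1, 2), (3, 4)],
)
```

In the usage above, node 3 carries label A but has no successor labelled B, so the refinement loop discards it and only node 1 remains a match for the pattern node `a`.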
Query Optimization of Distributed Pattern Matching
Cited by 4 (0 self)
Greedy algorithms for subgraph pattern matching operations are often sufficient when the graph data set can be held in memory on a single machine. However, as graph data sets increasingly expand and require external storage and partitioning across a cluster of machines, more sophisticated query optimization techniques become critical to avoid explosions in query latency. In this paper, we introduce several query optimization techniques for distributed graph pattern matching. These techniques include (1) a System-R style dynamic programming-based optimization algorithm that considers both linear and bushy plans, (2) a cycle detection-based algorithm that leverages cycles to reduce intermediate result set sizes, and (3) a computation reusing technique that eliminates redundant query execution and data transfer over the network. Experimental results show that these algorithms can lead to an order of magnitude improvement in query performance.
Techniques for Graph Analytics on Big Data
Cited by 1 (0 self)
Graphs enjoy profound importance because of their versatility and expressivity. They can be effectively used to represent social networks, web search engines and genome sequencing. The field of graph pattern matching has been of significant importance and has widespread applications. Conceptually, we want to find subgraphs that match a pattern in a given graph. Much work has been done in this field with solutions like Subgraph Isomorphism and Regular Expression matching. With Big Data, scientists are frequently running into massive graphs that have amplified the challenge that this area poses. We study the speedup and communication behavior of three distributed algorithms for inexact graph pattern matching. We also study the impact of different graph partitionings on runtime and network I/O. Our extensive results show that the algorithms exhibit excellent scalable behavior and min-cut partitioning can lead to improved performance under some circumstances, and can drastically reduce the network traffic as well. Keywords: graph analytics; big data; graph simulation; parallel and distributed algorithms.
Towards Efficient Query Processing on Massive Time-Evolving Graphs
Cited by 1 (1 self)
The time-evolving graph (TEG) is increasingly being used as a paradigm for modeling and analyzing dynamic relationships in many emerging domains such as online social networks, the World Wide Web and evolutionary genomics. A time-evolving graph consists of a sequence of snapshots of the graph as it evolves over time. The ability to scalably process various types of queries on massive TEGs is central to building powerful analytic applications for these domains. Unfortunately, indexing techniques and cluster computing schemes that have been designed for static graphs are not very effective for processing massive TEGs. Towards designing scalable mechanisms for answering TEG queries, this paper studies three important problems. The first is the distribution of TEG data on the nodes of a cluster computing framework such as Pregel or Giraph so that the computing and communication resources of the cluster are effectively harnessed. The second is the answering of reachability queries on any snapshot of a TEG and the third is that of processing pattern matching queries in TEGs. For each problem, we provide a brief literature survey and explain why trivial extensions of static graph techniques are not adequate for TEGs. We also present our preliminary ideas towards addressing these problems and discuss their benefits.
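To make the snapshot model and the reachability query concrete, here is a small sketch of the naive baseline: store every snapshot as its own edge set and run a breadth-first search on the requested one. This is the kind of direct static-graph extension the abstract describes as inadequate at scale, not the paper's proposal. The name `reachable` and the list-of-edge-sets representation are assumptions made for illustration.

```python
from collections import deque


def reachable(snapshots, t, source, target):
    """Answer whether target is reachable from source in snapshot t.

    snapshots: list of edge sets, one per time step (hypothetical encoding);
    each edge is a directed (source, target) pair.
    """
    # Build the adjacency list of the requested snapshot only.
    adj = {}
    for a, b in snapshots[t]:
        adj.setdefault(a, []).append(b)

    # Standard breadth-first search from the source node.
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


# Two snapshots: the edge 2 -> 3 appears only at time step 1.
snapshots = [{(1, 2)}, {(1, 2), (2, 3)}]
before = reachable(snapshots, 0, 1, 3)
after = reachable(snapshots, 1, 1, 3)
```

Storing each snapshot in full repeats every unchanged edge at every time step, which hints at why massive TEGs call for something beyond this baseline.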
Answering Pattern Queries Using Views
IEEE Transactions on Knowledge and Data Engineering, DOI 10.1109/TKDE.2015.2429138
Answering queries using views has proven effective for querying relational and semistructured data. This paper investigates this issue for graph pattern queries based on graph simulation. We propose a notion of pattern containment to characterize graph pattern matching using graph pattern views. We show that a pattern query can be answered using a set of views if and only if it is contained in the views. Based on this characterization, we develop efficient algorithms to answer graph pattern queries. We also study problems for determining (minimal, minimum) containment of pattern queries. We establish their complexity (from cubic-time to NP-complete) and provide efficient checking algorithms (approximation when the problem is intractable). In addition, when a pattern query is not contained in the views, we study maximally contained rewriting to find approximate answers; we show that it is in cubic-time to compute such rewriting, and present a rewriting algorithm. We experimentally verify that these methods are able to efficiently answer pattern queries on large real-world graphs.
Processing SPARQL Queries Over Linked Data - A Distributed Graph-based Approach
We propose techniques for processing SPARQL queries over linked data. We follow a graph-based approach where answering a query Q is equivalent to finding its matches over a distributed RDF data graph G. We adopt a "partial evaluation and assembly" framework. Partial evaluation results of query Q over each repository, called local partial matches, are found. In the assembly stage, we propose a centralized and a distributed assembly strategy. We analyze our algorithms both theoretically and experimentally. Extensive experiments over both real and benchmark RDF repositories with billion triples demonstrate the high performance and scalability of our methods compared with that of the existing solutions.