Massive graph triangulation
In ACM SIGMOD Conference on Management of Data, 2013
Cited by 12 (0 self)
This paper studies I/O-efficient algorithms for settling the classic triangle listing problem, whose solution is a basic operator in dealing with many other graph problems. Specifically, given an undirected graph G, the objective of triangle listing is to find all the cliques involving 3 vertices in G. The problem has been well studied in internal memory, but remains an urgent difficult challenge when G does not fit in memory, rendering any algorithm to entail frequent I/O accesses. Although previous research has attempted to tackle the challenge, the state-of-the-art solutions rely on a set of crippling assumptions to guarantee good performance. Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption on the input G at all. The algorithm uses ideas drastically different from all the previous approaches, and outperformed the existing competitors by a factor over an order of magnitude in our extensive experimentation.
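To make the problem statement concrete, here is a minimal in-memory sketch of triangle listing. It is not the I/O-efficient algorithm the abstract refers to (the abstract gives no details of it); it is the textbook approach that assumes the whole graph fits in memory. The function name `list_triangles` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def list_triangles(edges):
    """Return every triangle (u, v, w) with u < v < w in an undirected graph.

    In-memory baseline only: assumes the adjacency sets fit in RAM.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    triangles = []
    for u in sorted(adj):
        # Only look at neighbours larger than u so each triangle is reported once.
        higher = sorted(v for v in adj[u] if v > u)
        for i, v in enumerate(higher):
            for w in higher[i + 1:]:
                if w in adj[v]:  # edge (v, w) closes the triangle
                    triangles.append((u, v, w))
    return triangles


example_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
example_triangles = list_triangles(example_edges)
```

Ordering the vertices and extending only toward larger neighbours avoids reporting the same 3-clique several times; the external-memory setting studied in the paper is harder precisely because the `adj` lookups above become random disk accesses.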
Graph Sample and Hold: A Framework for Big-Graph Analytics
Cited by 5 (1 self)
Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g. web graphs, social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes to estimate certain graph properties (e.g. triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics,
Evolutionary Network Analysis: A Survey
Cited by 3 (1 self)
Evolutionary network analysis has found an increasing interest in the literature because of the importance of different kinds of dynamic social networks, email networks, biological networks, and social streams. When a network evolves, the results of data mining algorithms such as community detection need to be correspondingly updated. Furthermore, the specific kinds of changes to the structure of the network, such as the impact on community structure or the impact on network structural parameters, such as node degrees, also need to be analyzed. Some dynamic networks have a much faster rate of edge arrival and are referred to as network streams or graph streams. The analysis of such networks is especially challenging, because it needs to be performed with an online approach, under the one-pass constraint of data streams. The incorporation of content can add further complexity to the evolution analysis process. This survey provides an overview of the vast literature on graph evolution analysis and the numerous applications that arise in different contexts.
On Anomalous Hotspot Discovery in Graph Streams
Cited by 3 (3 self)
Network streams have become ubiquitous in recent years because of many dynamic applications. Such streams may show localized regions of activity and evolution because of anomalous events. This paper will present methods for dynamically determining anomalous hot spots from network streams. These are localized regions of sudden activity or change in the underlying network. We will design a localized principal component analysis algorithm, which can continuously maintain the information about the changes in the different neighborhoods of the network. We will use a fast incremental eigenvector update algorithm based on von Mises iterations in a lazy way in order to efficiently maintain local correlation information. This is used to discover local change hotspots in dynamic streams. We will finally present an experimental study to demonstrate the effectiveness and efficiency of our approach. Keywords: graph streams; anomaly detection
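The von Mises iteration mentioned above is more commonly known as power iteration. The sketch below shows only that basic building block on a small dense matrix; the lazy, incremental update scheme of the paper is not described in the abstract and is not reproduced here. The function name `power_iteration`, its parameters, and the list-of-lists matrix format are assumptions for illustration, and the Rayleigh-quotient estimate presumes a symmetric matrix (as a correlation matrix would be).

```python
import math


def power_iteration(matrix, iterations=100, tol=1e-10):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix.

    Plain von Mises (power) iteration: multiply, normalise, repeat.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n  # arbitrary unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:  # start vector lies in the null space
            return 0.0, vec
        new_vec = [x / norm for x in product]
        # Rayleigh quotient v^T A v of the normalised vector.
        new_eigenvalue = sum(
            new_vec[i] * sum(matrix[i][j] * new_vec[j] for j in range(n))
            for i in range(n)
        )
        converged = abs(new_eigenvalue - eigenvalue) < tol
        vec, eigenvalue = new_vec, new_eigenvalue
        if converged:
            break
    return eigenvalue, vec


value, vector = power_iteration([[2.0, 0.0], [0.0, 1.0]])
```

Because each step is just a matrix-vector product, a previous eigenvector can serve as the starting vector after a small change to the matrix, which is one reason this family of methods suits incremental settings.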
Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs
Cited by 2 (1 self)
Horton+ is a graph query processing system that executes declarative reachability queries on a partitioned attributed multigraph. It employs a query language, query optimizer, and a distributed execution engine. The query language expresses declarative reachability queries, and supports closures and predicates on node and edge attributes to match graph paths. We introduce three algebraic operators, select, traverse, and join, and a query is compiled into an execution plan containing these operators. As reachability queries access the graph elements in a random access pattern, the graph is maintained in the main memory of a cluster of servers to reduce query execution time. We develop a distributed execution engine that processes a query plan in parallel on the graph servers. Since the query language is declarative, we build a query optimizer that uses graph statistics to estimate predicate selectivity. We experimentally evaluate the system performance on a cluster of 16 graph servers using synthetic graphs as well as a real graph from an application that uses reachability queries. The evaluation shows (1) the efficiency of the optimizer in reducing query execution time, (2) system scalability with the size of the graph and with the number of servers, and (3) the convenience of using declarative queries.
EAGr: Supporting Continuous Egocentric Aggregate Queries over Large Dynamic Graphs
Cited by 2 (0 self)
In this paper, we present EAGr, a system for supporting large numbers of continuous neighborhood-based (“egocentric”) aggregate queries over large, highly dynamic, and rapidly evolving graphs. Examples of such queries include computation of personalized, tailored trends in social networks, anomaly or event detection in communication or financial transaction networks, local search and alerts in spatiotemporal networks, to name a few. Key challenges in supporting such continuous queries include very high update rates typically seen in these situations, large numbers of queries that need to be executed simultaneously, and stringent low latency requirements. In this paper, we propose a flexible, general, and extensible in-memory framework for executing different types of egocentric aggregate queries over large dynamic graphs with low latencies. Our framework is built around the notion of an aggregation overlay
Continuous Similarity Computation over Streaming Graphs
Large network analysis is a very important topic in data mining. A significant body of work in the area studies the problem of node similarity. One way to express node similarity is to associate with each node the set of 1-hop neighbors and compute the Jaccard similarity between these sets. This information can be used subsequently for more complex operations like link prediction, clustering or dense subgraph discovery. In this work, we study algorithms to monitor the result of a similarity join between nodes continuously, assuming a sliding window accommodating graph edges. Since the arrival of a new edge or the expiration of an existing one may change the similarity between several node pairs, the challenge is to maintain the similarity join result as efficiently as possible. Our theoretical study is validated by a thorough experimental evaluation, based on real-world as well as synthetically generated graphs, demonstrating the superiority of the proposed technique in comparison to baseline approaches.
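The similarity measure described above can be sketched in a few lines. This is a static, recompute-from-scratch illustration of Jaccard similarity over 1-hop neighbor sets, i.e. |N(u) ∩ N(v)| / |N(u) ∪ N(v)|; it is not the incremental sliding-window technique of the paper, which the abstract does not detail. The helper names `neighbor_sets` and `jaccard` and the edge-list input are assumptions for illustration.

```python
from collections import defaultdict


def neighbor_sets(edges):
    """Map each node to the set of its 1-hop neighbors (undirected graph)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


window_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
nbrs = neighbor_sets(window_edges)
similarity_1_4 = jaccard(nbrs[1], nbrs[4])  # N(1) = {2, 3}, N(4) = {3}
```

Recomputing every pair after each edge arrival or expiration is what a naive baseline would do; an edge (u, v) only changes the neighbor sets of u and v, so only pairs involving those two nodes can change, which is the property a continuous algorithm can exploit.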
Query Optimization for Dynamic Graphs
Given a query graph that represents a pattern of interest, the emerging pattern detection problem can be viewed as a continuous query problem on a dynamic graph. We present an incremental algorithm for continuous query processing on dynamic graphs. The algorithm is based on the concept of query decomposition; we decompose a query graph into smaller subgraphs and assemble the result of subqueries to find complete matches with the specified query. The novelty of our work lies in using the subgraph distributional statistics collected from the dynamic graph to generate the decomposition. We introduce a “Lazy Search” algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. We also propose a metric named “Relative Selectivity” that is used to select between different query decomposition strategies. Our experiments performed on real online news, network traffic stream and a synthetic social network benchmark demonstrate 10-100x speedups over competing approaches.
A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs
Cyber security is one of the most significant technical challenges in current times. Detecting adversarial activities, prevention of theft of intellectual properties and customer data is a high priority for corporations and government agencies around the world. Cyber defenders need to analyze massive-scale, high-resolution network flows to identify, categorize, and mitigate attacks involving networks spanning institutional and national boundaries. Many of the cyber attacks can be described as subgraph patterns, with prominent examples being insider infiltrations (path queries), denial of service (parallel paths) and malicious spreads (tree queries). This motivates us to explore subgraph matching on streaming graphs in a continuous setting. The novelty of our work lies in using the subgraph distributional statistics collected from the streaming graph to determine the query processing strategy. We introduce a “Lazy Search” algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. We also propose a metric named “Relative Selectivity” that is used to select between different query processing strategies. Our experiments performed on real online news, network traffic stream and a synthetic social network benchmark demonstrate 10-100x speedups over selectivity agnostic approaches.