Outlier Detection in Graph Streams
Cited by 21 (6 self)
A number of applications in social networks, telecommunications, and mobile computing create massive streams of graphs. In many such applications, it is useful to detect structural abnormalities that differ from the “typical” behavior of the underlying network. In this paper, we provide first results on the problem of structural outlier detection in massive network streams. Such problems are inherently challenging because of the high volume of the underlying network stream. The stream scenario also increases the computational challenges for the approach. We use a structural connectivity model to define outliers in graph streams. To handle the sparsity problem of massive networks, we dynamically partition the network in order to construct statistically robust models of the connectivity behavior. We design a reservoir sampling method to maintain structural summaries of the underlying network. These structural summaries are designed to create robust, dynamic, and efficient models for outlier detection in graph streams. We present experimental results illustrating the effectiveness and efficiency of our approach.
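The reservoir sampling idea mentioned in this abstract can be illustrated with the classic single-pass scheme (often called Algorithm R) applied to an edge stream. This is only a minimal sketch of the general technique, not the structural-summary method of the paper; the function name `reservoir_sample_edges` and its parameters are hypothetical.

```python
import random


def reservoir_sample_edges(edge_stream, k, seed=None):
    """Keep a uniform random sample of up to k edges from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, edge in enumerate(edge_stream):
        if i < k:
            # The first k edges fill the reservoir directly.
            reservoir.append(edge)
        else:
            # Edge number i+1 replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = edge
    return reservoir


if __name__ == "__main__":
    stream = ((u, u + 1) for u in range(1000))
    print(reservoir_sample_edges(stream, k=10, seed=0))
```

The sample uses memory proportional to k regardless of how many edges arrive, which is the property that makes this family of methods attractive for streams.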
On Clustering Graph Streams
Cited by 18 (9 self)
In this paper, we will examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to the case of massive graphs. Recently, methods have been proposed for clustering graph data, though these methods are designed for static data and are not applicable to graph streams. Furthermore, these techniques are particularly ineffective for massive graphs, since a huge number of distinct edges may need to be tracked simultaneously. This results in storage and computational challenges during the clustering process. In order to deal with the natural problems arising from the use of massive disk-resident graphs, we will propose a technique for creating hash-compressed micro-clusters from graph streams. The compressed micro-clusters are designed by using a hash-based compression of the edges onto a smaller domain space. We will provide theoretical results which show that the hash-based compression continues to maintain bounded accuracy in terms of distance computations. We will provide experimental results which illustrate the accuracy and efficiency of the underlying method.
Graph Cube: On Warehousing and OLAP Multidimensional Networks
Cited by 17 (2 self)
We consider extending decision support facilities toward large sophisticated networks, upon which multidimensional attributes are associated with network entities, thereby forming the so-called multidimensional networks. Data warehouses and OLAP (Online Analytical Processing) technology have proven to be effective tools for decision support on relational data. However, they are not well-equipped to handle the new yet important multidimensional networks. In this paper, we introduce Graph Cube, a new data warehousing model that supports OLAP queries effectively on large multidimensional networks. By taking account of both attribute aggregation and structure summarization of the networks, Graph Cube goes beyond the traditional data cube model involved solely with numeric value based group-by’s, thus resulting in a more insightful and structure-enriched aggregate network within every possible multidimensional space. Besides traditional cuboid queries, a new class of OLAP queries, crossboid, is introduced that is uniquely useful in multidimensional networks and has not been studied before. We implement Graph Cube by combining special characteristics of multidimensional networks with the existing well-studied data cube techniques. We perform extensive experimental studies on a series of real-world data sets and Graph Cube is shown to be a powerful and efficient tool for decision support on large multidimensional networks.
Densest Subgraph in Streaming and MapReduce
Cited by 16 (3 self)
The problem of finding locally dense components of a graph is an important primitive in data analysis, with wide-ranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we present new algorithms for finding the densest subgraph in the streaming model. For any $\epsilon > 0$, our algorithms make $O(\log_{1+\epsilon} n)$ passes over the input and find a subgraph whose density is guaranteed to be within a factor $2(1+\epsilon)$ of the optimum. Our algorithms are also easily parallelizable and we illustrate this by realizing them in the MapReduce model. In addition we perform extensive experimental evaluation on massive real-world graphs showing the performance and scalability of our algorithms in practice.
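A multi-pass guarantee of this kind is typically obtained by threshold peeling: in each pass, every node whose degree is at most $2(1+\epsilon)$ times the current density is removed, and the densest intermediate subgraph is remembered. The sketch below shows that idea on an in-memory undirected graph, taking density as edges divided by nodes. It is an illustration under those assumptions, not the streaming or MapReduce implementation from the paper, and the name `densest_subgraph_peel` is hypothetical.

```python
def densest_subgraph_peel(edges, eps=0.1):
    """Approximate the densest subgraph by repeated threshold peeling.

    edges: iterable of (u, v) pairs of an undirected graph.
    Returns (node_set, density) for the densest subgraph seen.
    """
    # Store each undirected edge once and ignore self-loops.
    edges = {frozenset(e) for e in edges if e[0] != e[1]}
    nodes = {v for e in edges for v in e}
    best_nodes = set(nodes)
    best_density = len(edges) / len(nodes) if nodes else 0.0

    while nodes:
        density = len(edges) / len(nodes)
        if density > best_density:
            best_nodes, best_density = set(nodes), density

        # One "pass": compute degrees within the surviving subgraph.
        degree = dict.fromkeys(nodes, 0)
        for e in edges:
            for v in e:
                degree[v] += 1

        # Average degree is 2 * density, so at least one node is removed.
        threshold = 2 * (1 + eps) * density
        low = {v for v in nodes if degree[v] <= threshold}
        nodes -= low
        edges = {e for e in edges if not (e & low)}

    return best_nodes, best_density


if __name__ == "__main__":
    k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    print(densest_subgraph_peel(k4 + [(3, 4), (4, 5)]))
```

Each iteration of the loop corresponds to one pass over the edges; a larger `eps` removes more nodes per pass at the cost of a looser approximation.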
Capturing Topology in Graph Pattern Matching
Cited by 13 (6 self)
Graph pattern matching is often defined in terms of subgraph isomorphism, an NP-complete problem. To lower its complexity, various extensions of graph simulation have been considered instead. These extensions allow pattern matching to be conducted in cubic time. However, they fall short of capturing the topology of data graphs, i.e., graphs may have a structure drastically different from pattern graphs they match, and the matches found are often too large to understand and analyze. To rectify these problems, this paper proposes a notion of strong simulation, a revision of graph simulation, for graph pattern matching. (1) We identify a set of criteria for preserving the topology of graphs matched. We show that strong simulation preserves the topology of data graphs and finds a bounded number of matches. (2) We show that strong simulation retains the same complexity as earlier extensions of simulation, by providing a cubic-time algorithm for computing strong simulation. (3) We present the locality property of strong simulation, which allows us to effectively conduct pattern matching on distributed graphs. (4) We experimentally verify the effectiveness and efficiency of these algorithms, using real-life data and synthetic data.
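For readers unfamiliar with the baseline that strong simulation revises, plain graph simulation can be computed by a simple fixpoint refinement: start from all label-compatible node pairs and discard a data node whenever it cannot follow one of the pattern node's outgoing edges. The sketch below shows only that baseline, not strong simulation itself; the function name and the dictionary-based graph encoding are assumptions made for illustration.

```python
def graph_simulation(p_labels, p_edges, g_labels, g_edges):
    """Compute the maximum simulation relation of a pattern in a data graph.

    p_labels / g_labels: dict mapping node -> label.
    p_edges / g_edges: iterable of directed (source, target) pairs.
    Returns a dict: pattern node -> set of data nodes that simulate it.
    """
    p_edges = list(p_edges)
    g_succ = {v: set() for v in g_labels}
    for v, w in g_edges:
        g_succ[v].add(w)

    # Start with every data node that carries the same label.
    sim = {
        u: {v for v, lab in g_labels.items() if lab == p_labels[u]}
        for u in p_labels
    }

    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # v survives only if some successor of v can match u2.
            keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


if __name__ == "__main__":
    pattern_labels = {"p": "A", "q": "B"}
    data_labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
    print(graph_simulation(pattern_labels, [("p", "q")], data_labels, [("a1", "b1")]))
```

If any pattern node ends up with an empty set, the data graph does not match the pattern under simulation.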
Managing and mining large graphs: systems and implementations
In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD ’12), 2012
Cited by 13 (2 self)
We are facing challenges at all levels, ranging from infrastructures to programming models, for managing and mining large graphs. Many graph algorithms are ad hoc in the sense that each of them assumes that the underlying graph data can be organized in a certain way that maximizes the performance of the algorithm. In other words, there are no standard graph systems on which graph algorithms are developed and optimized. In response to this situation, many graph systems have been proposed recently. In this tutorial, we discuss several representative systems. Still, we focus on providing perspectives from a variety of standpoints on the goals and the means for developing a general-purpose graph system. We highlight the challenges posed by the graph data, the constraints of architectural design, the different types of application needs, and the power of different programming models that support such needs. This tutorial is complementary to the related tutorial “Managing and Mining Large Graphs: Patterns and Algorithms”.
Mining Frequent Closed Graphs on Evolving Data Streams
Cited by 12 (1 self)
Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to provide for concept drift detection. In this paper we present a framework for studying graph pattern mining on time-varying streams. Three new methods for mining frequent closed subgraphs are presented. All methods work on coresets of closed subgraphs, compressed representations of graph sets, and maintain these sets in a batch-incremental manner, but use different approaches to address potential concept drift. An evaluation study on datasets comprising up to four million graphs explores the strengths and limitations of the proposed methods. To the best of our knowledge this is the first work on mining frequent closed subgraphs in non-stationary data streams.
gSketch: On Query Estimation in Graph Streams
Cited by 11 (2 self)
Many dynamic applications are built upon large network infrastructures, such as social networks, communication networks, biological networks and the Web. Such applications create data that can be naturally modeled as graph streams, in which edges of the underlying graph are received and updated sequentially in the form of a stream. It is often necessary and important to summarize the behavior of graph streams in order to enable effective query processing. However, the sheer size and dynamic nature of graph streams present an enormous challenge to existing graph management techniques. In this paper, we propose a new graph sketch method, gSketch, which combines well-studied synopses for traditional data streams with a sketch partitioning technique, to estimate and optimize the responses to basic queries on graph streams. We consider two different scenarios for query estimation: (1) a graph stream sample is available; (2) both a graph stream sample and a query workload sample are available. Algorithms for the different scenarios are designed by partitioning a global sketch into a group of localized sketches in order to optimize the query estimation accuracy. We perform extensive experimental studies on both real and synthetic data sets and demonstrate the power and robustness of gSketch in comparison with the state-of-the-art global sketch method.
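To make the notion of a global sketch over an edge stream concrete, the following sketch uses a count-min sketch, one well-known synopsis for data streams, to estimate edge frequencies. Treating count-min as the synopsis is an assumption made here for illustration, and the code does not attempt the sketch partitioning that gSketch adds on top; the class name `EdgeCountMin` and its parameters are hypothetical.

```python
import hashlib


class EdgeCountMin:
    """Count-min sketch keyed by edges; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, edge):
        # One salted hash per row selects the counter for this edge.
        key = repr(edge).encode()
        for row in range(self.depth):
            salt = row.to_bytes(8, "little")
            digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, edge, count=1):
        for row, col in self._cells(edge):
            self.table[row][col] += count

    def estimate(self, edge):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(edge))


if __name__ == "__main__":
    sketch = EdgeCountMin()
    for edge in [("a", "b")] * 5 + [("b", "c")] * 2:
        sketch.add(edge)
    print(sketch.estimate(("a", "b")), sketch.estimate(("b", "c")))
```

A single table like this is shared by all edges; splitting it into localized sketches, as the abstract describes, is a way to reduce the collisions that hurt low-frequency edges.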
Towards Community Detection in Locally Heterogeneous Networks
Cited by 8 (4 self)
In recent years, the size of many social networks such as Facebook, MySpace, and LinkedIn has exploded at a rapid pace, because of the convenience of using the internet to connect geographically disparate users. This has led to considerable interest in many graph-theoretical aspects of social networks such as the underlying communities, the graph diameter, and other structural information which can be used to mine useful information from the social network. The graph structure of social networks is influenced by the underlying social behavior, which can vary considerably over different groups of individuals. One of the disadvantages of existing schemes is that they attempt to determine global communities, which (implicitly) assumes uniform behavior over the network. This is not well suited to the differences in the underlying density in different regions of the social network. As a result, a global analysis over social community structure can result in either very small communities (in sparse regions), or communities which are too large and incoherent (in dense regions). In order to handle the challenge of local heterogeneity, we will explore a simple property of social networks, which we refer to as the local succinctness property. We will use this property to extract compressed descriptions of the underlying community representation of the social network with the use of a min-hash approach. We will show that this approach creates balanced communities across a heterogeneous network in an effective way. We apply the approach to a variety of data sets, and illustrate its effectiveness over competing techniques.
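As background on the min-hash approach mentioned here, the sketch below builds min-hash signatures of node neighbor sets and uses them to estimate Jaccard similarity. It illustrates the general technique only; how the paper applies min-hashing to extract community descriptions is not specified in this abstract, and the function names are hypothetical.

```python
import hashlib


def minhash_signature(items, num_hashes=64):
    """Return a min-hash signature for a non-empty collection of hashable items."""
    signature = []
    for i in range(num_hashes):
        salt = i.to_bytes(8, "little")
        # The minimum hash value under each salted hash function is kept.
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(repr(x).encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for x in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    neighbors_u = {"a", "b", "c", "d"}
    neighbors_v = {"a", "b", "c", "e"}
    print(estimated_jaccard(minhash_signature(neighbors_u), minhash_signature(neighbors_v)))
```

Signatures have a fixed length regardless of the size of the neighbor set, which is what makes them a compressed description.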
Distributed Graph Pattern Matching
Cited by 8 (1 self)
Graph simulation has been adopted for pattern matching to reduce the complexity and capture the need of novel applications. With the rapid development of the Web and social networks, data is typically distributed over multiple machines. Hence a natural question raised is how to evaluate graph simulation on distributed data. To our knowledge, no such distributed algorithms are in place yet. This paper settles this question by providing evaluation algorithms and optimizations for graph simulation in a distributed setting. (1) We study the impacts of components and data locality on the evaluation of graph simulation. (2) We give an analysis of a large class of distributed algorithms, captured by a message-passing model, for graph simulation. We also identify three complexity measures: visit times, makespan and data shipment, for analyzing the distributed algorithms, and show that these measures are essentially controversial with each other. (3) We propose distributed algorithms and optimization techniques that exploit the properties of graph simulation and the analyses of distributed algorithms. (4) We experimentally verify the effectiveness and efficiency of these algorithms, using both real-life and synthetic data.
Categories and Subject Descriptors: H.2.8 [Database Management]: Database applications (graph data, data mining)