Results 1  10
of
72
Outlier Detection in Graph Streams
"... Abstract—A number of applications in social networks, telecommunications, and mobile computing create massive streams of graphs. In many such applications, it is useful to detect structural abnormalities which are different from the “typical” behavior of the underlying network. In this paper, we wil ..."
Abstract

Cited by 21 (6 self)
 Add to MetaCart
(Show Context)
Abstract—A number of applications in social networks, telecommunications, and mobile computing create massive streams of graphs. In many such applications, it is useful to detect structural abnormalities which are different from the “typical” behavior of the underlying network. In this paper, we will provide first results on the problem of structural outlier detection in massive network streams. Such problems are inherently challenging, because the problem of outlier detection is specially challenging because of the high volume of the underlying network stream. The stream scenario also increases the computational challenges for the approach. We use a structural connectivity model in order to define outliers in graph streams. In order to handle the sparsity problem of massive networks, we dynamically partition the network in order to construct statistically robust models of the connectivity behavior. We design a reservoir sampling method in order to maintain structural summaries of the underlying network. These structural summaries are designed in order to create robust, dynamic and efficient models for outlier detection in graph streams. We present experimental results illustrating the effectiveness and efficiency of our approach. I.
On Clustering Graph Streams
"... In this paper, we will examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. ..."
Abstract

Cited by 18 (9 self)
 Add to MetaCart
(Show Context)
In this paper, we will examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to the case of massive graphs. Recently, methods have been proposed for clustering graph data, though these methods are designed for static data, and are not applicable to the case of graph streams. Furthermore, these techniques are especially not effective for the case of massive graphs, since a huge number of distinct edges may need to be tracked simultaneously. This results in storage and computational challenges during the clustering process. In order to deal with the natural problems arising from the use of massive diskresident graphs, we will propose a technique for creating hashcompressed microclusters from graph streams. The compressed microclusters are designed by using a hashbased compression of the edges onto a smaller domain space. We will provide theoretical results which show that the hashbased compression continues to maintain bounded accuracy in terms of distance computations. We will provide experimental results which illustrate the accuracy and efficiency of the underlying method. 1
Graph Cube: On Warehousing and OLAP Multidimensional Networks
"... We consider extending decision support facilities toward large sophisticated networks, upon which multidimensional attributes are associated with network entities, thereby forming the socalled multidimensional networks. Data warehouses and OLAP (Online Analytical Processing) technology have proven ..."
Abstract

Cited by 17 (2 self)
 Add to MetaCart
(Show Context)
We consider extending decision support facilities toward large sophisticated networks, upon which multidimensional attributes are associated with network entities, thereby forming the socalled multidimensional networks. Data warehouses and OLAP (Online Analytical Processing) technology have proven to be effective tools for decision support on relational data. However, they are not wellequipped to handle the new yet important multidimensional networks. In this paper, we introduce Graph Cube, a new data warehousing model that supports OLAP queries effectively on large multidimensional networks. By taking account of both attribute aggregation and structure summarization of the networks, Graph Cube goes beyond the traditional data cube model involved solely with numeric value based groupby’s, thus resulting in a more insightful and structureenriched aggregate network within every possible multidimensional space. Besides traditional cuboid queries, a new class of OLAP queries, crossboid, is introduced that is uniquely useful in multidimensional networks and has not been studied before. We implement Graph Cube by combining special characteristics of multidimensional networks with the existing wellstudied data cube techniques. We perform extensive experimental studies on a series of real world data sets and Graph Cube is shown to be a powerful and efficient tool for decision support on large multidimensional networks.
Densest Subgraph in Streaming and MapReduce
"... The problem of finding locally dense components of a graph is an important primitive in data analysis, with wideranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we present new algorithms for finding the densest subgraph in t ..."
Abstract

Cited by 16 (3 self)
 Add to MetaCart
(Show Context)
The problem of finding locally dense components of a graph is an important primitive in data analysis, with wideranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we present new algorithms for finding the densest subgraph in the streaming model. For any ɛ> 0, our algorithms make O(log 1+ɛ n) passes over the input and find a subgraph whose density is guaranteed to be within a factor 2(1 + ɛ) of the optimum. Our algorithms are also easily parallelizable and we illustrate this by realizing them in the MapReduce model. In addition we perform extensive experimental evaluation on massive realworld graphs showing the performance and scalability of our algorithms in practice. 1.
Capturing Topology in Graph Pattern Matching
"... Graph pattern matching is often defined in terms of subgraph isomorphism, an npcomplete problem. To lower its complexity, various extensions of graph simulation have been considered instead. These extensions allow pattern matching to be conducted in cubictime. However, they fall short of capturing ..."
Abstract

Cited by 13 (6 self)
 Add to MetaCart
(Show Context)
Graph pattern matching is often defined in terms of subgraph isomorphism, an npcomplete problem. To lower its complexity, various extensions of graph simulation have been considered instead. These extensions allow pattern matching to be conducted in cubictime. However, they fall short of capturing the topology of data graphs, i.e., graphs may have a structure drastically different from pattern graphs they match, and the matches found are often too large to understand and analyze. To rectify these problems, this paper proposes a notion of strong simulation, a revision of graph simulation, for graph pattern matching. (1) We identify a set of criteria for preserving the topology of graphs matched. We show that strong simulation preserves the topology of data graphs and finds a bounded number of matches. (2) We show that strong simulation retains the same complexity as earlier extensions of simulation, by providing a cubictime algorithm for computing strong simulation. (3) We present the locality property of strong simulation, which allows us to effectively conduct pattern matching on distributed graphs. (4) We experimentally verify the effectiveness and efficiency of these algorithms, using reallife data and synthetic data. 1.
Managing and mining large graphs: systems and implementations
 In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD ’12
, 2012
"... We are facing challenges at all levels ranging from infrastructures to programming models for managing and mining large graphs. A lot of algorithms on graphs are adhoc in the sense that each of them assumes that the underlying graph data can be organized in a certain way that maximizes the perform ..."
Abstract

Cited by 13 (2 self)
 Add to MetaCart
(Show Context)
We are facing challenges at all levels ranging from infrastructures to programming models for managing and mining large graphs. A lot of algorithms on graphs are adhoc in the sense that each of them assumes that the underlying graph data can be organized in a certain way that maximizes the performance of the algorithm. In other words, there is no standard graph systems based on which graph algorithms are developed and optimized. In response to this situation, a lot of graph systems have been proposed recently. In this tutorial, we discuss several representative systems. Still, we focus on providing perspectives from a variety of standpoints on the goals and the means for developing a general purpose graph system. We highlight the challenges posed by the graph data, the constraints of architectural design, the different types of application needs, and the power of different programming models that support such needs. This tutorial is complementary to the related tutorial “Managing and Mining Large Graphs: Patterns and Algorithms”. Categories and Subject Descriptors
Mining Frequent Closed Graphs on Evolving Data Streams
"... Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in realtime. Data stream mining faces hard constraints regarding time and space for processing, and also needs to provide for concept drift detection. In this paper we present a framework for st ..."
Abstract

Cited by 12 (1 self)
 Add to MetaCart
(Show Context)
Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in realtime. Data stream mining faces hard constraints regarding time and space for processing, and also needs to provide for concept drift detection. In this paper we present a framework for studying graph pattern mining on timevarying streams. Three new methods for mining frequent closed subgraphs are presented. All methods work on coresets of closed subgraphs, compressed representations of graph sets, and maintain these sets in a batchincremental manner, but use different approaches to address potential concept drift. An evaluation study on datasets comprising up to four million graphs explores the strength and limitations of the proposed methods. To the best of our knowledge this is the first work on mining frequent closed subgraphs in nonstationary data streams.
gSketch: On Query Estimation in Graph Streams
"... Many dynamic applications are built upon large network infrastructures, such as social networks, communication networks, biological networks and the Web. Such applications create data that can be naturally modeled as graph streams, in which edges of the underlying graph are received and updated sequ ..."
Abstract

Cited by 11 (2 self)
 Add to MetaCart
(Show Context)
Many dynamic applications are built upon large network infrastructures, such as social networks, communication networks, biological networks and the Web. Such applications create data that can be naturally modeled as graph streams, in which edges of the underlying graph are received and updated sequentially in a form of a stream. It is often necessary and important to summarize the behavior of graph streams in order to enable effective query processing. However, the sheer size and dynamic nature of graph streams present an enormous challenge to existing graph management techniques. In this paper, we propose a new graph sketch method, gSketch, which combines well studied synopses for traditional data streams with a sketch partitioning technique, to estimate and optimize the responses to basic queries on graph streams. We consider two different scenarios for query estimation: (1) A graph stream sample is available; (2) Both a graph stream sample and a query workload sample are available. Algorithms for different scenarios are designed respectively by partitioning a global sketch to a group of localized sketches in order to optimize the query estimation accuracy. We perform extensive experimental studies on both real and synthetic data sets and demonstrate the power and robustness of gSketch in comparison with the stateoftheart global sketch method. 1.
Community Detection with Edge Content in Social Media Networks
"... Abstract—The problem of community detection in social media has been widely studied in the social networking community in the context of the structure of the underlying graphs. Most community detection algorithms use the links between the nodes in order to determine the dense regions in the graph. T ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
(Show Context)
Abstract—The problem of community detection in social media has been widely studied in the social networking community in the context of the structure of the underlying graphs. Most community detection algorithms use the links between the nodes in order to determine the dense regions in the graph. These dense regions are the communities of social media in the graph. Such methods are typically based purely on the linkage structure of the underlying social media network. However, in many recent applications, edge content is available in order to provide better supervision to the community detection process. Many natural representations of edges in social interactions such as shared images and videos, user tags and comments are naturally associated with content on the edges. While some work has been done on utilizing node content for community detection, the presence of edge content presents unprecedented opportunities and flexibility for the community detection process. We will show that such edge content can be leveraged in order to greatly improve the effectiveness of the community detection process in social media networks. We present experimental results illustrating the effectiveness of our approach. Index Terms—ignore I.
Towards Community Detection in Locally Heterogeneous Networks
"... In recent years, the size of many social networks such as Facebook, MySpace, andLinkedIn has exploded at a rapid pace, because of its convenience in using the internet in order to connect geographically disparate users. This has lead to considerable interest in many graphtheoretical aspects of soci ..."
Abstract

Cited by 8 (4 self)
 Add to MetaCart
(Show Context)
In recent years, the size of many social networks such as Facebook, MySpace, andLinkedIn has exploded at a rapid pace, because of its convenience in using the internet in order to connect geographically disparate users. This has lead to considerable interest in many graphtheoretical aspects of social networks such as the underlying communities, the graph diameter, and other structural information which can be used in order to mine useful information from the social network. The graph structure of social networks is influenced by the underlying social behavior, which can vary considerably over different groups of individuals. One of the disadvantages of existing schemes is that they attempt to determine global communities, which (implicitly) assume uniform behavior over the network. This is not very well suited to the differences in the underlying density in different regions of the social network. As a result, a global analysis over social community structure can result in either very small communities (in sparse regions), or communities which are too large and incoherent (in dense regions). In order to handle the challenge of local heterogeneity, we will explore a simple property of social networks, which we refer to as the local succinctness property. We will use this property in order to extract compressed descriptions of the underlying community representation of the social network with the use of a minhash approach. We will show that this approach creates balanced communities across a heterogeneous network in an effective way. We apply the approach to a variety of data sets, and illustrate its effectiveness over competing techniques.