Results 1  10
of
28
Graph homomorphism revisited for graph matching
 PVLDB
"... In a variety of emerging applications one needs to decide whether a graph G matches another Gp, i.e., whether G has a topological structure similar to that of Gp. The traditional notions of graph homomorphism and isomorphism often fall short of capturing the structural similarity in these applicatio ..."
Abstract

Cited by 19 (6 self)
 Add to MetaCart
(Show Context)
In a variety of emerging applications one needs to decide whether a graph G matches another Gp, i.e., whether G has a topological structure similar to that of Gp. The traditional notions of graph homomorphism and isomorphism often fall short of capturing the structural similarity in these applications. This paper studies revisions of these notions, providing a full treatment from complexity to algorithms. (1) We propose phomomorphism (phom) and 11 phom, which extend graph homomorphism and subgraph isomorphism, respectively, by mapping edges from one graph to paths in another, and by measuring the similarity of nodes. (2) We introduce metrics to measure graph similarity, and several optimization problems for phom and 11 phom. (3) We show that the decision problems for phom and 11 phom are NPcomplete even for DAGs, and that the optimization problems are approximationhard. (4) Nevertheless, we provide approximation algorithms with provable guarantees on match quality. We experimentally verify the effectiveness of the revised notions and the efficiency of our algorithms in Web site matching, using reallife and synthetic data. 1.
Exploiting Dynamicity in Graphbased Traffic Analysis: Techniques and Applications
"... Network traffic can be represented by a Traffic Dispersion Graph (TDG) that contains an edge between two nodes that send a particular type of traffic (e.g., DNS) to one another. TDGs have recently been proposed as an alternative way to interpret and visualize network traffic. Previous studies have f ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
(Show Context)
Network traffic can be represented by a Traffic Dispersion Graph (TDG) that contains an edge between two nodes that send a particular type of traffic (e.g., DNS) to one another. TDGs have recently been proposed as an alternative way to interpret and visualize network traffic. Previous studies have focused on static properties of TDGs using graph snapshots in isolation. In this work, we represent network traffic with a series of related graph instances that change over time. This representation facilitates the analysis of the dynamic nature of network traffic, providing additional descriptive power. For example, DNS and P2P graph instances can appear similar when compared in isolation, but the way the DNS and P2P TDGs change over time differs significantly. To quantify the changes over time, we introduce a series of novel metrics that capture changes both in the graph structure (e.g., the average degree) and the participants (i.e., IP addresses) of a TDG. We apply our new methodologies to improve graphbased traffic classification and to detect changes in the profile of legacy applications (e.g., email).
DELTACON: A Principled MassiveGraph Similarity Function
"... How much did a network change since yesterday? How different is the wiring between Bob’s brain (a lefthanded male) and Alice’s brain (a righthanded female)? Graph similarity with known node correspondence, i.e. the detection of changes in the connectivity of graphs, arises in numerous settings. In ..."
Abstract

Cited by 8 (4 self)
 Add to MetaCart
How much did a network change since yesterday? How different is the wiring between Bob’s brain (a lefthanded male) and Alice’s brain (a righthanded female)? Graph similarity with known node correspondence, i.e. the detection of changes in the connectivity of graphs, arises in numerous settings. In this work, we formally state the axioms and desired properties of the graph similarity functions, and evaluate when stateoftheart methods fail to detect crucial connectivity changes in graphs. We propose DELTACON, a principled, intuitive, and scalable algorithm that assesses the similarity between two graphs on the same nodes (e.g. employees of a company, customers of a mobile carrier). Experiments on various synthetic and real graphs showcase the advantages of our method over existing similarity measures. Finally, we employ DELTACON to real applications: (a) we classify people to groups of high and low creativity based on their brain connectivity graphs, and (b) do temporal anomaly detection in the whoemailswhom Enron graph. 1
Fast Random Walk Graph Kernel
"... Random walk graph kernel has been used as an important tool for various data mining tasks including classification and similarity computation. Despite its usefulness, however, it suffers from the expensive computational cost which is at least O(n 3) or O(m 2) for graphs with n nodes and m edges. In ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
(Show Context)
Random walk graph kernel has been used as an important tool for various data mining tasks including classification and similarity computation. Despite its usefulness, however, it suffers from the expensive computational cost which is at least O(n 3) or O(m 2) for graphs with n nodes and m edges. In this paper, we propose Ark, a set of fast algorithms for random walk graph kernel computation. Ark is based on the observation that real graphs have much lower intrinsic ranks, compared with the orders of the graphs. Ark exploits the low rank structure to quickly compute random walk graph kernels in O(n 2) or O(m) time. Experimental results show that our method is up to 97,865 × faster than the existing algorithms, while providing more than 91.3 % of the accuracies.
Dynamic Network Evolution: Models, Clustering, Anomaly Detection
"... Abstract — Traditionally, research on graph theory focused on studying graphs that are static. However, almost all real networks are dynamic in nature and large in size. Quite recently, research areas for studying the topology, evolution, applications of complex evolving networks and processes occur ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
(Show Context)
Abstract — Traditionally, research on graph theory focused on studying graphs that are static. However, almost all real networks are dynamic in nature and large in size. Quite recently, research areas for studying the topology, evolution, applications of complex evolving networks and processes occurring in them and governing them attracted attention from researchers. In this work, we review the significant contributions in the literature on complex evolving networks; metrics used from degree distribution to spectral graph analysis, real world applications from biology to social sciences, problem domains from anomaly detection, dynamic graph clustering to community detection. I.
Outlier Detection for Temporal Data: A Survey
"... Abstract—In the statistics community, outlier detection for time series data has been studied for decades. Recently, with advances in hardware and software technology, there has been a large body of work on temporal outlier detection from a computational perspective within the computer science commu ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
(Show Context)
Abstract—In the statistics community, outlier detection for time series data has been studied for decades. Recently, with advances in hardware and software technology, there has been a large body of work on temporal outlier detection from a computational perspective within the computer science community. In particular, advances in hardware technology have enabled the availability of various forms of temporal data collection mechanisms, and advances in software technology have enabled a variety of data management mechanisms. This has fueled the growth of different kinds of data sets such as data streams, spatiotemporal data, distributed streams, temporal networks, and time series data, generated by a multitude of applications. There arises a need for an organized and detailed study of the work done in the area of outlier detection with respect to such temporal datasets. In this survey, we provide a comprehensive and structured overview of a large set of interesting outlier definitions for various forms of temporal data, novel techniques, and application scenarios in which specific definitions and techniques have been widely used. Index Terms—temporal outlier detection, time series data, data streams, distributed data streams, temporal networks, spatiotemporal outliers 1
Evolutionary Network Analysis: A Survey
"... Evolutionary network analysis has found an increasing interest in the literature because of the importance of different kinds of dynamic social networks, email networks, biological networks, and social streams. When a network evolves, the results of data mining algorithms such as community detection ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
(Show Context)
Evolutionary network analysis has found an increasing interest in the literature because of the importance of different kinds of dynamic social networks, email networks, biological networks, and social streams. When a network evolves, the results of data mining algorithms such as community detection need to be correspondingly updated. Furthermore, the specific kinds of changes to the structure of the network, such as the impact on community structure or the impact on network structural parameters, such as node degrees, also needs to be analyzed. Some dynamic networks have a much faster rate of edge arrival and are referred to as network streams or graph streams. The analysis of such networks is especially challenging, because it needs to be performed with an online approach, under the onepass constraint of data streams. The incorporation of content can add further complexity to the evolution analysis process. This survey provides an overview of the vast literature on graph evolution analysis and the numerous applications that arise in different contexts.
Algorithms for Graph Similarity and Subgraph Matching
, 2011
"... We deal with two independent but related problems, those of graph similarity and subgraph matching, which are both important practical problems useful in several fields of science, engineering and data analysis. For the problem of graph similarity, we develop and test a new framework for solving the ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
(Show Context)
We deal with two independent but related problems, those of graph similarity and subgraph matching, which are both important practical problems useful in several fields of science, engineering and data analysis. For the problem of graph similarity, we develop and test a new framework for solving the problem using belief propagation and related ideas. For the subgraph matching problem, we develop a new algorithm based on existing techniques in the bioinformatics and data mining literature, which uncover periodic or infrequent matchings. We make substantial progress compared to the existing methods for both problems.
Local Learning for Mining Outlier Subgraphs from Network Datasets
 In: SDM
"... In the real world, various systems can be modeled using entityrelationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anom ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
In the real world, various systems can be modeled using entityrelationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anomalous based on the connectivity structure within itself as well as with its neighborhood. For example for a coauthorship network, given a subgraph containing three authors, one expects all three authors to be say data mining authors. Also, one expects the neighborhood to mostly consist of data mining authors. But a 3author clique of data mining authors with all theory authors in the neighborhood clearly seems interesting. Similarly, having one of the authors in the clique as a theory author when all other authors (both in the clique and neighborhood) are data mining authors, is also suspicious. Thus, existence of lowprobability links and absence of highprobability links can be a good indicator of subgraph outlierness. The probability of an edge can in turn be modeled based on the weighted similarity between the attribute values of the nodes linked by the edge. We claim that the attribute weights must be learned locally for accurate link existence probability computations. In this paper, we design a system that finds subgraph outliers given a graph and a query by modeling the problem as a linear optimization. Experimental results on several synthetic and real datasets show the effectiveness of the proposed approach in computing interesting outliers. 1
Web graph similarity for anomaly detection (poster
 In WWW
, 2008
"... Web graphs are approximate snapshots of the web, created by search engines. Their creation is an errorprone procedure that relies on the availability of Internet nodes and the faultless operation of multiple software and hardware units. Checking the validity of a web graph requires a notion of grap ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
Web graphs are approximate snapshots of the web, created by search engines. Their creation is an errorprone procedure that relies on the availability of Internet nodes and the faultless operation of multiple software and hardware units. Checking the validity of a web graph requires a notion of graph similarity. Web graph similarity helps measure the amount and significance of changes in consecutive web graphs. These measurements validate how well search engines acquire content from the web. In this paper we study five similarity schemes: three of them adapted from existing graph similarity measures and two adapted from wellknown document and vector similarity methods. We compare and evaluate all five schemes using a sequence of web graphs for Yahoo! and study if the schemes can identify anomalies that may occur due to hardware or other problems.