Linearized and single-pass belief propagation, 2014
Cited by 4 (1 self)
How can we tell when accounts are fake or real in a social network? And how can we tell which accounts belong to liberal, conservative or centrist users? Often, we can answer such questions and label nodes in a network based on the labels of their neighbors and appropriate assumptions of homophily (“birds of a feather flock together”) or heterophily (“opposites attract”). One of the most widely used methods for this kind of inference is Belief Propagation (BP), which iteratively propagates the information from a few nodes with explicit labels throughout a network until convergence. A well-known problem with BP, however, is that there are no known exact guarantees of convergence in graphs with loops. This paper introduces Linearized Belief Propagation (LinBP), a linearization of BP that allows a closed-form solution via intuitive matrix equations and, thus, comes with exact convergence guarantees. It handles homophily, heterophily, and more general cases that arise in multi-class settings. Plus, it allows a compact implementation in SQL. The paper also introduces Single-pass Belief Propagation (SBP), a localized (or “myopic”) version of LinBP that propagates information across every edge at most once and for which the final class assignments depend only on the nearest labeled neighbors. In addition, SBP allows fast incremental updates in dynamic networks. Our runtime experiments show that LinBP and SBP are orders of magnitude faster than standard BP, while leading to almost identical node labels.
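To make the idea of a closed-form solution via matrix equations concrete, here is a minimal Python sketch of a linearized propagation scheme. It is an illustration, not the paper's exact formulation: the update rule B = E + A B H, the function names, and the toy graph are all assumptions. A is the adjacency matrix, E holds centered explicit beliefs (zero rows for unlabeled nodes), and H is a centered coupling matrix. The iteration converges exactly when the spectral radius of A times that of H is below 1, and its fixed point is the solution of a single linear system.

```python
import numpy as np

def propagate_iteratively(A, E, H, max_iter=500, tol=1e-12):
    """Iterate B <- E + A @ B @ H (assumed simplified update rule)."""
    rho = np.max(np.abs(np.linalg.eigvals(A))) * np.max(np.abs(np.linalg.eigvals(H)))
    if rho >= 1:
        raise ValueError("rho(A) * rho(H) >= 1: the iteration does not converge in general")
    B = E.copy()
    for _ in range(max_iter):
        B_next = E + A @ B @ H
        if np.max(np.abs(B_next - B)) < tol:
            return B_next
        B = B_next
    return B

def propagate_closed_form(A, E, H):
    """Solve vec(B) = (I - H^T kron A)^{-1} vec(E), using column-major vec."""
    n, k = E.shape
    M = np.eye(n * k) - np.kron(H.T, A)
    b = np.linalg.solve(M, E.flatten(order="F"))
    return b.reshape((n, k), order="F")

# Toy example (hypothetical data): a path 0 - 1 - 2 - 3 with two classes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
# Node 0 is labeled class 0, node 3 is labeled class 1; nodes 1 and 2 are unlabeled.
E = np.array([[0.1, -0.1],
              [0.0, 0.0],
              [0.0, 0.0],
              [-0.1, 0.1]])
# Homophily: a class reinforces itself in neighbors and suppresses the other class.
H = 0.1 * np.array([[1.0, -1.0],
                    [-1.0, 1.0]])

B_iter = propagate_iteratively(A, E, H)
B_closed = propagate_closed_form(A, E, H)
labels = B_iter.argmax(axis=1)
print(labels)
```

In this toy run the two unlabeled nodes take the class of their nearest labeled neighbor, and the iterative and closed-form results agree to numerical precision.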
Cheetah: fast graph kernel tracking on dynamic graphs. In SDM, 2015
Cited by 1 (1 self)
Graph kernels provide an expressive approach to measuring the similarity of two graphs, and are key building blocks behind many real-world applications, such as bioinformatics, brain science and social networks. However, current methods for computing graph kernels assume the input graphs are static, which is often not the case in reality. It is highly desirable to track graph kernels on dynamic graphs that evolve over time, and to do so in a timely manner. In this paper, we propose a family of Cheetah algorithms to deal with this challenge. Cheetah leverages the low-rank structure of graph updates and incrementally updates the eigendecomposition or SVD of the adjacency matrices of graphs. Experimental evaluations on real-world graphs show that our algorithms (1) are significantly faster than alternatives with high accuracy and (2) scale sublinearly.
Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics, 2015
Cited by 1 (0 self)
Anomaly detection is an important problem with multiple applications, and thus has been studied for decades in various research domains. In the past decade there has been a growing interest in anomaly detection in data represented as networks, or graphs, largely because of their robust expressiveness and their natural ability to represent complex relationships. Originally, techniques focused on anomaly detection in static graphs, which do not change and are capable of representing only a single snapshot of data. As real-world networks are constantly changing, there has been a shift in focus to dynamic graphs, which evolve over time. In this survey, we aim to provide a comprehensive overview of anomaly detection in dynamic networks, concentrating on the state-of-the-art methods. We first describe four types of anomalies that arise in dynamic networks, providing an intuitive explanation, applications, and a concrete example for each. Having established an idea for what constitutes an anomaly, a general two-stage approach to anomaly detection in dynamic networks that is common among the methods is presented. We then construct a two-tiered taxonomy, first partitioning the methods based on the intuition behind their approach, and subsequently subdividing them based on the types of anomalies they detect. Within each of the tier one categories (community, compression, decomposition, distance, and probabilistic model based), we highlight the major similarities and differences, showing the wealth of techniques derived from similar conceptual approaches. © 2015 The Authors.
financial systems connecting banks across the world, electric power grids connecting geographically distributed areas, and social networks that connect users, businesses, or customers using relationships such as friendship, collaboration, or transactional interactions. These are examples of dynamic networks, which, unlike static networks, are constantly undergoing changes to their structure or attributes. Possible changes include insertion and deletion of vertices (objects), insertion and deletion of edges (relationships), and modification of attributes (e.g., vertex or edge labels). An important problem over dynamic networks is anomaly detection: finding objects, relationships, or
NED: An Inter-Graph Node Metric Based On Edit Distance
Node similarity is fundamental in graph analytics. However, similarity between nodes in different graphs (inter-graph nodes) has not yet received enough attention. Inter-graph node similarity is important in learning a new graph based on the knowledge extracted from an existing graph (transfer learning on graphs) and has applications in biological, communication, and social networks. In this paper, we propose a novel distance function for measuring inter-graph node similarity with edit distance, called NED. In NED, two nodes are compared according to their local neighborhood topologies, which are represented as unordered k-adjacent trees, without relying on any extra information. Because computing tree edit distance on unordered trees is NP-Complete, we propose a modified tree edit distance, called TED*, for comparing unordered and unlabeled k-adjacent trees. TED* is a metric distance, like the original tree edit distance, but more importantly, TED* is polynomially computable. As a metric distance, NED admits efficient indexing, provides interpretable results, and is shown to perform better than existing approaches on a number of data analysis tasks, including graph de-anonymization. Finally, the efficiency and effectiveness of NED are empirically demonstrated using real-world graphs.
ASCOS++: An Asymmetric Similarity Measure for Weighted Networks to Address the Problem of SimRank
In this article, we explore the relationships among digital objects in terms of their similarity based on vertex similarity measures. We argue that SimRank (a famous similarity measure) and its families, such as P-Rank and SimRank++, fail to capture similar node pairs in certain conditions, especially when two nodes can only reach each other through paths of odd lengths. We present new similarity measures ASCOS and ASCOS++ to address the problem. ASCOS outputs a more complete similarity score than SimRank and SimRank’s families. ASCOS++ enriches ASCOS to include edge weight into the measure, giving all edges and network weights an opportunity to make their contribution. We show that both ASCOS++ and ASCOS can be reformulated and applied on a distributed environment for parallel contribution. Experimental results show that ASCOS++ reports a better score than SimRank and several famous similarity measures. Finally, we reexamine previous use cases of SimRank, and explain appropriate and inappropriate use cases. We suggest that future SimRank users follow the rules proposed here before naively applying it. We also discuss the relationship between ASCOS++ and PageRank.
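The odd-length-path limitation of SimRank can be reproduced with a few lines of Python. The sketch below implements the standard SimRank recurrence (each pair's score is the decay factor C times the average score over pairs of their neighbors) and is not an implementation of ASCOS or ASCOS++. The function name, the decay factor 0.8, and the toy path graph are assumptions chosen for illustration.

```python
from itertools import product

def simrank(neighbors, C=0.8, iterations=20):
    """Plain SimRank on an undirected graph given as {node: [neighbors]}."""
    nodes = list(neighbors)
    # Start from the identity: a node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
                continue
            na, nb = neighbors[a], neighbors[b]
            if not na or not nb:
                new[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of neighbors, scaled by C.
            total = sum(sim[(i, j)] for i in na for j in nb)
            new[(a, b)] = C * total / (len(na) * len(nb))
        sim = new
    return sim

# Hypothetical toy graph: the path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = simrank(path)
print(scores[("a", "b")], scores[("a", "c")], scores[("a", "d")])
```

On this path, a and c are joined by a path of even length and receive a positive score, while a and b (and likewise a and d) are joined only by odd-length paths and stay at exactly zero, even though a and b are direct neighbors.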
Anomaly Detection in Dynamic Networks of Varying Size
Dynamic networks, also called network streams, are an important data representation that applies to many real-world domains. Many sets of network data such as email networks, social networks, or internet traffic networks are best represented by a dynamic network due to the temporal component of the data. One important application in the domain of dynamic network analysis is anomaly detection. Here the task is to identify points in time where the network exhibits behavior radically different from a typical time, either due to some event (like the failure of machines in a computer network) or a shift in the network properties. This problem is made more difficult by the fluid nature of what is considered “normal” network behavior. The volume of traffic on a network, for example, can change over the course of a month or even vary based on the time of the day without being considered unusual. Anomaly detection tests using traditional network statistics have difficulty in these scenarios due to their Density Dependence: as the volume of edges changes, the value of the statistics changes as well, making it difficult to determine whether the change in signal is due to the traffic volume or to some fundamental shift in the behavior of the network. To more accurately detect anomalies in dynamic networks, we introduce the concept of Density-Consistent network statistics. These statistics are designed to produce results that reflect the state of the network independent of the volume of edges. On synthetically generated graphs, anomaly detectors using these statistics show a 20-400% improvement in recall when distinguishing graphs drawn from different distributions. When applied to several real datasets, Density-Consistent statistics recover multiple network events which standard statistics failed to find, and the times flagged as anomalies by Density-Consistent statistics have subgraphs with radically different structure from normal time steps.
Linearized and Turbo Belief Propagation
How can we tell when accounts are fake or real in a social network? And how can we tell which accounts belong to liberal, conservative or centrist users? Often, we can answer such questions and label the class of a node in a network based on its neighbors and appropriate assumptions of homophily (“birds of a feather flock together”) or heterophily (“opposites attract”). One of the most widely used methods for this kind of reasoning is Belief Propagation (BP), which iteratively propagates the information from a few nodes with explicit beliefs throughout the network until it converges. However, one main problem with BP is that there are no guarantees of convergence in general graphs with loops. This paper introduces Linearized Belief Propagation (LBP), a linearization of BP that allows a closed-form solution via intuitive matrix calculations and, thus, comes with convergence guarantees. It handles homophily, heterophily, and more general cases that arise in multi-class settings. The paper also introduces Turbo Belief Propagation (TBP), a “localized” version of LBP for which the final class assignments depend only on the nearest labeled neighbors. TBP (in contrast to standard BP and LBP) allows fast incremental updates in case of new explicit labels or new edges in the graph. We show an intuitive connection between LBP and TBP by proving that the labeling assignments for both are identical in the limit of decreasing coupling strengths between nodes in the graph. Importantly, the linearized matrix equations of both new methods allow compact implementations in SQL. Finally, our runtime experiments show that both new methods are orders of magnitude faster than standard BP while leading to almost identical node labels.
A Guide to Selecting a Network Similarity Method
We consider the problem of determining how similar two networks (without known node-correspondences) are. This problem occurs frequently in real-world applications such as transfer learning and change detection. Many network-similarity methods exist, and it is unclear how one should select from amongst them. We provide the first empirical study on the relationships between different network-similarity methods. Specifically, we present (1) an approach for identifying groups of comparable network-similarity methods and (2) an approach for computing the consensus among a given set of network-similarity methods. We compare and contrast twenty network-similarity methods by applying our approaches to a variety of real datasets spanning multiple domains. Our experiments demonstrate that (1) different network-similarity methods are surprisingly well correlated, (2) some complex network-similarity methods can be closely approximated by a much simpler method, and (3) a few network-similarity methods produce rankings that are very close to the consensus ranking.
Graph-based Anomaly Detection and Description: A Survey
Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.
Secure Peer-to-Peer 3D Streaming
In recent years, interactive virtual environments such as Second Life, and virtual globe applications such as Google Earth, have become very popular. However, delivering massive amounts of interactive content to millions of potential users poses challenges for the processing and network capacities of the content providers. A distributed peer-to-peer (P2P) approach has thus been proposed to provide more affordable scalability. However, building content delivery systems based on P2P approaches creates security concerns for commercial adoption. This paper identifies three practical obstacles that must be addressed in order for subscription-based service providers to adopt P2P-based, non-linear interactive streaming: 1) the detection of double playing, where a subscriber of the service can only log in with one presence anywhere in the system; 2) the authentication of content, where paying customers can be sure that the content they retrieve from other users is authentic; and 3) the update of content, where new versions of the content can be distributed to users in a timely and secure manner. We then present basic solutions to each of these challenges, together with a security analysis of our approach.