Results 1-10 of 22
Beyond random walk and Metropolis-Hastings samplers: Why you should not backtrack for unbiased graph sampling
2012
GAIA: Graph Classification Using Evolutionary Computation
Cited by 12 (1 self)
Discriminative subgraphs are widely used to define the feature space for graph classification in large graph databases. Several scalable approaches have been proposed to mine discriminative subgraphs. However, their intensive computation needs prevent them from mining large databases. We propose an efficient method, GAIA, for mining discriminative subgraphs for graph classification in large databases. Our method employs a novel subgraph encoding approach to support an arbitrary subgraph pattern exploration order and explores the subgraph pattern space in a process resembling biological evolution. In this manner, GAIA is able to find discriminative subgraph patterns much faster than other algorithms. Additionally, we take advantage of parallel computing to further improve the quality of resulting patterns. In the end, we employ sequential coverage to generate association rules as graph classifiers using patterns mined by GAIA. Extensive experiments have been performed to analyze the performance of GAIA and to compare it with two other state-of-the-art approaches. GAIA outperforms the other approaches both in terms of classification accuracy and runtime efficiency.
Network Sampling: From Static to Streaming Graphs
2013
Cited by 12 (3 self)
Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network sampling by highlighting the different objectives, population and units of interest, and classes of network sampling methods. In addition, we propose a spectrum of computational models for network sampling methods, ranging from the traditionally studied model based on the assumption of a static domain to a more challenging model that is appropriate for streaming domains. We design a family of sampling methods based on the concept of graph induction that generalize across the full spectrum of computational models (from static to streaming) while efficiently preserving many of the topological properties of the input graphs. Furthermore, we demonstrate how traditional static sampling algorithms can be modified for graph streams for each of the three main classes of sampling methods: node, edge, and topology-based sampling. Experimental results indicate that our proposed family of sampling methods more accurately preserves the underlying properties of the graph in both static and streaming domains. Finally, we study the impact of network sampling algorithms on the parameter estimation and performance evaluation of relational classification algorithms.
Fast Robustness Estimation in Large Social Graphs: Communities and Anomaly Detection
Cited by 7 (2 self)
Given a large social graph, like a scientific collaboration network, what can we say about its robustness? Can we estimate a robustness index for a graph quickly? If the graph evolves over time, how do these properties change? In this work, we try to answer the above questions by studying the expansion properties of large social graphs. First, we present a measure which characterizes the robustness properties of a graph and serves as a global measure of the community structure (or lack thereof). We study how these properties change over time and we show how to spot outliers and anomalies over time. We apply our method to several diverse real networks with millions of nodes. We also show how to compute our measure efficiently by exploiting the special spectral properties of real-world networks.
Ring: An integrated method for frequent representative subgraph mining
In ICDM, 2009
Cited by 6 (0 self)
We propose a novel representative-based subgraph mining model. A series of standards and methods are proposed to select invariants. Patterns are mapped into invariant vectors in a multidimensional space. To find qualified patterns, only a subset of frequent patterns is generated as representatives, such that every frequent pattern is close to one of the representative patterns while representative patterns are distant from each other. We devise the RING algorithm, integrating the representative selection into the pattern mining process. Meanwhile, we use R-trees to assist this mining process. Last but not least, a large number of real and synthetic datasets are employed for the empirical study, which shows the benefits of the representative model and the efficiency of the RING algorithm.
Graph Sample and Hold: A Framework for Big-Graph Analytics
Cited by 5 (1 self)
Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g. web graphs, social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes to estimate certain graph properties (e.g. triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics,
Mining frequent graph patterns with differential privacy
In KDD 2013
Cited by 3 (0 self)
Discovering frequent graph patterns in a graph database offers valuable information in a variety of applications. However, if the graph dataset contains sensitive data of individuals such as mobile phone-call graphs and web-click graphs, releasing discovered frequent patterns may present a threat to the privacy of individuals. Differential privacy has recently emerged as the de facto standard for private data analysis due to its provable privacy guarantee. In this paper we propose the first differentially private algorithm for mining frequent graph patterns. We first show that previous techniques on differentially private discovery of frequent itemsets cannot apply in mining frequent graph patterns due to the inherent complexity of handling structural information in graphs. We then address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling based algorithm. Unlike previous work on frequent itemset mining, our techniques do not rely on the output of a non-private mining algorithm. Instead, we observe that both frequent graph pattern mining and the guarantee of differential privacy can be unified into an MCMC sampling framework. In addition, we establish the privacy and utility guarantee of our algorithm and propose an efficient neighboring pattern counting technique as well. Experimental results show that the proposed algorithm is able to output frequent patterns with good precision.
Approximate Graph Mining with Label Costs
Cited by 2 (1 self)
Many real-world graphs have complex labels on the nodes and edges. Mining only exact patterns yields limited insights, since it may be hard to find exact matches. However, in many domains it is relatively easy to define a cost (or distance) between different labels. Using this information, it becomes possible to mine a much richer set of approximate subgraph patterns, which preserve the topology but allow bounded label mismatches. We present novel and scalable methods to efficiently solve the approximate isomorphism problem. We show that approximate mining yields interesting patterns in several real-world graphs ranging from IT and protein interaction networks to protein structures.