Results 11-20 of 83
A hybrid space-filling and force-directed layout method for visualizing multiple-category graphs
In IEEE Pacific Visualization, 2009
Abstract

Cited by 13 (2 self)
Many graphs used in real-world applications consist of nodes belonging to more than one category. We call such graphs “multiple-category graphs”. Social networks are typical examples of multiple-category graphs: nodes are persons, links are friendships, and categories are communities that the persons belong to. It is often helpful to visualize both the connectivity and the categories of such graphs simultaneously. In this paper, we present a new visualization technique for multiple-category graphs. The technique first constructs hierarchical clusters of the nodes based on both connectivity and categories. It then places the nodes using a new hybrid space-filling and force-directed layout algorithm to clearly display both connectivity and category information. We show layout results using our hybrid method, compare it with other methods, and present a case study using an active biological network dataset.
PICS: Parameter-free Identification of Cohesive Subgroups in Large Attributed Graphs
Abstract

Cited by 10 (5 self)
Given a graph with node attributes, how can we find meaningful patterns such as clusters, bridges, and outliers? Attributed graphs appear in the real world in the form of social networks with user interests, gene interaction networks with gene expression information, phone call networks with customer demographics, and many others. In effect, we want to group the nodes into clusters with similar connectivity and homogeneous attributes. Most existing graph clustering algorithms either consider only the connectivity structure of the graph and ignore the node attributes, or require several user-defined parameters such as the number of clusters. We propose PICS, a novel, parameter-free method for mining attributed graphs. Two key advantages of our method are that (1) it requires no user-specified parameters such as the number of clusters and similarity functions, and (2) its running time scales linearly with total graph and attribute size. Our experiments show that PICS reveals meaningful and insightful patterns and outliers in both synthetic and real datasets, including call networks, political books, political blogs, and collections from Twitter and YouTube which have more than 70K nodes and 30K attributes.
Top-K aggregation queries over large networks
In ICDE, 2010
Abstract

Cited by 10 (1 self)
Searching and mining large graphs today is critical to a variety of application domains, ranging from personalized recommendation in social networks, to searches for functional associations in biological pathways. In these domains, there is a need to perform aggregation operations on large-scale networks. Unfortunately, the existing implementation of aggregation operations on relational databases does not guarantee superior performance in network space, especially when it involves edge traversals and joins of gigantic tables. In this paper, we investigate the neighborhood aggregation queries: Find nodes that have top-k highest aggregate values over their h-hop neighbors. While these basic queries are common in a wide range of search and recommendation tasks, surprisingly
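To make the query in the abstract above concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it is the naive baseline that runs one breadth-first search per node. The function names, the choice of SUM as the aggregate, and the decision to exclude a node's own score from its aggregate are all assumptions made for illustration.

```python
import heapq
from collections import deque


def h_hop_neighbors(adj, source, h):
    """Return the nodes within h hops of source, excluding source (assumption)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == h:
            continue  # do not expand beyond h hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    seen.discard(source)
    return seen


def top_k_aggregate(adj, score, k, h):
    """Naive baseline: SUM of scores over h-hop neighbors, then keep the k largest."""
    totals = {u: sum(score[v] for v in h_hop_neighbors(adj, u, h)) for u in adj}
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical toy graph: the path a - b - c - d with one score per node.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
score = {"a": 1, "b": 2, "c": 3, "d": 4}
print(top_k_aggregate(adj, score, k=2, h=1))
```

This baseline costs one traversal per node, which is the kind of expense that motivates more efficient query processing on large networks.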
Efficient Community Detection in Large Networks using Content and Links
Abstract

Cited by 9 (0 self)
In this paper we discuss a very simple approach of combining content and link information in graph structures for the purpose of community discovery, a fundamental task in network analysis. Our approach hinges on the basic intuition that many networks contain noise in the link structure and that content information can help strengthen the community signal. This enables one to eliminate the impact of noise (false positives and false negatives), which is particularly prevalent in online social networks and Web-scale information networks. Specifically, we introduce a measure of signal strength between two nodes in the network by fusing their link strength with content similarity. Link strength is estimated based on whether the link is likely (with high probability) to reside within a community. Content similarity is estimated through cosine similarity or Jaccard coefficient. We discuss a simple mechanism for fusing content and link similarity. We then present a biased edge sampling procedure which retains edges that are locally relevant for each graph node. The resulting backbone graph can be clustered using standard community discovery algorithms such as Metis and Markov clustering. Through extensive experiments on multiple real-world datasets (Flickr, Wikipedia and CiteSeer) with varying sizes and characteristics, we demonstrate the effectiveness and efficiency of our methods over state-of-the-art learning and mining approaches, several of which also attempt to combine link and content analysis for the purposes of community discovery. Specifically, we always find a qualitative benefit when combining content with link analysis. Additionally, our biased graph sampling approach realizes a quantitative benefit in that it is typically several orders of magnitude faster than competing approaches.
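The two content-similarity measures named in the abstract above are standard and can be sketched in a few lines of Python. The abstract does not state the fusion mechanism, so `fused_strength` below is a purely hypothetical convex combination with an assumed weight `alpha`; it is included only to show where a fusion step would sit, not to reproduce the paper's method.

```python
import math


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_strength(link_strength, content_sim, alpha=0.5):
    """Hypothetical fusion (assumption): weighted average of the two signals."""
    return alpha * link_strength + (1 - alpha) * content_sim


# Hypothetical content of two nodes, as tag sets and as term-weight vectors.
print(jaccard({"graph", "mining"}, {"mining", "web"}))
print(cosine({"graph": 1.0, "mining": 2.0}, {"mining": 1.0}))
```

Jaccard suits set-valued content such as tags, while cosine suits weighted term vectors; which one fits depends on how the node content is represented.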
Compression of Weighted Graphs
Abstract

Cited by 9 (1 self)
We propose to compress weighted graphs (networks), motivated by the observation that large networks of social, biological, or other relations can be complex to handle and visualize. In the process also known as graph simplification, nodes and (unweighted) edges are grouped to supernodes and superedges, respectively, to obtain a smaller graph. We propose models and algorithms for weighted graphs. The interpretation (i.e. decompression) of a compressed, weighted graph is that a pair of original nodes is connected by an edge if their supernodes are connected by one, and that the weight of an edge is approximated to be the weight of the superedge. The compression problem now consists of choosing supernodes, superedges, and superedge weights so that the approximation error is minimized while the amount of compression is maximized. In this paper, we formulate this task as the ‘simple weighted graph compression problem’. We then propose a much wider class of tasks under the name of ‘generalized weighted graph compression problem’. The generalized task extends the optimization to preserve longer-range connectivities between nodes, not just individual edge weights. We study the properties of these problems and propose a range of algorithms to solve them, with different balances between complexity and quality of the result. We evaluate the problems and algorithms experimentally on real networks. The results indicate that weighted graphs can be compressed efficiently with relatively little compression error.
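The decompression rule stated in the abstract above can be illustrated with a short Python sketch. The data layout (a node-to-supernode dict and superedges keyed by `frozenset`) and the squared-difference error measure are assumptions chosen for the example; the abstract does not specify how the approximation error is computed.

```python
from itertools import combinations


def decompress(supernode_of, superedges):
    """Rebuild edges: two nodes are linked iff their supernodes share a superedge.

    supernode_of maps node -> supernode id; superedges maps frozenset of one or
    two supernode ids -> weight (a single id stands for a self-loop superedge).
    """
    edges = {}
    for u, v in combinations(sorted(supernode_of), 2):
        key = frozenset((supernode_of[u], supernode_of[v]))
        if key in superedges:
            # Every reconstructed edge inherits the superedge weight.
            edges[frozenset((u, v))] = superedges[key]
    return edges


def squared_error(original, approx):
    """Assumed error measure: squared weight differences, missing edge = 0."""
    pairs = set(original) | set(approx)
    return sum((original.get(p, 0.0) - approx.get(p, 0.0)) ** 2 for p in pairs)


# Hypothetical example: nodes a and b merged into supernode A, c alone in C.
original = {frozenset(("a", "c")): 1.0, frozenset(("b", "c")): 3.0}
supernode_of = {"a": "A", "b": "A", "c": "C"}
superedges = {frozenset(("A", "C")): 2.0}

approx = decompress(supernode_of, superedges)
print(squared_error(original, approx))
```

In this toy case the superedge weight 2.0 is the mean of the two original weights, so both reconstructed edges are off by 1.0 each.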
Community Detection in Incomplete Information Networks
2012
Abstract

Cited by 8 (0 self)
With the recent advances in information networks, the problem of community detection has attracted much attention in the last decade. While network community detection has been ubiquitous, the task of collecting complete network data remains challenging in many real-world applications. Usually the collected network is incomplete with most of the edges missing. Commonly, in such networks, all nodes with attributes are available while only the edges within a few local regions of the network can be observed. In this paper, we study the problem of detecting communities in incomplete information networks with missing edges. We first learn a distance metric to reproduce the link-based distance between nodes from the observed edges in the local information regions. We then use the learned distance metric to estimate the distance between any pair of nodes in the network. A hierarchical clustering approach is proposed to detect communities within the incomplete information networks. Empirical studies on real-world information networks demonstrate that our proposed method can effectively detect community structures within incomplete information networks.
An Experimental Comparison of Pregel-like Graph Processing Systems
Abstract

Cited by 8 (1 self)
The introduction of Google’s Pregel generated much interest in the field of large-scale graph data processing, inspiring the development of Pregel-like systems such as Apache Giraph, GPS, Mizan, and GraphLab, all of which have appeared in the past two years. To gain an understanding of how Pregel-like systems perform, we conduct a study to experimentally compare Giraph, GPS, Mizan, and GraphLab on equal ground by considering graph and algorithm agnostic optimizations and by using several metrics. The systems are compared with four different algorithms (PageRank, single source shortest path, weakly connected components, and distributed minimum spanning tree) on up to 128 Amazon EC2 machines. We find that the system optimizations present in Giraph and GraphLab allow them to perform well. Our evaluation also shows Giraph 1.0.0’s considerable improvement since Giraph 0.1 and identifies areas of improvement for all systems.
Vog: Summarizing and understanding large graphs
2014
Abstract

Cited by 8 (4 self)
How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the ‘importance’ of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a ‘vocabulary’ of subgraph-types that often occur in real graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary. We measure success in a well-founded way by means of the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph. Our contributions are threefold: (a) formulation: we provide a principled encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop VOG, an efficient method to minimize the description cost, and (c) applicability: we report experimental results on multi-million-edge real graphs, including Flickr and the Notre Dame web graph.
Mining graph patterns efficiently via randomized summaries
PVLDB
Abstract

Cited by 7 (0 self)
Graphs are prevalent in many domains such as bioinformatics, social networks, Web and cybersecurity. Graph pattern mining has become an important tool in the management and analysis of complexly structured data, where example applications include indexing, clustering and classification. Existing graph mining algorithms have achieved great success by exploiting various properties in the pattern space. Unfortunately, due to the fundamental role subgraph isomorphism plays in these methods, they may all run into trouble when the cost of enumerating a huge set of isomorphic embeddings blows up, especially in large graphs. The solution we propose for this problem resorts to reduction on the data space. For each graph, we build a summary of it and mine this shrunk graph instead. Compared to other data reduction techniques that either reduce the number of transactions or compress between transactions, this new framework, called Summarize-Mine, suggests a third path by compressing within transactions. Summarize-Mine is effective in cutting down the size of graphs, thus decreasing the embedding enumeration cost. However, compression might lose patterns at the same time. We address this issue by generating randomized summaries and repeating the process for multiple rounds, where the main idea is that true patterns are unlikely to be missed in all rounds. We provide strict probabilistic guarantees on pattern loss likelihood. Experiments on real malware trace data show that Summarize-Mine is very efficient and can find interesting malware fingerprints that were not revealed previously.
Social Influence Based Clustering of Heterogeneous Information Networks
Abstract

Cited by 7 (3 self)
Social networks continue to grow in size and the type of information hosted. We witness a growing interest in clustering a social network of people based on both their social relationships and their participation in activity-based information networks. In this paper, we present a social influence based clustering framework for analyzing heterogeneous information networks with three unique features. First, we introduce a novel social influence based vertex similarity metric in terms of both self-influence similarity and co-influence similarity. We compute self-influence and co-influence based similarity based on the social graph and its associated activity graphs and influence graphs, respectively. Second, we compute the combined social influence based similarity between each pair of vertices by unifying the self-similarity and multiple co-influence similarity scores through a weight function with an iterative update method. Third, we design an iterative learning algorithm, SI-Cluster, to dynamically refine the K clusters by continuously quantifying and adjusting the weights on self-influence similarity and on multiple co-influence similarity scores towards clustering convergence. To make SI-Cluster converge fast, we transform a sophisticated nonlinear fractional programming problem of multiple weights into a straightforward nonlinear parametric programming problem of a single variable. Our experimental results show that SI-Cluster not only achieves a better balance between self-influence and co-influence similarities but also scales extremely well for large graph clustering.