Results 1  10
of
21
Graph Cube: On Warehousing and OLAP Multidimensional Networks
"... We consider extending decision support facilities toward large sophisticated networks, upon which multidimensional attributes are associated with network entities, thereby forming the socalled multidimensional networks. Data warehouses and OLAP (Online Analytical Processing) technology have proven ..."
Abstract

Cited by 17 (2 self)
 Add to MetaCart
(Show Context)
We consider extending decision support facilities toward large sophisticated networks, upon which multidimensional attributes are associated with network entities, thereby forming the socalled multidimensional networks. Data warehouses and OLAP (Online Analytical Processing) technology have proven to be effective tools for decision support on relational data. However, they are not wellequipped to handle the new yet important multidimensional networks. In this paper, we introduce Graph Cube, a new data warehousing model that supports OLAP queries effectively on large multidimensional networks. By taking account of both attribute aggregation and structure summarization of the networks, Graph Cube goes beyond the traditional data cube model involved solely with numeric value based groupby’s, thus resulting in a more insightful and structureenriched aggregate network within every possible multidimensional space. Besides traditional cuboid queries, a new class of OLAP queries, crossboid, is introduced that is uniquely useful in multidimensional networks and has not been studied before. We implement Graph Cube by combining special characteristics of multidimensional networks with the existing wellstudied data cube techniques. We perform extensive experimental studies on a series of real world data sets and Graph Cube is shown to be a powerful and efficient tool for decision support on large multidimensional networks.
Compression of Weighted Graphs
"... We propose to compress weighted graphs (networks), motivated by the observation that large networks of social, biological, or other relations can be complex to handle and visualize. In the process also known as graph simplification, nodes and (unweighted) edges are grouped to supernodes and superedg ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
(Show Context)
We propose to compress weighted graphs (networks), motivated by the observation that large networks of social, biological, or other relations can be complex to handle and visualize. In the process also known as graph simplification, nodes and (unweighted) edges are grouped to supernodes and superedges, respectively, to obtain a smaller graph. We propose models and algorithms for weighted graphs. The interpretation (i.e. decompression) of a compressed, weighted graph is that a pair of original nodes is connected by an edge if their supernodes are connected by one, and that the weight of an edge is approximated to be the weight of the superedge. The compression problem now consists of choosing supernodes, superedges, and superedge weights so that the approximation error is minimized while the amount of compression is maximized. In this paper, we formulate this task as the ’simple weighted graph compression problem’. We then propose a much wider class of tasks under the name of ’generalized weighted graph compression problem’. The generalized task extends the optimization to preserve longerrange connectivities between nodes, not just individual edge weights. We study the properties of these problems and propose a range of algorithms to solve them, with different balances between complexity and quality of the result. We evaluate the problems and algorithms experimentally on real networks. The results indicate that weighted graphs can be compressed efficiently with relatively little compression error.
Vog: Summarizing and understanding large graphs
, 2014
"... How can we succinctly describe a millionnode graph with a few simple sentences? How can we measure the ‘importance’ of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a ‘vocabulary ’ of subgraphtypes that often occur in re ..."
Abstract

Cited by 8 (4 self)
 Add to MetaCart
(Show Context)
How can we succinctly describe a millionnode graph with a few simple sentences? How can we measure the ‘importance’ of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a ‘vocabulary ’ of subgraphtypes that often occur in real graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary. We measure success in a wellfounded way by means of the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph. Our contributions are threefold: (a) formulation: we provide a principled encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop VOG, an efficient method to minimize the description cost, and (c) applicability: we report experimental results on multimillionedge real graphs, including Flickr and the Notre Dame web graph. 1
Social Influence Based Clustering of Heterogeneous Information Networks
"... Social networks continue to grow in size and the type of information hosted. We witness a growing interest in clustering a social network of people based on both their social relationships and their participations in activity based information networks. In this paper, we present a social influence ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
(Show Context)
Social networks continue to grow in size and the type of information hosted. We witness a growing interest in clustering a social network of people based on both their social relationships and their participations in activity based information networks. In this paper, we present a social influence based clustering framework for analyzing heterogeneous information networks with three unique features. First, we introduce a novel social influence based vertex similarity metric in terms of both selfinfluence similarity and coinfluence similarity. We compute selfinfluence and coinfluence based similarity based on social graph and its associated activity graphs and influence graphs respectively. Second, we compute the combined social influence based similarity between each pair of vertices by unifying the selfsimilarity and multiple coinfluence similarity scores through a weight function with an iterative update method. Third, we design an iterative learning algorithm, SICluster, to dynamically refine the K clusters by continuously quantifying and adjusting the weights on selfinfluence similarity and on multiple coinfluence similarity scores towards the clustering convergence. To make SICluster converge fast, we transformed a sophisticated nonlinear fractional programming problem of multiple weights into a straightforward nonlinear parametric programming problem of single variable. Our experiment results show that SICluster not only achieves a better balance between selfinfluence and coinfluence similarities but also scales extremely well for large graph clustering.
Graph Summarization in Annotated Data Using Probabilistic Soft Logic
"... Abstract. Annotation graphs, made available through the Linked Data initiative and Semantic Web, have significant scientific value. However, their increasing complexity makes it difficult to fully exploit this value. Graph summaries, which group similar entities and relations for a more abstract vie ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
(Show Context)
Abstract. Annotation graphs, made available through the Linked Data initiative and Semantic Web, have significant scientific value. However, their increasing complexity makes it difficult to fully exploit this value. Graph summaries, which group similar entities and relations for a more abstract view on the data, can help alleviate this problem, but new methods for graph summarization are needed that handle uncertainty present within and across these sources. Here, we propose the use of probabilistic soft logic (PSL) [1] as a general framework for reasoning about annotation graphs, similarities, and the possibly confounding evidence arising from these. We show preliminary results using two simple graph summarization heuristics in PSL for a plant biology domain. 1
Summarizing Answer Graphs Induced by Keyword Queries
"... Keyword search has been popularly used to query graph data. Due to the lack of structure support, a keyword query might generate an excessive number of matches, referred to as “answer graphs”, that could include different relationships among keywords. An ignored yet important task is to group and su ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
Keyword search has been popularly used to query graph data. Due to the lack of structure support, a keyword query might generate an excessive number of matches, referred to as “answer graphs”, that could include different relationships among keywords. An ignored yet important task is to group and summarize answer graphs that share similar structures and contents for better query interpretation and result understanding. This paper studies the summarization problem for the answer graphs induced by a keyword query Q. (1) A notion of summary graph is proposed to characterize the summarization of answer graphs. Given Q and a set of answer graphs G, a summary graph preserves the relation
Large Graph Analysis in the GMine System
"... Current applications have produced graphs on the order of hundreds of thousands of nodes and millions of edges. To take advantage of such graphs, one must be able to find patterns, outliers and communities. These tasks are better performed in an interactive environment, where human expertise can gu ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
(Show Context)
Current applications have produced graphs on the order of hundreds of thousands of nodes and millions of edges. To take advantage of such graphs, one must be able to find patterns, outliers and communities. These tasks are better performed in an interactive environment, where human expertise can guide the process. For large graphs, though, there are some challenges: the excessive processing requirements are prohibitive, and drawing hundredthousand nodes results in cluttered images hard to comprehend. To cope with these problems, we propose an innovative framework suited for any kind of treelike graph visual design. GMine integrates (a) a representation for graphs organized as hierarchies of partitions the concepts of SuperGraph and GraphTree; and (b) a graph summarization methodology CEPS. Our graph representation deals with the problem of tracing the connection aspects of a graph hierarchy with sub linear complexity, allowing one to grasp the neighborhood of a single node or of a group of nodes in a single click. As a proof of concept, the visual environment of GMine is instantiated as a system in which large graphs can be investigated globally and locally.
Entity Summarisation with Limited Edge Budget on Knowledge Graphs
"... Abstract—We formulate a novel problem of summarising entities with limited presentation budget on entityrelationship knowledge graphs and propose an efficient algorithm for solving this problem. The algorithm has been implemented together with a visualising tool. Experimental user evaluation of the ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
Abstract—We formulate a novel problem of summarising entities with limited presentation budget on entityrelationship knowledge graphs and propose an efficient algorithm for solving this problem. The algorithm has been implemented together with a visualising tool. Experimental user evaluation of the algorithm wasconductedonreallarge semanticknowledgegraphsextracted from the web. The reported results of experimental user evaluation are promising and encourage to continue the work on improving the algorithm. I.
Distributed Graph Summarization
"... Graph has been a ubiquitous and essential data representation to model real world objects and their relationships. Today, large amounts of graph data have been generated by various applications. Graph summarization techniques are crucial in uncovering useful insights about the patterns hidden in the ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
Graph has been a ubiquitous and essential data representation to model real world objects and their relationships. Today, large amounts of graph data have been generated by various applications. Graph summarization techniques are crucial in uncovering useful insights about the patterns hidden in the underlying data. However, all existing works in graph summarization are singleprocess solutions, and as a result cannot scale to large graphs. In this paper, we introduce three distributed graph summarization algorithms to address this problem. Experimental results show that the proposed algorithms can produce good quality summaries and scale well with increasing data sizes. To the best of our knowledge, this is the first work to study distributed graph summarization methods. 1.
Exploiting Vertex Relationships in Speeding up Subgraph Isomorphism over Large Graphs
"... Subgraph Isomorphism is a fundamental problem in graph data processing. Most existing subgraph isomorphism algorithms are based on a backtracking framework which computes the solutions by incrementally matching all query vertices to candidate data vertices. However, we observe that extensive dupl ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
Subgraph Isomorphism is a fundamental problem in graph data processing. Most existing subgraph isomorphism algorithms are based on a backtracking framework which computes the solutions by incrementally matching all query vertices to candidate data vertices. However, we observe that extensive duplicate computation exists in these algorithms, and such duplicate computation can be avoided by exploiting relationships between data vertices. Motivated by this, we propose a novel approach, BoostIso, to reduce duplicate computation. Our extensive experiments with real datasets show that, after integrating our approach, most existing subgraph isomorphism algorithms can be speeded up significantly, especially for some graphs with intensive vertex relationships, where the improvement can be up to several orders of magnitude.