Results 1-10 of 77
Authority-based keyword search in databases
TODS
Cited by 220 (13 self)
The ObjectRank system applies authority-based ranking to keyword search in databases modeled as labeled graphs. Conceptually, authority originates at the nodes (objects) containing the keywords and flows to objects according to their semantic connections. Each node is ranked according to its authority with respect to the particular
Fast random walk with restart and its applications
In ICDM ’06: Proceedings of the 6th IEEE International Conference on Data Mining, 2006
Cited by 179 (19 self)
How closely related are two nodes in a graph? How to compute this score quickly, on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the “connection subgraphs”, personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic precomputation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) block-wise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in precomputation and storage cost, and they achieve up to 150x speed-up with 90%+ quality preservation.
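For orientation, the straightforward iterative form of RWR that this abstract contrasts itself with can be sketched as below. This is a minimal illustration, not the paper's fast method: the function name `rwr`, the dict-of-dicts graph layout, and the default restart probability are assumptions made for the example.

```python
def rwr(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart by plain power iteration (illustrative sketch).

    adj maps each node to {neighbor: edge_weight}; every neighbor must also
    be a key of adj. Returns the steady-state visiting probability of each
    node for a walker that jumps back to `seed` with probability `restart`.
    """
    scores = {n: 0.0 for n in adj}
    scores[seed] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in adj}
        for u, out in adj.items():
            total = sum(out.values())
            if total == 0:
                # Assumption: a node with no out-edges sends its mass to the seed.
                nxt[seed] += (1 - restart) * scores[u]
                continue
            for v, w in out.items():
                nxt[v] += (1 - restart) * scores[u] * w / total
        nxt[seed] += restart  # restart mass always returns to the seed
        diff = sum(abs(nxt[n] - scores[n]) for n in adj)
        scores = nxt
        if diff < tol:
            break
    return scores


# Undirected path a - b - c, queried from node "a".
path = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
scores = rwr(path, "a", restart=0.5)
```

Each query costs many passes over the edges, which is the slow response time the abstract mentions; the alternative of precomputing a full matrix inverse is what incurs the quadratic space.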
Supervised Random Walks: Predicting and Recommending Links in Social Networks
Cited by 147 (3 self)
Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions we are missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node- and edge-level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.
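To make the idea of an edge strength function concrete, the sketch below turns per-edge feature vectors into random-walk transition probabilities. It assumes a logistic strength function and hand-picked weights purely for illustration; the names `edge_strength` and `transition_probs` and the feature values are hypothetical, and the training step that would learn the weights is not shown.

```python
import math


def edge_strength(features, weights):
    """Logistic strength of one edge given its feature vector (assumed form)."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def transition_probs(out_edges, weights):
    """Normalize edge strengths into the walker's step probabilities.

    out_edges maps each neighbor to the feature vector of the edge to it.
    """
    strengths = {v: edge_strength(f, weights) for v, f in out_edges.items()}
    total = sum(strengths.values())
    return {v: s / total for v, s in strengths.items()}


# Hypothetical features per edge, e.g. [messages exchanged, shared friends].
out_edges = {"bob": [2.0, 1.0], "carol": [0.0, 0.0]}
probs = transition_probs(out_edges, weights=[1.0, 0.5])
```

With all weights at zero every edge gets the same strength and the walk is unbiased; nonzero weights steer the walker toward edges whose attributes score higher.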
Graph Clustering Based on Structural/Attribute Similarities
Cited by 99 (7 self)
The goal of graph clustering is to partition vertices in a large graph into different clusters based on various criteria such as vertex connectivity or neighborhood similarity. Graph clustering techniques are very useful for detecting densely connected groups in a large graph. Many existing graph clustering methods mainly focus on the topological structure for clustering, but largely ignore the vertex properties, which are often heterogeneous. In this paper, we propose a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure. Our method partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values. An effective method is proposed to automatically learn the degree of contributions of structural similarity and attribute similarity. Theoretical analysis is provided to show that SA-Cluster converges. Extensive experimental results demonstrate the effectiveness of SA-Cluster through comparison with the state-of-the-art graph clustering and summarization methods.
Fast best-effort pattern matching in large attributed graphs
In KDD, 2007
Cited by 53 (14 self)
We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person’s job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a ‘star’ query would be, “find a CEO who has strong interactions with a Manager, a Lawyer, and an Accountant, or another structure as close to that as possible”. Similarly, a ‘loop’ query could help spot a money laundering ring. Traditional SQL-based methods, as well as more recent graph indexing methods, will return no answer when an exact match does not exist. Our method can find exact, as well as near-matches, and it will present them to the user in our proposed ‘goodness’ order. For example, our method tolerates indirect paths between, say, the ‘CEO’ and the ‘Accountant’ of the above sample query, when direct paths do not exist. Its second feature is scalability. In general, if the query has n_q nodes and the data graph has n nodes, the problem needs polynomial time complexity O(n^{n_q}), which is prohibitive. Our G-Ray (“Graph X-Ray”) method finds high-quality subgraphs in time linear on the size of the data graph. Experimental results on the DBLP author-publication graph (with 356K nodes and 1.9M edges) illustrate both the effectiveness and scalability of our approach. The results agree with our intuition, and the speed is excellent. It takes 4 seconds on average for a 4-node query on the DBLP graph.
Fast Direction-Aware Proximity for Graph Mining
2007
Cited by 49 (9 self)
In this paper we study asymmetric proximity measures on directed graphs, which quantify the relationships between two nodes or two groups of nodes. The measures are useful in several graph mining tasks, including clustering, link prediction and connection subgraph discovery. Our proximity measure is based on the concept of escape probability. This way, we strive to summarize the multiple facets of nodes-proximity, while avoiding some of the pitfalls to which alternative proximity measures are susceptible. A unique feature of the measures is accounting for the underlying directional information. We put a special emphasis on computational efficiency, and develop fast solutions that are applicable in several settings. Our experimental study shows the usefulness of our proposed direction-aware proximity method for several applications, and that our algorithms achieve a significant speedup (up to 50,000x) over straightforward implementations.
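As background for the escape probability mentioned above, the sketch below computes the textbook quantity: the probability that a random walk leaving a source node reaches a target node before returning to the source. It is a naive fixed-point iteration meant only to show why the measure is asymmetric on directed graphs; it is not the paper's fast algorithm, and the function name and graph layout are assumptions for the example.

```python
def escape_probability(adj, src, dst, tol=1e-12, max_iter=10_000):
    """Chance that a walk leaving src hits dst before coming back to src.

    adj maps each node to {successor: edge_weight} on a directed graph.
    """
    def step_probs(u):
        total = sum(adj[u].values())
        if total == 0:
            return {}
        return {v: w / total for v, w in adj[u].items()}

    # hit[k]: probability that a walk currently at k reaches dst before src.
    hit = {k: 0.0 for k in adj}
    hit[dst] = 1.0
    for _ in range(max_iter):
        delta = 0.0
        for k in adj:
            if k == src or k == dst:
                continue  # boundary values stay fixed at 0 and 1
            new = sum(p * hit[t] for t, p in step_probs(k).items())
            delta = max(delta, abs(new - hit[k]))
            hit[k] = new
        if delta < tol:
            break
    # Take one step out of src, then ask whether dst is reached first.
    return sum(p * hit[t] for t, p in step_probs(src).items())


# Directed graph: a -> b, b -> a, b -> c, c -> a.
graph = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"a": 1.0}}
forward = escape_probability(graph, "a", "c")
backward = escape_probability(graph, "c", "a")
```

On this small graph the two directions give different values, which is the directional information a symmetric proximity score would discard.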
Proximity Tracking on Time-Evolving Bipartite Graphs
Cited by 35 (5 self)
Given an author-conference network that evolves over time, which are the conferences that a given author is most closely related with, and how do they change over time? Large time-evolving bipartite graphs appear in many settings, such as social networks, co-citations, market-basket analysis, and collaborative filtering. Our goal is to monitor (i) the centrality of an individual node (e.g., who are the most important authors?); and (ii) the proximity of two nodes or sets of nodes (e.g., who are the most important authors with respect to a particular conference?). Moreover, we want to do this efficiently and incrementally, and to provide “anytime” answers. We propose pTrack and cTrack, which are based on random walk with restart, and use powerful matrix tools. Experiments on real data show that our methods are effective and efficient: the mining results agree with intuition; and we achieve up to 15 to 176 times speed-up, without any quality loss.
Measuring and extracting proximity graphs in networks
In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Cited by 16 (0 self)
Measuring distance or some other form of proximity between objects is a standard data mining tool. Connection subgraphs were recently proposed as a way to demonstrate proximity between nodes in networks. We propose a new way of measuring and extracting proximity in networks called “cycle-free effective conductance” (CFEC). Importantly, the measured proximity is accompanied by a proximity subgraph, which allows assessing and understanding measured values. Our proximity calculation can handle more than two endpoints and directed edges, is statistically well-behaved, and produces an effectiveness score for the computed subgraphs. We provide an efficient algorithm to measure and extract proximity. Also, we report experimental results and show examples for four large network data sets: a telecommunications calling graph, the IMDB actors graph, an academic co-authorship network, and a movie recommendation system.
REX: Explaining Relationships between Entity Pairs
2011
Cited by 16 (0 self)
Knowledge bases of entities and relations (either constructed manually or automatically) are behind many real-world search engines, including those at Yahoo!, Microsoft, and Google. Those knowledge bases can be viewed as graphs with nodes representing entities and edges representing (primary) relationships, and various studies have been conducted on how to leverage them to answer entity-seeking queries. Meanwhile, in a complementary direction, analyses over the query logs have enabled researchers to identify entity pairs that are statistically correlated. Such entity relationships are then presented to search users through the “related searches” feature in modern search engines. However, entity relationships thus discovered can often be “puzzling” to the users because why the entities are connected is often indescribable. In this paper, we propose a novel problem called entity relationship explanation, which seeks to explain why a pair of entities are connected, and solve this challenging problem by integrating the above two complementary approaches, i.e., we leverage the knowledge base to “explain” the connections discovered between entity pairs. More specifically, we present REX, a system that takes a pair of entities in a given knowledge base as input and efficiently identifies a ranked list of relationship explanations. We formally define relationship explanations and analyze their desirable properties. Furthermore, we design and implement algorithms to efficiently enumerate and rank all relationship explanations based on multiple measures of “interestingness.” We perform extensive experiments over real web-scale data gathered from DBpedia and a commercial search engine, demonstrating the efficiency and scalability of REX. We also perform user studies to corroborate the effectiveness of explanations generated by REX.
Fast and Exact Top-k Search for Random Walk with Restart
Cited by 16 (0 self)
Graphs are fundamental data structures and have been employed for centuries to model real-world systems and phenomena. Random walk with restart (RWR) provides a good proximity score between two nodes in a graph, and it has been successfully used in many applications such as automatic image captioning, recommender systems, and link prediction. The goal of this work is to find nodes that have top-k highest proximities for a given node. Previous approaches to this problem find nodes efficiently at the expense of exactness. The main motivation of this paper is to answer, in the affirmative, the question, ‘Is it possible to improve the search time without sacrificing the exactness?’. Our solution, K-dash, is based on two ideas: (1) It computes the proximity of a selected node efficiently by sparse matrices, and (2) It skips unnecessary proximity computations when searching for the top-k nodes. Theoretical analyses show that K-dash guarantees result exactness. We perform comprehensive experiments to verify the efficiency of K-dash. The results show that K-dash can find top-k nodes significantly faster than the previous approaches while it guarantees exactness.