Results 1 - 10 of 30
CollabSeer: A Search Engine for Collaboration Discovery
2011
Cited by 30 (14 self)
Collaborative research has been increasingly popular and important in academic circles. However, there is no open platform available for scholars or scientists to effectively discover potential collaborators. This paper discusses CollabSeer, an open system to recommend potential research collaborators for scholars and scientists. CollabSeer discovers collaborators based on the structure of the coauthor network and a user’s research interests. Currently, three different network structure analysis methods that use vertex similarity are supported in CollabSeer: Jaccard similarity, cosine similarity, and our relation strength similarity measure. Users can also request a recommendation by selecting a topic of interest. The topic of interest list is determined by CollabSeer’s lexical analysis module, which analyzes the key phrases of previous publications. The CollabSeer system is highly modularized, making it easy to add or replace the network analysis module or the users’ topic of interest analysis module. CollabSeer integrates the results of the two modules to recommend collaborators to users. Initial experimental results over a subset of the CiteSeerX database show that CollabSeer can efficiently discover prospective collaborators.
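The two standard vertex similarity measures named in this abstract can be sketched on a toy coauthor network. This is only an illustration, not CollabSeer's code: the author names, the `coauthors` dictionary, and both function names are hypothetical, and the cosine measure is assumed to be the usual neighbor-set form (shared coauthors divided by the geometric mean of the two degrees). The relation strength measure is specific to the paper and is not reproduced here.

```python
from math import sqrt


def jaccard_similarity(neighbors, u, v):
    """Shared coauthors divided by the size of the combined coauthor set."""
    nu, nv = neighbors[u], neighbors[v]
    union = nu | nv
    if not union:
        return 0.0
    return len(nu & nv) / len(union)


def cosine_similarity(neighbors, u, v):
    """Shared coauthors divided by sqrt(deg(u) * deg(v))."""
    nu, nv = neighbors[u], neighbors[v]
    if not nu or not nv:
        return 0.0
    return len(nu & nv) / sqrt(len(nu) * len(nv))


# Hypothetical coauthor network: each author maps to the set of coauthors.
coauthors = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
    "erin": set(),
}

# bob and dave have never coauthored, but both have worked with alice.
bob_dave_jaccard = jaccard_similarity(coauthors, "bob", "dave")
bob_dave_cosine = cosine_similarity(coauthors, "bob", "dave")
```

Both scores depend only on the coauthor graph, which is why such measures can be computed without any knowledge of the authors' topics.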
Parallel SimRank computation on large graphs with iterative aggregation
KDD'10, 2010
Cited by 18 (2 self)
Recently there has been a lot of interest in graph-based analysis. One of the most important aspects of graph-based analysis is to measure similarity between nodes in a graph. SimRank is a simple and influential measure of this kind, based on a solid graph-theoretical model. However, existing methods on SimRank computation suffer from two limitations: 1) the computing cost can be very high in practice; and 2) they can only be applied on static graphs. In this paper, we exploit the inherent parallelism and high memory bandwidth of graphics processing units (GPU) to accelerate the computation of SimRank on large graphs. Furthermore, based on the observation that SimRank is essentially a first-order Markov Chain, we propose to utilize the iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs. The iterative aggregation method can be applied on dynamic graphs. Moreover, it can handle not only the link-updating problem but also the node-updating problem. Extensive experiments on synthetic and real data sets verify that the proposed methods are efficient and effective.
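For orientation, the basic SimRank recurrence that such work accelerates can be sketched as a naive fixed-point iteration. This is the textbook all-pairs update, not the GPU or iterative aggregation method of the paper; the function name, the decay value 0.8, the iteration count, and the three-node graph are assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive SimRank: repeatedly average the scores of in-neighbor pairs."""
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    scores = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        updated = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                updated[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar to no other node.
                updated[(a, b)] = 0.0
                continue
            total = sum(scores[(i, j)] for i in ia for j in ib)
            updated[(a, b)] = decay * total / (len(ia) * len(ib))
        scores = updated
    return scores


# Hypothetical graph: "hub" links to both "x" and "y".
in_neighbors = {"hub": [], "x": ["hub"], "y": ["hub"]}
scores = simrank(in_neighbors)
```

Every iteration touches all node pairs, which illustrates why the computing cost grows quickly with graph size. Within one iteration each pair is updated only from the previous iteration's scores, so the pair updates are independent of one another.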
Relevance search in heterogeneous networks
Proceedings of the 15th International Conf. on Extending Database Technology, 2012
Cited by 12 (2 self)
Conventional research on similarity search focuses on measuring the similarity between objects with the same type. However, in many real-world applications, we need to measure the relatedness between objects with different types. For example, in automatic expert profiling, people are interested in finding the most relevant objects to an expert, where the objects can be of various types, such as research areas, conferences and papers, etc. With the surge of study on heterogeneous networks, the relatedness measure on objects with different types becomes increasingly important. In this paper, we study the relevance search problem in heterogeneous networks, where the task is to measure the relatedness of heterogeneous objects (including objects with the same type or different types). We propose a novel measure, called HeteSim, with the following attributes: (1) a path-constrained measure: the relatedness of object pairs is defined based on the search path that connects two objects through following a sequence of node types; (2) a uniform measure: it can measure the relatedness of objects with the same or different types in a uniform framework; (3) a semi-metric measure: HeteSim has some good properties (e.g., self-maximum and symmetric) that are crucial to many tasks. Empirical studies show that HeteSim can effectively evaluate the relatedness of heterogeneous objects. Moreover, in the query and clustering tasks, it can achieve better performances than conventional measures.
Recommendation of similar users, resources and social networks in a Social Internetworking Scenario
Information Sciences
Cited by 11 (2 self)
In this paper we propose an approach to recommend to a user similar users, resources and social networks in a Social Internetworking Scenario. Our approach presents some interesting novelties with respect to the existing ones. First of all, it operates on a Social Internetworking context and not on a single social network. Moreover, it considers not only explicit relationships among users but also the implicit ones, connecting users on the basis of shared interests and behavior; the latter is derived from the analysis of user actions in the considered Social Internetworking Scenario. In addition, it considers the presence of possible semantic anomalies involving the description of available users, resources and social networks. Finally, it takes into account not only the local information regarding involved users, resources and social networks but also the global one, i.e.,
Fast Random Walk Graph Kernel
Cited by 6 (2 self)
Random walk graph kernel has been used as an important tool for various data mining tasks including classification and similarity computation. Despite its usefulness, however, it suffers from the expensive computational cost which is at least O(n^3) or O(m^2) for graphs with n nodes and m edges. In this paper, we propose Ark, a set of fast algorithms for random walk graph kernel computation. Ark is based on the observation that real graphs have much lower intrinsic ranks, compared with the orders of the graphs. Ark exploits the low rank structure to quickly compute random walk graph kernels in O(n^2) or O(m) time. Experimental results show that our method is up to 97,865× faster than the existing algorithms, while providing more than 91.3% of the accuracies.
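To make the cost concrete, a random walk graph kernel can be sketched directly from its common textbook definition: count walks on the direct product of the two graphs, weighting walks of length k by decay^k. This sketch is not the Ark algorithm and makes no attempt to reach the complexities quoted above. It assumes unlabeled graphs given as dense 0/1 adjacency matrices, uniform starting and stopping distributions, and a truncated series; the function name and all parameters are hypothetical.

```python
def random_walk_kernel(adj1, adj2, decay=0.1, steps=10):
    """Truncated geometric random walk kernel on the direct product graph."""
    n1, n2 = len(adj1), len(adj2)
    size = n1 * n2
    # Dense adjacency of the product graph (Kronecker product of adj1 and adj2).
    # Product node (i, j) is stored at index i * n2 + j.
    product_adj = [
        [adj1[i][k] * adj2[j][l] for k in range(n1) for l in range(n2)]
        for i in range(n1)
        for j in range(n2)
    ]
    start = [1.0 / size] * size  # uniform starting distribution (assumption)
    stop = [1.0 / size] * size   # uniform stopping distribution (assumption)
    walk = start[:]              # holds product_adj^k applied to start
    weight = 1.0                 # holds decay^k
    kernel = 0.0
    for _ in range(steps + 1):
        kernel += weight * sum(q * x for q, x in zip(stop, walk))
        walk = [sum(row[c] * walk[c] for c in range(size)) for row in product_adj]
        weight *= decay
    return kernel


# Two hypothetical graphs: a single undirected edge and a 3-node path.
edge = [[0, 1], [1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
edge_edge = random_walk_kernel(edge, edge, decay=0.5, steps=2)
edge_path = random_walk_kernel(edge, path)
```

The product graph has n1 * n2 nodes, so even storing its dense adjacency matrix is quadratic in that number, which is the kind of blow-up that low-rank approaches aim to avoid.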
More is Simpler: Effectively and Efficiently Assessing Node-Pair Similarities Based on Hyperlinks
Cited by 5 (2 self)
Similarity assessment is one of the core tasks in hyperlink analysis. Recently, with the proliferation of applications, e.g., web search and collaborative filtering, SimRank has been a well-studied measure of similarity between two nodes in a graph. It recursively follows the philosophy that “two nodes are similar if they are referenced (have incoming edges) from similar nodes”, which can be viewed as an aggregation of similarities based on incoming paths. Despite its popularity, SimRank has an undesirable property, i.e., “zero-similarity”: it only accommodates paths with equal length from a common “center” node. Thus, a large portion of other paths are fully ignored. This paper attempts to remedy this issue. (1) We propose and rigorously justify SimRank*, a revised version of SimRank, which resolves such counterintuitive “zero-similarity” issues while inheriting merits of the basic SimRank philosophy. (2) We show that the series form of SimRank* can be reduced to a fairly succinct and elegant closed form, which looks even simpler than SimRank, yet enriches semantics without suffering from increased computational cost. This leads to a fixed-point iterative paradigm of SimRank* in O(Knm) time on a graph of n nodes and m edges for K iterations, which is comparable to SimRank. (3) To further optimize SimRank* computation, we leverage a novel clustering strategy via edge concentration. Due to its NP-hardness, we devise an efficient and effective heuristic to speed up SimRank* computation to O(Knm̃) time, where m̃ is generally much smaller than m. (4) Using real and synthetic data, we empirically verify the rich semantics of SimRank*, and demonstrate its high computation efficiency.
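The “zero-similarity” property can be illustrated with a small reachability check. Under the standard SimRank recurrence, two distinct nodes receive a nonzero score only if some node reaches both of them by incoming paths of the same length. The sketch below tests exactly that condition by walking backwards over pairs of in-neighbors. It does not implement SimRank*, and the function name and the four-node graph are hypothetical.

```python
from collections import deque


def has_nonzero_simrank(in_neighbors, a, b):
    """True if some node reaches a and b via incoming paths of equal length."""
    if a == b:
        return True
    seen = {(a, b)}
    queue = deque([(a, b)])
    while queue:
        u, v = queue.popleft()
        # Step backwards along one incoming edge on each side at the same time.
        for i in in_neighbors[u]:
            for j in in_neighbors[v]:
                if i == j:
                    return True  # common source at equal distance from a and b
                if (i, j) not in seen:
                    seen.add((i, j))
                    queue.append((i, j))
    return False


# Hypothetical graph with edges c -> a, c -> x, x -> b.
in_neighbors = {"c": [], "x": ["c"], "a": ["c"], "b": ["x"]}

a_x_related = has_nonzero_simrank(in_neighbors, "a", "x")
a_b_related = has_nonzero_simrank(in_neighbors, "a", "b")
```

Here `a` and `x` are both one step from `c`, so the check succeeds. `a` and `b` also share the ancestor `c`, but at distances 1 and 2, so the check fails and plain SimRank would score the pair as zero.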
Taming computational complexity: Efficient and parallel SimRank optimizations on undirected graphs
In: WAIM (2010)
Cited by 5 (4 self)
SimRank has been considered as one of the promising link-based ranking algorithms to evaluate similarities of web documents in many modern search engines. In this paper, we investigate the optimization problem of SimRank similarity computation on undirected web graphs. We first present a novel algorithm to estimate the SimRank between vertices in O(n^3 + K·n^2) time, where n is the number of vertices, and K is the number of iterations. In comparison, the most efficient implementation of SimRank algorithm in [1] takes O(K·n^3) time in the worst case. To efficiently handle large-scale computations, we also propose a parallel implementation of the SimRank algorithm on multiple processors. The experimental evaluations on both synthetic and real-life data sets demonstrate the better computational time and parallel efficiency of our proposed techniques.
Efficient SimRank-based Similarity Join Over Large Graphs
2013
Cited by 4 (0 self)
Graphs have been widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs is meaningful and interesting, which can benefit friend recommendation in social networks and link prediction, etc. In this paper, we adopt “SimRank” to evaluate the similarity of two vertices in a large graph because of its generality. Note that “SimRank” is purely structure dependent and it does not rely on the domain knowledge. Specifically, we define a SimRank-based join (SRJ) query to find all the vertex pairs satisfying the threshold in a data graph G. In order to reduce the search space, we propose an estimated shortest-path-distance-based upper bound for SimRank scores to prune unpromising vertex pairs. In the verification, we propose a novel index, called h-go cover, to efficiently compute the SimRank score of a single vertex pair. Given a graph G, we only materialize the SimRank scores of a small proportion of vertex pairs (called h-go covers), based on which, the SimRank score of any vertex pair can be computed easily. In order to handle large graphs, we extend our technique to the partition-based framework. Thorough theoretical analysis and extensive experiments over both real and synthetic datasets confirm the efficiency and effectiveness of our solution.
Emerging Graph Queries In Linked Data
Cited by 4 (1 self)
In a wide array of disciplines, data can be modeled as an interconnected network of entities, where various attributes could be associated with both the entities and the relations among them. Knowledge is often hidden in the complex structure and attributes inside these networks. While querying and mining these linked datasets are essential for various applications, traditional graph queries may not be able to capture the rich semantics in these networks. With the advent of complex information networks, new graph queries are emerging, including graph pattern matching and mining, similarity search, ranking and expert finding, graph aggregation and OLAP. These queries require both the topology and content information of the network data, and hence differ from classical graph algorithms such as shortest path, reachability and minimum cut, which depend only on the structure of the network. In this tutorial, we shall give an introduction of the emerging graph queries, their indexing and resolution techniques, the current challenges and the future research directions.