Relevance search in heterogeneous networks
 Proceedings of the 15th International Conf. on Extending Database Technology
, 2012
Cited by 12 (2 self)
Conventional research on similarity search focuses on measuring the similarity between objects of the same type. However, in many real-world applications, we need to measure the relatedness between objects of different types. For example, in automatic expert profiling, people are interested in finding the objects most relevant to an expert, where the objects can be of various types, such as research areas, conferences, and papers. With the surge of research on heterogeneous networks, measuring the relatedness of objects of different types has become increasingly important. In this paper, we study the relevance search problem in heterogeneous networks, where the task is to measure the relatedness of heterogeneous objects (including objects of the same type or of different types). We propose a novel measure, called HeteSim, with the following attributes: (1) a path-constrained measure: the relatedness of an object pair is defined based on the search path that connects the two objects by following a sequence of node types; (2) a uniform measure: it can measure the relatedness of objects of the same or different types in a uniform framework; (3) a semi-metric measure: HeteSim has some good properties (e.g., self-maximum and symmetry) that are crucial to many tasks. Empirical studies show that HeteSim can effectively evaluate the relatedness of heterogeneous objects. Moreover, in query and clustering tasks, it achieves better performance than conventional measures.
Measuring Tie Strength in Implicit Social Networks
Cited by 8 (1 self)
Given a set of people and a set of events attended by them, we address the problem of measuring connectedness or tie strength between each pair of persons. The underlying assumption is that attendance at mutual events gives an implicit social network between people. We take an axiomatic approach to this problem. Starting from a list of axioms that a measure of tie strength must satisfy, we characterize the functions that satisfy all the axioms. We then show that there is a range of tie-strength measures that satisfy this characterization. A measure of tie strength induces a ranking on the edges of the social network (and on the set of neighbors of every person). We show that for applications where the ranking, and not the absolute value of the tie strength, is what matters, the axioms are equivalent to a natural partial order. To settle on a particular measure, we must make a non-obvious decision about extending this partial order to a total order. This decision is best left to particular applications. We also classify existing tie-strength measures according to the axioms that they satisfy, and observe that none of the “self-referential” tie-strength measures satisfy the axioms. In our experiments, we demonstrate the efficacy of our approach, show the completeness and soundness of our axioms, and present the Kendall Tau Rank Correlation between various tie-strength measures.
Author Keywords: social networks, tie strength, axiomatic approach
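The abstract above compares the rankings induced by different tie-strength measures using Kendall Tau rank correlation. The following is a minimal Python sketch of that kind of comparison, not the paper's method: the two measures (`common_events` and `size_discounted`) are illustrative assumptions chosen for simplicity, and `kendall_tau` implements the plain tau-a variant, which needs at least two items.

```python
from itertools import combinations


def common_events(events, u, v):
    """Count the events attended by both u and v (illustrative measure)."""
    return sum(1 for attendees in events if u in attendees and v in attendees)


def size_discounted(events, u, v):
    """Sum 1/|event| over shared events, so small events count for more
    (illustrative measure, not taken from the paper)."""
    return sum(1 / len(a) for a in events if u in a and v in a)


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs.
    Tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical attendance data: each event is the set of people who attended.
events = [
    {"ann", "bob"},
    {"ann", "cat", "dan", "eve"},
    {"ann", "cat", "dan", "eve"},
]
pairs = list(combinations(sorted(set().union(*events)), 2))
scores_a = [common_events(events, u, v) for u, v in pairs]
scores_b = [size_discounted(events, u, v) for u, v in pairs]
tau = kendall_tau(scores_a, scores_b)
```

A value of `tau` near 1 would indicate that the two measures rank person pairs almost identically, which is the situation in which only the choice of total order, rather than the absolute scores, distinguishes them.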
More is Simpler: Effectively and Efficiently Assessing Node-Pair Similarities Based on Hyperlinks
Cited by 5 (2 self)
Similarity assessment is one of the core tasks in hyperlink analysis. Recently, with the proliferation of applications, e.g., web search and collaborative filtering, SimRank has been a well-studied measure of similarity between two nodes in a graph. It recursively follows the philosophy that “two nodes are similar if they are referenced (have incoming edges) from similar nodes”, which can be viewed as an aggregation of similarities based on incoming paths. Despite its popularity, SimRank has an undesirable property, i.e., “zero-similarity”: it only accommodates paths of equal length from a common “center” node. Thus, a large portion of other paths are fully ignored. This paper attempts to remedy this issue. (1) We propose and rigorously justify SimRank*, a revised version of SimRank, which resolves such counter-intuitive “zero-similarity” issues while inheriting the merits of the basic SimRank philosophy. (2) We show that the series form of SimRank* can be reduced to a fairly succinct and elegant closed form, which looks even simpler than SimRank, yet enriches semantics without suffering from increased computational cost. This leads to a fixed-point iterative paradigm of SimRank* in O(Knm) time on a graph of n nodes and m edges for K iterations, which is comparable to SimRank. (3) To further optimize SimRank* computation, we leverage a novel clustering strategy via edge concentration. Due to its NP-hardness, we devise an efficient and effective heuristic to speed up SimRank* computation to O(Knm̃) time, where m̃ is generally much smaller than m. (4) Using real and synthetic data, we empirically verify the rich semantics of SimRank*, and demonstrate its high computation efficiency.
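For orientation, the basic SimRank recursion quoted in the abstract above can be sketched in a few lines of Python. This is a minimal sketch of the original SimRank fixed-point iteration, not of SimRank*; the function name `simrank`, the decay factor `c=0.8`, and the iteration count are assumptions chosen for illustration.

```python
from itertools import product


def simrank(in_neighbors, c=0.8, iterations=10):
    """Basic SimRank by naive fixed-point iteration.

    in_neighbors maps every node to the list of nodes that link to it
    (every node must appear as a key, possibly with an empty list).
    """
    nodes = list(in_neighbors)
    # Start from the identity: each node is maximally similar to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, damped by c.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Two nodes referenced by the same node are similar.
siblings = simrank({"u": [], "a": ["u"], "b": ["u"]})

# On a chain u -> v -> w, v and w are reached from u by paths of unequal
# length, so basic SimRank assigns them zero similarity.
chain = simrank({"u": [], "v": ["u"], "w": ["v"]})
```

The chain example is a small instance of the “zero-similarity” behavior the abstract describes: `v` and `w` are directly linked, yet their score stays at zero because no common source reaches both along paths of equal length.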
Entity Role Discovery in Hierarchical Topical Communities
Cited by 2 (0 self)
People and social communities are often characterized by the topics and themes they are working on or communicating about. Discovering the roles played by different entities in these communities is of great interest in many real-world contexts in social network analysis. We are also often interested in discovering such roles at different levels of granularity. In this paper we study a new problem of mining entity roles in hierarchical topical communities. We first detect topical communities from the text component of a social or information network. Since we mine phrases from the network, and represent topical communities by ranked lists of mixed-length phrases, the communities have a good interpretation at multiple levels of the hierarchy. We are therefore able to discover topical roles of different types of entities in both large communities encompassing more general topics, and small, focused subcommunities. We demonstrate our method on a bibliographic information network dataset, which we use to discover the roles of authors and publication venues in the context of the hierarchical topical communities.
Efficient Partial-Pairs SimRank Search on Large Networks
Cited by 2 (1 self)
The assessment of node-to-node similarities based on graph topology arises in a myriad of applications, e.g., web search. SimRank is a notable measure of this type, with the intuition that “two nodes are similar if their in-neighbors are similar”. While most existing work on retrieving SimRank only considers all-pairs SimRank s(⋆, ⋆) and single-source SimRank s(⋆, j) (scores between every node and query j), there are appealing applications for partial-pairs SimRank, e.g., similarity join. Given two node subsets A and B in a graph, partial-pairs SimRank assessment aims to retrieve only {s(a, b)} for all a ∈ A and b ∈ B. However, the best-known solution is not self-contained, since it hinges on the premise that the SimRank scores of node-pairs in an h-go cover set must be given beforehand. This paper focuses on efficient assessment of partial-pairs SimRank in a self-contained manner. (1) We devise a novel “seed germination” model that computes partial-pairs SimRank in O(k|E| min{|A|, |B|}) time and O(|E| + k|V|) memory for k iterations on a graph of |V| nodes and |E| edges. (2) We further eliminate unnecessary edge access to improve the time of partial-pairs SimRank to O(m min{|A|, |B|}), where m ≤ min{k|E|, Δ^{2k}}, and Δ is the maximum degree. (3) We show that our partial-pairs SimRank model can also handle the computation of all-pairs and single-source SimRank. (4) We empirically verify that our algorithms are (a) 38x faster than the best-known competitors, and (b) memory-efficient, allowing scores to be assessed accurately on graphs with tens of millions of links.
Panther: Fast Top-k Similarity Search on Large Networks
, 2015
Cited by 1 (1 self)
Estimating similarity between vertices is a fundamental issue in network analysis across various domains, such as social networks and biological networks. Methods based on common neighbors and structural contexts have received much attention. However, both categories of methods are difficult to scale up to handle large networks (with billions of nodes). In this paper, we propose a sampling method that provably and accurately estimates the similarity between vertices. The algorithm is based on a novel idea of random path. Specifically, given a network, we perform R random walks, each starting from a randomly picked vertex and walking T steps. Theoretically, the algorithm guarantees that the sampling size R = O(2ε^{-2} log_2 T) depends on the error bound ε, the confidence level (1 − δ), and the path length T of each random walk. We perform an extensive empirical study on a Tencent microblogging network of 1,000,000,000 edges. We show that our algorithm can return top-k similar vertices for any vertex in a network 300× faster than the state-of-the-art methods. We also use two applications, identity resolution and structural hole spanner finding, to evaluate the accuracy of the estimated similarities. Our results demonstrate that the proposed algorithm achieves clearly better performance than several alternative methods.
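The random-path idea described in the abstract above (R random walks of T steps each, started from randomly picked vertices) can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the paper's algorithm: it assumes an unweighted graph given as adjacency lists, estimates the similarity of two vertices as the fraction of sampled paths that contain both, and omits the top-k indexing and the sample-size bound. The function name `random_path_similarity` and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def random_path_similarity(adj, num_paths, path_length, seed=0):
    """Estimate pairwise similarity from sampled random paths.

    adj maps each vertex to a list of its neighbors.
    Returns {(u, v): fraction of sampled paths containing both u and v},
    with each pair stored once in sorted order.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    nodes = sorted(adj)
    counts = defaultdict(int)
    for _ in range(num_paths):
        current = rng.choice(nodes)  # random start vertex
        visited = {current}
        for _ in range(path_length):
            neighbors = adj[current]
            if not neighbors:
                break  # dead end: stop this walk early
            current = rng.choice(neighbors)
            visited.add(current)
        # Every pair of vertices on the same path co-occurs once.
        for u, v in combinations(sorted(visited), 2):
            counts[(u, v)] += 1
    return {pair: n / num_paths for pair, n in counts.items()}


# Two disconnected edges: a-b and c-d.
adj = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
sims = random_path_similarity(adj, num_paths=200, path_length=3)
```

In this toy graph every sampled path stays inside one of the two components, so `("a", "b")` and `("c", "d")` receive all of the co-occurrence mass, and pairs that span components never appear in `sims`.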
Automatic Weight Generation and Class Predicate Stability in RDF Summary Graphs
In this study, we use graph localities and neighborhood similarity to enhance the summary graph generation approach for building a summary graph structure for intelligent exploration of semantic data. The key improvements over what we previously proposed include the addition of a string similarity measure for literal neighbors, the development of a stability measure to evaluate the accuracy of class relations, the addition of auto-generated property weights, and the detection of noise properties.
Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks
Mining outliers in a heterogeneous information network is a challenging problem: it is unclear even what should count as an outlier in a large heterogeneous network (e.g., outliers in an entire bibliographic network consisting of authors, titles, papers, and venues). In this study, we propose an interesting class of outliers, query-based subnetwork outliers: given a heterogeneous network, a user raises a query to retrieve a set of task-relevant subnetworks, among which subnetwork outliers are those that significantly deviate from the others (e.g., outliers among author groups studying “topic modeling”). We formalize this problem and propose a general framework in which one can query for subnetwork outliers with respect to different semantics. We introduce the notion of subnetwork similarity, which captures the proximity between two subnetworks by their membership distributions. We propose an outlier detection algorithm that ranks all the subnetworks according to their outlierness without tuning parameters. Our quantitative and qualitative experiments on both synthetic and real data sets show that the proposed method outperforms other baselines.
NED: An Inter-Graph Node Metric Based On Edit Distance
Node similarity is fundamental in graph analytics. However, similarity between nodes in different graphs (inter-graph nodes) has not yet received enough attention. Inter-graph node similarity is important in learning a new graph based on the knowledge extracted from an existing graph (transfer learning on graphs) and has applications in biological, communication, and social networks. In this paper, we propose a novel distance function for measuring inter-graph node similarity with edit distance, called NED. In NED, two nodes are compared according to their local neighborhood topologies, which are represented as unordered k-adjacent trees, without relying on any extra information. Because computing tree edit distance on unordered trees is NP-complete, we propose a modified tree edit distance, called TED*, for comparing unordered and unlabeled k-adjacent trees. TED* is a metric distance, like the original tree edit distance, but more importantly, TED* is polynomially computable. As a metric distance, NED admits efficient indexing, provides interpretable results, and is shown to perform better than existing approaches on a number of data analysis tasks, including graph de-anonymization. Finally, the efficiency and effectiveness of NED are empirically demonstrated using real-world graphs.