Sig-SR: SimRank search over singular graphs.
- In Proceedings of the 37th ACM SIGIR International Conference on Research & Development in Information Retrieval (SIGIR 2014), 2014
Cited by 3 (2 self)
SimRank is an attractive structural-context measure of similarity between two objects in a graph. It recursively follows the intuition that "two objects are similar if they are referenced by similar objects". The best known matrix-based method [1] for calculating SimRank, however, implies an assumption that the graph is non-singular, i.e., its adjacency matrix is invertible. In reality, non-singular graphs are very rare; such an assumption in [1] is too restrictive in practice. In this paper, we provide a treatment of [1], by supporting similarity assessment on non-invertible adjacency matrices. Assume that a singular graph G has n nodes, with r (< n) being the rank of its adjacency matrix. (1) We show that SimRank matrix S on G has an elegant structure: S can be represented as a rank r matrix plus a scaled identity matrix. (2) By virtue of this, an efficient algorithm over singular graphs, Sig-SR, is proposed for calculating all-pairs SimRank in O(r(n^2 + Kr^2)) time for K iterations. In contrast, the only known matrix-based algorithm that supports singular graphs
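The "rank r matrix plus a scaled identity" structure can be checked numerically on a toy graph. The sketch below is not Sig-SR itself. It assumes the commonly used matrix form of SimRank, S = C·Q·S·Qᵀ + (1 − C)·I, where Q is the in-neighbor-normalized transpose of the adjacency matrix; the abstract does not spell out this form, and the function name, the decay factor C = 0.6, and the iteration count are illustrative choices. Note that this matrix form does not force the diagonal of S to equal 1.

```python
import numpy as np

def simrank_matrix_form(adj, C=0.6, K=30):
    """Naive all-pairs SimRank by iterating S = C*Q*S*Q^T + (1-C)*I.

    adj[i, j] = 1 means there is an edge i -> j. This is a plain O(K*n^3)
    baseline for illustration, not the Sig-SR algorithm.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    in_deg = adj.sum(axis=0)          # column sums = in-degrees
    has_in = in_deg > 0
    # Row i of Q spreads weight 1/|I(i)| over the in-neighbors of node i;
    # nodes without in-neighbors keep an all-zero row.
    Q = np.zeros((n, n))
    Q[has_in] = adj.T[has_in] / in_deg[has_in][:, None]
    S = (1 - C) * np.eye(n)
    for _ in range(K):
        S = C * (Q @ S @ Q.T) + (1 - C) * np.eye(n)
    return S

# Toy singular graph (assumed example): nodes 0 and 1 both point to 2 and 3.
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 0, 0],
                [0, 0, 0, 0]])
C = 0.6
S = simrank_matrix_form(adj, C=C)
low_rank_part = S - (1 - C) * np.eye(4)   # what remains after removing the scaled identity
```

Under this assumed form, every term of S beyond the scaled identity has the shape Q·(...)·Qᵀ, so the rank of `low_rank_part` cannot exceed the rank of the adjacency matrix, which is 1 for this toy graph.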
Efficient Partial-Pairs SimRank Search on Large Networks
Cited by 2 (1 self)
The assessment of node-to-node similarities based on graph topology arises in a myriad of applications, e.g., web search. SimRank is a notable measure of this type, with the intuition that “two nodes are similar if their in-neighbors are similar”. While most existing work retrieving SimRank only considers all-pairs SimRank s(⋆, ⋆) and single-source SimRank s(⋆, j) (scores between every node and query j), there are appealing applications for partial-pairs SimRank, e.g., similarity join. Given two node subsets A and B in a graph, partial-pairs SimRank assessment aims to retrieve only {s(a, b) : a ∈ A, b ∈ B}. However, the best-known solution appears not self-contained since it hinges on the premise that the SimRank scores with node-pairs in an h-go cover set must be given beforehand. This paper focuses on efficient assessment of partial-pairs SimRank in a self-contained manner. (1) We devise a novel “seed germination” model that computes partial-pairs SimRank in O(k|E| min{|A|, |B|}) time and O(|E| + k|V|) memory for k iterations on a graph of |V| nodes and |E| edges. (2) We further eliminate unnecessary edge access to improve the time of partial-pairs SimRank to O(m min{|A|, |B|}), where m ≤ min{k|E|, Δ^{2k}}, and Δ is the maximum degree. (3) We show that our partial-pairs SimRank model also can handle the computations of all-pairs and single-source SimRanks. (4) We empirically verify that our algorithms are (a) 38x faster than the best-known competitors, and (b) memory-efficient, allowing scores to be assessed accurately on graphs with tens of millions of links.
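To make the partial-pairs task concrete, here is a minimal baseline that evaluates only the pairs in A × B directly from the standard SimRank recursion, with s(a, a) = 1 and k levels of unrolling. It is not the "seed germination" model described in the abstract and does not achieve its bounds. The function name, the dictionary-based graph representation, and the decay factor C = 0.6 are assumptions made for illustration.

```python
from functools import lru_cache

def partial_pairs_simrank(in_nbrs, A, B, C=0.6, k=5):
    """Return {(a, b): s_k(a, b)} for a in A, b in B.

    in_nbrs maps each node to a list of its in-neighbors (missing key = none).
    Baseline only: memoized recursion on the SimRank definition.
    """
    @lru_cache(maxsize=None)
    def s(a, b, depth):
        if a == b:
            return 1.0                      # a node is maximally similar to itself
        if depth == 0:
            return 0.0                      # iteration 0: distinct nodes score 0
        Ia, Ib = in_nbrs.get(a, ()), in_nbrs.get(b, ())
        if not Ia or not Ib:
            return 0.0                      # no in-neighbors, no evidence of similarity
        total = sum(s(i, j, depth - 1) for i in Ia for j in Ib)
        return C * total / (len(Ia) * len(Ib))

    return {(a, b): s(a, b, k) for a in A for b in B}

# Toy graph (assumed example): nodes 0 and 1 both point to nodes 2 and 3.
in_nbrs = {2: [0, 1], 3: [0, 1]}
scores = partial_pairs_simrank(in_nbrs, A=[2], B=[2, 3, 0])
```

Only node pairs reachable from A × B by walking in-edges backwards are ever touched, which is the property that makes partial-pairs queries cheaper than an all-pairs computation.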
NED: An Inter-Graph Node Metric Based On Edit Distance
Node similarity is fundamental in graph analytics. However, similarity between nodes in different graphs (inter-graph nodes) has not yet received enough attention. Inter-graph node similarity is important in learning a new graph based on the knowledge extracted from an existing graph (transfer learning on graphs) and has applications in biological, communication, and social networks. In this paper, we propose a novel distance function for measuring inter-graph node similarity with edit distance, called NED. In NED, two nodes are compared according to their local neighborhood topologies, which are represented as unordered k-adjacent trees, without relying on any extra information. Because computing tree edit distance on unordered trees is NP-complete, we propose a modified tree edit distance, called TED*, for comparing unordered and unlabeled k-adjacent trees. TED* is a metric distance, like the original tree edit distance, but more importantly, TED* is polynomially computable. As a metric distance, NED admits efficient indexing, provides interpretable results, and is shown to perform better than existing approaches on a number of data analysis tasks, including graph de-anonymization. Finally, the efficiency and effectiveness of NED are empirically demonstrated using real-world graphs.
Efficient Top-K SimRank-based Similarity Join
SimRank is a popular and widely-adopted similarity measure to evaluate the similarity between nodes in a graph. It is time and space consuming to compute the SimRank similarities for all pairs of nodes, especially for large graphs. In real-world applications, users are only interested in the most similar pairs. To address this problem, in this paper we study the top-k SimRank-based similarity join problem, which finds the k most similar pairs of nodes with the largest SimRank similarities among all possible pairs. To the best of our knowledge, this is the first attempt to address this problem. We encode each node as a vector by summarizing its neighbors and transform the calculation of the SimRank similarity between two nodes to computing the dot product between the corresponding vectors. We devise an efficient two-step framework to compute top-k similar pairs using the vectors. For large graphs, exact algorithms cannot meet the high-performance requirement, and we also devise an approximate algorithm which can efficiently identify top-k similar pairs under a user-specified accuracy requirement. Experiments on both real and synthetic datasets show our method achieves high performance and good scalability.
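Once each node has a vector whose pairwise dot products stand in for SimRank scores, the join reduces to finding the k largest dot products among distinct pairs. The sketch below shows only that final step as a brute-force reference. The abstract does not say how the vectors are built or how the two-step framework prunes candidates, so the input vectors, the function name, and the exhaustive O(n²) scan are all assumptions for illustration.

```python
import numpy as np

def topk_pairs_by_dot(vectors, k):
    """Return the k pairs (i, j, score) with i < j and the largest dot products.

    vectors: one row per node (hypothetical node encodings).
    Brute force over all pairs; a real join would prune candidates instead.
    """
    V = np.asarray(vectors, dtype=float)
    scores = V @ V.T                                  # all pairwise dot products
    i_idx, j_idx = np.triu_indices(V.shape[0], k=1)   # distinct pairs with i < j
    pair_scores = scores[i_idx, j_idx]
    order = np.argsort(-pair_scores, kind="stable")[:k]
    return [(int(i_idx[t]), int(j_idx[t]), float(pair_scores[t])) for t in order]

# Hypothetical encodings for three nodes.
vectors = [[1.0, 0.0],
           [0.9, 0.1],
           [0.0, 1.0]]
top2 = topk_pairs_by_dot(vectors, k=2)
```

A brute-force scan like this is mainly useful as a correctness check against a faster exact or approximate join on small inputs.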