More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. (2013)

by W Yu, X Lin, W Zhang, L Chang, J Pei
Venue:PVLDB,
Results 1 - 5 of 5

Sig-SR: SimRank search over singular graphs.

by Weiren Yu, Julie A. McCann - In Proceedings of the 37th ACM SIGIR International Conference on Research & Development in Information Retrieval (SIGIR 2014), 2014
Cited by 3 (2 self)
SimRank is an attractive structural-context measure of similarity between two objects in a graph. It recursively follows the intuition that "two objects are similar if they are referenced by similar objects". The best known matrix-based method [1] for calculating SimRank, however, implies an assumption that the graph is non-singular, i.e., its adjacency matrix is invertible. In reality, non-singular graphs are very rare; such an assumption in [1] is too restrictive in practice. In this paper, we provide a treatment of [1], by supporting similarity assessment on non-invertible adjacency matrices. Assume that a singular graph G has n nodes, with r (< n) being the rank of its adjacency matrix. (1) We show that the SimRank matrix S on G has an elegant structure: S can be represented as a rank-r matrix plus a scaled identity matrix. (2) By virtue of this, an efficient algorithm over singular graphs, Sig-SR, is proposed for calculating all-pairs SimRank in $O(r(n^2 + Kr^2))$ time for K iterations. In contrast, the only known matrix-based algorithm that supports singular graphs

Citation Context

... 4) 2. RELATED WORK SimRank is arguably one of the most successful link-based similarity measures in recent years. It was initially proposed by Jeh and Widom [3], who adopted an iterative paradigm to compute SimRank scores of all pairs in $O(Kn^2d^2)$ time. Since then, there has been a surge of papers looking at various problems in efficient SimRank computing, as the naive algorithm [3] has high time complexity. Recent results include matrix-based methods [1, 2], iterative optimization [7–9], random walk sampling [10, 11], incremental updating [12], parallel computing [13], and semantic improvement [14]. In comparison to the work on iterative optimization (resp. random walk sampling) that will produce deterministic (resp. probabilistic) errors, the work on matrix-based methods [1, 2] may accurately calculate SimRank without loss of exactness. The pioneering work of Li et al. [2] proposed a very elegant closed form for SimRank: $\mathrm{vec}(S) = (1-C)\,(I - C \cdot (W^T \otimes W^T))^{-1}\,\mathrm{vec}(I)$ (3), where $\otimes$ is the tensor product and $\mathrm{vec}(\star)$ is matrix vectorization. Based on Eq. (3), [2] utilized rank r (resp. low-rank α (< r)) singular value decomposition to compute all-pairs SimRank in $O(r^4n^2)$ time (resp. $O(\alpha^4n^2)$ with a littl...
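
The closed form in Eq. (3) of the citation context above can be checked numerically on a tiny graph. The following is a minimal sketch, not the method of [1] or [2]: it assumes W is the column-normalised adjacency matrix (each column divided by the node's in-degree), and the names `column_normalize` and `simrank_closed_form` are hypothetical. It solves the n² × n² linear system directly, so it is only practical for very small n.

```python
import numpy as np

def column_normalize(adj):
    """Divide each column by its in-degree (assumed definition of W).

    Columns of nodes without in-links are left as zeros.
    """
    adj = np.asarray(adj, dtype=float)
    in_deg = adj.sum(axis=0)
    return np.divide(adj, in_deg, out=np.zeros_like(adj), where=in_deg > 0)

def simrank_closed_form(adj, c=0.6):
    """Solve vec(S) = (1 - c) (I - c (W^T kron W^T))^{-1} vec(I) for S."""
    w = column_normalize(adj)
    n = w.shape[0]
    kron = np.kron(w.T, w.T)                      # W^T (x) W^T, size n^2 x n^2
    rhs = (1.0 - c) * np.eye(n).reshape(-1, order="F")  # (1 - c) vec(I)
    # Solve the linear system instead of forming the inverse explicitly.
    vec_s = np.linalg.solve(np.eye(n * n) - c * kron, rhs)
    # Undo the column-major vectorisation.
    return vec_s.reshape((n, n), order="F")

# Example: adj[i, j] = 1 means there is an edge from node i to node j.
adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]], dtype=float)
S = simrank_closed_form(adj, c=0.6)
```

The result satisfies the matrix recurrence S = C·Wᵀ S W + (1 − C)·I, which is the equation that Eq. (3) vectorises. The quartic growth of the Kronecker product is what motivates the low-rank approaches mentioned in the text.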

Efficient PartialPairs SimRank Search on Large Networks

by Weiren Yu, Julie A. McCann
Cited by 2 (1 self)
The assessment of node-to-node similarities based on graph topology arises in a myriad of applications, e.g., web search. SimRank is a notable measure of this type, with the intuition that “two nodes are similar if their in-neighbors are similar”. While most existing work retrieving SimRank only considers all-pairs SimRank s(⋆, ⋆) and single-source SimRank s(⋆, j) (scores between every node and query j), there are appealing applications for partial-pairs SimRank, e.g., similarity join. Given two node subsets A and B in a graph, partial-pairs SimRank assessment aims to retrieve only {s(a, b)}∀a∈A,∀b∈B. However, the best-known solution appears not to be self-contained, since it hinges on the premise that the SimRank scores with node-pairs in an h-go cover set must be given beforehand. This paper focuses on efficient assessment of partial-pairs SimRank in a self-contained manner. (1) We devise a novel “seed germination” model that computes partial-pairs SimRank in $O(k|E| \min\{|A|, |B|\})$ time and $O(|E| + k|V|)$ memory for k iterations on a graph of |V| nodes and |E| edges. (2) We further eliminate unnecessary edge access to improve the time of partial-pairs SimRank to $O(m \min\{|A|, |B|\})$, where $m \le \min\{k|E|, \Delta^{2k}\}$, and Δ is the maximum degree. (3) We show that our partial-pairs SimRank model can also handle the computations of all-pairs and single-source SimRank. (4) We empirically verify that our algorithms are (a) 38x faster than the best-known competitors, and (b) memory-efficient, allowing scores to be assessed accurately on graphs with tens of millions of links.

Citation Context

...Recently, [6] has also employed position probability to reduce the time of [12] to O(k|E|2−|E|). All the bounds of these deterministic single-pair SimRank methods are still expensive. There has also been work [1, 8, 18] on SimRank variants. Antonellis et al. [1] extended SimRank for query rewriting. Jin et al. [8] integrated automorphism (role similarity) into SimRank. Yu et al. [18] devised SimRank*, by tallying mo...

NED: An Inter-Graph Node Metric Based On Edit Distance

by Haohan Zhu, Xianrui Meng, George Kollios
Node similarity is fundamental in graph analytics. However, node similarity between nodes in different graphs (inter-graph nodes) has not yet received enough attention. Inter-graph node similarity is important in learning a new graph based on the knowledge extracted from an existing graph (transfer learning on graphs) and has applications in biological, communication, and social networks. In this paper, we propose a novel distance function for measuring inter-graph node similarity with edit distance, called NED. In NED, two nodes are compared according to their local neighborhood topologies, which are represented as unordered k-adjacent trees, without relying on any extra information. Because computing tree edit distance on unordered trees is NP-Complete, we propose a modified tree edit distance, called TED*, for comparing unordered and unlabeled k-adjacent trees. TED* is a metric distance, like the original tree edit distance, but more importantly, TED* is polynomially computable. As a metric distance, NED admits efficient indexing, provides interpretable results, and is shown to perform better than existing approaches on a number of data analysis tasks, including graph de-anonymization. Finally, the efficiency and effectiveness of NED are empirically demonstrated using real-world graphs.

Citation Context

...snetwork classification) [7]. Finally, another important application of inter-graph node similarity is to use it for de-anonymization. As an example, given an anonymous social network and certain non-anonymized information in the same domain, we could compare pairwise nodes to re-identify nodes in the anonymous social network by using the structural information from the non-anonymized corresponding graphs [7, 22]. In recent years, many node similarity measures have been proposed, but most of them work only for nodes in the same graph (intra-graph). Examples include SimRank [8], SimRank variants [30, 2, 10], random walks with restart [27], influence-based methods [13], set matching methods [29, 11, 12] and so on. Unfortunately, these methods cannot be used to measure the similarity between inter-graph nodes. The existing methods that can be used for inter-graph node similarity [1, 3, 7, 4] have their own issues. OddBall [1] and NetSimile [3] only consider the features in the ego-net (instant neighbors), which limits the neighborhood information. On the other hand, although ReFeX [7] and HITS-based similarity [4] consider a larger neighborhood, they are not metrics and the absolute distance values bet...

Efficient SimRank Computation via Linearization

by unknown authors
Abstract not found

Citation Context

...imRank score is obtained by $s(i, j) = \mathbb{E}[c^{\tau_{i,j}}]$ (1.4). SimRank and its related measures (e.g., SimRank++ [Antonellis et al. 2008], SSimRank [Cai et al. 2008], P-Rank [Zhao et al. 2009], and SimRank* [Yu et al. 2013]) give high-quality scores in activities such as natural language processing [Scheible 2010], computational advertisement [Antonellis et al. 2008], collaborative filtering [Yu et al. 2010], and web a...
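
The expectation in Eq. (1.4) above suggests a simple Monte Carlo estimate. The sketch below is illustrative only and is not the linearization algorithm of the cited paper. It assumes that τ is the first step at which two random walks, started at i and j and each moving to a uniformly chosen in-neighbor, occupy the same node. The function name `estimate_simrank`, the `in_neighbors` dictionary layout, and the `max_steps` truncation are assumptions introduced for this example.

```python
import random

def estimate_simrank(in_neighbors, i, j, c=0.6,
                     samples=10000, max_steps=20, seed=0):
    """Estimate s(i, j) = E[c^tau] by sampling pairs of reverse random walks.

    in_neighbors maps each node to the list of its in-neighbors.
    Walk pairs that have not met within max_steps contribute 0,
    which truncates the expectation by at most c^(max_steps + 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        u, v = i, j
        for step in range(max_steps + 1):
            if u == v:
                # The walks meet at this step, so tau = step.
                total += c ** step
                break
            if not in_neighbors[u] or not in_neighbors[v]:
                # A walk is stuck at a node without in-links: no meeting.
                break
            u = rng.choice(in_neighbors[u])
            v = rng.choice(in_neighbors[v])
    return total / samples

# Example: nodes 1 and 2 are both referenced only by node 0.
graph = {0: [], 1: [0], 2: [0]}
score = estimate_simrank(graph, 1, 2, c=0.6)
```

In this example both walks reach node 0 after exactly one step, so every sample contributes c and the estimate is close to 0.6. On general graphs the estimate carries sampling error, which is the probabilistic error that random-walk methods trade for speed.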

Efficient Top-K SimRank-based Similarity Join

by Wenbo Tao, Minghe Yu, Guoliang Li
SimRank is a popular and widely-adopted similarity measure to evaluate the similarity between nodes in a graph. It is time and space consuming to compute the SimRank similarities for all pairs of nodes, especially for large graphs. In real-world applications, users are only interested in the most similar pairs. To address this problem, in this paper we study the top-k SimRank-based similarity join problem, which finds the k most similar pairs of nodes with the largest SimRank similarities among all possible pairs. To the best of our knowledge, this is the first attempt to address this problem. We encode each node as a vector by summarizing its neighbors and transform the calculation of the SimRank similarity between two nodes to computing the dot product between the corresponding vectors. We devise an efficient two-step framework to compute top-k similar pairs using the vectors. For large graphs, exact algorithms cannot meet the high-performance requirement, and we also devise an approximate algorithm which can efficiently identify top-k similar pairs under a user-specified accuracy requirement. Experiments on both real and synthetic datasets show our method achieves high performance and good scalability.

Citation Context

...l database operation, namely, the traditional similarity join problem, where a user inputs a threshold t and requires the system to return all pairs of nodes whose SimRank values exceed t. Yu et al. [17] addressed a “zero-SimRank” issue and proposed methods to improve the quality of the SimRank metric. Antonellis et al. [1] proposed a refined SimRank similarity called SimRank++. Different from exist...
