Results 1–10 of 35
A survey on PageRank computing
 Internet Mathematics
, 2005
Abstract

Cited by 106 (0 self)
Abstract. This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underlie PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. We also discuss link-based search personalization and outline some aspects of PageRank infrastructure, from associated measures of convergence to link preprocessing.
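The computation the survey is about, power iteration on a damped link matrix, can be sketched in a few lines. The toy graph, the function name, and the damping factor d = 0.85 are illustrative assumptions, not taken from the survey:

```python
import numpy as np

def pagerank_power(A, d=0.85, tol=1e-10):
    """Power iteration for PageRank: v <- d*A*v + (1-d)/n,
    where A is a column-stochastic link matrix."""
    n = A.shape[0]
    v = np.full(n, 1.0 / n)
    while True:
        v_new = d * (A @ v) + (1.0 - d) / n
        if np.abs(v_new - v).sum() < tol:
            return v_new
        v = v_new

# toy 3-page cycle: 0 -> 1 -> 2 -> 0 (columns are source pages)
A = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
print(pagerank_power(A))  # symmetric cycle, so all ranks converge to 1/3
```

Each iteration costs one sparse matrix-vector product, which is why much of the research the survey covers concerns accelerating or restructuring exactly this loop.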
Robust and Scalable Linked Data Reasoning Incorporating Provenance and Trust Annotations
, 2011
Abstract

Cited by 24 (6 self)
In this paper, we leverage annotated logic programs for tracking indicators of provenance and trust during reasoning, specifically focussing on the use case of applying a scalable subset of OWL 2 RL/RDF rules over static corpora of arbitrary Linked Data (Web data). Our annotations encode three facets of information: (i) blacklist: a (possibly manually generated) boolean annotation which indicates that the referent data are known to be harmful and should be ignored during reasoning; (ii) ranking: a numeric value derived by a PageRank-inspired technique, adapted for Linked Data, which determines the centrality of certain data artefacts (such as RDF documents and statements); (iii) authority: a boolean value which uses Linked Data principles to conservatively determine whether or not some terminological information can be trusted. We formalise a logical framework which annotates inferences with the strength of derivation along these dimensions of trust and provenance; we formally demonstrate some desirable properties of the deployment of annotated logic programming in our setting, which guarantees (i) a unique minimal model (least fixpoint); (ii) monotonicity; (iii) finitariness; and (iv) decidability. In so doing, we also give some formal results which reveal strategies for scalable and efficient implementation of various reasoning tasks one might consider. Thereafter, we discuss scalable and distributed implementation strategies for applying our ranking and reasoning methods over a cluster of commodity hardware; throughout, we provide evaluation of our methods over 1 billion Linked Data quadruples crawled from approximately 4 million individual Web documents, empirically demonstrating the scalability of our approach, and how our […]
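A minimal sketch of the annotation-combination idea: the annotation of an inferred fact is the greatest lower bound of its premises' annotations along the three facets the abstract names. The class, field names, and combination rules below are illustrative assumptions, not the paper's formalism:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ann:
    blacklisted: bool   # any blacklisted premise taints the inference
    rank: float         # derivation strength = weakest (minimum) premise rank
    authority: bool     # all premises must be authoritative

def combine(*premises: Ann) -> Ann:
    """Annotation for a fact inferred from the given premises
    (hypothetical glb-style combination over the three facets)."""
    return Ann(
        blacklisted=any(p.blacklisted for p in premises),
        rank=min(p.rank for p in premises),
        authority=all(p.authority for p in premises),
    )

a = Ann(blacklisted=False, rank=0.8, authority=True)
b = Ann(blacklisted=False, rank=0.3, authority=True)
print(combine(a, b))  # Ann(blacklisted=False, rank=0.3, authority=True)
```

Because the combination only ever weakens annotations, iterating it over a fixed rule set is monotone, which is the intuition behind the least-fixpoint and decidability guarantees the abstract claims.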
Efficient parallel computation of PageRank
 In Proc. 28th ECIR
, 2006
Abstract

Cited by 18 (1 self)
Abstract. PageRank is inherently massively parallelizable and distributable, as a result of the web’s strict host-based link locality. In this paper we show that the Gauß-Seidel iterative method for solving linear systems can be successfully applied in such a parallel ranking scenario in order to improve convergence. By introducing a two-dimensional web model and by adapting PageRank to this environment, we present and evaluate efficient methods to compute the exact rank vector even for large-scale web graphs in only a few minutes and iteration steps, with intrinsic support for incremental web crawling, and without the need for page sorting/reordering or for sharing global information.
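The Gauß-Seidel idea, applied to the PageRank linear system x = d·A·x + (1-d)/n, updates each component in place so that later components already use the new values, which typically converges in fewer sweeps than the power method. A minimal serial sketch, assuming a small column-stochastic matrix with no self-links (the paper's parallel, two-dimensional scheme is not reproduced here):

```python
import numpy as np

def pagerank_gauss_seidel(A, d=0.85, tol=1e-12):
    """Gauss-Seidel sweeps for x = d*A*x + (1-d)/n: each x[i] is
    overwritten immediately, so later rows see the updated values."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    while True:
        diff = 0.0
        for i in range(n):
            new = (1.0 - d) / n + d * (A[i] @ x)  # uses updated x[j] for j < i
            diff += abs(new - x[i])
            x[i] = new
        if diff < tol:
            return x

# toy graph: 0 -> {1, 2}, 1 -> 2, 2 -> 0 (columns are source pages)
A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
print(pagerank_gauss_seidel(A))  # ranks sum to 1; page 2 ranks highest
```

Since the iteration matrix has spectral radius at most d < 1, both this and the plain power method converge; the in-place updates are what buy the faster convergence the abstract refers to.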
Exploiting RDFS and OWL for Integrating Heterogeneous, Large-Scale, Linked Data Corpora
, 2011
Abstract

Cited by 17 (11 self)
The Web contains a vast amount of information on an abundance of topics, much of which is encoded as structured data indexed by local databases. However, these databases are rarely interconnected and information reuse across sites is limited. Semantic Web standards offer a possible solution in the form of an agreed-upon data model and set of syntaxes, as well as meta-languages for publishing schema-level information, offering a highly interoperable means of publishing and interlinking structured data on the Web. Thanks to the Linked Data community, an unprecedented lode of such data has now been published on the Web, by individuals, academia, communities, corporations and governmental organisations alike, on a medley of often overlapping topics. This new publishing paradigm has opened up a range of new and interesting research topics with respect to how this emergent “Web of Data” can be harnessed and exploited by consumers. Indeed, although Semantic […]
Distributed PageRank computation based on iterative aggregation-disaggregation methods
 In Proceedings of the 14th ACM International Conference on Information and Knowledge Management
, 2005
Abstract

Cited by 15 (0 self)
PageRank has been widely used as a major factor in search engine ranking systems. However, computing PageRank requires global link graph information, which incurs prohibitive communication cost to achieve accurate results in a distributed setting. In this paper, we propose a distributed PageRank computation algorithm based on the iterative aggregation-disaggregation (IAD) method with Block Jacobi smoothing. The basic idea is divide-and-conquer: we treat each web site as a node to explore the block structure of hyperlinks. Local PageRank is computed by each node itself and then updated, at low communication cost, via a coordinator. We prove the global convergence of the Block Jacobi method and then analyze the communication overhead and major advantages of our algorithm. Experiments on three real web graphs show that our method converges 5–7 times faster than the traditional Power method. We believe our work provides an efficient and practical distributed solution for PageRank on large-scale Web graphs.
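The Block Jacobi flavour of such a scheme can be sketched serially: each block (one web site, in the paper's setting) solves its local linear system exactly, using the previous global iterate for off-block links, so only inter-site rank values need to be communicated. The toy matrix, block assignment, and sweep count are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def pagerank_block_jacobi(A, blocks, d=0.85, sweeps=200):
    """One 'site' per block: solve (I - d*A_bb) x_b = (1-d)/n + d*A_br*x_r
    exactly per block, with off-block links x_r taken from the last sweep."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(sweeps):
        x_new = np.empty(n)
        for blk in blocks:
            idx = np.asarray(blk)
            rest = np.setdiff1d(np.arange(n), idx)
            rhs = (1.0 - d) / n + d * (A[np.ix_(idx, rest)] @ x[rest])
            local = np.eye(len(idx)) - d * A[np.ix_(idx, idx)]
            x_new[idx] = np.linalg.solve(local, rhs)  # exact local solve
        x = x_new
    return x

# toy graph: 0 -> {1, 2}, 1 -> 2, 2 -> 0; pages 1 and 2 share a "site"
A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
print(pagerank_block_jacobi(A, blocks=[[0], [1, 2]]))
```

Because intra-site links dominate real web graphs, the exact local solves absorb most of the work, and only the comparatively sparse inter-site contributions cross block boundaries each sweep.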
PAGERANK COMPUTATION, WITH SPECIAL ATTENTION TO DANGLING NODES
Abstract

Cited by 11 (0 self)
Abstract. We present a simple algorithm for computing the PageRank (stationary distribution) of the stochastic Google matrix G. The algorithm lumps all dangling nodes into a single node. We express lumping as a similarity transformation of G, and show that the PageRank of the non-dangling nodes can be computed separately from that of the dangling nodes. The algorithm applies the power method only to the smaller lumped matrix, but the convergence rate is the same as that of the power method applied to the full matrix G. The efficiency of the algorithm increases as the number of dangling nodes increases. We also extend the expression for PageRank, and the algorithm, to more general Google matrices that have several different dangling node vectors, when it is required to distinguish among different classes of dangling nodes. We also analyze the effect of the dangling node vector on the PageRank, and show that the PageRank of the dangling nodes depends strongly on that of the non-dangling nodes but not vice versa. Finally, we present a Jordan decomposition of the Google matrix for the (theoretical) extreme case in which all web pages are dangling nodes.
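The dangling-node problem the paper addresses can be illustrated with the standard correction that lumping exploits: all rank sitting on dangling pages is collected into a single scalar and redistributed through one dangling-node vector (uniform here). The toy graph is an illustrative assumption, and this is the naive per-iteration correction, not the paper's lumped algorithm:

```python
import numpy as np

# toy graph: 0 -> {1, 2}, 1 -> 2; page 2 has no out-links (dangling)
out = {0: [1, 2], 1: [2], 2: []}
n, d = 3, 0.85
v = np.full(n, 1.0 / n)
for _ in range(200):
    new = np.full(n, (1.0 - d) / n)
    dangling_mass = 0.0
    for i, targets in out.items():
        if targets:
            share = d * v[i] / len(targets)
            for j in targets:
                new[j] += share
        else:
            dangling_mass += v[i]        # rank stuck on dangling pages
    new += d * dangling_mass / n         # redistribute via one uniform vector
    v = new
print(v, v.sum())  # sums to 1; the dangling page 2 accumulates the most rank
```

Since the dangling contribution enters only through this one scalar, all dangling nodes behave identically, which is exactly why they can be lumped into a single node without changing the non-dangling ranks.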
Asynchronous iterative computations with Web information retrieval structures: The PageRank case
, 2005
Parallel Multilevel Algorithms for Hypergraph Partitioning
, 2007
Abstract

Cited by 10 (0 self)
In this paper, we present parallel multilevel algorithms for the hypergraph partitioning problem. In particular, we describe schemes for parallel coarsening, parallel greedy k-way refinement and parallel multi-phase refinement. Using an asymptotic theoretical performance model, we derive the isoefficiency function for our algorithms and hence show that they are technically scalable when the maximum vertex and hyperedge degrees are small. We conduct experiments on hypergraphs from six different application domains to investigate the empirical scalability of our algorithms, both in terms of runtime and partition quality. Our findings confirm that the quality of partition produced by our algorithms is stable as the number of processors is increased, while being competitive with those produced by a state-of-the-art serial multilevel partitioning tool. We also validate our theoretical performance model through an isoefficiency study. Finally, we evaluate the impact of introducing parallel multi-phase refinement into our parallel multilevel algorithm in terms of the trade-off between improved partition quality and higher runtime cost.
Hypergraph partitioning for faster parallel PageRank computation
 In Lecture Notes in Computer Science 3670
, 2005
Abstract

Cited by 9 (1 self)
The PageRank algorithm is used by search engines such as Google to order web pages. It uses an iterative numerical method to compute the maximal eigenvector of a transition matrix derived from the web’s hyperlink structure and a user-centred model of web-surfing behaviour. As the web has expanded and as demand for user-tailored web page ordering metrics has grown, scalable parallel computation of PageRank has become a focus of considerable research effort. In this paper, we seek a scalable problem decomposition for parallel PageRank computation, through the use of state-of-the-art hypergraph-based partitioning schemes. These have not previously been applied in this context. We consider both one- and two-dimensional hypergraph decomposition models. Exploiting the recent availability of the Parkway 2.1 parallel hypergraph partitioner, we present empirical results on a gigabit PC cluster for three publicly available web graphs. Our results show that hypergraph-based partitioning substantially reduces communication volume over conventional partitioning schemes (by up to three orders of magnitude), while still maintaining computational load balance. They also show a halving of the per-iteration runtime cost when compared to the most effective alternative approach used to date.
Google’s PageRank: The math behind the search engine
 Math. Intelligencer
, 2006
Abstract

Cited by 7 (1 self)
Approximately 94 million American adults use the internet on a typical day [24]. The number one internet activity is reading and writing email. Search engine use is next in line and continues to increase in popularity. In fact, survey findings indicate that nearly 60 million American adults use search engines on a given day. Even though […]