
The stochastic approach for link-structure analysis (SALSA) and the TKC effect. Computer Networks 33(1-6) (2000)

by R Lempel, S Moran

Results 1 - 10 of 128

Deeper Inside PageRank

by Amy N. Langville, Carl D. Meyer - Internet Mathematics, 2004
Cited by 107 (4 self)
This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existence, uniqueness, and convergence properties, possible alterations to the basic model, suggested alternatives to the traditional solution methods, sensitivity and conditioning, and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research.
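
The basic PageRank model and the power method, the classic solution method surveyed here, can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not code from the paper: the damping factor of 0.85, the uniform teleportation vector, the rule that dangling pages (no out-links) spread their score uniformly, and the function name `pagerank` are all assumptions.

```python
def pagerank(links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power-method PageRank (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    alpha: damping factor (0.85 is an assumed, commonly used value).
    Dangling pages (no out-links) spread their score uniformly (assumption).
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Total score currently sitting on dangling pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        # Teleportation plus redistributed dangling mass, shared by every page.
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        # Each page splits its damped score evenly among its out-links.
        for u, outs in links.items():
            if outs:
                share = alpha * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)  # L1 change
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
    print({page: round(score, 4) for page, score in ranks.items()})
```

In the example, page c, which receives links from both a and b, ends up with the highest score, and the scores always sum to 1 because each iteration redistributes exactly the mass it started with.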

Ranking the Web Frontier

by Nadav Eiron, Kevin S. McCurley, John A. Tomlin, 2004
Cited by 85 (0 self)
The celebrated PageRank algorithm has proved to be a very effective paradigm for ranking results of web search algorithms. In this paper we refine this basic paradigm to take into account several evolving prominent features of the web, and propose several algorithmic innovations. First, we analyze features of the rapidly growing "frontier" of the web, namely the part of the web that crawlers are unable to cover for one reason or another. We analyze the effect of these pages and find it to be significant. We suggest ways to improve the quality of ranking by modeling the growing presence of "link rot" on the web as more sites and pages fall out of maintenance. Finally we suggest new methods of ranking that are motivated by the hierarchical structure of the web, are more efficient than PageRank, and may be more resistant to direct manipulation.

Algorithms for estimating relative importance in networks

by Scott White, Padhraic Smyth - In Proceedings of KDD 2003, 2003
Cited by 78 (4 self)
Large and complex graphs representing relationships among sets of entities are an increasingly common focus of interest in data analysis: examples include social networks, Web graphs, telecommunication networks, and biological networks. In interactive analysis of such data a natural query is “which entities are most important in the network relative to a particular individual or set of individuals?” We investigate the problem of answering such queries in this paper, focusing in particular on defining and computing the importance of nodes in a graph relative to one or more root nodes. We define a general framework and a number of different algorithms, building on ideas from social networks, graph theory, Markov models, and Web graph analysis. We experimentally evaluate the different properties of these algorithms on toy graphs and demonstrate how our approach can be used to study relative importance in real-world networks including a network of interactions among September 11th terrorists, a network of collaborative research in biotechnology among companies and universities, and a network of co-authorship relationships among computer science researchers.
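
Importance relative to a set of root nodes can be illustrated with a random walk with restart (personalized PageRank), where every teleport jumps back to a root. This is a hedged sketch of that general Markov-model technique, not necessarily one of the algorithms defined in the paper; the restart probability `beta = 0.3`, the rule that dead ends return to the roots, and the function name `rooted_importance` are assumptions.

```python
def rooted_importance(links, roots, beta=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart to a root set (illustrative sketch).

    links: dict node -> list of neighbours (list both directions for an
           undirected network such as a co-authorship graph).
    roots: the individual(s) relative to whom importance is measured.
    beta:  probability of jumping back to a root at each step (assumed value).
    """
    roots = set(roots)
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs} | roots)
    prior = {v: (1.0 / len(roots) if v in roots else 0.0) for v in nodes}
    score = dict(prior)
    for _ in range(max_iter):
        # Restart mass always lands on the roots.
        new = {v: beta * prior[v] for v in nodes}
        for u in nodes:
            outs = links.get(u, [])
            # Dead ends send their walkers back to the roots (assumption).
            targets = outs if outs else sorted(roots)
            share = (1 - beta) * score[u] / len(targets)
            for v in targets:
                new[v] += share
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    s = rooted_importance(path, ["a"])
    print(sorted(s, key=s.get, reverse=True))  # ['a', 'b', 'c', 'd']
```

On the small path graph, importance relative to root a falls off with distance from a, which is the qualitative behaviour a rooted importance measure should show.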

Identifying Link Farm Spam Pages

by Baoning Wu, Brian D. Davison - Proceedings of the 14th International World Wide Web Conference, 2005
Cited by 73 (10 self)
With the increasing importance of search in guiding today’s web traffic, more and more effort has been spent on creating search engine spam. Since link analysis is one of the most important factors in current commercial search engines’ ranking systems, new kinds of spam aiming at links have appeared. Building link farms is one technique that can degrade link-based ranking algorithms. In this paper, we present algorithms for detecting these link farms automatically by first generating a seed set based on the common link set between incoming and outgoing links of Web pages and then expanding it. Links between identified pages are reweighted, providing a modified web graph to use in ranking page importance. Experimental results show that we can identify most link farm spam pages and the final ranking results are improved for almost all tested queries.
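
The seed-then-expand procedure described above can be sketched as follows. This is a simplified, hypothetical version: it compares individual pages rather than sites, the thresholds `seed_threshold` and `expand_threshold` are assumed values, the expansion rule (add a page that links to enough already-flagged pages) is an assumption, and "reweighting" is modeled as giving links between two flagged pages a weight of zero. The function names are illustrative only.

```python
def find_link_farm(links, seed_threshold=2, expand_threshold=2):
    """Flag likely link-farm pages (simplified, page-level sketch)."""
    inlinks = {}
    for u, outs in links.items():
        for v in outs:
            inlinks.setdefault(v, set()).add(u)

    # Seed set: pages whose in-link and out-link sets share many members.
    bad = {
        page
        for page, outs in links.items()
        if len(set(outs) & inlinks.get(page, set())) >= seed_threshold
    }

    # Expansion: keep adding pages that point to enough already-flagged pages.
    changed = True
    while changed:
        changed = False
        for page, outs in links.items():
            if page not in bad and len(set(outs) & bad) >= expand_threshold:
                bad.add(page)
                changed = True
    return bad


def reweight(links, bad, farm_weight=0.0):
    """Return weighted edges; links between two flagged pages get farm_weight."""
    return {
        u: {v: (farm_weight if u in bad and v in bad else 1.0) for v in outs}
        for u, outs in links.items()
    }


if __name__ == "__main__":
    web = {
        "f1": ["f2", "f3"], "f2": ["f1", "f3"], "f3": ["f1", "f2"],  # dense farm
        "x": ["f1", "f2"],                                           # feeds the farm
        "h1": ["h2"], "h2": ["h3"], "h3": ["h1", "f1"],              # ordinary pages
    }
    farm = find_link_farm(web)
    print(sorted(farm))  # ['f1', 'f2', 'f3', 'x']
```

The reweighted edge dictionary could then be fed to any weighted link-based ranking; an ordinary page such as h3 that links to a single farm page keeps full weight on that link.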

Finding Authorities and Hubs From Link Structures on the World Wide Web

by Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosenthal, Panayiotis Tsaparas - In Proceedings of the 10th International World Wide Web Conference, Hong Kong, 2001
Cited by 63 (7 self)
Recently, there have been a number of algorithms proposed for analyzing hypertext link structure so as to determine the best "authorities" for a given topic or query. While such analysis is usually combined with content analysis, there is a sense in which some algorithms are deemed to be "more balanced" and others "more focused". We undertake a comparative study of hypertext link analysis algorithms. Guided by some experimental queries, we propose some formal criteria for evaluating and comparing link analysis algorithms. Keywords: link analysis, web searching, hubs, authorities, SALSA, Kleinberg's algorithm, threshold, Bayesian.
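
SALSA, one of the algorithms compared here and the subject of the Lempel and Moran paper, scores authorities by the stationary distribution of a random walk that alternates a step backward along a link (to a random hub pointing at the current page) with a step forward (to a random page that hub points at). A minimal power-iteration sketch of that authority chain follows; the function name and the convergence parameters are assumptions, and the sketch does not treat disconnected link graphs specially. On a connected graph the resulting scores are proportional to in-degree, which gives an easy check.

```python
def salsa_authorities(links, tol=1e-12, max_iter=1000):
    """Authority scores from SALSA's authority Markov chain (sketch).

    links: dict mapping each hub page to the pages it links to.
    Each step: from authority j, move back to a hub i that links to j
    (uniformly at random), then forward to a page k that i links to
    (uniformly at random).
    """
    out_sets = {h: set(outs) for h, outs in links.items() if outs}
    inlinks = {}
    for hub, outs in out_sets.items():
        for page in outs:
            inlinks.setdefault(page, set()).add(hub)

    auths = sorted(inlinks)
    score = {a: 1.0 / len(auths) for a in auths}
    for _ in range(max_iter):
        new = dict.fromkeys(auths, 0.0)
        for j in auths:
            back = score[j] / len(inlinks[j])       # backward step to a hub
            for i in inlinks[j]:
                forward = back / len(out_sets[i])   # forward step to an authority
                for k in out_sets[i]:
                    new[k] += forward
        delta = sum(abs(new[a] - score[a]) for a in auths)
        score = new
        if delta < tol:
            break
    return score


if __name__ == "__main__":
    graph = {"h1": ["a1", "a2"], "h2": ["a2", "a3"]}
    print({a: round(s, 3) for a, s in salsa_authorities(graph).items()})
    # {'a1': 0.25, 'a2': 0.5, 'a3': 0.25}
```

In this small example the authority scores match the normalized in-degrees (1, 2 and 1 out of 4).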

Integrating the Document Object Model with Hyperlinks for Enhanced Topic Distillation and Information Extraction

by Soumen Chakrabarti, 2001
Cited by 57 (2 self)
Topic distillation is the process of finding authoritative Web pages and comprehensive "hubs" which reciprocally endorse each other and are relevant to a given query. Hyperlink-based topic distillation has been traditionally applied to a macroscopic Web model where documents are nodes in a directed graph and hyperlinks are edges. Macroscopic models miss valuable clues such as banners, navigation panels, and template-based inclusions, which are embedded in HTML pages using markup tags. Consequently, results of macroscopic distillation algorithms have been deteriorating in quality as Web pages are becoming more complex. We propose a uniform fine-grained model for the Web in which pages are represented by their tag trees (also called their Document Object Models or DOMs) and these DOM trees are interconnected by ordinary hyperlinks. Surprisingly, macroscopic distillation algorithms do not work in the fine-grained scenario. We present a new algorithm suitable for the fine-grained model. It can dis-aggregate hubs into coherent regions by segmenting their DOM trees. Mutual endorsement between hubs and authorities involves these regions, rather than single nodes representing complete hubs. Anecdotes and measurements using a 28-query, 366000-document benchmark suite, used in earlier topic distillation research, reveal two benefits from the new algorithm: distillation quality improves, and a by-product of distillation is the ability to extract relevant snippets from hubs which are only partially relevant to the query.

SpamRank - Fully Automatic Link Spam Detection

by Andras A. Benczur, Karoly Csalogany, Tamas Sarlos, Mate Uher - In Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), 2005
Cited by 57 (4 self)
Spammers intend to increase the PageRank of certain spam pages by creating a large number of links pointing to them. We propose a novel method based on the concept of personalized PageRank that detects pages with an undeservedly high PageRank value without the need for any kind of whitelists, blacklists, or other means of human intervention. We assume that spammed pages have a biased distribution of pages that contribute to the undeservedly high PageRank value. We define SpamRank by penalizing pages that originate a suspicious PageRank share and personalizing PageRank on the penalties. Our method is tested on a 31M-page crawl of the .de domain with a manually classified 1000-page stratified random sample with bias towards large PageRank values.

A Survey of Web Metrics

by Devanshu Dhyani, Wee Keong Ng, Sourav S. Bhowmick - ACM Computing Surveys, 2002
Cited by 46 (0 self)
... this article, we examine this issue by classifying and discussing a wide-ranging set of Web metrics. We present the origins, measurement functions, formulations and comparisons of well-known Web metrics for quantifying Web graph properties, Web page significance, Web page similarity, search and retrieval, usage characterization and information-theoretic properties. We also discuss how these metrics can be applied for improving Web information access and use.

A survey of eigenvector methods of web information retrieval

by Amy N. Langville, Carl D. Meyer - SIAM Review
Cited by 46 (5 self)
Web information retrieval is significantly more challenging than traditional well-controlled, small document collection information retrieval. One main difference between traditional information retrieval and Web information retrieval is the Web’s hyperlink structure. This structure has been exploited by several of today’s leading Web search engines, particularly Google and Teoma. In this survey paper, we focus on Web information retrieval methods that use eigenvector computations, presenting the three popular methods of HITS, PageRank, and SALSA.
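
HITS, one of the three eigenvector methods covered, takes hub scores from the principal eigenvector of A A^T and authority scores from that of A^T A, where A is the link adjacency matrix; in practice both are found by power iteration. The sketch below assumes a fixed iteration count and Euclidean (L2) normalization after each step; the function name `hits` and these choices are illustrative, not taken from the survey.

```python
import math


def hits(links, iterations=100):
    """Kleinberg's HITS by power iteration (illustrative sketch).

    links: dict mapping each page to the pages it links to.
    Returns (hub, authority) dicts, each scaled to unit Euclidean length.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in (a = A^T h).
        auth = dict.fromkeys(nodes, 0.0)
        for u, outs in links.items():
            for v in set(outs):
                auth[v] += hub[u]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {v: x / norm for v, x in auth.items()}
        # Hub update: sum the authority scores of pages linked to (h = A a).
        hub = {u: sum(auth[v] for v in set(links.get(u, ()))) for u in nodes}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {u: x / norm for u, x in hub.items()}
    return hub, auth


if __name__ == "__main__":
    hubs, auths = hits({"h1": ["a1", "a2"], "h2": ["a2", "a3"]})
    print(max(auths, key=auths.get))  # a2
```

Normalizing after every half-step keeps the scores bounded; only the direction of each vector matters for ranking.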

Optimized Query Execution in Large Search Engines with Global Page Ordering

by Xiaohui Long, Torsten Suel, 2003
Cited by 45 (7 self)
Large web search engines have to answer thousands of queries per second with interactive response times. A major factor in the cost of executing a query is given by the lengths of the inverted lists for the query terms, which increase with the size of the document collection and are often in the range of many megabytes. To address this issue, IR and database researchers have proposed pruning techniques that compute or approximate term-based ranking functions without scanning over the full inverted lists.
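
One way a global page ordering can support pruning is early termination: if every inverted list is sorted by a query-independent global score (for example PageRank), and a document's final score is its global score plus its term scores, a scan can stop once even the best possible term score cannot lift the next document above the current k-th result. This is a hypothetical simplification for illustration, not the paper's scheme; the additive scoring, the known upper bound `max_term_score`, the in-memory dictionaries used for random access, and the function name are all assumptions.

```python
import heapq


def top_k_early_stop(postings, global_score, k, max_term_score):
    """Conjunctive top-k query with early termination (illustrative sketch).

    postings: dict term -> list of (doc_id, term_score) pairs, every list
              sorted by decreasing global_score[doc_id] (assumed layout).
    Final score (assumed): global_score[doc] + sum of the doc's term scores.
    max_term_score: assumed upper bound on any single term score.
    Returns (results, scanned): best-first (score, doc) pairs and the number
    of postings examined in the shortest list.
    """
    terms = list(postings)
    lookup = {t: dict(postings[t]) for t in terms}  # random access per term
    shortest = min(terms, key=lambda t: len(postings[t]))
    best_extra = max_term_score * len(terms)  # most the terms can ever add
    heap = []  # min-heap holding the current top-k (score, doc) pairs
    scanned = 0
    for doc, _ in postings[shortest]:
        # Later docs have lower global scores, so none can beat the k-th best.
        if len(heap) == k and global_score[doc] + best_extra <= heap[0][0]:
            break
        scanned += 1
        if all(doc in lookup[t] for t in terms):  # AND semantics
            score = global_score[doc] + sum(lookup[t][doc] for t in terms)
            if len(heap) < k:
                heapq.heappush(heap, (score, doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True), scanned


if __name__ == "__main__":
    g = {"d1": 10.0, "d2": 8.0, "d3": 1.0, "d4": 0.5}
    lists = {
        "web": [("d1", 0.5), ("d2", 0.9), ("d3", 1.0), ("d4", 0.2)],
        "rank": [("d1", 0.3), ("d2", 0.1), ("d3", 0.8)],
    }
    results, scanned = top_k_early_stop(lists, g, k=2, max_term_score=1.0)
    print([doc for _, doc in results], scanned)  # ['d1', 'd2'] 2
```

Here the scan stops after 2 of the 3 postings in the shorter list: d3's global score plus the largest possible term contribution (1.0 + 2.0) cannot reach the current second-best score of 9.0.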

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University