Combating Web Spam with TrustRank (2004)

by Zoltán Gyöngyi, Hector Garcia-Molina, Jan Pedersen
Venue: VLDB
Citations: 412 (3 self)

BibTeX

@INPROCEEDINGS{Gyöngyi04combatingweb,
    author = {Zoltán Gyöngyi and Hector Garcia-Molina and Jan Pedersen},
    title = {Combating Web Spam with {TrustRank}},
    booktitle = {VLDB},
    year = {2004},
    pages = {576--587},
    publisher = {Morgan Kaufmann}
}

Abstract

Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
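The abstract describes discovering good pages by following links outward from a small, manually verified seed set, but does not spell out the computation. The sketch below shows one plausible way to do this, assuming a PageRank-style iteration whose restart distribution is concentrated on the seeds. The function name `propagate_trust`, the damping value 0.85, the iteration count, and the toy graph are all illustrative assumptions and are not taken from this page.

```python
from typing import Dict, List, Set


def propagate_trust(
    out_links: Dict[str, List[str]],
    good_seeds: Set[str],
    damping: float = 0.85,   # assumed value, not stated in the abstract
    iterations: int = 20,    # assumed value, not stated in the abstract
) -> Dict[str, float]:
    """Spread trust from hand-verified seed pages along outgoing links."""
    # Collect every page that appears as a source or a link target.
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)

    seeds = good_seeds & pages
    if not seeds:
        raise ValueError("need at least one seed page that appears in the graph")

    # Static seed distribution: uniform over the good seeds, zero elsewhere.
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)

    for _ in range(iterations):
        # Every page restarts from its seed score; non-seeds restart from 0.
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for page, targets in out_links.items():
            if not targets:
                continue  # pages without out-links pass on no trust
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical toy graph: one trusted seed, two pages it reaches,
# and a two-page spam cluster that no trusted page links to.
example_links = {
    "seed": ["a", "b"],
    "a": ["b"],
    "b": [],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
example_trust = propagate_trust(example_links, good_seeds={"seed"})
```

In this toy graph the spam cluster ends with zero trust because no path leads to it from the seed, while `a` and `b` receive a positive share. A real deployment would also need a strategy for choosing which pages to send to the expert, which this sketch leaves out.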

Keyphrases

web spam    good page    higher-than-deserved ranking    world wide web    significant fraction    search engine result    seed selection    small set    link structure    web spam page    various technique    large number    reputable seed page    semi-automatically separate reputable    possible way    good seed    human expert    present result    seed page   
