
Link-Based Characterization and Detection of Web Spam (2006)  (1 citation)
Luca Becchetti, Carlos Castillo, Debora Donato, Stefano Leonardi, Ricardo Baeza-Yates



View or download:
lehigh.edu/2006/becchetti.pdf

From:  lehigh.edu/2006/

Abstract: We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. Using this approach we are able to detect 80.4% of the Web spam in our sample, with only 1.1% of false ...

Cited by:
Link-Based Similarity Search to Fight Web Spam - Benczur, Csalogany, Sarlos (2006)

Active bibliography (related documents):
2.2:   Using Rank Propagation and Probabilistic Counting.. - Becchetti.. (2006)
1.1:   Propagating Trust and Distrust to Demote Web Spam - Baoning Wu Vinay (2006)
0.7:   Generalizing PageRank: Damping Functions for.. - Baeza-Yates, Boldi.. (2006)


BibTeX entry:

L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. Baeza-Yates. Link-based characterization and detection of web spam. In Proceedings of the 2nd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), 2006. http://citeseer.ist.psu.edu/becchetti06linkbased.html

@inproceedings{becchetti06linkbased,
  author = "L. Becchetti and C. Castillo and D. Donato and S. Leonardi and R. Baeza-Yates",
  title = "Link-based characterization and detection of web spam",
  booktitle = "Proceedings of the 2nd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb)",
  year = "2006",
  url = "http://citeseer.ist.psu.edu/becchetti06linkbased.html"
}
Citations (may not include all citations):
500   Experiments with a new boosting algorithm - Freund, Schapire - 1996
372   Modern Information Retrieval - Baeza-Yates, Ribeiro-Neto - 1999
344   The PageRank citation ranking: bringing order to the Web - Page, Brin et al. - 1998
262   Data Mining: Practical Machine Learning Tools and Techniques.. - Witten, Frank - 1999
37   External memory algorithms and data structures - Vitter - 2001
23   ANF: a fast and scalable tool for data mining in massive gra.. - Palmer, Gibbons et al. - 2002
19   Ranking the web frontier - Eiron, Curley et al. - 2004
15   Estimating the size of generalized transitive closures - Lipton, Naughton - 1989
14   Combating web spam with trustrank - Gyongyi, Garcia-Molina et al. - 2004
14   and statistics: Using statistical analysis to locate spam we.. - Fetterly, Manasse et al. - 2004
14   Recognizing nepotistic links on the web - Davison - 2000
11   Web spam taxonomy - Gyongyi, Garcia-Molina - 2005
11   Dimacs Series In Discrete Mathematics And Theoretical Comput.. - Henzinger, Raghavan et al. - 1999
10   Making eigenvector-based reputation systems robust to collus.. - Zhang, Goel et al. - 2004
5   The indexable Web is more than - Gulli, Signorini - 2005
4   Spamrank: fully automatic link spam detection - Benczur, Csalogany et al. - 2005
3   The bubble of web visibility - Gori, Witten - 2005
3   Detecting spam web pages through content analysis - Ntoulas, Najork et al. - 2006
3   Thwarting the nigritude ultramarine: learning to identify li.. - Drost, Sche - 2005
2   Networks of sexual contacts: implications for the pattern of.. - Gupta, Anderson et al. - 1989
2   Discovering large dense subgraphs in massive graphs - Gibson, Kumar et al. - 2005
2   Using rank propagation and probabilistic counting for link-b.. - Becchetti, Castillo et al. - 2006
2   On graph problems in a semi-streaming model - Feigenbaum, Kannan et al. - 2004
1   Characterization of complex networks: A survey of measuremen.. - Costa, Rodrigues et al. - 2005
1   Generalizing PageRank: Damping functions for link-based rank.. - Baeza-Yates, Boldi et al. - 2006
1   space for passes in graph streaming problems - Demetrescu, Finocchi et al. - 2006

Documents on the same site (http://airweb.cse.lehigh.edu/2006/):
Link-Based Similarity Search to Fight Web Spam - Benczur, Csalogany, Sarlos (2006)
Web Spam Detection with Anti-Trust Rank - Vijay Krishnan Stanford (2006)
Adversarial Information Retrieval Aspects of Sponsored Search - Jansen (2006)
