Abstract: We perform a statistical analysis of a large collection of Web
pages, focusing on spam detection. We study several metrics
such as degree correlations, number of neighbors, rank
propagation through links, TrustRank and others to build
several automatic web spam classifiers. This paper presents
a study of the performance of each of these classifiers alone,
as well as their combined performance. Using this approach
we are able to detect 80.4% of the Web spam in our sample,
with only 1.1% of false ...
Cited by:
Link-Based Similarity Search to Fight Web Spam - Benczur, Csalogany, Sarlos (2006)
Active bibliography (related documents):
2.2: Using Rank Propagation and Probabilistic Counting.. - Becchetti.. (2006)
1.1: Propagating Trust and Distrust to Demote Web Spam - Baoning Wu Vinay (2006)
0.7: Generalizing PageRank: Damping Functions for.. - Baeza-Yates, Boldi.. (2006)
BibTeX entry:
L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. Baeza-Yates. Link-based characterization and detection of web spam. In Proceedings of the 2nd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), 2006. http://citeseer.ist.psu.edu/becchetti06linkbased.html
@misc{ becchetti06linkbased,
author = "L. Becchetti and C. Castillo and D. Donato and S. Leonardi and R. Baeza-Yates",
title = "Link-based characterization and detection of web spam",
text = "L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. Baeza-Yates.
Link-based characterization and detection of web spam. In Proceedings of
the 2nd International Workshop on Adversarial Information Retrieval on the
Web (AIRWeb), 2006.",
year = "2006",
url = "citeseer.ist.psu.edu/becchetti06linkbased.html" }
Citations (may not include all citations):
500: Experiments with a new boosting algorithm - Freund, Schapire - 1996
372: Modern Information Retrieval - Baeza-Yates, Ribeiro-Neto - 1999
344: The PageRank citation ranking: bringing order to the Web - Page, Brin et al. - 1998
262: Data Mining: Practical Machine Learning Tools and Techniques.. - Witten, Frank - 1999
37: External memory algorithms and data structures - Vitter - 2001
23: ANF: a fast and scalable tool for data mining in massive gra.. - Palmer, Gibbons et al. - 2002
19: Ranking the web frontier - Eiron, Curley et al. - 2004
15: Estimating the size of generalized transitive closures - Lipton, Naughton - 1989
14: Combating web spam with trustrank - Gyongyi, Garcia-Molina et al. - 2004
14: and statistics: Using statistical analysis to locate spam we.. - Fetterly, Manasse et al. - 2004
14: Recognizing nepotistic links on the web - Davison - 2000
11: Web spam taxonomy - Gyongyi, Garcia-Molina - 2005
11: Dimacs Series In Discrete Mathematics And Theoretical Comput.. - Henzinger, Raghavan et al. - 1999
10: Making eigenvector-based reputation systems robust to collus.. - Zhang, Goel et al. - 2004
5: The indexable Web is more than - Gulli, Signorini - 2005
4: Spamrank: fully automatic link spam detection - Benczur, Csalogany et al. - 2005
3: The bubble of web visibility - Gori, Witten - 2005
3: Detecting spam web pages through content analysis - Ntoulas, Najork et al. - 2006
3: Thwarting the nigritude ultramarine: learning to identify li.. - Drost, Sche - 2005
2: Networks of sexual contacts: implications for the pattern of.. - Gupta, Anderson et al. - 1989
2: Discovering large dense subgraphs in massive graphs - Gibson, Kumar et al. - 2005
2: Using rank propagation and probabilistic counting for link-b.. - Becchetti, Castillo et al. - 2006
2: On graph problems in a semi-streaming model - Feigenbaum, Kannan et al. - 2004
1: Characterization of complex networks: A survey of measuremen.. - Costa, Rodrigues et al. - 2005
1: Generalizing PageRank: Damping functions for link-based rank.. - Baeza-Yates, Boldi et al. - 2006
1: space for passes in graph streaming problems - Demetrescu, Finocchi et al. - 2006
Documents on the same site (http://airweb.cse.lehigh.edu/2006/):
Link-Based Similarity Search to Fight Web Spam - Benczur, Csalogany, Sarlos (2006)
Web Spam Detection with Anti-Trust Rank - Vijay Krishnan Stanford (2006)
Adversarial Information Retrieval Aspects of Sponsored Search - Jansen (2006)