Abstract: We propose new features and algorithms for automating Web-page
classification tasks such as content recommendation and ad blocking.
We show that the automated classification of Web pages can
be much improved if, instead of looking at their textual content, we
consider each link's URL and the visual placement of those links
on a referring page. These features are unusual: rather than being
scalar measurements like word counts, they are tree structured,
describing the position of the item in a...
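To illustrate what a tree-structured feature could look like for a URL, here is a minimal sketch. It is not the authors' algorithm; it only assumes that a URL can be read as a root-to-leaf path (host first, then each path segment). The function names `url_tree_path` and `url_prefix_features` are hypothetical and chosen for this example.

```python
from urllib.parse import urlsplit


def url_tree_path(url):
    """Return node labels from root to leaf: the host, then each path segment.

    Assumption for illustration: the tree is defined by the URL's host and
    slash-separated path; query strings and fragments are ignored.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return [parts.netloc] + segments


def url_prefix_features(url):
    """Return one feature per ancestor node, from the root down to the leaf."""
    path = url_tree_path(url)
    return ["/".join(path[:i]) for i in range(1, len(path) + 1)]
```

Under this reading, two links that share a long prefix sit close together in the tree, which is the kind of positional information a scalar word count cannot express.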
Cited by:
Thresher: Automating the Unwrapping of Semantic Content from.. - Hogue, Karger (2005)
Active bibliography (related documents):
0.7: Classification Techniques for Categorization of Hypertext Documents - Arumugam
0.3: Sentence Extraction by tf/idf and Position Weighting from.. - Seki
0.3: Sentence Alignment for Monolingual Comparable Corpora - Regina Barzilay Cornell (2003)
Similar documents based on text:
0.2: Belief Layer for Haystack - Zhurakhinskaya (2002)
0.1: An Experimental Study of Poly-Logarithmic.. - Iyer, Jr., Karger.. (2000)
0.1: Experimental Study of Minimum Cut Algorithms - Chekuri, Goldberg, Karger.. (1997)
Related documents from co-citation:
3: Annotea: An Open RDF Infrastructure for Shared Web Annotations - Kahan, Koivunen et al. - 2001
3: Wrapper induction for information extraction - Kushmerick, Weld et al. - 1997
3: New Tools for the Semantic Web - Golbeck, Grove et al. - 2002
BibTeX entry:
L. K. Shih and D. Karger. Using URLs and table layout for web classification tasks. In Proceedings of the 13th International Conference on the World Wide Web, pages 193--202, New York, NY, 2004. http://citeseer.ist.psu.edu/shih04using.html
@inproceedings{shih04using,
author = "L. K. Shih and D. Karger",
title = "Using {URLs} and table layout for web classification tasks",
booktitle = "Proceedings of the 13th International Conference on the World Wide Web",
pages = "193--202",
address = "New York, NY",
year = "2004",
url = "http://citeseer.ist.psu.edu/shih04using.html" }
Citations (may not include all citations):
641: The anatomy of a large-scale hypertextual Web search engine - Brin, Page - 1998
431: A tutorial on support vector machines for pattern recognitio.. - Burges - 1998
228: Wrapper induction for information extraction - Kushmerick, Weld et al. - 1997
149: Quantifying inductive bias: AI learning algorithms and Valia.. - Haussler - 1988
135: Hierarchically classifying documents using very few words - Koller, Sahami - 1997
130: A probabilistic analysis of the Rocchio algorithm with tfidf.. - Joachims - 1997
91: Learning and revising user profiles: The identification of i.. - Pazzani, Billsus - 1997
65: On integrating catalogs - Agrawal, Srikant - 2001
61: Improving text classification by shrinkage in a hierarchy of.. - McCallum, Rosenfeld et al. - 1998
49: Wiley and Sons - Duda, Hart et al. - 1973
42: Nonparametric Statistical Methods - Hollander, Wolfe - 1973
41: Using reinforcement learning to spider the Web efficiently - Rennie, McCallum - 1999
26: A hybrid user model for news story classification - Billsus, Pazzani - 1999
19: generative classifiers: A comparison of logistic regression .. - Ng, Jordan et al. - 2002
15: Accelerated focused crawling through online relevance feedba.. - Chakrabarti, Punera et al. - 2002
13: Web montage: a dynamic personalized start page - Anderson, Horvitz - 2002
12: Learning to remove internet advertisement - Kushmerick - 1999
9: httpfive percent nation - http, nation et al. - 2000
3: Inferring strategies for sentence ordering in multidocument .. - Barzilay, Elhadad et al. - 2002
1: Massachusetts Institute of Technology - Shih, of et al. - 2004
Documents on the same site (http://www.www2004.org/proceedings/docs/contents.htm):
A Community-Aware Search Engine - Almeida, Almeida (2004)
HuskySim: A Simulation Toolkit for Application Scheduling.. - Kerasha, Greenshields (2004)
Reactive Rules Inference from Dynamic Dependency Models - Adi, Etzion, Gilat.. (2004)
CiteSeer.IST - Copyright Penn State and NEC