Abstract: In this paper we will discuss some of the more important representational
issues for wrapper learners, focusing on the specific problem of
extracting text from web pages. We argue that pure DOM- or token-based
representations of web pages are inadequate for the purpose of learning
wrappers
Cited by:
Automatic Discovery of Semantic Structures in HTML Documents - Saikat Mukherjee Guizhen
Learning to Extract Entities from Labeled and Unlabeled Text - Jones (2005)
Table Extraction Using Spatial Reasoning on the CSS2.. - Gatterbauer, Bohunsky (2006)
Similar documents (at the sentence level):
13.5%: A Structured Wrapper Induction System for Extracting.. - Cohen, Jensen (2001)
Active bibliography (related documents):
0.4: Improving A Page Classifier with Anchor Extraction And Link Analysis - Cohen (2002)
0.3: Why Table Ground-Truthing is Hard - Hu, Kashi, Lopresti, Nagy, Wilfong (2001)
0.3: Layout Language: Preliminary experiments in assigning logical .. - Hurst, Douglas
Similar documents based on text:
0.1: Small Screen Access to Digital Libraries - Gary Marsden Robert (2001)
0.1: Learning Page-Independent Heuristics for Extracting Data from.. - Cohen, Fan (1999)
0.1: Web-Collaborative Filtering: Recommending Music by Spidering The .. - Cohen, Fan (2000)
Related documents from co-citation:
6: Generating finite-state transducers for semistructured data extraction from the ..
- Hsu, Dung - 1998
6: A hierarchical approach to wrapper induction
- Muslea, Minton et al. - 1999
6: Wrapper induction for information extraction
- Kushmerick, Weld et al. - 1997
BibTeX entry:
William Cohen, Matthew Hurst, and Lee S. Jensen. A flexible learning system for wrapping tables and lists in html documents. In The Eleventh International World Wide Web Conference WWW-2002, 2002. http://citeseer.ist.psu.edu/cohen02flexible.html
@misc{cohen02flexible,
  author = "W. Cohen and M. Hurst and L. Jensen",
  title = "A flexible learning system for wrapping tables and lists in html documents",
  text = "William Cohen, Matthew Hurst, and Lee S. Jensen. A flexible learning system
    for wrapping tables and lists in html documents. In The Eleventh International
    World Wide Web Conference WWW-2002, 2002.",
  year = "2002",
  url = "http://citeseer.ist.psu.edu/cohen02flexible.html"
}
Documents on the same site (http://www-2.cs.cmu.edu/~wcohen/postscript/):
Probabilistic Hill-Climbing - Cohen, Greiner, Dale (1991)
Desiderata for Generalization-to-N Algorithms - Cohen (1994)
Polynomial Learnability and Inductive Logic Programming.. - Cohen, Page, Jr. (1995)