A Flexible Learning System for Wrapping Tables and Lists in HTML Documents (2002)
William W. Cohen, Matthew Hurst, Lee S. Jensen



Abstract: In this paper we will discuss some of the more important representational issues for wrapper learners, focusing on the specific problem of extracting text from web pages. We argue that pure DOM- or token-based representations of web pages are inadequate for the purpose of learning wrappers.

BibTeX entry:

William Cohen, Matthew Hurst, and Lee S. Jensen. A flexible learning system for wrapping tables and lists in HTML documents. In The Eleventh International World Wide Web Conference WWW-2002, 2002. http://citeseer.ist.psu.edu/cohen02flexible.html

@inproceedings{cohen02flexible,
  author = "W. Cohen and M. Hurst and L. Jensen",
  title = "A flexible learning system for wrapping tables and lists in {HTML} documents",
  booktitle = "The Eleventh International World Wide Web Conference (WWW-2002)",
  year = "2002",
  url = "http://citeseer.ist.psu.edu/cohen02flexible.html"
}