Alternate document: Learning Text Analysis Rules for Domain-Specific Natural Language Processing (1997) - Stephen G. Soderland


Learning to Extract Text-based Information from the World Wide Web (1997)  (68 citations)
Stephen Soderland
Knowledge Discovery and Data Mining



View or download:
umass.edu/pubs/SoderlandKDD97.ps
washington.edu/homes/soderl...kdd97.ps

From:  umass.edu/info/psfiles/t...tepubs (more)
From:  jprc.com/Artificial_Intel...index

Abstract: There is a wealth of information to be mined from narrative text on the World Wide Web. Unfortunately, standard natural language processing (NLP) extraction techniques expect full, grammatical sentences, and perform poorly on the choppy sentence fragments that are often found on web pages. This paper introduces Webfoot, a preprocessor that parses web pages into logically coherent segments based on page layout cues. Output from Webfoot is then passed on to CRYSTAL, an NLP system that learns...

Cited by:
Hierarchies in HTML Documents: Linking Text to Concepts - Burget (2004)
LiveClassifier: Creating Hierarchical Text Classifiers.. - Huang, Chuang, Chien (2004)
Wrapper Generation via Grammar Induction - Chidlovskii, Ragetli, de Rijke (2000)

Active bibliography (related documents):
0.3:   Information Extraction: Techniques and Challenges - Grishman (1997)
0.2:   Applying Machine Learning for High Performance.. - Baluja, Mittal.. (2000)
0.2:   Towards Automatic Acquisition of Patterns for Information.. - Nobata, Sekine (1999)

Similar documents based on text:
0.4:   Information Extraction from Tree Documents by Learning Subtree.. - Chidlovskii (2003)
0.1:   The "ebay Of Blank": Digital Intermediation In Electronic.. - Chircu, Kauffman (2000)
0.1:   Maximizing the Value of Internet-Based Corporate Travel.. - Chircu, Kauffman, al. (2000)

Related documents from co-citation:
51:   Wrapper induction for information extraction - Kushmerick, Weld et al. - 1997
31:   A scalable comparison-shopping agent for the world-wide web - Doorenbos, Etzioni et al. - 1997
22:   Wrapper Generation for Semi-structured Internet Sources - Ashish, Knoblock

BibTeX entry:

Stephen Soderland. Learning to extract text-based information from the world wide web. In Proceedings of Third International Conference on Knowledge Discovery and Data Mining (KDD-97), 1997. http://citeseer.ist.psu.edu/soderland97learning.html

@inproceedings{ soderland97learning,
    author = "Stephen Soderland",
    title = "Learning to Extract Text-Based Information from the World Wide Web",
    booktitle = "Knowledge Discovery and Data Mining",
    pages = "251-254",
    year = "1997",
    url = "citeseer.ist.psu.edu/soderland97learning.html" }
Citations (may not include all citations):
392   A Theory and Methodology of Inductive Learning - Michalski - 1983
233   The CN2 Induction Algorithm - Clark, Niblett - 1989
228   Wrapper Induction for Information Extraction - Kushmerick, Weld et al. - 1997
171   A Scalable Comparison-Shopping Agent for the World-Wide Web - Doorenbos, Etzioni et al. - 1997
79   CRYSTAL: Inducing a Conceptual Dictionary - Soderland, Fisher et al. - 1995
19   The NYU System for MUC-6 or Where's the Syntax - Grishman - 1995
9   BBN: Description of the PLUM System as Used for MUC - Weischedel - 1995
7   SRA: Description of the SRA System as Used for MUC - Krupka - 1995





Documents on the same site (http://ciir.cs.umass.edu/info/psfiles/tepubs/tepubs.html):
Automatically Learned vs. Hand-crafted Text Analysis Rules - Soderland, Fisher, Lehnert
Description of the UMass System as Used for MUC-6 - Fisher, Soderland.. (1995)
Machine Learning of Text Analysis Rules for Clinical.. - Soderland, Aronow.. (1995)
