Alternate document: Learning Text Analysis Rules for Domain-Specific Natural Language Processing (1997), Stephen G. Soderland
Abstract: There is a wealth of information to be mined from narrative text on the World Wide Web. Unfortunately, standard natural language processing (NLP) extraction techniques expect full, grammatical sentences and perform poorly on the choppy sentence fragments that are often found on web pages. This paper introduces Webfoot, a preprocessor that parses web pages into logically coherent segments based on page layout cues. Output from Webfoot is then passed on to CRYSTAL, an NLP system that learns...
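The abstract says only that Webfoot segments pages "based on page layout cues". As a rough illustration of the idea, here is a minimal Python sketch that assumes the cues are block-level HTML tags (headings, paragraphs, list items, table cells, line breaks) and closes a text segment at each one. The tag set and the names `LayoutSegmenter` and `segment_page` are hypothetical; they are not Webfoot's actual rules.

```python
from html.parser import HTMLParser

# Hypothetical layout cues: tags assumed to start or end a visual block.
# Webfoot's real cue set is not described in the abstract.
BREAK_TAGS = {"p", "br", "hr", "li", "tr", "td", "th", "div",
              "h1", "h2", "h3", "h4", "h5", "h6", "title"}


class LayoutSegmenter(HTMLParser):
    """Hypothetical segmenter: collects text, closing a segment at each layout cue."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._buffer = []

    def _flush(self):
        # Join buffered text fragments, collapse whitespace, keep non-empty text.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Inline tags such as <b> are not cues, so their text joins the current segment.
        self._buffer.append(data)

    def close(self):
        # Let the base parser emit any pending text, then close the last segment.
        super().close()
        self._flush()


def segment_page(html: str) -> list[str]:
    """Split an HTML page into text segments at the assumed layout cues."""
    parser = LayoutSegmenter()
    parser.feed(html)
    parser.close()
    return parser.segments
```

For example, `segment_page("<li>Programmer, Seattle<li>Analyst, Boston")` returns `["Programmer, Seattle", "Analyst, Boston"]`. Each segment could then be handed to a downstream extractor such as CRYSTAL; a practical preprocessor would also need to skip `script` and `style` content and cope with layout that is expressed through CSS rather than tags.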
Cited by:
Hierarchies in HTML Documents: Linking Text to Concepts - Burget (2004)
LiveClassifier: Creating Hierarchical Text Classifiers.. - Huang, Chuang, Chien (2004)
Wrapper Generation via Grammar Induction - Chidlovskii, Ragetli, de Rijke (2000)
Active bibliography (related documents):
0.3: Information Extraction: Techniques and Challenges - Grishman (1997)
0.2: Applying Machine Learning for High Performance.. - Baluja, Mittal.. (2000)
0.2: Towards Automatic Acquisition of Patterns for Information.. - Nobata, Sekine (1999)
Similar documents based on text:
0.4: Information Extraction from Tree Documents by Learning Subtree.. - Chidlovskii (2003)
0.1: The "eBay of Blank": Digital Intermediation in Electronic.. - Chircu, Kauffman (2000)
0.1: Maximizing the Value of Internet-Based Corporate Travel.. - Chircu, Kauffman et al. (2000)
Related documents from co-citation:
51: Wrapper induction for information extraction - Kushmerick, Weld et al. (1997)
31: A scalable comparison-shopping agent for the world-wide web - Doorenbos, Etzioni et al. (1997)
22: Wrapper Generation for Semi-structured Internet Sources - Ashish, Knoblock
BibTeX entry:
Stephen Soderland. Learning to extract text-based information from the world wide web. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), 1997. http://citeseer.ist.psu.edu/soderland97learning.html
@inproceedings{ soderland97learning,
author = "Stephen Soderland",
title = "Learning to Extract Text-Based Information from the World Wide Web",
booktitle = "Knowledge Discovery and Data Mining",
pages = "251--254",
year = "1997",
url = "citeseer.ist.psu.edu/soderland97learning.html" }
Citations (may not include all citations):
392: A Theory and Methodology of Inductive Learning - Michalski (1983)
233: The CN2 Induction Algorithm - Clark, Niblett (1989)
228: Wrapper Induction for Information Extraction - Kushmerick, Weld et al. (1997)
171: A Scalable Comparison-Shopping Agent for the World-Wide Web - Doorenbos, Etzioni et al. (1997)
79: CRYSTAL: Inducing a Conceptual Dictionary - Soderland, Fisher et al. (1995)
19: The NYU System for MUC-6 or Where's the Syntax - Grishman (1995)
9: BBN: Description of the PLUM System as Used for MUC - Weischedel (1995)
7: SRA: Description of the SRA System as Used for MUC - Krupka (1995)
Documents on the same site (http://ciir.cs.umass.edu/info/psfiles/tepubs/tepubs.html):
Automatically Learned vs. Hand-crafted Text Analysis Rules - Soderland, Fisher, Lehnert
Description of the UMass System as Used for MUC-6 - Fisher, Soderland.. (1995)
Machine Learning of Text Analysis Rules for Clinical.. - Soderland, Aronow.. (1995)
CiteSeer.IST - Copyright Penn State and NEC