
Generalizing wrapper induction across multiple Web sites using named entities and post-processing (2003)
Georgios Sigletos, Georgios Paliouras, Constantine D. Spyropoulos, Michalis Hatzopoulos



View or download:
iit.demokritos.gr/~pali...EWMF2003a.pdf

From:  faure.isti.cnr....CP2(Google)pdf

Abstract: This paper presents a novel method for extracting information from collections of Web pages across different sites. Our method uses a standard wrapper induction algorithm and exploits named entity information. We introduce the idea of post-processing the extraction results to resolve ambiguous facts and improve the overall extraction performance. Post-processing involves the exploitation of two additional sources of information: fact transition probabilities, based on a trained bigram...
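
As a rough illustration of the fact transition probabilities mentioned in the abstract, the sketch below estimates bigram transition probabilities between fact labels from annotated label sequences using plain maximum-likelihood counts, then uses them to pick between candidate labels for an ambiguous fact. The label names, the training sequences, and the functions `bigram_transition_probs` and `resolve` are hypothetical. The abstract is truncated here, so the paper's actual model, any smoothing it applies, and its second source of information are not represented.

```python
from collections import Counter, defaultdict


def bigram_transition_probs(sequences):
    """Estimate P(next_label | label) by maximum likelihood from label sequences."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent (previous, next) label pair in the sequence.
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[prev][nxt] += 1
    probs = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        probs[prev] = {nxt: n / total for nxt, n in counter.items()}
    return probs


def resolve(prev_label, candidates, probs):
    """Pick the candidate label most likely to follow prev_label (unseen pairs score 0)."""
    return max(candidates, key=lambda c: probs.get(prev_label, {}).get(c, 0.0))


# Hypothetical fact-label sequences, one per annotated page (assumed data).
training = [
    ["model", "processor", "ram", "price"],
    ["model", "ram", "price"],
    ["model", "processor", "price"],
]
probs = bigram_transition_probs(training)

# An extracted fact following a "ram" fact is ambiguous between two labels.
choice = resolve("ram", ["processor", "price"], probs)
```

In this toy setting, a fact that could be either `processor` or `price` and that follows a `ram` fact is resolved in favour of whichever label has the higher estimated transition probability from `ram`.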

Active bibliography (related documents):
1.3:   Mining Web sites using wrapper induction, named.. - Sigletos.. (2003)
0.3:   On Precision and Recall of Multi-Attribute Data.. - Yang, Mukherjee.. (2003)
0.2:   Active Learning with Strong and Weak Views: A Case Study .. - Muslea, Minton, Knoblock (2003)

Similar documents based on text:
0.5:   Meta-learning beyond classification: A framework.. - Sigletos.. (2003)

BibTeX entry:

@misc{ sigletos-generalizing,
  author = "Georgios Sigletos and Georgios Paliouras and Constantine D. Spyropoulos
    and Michalis Hatzopoulos",
  title = "Generalizing wrapper induction across multiple Web sites using named entities
    and post-processing",
  url = "citeseer.ist.psu.edu/sigletos03generalizing.html" }
Citations (may not include all citations):
1362   A tutorial on hidden Markov models and selected applications.. - Rabiner - 1989
228   Wrapper induction for Information Extraction - Kushmerick - 1997
139   Machine Learning in Automated Text Categorization - Sebastiani - 2002
51   Learning stochastic regular grammars by means of a state-mer.. - Carrasco, Oncina - 1994
50   Learning hidden Markov model structure for information extra.. - Seymore, McCallum et al. - 1999
50   Hierarchical Wrapper Induction for Semistructured Informatio.. - Muslea, Minton et al. - 2001
46   Adaptive Information Extraction from Text by Rule Induction .. - Ciravegna - 2001
19   A Flexible Learning System for Wrapping Tables and Lists in .. - Cohen, Hurst et al. - 2002
15   Learning page-independent heuristics for extracting data fro.. - Cohen, Fan - 1999
14   Information Extraction using HMMs and Shrinkage - Freitag, McCallum - 1999
5   Active Learning with multiple views - Muslea - 2002
3   Extraction Techniques for Mining Services from Web Sources - Davulcu, Mukherjee et al. - 2002
2   Annotating Web pages for the needs of Web Information Extrac.. - Sigletos, Farmakiotou et al. - 2003
http://www.itl.nist.gov/iaui/894.02/related_pr



CiteSeer.IST - Copyright Penn State and NEC