
Automatic Web News Extraction Using Tree Edit Distance (2004)  (5 citations)
Davi De Castro Reis, Paulo B. Golgher, Alberto H.F. Laender, Altigran S. da Silva




 
View or download:
www2004.org/proceedings/doc...1p502.pdf

From:  www2004.org/proceeding...contents

Abstract: The Web poses itself as the largest data repository ever available in the history of humankind. Major efforts have been made in order to provide efficient access to relevant information within this huge repository of data. Although several techniques have been developed for the problem of Web data extraction, their use is still not widespread, mostly because of the need for high human intervention and the low quality of the extraction results. In this paper, we present a domain-oriented approach to...

Cited by:
Thresher: Automating the Unwrapping of Semantic - Content From The (2005)
Template Detection for Large Scale Search Engines - Chen, Ye, Li (2006)
The Anatomy of a News Search Engine - Gulli (2005)

Active bibliography (related documents):
0.8:   Classification Techniques for Categorization of Hypertext Documents - Arumugam
0.5:   Fine-grain Web Site Structure Discovery - Valter Crescenzi
0.5:   Tree Edit Distance, Alignment Distance and Inclusion - Bille (2003)

Similar documents based on text:
0.7:   CoBWeb - A Crawler for the Brazilian Web - Silva, Veloso, Golgher.. (1999)
0.5:   CAPPLES - A Capacity Planning and Performance.. - Silva, Laender.. (1999)
0.5:   Characterizing a Synthetic Workload for Performance.. - Silva, Laender.. (2000)

Related documents from co-citation:
3:   Sticky notes for the semantic web - Karger, Katz et al. - 2003
3:   New Tools for the Semantic Web - Golbeck, Grove et al. - 2002
3:   Tree pattern inference and matching for wrapper induction on the World Wide Web - Hogue

BibTeX entry:

D. C. Reis, P. B. Golgher, A. S. Silva, and A. F. Laender. Automatic web news extraction using tree edit distance. In Proceedings of the 13th International Conference on the World Wide Web, pages 502--511, New York, NY, 2004. http://citeseer.ist.psu.edu/reis04automatic.html

@misc{ reis04automatic,
  author = "D. Reis and P. Golgher and A. Silva and A. Laender",
  title = "Automatic web news extraction using tree edit distance",
  text = "D. C. Reis, P. B. Golgher, A. S. Silva, and A. F. Laender. Automatic web
    news extraction using tree edit distance. In Proceedings of the 13th International
    Conference on the World Wide Web, pages 502--511, New York, NY, 2004.",
  year = "2004",
  url = "citeseer.ist.psu.edu/reis04automatic.html" }
Citations (may not include all citations):
372   Modern Information Retrieval - Baeza-Yates, Ribeiro-Neto - 1999
198   Database techniques for the world-wide web: a survey - Florescu, Levy et al. - 1998
64   extensible web crawler - Heydon, Najork et al. - 1999
57   The tree-to-tree correction problem - Tai - 1979
48   RoadRunner: Towards automatic data extraction from large Web.. - Crescenzi, Mecca et al. - 2001
46   Recent trends in hierarchic document clustering: a critical .. - Willett - 1988
44   Xtract: a system for extracting document type descriptors fr.. - Garofalakis, Gionis et al. - 2000
36   Mining data records in web pages - Liu, Grossman et al. - 2003
28   The tree-to-tree editing problem - Selkow - 1977
25   A brief survey of Web data extraction tools - Laender, Ribeiro-Neto et al. - 2002
25   the editing distance between unordered labeled trees - Zhang, Statman et al. - 1992
24   Extracting structured data from web pages - Arasu, Garcia-Molina et al. - 2003
23   Approximate tree matching in the presence of variable length.. - Zhang, Shasha et al. - 1994
17   Identifying syntactic differences between two programs - Yang - 1991
14   Comparing hierarchical data in external memory - Chawathe - 1999
7   Evaluating structural similarity in XML documents - Nierman, Jagadish - 2002
6   An algorithm for finding the largest approximately common su.. - Wang, Shapiro et al. - 1998
3   New algorithm for ordered tree-to-tree correction problem - Chen - 2001
2   Wrapping-oriented classification of Web pages - Crescenzi, Mecca et al. - 2002
2   Changedetector : a site-level monitoring tool for the WWW - Boyapati, Chevrier et al. - 2002
1   the complexity of schema inference from web pages in the pre.. - Yang, Ramakrishnan et al. - 2003
1   Efficient extraction of schemas for xml documents - Min, Ahn et al. - 2003
1   Automatic annotation of data extraction from large Web sites - Arlotta, Crescenzi et al. - 2003
1   Finding similar consensus between trees: an algorithm and a .. - Wang, Zhang - 2001
1   An efficient bottom-up distance between trees - Valiente - 2001
1   Tree edit distance and common subtrees - Valiente - 2002





Documents on the same site (http://www.www2004.org/proceedings/docs/contents.htm):
A Community-Aware Search Engine - Almeida, Almeida (2004)
HuskySim: A Simulation Toolkit for Application Scheduling.. - Kerasha, Greenshields (2004)
Reactive Rules Inference from Dynamic Dependency Models - Adi, Etzion, Gilat.. (2004)


CiteSeer.IST - Copyright Penn State and NEC