Abstract: The Web presents itself as the largest data repository ever available in
the history of humankind. Major efforts have been made
to provide efficient access to relevant information within this huge
repository of data. Although several techniques have been developed
for the problem of Web data extraction, their use is still not
widespread, mostly because of the need for high human intervention and
the low quality of the extraction results. In this paper, we present
a domain-oriented approach to...
Cited by:
Thresher: Automating the Unwrapping of Semantic - Content From The (2005)
Template Detection for Large Scale Search Engines - Chen, Ye, Li (2006)
The Anatomy of a News Search Engine - Gulli (2005)
Active bibliography (related documents):
0.8: Classification Techniques for Categorization of Hypertext Documents - Arumugam
0.5: Fine-grain Web Site Structure Discovery - Valter Crescenzi Crescenz
0.5: Tree Edit Distance, Alignment Distance and Inclusion - Bille (2003)
Similar documents based on text:
0.7: CoBWeb - A Crawler for the Brazilian Web - Silva, Veloso, Golgher.. (1999)
0.5: CAPPLES - A Capacity Planning and Performance.. - Silva, Laender.. (1999)
0.5: Characterizing a Synthetic Workload for Performance.. - Silva, Laender.. (2000)
Related documents from co-citation:
3: Sticky notes for the semantic web - Karger, Katz et al. - 2003
3: New Tools for the Semantic Web - Golbeck, Grove et al. - 2002
3: Tree pattern inference and matching for wrapper induction on the World Wide Web - Hogue
BibTeX entry:
D. C. Reis, P. B. Golgher, A. S. Silva, and A. F. Laender. Automatic web news extraction using tree edit distance. In Proceedings of the 13th International Conference on the World Wide Web, pages 502--511, New York, NY, 2004. http://citeseer.ist.psu.edu/reis04automatic.html
@inproceedings{ reis04automatic,
author = "D. C. Reis and P. B. Golgher and A. S. Silva and A. F. Laender",
title = "Automatic web news extraction using tree edit distance",
booktitle = "Proceedings of the 13th International Conference on the World Wide Web",
pages = "502--511",
address = "New York, NY",
year = "2004",
url = "http://citeseer.ist.psu.edu/reis04automatic.html" }
Citations (may not include all citations):
372: Modern Information Retrieval - Baeza-Yates, Ribeiro-Neto - 1999
198: Database techniques for the world-wide web: a survey - Florescu, Levy et al. - 1998
64: extensible web crawler - Heydon, Najork et al. - 1999
57: The tree-to-tree correction problem - Tai - 1979
48: RoadRunner: Towards automatic data extraction from large Web.. - Crescenzi, Mecca et al. - 2001
46: Recent trends in hierarchic document clustering: a critical .. - Willett - 1988
44: Xtract: a system for extracting document type descriptors fr.. - Garofalakis, Gionis et al. - 2000
36: Mining data records in web pages - Liu, Grossman et al. - 2003
28: The tree-to-tree editing problem - Selkow - 1977
25: A brief survey of Web data extraction tools - Laender, Ribeiro-Neto et al. - 2002
25: the editing distance between unordered labeled trees - Zhang, Statman et al. - 1992
24: Extracting structured data from web pages - Arasu, Garcia-Molina et al. - 2003
23: Approximate tree matching in the presence of variable length.. - Zhang, Shasha et al. - 1994
17: Identifying syntactic differences between two programs - Yang - 1991
14: Comparing hierarchical data in external memory - Chawathe - 1999
7: Evaluating structural similarity in XML documents - Nierman, Jagadish - 2002
6: An algorithm for finding the largest approximately common su.. - Wang, Shapiro et al. - 1998
3: New algorithm for ordered tree-to-tree correction problem - Chen - 2001
2: Wrapping-oriented classification of Web pages - Crescenzi, Mecca et al. - 2002
2: Changedetector: a site-level monitoring tool for the WWW - Boyapati, Chevrier et al. - 2002
1: the complexity of schema inference from web pages in the pre.. - Yang, Ramakrishnan et al. - 2003
1: Efficient extraction of schemas for xml documents - Min, Ahn et al. - 2003
1: Automatic annotation of data extraction from large Web sites - Arllota, Crescenzi et al. - 2003
1: Finding similar consensus between trees: an algorithm and a .. - Wang, Zhang - 2001
1: An efficient bottom-up distance between trees - Valiente - 2001
1: Tree edit distance and common subtrees - Valiente - 2002
Documents on the same site (http://www.www2004.org/proceedings/docs/contents.htm):
A Community-Aware Search Engine - Almeida, Almeida (2004)
HuskySim: A Simulation Toolkit for Application Scheduling.. - Kerasha, Greenshields (2004)
Reactive Rules Inference from Dynamic Dependency Models - Adi, Etzion, Gilat.. (2004)