Web Ecology: Recycling HTML pages as XML documents using W4F (1999)

by Arnaud Sahuguet, Fabien Azavant
In ACM SIGMOD Workshop on the Web and Databases (WebDB)
http://www-rocq.inria.fr/~cluet/WEBDB/sahuguet.ps

Abstract:

In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit for generating wrappers for Web data sources. Key features of W4F are an expressive language for extracting information from HTML pages in a structured way, a mapping for exporting it as XML documents, and visual tools that assist the user during wrapper creation. Moreover, the entire description of a wrapper is fully declarative. As an illustration, we demonstrate how to use W4F to create XML gateways that transparently serve HTML pages on the fly as XML documents with their DTDs.
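
The two-stage idea in the abstract (extract values from an HTML page, then map them to an XML document) can be illustrated with a minimal sketch. This is not W4F and does not use its extraction language or its Java API; it is an assumed stand-in written with Python's standard library only. The class `TitleAndLinks`, the function `html_to_xml`, the `page`/`title`/`link` element names, and the sample page are all hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TitleAndLinks(HTMLParser):
    """Extraction step (hypothetical): collect the page title and its hyperlinks."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []        # list of (href, anchor text) pairs
        self._in_title = False
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def html_to_xml(html):
    """Mapping step (hypothetical): export the extracted values as XML text."""
    parser = TitleAndLinks()
    parser.feed(html)
    parser.close()
    root = ET.Element("page")
    ET.SubElement(root, "title").text = parser.title.strip()
    for href, text in parser.links:
        link = ET.SubElement(root, "link", href=href)
        link.text = text
    return ET.tostring(root, encoding="unicode")


# Invented sample page, used only to exercise the sketch.
SAMPLE = (
    "<html><head><title>Demo</title></head>"
    "<body><a href='/a'>First</a> <a href='/b'>Second</a></body></html>"
)
print(html_to_xml(SAMPLE))
```

In this sketch the extraction rules are hard-coded in the parser class. The abstract describes W4F wrappers as fully declarative, so a closer analogue would read the extraction rules and the XML mapping from a separate specification rather than from code.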
