by Arnaud Sahuguet, Fabien Azavant
In ACM SIGMOD Workshop on the Web and Databases (WebDB
http://www-rocq.inria.fr/~cluet/WEBDB/sahuguet.ps
Abstract:
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to extract information from HTML pages in a structured way, a mapping to export it as XML documents, and some visual tools to assist the user during wrapper creation. Moreover, the entire description of wrappers is fully declarative. As an illustration, we demonstrate how to use W4F to create XML gateways that transparently serve HTML pages on the fly as XML documents with their DTDs.