Wrapit: Automated integration of web databases with extensional overlaps (2003)

by M Neiling, M Schaal, M Schumann

Results 1 - 6 of 6

A Survey on Information Systems Interoperability

by Renato Fileto, Claudia Bauzer Medeiros , 2003
Cited by 3 (0 self)
The interoperability of information systems has been pursued for a long time and is in even greater demand in the Internet era. This paper reviews the literature in this area from the database perspective. It covers work on interconnection of databases, classification of data integration problems, major standards and architectures, and the most recent developments in the fields of the semantic Web, Web services and scientific workflows.

Visual Interfaces for the Development of Event-based Web Agents in the IRobot System

by Liangyou Chen
Abstract. Timely integration and analysis of information from the World-Wide Web is important for businesses and startups, especially to cope with increasing global competition. Manual Web data analysis is time-consuming and error-prone, while developing automated algorithms is nontrivial even for expert Web programmers. We argue that an event-based architecture that couples Web-agent technology with visual-programming methodology is ideal for Web-data analysis. We demonstrate this with the IRobot system, which uses visual interfaces to facilitate the design of event-based intelligent Web agents for data integration. The system is effective both in its ability to capture human-oriented Web procedures and in its flexibility in modeling complex agents that meet human goals. Keywords: Web computing, rule-based systems, Web agents, data integration, Web automation

Citation Context

...trying to address this issue. The research community proposes the use of intelligent agent systems to support autonomous data integration from the Web, e.g., Michalowski et al. [1] and Neiling et al. [2]. Industrial solutions typically use special libraries in a hosting programming language for Web automation. Examples include cURL [3] in PHP, and Scrapy [4] in Python. Utilizing these technologies st...

editors: prof.dr. P.M.E. De Bra, prof.dr.ir. J.J. van Wijk

by unknown authors, 2004
Reports are available at:

Citation Context

...n trainings through examples as [19], or use heuristics as [31]. Others are based on patterns like Omini, Xwrap [3, 18], Lixto [17], Xtros [29]; or on extracting the data from HTML tables or lists as [12, 16]. According to its performance, you can distinguish among applications based on searching text information between marks (LR, string wrappers) or based on analysing the tree structure proper of the XH...
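The citation context above distinguishes string-based LR wrappers, which locate each value between a left and a right delimiter string, from wrappers that analyse the document tree. A minimal Python sketch of an LR wrapper follows, assuming the (left, right) delimiter pair for each attribute is already known; learning those delimiters is out of scope here, and the function name `lr_wrap` and the sample page are hypothetical.

```python
from typing import List, Sequence, Tuple


def lr_wrap(page: str, delimiters: Sequence[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Extract tuples from `page` using one (left, right) delimiter pair per attribute.

    Attributes are assumed to appear in the given order inside every item.
    Scanning stops at the first attribute whose delimiters cannot be found;
    a partially read item is discarded.
    """
    tuples: List[Tuple[str, ...]] = []
    pos = 0
    while True:
        row = []
        for left, right in delimiters:
            start = page.find(left, pos)
            if start == -1:
                return tuples
            start += len(left)              # value begins right after the left delimiter
            end = page.find(right, start)
            if end == -1:
                return tuples
            row.append(page[start:end])
            pos = end + len(right)          # continue scanning after the right delimiter
        tuples.append(tuple(row))


page = "<li><b>Book A</b> <i>12.50</i></li><li><b>Book B</b> <i>8.00</i></li>"
print(lr_wrap(page, [("<b>", "</b>"), ("<i>", "</i>")]))
# [('Book A', '12.50'), ('Book B', '8.00')]
```

Because the wrapper only knows delimiter strings, any change to the surrounding markup silently breaks it, which is the weakness tree-based approaches try to avoid.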

Contents

by Dr. Susanne Busse, Thomas Kabisch, Ralf Petzschmann, Forschungsberichte der Fakultät IV, MIWeb E-Learning, 2005
Abstract not found

Citation Context

...ew on this issue. We follow a grammar-based paradigm [Kab03]. In the MIWeb system we defined the grammar manually. In future we want to generate it automatically from the model, similar to [CMM01] or [NSS02]. Query Serialization / Result Integration In some cases a query needs to be divided into multiple source queries in order to get all results, for example if • the source is restricted in its operatio...

Providing Robust Access to Data in Web Pages

by Jerome Robinson, 2004
Much useful e-commerce information is available on web pages, especially those created by queries to web servers. The problem for programs that use this information is how to "screen-scrape" the data off the web page into machine-usable data structures. Wrappers for web data sources use knowledge of the page layout to extract data accurately, so they fail if the page format changes. This paper describes a fast method for wrapper production and also a method to automatically detect a page format change before it causes data access to fail. The method works for pages that contain collections of items, such as lists, tables and hierarchical structures. It uses a representation of HTML documents that makes repetitive features apparent. This provides fully automatic wrapper production for a class of web pages, and rapid interactive production for others.
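The abstract does not spell out its document representation, so the following is only an illustrative sketch of the general idea of detecting a layout change from repetitive structure, not Robinson's method. It assumes a page's layout can be summarized by the set of root-to-element tag paths that occur more than once (typically the repeated item rows); if that set differs from the one recorded when the wrapper was built, the page is flagged before extraction runs. All names (`TagPathCounter`, `layout_signature`, `format_changed`) are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagPathCounter(HTMLParser):
    """Counts root-to-element tag paths such as 'html/body/ul/li'."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.paths: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths["/".join(self.stack)] += 1
        if tag in VOID_TAGS:          # void elements never get an end tag
            self.stack.pop()

    def handle_endtag(self, tag):
        if tag in self.stack:         # ignore stray end tags
            while self.stack.pop() != tag:
                pass                  # also closes any unclosed inner tags


def layout_signature(html: str) -> frozenset:
    """Set of tag paths that repeat, i.e. the structure of the item rows."""
    parser = TagPathCounter()
    parser.feed(html)
    parser.close()
    return frozenset(p for p, n in parser.paths.items() if n >= 2)


def format_changed(stored: frozenset, html: str) -> bool:
    """True if the repeated structure differs from the stored signature."""
    return layout_signature(html) != stored


OLD = "<html><body><ul><li>a</li><li>b</li></ul></body></html>"
stored = layout_signature(OLD)
print(format_changed(stored, "<html><body><ul><li>c</li><li>d</li><li>e</li></ul></body></html>"))  # False
print(format_changed(stored, "<html><body><table><tr><td>a</td></tr><tr><td>b</td></tr></table></body></html>"))  # True
```

Comparing the set of repeated paths rather than their counts means that a result page with more or fewer items than before is not reported as a format change; only a change in the structure of the items is.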

Wrapping of Web Sources with restricted Query Interfaces by Query Tunneling

by Thomas Kabisch, Mattis Neiling
Information sources in the World Wide Web usually offer two different schemas to their users: an Interface Schema, which the user can query, and a Result Schema, which the user can browse. Often the Interface Schema is more restricted than the Result Schema; moreover, many sources offer only keyword-search interfaces. Thus the query capabilities of such sources are very limited, and a useful integration into a mediator-based information system that relies on query capabilities is almost impossible. We propose the Query Tunneling Architecture for wrapping these restricted web sources. Wrapping sources by Query Tunneling hides restrictive query interfaces and makes such sources fully queryable based on their result schema. The process of Query Tunneling is divided into two main steps: Query Relaxation, to make a higher-order query suitable for a restricted interface, and Result Restriction, to filter the results using the original query.
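The two steps of Query Tunneling can be illustrated with a small Python sketch; this is not the authors' implementation. It assumes a hypothetical source that accepts a single keyword and whose keyword search covers the attribute that keyword is taken from, so the relaxed query returns a superset of the correct answers. `Predicate`, `relax`, `restrict`, `tunnel` and the toy catalog are illustrative names and data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Predicate:
    attribute: str
    op: str          # one of "=", "<", ">", "contains"
    value: object

    def holds(self, record: Record) -> bool:
        v = record.get(self.attribute)
        if v is None:
            return False
        if self.op == "=":
            return v == self.value
        if self.op == "<":
            return v < self.value
        if self.op == ">":
            return v > self.value
        if self.op == "contains":
            return str(self.value).lower() in str(v).lower()
        raise ValueError(f"unsupported operator {self.op!r}")


def relax(query: List[Predicate]) -> str:
    """Query Relaxation: map a conjunctive query onto a keyword-only interface."""
    keywords = [str(p.value) for p in query
                if p.op in ("=", "contains") and isinstance(p.value, str)]
    if not keywords:
        raise ValueError("no predicate can be expressed as a keyword")
    return keywords[0]   # the hypothetical source accepts only one keyword


def restrict(query: List[Predicate], results: List[Record]) -> List[Record]:
    """Result Restriction: keep only records satisfying the original query."""
    return [r for r in results if all(p.holds(r) for p in query)]


def tunnel(query: List[Predicate], keyword_search: Callable[[str], List[Record]]) -> List[Record]:
    return restrict(query, keyword_search(relax(query)))


# Hypothetical keyword-only source over a toy catalog.
CATALOG = [
    {"title": "Database Systems", "year": 2001, "price": 40.0},
    {"title": "Web Databases", "year": 1999, "price": 25.0},
    {"title": "Web Mining", "year": 2003, "price": 30.0},
]


def toy_keyword_search(keyword: str) -> List[Record]:
    k = keyword.lower()
    return [r for r in CATALOG if k in str(r["title"]).lower()]


query = [Predicate("title", "contains", "web"), Predicate("year", ">", 2000)]
print(tunnel(query, toy_keyword_search))
# [{'title': 'Web Mining', 'year': 2003, 'price': 30.0}]
```

The year predicate cannot be sent to the keyword interface at all, so it is enforced only in the restriction step; the correctness of the approach rests on the relaxed query never losing a matching record.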

Citation Context

...gives a good overview to this issue. We follow a grammar-based paradigm [15]. At the time the grammar has been developed by hand, but in future we plan to adopt an automated solution, similar to [10], [20]. Query Serialization/Result Integration The Query Serialization component is responsible for splitting of one query into multiple queries which may be caused by different reasons: Reduced Operator Se...
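As one illustration of the reduced-operator-set case mentioned in this citation context, the sketch below assumes a hypothetical keyword source with no OR operator: a disjunctive query is serialized into one source query per disjunct, and result integration merges the answers while removing duplicates by a caller-supplied key. The function name `serialize_disjunction` and the sample index are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List

Record = Dict[str, object]


def serialize_disjunction(keywords: Iterable[str],
                          search: Callable[[str], List[Record]],
                          key: Callable[[Record], Hashable]) -> List[Record]:
    """Answer 'k1 OR k2 OR ...' against a source that lacks an OR operator.

    Issues one source query per disjunct and merges the results in order,
    keeping only the first occurrence of each record (as identified by `key`).
    """
    seen = set()
    merged: List[Record] = []
    for kw in keywords:
        for record in search(kw):
            k = key(record)
            if k not in seen:
                seen.add(k)
                merged.append(record)
    return merged


# Hypothetical source: a keyword index that answers one keyword per query.
INDEX = {
    "xml": [{"id": 1, "title": "XML Wrappers"}, {"id": 2, "title": "XML and Web Sources"}],
    "web": [{"id": 2, "title": "XML and Web Sources"}, {"id": 3, "title": "Web Mediators"}],
}

result = serialize_disjunction(["xml", "web"], lambda kw: INDEX.get(kw, []), key=lambda r: r["id"])
print([r["id"] for r in result])  # [1, 2, 3]
```

Deduplication needs a key because the same record can be returned by several disjuncts; here an `id` field is assumed to identify records across source queries.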


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University