Results 1 - 10 of 33
DBpedia: A Nucleus for a Web of Open Data
- In 6th Int’l Semantic Web Conference, Busan, Korea, 2007
Cited by 203 (19 self)
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.
Piazza: Data Management Infrastructure for Semantic Web
- 2003
Cited by 134 (11 self)
The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies.
Statistical Schema Matching across Web Query Interfaces
- In SIGMOD Conference, 2003
Cited by 107 (18 self)
Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, the problem of matching multiple schemas has essentially relied on finding pairwise-attribute correspondence. This paper proposes a different approach, motivated by integrating large numbers of data sources on the Internet. On this "deep Web," we observe two distinguishing characteristics that offer a new view for considering schema matching: First, as the Web scales, there are ample sources that provide structured information in the same domains (e.g., books and automobiles). Second, while sources proliferate, their aggregate schema vocabulary tends to converge at a relatively small size. Motivated by these observations, we propose a new paradigm, statistical schema matching: Unlike traditional approaches using pairwise-attribute correspondence, we take a holistic approach to match all input schemas by finding an underlying generative schema model. We propose a general statistical framework MGS for such hidden model discovery, which consists of hypothesis modeling, generation, and selection. Further, we specialize the general framework to develop Algorithm MGSsd, targeting at synonym discovery, a canonical problem of schema matching, by designing and discovering a model that specifically captures synonym attributes. We demonstrate our approach over hundreds of real Web sources in four domains and the results show good accuracy.
Semantic integration research in the database community: A brief survey
- AI Magazine, 2005
Cited by 75 (4 self)
Semantic integration has been a long-standing challenge for the database community. It has received steady attention over the past two decades, and has now become a prominent area of database research. In this article, we first review database applications that require semantic integration, and discuss the difficulties underlying the integration process. We then describe recent progress and identify open research issues. We will focus in particular on schema matching, a topic that has received much attention in the database community, but will also discuss data matching (e.g., tuple deduplication), and open issues beyond the match discovery context (e.g., reasoning with matches, match verification and repair, and reconciling inconsistent data values). For previous surveys of database research on semantic integration, see (Rahm & Bernstein 2001;
Haystack: A Customizable General-Purpose Information Management Tool for End Users of Semistructured Data
- In CIDR, 2005
Cited by 28 (0 self)
We posit that a semistructured data model offers the right balance of rich structure and flexible (or lack of) schema allowing naive end users to record information in whatever form makes it easy for them to manage. We describe our Haystack system, which exposes the richness and flexibility of the data model while offering the user natural, traditional interfaces that shield them from the specifics of schemas, tuples, and database queries. We outline research challenges that remain to be addressed.
iDM: a unified and versatile data model for personal dataspace management
- In VLDB, 2006
"... dbis.ethz.ch | iMeMex.org ..."
Web-scale Data Integration: You Can Only Afford to Pay As You Go
- In Proc. of CIDR-07, 2007
Cited by 23 (5 self)
The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like Flickr, and sites like Google Base. While this phenomenon is creating an opportunity for structured data management, dealing with heterogeneity on the web-scale presents many new challenges. In this paper, we highlight these challenges in two scenarios – the Deep Web and Google Base. We contend that traditional data integration techniques are no longer valid in the face of such heterogeneity and scale. We propose a new data integration architecture, PAYGO, which is inspired by the concept of dataspaces and emphasizes pay-as-you-go data management as means for achieving web-scale data integration.
A Peer-to-peer Framework for Caching Range Queries
- In ICDE, 2004
Cited by 19 (0 self)
Peer-to-peer systems are mainly used for object sharing although they can provide the infrastructure for many other applications. In this paper, we extend the idea of object sharing to data sharing on a peer-to-peer system. We propose a method, which is based on the multidimensional CAN system, for efficiently evaluating range queries. The answers of the range queries are cached at the peers and are used to answer future range queries. The scalability and efficiency of our design is shown through simulation.
(Almost) hands-off information integration for the life sciences
- In Conf. on Innovative Database Research (CIDR), 2005
Cited by 17 (3 self)
Data integration in complex domains, such as the life sciences, involves either manual data curation, offering highest information quality at highest price, or follows a schema integration and mapping approach, leading to moderate information quality at a moderate price. We suggest a radically different integration approach, called ALADIN, for the life sciences application domain. The predominant feature of the ALADIN system is an architecture that allows almost automatic integration of new data sources into the system, i.e., it offers data integration at almost no cost. We suggest a novel combination of data and text mining, schema matching, and duplicate detection to combat the reduction in information quality that seems inevitable when demanding a high degree of automatism. These heuristics can also lead to the detection of previously unknown or unseen relationships between objects, thus directly supporting the discovery-based work of life science researchers. We argue that such a system is a valuable contribution in two areas. First, it offers challenging and new problems for database research. Second, the ALADIN system would be a valuable knowledge resource for life science research.
Structured Data Meets the Web: A Few Observations
- IEEE Data Eng. Bull
Cited by 16 (4 self)
The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like Flickr, and sites like Google Base. While this phenomenon is creating an opportunity for structured data management, dealing with heterogeneity on the web-scale presents many new challenges. In this paper we articulate challenges based on our experience with addressing them at Google, and offer some principles for addressing them in a general fashion.

