Results 11 - 20 of 26
Harvesting Knowledge from Web Data and Text
- in "CIKM
, 2010
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
(Show Context)
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
On Designing Archiving Policies for Evolving RDF Datasets on the Web
"... Abstract. When dealing with dynamically evolving datasets, users are often interested in the state of affairs on previous versions of the dataset, and would like to execute queries on such previous versions, as well as queries that compare the state of affairs across different versions. This is espe ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
(Show Context)
When dealing with dynamically evolving datasets, users are often interested in the state of affairs in previous versions of the dataset, and would like to execute queries on such previous versions, as well as queries that compare the state of affairs across different versions. This is especially true for datasets stored on the Web, where the interlinking aspect, combined with the lack of central control, does not allow synchronized evolution of interlinked datasets. To address this requirement, the obvious solution is to store all previous versions, but this quickly increases the space requirements; an alternative solution is to store adequate deltas between versions, which are generally smaller, but this creates the overhead of reconstructing versions at query time. This paper studies the trade-offs involved in these approaches in the context of archiving dynamic RDF datasets on the Web. Our main message is that a hybrid policy works better than either of the above approaches, and we describe our proposed methodology for establishing a cost model that determines when each of the two standard methods (version-based or delta-based storage) should be used within a hybrid policy.
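The snapshot-versus-delta trade-off described above can be made concrete with a short sketch. This is not the paper's cost model; the cost function and all parameters (storage_weight, query_weight, expected_queries, chain_length) are illustrative assumptions.

```python
# Minimal sketch of a hybrid archiving policy for versioned datasets.
# NOT the paper's cost model: the cost function and its parameters
# (storage_weight, query_weight, expected_queries) are illustrative
# assumptions, not taken from the paper.

from dataclasses import dataclass

@dataclass
class Version:
    snapshot_size: int   # cost of storing this version in full (e.g., bytes)
    delta_size: int      # size of the delta from the previous stored version
    chain_length: int    # deltas to replay when reconstructing from the last snapshot

def cost_as_snapshot(v: Version, storage_weight: float) -> float:
    # A full snapshot costs storage but is free to reconstruct at query time.
    return storage_weight * v.snapshot_size

def cost_as_delta(v: Version, storage_weight: float,
                  query_weight: float, expected_queries: float) -> float:
    # A delta is cheap to store, but every query against this version
    # must replay the whole delta chain from the last snapshot.
    reconstruction = query_weight * expected_queries * (v.chain_length + 1)
    return storage_weight * v.delta_size + reconstruction

def choose_policy(v: Version, storage_weight=1.0,
                  query_weight=0.1, expected_queries=100.0) -> str:
    snap = cost_as_snapshot(v, storage_weight)
    delta = cost_as_delta(v, storage_weight, query_weight, expected_queries)
    return "snapshot" if snap <= delta else "delta"

# Example: a small delta far from the last snapshot can still lose to a
# fresh snapshot once the expected reconstruction cost dominates.
print(choose_policy(Version(snapshot_size=1_000, delta_size=200, chain_length=120)))
```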
Knowledge harvesting from text and web sources
In Proceedings of the International Conference on Data Engineering (ICDE)
"... Abstract-The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources has enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources have enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as DBpedia, KnowItAll, Probase, ReadTheWeb, and YAGO, as well as industrial ones such as Freebase and Trueknowledge. These projects provide automatically constructed knowledge bases of facts about named entities, their semantic classes, and their mutual relationships. Such world knowledge in turn enables cognitive applications and knowledge-centric services like disambiguating natural-language text, deep question answering, and semantic search for entities and relations in Web and enterprise data. Prominent examples of how knowledge bases can be harnessed include the Google Knowledge Graph and the IBM Watson question answering system. This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications.
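As a minimal sketch of the fact structure the abstract describes (entities, semantic classes, and mutual relationships), consider storing facts as subject-predicate-object triples. The schema and example facts below are illustrative, not drawn from any of the cited knowledge bases.

```python
# Minimal sketch: facts about named entities, their semantic classes,
# and their relationships, stored as subject-predicate-object triples.
# The example facts are illustrative, not taken from any cited project.

from collections import defaultdict

facts = [
    ("Albert_Einstein", "type", "physicist"),   # entity -> semantic class
    ("Albert_Einstein", "bornIn", "Ulm"),       # entity -> entity relation
    ("Ulm", "locatedIn", "Germany"),
]

# Index by subject so a knowledge-centric service (e.g., entity search)
# can look up all facts about an entity in one step.
by_subject = defaultdict(list)
for s, p, o in facts:
    by_subject[s].append((p, o))

print(by_subject["Albert_Einstein"])  # [('type', 'physicist'), ('bornIn', 'Ulm')]
```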
Storing and Indexing Massive RDF Data Sets
"... In this chapter we present a general survey of the current state of the art in RDF storage and indexing. In the flurry of research on RDF data management in the last decade, we can identify three different perspectives on RDF: (1) a relational perspective; (2) an entity perspective; and (3) a graph ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
In this chapter we present a general survey of the current state of the art in RDF storage and indexing. In the flurry of research on RDF data management over the last decade, we can identify three different perspectives on RDF: (1) a relational perspective; (2) an entity perspective; and (3) a graph-based perspective. Each of these three perspectives has drawn on ideas and results from one of three distinct research communities to propose solutions for managing RDF data: relational databases (for the relational perspective), information retrieval (for the entity perspective), and graph theory and graph databases (for the graph-based perspective). Our goal in this chapter is to give an up-to-date overview of representative solutions within each perspective.
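To make the relational perspective concrete, here is a minimal sketch: a single triple table with indexes over permutations of (subject, predicate, object), in the spirit of relational triple stores. The schema, index names, and data are illustrative assumptions, not any specific system's layout.

```python
# Minimal sketch of the "relational perspective" on RDF storage:
# one triple table plus indexes covering common access patterns.
# Schema and data are illustrative, not a specific system's design.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (s TEXT, p TEXT, o TEXT);
    -- Index permutations so lookups with any bound position are fast,
    -- mirroring the SPO/POS/OSP indexing idea used by triple stores.
    CREATE INDEX spo ON triples (s, p, o);
    CREATE INDEX pos ON triples (p, o, s);
    CREATE INDEX osp ON triples (o, s, p);
""")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
])

# A two-hop graph pattern (?x knows ?y . ?y knows ?z) becomes a self-join.
rows = conn.execute("""
    SELECT t1.s, t2.o FROM triples t1
    JOIN triples t2 ON t1.o = t2.s
    WHERE t1.p = 'ex:knows' AND t2.p = 'ex:knows'
""").fetchall()
print(rows)  # [('ex:alice', 'ex:carol')]
```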
Entity Linking on Philosophical Documents
"... Abstract. Entity Linking consists in automatically enriching a document by detecting the text fragments mentioning a given entity in an external knowledge base, e.g., Wikipedia. This problem is a hot research topic due to its impact in several text-understanding related tasks. However, its applicat ..."
Abstract
- Add to MetaCart
(Show Context)
Entity Linking consists in automatically enriching a document by detecting the text fragments that mention a given entity in an external knowledge base, e.g., Wikipedia. This problem is a hot research topic due to its impact on several text-understanding tasks. However, its application to specific, restricted topic domains has not received much attention. In this work we study how entity linking performance can be improved by exploiting a domain-oriented knowledge base, obtained by filtering out from Wikipedia the entities that are not relevant to the target domain. We focus on the philosophical domain, and we experiment with a combination of three different entity filtering approaches: one based on the "Philosophy" category of Wikipedia, and two based on similarity metrics between philosophical documents and the textual descriptions of the entities in the knowledge base, namely cosine similarity and Kullback-Leibler divergence. We apply traditional entity linking strategies to the domain-oriented knowledge base obtained with these filtering techniques. Finally, we use the resulting enriched documents to conduct a preliminary user study with an expert in the area.
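The two similarity-based filters the abstract names can be sketched briefly. This is a minimal illustration, not the paper's pipeline: the bag-of-words representation, smoothing, and thresholds are all assumptions made for the example.

```python
# Minimal sketch of the two similarity-based entity filters mentioned
# above: cosine similarity and Kullback-Leibler divergence between a
# domain corpus and an entity's textual description. Tokenization,
# smoothing, and thresholds are illustrative assumptions.

import math
from collections import Counter

def term_dist(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kl_divergence(p: Counter, q: Counter, eps: float = 1e-6) -> float:
    # KL(P || Q) with additive smoothing so unseen terms don't blow up.
    vocab = set(p) | set(q)
    tp = sum(p.values()) + eps * len(vocab)
    tq = sum(q.values()) + eps * len(vocab)
    return sum((p[t] + eps) / tp * math.log(((p[t] + eps) / tp) / ((q[t] + eps) / tq))
               for t in vocab)

domain = term_dist("philosophy of mind epistemology ethics metaphysics logic")
entity_desc = term_dist("epistemology is the branch of philosophy studying knowledge")

# Keep the entity if its description looks close enough to the domain
# corpus; both thresholds below are arbitrary for the sake of the example.
keep = cosine(domain, entity_desc) > 0.1 or kl_divergence(domain, entity_desc) < 5.0
print(keep)
```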
Towards Automatic Structured Web Data Extraction System
"... Abstract. Automatic extraction of structured data from web pages is one of the key challenges for the Web search engines to advance into the more expressive semantic level. Here we propose a novel data extraction method, called ClustVX. It exploits visual as well as structural features of web page e ..."
Abstract
- Add to MetaCart
(Show Context)
Automatic extraction of structured data from web pages is one of the key challenges for Web search engines to advance to a more expressive semantic level. Here we propose a novel data extraction method, called ClustVX. It exploits visual as well as structural features of web page elements to group them into semantically similar clusters. The resulting clusters reflect the page structure and are used to derive data extraction rules. Preliminary evaluation results of the ClustVX system on three public benchmark datasets demonstrate high efficiency and indicate the need for a much larger, up-to-date benchmark dataset that reflects contemporary Web 2.0 pages.
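The clustering idea can be illustrated with a short sketch: group page elements by a combined visual and structural signature, so that repeating records land in the same cluster. The element representation and signature below are assumptions for illustration, not the ClustVX algorithm itself.

```python
# Minimal sketch of clustering page elements by visual + structural
# features, in the spirit of the method described above. The element
# representation and signature are illustrative, not ClustVX itself.

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Element:
    tag_path: str    # structural feature: root-to-element tag path
    font: str        # visual features as rendered by a browser
    color: str

def signature(e: Element) -> tuple:
    # Elements sharing a tag path and rendering style are assumed to
    # play the same semantic role (e.g., all product prices on a page).
    return (e.tag_path, e.font, e.color)

def cluster(elements: list[Element]) -> dict:
    clusters = defaultdict(list)
    for e in elements:
        clusters[signature(e)].append(e)
    return clusters

page = [
    Element("html/body/div/ul/li/span", "12px Arial", "#900"),  # price 1
    Element("html/body/div/ul/li/span", "12px Arial", "#900"),  # price 2
    Element("html/body/div/h1", "24px Arial", "#000"),          # page title
]

# Clusters with many members reflect repeating structure and are the
# candidates from which extraction rules would be derived.
for sig, members in cluster(page).items():
    print(len(members), sig)
```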
Semantic Knowledge Bases from Web Sources: Tutorial Proposal for IJCAI 2011 (1 day)
"... Recent advances in scalable information extraction from Web sources has enabled the automatic construction of large knowledge bases, such as DBpedia, EntityCube, KnowItAll, ReadTheWeb, and YAGO-NAGA. In this tutorial, we explain how these knowledge bases are organized, how they are constructed, how ..."
Abstract
- Add to MetaCart
(Show Context)
Recent advances in scalable information extraction from Web sources have enabled the automatic construction of large knowledge bases, such as DBpedia, EntityCube, KnowItAll, ReadTheWeb, and YAGO-NAGA. In this tutorial, we explain how these knowledge bases are organized, how they are constructed, how they can be maintained and extended in structure and contents, and how they serve as enabling assets for semantic applications.

Longer description: The advent of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web sources have enabled the automatic construction of large knowledge bases. Recent endeavors of this kind include academic research projects such as DBpedia, EntityCube, KnowItAll, ReadTheWeb, and YAGO-NAGA, as well as industrial ones such as Freebase and Trueknowledge. These projects provide automatically constructed, large and rich knowledge bases of facts about named entities, their semantic classes, and their mutual relations. This one-day tutorial discusses (a) the content, organization, and potential of these Web-induced knowledge bases; (b) state-of-the-art methods for constructing them from semi-structured and textual Web sources; (c) recent approaches to maintaining and extending them, including introducing a temporal dimension of knowledge; and (d) use cases of knowledge bases, including semantic search, reasoning for question …