From information to knowledge: harvesting entities and relationships from web sources. (2010)

by G Weikum, M Theobald

Results 11 - 20 of 26

Harvesting Knowledge from Web Data and Text

by Hady W. Lauw, Ralf Schenkel, Fabian Suchanek, Martin Theobald, Gerhard Weikum - in CIKM, 2010
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract - Cited by 3 (2 self) - Add to MetaCart
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et a ̀ la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
(Show Context)

Citation Context

... not buried in lots of result pages, and the human users would have to read them to extract entities and connect them to other entities. In this sense, the envisioned large-scale knowledge harvesting [42, 43] from Web sources may also be viewed as machine reading [13]. 2 Target Audience, Aims, and Organization of the Tutorial The tutorial is aimed towards a broad audience of researchers from the DB, IR, a...

On Designing Archiving Policies for Evolving RDF Datasets on the Web

by Kostas Stefanidis, Ioannis Chrysakis, Giorgos Flouris
"... Abstract. When dealing with dynamically evolving datasets, users are often interested in the state of affairs on previous versions of the dataset, and would like to execute queries on such previous versions, as well as queries that compare the state of affairs across different versions. This is espe ..."
Abstract - Cited by 3 (3 self) - Add to MetaCart
Abstract. When dealing with dynamically evolving datasets, users are often interested in the state of affairs in previous versions of the dataset, and would like to execute queries on such previous versions, as well as queries that compare the state of affairs across different versions. This is especially true for datasets stored on the Web, where the interlinking aspect, combined with the lack of central control, does not allow synchronized evolution of interlinked datasets. The obvious way to address this requirement is to store all previous versions, but this quickly increases the space requirements; an alternative is to store adequate deltas between versions, which are generally smaller, but this creates the overhead of reconstructing versions at query time. This paper studies the trade-offs involved in these approaches, in the context of archiving dynamic RDF datasets on the Web. Our main message is that a hybrid policy works better than either of the above approaches, and we describe our proposed methodology for establishing a cost model that determines when each of the two standard methods (version-based or delta-based storage) should be used in the context of a hybrid policy.
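To make the trade-off concrete, the following minimal Python sketch implements a hybrid policy of the kind the abstract argues for: it stores a delta by default and materializes a full version once replaying the accumulated deltas would, under a toy cost model with a single weight alpha, outweigh the extra storage. The cost model and threshold are illustrative assumptions, not the authors' actual model.

def hybrid_archive(versions, alpha=0.5):
    """versions: list of sets of RDF triples, oldest first.
    Returns ('full', triples) or ('delta', added, removed) records.
    Toy cost model (assumption): snapshot when replay cost > alpha * |version|."""
    archive, prev = [], None
    replay_cost = 0  # triples to replay to rebuild the latest version
    for v in versions:
        if prev is None:
            archive.append(('full', frozenset(v)))
        else:
            added, removed = v - prev, prev - v
            replay_cost += len(added) + len(removed)
            if replay_cost > alpha * len(v):
                # Query-time reconstruction has become too expensive: snapshot.
                archive.append(('full', frozenset(v)))
                replay_cost = 0
            else:
                archive.append(('delta', frozenset(added), frozenset(removed)))
        prev = v
    return archive

# Example: three versions of a tiny dataset; the policy decides per version.
v1 = {('s', 'p', 'o1')}
v2 = {('s', 'p', 'o1'), ('s', 'p', 'o2')}
v3 = {('s', 'p', 'o2')}
print([r[0] for r in hybrid_archive([v1, v2, v3])])  # ['full', 'delta', 'full']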

Citation Context

... triples of the form (subject, predicate, object), meaning that subject is related to object via predicate. By exploiting the advances of methods that automatically extract information from Web sites [16], such datasets became extremely large and grow continuously. For example, the Billion Triples Challenge dataset of 2012 contains about 1.44B triples, while DBpedia v3.9, Freebase and Yago alone feat...

Knowledge harvesting from text and web sources

by Fabian Suchanek, Gerhard Weikum - In: Proceedings of the International Conference on Data Engineering (ICDE)
"... Abstract-The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources has enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as ..."
Abstract - Cited by 3 (1 self) - Add to MetaCart
Abstract - The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources have enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as DBpedia, KnowItAll, Probase, ReadTheWeb, and YAGO, as well as industrial ones such as Freebase and True Knowledge. These projects provide automatically constructed knowledge bases of facts about named entities, their semantic classes, and their mutual relationships. Such world knowledge in turn enables cognitive applications and knowledge-centric services like disambiguating natural-language text, deep question answering, and semantic search for entities and relations in Web and enterprise data. Prominent examples of how knowledge bases can be harnessed include the Google Knowledge Graph and the IBM Watson question answering system. This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications.

Web-scale knowledge-base construction via . . .

by Feng Niu, 2012
"... ..."
Abstract - Cited by 1 (1 self) - Add to MetaCart
Abstract not found

Storing and Indexing Massive RDF Data Sets

by Yongming Luo, George H. L. Fletcher, Jan Hidders, Stijn Vansummeren, et al.
"... In this chapter we present a general survey of the current state of the art in RDF storage and indexing. In the flurry of research on RDF data management in the last decade, we can identify three different perspectives on RDF: (1) a relational perspective; (2) an entity perspective; and (3) a graph ..."
Abstract - Cited by 1 (1 self) - Add to MetaCart
In this chapter we present a general survey of the current state of the art in RDF storage and indexing. In the flurry of research on RDF data management in the last decade, we can identify three different perspectives on RDF: (1) a relational perspective; (2) an entity perspective; and (3) a graph-based perspective. Each of these three perspectives has drawn from ideas and results in three distinct research communities to propose solutions for managing RDF data: relational databases (for the relational perspective); information retrieval (for the entity perspective); and graph theory and graph databases (for the graph-based perspective). Our goal in this chapter is to give an up-to-date overview of representative solutions within each perspective.
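To make the relational perspective concrete, here is a minimal in-memory Python sketch of the classic single-triple-table design with indexes over permutations of (subject, predicate, object), in the spirit of systems such as Hexastore and RDF-3X. It is an illustrative toy, not taken from the chapter.

from collections import defaultdict

class TripleStore:
    """Toy triple store: one index per (S,P,O) access pattern used below."""
    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))  # (s, p, ?) lookups
        self.pos = defaultdict(lambda: defaultdict(set))  # (?, p, o) lookups
        self.osp = defaultdict(lambda: defaultdict(set))  # (?, ?, o) lookups

    def add(self, s, p, o):
        # Every triple is written into all permutation indexes.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def objects(self, s, p):
        return self.spo[s][p]

    def subjects(self, p, o):
        return self.pos[p][o]

store = TripleStore()
store.add('Schopenhauer', 'bornIn', 'Danzig')
store.add('Schopenhauer', 'type', 'Philosopher')
print(store.subjects('type', 'Philosopher'))   # {'Schopenhauer'}
print(store.objects('Schopenhauer', 'bornIn'))  # {'Danzig'}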

Entity Linking on Philosophical Documents

by Diego Ceccarelli, Alberto De Francesco, Raffaele Perego, Marco Segala, Nicola Tonellotto, Salvatore Trani
"... Abstract. Entity Linking consists in automatically enriching a document by detecting the text fragments mentioning a given entity in an external knowledge base, e.g., Wikipedia. This problem is a hot research topic due to its impact in several text-understanding related tasks. However, its applicat ..."
Abstract - Add to MetaCart
Abstract. Entity Linking consists in automatically enriching a document by detecting the text fragments mentioning a given entity in an external knowledge base, e.g., Wikipedia. This problem is a hot research topic due to its impact on several text-understanding related tasks. However, its application to specific, restricted topic domains has not received much attention. In this work we study how we can improve entity linking performance by exploiting a domain-oriented knowledge base, obtained by filtering out from Wikipedia the entities that are not relevant for the target domain. We focus on the philosophical domain, and we experiment with a combination of three different entity filtering approaches: one based on the "Philosophy" category of Wikipedia, and two based on similarity metrics between philosophical documents and the textual descriptions of the entities in the knowledge base, namely cosine similarity and Kullback-Leibler divergence. We apply traditional entity linking strategies to the domain-oriented knowledge base obtained with these filtering techniques. Finally, we use the resulting enriched documents to conduct a preliminary user study with an expert in the area.
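The following Python sketch shows the two similarity-based filters the abstract names: keep an entity only if its textual description is close to the domain corpus under bag-of-words cosine similarity, or under smoothed KL divergence. The tokenization, smoothing, and thresholds are illustrative assumptions, not the paper's actual settings.

import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Bag-of-words cosine similarity; missing terms count as 0.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kl_divergence(p: Counter, q: Counter, eps=1e-9) -> float:
    # D(P || Q) with add-epsilon smoothing so unseen terms stay finite.
    vocab = set(p) | set(q)
    ps = sum(p.values()) + eps * len(vocab)
    qs = sum(q.values()) + eps * len(vocab)
    return sum(((p[t] + eps) / ps) * math.log(((p[t] + eps) / ps) / ((q[t] + eps) / qs))
               for t in vocab)

def keep_entity(domain_text, entity_text, cos_min=0.05, kl_max=5.0):
    # cos_min and kl_max are hypothetical thresholds for illustration.
    d = Counter(domain_text.lower().split())
    e = Counter(entity_text.lower().split())
    return cosine(d, e) >= cos_min or kl_divergence(d, e) <= kl_max

print(keep_entity("schopenhauer wrote on will and representation",
                  "arthur schopenhauer was a german philosopher of will"))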

Citation Context

...similarity and Kullback-Leibler divergence. We apply traditional entity linking strategies to the domain-oriented knowledge base obtained with these filtering techniques. Finally, we use the resulting enriched documents to conduct a preliminary user study with an expert in the area. Keywords: Entity Linking, Entity Filtering, Information Search and Retrieval, Document Enriching 1 Introduction In recent years, document enriching via Entity Linking (EL) has gained increasing interest due to its impact on several text-understanding related tasks, e.g., web search, document classification, etc. [11,12]. The EL problem was introduced in 2007 by Mihalcea and Csomai [8] and consists in linking short fragments of text within a document to an entity listed in a given Knowledge Base (KB). The authors also propose to consider each Wikipedia article as an entity, and the title or the anchor text of the hyperlinks pointing to the article as potential mentions of the entity. In Dresden → [http://en.wikipedia.org/wiki/Dresden], Schopenhauer → [http://en.wikipedia.org/wiki/Arthur Schopenhauer] became acquainted with the philosopher → [http://en.wikipedia.org/wiki/Philosophe...

Towards Automatic Structured Web Data Extraction System

by Tomas Grigalis
"... Abstract. Automatic extraction of structured data from web pages is one of the key challenges for the Web search engines to advance into the more expressive semantic level. Here we propose a novel data extraction method, called ClustVX. It exploits visual as well as structural features of web page e ..."
Abstract - Add to MetaCart
Abstract. Automatic extraction of structured data from web pages is one of the key challenges for Web search engines to advance to a more expressive semantic level. Here we propose a novel data extraction method, called ClustVX. It exploits visual as well as structural features of web page elements to group them into semantically similar clusters. The resulting clusters reflect the page structure and are used to derive data extraction rules. The preliminary evaluation results of the ClustVX system on three public benchmark datasets demonstrate high efficiency and indicate the need for a much larger, up-to-date benchmark dataset that reflects contemporary Web 2.0 pages.
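A minimal Python sketch of the grouping idea the abstract describes: cluster page elements whose structural signature (tag path with sibling indexes dropped) and visual signature (rendered style) coincide. The feature choice and the toy element records are illustrative assumptions, not the actual ClustVX system.

import re
from collections import defaultdict

def cluster_elements(elements):
    """elements: dicts with 'xpath', 'style' (rendered CSS), and 'text'."""
    clusters = defaultdict(list)
    for el in elements:
        # Structural feature: tag path without positional indexes, so
        # /html/body/div[1] and /html/body/div[2] fall into the same group.
        structural = re.sub(r'\[\d+\]', '', el['xpath'])
        signature = (structural, el['style'])  # structural + visual
        clusters[signature].append(el['text'])
    return clusters

# Toy page: two product listings share tag path and style, so they cluster
# together and could seed a record-extraction rule; the page title does not.
elements = [
    {'xpath': '/html/body/div[1]/span[1]', 'style': 'font:12px;color:#333', 'text': 'Camera $99'},
    {'xpath': '/html/body/div[2]/span[1]', 'style': 'font:12px;color:#333', 'text': 'Phone $199'},
    {'xpath': '/html/body/h1[1]',          'style': 'font:24px;color:#000', 'text': 'Shop'},
]
for (path, _style), texts in cluster_elements(elements).items():
    print(path, texts)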

Citation Context

...ns For the Web search engines to advance into a more expressive semantic level, we need tools that could extract the information from the Web and represent it in a machine readable format such as RDF [1]. Information extraction at Web scale is the first step in pursuing this goal. However, current algorithmic approaches often fail to achieve satisfactory performance in real-world application scenario...

Semantic Knowledge Bases from Web Sources Tutorial Proposal for IJCAI 2011 (1 day)

by Fabian M. Suchanek, Martin Theobald, Gerhard Weikum, Hady W. Lauw, Ralf Schenkel
"... Recent advances in scalable information extraction from Web sources has enabled the automatic construction of large knowledge bases, such as DBpedia, EntityCube, KnowItAll, ReadTheWeb, and YAGO-NAGA. In this tutorial, we explain how these knowledge bases are organized, how they are constructed, how ..."
Abstract - Add to MetaCart
Recent advances in scalable information extraction from Web sources have enabled the automatic construction of large knowledge bases, such as DBpedia, EntityCube, KnowItAll, ReadTheWeb, and YAGO-NAGA. In this tutorial, we explain how these knowledge bases are organized, how they are constructed, how they can be maintained and extended in structure and contents, and how they serve as enabling assets for semantic applications. 2 Longer Description The advent of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web sources have enabled the automatic construction of large knowledge bases. Recent endeavors of this kind include academic research projects such as DBpedia, EntityCube, KnowItAll, ReadTheWeb, and YAGO-NAGA, as well as industrial ones such as Freebase and True Knowledge. These projects provide automatically constructed, large and rich knowledge bases of facts about named entities, their semantic classes, and their mutual relations. This 1-day tutorial discusses a) the content, organization, and potential of these Web-induced knowledge bases, b) state-of-the-art methods for constructing them from semistructured and textual Web sources, c) recent approaches to maintaining and extending them, which includes introducing a temporal dimension of knowledge, d) use cases of knowledge bases, including semantic search, reasoning for question

Citation Context

...oriented, value-added level of gathering, organizing, searching, and ranking entities and relations from Web sources. This tutorial uses some material from an invited tutorial presented at PODS 2010 [2] and from a tutorial presented at CIKM 2010 [5]. 4 Outline The tutorial is organized into four main parts: • Part 1 (2 hours) explains the content and organization of the largest ones of the publicly ...

Deriving a Web-Scale Common Sense Fact Knowledge Base

by Niket Tandon, 2011
"... ..."
Abstract - Add to MetaCart
Abstract not found