YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages
"... We present YAGO2, an extension of the YAGO knowledge base with focus on temporal and spatial knowledge. It is automatically built from Wikipedia, GeoNames, and Word-Net, and contains nearly 10 million entities and events, as well as 80 million facts representing general world knowledge. An enhanced ..."
Abstract
-
Cited by 61 (5 self)
- Add to MetaCart
(Show Context)
We present YAGO2, an extension of the YAGO knowledge base with focus on temporal and spatial knowledge. It is automatically built from Wikipedia, GeoNames, and WordNet, and contains nearly 10 million entities and events, as well as 80 million facts representing general world knowledge. An enhanced data representation introduces time and location as first-class citizens. The wealth of spatio-temporal information in YAGO can be explored either graphically or through a special time- and space-aware query language.
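YAGO2's own representation and query language are described in the paper; purely as an illustration of the idea of time- and space-aware querying, the following Python sketch filters facts by a validity interval and a latitude/longitude box (the fact layout and all names here are invented, not the YAGO2 API):

# Hypothetical sketch of time- and space-aware fact filtering in the
# spirit of YAGO2's extended facts; this is NOT the YAGO2 query API,
# and the fact layout below is invented for illustration.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    time: Optional[Tuple[int, int]] = None          # (begin_year, end_year)
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)

def query(facts, predicate, during=None, bbox=None):
    """Yield facts whose predicate, validity interval, and location match."""
    for f in facts:
        if f.predicate != predicate:
            continue
        # Time filter: the fact's interval must overlap the query interval.
        if during and (f.time is None
                       or f.time[1] < during[0] or f.time[0] > during[1]):
            continue
        # Space filter: bbox = (lat_min, lon_min, lat_max, lon_max).
        if bbox and (f.location is None
                     or not (bbox[0] <= f.location[0] <= bbox[2]
                             and bbox[1] <= f.location[1] <= bbox[3])):
            continue
        yield f

facts = [Fact("Albert_Einstein", "wasBornIn", "Ulm",
              time=(1879, 1879), location=(48.40, 9.99))]
print(list(query(facts, "wasBornIn",
                 during=(1870, 1890), bbox=(47.0, 8.0, 49.0, 11.0))))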
Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion
In submission, 2014. Cited by 49 (6 self).
Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft's Satori, and Google's Knowledge Graph. To increase the scale even further, we need to explore automatic methods for constructing knowledge bases. Previous approaches have primarily focused on text-based extraction, which can be very noisy. Here we introduce Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories. We employ supervised machine learning methods for fusing these distinct information sources. The Knowledge Vault is substantially bigger than any previously published structured knowledge repository, and features a probabilistic inference system that computes calibrated probabilities of fact correctness. We report the results of multiple studies that explore the relative utility of the different information sources and extraction methods.
Keywords: knowledge bases; information extraction; probabilistic models; machine learning
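As a minimal sketch of the fusion idea, a supervised model turning per-source extraction signals into a calibrated probability, assuming scikit-learn and invented feature values (this is not the actual Knowledge Vault implementation):

# Minimal sketch of probabilistic knowledge fusion: a supervised model
# maps per-source signals for a candidate triple to a calibrated
# probability of correctness. Features and data are invented for
# illustration; this is not the Knowledge Vault system.
from sklearn.linear_model import LogisticRegression

# Each row: [text extractor conf., table extractor conf., prior from existing KB]
X_train = [
    [0.9, 0.8, 0.7],  # triple confirmed by several sources -> true
    [0.2, 0.0, 0.1],  # weak, isolated extraction -> false
    [0.7, 0.0, 0.9],  # strong prior, moderate text support -> true
    [0.4, 0.1, 0.0],  # no prior, mediocre extraction -> false
]
y_train = [1, 0, 1, 0]

model = LogisticRegression().fit(X_train, y_train)
candidate = [[0.8, 0.5, 0.6]]
print("P(triple is true) =", model.predict_proba(candidate)[0][1])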
Entity Linking meets Word Sense Disambiguation: A Unified Approach
Transactions of the Association for Computational Linguistics, 2014. Cited by 46 (17 self).
Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-of-the-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at
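The densest-subgraph step can be approximated with the standard greedy heuristic that repeatedly removes a minimum-degree vertex and keeps the densest intermediate graph. A self-contained sketch of that generic heuristic follows; Babelfy's actual procedure additionally constrains which candidate-meaning vertices may be removed:

# Greedy densest-subgraph heuristic: repeatedly remove a minimum-degree
# vertex and remember the iteration with the highest average degree.
# Babelfy runs a constrained variant over candidate meanings; this
# sketch shows only the generic heuristic on a toy undirected graph.
def densest_subgraph(adj):
    adj = {v: set(ns) for v, ns in adj.items()}
    best_density, best_vertices = 0.0, set(adj)
    while adj:
        edges = sum(len(ns) for ns in adj.values()) / 2
        density = edges / len(adj)
        if density >= best_density:
            best_density, best_vertices = density, set(adj)
        v = min(adj, key=lambda u: len(adj[u]))  # minimum-degree vertex
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return best_vertices, best_density

graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(densest_subgraph(graph))  # the triangle {a, b, c}, density 1.0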
Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2013. Cited by 30 (0 self).
Supervised training procedures for semantic parsers produce high-quality semantic parsers, but they have difficulty scaling to large databases because of the sheer number of logical constants for which they must see labeled training data. We present a technique for developing semantic parsers for large databases based on a reduction to standard supervised training algorithms, schema matching, and pattern learning. Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm.
AMIE: association rule mining under incomplete evidence in ontological knowledge bases
In WWW, 2013. Cited by 23 (5 self).
Recent advances in information extraction have led to huge knowledge bases (KBs), which capture knowledge in a machine-readable format. Inductive Logic Programming (ILP) can be used to mine logical rules from the KB. These rules can help deduce and add missing knowledge to the KB. While ILP is a mature field, mining logical rules from KBs differs from it in two respects: First, current rule mining systems are easily overwhelmed by the amount of data (state-of-the-art systems cannot even run on today's KBs). Second, ILP usually requires counterexamples. KBs, however, implement the open world assumption (OWA), meaning that absent data cannot be used as counterexamples. In this paper, we develop a rule mining model that is explicitly tailored to support the OWA scenario. It is inspired by association rule mining and introduces a novel measure for confidence. Our extensive experiments show that our approach outperforms state-of-the-art approaches in terms of precision and coverage. Furthermore, our system, AMIE, mines rules orders of magnitude faster than state-of-the-art approaches.
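The paper's novel confidence measure is the PCA (partial completeness assumption) confidence: a predicted fact r(x, y) counts as a counterexample only when the KB already knows some other object for r(x, ·). A toy comparison of standard vs. PCA confidence for one rule, over an invented KB:

# Toy comparison of standard vs. PCA confidence for the rule
# livesIn(x, y) <= isCitizenOf(x, y). The KB content is invented
# for illustration.
kb = {
    ("alice", "isCitizenOf", "france"), ("alice", "livesIn", "france"),
    ("bob",   "isCitizenOf", "spain"),  ("bob",   "livesIn", "italy"),
    ("carol", "isCitizenOf", "peru"),   # no livesIn fact for carol at all
}

body = [(s, o) for s, p, o in kb if p == "isCitizenOf"]
support = sum(1 for s, o in body if (s, "livesIn", o) in kb)

# Standard confidence: every body instantiation counts in the denominator.
std_conf = support / len(body)

# PCA confidence: only count pairs (x, y) where the KB knows *some*
# livesIn fact for x; absent heads (carol) are unknown, not false.
pca_body = [(s, o) for s, o in body
            if any(p == "livesIn" and s2 == s for s2, p, _ in kb)]
pca_conf = support / len(pca_body)

print(std_conf, pca_conf)  # 1/3 vs. 1/2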
Unsupervised graph-based topic labelling using DBpedia
In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM '13), 2013.
Two is bigger (and better) than one: the Wikipedia Bitaxonomy Project
In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014. Cited by 18 (10 self).
We present WiBi, an approach to the automatic creation of a bitaxonomy for Wikipedia, that is, an integrated taxonomy of Wikipedia pages and categories. We leverage the information available in either one of the taxonomies to reinforce the creation of the other taxonomy. Our experiments show higher quality and coverage than state-of-the-art resources like DBpedia, YAGO, MENTA, WikiNet and WikiTaxonomy. WiBi is available at http://wibitaxonomy.org.
KORE: keyphrase overlap relatedness for entity disambiguation
In Proceedings of the 21st ACM CIKM, 2012. Cited by 14 (2 self).
Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based knowledge extraction. This paper focuses on disambiguating names in a Web or text document by jointly mapping all names onto semantically related entities registered in a knowledge base. To this end, we have developed a novel notion of semantic relatedness between two entities represented as sets of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases. This measure improves the quality of prior link-based models, and also eliminates the need for (usually Wikipedia-centric) explicit interlinkage between entities. Thus, our method is more versatile and can cope with long-tail and newly emerging entities that have few or no links associated with them. For efficiency, we have developed approximation techniques based on min-hash sketches and locality-sensitive hashing. Our experiments on semantic relatedness and on named entity disambiguation demonstrate the superiority of our method compared to state-of-the-art baselines.
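As a sketch of the min-hash machinery in its generic, unweighted form (KORE's measure additionally weights keyphrases and scores partial phrase overlap):

# Generic min-hash estimation of Jaccard similarity between two
# keyphrase sets; KORE weights phrases and handles partial overlap,
# which this sketch omits.
import hashlib

def minhash_signature(items, num_hashes=128):
    """One minimum per salted hash function, simulated via MD5 salting."""
    sig = []
    for i in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{i}:{x}".encode()).hexdigest(), 16)
            for x in items))
    return sig

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = {"theory of relativity", "nobel prize", "patent office"}
b = {"theory of relativity", "nobel prize", "violin"}
print(estimated_jaccard(minhash_signature(a), minhash_signature(b)))
# True Jaccard is 2/4 = 0.5; the estimate converges as num_hashes grows.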
Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0
In Proceedings of LREC, 2014. Cited by 14 (5 self).
Recent years have witnessed a surge in the amount of semantic information published on the Web. Indeed, the Web of Data, a subset of the Semantic Web, has been increasing steadily in both volume and variety, transforming the Web into a 'global database' in which resources are linked across sites. Linguistic fields – in a broad sense – have not been left behind, and we observe a similar trend with the growth of linguistic data collections on the so-called 'Linguistic Linked Open Data (LLOD) cloud'. While both Semantic Web and Natural Language Processing communities can obviously take advantage of this growing and distributed linguistic knowledge base, they are today faced with a new challenge, i.e., that of facilitating multilingual access to the Web of Data. In this paper we present the publication of BabelNet 2.0, a wide-coverage multilingual encyclopedic dictionary and ontology, as Linked Data. The conversion made use of lemon, a lexicon model for ontologies particularly well-suited for this enterprise. The result is an interlinked multilingual (lexical) resource which can not only be accessed on the LOD, but also be used to enrich existing datasets with linguistic information, or to support the process of mapping datasets across languages.
Keywords: Linguistic Linked Data, Multilingual Semantic Web, lexical-semantic resource, semantic network
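As a rough illustration of what a lemon-style entry looks like as Linked Data, here is an rdflib sketch; the lemon core namespace is real, but the entry URIs below are placeholders rather than BabelNet 2.0's actual identifier scheme:

# Sketch of a lemon-style lexical entry as Linked Data using rdflib.
# The URIs under EX are invented placeholders, not BabelNet identifiers.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

LEMON = Namespace("http://lemon-model.net/lemon#")
EX = Namespace("http://example.org/babelnet/")  # placeholder namespace

g = Graph()
g.bind("lemon", LEMON)

entry, form, sense = EX["lexicon/en/home"], EX["form/en/home"], EX["sense/en/home_s1"]
g.add((entry, RDF.type, LEMON.LexicalEntry))
g.add((entry, LEMON.canonicalForm, form))
g.add((form, LEMON.writtenRep, Literal("home", lang="en")))
g.add((entry, LEMON.sense, sense))
# The sense's reference links the lexical entry to an ontology concept.
g.add((sense, LEMON.reference, EX["synset/s00031010n"]))

print(g.serialize(format="turtle"))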
SPred: Large-scale Harvesting of Semantic Predicates
In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013. Cited by 13 (4 self).
We present SPred, a novel method for the creation of large repositories of semantic predicates. We start from existing collocations to form lexical predicates (e.g., break ∗) and learn the semantic classes that best fit the ∗ argument. To do this, we extract all the occurrences in Wikipedia which match the predicate and abstract its arguments to general semantic classes (e.g., break BODY PART, break AGREEMENT, etc.). Our experiments show that we are able to create a large collection of semantic predicates from the Oxford Advanced Learner's Dictionary with high precision and recall, and perform well against the most similar approach.
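A crude way to see the abstraction step is a WordNet hypernym walk, lifting an observed argument such as "leg" toward a class like BODY PART; SPred's actual class selection is driven by Wikipedia occurrences, so the NLTK sketch below only illustrates the direction of generalization:

# Crude illustration of abstracting a predicate argument to a semantic
# class via WordNet hypernyms (needs: pip install nltk, then
# nltk.download("wordnet")). SPred's real generalization is corpus-
# driven; this only walks the hypernym chain of the first noun sense.
from nltk.corpus import wordnet as wn

def semantic_classes(word, depth=4):
    """Collect hypernyms of the word's first noun sense, up to a depth."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return []
    classes, frontier = [], [synsets[0]]
    for _ in range(depth):
        frontier = [h for s in frontier for h in s.hypernyms()]
        classes.extend(s.name() for s in frontier)
    return classes

# For the lexical predicate "break *", abstract the observed argument "leg":
print(semantic_classes("leg"))  # chain rising toward a class like body_part.n.01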