Results 1-8 of 8
Relation extraction from the web using distant supervision
In Janowicz et al.
Cited by 6 (3 self)
Abstract. Extracting information from Web pages requires the ability to work at Web scale in terms of the number of documents, the number of domains and domain complexity. Recent approaches have used existing knowledge bases to learn to extract information with promising results. In this paper we propose the use of distant supervision for relation extraction from the Web. Distant supervision is a method which uses background information from the Linking Open Data cloud to automatically label sentences with relations to create training data for relation classifiers. Although the method is promising, existing approaches are still not suitable for Web extraction as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains and by extracting relations across sentence boundaries. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. Our experiments show that using a more robust entity recognition approach and expanding the scope of relation extraction yields about 8 times as many extractions, and that strategically selecting training data can reduce errors by about 30%.
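The core distant-supervision labelling step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: entity mentions are found by plain case-insensitive substring matching, and "strategic selection" is approximated by a hypothetical heuristic that skips any KB triple whose argument name maps to more than one KB entity. The relation names (`artist_of`, `capital_of`) and entity identifiers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A knowledge-base fact (subject, relation, object)."""
    subject: str
    relation: str
    obj: str


def ambiguous_names(name_to_entities):
    """Surface forms that map to more than one KB entity are treated as ambiguous."""
    return frozenset(name for name, ids in name_to_entities.items() if len(ids) > 1)


def label_sentences(sentences, triples, ambiguous=frozenset()):
    """Distant-supervision heuristic: a sentence mentioning both arguments of a
    KB triple is labelled with that triple's relation. Triples with an ambiguous
    argument are skipped so they do not contribute noisy training examples."""
    labelled = []
    for sentence in sentences:
        lowered = sentence.lower()
        for t in triples:
            if t.subject in ambiguous or t.obj in ambiguous:
                continue
            if t.subject.lower() in lowered and t.obj.lower() in lowered:
                labelled.append((sentence, t.subject, t.obj, t.relation))
    return labelled


# Toy data; relation names and entity identifiers are hypothetical.
triples = [
    Triple("Michael Jackson", "artist_of", "Thriller"),
    Triple("Paris", "capital_of", "France"),
]
sentences = [
    "Thriller is an album by Michael Jackson.",
    "Paris Hilton was not in France that week.",
]
amb = ambiguous_names({"Paris": {"city/Paris", "person/Paris_Hilton"},
                       "France": {"country/France"}})

print(label_sentences(sentences, triples))       # 2 labels, one of them noisy
print(label_sentences(sentences, triples, amb))  # only the Thriller sentence
```

Without the filter, the second sentence is wrongly labelled `capital_of`; this is the kind of noise caused by lexical ambiguity that the abstract describes.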
Combining Generative and Discriminative Model Scores for Distant Supervision
Cited by 6 (0 self)
Distant supervision is a scheme to generate noisy training data for relation extraction by aligning entities of a knowledge base with text. In this work we combine the output of a discriminative at-least-one learner with that of a generative hierarchical topic model to reduce the noise in distant supervision data. The combination significantly increases the ranking quality of extracted facts and achieves state-of-the-art extraction performance in an end-to-end setting. A simple linear interpolation of the model scores performs better than a parameter-free scheme based on non-dominated sorting.
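The two score-combination schemes contrasted above can be sketched as follows. This is an illustrative sketch only: it assumes both models already output one score per candidate fact on comparable scales (e.g. normalised to [0, 1]), and the weight `alpha` and the fact identifiers are hypothetical. Interpolation yields a total order, whereas non-dominated sorting only groups facts into Pareto fronts, leaving ties within each front.

```python
def interpolate(disc, gen, alpha=0.5):
    """Linearly interpolate two per-fact score dicts (assumed to share keys and
    to be on comparable scales)."""
    return {fact: alpha * disc[fact] + (1 - alpha) * gen[fact] for fact in disc}


def rank(scores):
    """Facts sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def dominates(a, b):
    """Pareto dominance: a is at least as good everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(disc, gen):
    """Parameter-free alternative: peel off successive Pareto fronts."""
    points = {f: (disc[f], gen[f]) for f in disc}
    remaining = set(points)
    fronts = []
    while remaining:
        front = {f for f in remaining
                 if not any(dominates(points[g], points[f]) for g in remaining if g != f)}
        fronts.append(sorted(front))
        remaining -= front
    return fronts


disc = {"f1": 0.9, "f2": 0.4, "f3": 0.7, "f4": 0.3}
gen = {"f1": 0.2, "f2": 0.8, "f3": 0.6, "f4": 0.1}
print(rank(interpolate(disc, gen)))      # ['f3', 'f2', 'f1', 'f4']
print(non_dominated_fronts(disc, gen))   # [['f1', 'f2', 'f3'], ['f4']]
```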
Joint Information Extraction from the Web using Linked Data
Proceedings of ISWC (2014)
Cited by 3 (2 self)
Abstract. Almost all of the big-name Web companies are currently engaged in building ‘knowledge graphs’, and these are showing significant results in improving search, email, calendaring, etc. Even the largest openly accessible ones, such as Freebase and Wikidata, are far from complete, partly because new information is emerging so quickly. Most of the missing information is available on Web pages. To access that knowledge and populate knowledge bases, information extraction methods are needed. The bottleneck for information extraction systems is obtaining training data to learn classifiers. In this doctoral research, we investigate how existing data in knowledge bases can be used to automatically annotate training data for learning classifiers, which in turn extract more data to expand the knowledge bases. We discuss our hypotheses, approach and evaluation methods, and present preliminary results.
1 Problem Statement
Since the emergence of the Semantic Web, many Linked datasets such as Freebase [5], Wikidata [31] and DBpedia [4] have been created, not only for research, but also com-
Unsupervised Parsing for Generating Surface-Based Relation Extraction Patterns
Cited by 1 (0 self)
Finding the right features and patterns for identifying relations in natural language is one of the most pressing research questions for relation extraction. In this paper, we compare patterns based on supervised and unsupervised syntactic parsing and present a simple method for extracting surface patterns from a parsed training set. Results show that the use of surface-based patterns not only increases extraction speed but also improves the quality of the extracted relations. We find that, in this setting, unsupervised parsing, besides requiring fewer resources, compares favorably in terms of extraction quality.
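To make the notion of a surface pattern concrete, here is a deliberately simplified sketch. Unlike the method described above, it does not use any parser: it takes the literal text between two known arguments in a training sentence as the pattern, and it finds candidate arguments in new text with a hypothetical capitalised-token heuristic standing in for an entity recogniser.

```python
import re

# Capitalised token run, used here as a crude stand-in for an entity recogniser.
ENTITY = r"([A-Z]\w*(?: [A-Z]\w*)*)"


def extract_pattern(sentence, arg1, arg2):
    """Return the surface string strictly between arg1 and a following arg2."""
    i = sentence.find(arg1)
    if i < 0:
        return None
    j = sentence.find(arg2, i + len(arg1))
    if j < 0:
        return None
    middle = sentence[i + len(arg1):j].strip()
    return middle or None


def learn_patterns(examples):
    """Collect surface patterns from (sentence, arg1, arg2) training examples."""
    patterns = set()
    for sentence, arg1, arg2 in examples:
        pattern = extract_pattern(sentence, arg1, arg2)
        if pattern:
            patterns.add(pattern)
    return patterns


def apply_patterns(sentence, patterns):
    """Find (arg1, arg2) pairs matching '<ENTITY> pattern <ENTITY>'."""
    matches = []
    for pattern in sorted(patterns):
        regex = ENTITY + " " + re.escape(pattern) + " " + ENTITY
        matches.extend(re.findall(regex, sentence))
    return matches


patterns = learn_patterns([("Barack Obama was born in Honolulu.", "Barack Obama", "Honolulu")])
print(patterns)                                                        # {'was born in'}
print(apply_patterns("Angela Merkel was born in Hamburg.", patterns))  # [('Angela Merkel', 'Hamburg')]
```

Matching a flat string is much cheaper than matching a dependency path at extraction time, which is one plausible reason surface-based patterns speed up extraction.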
Structured Information Extraction from Natural Disaster Events on Twitter
As soon as natural disaster events happen, users are eager to know more about them. However, search engines currently provide a “ten blue links” interface for queries related to such events. Relevance of results for such queries can be significantly improved if users are shown a structured summary of the fresh events related to such queries. This would not only reduce the number of clicks users need to reach the relevant information but also keep users updated with more fine-grained, attribute-level information. Twitter is a rich source that can be exploited for obtaining such fine-grained structured information for fresh natural disaster events, since such events are often reported on Twitter much earlier than in other news media. However, extracting such structured information from tweets is challenging because: 1. tweets are noisy and ambiguous; 2. there is no well-defined schema for the various types of natural disaster events; 3. it is not trivial to extract attribute-value pairs and facts from unstructured text; and 4. it is difficult to find good mappings between extracted attributes and attributes in the event schema. We propose algorithms to extract attribute-value pairs, and also devise novel mechanisms to map such pairs to manually generated schemas for natural disaster events. Besides the tweet text, we also leverage text from URL links in the tweets to fill such schemas. Our schemas are temporal in nature, and the values are updated whenever fresh information flows in from human sensors on Twitter. Evaluation on ∼58,000 tweets for 20 events shows that our system can fill such event schemas with an F1 of ∼0.6.
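A toy version of the pipeline described above (extract attribute-value pairs, map them onto a schema, keep values up to date) might look like the sketch below. Everything here is hypothetical and much simpler than the proposed algorithms: the two-attribute earthquake schema, the synonym table used for mapping, the regular expression, and the "newest timestamp wins" update rule are all illustrative assumptions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical schema and synonym table for an earthquake event.
SCHEMA = {"magnitude", "deaths"}
SYNONYMS = {"magnitude": "magnitude", "mag": "magnitude",
            "death toll": "deaths", "deaths": "deaths", "dead": "deaths"}

# An attribute keyword followed, within 15 non-digit characters, by a number.
PAIR = re.compile(r"\b(magnitude|mag|death toll|deaths|dead)\b\D{0,15}?(\d+(?:\.\d+)?)",
                  re.IGNORECASE)


@dataclass
class EventRecord:
    """Temporal event record: attribute -> (latest value, timestamp)."""
    values: dict = field(default_factory=dict)

    def update(self, tweet, timestamp):
        for surface, value in PAIR.findall(tweet):
            attr = SYNONYMS.get(surface.lower())
            if attr not in SCHEMA:
                continue
            old = self.values.get(attr)
            # Newer information overrides older values; stale tweets are ignored.
            if old is None or timestamp >= old[1]:
                self.values[attr] = (float(value), timestamp)


record = EventRecord()
record.update("Magnitude 6.1 quake hits coast, death toll: 12", timestamp=1)
record.update("Death toll rises to 40", timestamp=2)
record.update("death toll 5", timestamp=0)  # late, stale report
print(record.values)  # {'magnitude': (6.1, 1), 'deaths': (40.0, 2)}
```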
Improvement of n-ary Relation Extraction by Adding Lexical Semantics to Distant-Supervision Rule Learning
Abstract: A new method is proposed and evaluated that improves distantly supervised learning of pattern rules for n-ary relation extraction. The new method employs knowledge from a large lexical semantic repository to guide the discovery of patterns in parsed relation mentions. It extends the induced rules to semantically relevant material outside the minimal subtree containing the shortest paths connecting the relation entities, and it discards rules without any explicit semantic content. It significantly raises both recall and precision, yielding a roughly 20% F-measure boost over a baseline system that does not consider lexical semantic information.
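The two rule operations described above (discarding semantically empty rules and extending rules with relevant material) can be illustrated with a rough sketch. Assumptions: rules are flattened to tuples of lemmas rather than dependency subtrees, "outside the minimal subtree" is approximated as "not already in the rule", and the `LEXICON` contents are invented stand-ins for what a lexical semantic repository might supply.

```python
# Hypothetical lexicon: lemmas a lexical-semantic resource might associate with
# a relation. Contents are made up for illustration.
LEXICON = {"marriage": {"marry", "wed", "wedding", "husband", "wife", "divorce"}}


def filter_rules(rules, relation, lexicon=LEXICON):
    """Discard rules that contain no lemma associated with the relation."""
    relevant = lexicon.get(relation, set())
    return [rule for rule in rules if relevant & set(rule)]


def extend_rule(rule, sentence_lemmas, relation, lexicon=LEXICON):
    """Append relevant lemmas from the sentence that the rule does not yet cover."""
    extra = (lexicon.get(relation, set()) & set(sentence_lemmas)) - set(rule)
    return tuple(rule) + tuple(sorted(extra))


rules = [("PERSON", "say", "PERSON"), ("PERSON", "marry", "PERSON")]
kept = filter_rules(rules, "marriage")
print(kept)  # [('PERSON', 'marry', 'PERSON')]
print(extend_rule(kept[0], ["PERSON", "marry", "PERSON", "in", "wedding", "ceremony"], "marriage"))
# ('PERSON', 'marry', 'PERSON', 'wedding')
```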
Extracting Relations between Non-Standard Entities using Distant Supervision and Imitation Learning
Distantly supervised approaches have become popular in recent years as they allow training relation extractors without text-bound annotation, using instead known relations from a knowledge base and a large textual corpus from an appropriate domain. While state-of-the-art distant supervision approaches use off-the-shelf named entity recognition and classification (NERC) systems to identify relation arguments, discrepancies in domain or genre between the data used for NERC training and the intended domain for the relation extractor can lead to low performance. This is particularly problematic for “non-standard” named entities such as album which would fall into the MISC
Distantly Supervised Web Relation Extraction for Knowledge Base Population
IOS Press
Abstract. Extracting information from Web pages for populating knowledge bases requires methods which are suitable across domains, do not require manual effort to adapt to new domains, are able to deal with noise, and integrate information extracted from different Web pages. Recent approaches have used existing knowledge bases to learn to extract information with promising results. In this paper we propose the use of distant supervision for relation extraction from the Web. Distant supervision is an unsupervised method which uses background information from the Linking Open Data cloud to automatically label sentences with relations to create training data for relation classifiers. Although the method is promising, existing approaches are still not suitable for Web extraction as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains and by extracting relations across sentence boundaries using unsupervised co-reference resolution methods. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. To combine information extracted from multiple sources for populating knowledge bases, we present and evaluate several information integration strategies and show that these benefit substantially from the additional relation mentions extracted using co-reference resolution, increasing precision by 8%. We further show that strategically selecting training data can increase precision by a further 3%.
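One way to picture information integration across Web pages is the sketch below. The two strategies shown (summing confidences across pages, or keeping only the maximum confidence) are generic illustrations and not necessarily the strategies evaluated in the paper; the extraction tuples and page names are hypothetical. Summing rewards an object that is mentioned on many pages, which is one intuition for why extra relation mentions (e.g. found via co-reference resolution) can help integration.

```python
from collections import defaultdict


def integrate(extractions, strategy="sum"):
    """Combine (subject, relation, obj, confidence, source) extractions from
    many pages into one object per (subject, relation) pair."""
    scores = defaultdict(lambda: defaultdict(float))
    for subj, rel, obj, conf, _source in extractions:
        cell = scores[(subj, rel)]
        if strategy == "sum":
            cell[obj] += conf  # more mentions accumulate more support
        elif strategy == "max":
            cell[obj] = max(cell[obj], conf)  # only the single best mention counts
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return {key: max(objs, key=objs.get) for key, objs in scores.items()}


extractions = [
    ("Thriller", "artist", "Michael Jackson", 0.6, "page1"),
    ("Thriller", "artist", "Michael Jackson", 0.5, "page2"),
    ("Thriller", "artist", "Quincy Jones", 0.9, "page3"),
]
print(integrate(extractions, "sum"))  # {('Thriller', 'artist'): 'Michael Jackson'}
print(integrate(extractions, "max"))  # {('Thriller', 'artist'): 'Quincy Jones'}
```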