Results 1 - 10 of 67
BabelNet: The automatic construction, evaluation and application of a . . .
- Artificial Intelligence, 2012
Mining meaning from Wikipedia
- 2009
- Cited by 76 (2 self)
Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval and information extraction; and as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.
An open-source toolkit for mining Wikipedia
- In Proc. New Zealand Computer Science Research Student Conf
- Cited by 49 (0 self)
The online encyclopedia Wikipedia is a vast repository of information. For developers and researchers it represents a giant multilingual database of concepts and semantic relations; a promising resource for natural language processing and many other research areas. In this paper we introduce the Wikipedia Miner toolkit: an open-source collection of code that allows researchers and developers to easily integrate Wikipedia's rich semantics into their own applications. The Wikipedia Miner toolkit is already a mature product. In this paper we describe how it provides simplified, object-oriented access to Wikipedia’s structure and content, how it allows terms and concepts to be compared semantically, and how it can detect Wikipedia topics when they are mentioned in documents. We also describe how it has already been applied to several different research problems. However, the toolkit is not intended to be a complete, polished product; it is instead an entirely open-source project that we hope will continue to evolve.
Supervised noun phrase coreference research: The first fifteen years
- In: Association for Computational Linguistics
, 2010
- Cited by 47 (1 self)
The research focus of computational coreference resolution has shifted from heuristic approaches to machine learning approaches over the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.
Text relatedness based on a word thesaurus.
- 2010
- Cited by 27 (8 self)
The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. Such a measure that captures well both aspects of text relatedness may help in many tasks, such as text retrieval, classification and clustering. In this paper we present a new approach for measuring the semantic relatedness between words based on their implicit semantic links. The approach exploits only a word thesaurus in order to devise implicit semantic links between words. Based on this approach, we introduce Omiotis, a new measure of semantic relatedness between texts which capitalizes on the word-to-word semantic relatedness measure (SR) and extends it to measure the relatedness between texts. We gradually validate our method: we first evaluate the performance of the semantic relatedness measure between individual words, covering word-to-word similarity and relatedness, synonym identification and word analogy; then, we proceed with evaluating the performance of our method in measuring text-to-text semantic relatedness in two tasks, namely sentence-to-sentence similarity and paraphrase recognition. Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness in the selected tasks and the used data sets, and competes well against corpus-based and hybrid approaches.
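As a rough, self-contained illustration of the thesaurus-only idea this abstract describes (not the authors' Omiotis measure; the graph, function names, and scoring below are invented for the sketch):

```python
from collections import deque

# Toy thesaurus as an undirected graph of semantic links between words.
# (Hypothetical data; the real measure builds on thesaurus relations and weights.)
THESAURUS = {
    "car": {"vehicle", "automobile"},
    "automobile": {"car"},
    "vehicle": {"car", "bicycle"},
    "bicycle": {"vehicle", "wheel"},
    "wheel": {"bicycle"},
}

def path_length(a, b):
    """Shortest number of thesaurus links between two words (None if unconnected)."""
    if a == b:
        return 0
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        word, d = frontier.popleft()
        for nxt in THESAURUS.get(word, ()):
            if nxt == b:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None

def word_relatedness(a, b):
    """Map path length into (0, 1]; unconnected words get 0."""
    d = path_length(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

def text_relatedness(text1, text2):
    """Average, over the words of each text, of the best match in the other text."""
    w1, w2 = text1.split(), text2.split()
    def one_way(src, tgt):
        return sum(max(word_relatedness(a, b) for b in tgt) for a in src) / len(src)
    return (one_way(w1, w2) + one_way(w2, w1)) / 2
```

The point of the sketch is the two-level structure the abstract describes: a word-to-word relatedness derived only from thesaurus links, lifted to a text-to-text score by aggregating the best pairwise matches.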
Topic-driven multi-document summarization with encyclopedic knowledge and activation spreading
- In Proc. of EMNLP-08, 2008
- Cited by 26 (1 self)
Information of interest to users is often distributed over a set of documents. Users can specify their request for information as a query/topic – a set of one or more sentences or questions. Producing a good summary of the relevant information relies on understanding the query and linking it with the associated set of documents. To “understand” the query we expand it using encyclopedic knowledge in Wikipedia. The expanded query is linked with its associated documents through spreading activation in a graph that represents words and their grammatical connections in these documents. The topic-expanded words and activated nodes in the graph are used to produce an extractive summary. The method proposed is tested on the DUC summarization data. The system implemented ranks high compared to the participating systems in the DUC competitions, confirming our hypothesis that encyclopedic knowledge is a useful addition to a summarization system.
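The spreading-activation step can be sketched roughly as follows (the toy graph, decay value, and names are hypothetical; the paper's graph is built from words and grammatical relations extracted from the documents):

```python
# Minimal spreading-activation sketch over a word graph.
GRAPH = {
    "summary": ["document", "information"],
    "document": ["summary", "information", "sentence"],
    "information": ["summary", "document"],
    "sentence": ["document"],
}

def spread_activation(graph, seeds, decay=0.5, iterations=2):
    """Push activation out from seed nodes to their neighbors, attenuated by `decay`."""
    activation = {node: 0.0 for node in graph}
    for s in seeds:
        activation[s] = 1.0
    for _ in range(iterations):
        incoming = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            share = decay * activation[node] / len(neighbors)
            for n in neighbors:
                incoming[n] += share
        for node in graph:
            activation[node] = max(activation[node], incoming[node])
    return activation
```

In a summarizer of this kind, sentences containing highly activated nodes would be preferred when extracting the summary.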
Learning to simplify sentences with quasi-synchronous grammar and integer programming
- in Proceedings of the Conference on Empirical Methods in Natural Language Processing
- Cited by 24 (1 self)
Text simplification aims to rewrite text into simpler versions, and thus make information accessible to a broader audience. Most previous work simplifies sentences using hand-crafted rules aimed at splitting long sentences, or substitutes difficult words using a predefined dictionary. This paper presents a data-driven model based on quasi-synchronous grammar, a formalism that can naturally capture structural mismatches and complex rewrite operations. We describe how such a grammar can be induced from Wikipedia and propose an integer linear programming model for selecting the most appropriate simplification from the space of possible rewrites generated by the grammar. We show experimentally that our method creates simplifications that significantly reduce the reading difficulty of the input, while maintaining grammaticality and preserving its meaning.
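The selection step can be caricatured as picking the lowest-difficulty candidate. The sketch below substitutes a crude length-based readability proxy and exhaustive search for the paper's integer linear program; everything here is invented for illustration:

```python
def difficulty(sentence):
    """Crude readability proxy: average word length plus a sentence-length penalty."""
    words = sentence.split()
    return sum(len(w) for w in words) / len(words) + 0.5 * len(words)

def select_simplification(candidates):
    """Pick the easiest-to-read candidate among grammar-generated rewrites."""
    return min(candidates, key=difficulty)
```

An ILP, unlike this brute-force minimum, can additionally enforce constraints (e.g. that exactly one rewrite is chosen per source span, or that meaning-bearing content is retained).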
Unsupervised Recognition of Literal and Non-Literal Use of Idiomatic Expressions
- In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), 2009
- Cited by 21 (3 self)
We propose an unsupervised method for distinguishing literal and non-literal usages of idiomatic expressions. Our method determines how well a literal interpretation is linked to the overall cohesive structure of the discourse. If strong links can be found, the expression is classified as literal, otherwise as idiomatic. We show that this method can help to tell apart literal and non-literal usages, even for idioms which occur in canonical form.
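A toy sketch of the cohesion idea: if an idiom's component words link strongly to the surrounding discourse, read it literally; otherwise, idiomatically. The lookup table, threshold, and names below are invented; real systems derive word relatedness from corpus statistics:

```python
# Toy pairwise relatedness scores (looked up symmetrically below).
RELATEDNESS = {
    ("ice", "frozen"): 0.9,
    ("ice", "lake"): 0.8,
    ("break", "crack"): 0.7,
    ("ice", "meeting"): 0.1,
    ("break", "conversation"): 0.1,
}

def related(a, b):
    return RELATEDNESS.get((a, b), RELATEDNESS.get((b, a), 0.0))

def classify_usage(idiom_words, context_words, threshold=0.3):
    """Average each idiom word's best link into the context; compare to a threshold."""
    score = sum(
        max(related(w, c) for c in context_words) for w in idiom_words
    ) / len(idiom_words)
    return "literal" if score >= threshold else "idiomatic"
```

So "break the ice" next to "frozen lake" scores high (literal), while next to "meeting" it scores low (idiomatic), mirroring the cohesive-links intuition in the abstract.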
Taxonomy induction based on a collaboratively built knowledge repository.
- Artif. Intell., 2011
- Cited by 21 (2 self)
The category system in Wikipedia can be viewed as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and on lexico-syntactic matching. The result is a large-scale taxonomy. For evaluation we propose a method which (1) manually determines the quality of our taxonomy, and (2) automatically compares its coverage with ResearchCyc, one of the largest manually created ontologies, and with the lexical database WordNet. Additionally, we perform an extrinsic evaluation by computing semantic similarity between words in benchmark datasets. The results show that the taxonomy compares favorably in quality and coverage with broad-coverage manually created resources.
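One simple instance of lexico-syntactic matching for category links is head matching; the naive head finder and labels below are a hypothetical illustration, not the paper's method, which combines several connectivity and matching heuristics:

```python
def head_noun(category):
    """Naive head finder: the word before the first preposition, else the last word."""
    words = category.lower().split()
    for i, w in enumerate(words):
        if w in {"in", "of", "by", "from"}:
            return words[i - 1]
    return words[-1]

def label_relation(child, parent):
    """A shared head noun suggests a subsumption (isa) link between categories."""
    return "isa" if head_noun(child) == head_noun(parent) else "related"
```

For example, "Universities in Germany" under "Universities in Europe" shares the head "universities" and looks like isa, whereas "Rivers of Germany" under "Geography of Germany" does not.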
Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation
- 2013
- Cited by 15 (4 self)
We are witnessing a paradigm shift in Human Language Technology (HLT) that may well have an impact on the field comparable to the statistical revolution: acquiring large-scale resources by exploiting collective intelligence. An illustration of this new approach is Phrase Detectives, an interactive online game with a purpose for creating anaphorically annotated resources, which draws on a highly distributed population of contributors with different levels of expertise. The purpose of this article is first to give an overview of all aspects of Phrase Detectives, from the design of the game and the HLT methods we used to the results obtained so far. It then summarizes the lessons we have learned in developing the game, which should help other researchers design and implement similar games.