Combining Statistical Techniques and Lexico-syntactic Patterns for Semantic Relations Extraction from Text
Cited by 1 (0 self)
Abstract. We describe here a methodology to combine two different techniques for Semantic Relation Extraction from texts. On the one hand, generic lexico-syntactic patterns are applied to the linguistically analyzed corpus to detect a first set of pairs of co-occurring words, possibly involved in “syntagmatic” relations. On the other hand, a statistical unsupervised association system is used to obtain a second set of pairs of “distributionally similar” terms that appear to occur in similar contexts and are thus possibly involved in “paradigmatic” relations. The approach aims at learning ontological information by filtering the candidate relations obtained through generic lexico-syntactic patterns and by labelling the anonymous relations obtained through the statistical system. The resulting set of relations can be used to enrich existing ontologies and for semantic annotation of documents or web pages.
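As a rough illustration of the combination described in this abstract, the sketch below keeps a pattern-derived candidate pair only when the two terms also have similar context vectors, and carries the pattern's label over to the confirmed pair. The toy context counts, the 0.5 threshold, the "coordination" label, and all function names are assumptions made for illustration; the abstract does not specify the authors' actual similarity measure or filtering rule.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_pairs(contexts, threshold):
    """Return unordered term pairs whose context vectors reach the threshold."""
    terms = sorted(contexts)
    return {
        frozenset((a, b))
        for i, a in enumerate(terms)
        for b in terms[i + 1:]
        if cosine(contexts[a], contexts[b]) >= threshold
    }


def combine(pattern_pairs, contexts, threshold=0.5):
    """Keep pattern candidates that distributional similarity confirms.

    pattern_pairs maps (term1, term2) to a hypothetical pattern label.
    """
    confirmed = similar_pairs(contexts, threshold)
    return {
        pair: label
        for pair, label in pattern_pairs.items()
        if frozenset(pair) in confirmed
    }


# Toy data, invented for illustration only.
contexts = {
    "dog": {"bark": 3, "eat": 2, "pet": 4},
    "cat": {"meow": 3, "eat": 2, "pet": 5},
    "car": {"drive": 6, "park": 2},
}
pattern_pairs = {("dog", "cat"): "coordination", ("dog", "car"): "coordination"}
result = combine(pattern_pairs, contexts)
```

In this toy run only the ("dog", "cat") pair survives, because "dog" and "car" share no contexts. A real system would derive the context vectors from a parsed corpus rather than from hand-written counts.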
Refining Non-Taxonomic Relation Labels with External Structured Data to Support Ontology Learning
This paper presents a method to integrate external knowledge sources such as DBpedia and OpenCyc into an ontology learning system that automatically suggests labels for unknown relations in domain ontologies based on large corpora of unstructured text. The method extracts and aggregates verb vectors from semantic relations identified in the corpus. It composes a knowledge base which consists of (i) verb centroids for known relations between domain concepts, (ii) mappings between concept pairs and the types of known relations, and (iii) ontological knowledge retrieved from external sources. Applying semantic inference and validation to this knowledge base yields a refined relation label suggestion. A formal evaluation compares the accuracy and average ranking precision of this hybrid method with the performance of methods that solely rely on corpus data and those that are only based on reasoning and external data sources.
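The verb-centroid idea in this abstract can be sketched as follows: verb vectors observed for each known relation are averaged into a centroid, and an unlabelled relation is assigned the labels whose centroids are most similar to its own verb vector. The relation names, verb counts, and function names are hypothetical, and the sketch covers only the corpus-based part; the inference and validation against DBpedia and OpenCyc are not modelled here.

```python
from collections import Counter
from math import sqrt


def centroid(vectors):
    """Average a list of sparse verb-count vectors into one centroid."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds counts from a mapping
    return {verb: count / len(vectors) for verb, count in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest_labels(unknown, centroids):
    """Rank known relation labels by similarity to the unknown verb vector."""
    scores = {label: cosine(unknown, c) for label, c in centroids.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical verb vectors for two known relations.
known = {
    "produces": [{"produce": 4, "make": 2}, {"produce": 3, "manufacture": 1}],
    "locatedIn": [{"lie": 2, "locate": 5}, {"locate": 3, "sit": 1}],
}
centroids = {label: centroid(vecs) for label, vecs in known.items()}
ranking = suggest_labels({"make": 3, "produce": 1}, centroids)
```

Here the unknown relation shares verbs only with the "produces" centroid, so that label is ranked first.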
Semantic Relationship Extraction and Ontology Building using Wikipedia: A Comprehensive Survey
The Semantic Web, as envisioned by Tim Berners-Lee, is highly dependent on the availability of machine-readable information. Ontologies are one of the machine-readable formats that have been widely investigated. Several studies focus on how to extract concepts and semantic relations in order to build ontologies. Wikipedia is considered one of the important knowledge sources used to extract semantic relations, because its semi-structured nature facilitates the task. In this paper we focus on the current state of this challenging field by discussing some recent studies on Wikipedia and semantic extraction and highlighting their main contributions and results.
Conceptual image retrieval over the Wikipedia corpus
Abstract. Image retrieval in large-scale databases is currently based on a text-string matching procedure, a technique that produces good results as long as the annotations associated with pictures are accurate and detailed enough. These conditions are not met for a large majority of image corpora, such as the Wikipedia collection, and it is interesting to explore methods that go beyond string matching. In this paper, we present our approach to image retrieval, tested in the ImageCLEF 2008 WikipediaMM. The approach is based on a query reformulation using concepts that are semantically related to those in the initial query. For each interesting entity in the query, we used Wikipedia and WordNet to extract a list of related concepts, which were then ranked so that the most salient are proposed first. We also made a list of visual concepts which were used to re-rank the answers to queries that included, implicitly or explicitly, these visual concepts. The CEA submitted two automatic runs, one based on query reformulation only and one combining query reformulation and visual concepts, which were ranked 4th and 2nd using the MAP measure.
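For readers unfamiliar with the MAP measure mentioned above, the sketch below computes mean average precision in its usual textbook form: for each query, precision is taken at every rank where a relevant item appears, averaged over the number of relevant items, and then averaged across queries. The document identifiers are invented, and the exact evaluation settings used by ImageCLEF are not given in the abstract, so this is the standard definition rather than the campaign's own scoring script.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a relevant set."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two invented queries with their ranked answers and relevant items.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = 1/2
]
map_score = mean_average_precision(runs)
```

Because relevant items that are never retrieved still count in the denominator, a run is penalised for missing them as well as for ranking them low.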
CEA
Geolocalized databases are becoming necessary in a wide variety of application domains. Thus far, the creation of such databases has been a costly, manual process. This drawback has stimulated interest in automating their construction, for example, by mining geographical information from the Web. Here we present and evaluate a new automated technique for creating and enriching a geographical gazetteer, called Gazetiki. Our technique merges disparate information from Wikipedia, Panoramio, and web search engines in order to identify geographical names, categorize these names, find their geographical coordinates and rank them. The information produced in Gazetiki enhances and complements the Geonames database, using a similar domain model. We show that our method provides a richer structure and an improved coverage compared to another known attempt at automatically building a geographic database and, where possible, we compare our Gazetiki to Geonames.

