Results 11 - 20
of
86
User-system cooperation in document annotation based on information extraction
- In Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management, EKAW02
, 2002
"... Abstract. The process of document annotation for the Semantic Web is complex and time consuming, as it requires a great deal of manual annotation. Information extraction from texts (IE) is a technology used by some very recent systems for reducing the burden of annotation. The integration of IE syst ..."
Abstract
-
Cited by 49 (13 self)
- Add to MetaCart
Abstract. The process of document annotation for the Semantic Web is complex and time consuming, as it requires a great deal of manual annotation. Information extraction from texts (IE) is a technology used by some very recent systems for reducing the burden of annotation. The integration of IE systems in annotation tools is quite a new development and there is still the necessity of thinking the impact of the IE system on the whole annotation process. In this paper we initially discuss a number of requirements for the use of IE as support for annotation. Then we present and discuss a model of interaction that addresses such issues and Melita, an annotation framework that implements a methodology for active annotation for the Semantic Web based on IE. Finally we present an experiment that quantifies the gain in using IE as support to human annotators. 1.
Designing Adaptive Information Extraction for the Semantic Web in Amilcare
- Annotation for the Semantic Web, Frontiers in Artificial Intelligence and Applications. IOS
, 2003
"... ..."
Yago: A Large Ontology from Wikipedia and WordNet
, 2007
"... This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic Is-A hierarchy a ..."
Abstract
-
Cited by 43 (11 self)
- Add to MetaCart
This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic Is-A hierarchy as well as semantic relations between entities. The facts for YAGO have been extracted from the category system and the infoboxes of Wikipedia and have been combined with taxonomic relations from WordNet. Type checking techniques help us keep YAGO’s precision at 95% – as proven by an extensive evaluation study. YAGO is based on a clean logical model with a decidable consistency. Furthermore, it allows representing n-ary relations in a natural way while maintaining compatibility with RDFS. A powerful query model facilitates access to YAGO’s data.
A maximum entropy approach to information extraction from semi-structured and free text
- In Proceedings of the Eighteenth National Conference on Artificial Intelligence
, 2002
"... In this paper, we present a classification-based approach towards single-slot as well as multi-slot information extraction (IE). For single-slot IE, we worked on the domain of Seminar Announcements, where each document contains information on only one seminar. For multi-slot IE, we worked on the dom ..."
Abstract
-
Cited by 35 (0 self)
- Add to MetaCart
In this paper, we present a classification-based approach towards single-slot as well as multi-slot information extraction (IE). For single-slot IE, we worked on the domain of Seminar Announcements, where each document contains information on only one seminar. For multi-slot IE, we worked on the domain of Management Succession. For this domain, we restrict ourselves to extracting information sentence by sentence, in the same way as (Soderland 1999). Each sentence can contain information on several management succession events. By using a classification approach based on a maximum entropy framework, our system achieves higher accuracy than the best previously published results in both domains.
Automatic Semantic Annotation using Unsupervised Information Extraction and Integration
, 2000
"... In this paper we propose a methodology to learn to automatically annotate domain-specific information from large repositories (e.g. Web sites) with minimum user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. ..."
Abstract
-
Cited by 32 (2 self)
- Add to MetaCart
In this paper we propose a methodology to learn to automatically annotate domain-specific information from large repositories (e.g. Web sites) with minimum user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries). Retrieved information is then used to partially annotate documents. These annotated documents are used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn will produce more annotations used to annotate more documents. It will be used to train more complex IE engines and the cycle will keep on repeating itself until the required information is obtained. The user intervention is limited to providing an initial URL and to correct information if it is the case when the computation is finished. The revised annotation can then be reused to provide further training and therefore getting more information and/or more precision.
Multi-level Boundary Classification for Information Extraction
- IN ECML
, 2004
"... We investigate the application of classification techniques to the problem of information extraction (IE). In particular we use support vector machines and several different feature-sets to build a set of classifiers for IE. We show that this approach is competitive with current state-of-the-art ..."
Abstract
-
Cited by 25 (1 self)
- Add to MetaCart
We investigate the application of classification techniques to the problem of information extraction (IE). In particular we use support vector machines and several different feature-sets to build a set of classifiers for IE. We show that this approach is competitive with current state-of-the-art IE algorithms based on specialized learning algorithms. We also introduce a new technique for improving the recall of our IE algorithm. This approach uses a two-level ensemble of classifiers to improve the recall of the extracted fragments while maintaining high precision. We show that this approach outperforms current state-of-the-art IE algorithms on several benchmark IE tasks.
ESpotter: Adaptive named entity recognition for web browsing
- 3rd Conf. on Professional Knowledge Management
, 2005
"... {j.zhu, v.s.uren, e.motta} @ open.ac.uk Web users are facing information overload problems, i.e., it is hard for them to find desired information on the web. Hence the growing interest in named entity recognition (NER) for discovering relevant information on users ’ behalf. We present a browser plu ..."
Abstract
-
Cited by 24 (15 self)
- Add to MetaCart
{j.zhu, v.s.uren, e.motta} @ open.ac.uk Web users are facing information overload problems, i.e., it is hard for them to find desired information on the web. Hence the growing interest in named entity recognition (NER) for discovering relevant information on users ’ behalf. We present a browser plug-in called ESpotter which adapts lexicons and patterns to a domain hierarchy consisting of domains on the web and user preferences for accurate and efficient NER. Mappings are created from domain independent types to domain specific types. Entities are highlighted according to their types, and users are assisted by navigational functionalities between these highlighted entities.
Exploiting subjectivity classification to improve information extraction
- In AAAI-2005
, 2005
"... Information extraction (IE) systems are prone to false hits for a variety of reasons and we observed that many of these false hits occur in sentences that contain subjective language (e.g., opinions, emotions, and sentiments). Motivated by these observations, we explore the idea of using subjectivit ..."
Abstract
-
Cited by 24 (4 self)
- Add to MetaCart
Information extraction (IE) systems are prone to false hits for a variety of reasons and we observed that many of these false hits occur in sentences that contain subjective language (e.g., opinions, emotions, and sentiments). Motivated by these observations, we explore the idea of using subjectivity analysis to improve the precision of information extraction systems. In this paper, we describe an IE system that uses a subjective sentence classifier to filter its extractions. We experimented with several different strategies for using the subjectivity classifications, including an aggressive strategy that discards all extractions found in subjective sentences and more complex strategies that selectively discard extractions. We evaluated the performance of these different approaches on the MUC-4 terrorism data set. We found that indiscriminately filtering extractions from subjective sentences was overly aggressive, but more selective filtering strategies improved IE precision with minimal recall loss.
Bayesian information extraction network
- In Proc.18th Int. Joint Conf. Artifical Intelligence
, 2003
"... Dynamic Bayesian networks (DBNs) offer an elegant way to integrate various aspects of language in one model. Many existing algorithms developed for learning and inference in DBNs are applicable to probabilistic language modeling. To demonstrate the potential of DBNs for natural language processing, ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
Dynamic Bayesian networks (DBNs) offer an elegant way to integrate various aspects of language in one model. Many existing algorithms developed for learning and inference in DBNs are applicable to probabilistic language modeling. To demonstrate the potential of DBNs for natural language processing, we employ a DBN in an information extraction task. We show how to assemble wealth of emerging linguistic instruments for shallow parsing, syntactic and semantic tagging, morphological decomposition, named entity recognition etc. in order to incrementally build a robust information extraction system. Our method outperforms previously published results on an established benchmark domain.
Composition of Conditional Random Fields for Transfer Learning
- PROCEEDINGS OF HLT/EMNLP
, 2005
"... Many learning tasks have subtasks for which much training data exists. Therefore, we want to transfer learning from the old, generalpurpose subtask to a more specific new task, for which there is often less data. While work in transfer learning often considers how the old task should affect learning ..."
Abstract
-
Cited by 21 (1 self)
- Add to MetaCart
Many learning tasks have subtasks for which much training data exists. Therefore, we want to transfer learning from the old, generalpurpose subtask to a more specific new task, for which there is often less data. While work in transfer learning often considers how the old task should affect learning on the new task, in this paper we show that it helps to take into account how the new task affects the old. Specifically, we perform joint decoding of separately-trained sequence models, preserving uncertainty between the tasks and allowing information from the new task to affect predictions on the old task. On two standard text data sets, we show that joint decoding outperforms cascaded decoding.

