Results 1 -
9 of
9
Reviewing and Evaluating Automatic Term Recognition Techniques
"... Abstract. Automatic Term Recognition (ATR) is defined as the task of identifying domain specific terms from technical corpora. Termhoodbased approaches measure the degree that a candidate term refers to a domain specific concept. Unithood-based approaches measure the attachment strength of a candida ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
(Show Context)
Abstract. Automatic Term Recognition (ATR) is defined as the task of identifying domain specific terms from technical corpora. Termhoodbased approaches measure the degree that a candidate term refers to a domain specific concept. Unithood-based approaches measure the attachment strength of a candidate term constituents. These methods have been evaluated using different, often incompatible evaluation schemes and datasets. This paper provides an overview and a thorough evaluation of state-of-the-art ATR methods, under a common evaluation framework, i.e. corpora and evaluation method. Our contributions are two-fold: (1) We compare a number of different ATR methods, showing that termhood-based methods achieve in general superior performance. (2) We show that the number of independent occurrences of a candidate term is the most effective source for estimating term nestedness, improving ATR performance. Key words: automatic term recognition, ATR, term extraction
Nested Named Entity Recognition in Historical Archive Text
"... This paper describes work on Named Entity Recognition (NER), in preparation for Relation Extraction (RE), on data from a historical archive organisation. As is often the case in the cultural heritage domain, the source text includes a high percentage of specialist terminology, and is of very variabl ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
(Show Context)
This paper describes work on Named Entity Recognition (NER), in preparation for Relation Extraction (RE), on data from a historical archive organisation. As is often the case in the cultural heritage domain, the source text includes a high percentage of specialist terminology, and is of very variable quality in terms of grammaticality and completeness. The NER and RE tasks were carried out using a specially annotated corpus, and are themselves preliminary steps in a larger project whose aim is to transform discovered relations into a graph structure that can be queried using standard tools. Experimental results from the NER task are described, with emphasis on dealing with nested entities using a multi-word token method. The overall objective is to improve access by non-specialist users to a valuable cultural resource. 1.
H E
"... The Semantic Web promises a way of linking distributed information at a granular level by interconnecting compact data items instead of complete HTML pages. New data is gradually being added to the Semantic Web but there is a need to incorporate ex-isting knowledge. This thesis explores ways to conv ..."
Abstract
- Add to MetaCart
The Semantic Web promises a way of linking distributed information at a granular level by interconnecting compact data items instead of complete HTML pages. New data is gradually being added to the Semantic Web but there is a need to incorporate ex-isting knowledge. This thesis explores ways to convert a coherent body of information from various structured and unstructured formats into the necessary graph form. The transformation work crosses several currently active disciplines, and there are further research questions that can be addressed once the graph has been built. Hybrid databases, such as the cultural heritage one used here, consist of structured relational tables associated with free text documents. Access to the data is hampered by complex schemas, confusing terminology and difficulties in searching the text effec-tively. This thesis describes how hybrid data can be unified by assembly into a graph. A major component task is the conversion of relational database content to RDF. This is an active research field, to which this work contributes by examining weaknesses in some existing methods and proposing alternatives.
A Hybrid Approach for Named Entity Recognition in Indian Languages
"... In this paper we describe a hybrid system that applies Maximum Entropy model (Max-Ent), language specific rules and gazetteers to the task of Named Entity Recognition (NER) in Indian languages designed for the IJCNLP NERSSEAL shared task. Starting with Named Entity (NE) annotated corpora and a set o ..."
Abstract
- Add to MetaCart
In this paper we describe a hybrid system that applies Maximum Entropy model (Max-Ent), language specific rules and gazetteers to the task of Named Entity Recognition (NER) in Indian languages designed for the IJCNLP NERSSEAL shared task. Starting with Named Entity (NE) annotated corpora and a set of features we first build a baseline NER system. Then some language specific rules are added to the system to recognize some specific NE classes. Also we have added some gazetteers and context patterns to the system to increase the performance. As identification of rules and context patterns requires language knowledge, we were able to prepare rules and identify context patterns for Hindi and Bengali only. For the other languages the system uses the MaxEnt model only. After preparing the one-level NER system, we have applied a set of rules to identify the nested entities. The system is able to recognize 12 classes of NEs with 65.13 % f-value in Hindi, 65.96 % f-value in Bengali and 44.65%, 18.74%, and 35.47% f-value in Oriya, Telugu and Urdu respectively.
A Hybrid Approach for Named Entity Recognition in Indian Languages
"... In this paper we describe a hybrid system that applies maximum entropy model (Max-Ent), language specific rules and gazetteers to the task of named entity recognition (NER) in Indian languages designed for the IJCNLP NERSSEAL shared task. Starting with named entity (NE) annotated corpora and a set o ..."
Abstract
- Add to MetaCart
In this paper we describe a hybrid system that applies maximum entropy model (Max-Ent), language specific rules and gazetteers to the task of named entity recognition (NER) in Indian languages designed for the IJCNLP NERSSEAL shared task. Starting with named entity (NE) annotated corpora and a set of features we first build a baseline NER system. Then some language specific rules are added to the system to recognize some specific NE classes. Also we have added some gazetteers and context patterns to the system to increase the performance. As identification of rules and context patterns requires language knowledge, we were able to prepare rules and identify context patterns for Hindi and Bengali only. For the other languages the system uses the MaxEnt model only. After preparing the one-level NER system, we have applid a set of rules to identify the nested entities. The system is able to recognize 12 classes of NEs with 65.13 % f-value in Hindi, 65.96 % f-value in Bengali and 44.65%, 18.74%, and 35.47% f-value in Oriya, Telugu and Urdu respectively.
H E
"... The Semantic Web promises a way of linking distributed information at a granular level by interconnecting compact data items instead of complete HTML pages. New data is gradually being added to the Semantic Web but there is a need to incorporate ex-isting knowledge. This thesis explores ways to conv ..."
Abstract
- Add to MetaCart
The Semantic Web promises a way of linking distributed information at a granular level by interconnecting compact data items instead of complete HTML pages. New data is gradually being added to the Semantic Web but there is a need to incorporate ex-isting knowledge. This thesis explores ways to convert a coherent body of information from various structured and unstructured formats into the necessary graph form. The transformation work crosses several currently active disciplines, and there are further research questions that can be addressed once the graph has been built. Hybrid databases, such as the cultural heritage one used here, consist of structured relational tables associated with free text documents. Access to the data is hampered by complex schemas, confusing terminology and difficulties in searching the text effec-tively. This thesis describes how hybrid data can be unified by assembly into a graph. A major component task is the conversion of relational database content to RDF. This is an active research field, to which this work contributes by examining weaknesses in some existing methods and proposing alternatives.