Studying the human translation process through the TransSearch logfiles
In Proceedings of the AAAI Symposium on Knowledge Collection, 2005
"... This paper presents the TransSearch log-files. These are records of interactions between human translators and TransSearch, a bilingual concordancing system. The authors show how this data can be used as experimental evidence to study the translation process. This is exem-plified by the results of a ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
Abstract: This paper presents the TransSearch log-files: records of interactions between human translators and TransSearch, a bilingual concordancing system. The authors show how this data can be used as experimental evidence to study the translation process. This is exemplified by the results of a study, based on this data, on the nature of the text units on which human translators operate. Finally, some enhancements to the TransSearch system are proposed, aiming both at improving its usefulness for end-users and the quality of the data that can be collected from its log-files.
Chinese Chunking Based on Maximum Entropy Markov Models
"... This paper presents a new Chinese chunking method based on maximum entropy Markov models. We firstly present two types of Chinese chunking specifications and data sets, based on which the chunking models are applied. Then we describe the hidden Markov chunking model and maximum entropy chunking mode ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Abstract: This paper presents a new Chinese chunking method based on maximum entropy Markov models. We first present two types of Chinese chunking specifications and data sets, to which the chunking models are applied. We then describe the hidden Markov chunking model and the maximum entropy chunking model. Based on our analysis of the two models, we propose a maximum entropy Markov chunking model that combines the transition probabilities and conditional probabilities of states. Experimental results on the two data sets show that this approach achieves impressive accuracy in terms of F-score: 91.02% and 92.68%, respectively. On the same data sets, the new chunking model outperforms both the hidden Markov and maximum entropy chunking models.
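The decoding step this abstract implies, combining transition and conditional probabilities of states, is standard Viterbi search over an MEMM. The following is a minimal sketch, not the authors' code: where their model would obtain P(tag | previous tag, word) from a trained maximum entropy classifier, the sketch takes that conditional as a plain function argument.

    import math

    def memm_viterbi(words, tags, cond_prob):
        # cond_prob(prev_tag, word, tag) -> P(tag | prev_tag, word); in an
        # MEMM this would come from a maximum entropy classifier over
        # features of the previous tag and the current word.
        viterbi = [{} for _ in words]  # best log-score per tag per position
        backptr = [{} for _ in words]
        for tag in tags:
            viterbi[0][tag] = math.log(cond_prob(None, words[0], tag) + 1e-12)
        for t in range(1, len(words)):
            for tag in tags:
                scores = {prev: viterbi[t - 1][prev]
                          + math.log(cond_prob(prev, words[t], tag) + 1e-12)
                          for prev in tags}
                best = max(scores, key=scores.get)
                viterbi[t][tag] = scores[best]
                backptr[t][tag] = best
        # trace back the highest-scoring tag sequence
        path = [max(viterbi[-1], key=viterbi[-1].get)]
        for t in range(len(words) - 1, 0, -1):
            path.append(backptr[t][path[-1]])
        return list(reversed(path))

    # toy usage with two chunk tags and a uniform conditional model
    print(memm_viterbi(["he", "eats"], ["B-NP", "B-VP"], lambda p, w, t: 0.5))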
Weighted Probabilistic Sum Model based on Decision Tree Decomposition for Text Chunking
2001
"... ..."
Identifying Anatomical Phrases in Clinical Reports by Shallow Semantic Parsing Methods
Computational Intelligence and Data Mining (CIDM 2007)
"... Natural Language Processing (NLP) is being applied for several information extraction tasks in the biomedical domain. The unique nature of clinical information requires the need for developing an NLP system designed specifically for the clinical domain. We describe a method to identify semantically ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Abstract: Natural Language Processing (NLP) is being applied to several information extraction tasks in the biomedical domain. The unique nature of clinical information calls for an NLP system designed specifically for the clinical domain. We describe a method to identify semantically coherent phrases within clinical reports; this is an important step towards full syntactic parsing within a clinical NLP system. We use this semantic phrase chunker to identify anatomical phrases within radiology reports related to the genitourinary domain. A discriminative classifier based on support vector machines was used to classify words into one of five phrase classification categories. Training of the classifier was performed using 1000 hand-tagged sentences from a corpus of genitourinary radiology reports. Features used by the classifier include n-grams, syntactic tags and semantic labels. Evaluation was conducted on a blind test set of 250 sentences from the same domain. The system achieved overall performance scores of 0.87 (precision), 0.91 (recall) and 0.89 (balanced F-score). Anatomical phrase extraction can be rapidly and accurately accomplished.
Keywords: natural language processing, shallow semantic parsing, anatomy phrases, radiology reports, support vector machines
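The word-level classification setup described above maps naturally onto a linear SVM over sparse features. The sketch below illustrates the setup only; the feature names, toy sentence, and BIO-style labels are assumptions made for this example, not the paper's actual feature set or annotation scheme (which also drew on syntactic tags and semantic labels).

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def word_features(sent, i):
        # simple context features; the real system also used syntactic
        # tags and semantic labels
        return {
            "word": sent[i].lower(),
            "prev": sent[i - 1].lower() if i > 0 else "<S>",
            "next": sent[i + 1].lower() if i < len(sent) - 1 else "</S>",
        }

    # toy training data (hypothetical annotations, not from the paper)
    train_sents = [["mass", "in", "the", "left", "kidney"]]
    train_labels = [["O", "O", "O", "B-ANAT", "I-ANAT"]]

    X = [word_features(s, i) for s in train_sents for i in range(len(s))]
    y = [label for labels in train_labels for label in labels]

    clf = make_pipeline(DictVectorizer(), LinearSVC())
    clf.fit(X, y)
    print(clf.predict([word_features(["right", "kidney"], 1)]))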
Rule Based Chunker for Croatian
Proceedings of the 6th International Conference on Language Resources and Evaluation. Marrakech-Paris: ELRA, 2008
"... Abstract In this paper we discuss a rule-based approach to chunking sentences in Croatian, implemented using local regular grammars within the NooJ development environment. We describe the rules and their implementation by regular grammars and at the same time show that in NooJ environment it is ex ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract: In this paper we discuss a rule-based approach to chunking sentences in Croatian, implemented using local regular grammars within the NooJ development environment. We describe the rules and their implementation as regular grammars, and show that within the NooJ environment it is easy to fine-tune their various sub-rules. Since Croatian has strong morphosyntactic features that are shared among most or all elements of a chunk, the rules are built around these features and rely on them heavily. For the evaluation of our chunker we used a set of manually annotated sentences extracted from a 100 kw MSD-tagged and disambiguated Croatian corpus. Our chunker performed best on VP-chunks (F: 97.01), while NP-chunks (F: 92.31) and PP-chunks (F: 83.08) were of lower quality. The results are comparable to the performance of the chunkers from the CoNLL-2000 shared task on chunking.
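The core idea, chunk rules that insist on shared morphosyntactic features across all elements of the chunk, can be illustrated outside NooJ in a few lines of code. This is a sketch of the agreement check only; the simplified tag format POS:gender:number:case is an assumption made for the example, not the MSD scheme the authors used.

    def np_chunks(tagged):
        # tagged: list of (word, tag) pairs, e.g. ("velika", "A:f:s:n"),
        # using the assumed simplified tag format POS:gender:number:case.
        # Groups Adjective* Noun runs into NP chunks only when every
        # element agrees in gender, number and case.
        def pos(tag): return tag.split(":")[0]
        def agreement(tag): return tag.split(":")[1:]

        chunks, i = [], 0
        while i < len(tagged):
            j = i
            while j < len(tagged) and pos(tagged[j][1]) == "A":
                j += 1
            if j < len(tagged) and pos(tagged[j][1]) == "N" and \
               all(agreement(t) == agreement(tagged[j][1]) for _, t in tagged[i:j]):
                chunks.append(" ".join(w for w, _ in tagged[i:j + 1]))
                i = j + 1
            else:
                i += 1
        return chunks

    print(np_chunks([("velika", "A:f:s:n"), ("kuća", "N:f:s:n"), ("je", "V:-:s:-")]))
    # -> ['velika kuća']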
Learning Computational Grammars - 3rd Annual Report
2001
"... This document describes the third year's progress of the TMR Project Learning Computational Grammars (LCG). In brief, LCG continues with full complement of postdocs and predocs, and work in areas as diverse as Maximum Entropy, Instance-based Learning, Neural Networks, Explanation-Based Learning ..."
Abstract
- Add to MetaCart
Abstract: This document describes the third year's progress of the TMR project Learning Computational Grammars (LCG). In brief, LCG continues with a full complement of postdocs and predocs, and with work in areas as diverse as Maximum Entropy, Instance-based Learning, Neural Networks, Explanation-Based Learning, Theory Refinement, Inductive Logic Programming, and Genetic Algorithms. In keeping with the original project proposal, most sites continue to target their various learning technologies at the task of learning noun phrases in free text. The industrial partner, Xerox, is exploring an application, and Geneva has switched focus from linguistic and psycholinguistic accounts of learning to unsupervised machine learning techniques.
Translation Spotting for Translation Memories
In Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, Volume 3
"... The term translation spotting (TS) refers to the task of identifying the target-language (TL) words that correspond to a given set of sourcelanguage (SL) words in a pair of text segments known to be mutual translations. This article examines this task within the context of a sub-sentential tra ..."
Abstract
- Add to MetaCart
Abstract: The term translation spotting (TS) refers to the task of identifying the target-language (TL) words that correspond to a given set of source-language (SL) words in a pair of text segments known to be mutual translations. This article examines this task within the context of a sub-sentential translation-memory system, i.e. ...
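Under one common simplification, translation spotting reduces to projecting a source span through a word alignment. The sketch below shows that baseline, not necessarily the article's own method; the toy sentence pair and alignment links are assumptions made for the example.

    def spot_translation(alignment, src_span, tgt_words):
        # alignment: iterable of (src_index, tgt_index) links.
        # src_span: (start, end) indices of the queried SL words, end exclusive.
        # Returns the smallest contiguous TL span covering every word
        # aligned to the source query, or None if nothing is aligned.
        start, end = src_span
        hits = [t for s, t in alignment if start <= s < end]
        if not hits:
            return None
        lo, hi = min(hits), max(hits)
        return tgt_words[lo:hi + 1]

    # spotting "prime minister" (SL positions 0-1) in a toy French pair
    links = [(0, 1), (1, 2)]  # prime->premier, minister->ministre
    print(spot_translation(links, (0, 2), ["le", "premier", "ministre"]))
    # -> ['premier', 'ministre']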
Grupo de Procesamiento de Lenguaje Natural
"... Abstract. We introduce a technique for inducing a refinement of the set of part of speech tags related to verbs. We cluster verbs according to their syntactic behavior in a dependency structure setting. The set of clusters is automatically determined by means of a quality measure over the probabilis ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract: We introduce a technique for inducing a refinement of the set of part-of-speech tags related to verbs. We cluster verbs according to their syntactic behavior in a dependency-structure setting. The set of clusters is automatically determined by means of a quality measure over the probabilistic automata that describe words in a bilexical grammar. Each of the resulting clusters defines a new part-of-speech tag. We try out the resulting tag set in a state-of-the-art phrase structure parser and show that the induced part-of-speech tags significantly improve the accuracy of the parser.
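The general recipe, cluster verbs by their syntactic contexts and turn each cluster into a refined tag, can be sketched with an off-the-shelf clusterer. Note the substitution: the paper selects clusters with a quality measure over probabilistic automata, while this example swaps in KMeans over dependency-relation counts; the verbs and counts are toy assumptions.

    from collections import Counter
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction import DictVectorizer

    # verb -> counts of dependent relation types observed in a treebank (toy)
    contexts = {
        "give":  Counter({"nsubj": 9, "obj": 9, "iobj": 8}),
        "hand":  Counter({"nsubj": 7, "obj": 8, "iobj": 6}),
        "sleep": Counter({"nsubj": 10}),
        "nap":   Counter({"nsubj": 8}),
    }

    verbs = list(contexts)
    X = DictVectorizer().fit_transform([contexts[v] for v in verbs])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # each cluster becomes a refined part-of-speech tag, e.g. VERB_0, VERB_1
    refined = {v: f"VERB_{c}" for v, c in zip(verbs, labels)}
    print(refined)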
Recognising Clauses Using ...
2002
"... Clauses are important for a variety of NLP tasks such as predicting phrasing in text-tospeech synthesis and inferring text alignment for machine translation (Ejerhed 1988, Leffa 1998, Papageorgiou 1997). The Computational Natural Language Learning 2001 shared task (Sang & Déjean 2001) set the go ..."
Abstract
- Add to MetaCart
Clauses are important for a variety of NLP tasks such as predicting phrasing in text-tospeech synthesis and inferring text alignment for machine translation (Ejerhed 1988, Leffa 1998, Papageorgiou 1997). The Computational Natural Language Learning 2001 shared task (Sang & Déjean 2001) set the goal of identifying clause boundaries in text using machine learning methods. Systems created for the task predicted a label for each word specifying the number of clauses starting and ending at that position in the sentence without differentiating between clause types. This work extends that of the shared task in several ways: (1) performance bounds are explored, (2) an attempt is made to distinguish ‘main ’ and ‘subordinate’ clauses, and (3) Winnow and maximum entropy, model classes proven effective in similar domains yet not previously employed for the task, are applied to the problem.
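The shared-task formulation, one label per word encoding how many clauses start and end there, fits any discriminative classifier; the sketch below uses logistic regression as the maximum entropy model. The features, toy sentence, and label strings are assumptions made for this example, not the paper's feature set or the exact CoNLL-2001 label inventory.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def features(words, pos, i):
        # minimal word/POS context features (illustrative only)
        return {"word": words[i], "pos": pos[i],
                "prev_pos": pos[i - 1] if i > 0 else "<S>"}

    # toy sentence with per-word clause start/end labels (assumed format)
    words  = ["He", "said", "that", "she", "left"]
    pos    = ["PRP", "VBD", "IN", "PRP", "VBD"]
    labels = ["(S*", "*", "(S*", "*", "*S)S)"]

    X = [features(words, pos, i) for i in range(len(words))]
    clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(X, labels)
    print(clf.predict([features(words, pos, 2)]))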