Pim van der Eijk. Automating the acquisition of bilingual terminology. In Proceedings of the EACL, pages 113--119, 1993.
....to be carefully guarded by their owners, making them expensive and/or not easily available. A natural alternative to the direct approach is to perform terminology extraction from corpora. There has been extensive research on the automatic recognition of terminology translations in parallel corpora [25, 9] and even some work on using non-parallel domain-specific corpora [20]. Information extracted by these techniques could be used to supplement the transfer dictionary in an MLIR system. The error due to ambiguity could probably be reduced with proper term weighting strategies, although this is a ....
Pim van der Eijk. Automating the acquisition of bilingual terminology. In Proceedings of the EACL, pages 113--119, 1993.
....in the translations. Tanaka and Iwasaki [43] demonstrate how to use non parallel corpora to choose the best translation among a small set of candidates, while Fung [20] uses similarities in collocates of a given word to find its translation in the other language. In other work, van der Eijk [44] has compared several methods for automatic extraction and translation of technical terminology in Dutch and English. He achieves best results under the assumption that technical terms are always NPs and therefore candidate terms can be pinpointed using a combination of a pattern matcher and a ....
Pim van der Eijk. Automating the acquisition of bilingual terminology. In Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics, Utrecht, the Netherlands, April 1993.
.... sentence alignment algorithms which work with a high degree of accuracy if the parallel texts are relatively clean and come from technical domains where literal translations are expected [ A number of approaches to bilingual lexicon extraction have already appeared in the literature [16, 10, 22, 25]. We will present a different approach in the section. Of course, the advanced applications described in the following two sections could work with other word and terminology alignment algorithms. The process of bilingual lexicon extraction consists of three steps. First, terminology is identified ....
Pim van der Eijk. Automating the Acquisition of Bilingual Terminology. In Proc. of the EACL, pages 113--119, 1993.
....as the total size of our corpus. Getting hold of useful data that includes Dutch is more difficult. If we compare the size of the Agenda 21 corpus to parallel corpora used in publications that use Dutch as one of the languages, the corpus is actually relatively big. For instance Van der Eijk (P. van der Eijk 1993) used approximately 25.000 parallel words from the Dutch and English version of the official announcement of the ESPRIT programme: about one sixth of the total size of Agenda 21. Although the Agenda 21 corpus is small it will be used to evaluate the method presented in this paper. We will compare ....
....the proper context) into dangerous waste; in the corpus it is hazardous wastes. [Figure 1: An example entry. gevaarlijke: hazardous 0.74, toxic 0.20, dangerous 0.05] ....ing approach. The disadvantage of the hypothesis testing approach (W.A. Gale and K.W. Church 1991, P. van der Eijk 1993, F. Smadja, K.R. McKeown, and V. Hatzivassiloglou 1996) is that a valid hypothesis can only be made if a certain minimum number of observations is available. Therefore only a limited amount of translation examples can be found with high accuracy. Following the estimating approach, it is ....
P. van der Eijk (1993), Automating the acquisition of bilingual terminology. In Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics, pages 113--119.
....techniques in the absence of a perfectly matched parallel document collection. But three European research groups have reported dramatic improvements in performance when phrases are processed in addition to individual words, presumably because the use of phrases constrains translation ambiguity [13, 26, 31] and in some initial experiments with phrase indexing I have recently obtained similar results. 3 Research in the USA Although there was cross language text retrieval work reported in Europe as early as 1964, the earliest reported work in the USA was performed by Salton at Cornell University in ....
Pim van der Eijk. Automating the acquisition of bilingual terminology. In Sixth Conference of the European Chapter of the Association for Computational Linguistics, pages 113--119, April 1993.
....In a preprocessing step the corpus is sentence aligned and tagged with part-of-speech tags. For the identification of noun phrases, a simple pattern matching algorithm is used. According to this algorithm a noun phrase is simply a sequence of zero or more adjectives followed by one or more nouns [vdE93]: $np = w_a^{*}\, w_n^{+}$. The statistical method for finding correlates is based on the following assumption: the translation equivalent is more frequent in the subset of the target language sentences, which are aligned to the source language sentences (containing the source language term under ....
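The noun-phrase pattern quoted in this context (zero or more adjectives followed by one or more nouns) can be illustrated with a minimal sketch. It assumes the input is already a list of (word, tag) pairs; the tag names `ADJ` and `NOUN` and the function name `find_noun_phrases` are hypothetical choices for illustration, not taken from the cited paper.

```python
import re

def find_noun_phrases(tagged):
    """Return maximal runs of zero or more adjectives followed by one or more nouns.

    `tagged` is a list of (word, tag) pairs. The tag names "ADJ" and "NOUN"
    are assumptions made for this sketch.
    """
    # Encode each tag as one character so the pattern can be written as a regex.
    codes = "".join(
        "A" if tag == "ADJ" else "N" if tag == "NOUN" else "x"
        for _, tag in tagged
    )
    phrases = []
    for match in re.finditer(r"A*N+", codes):
        words = [word for word, _ in tagged[match.start():match.end()]]
        phrases.append(" ".join(words))
    return phrases

# Toy sentence, invented for illustration.
sentence = [
    ("the", "DET"), ("hazardous", "ADJ"), ("wastes", "NOUN"),
    ("are", "VERB"), ("toxic", "ADJ"),
]
print(find_noun_phrases(sentence))
```

Because each token maps to exactly one character, regex match offsets line up with token positions, so no separate index bookkeeping is needed.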
.... correlates is based on the following assumption: the translation equivalent is more frequent in the subset of the target language sentences, which are aligned to the source language sentences (containing the source language term under consideration) than in the entire target language text [vdE93]. The system calculates a local frequency (the frequency of the target language term candidate in the subset of the target language sentences aligned to the source language sentences containing the term under consideration) and a global frequency for the target language terms. The following ....
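The local and global frequencies described in this context can be sketched as follows. This is only an illustration under stated assumptions: terms are treated as single tokens, the corpus is a list of aligned (source tokens, target tokens) pairs, and the function name and toy data are hypothetical. How the two frequencies are combined into a score is cut off in the excerpt, so the sketch stops at the counts.

```python
def local_and_global_frequency(aligned_pairs, source_term, candidate):
    """Count a target-language candidate locally and globally.

    local:   occurrences in target sentences aligned to source sentences
             that contain `source_term`
    global:  occurrences in all target sentences
    """
    local_count = 0
    global_count = 0
    for source_tokens, target_tokens in aligned_pairs:
        occurrences = target_tokens.count(candidate)
        global_count += occurrences
        if source_term in source_tokens:
            local_count += occurrences
    return local_count, global_count

# Toy aligned corpus, invented for illustration.
pairs = [
    (["gevaarlijke", "afvalstoffen"], ["hazardous", "wastes"]),
    (["gevaarlijke", "stoffen"], ["hazardous", "substances"]),
    (["risicovolle", "stoffen"], ["hazardous", "substances"]),
    (["het", "afval"], ["the", "waste"]),
]
print(local_and_global_frequency(pairs, "gevaarlijke", "hazardous"))
```

A candidate whose local count accounts for most of its global count occurs mainly alongside the source term, which is the intuition behind the assumption quoted above.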
Pim van der Eijk. Automating the Acquisition of Bilingual Terminology. In Proceedings of the 6th Conference of the European Chapter of the ACL, Utrecht/The Netherlands, 1993. Association for Computational Linguistics.
No context found.
Pim van der Eijk. 1993. Automating the Acquisition of Bilingual Terminology. In Proceedings of the 6th Conference of the European Chapter of the ACL, Utrecht/The Netherlands. Association for Computational Linguistics.