Results 1 - 7 of 7
Collocation Extraction for Machine Translation
In Proceedings of the Machine Translation Summit IX, 2003
Abstract
Cited by 4 (0 self)
This paper reports on the development of a collocation extraction system that is designed within a commercial machine translation system in order to take advantage of the robust syntactic analysis that the system offers and to use this analysis to refine collocation extraction. Embedding the extraction system also addresses the need to provide information about the source language collocations in a system-specific form to support automatic generation of a collocation rulebase for analysis and translation.
Evaluation of a Method of Creating New Valency Entries
Abstract
Cited by 1 (0 self)
Information on subcategorization and selectional restrictions is important for natural language processing tasks such as deep parsing, rule-based machine translation and automatic summarization. In this paper we present a method of adding detailed entries to a bilingual dictionary, based on information in an existing valency dictionary. The method is based on two assumptions: words with similar meaning have similar subcategorization frames and selectional restrictions; and words with the same translations have similar meanings. Based on these assumptions, new valency entries are constructed from words in a plain bilingual dictionary, using entries with similar source-language meaning and the same target-language translations.
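The two assumptions above lend themselves to a simple heuristic: copy valency information to words that share a translation with an already-covered word. The sketch below illustrates this on invented toy data; the dictionary contents, the frame notation, and the function name are assumptions for illustration, not the paper's actual resources or method.

```python
# Hedged sketch of the paper's two assumptions (all data and names here
# are invented for illustration, not taken from the paper's resources).
# Assumption 1: words with similar meaning share subcategorization frames.
# Assumption 2: words with the same translation have similar meanings.

# A small valency dictionary: source word -> known subcategorization frames.
valency_dict = {
    "taberu": ["NP-ga NP-o V"],      # "eat": subject + object
    "hashiru": ["NP-ga V"],          # "run": subject only
}

# A plain bilingual dictionary: source word -> target translations.
bilingual_dict = {
    "taberu": ["eat"],
    "kuu": ["eat"],                  # colloquial "eat", lacks a valency entry
    "hashiru": ["run"],
    "kakeru": ["run"],               # lacks a valency entry
}

def propose_valency_entries(bilingual, valency):
    """Copy frames from valency-dictionary words to dictionary-only words
    that share at least one translation (the core heuristic sketched here)."""
    proposed = {}
    for word, translations in bilingual.items():
        if word in valency:
            continue  # already has a hand-built entry
        for known_word, frames in valency.items():
            if set(bilingual.get(known_word, [])) & set(translations):
                proposed.setdefault(word, []).extend(frames)
    return proposed

print(propose_valency_entries(bilingual_dict, valency_dict))
# "kuu" inherits the frame of "taberu" via their shared translation "eat"
```

In a realistic setting the proposed entries would still need filtering, since sharing one translation does not guarantee identical selectional restrictions.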
Sharing User Dictionaries Across Multiple Systems with UTX-S
Abstract
Careful tuning of user-created dictionaries can be very advantageous when using a machine translation system for computer aided translation. However, there is no widely used standard for user dictionaries in the Japanese/English machine translation market. To address this issue, AAMT (the Asia-Pacific Association for Machine Translation:
Electronic Dictionaries -- from Publisher Data to a Distribution Server: the DicoPro, DicoEast and RERO Projects
In Proceedings of the Third International Conference on Language Resources and Evaluation, LREC’02, 2002
Abstract
This article describes a set of initiatives in the domain of electronic dictionary distribution. Their basis is the DicoPro server, which enables secure access to dictionary data on a server. In the DicoEast and RERO projects, the goal is to acquire high-quality publisher data, convert it into digital format, and provide access to dictionary entries for the participating institutions. We analyze the various problems that appear throughout this process and describe the solutions we found.
Terminology Construction Workflow for Korean-English Patent MT
Abstract
This paper addresses the workflow for terminology construction for a Korean-English patent MT system. The workflow consists of a stage for setting lexical goals and a semi-automatic terminology construction stage. As there is no comparable system, it is difficult to determine how many terms are needed. To estimate the number of needed terms, we analyzed 45,000 patent documents. Given the limited time and budget, we resorted to semi-automatic methods to create the bilingual term dictionary in the electronics domain. We will show that parenthesis information in Korean patent documents and a bilingual title corpus can be successfully used to build a bilingual term dictionary.
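As a rough illustration of the parenthesis cue: Korean patent text often gives an English gloss in parentheses directly after a Korean term, so term pairs can be harvested with a simple pattern. The regular expression and the sample sentence below are illustrative assumptions, not the authors' actual extraction rules.

```python
import re

# Hedged sketch: harvest Korean-term / English-gloss pairs from patent-style
# text. The pattern and sample sentence are invented for illustration and do
# not reproduce the paper's actual extraction rules.
# [\uac00-\ud7a3] is the Unicode range of precomposed Hangul syllables.
PAREN_TERM = re.compile(r"([\uac00-\ud7a3]+)\s*\(([A-Za-z][A-Za-z \-]*)\)")

sample = "본 발명은 반도체(semiconductor) 소자와 트랜지스터(transistor)에 관한 것이다."

pairs = PAREN_TERM.findall(sample)
print(pairs)  # [('반도체', 'semiconductor'), ('트랜지스터', 'transistor')]
```

Candidate pairs harvested this way would still need frequency filtering and manual review before entering a term dictionary, which is consistent with the semi-automatic workflow the abstract describes.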
, 2005
Abstract
This thesis concerns the development of an English-Swedish core lexicon. The core lexicon is primarily developed for the corpus-based machine translation system T4F. A bilingual core lexicon is defined as a lexicon that contains the most common words of a source language and their translations in the target language. Common words in this case are words that may appear in texts from any domain, also referred to as domain-independent, or general, words. One important property of T4F, as of many other machine translation systems, is that it is implemented in different versions for different domains. This strategy reduces ambiguity and makes better translations possible. However, it is highly desirable to be able to reuse the part of an implementation that is not domain-specific. For this reason, a set of core resources is being developed for T4F, and the core lexicon constructed here will be the first part of these resources. The core lexicon is mainly based on a parallel corpus that was compiled for this purpose. Different methods were used to extract common lemma pairs from this corpus for the lexicon. Other methods use the corpus indirectly to select words from alternative sources. Because of the varying properties of words with different parts of speech, all word classes are dealt with separately. For most word classes, a
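One plausible way to extract common lemma pairs from an aligned, lemmatized parallel corpus is a simple co-occurrence count with a frequency threshold. The sketch below is an assumption for illustration (the toy sentence pairs and the threshold are invented), not the thesis's actual extraction method.

```python
from collections import Counter
from itertools import product

# Hedged sketch: count English/Swedish lemma co-occurrences over aligned,
# lemmatized sentence pairs, then keep frequent pairs as core-lexicon
# candidates. The data and the threshold below are invented for illustration.
aligned = [
    (["the", "house", "be", "red"], ["hus", "vara", "röd"]),
    (["a", "red", "car"], ["en", "röd", "bil"]),
    (["the", "house"], ["hus"]),
]

counts = Counter()
for en_lemmas, sv_lemmas in aligned:
    for pair in product(en_lemmas, sv_lemmas):
        counts[pair] += 1

# Keep pairs seen at least twice as candidate lemma-pair entries.
candidates = sorted(pair for pair, n in counts.items() if n >= 2)
print(candidates)
```

Raw co-occurrence over-generates (frequent function words pair with everything, as "the"/"hus" shows here), which is presumably why a real core lexicon would combine such counts with association measures and per-word-class handling.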