Results 1 - 10 of 10
Experiments with a Hindi-to-English Transfer-based MT System under a Miserly Data Scenario
- ACM Transactions on Asian Language Information Processing (TALIP)
, 2003
Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System
, 2004
Cited by 15 (6 self)
We describe the rapid development of a preliminary Hebrew-to-English Machine Translation system under a transfer-based framework specifically designed for rapid MT prototyping for languages with limited linguistic resources. The task is particularly challenging due to two main reasons: the high lexical and morphological ambiguity of Hebrew and the dearth of available resources for the language. Existing, publicly available resources were adapted in novel ways to support the MT task. The methodology behind the system combines two separate modules: a transfer engine which produces a lattice of possible translation segments, and a decoder which searches and selects the most likely translation according to an English language model. We demonstrate that a small manually crafted set of transfer rules suffices to produce legible translations. Performance results are evaluated using state-of-the-art measures and are shown to be encouraging.
A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation
- European Association for Machine Translation (EAMT) 10th Annual Conference
, 2005
Cited by 10 (2 self)
Most current Machine Translation (MT) systems do not improve with feedback from post-editors beyond the addition of corrected translations to parallel training data (for statistical and example-based MT) or to a memory database. Rule-based systems to date improve only via manual debugging. In contrast, we introduce a largely automated method for capturing more information from human post-editors, so that corrections may be performed automatically to translation grammar rules and lexical entries. This paper introduces a general framework for incorporating a refinement module into rule-based transfer MT systems. This framework allows post-editing efforts to be generalized in an effective way, by identifying and correcting rules semi-automatically to improve coverage and overall translation quality.
Unsupervised Induction of Natural Language Morphology Inflection Classes
- In Proceedings of the Seventh Meeting of the ACL Special Interest Group in Computational Phonology (SIGPHON ’04)
, 2004
Cited by 9 (1 self)
We propose a novel language-independent framework for inducing a collection of morphological inflection classes from a monolingual corpus of full form words. Our approach involves two main stages. In the first stage, we generate a large data structure of candidate inflection classes and their interrelationships. In the second stage, search and filtering techniques are applied to this data structure, to identify a select collection of "true" inflection classes of the language. We describe the basic methodology involved in both stages of our approach and present an evaluation of our baseline techniques applied to induction of major inflection classes of Spanish. The preliminary results on an initial training corpus already surpass an F1 of 0.5 against ideal Spanish inflectional morphology classes.
A Trainable Transfer-based Machine Translation Approach for Languages with Limited Resources
- UNIVERSITY OF MALTA
, 2004
Cited by 6 (3 self)
We describe a Machine Translation (MT) approach that is specifically designed to enable rapid development of MT for languages with limited amounts of online resources. Our approach assumes the availability of a small number of bi-lingual speakers of the two languages, but these need not be linguistic experts. The bi-lingual
A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings between Languages
, 2004
Cited by 4 (2 self)
We describe an approach to creating a small but diverse corpus in English that can be used to elicit information about any target language. The focus of the corpus is on structural information. The resulting bilingual corpus can then be used for natural language processing tasks such as inferring transfer mappings for Machine Translation. The corpus is sufficiently small that a bilingual user can translate and word-align it within a matter of hours. We describe how the corpus is created and how its structural diversity is ensured. We then argue that it is not necessary to introduce a large amount of redundancy into the corpus.
Stat-XFER: A General Search-based Syntax-driven Framework for Machine Translation
Cited by 2 (0 self)
The CMU Statistical Transfer Framework (Stat-XFER) is a general framework for developing search-based syntax-driven machine translation (MT) systems. The framework consists of an underlying syntax-based transfer formalism along with a collection of software components designed to facilitate the development of a broad range of MT research systems. The main components are a general language-independent runtime transfer engine and decoder, along with several different tools for creating the various underlying language-pair-specific resources that are required for building a specific MT system for any given language pair. We describe the general framework, its unique properties and features, and its application to the construction of MT research prototype systems for a diverse collection of language pairs.
Experiments with a Hindi-to-English Transfer-based MT System under a Miserly Data Scenario
- ACM Transactions on Asian Language Information Processing (TALIP)
, 2003
The results of the experiment indicate that under these extremely limited training data conditions, when tested on unseen data, the Xfer system significantly outperforms both EBMT and SMT. Several different versions of the Xfer system were tested. Results indicated that automatically learned transfer rules are effective in improving translation performance, compared with a baseline word-to-word translation version of our system. System performance with a limited number of manually written transfer rules was, however, still better than with the current automatically inferred rules. Furthermore, a "multi-engine" version of our system that combined the output of the Xfer and SMT systems and optimized translation selection outperformed both individual systems. The remainder of this paper is organized as follows. Section 2 presents an overview of the Xfer system and its components. Section 3 describes the elicited data collection for Hindi-English that we conducted during the SLE, which provided the bulk of training data for our limited data experiment. Section 4 describes the specific resources and components that were incorporated into our Hindi-to-English Xfer system. Section 5 then describes the controlled experiment for comparing the Xfer, EBMT and SMT systems under the limited data scenario, and the results of this experiment. Finally, Section 6 describes our conclusions and future research directions.
Data Collection and Analysis of Mapudungun Morphology for Spelling Correction
, 2004
This paper describes part of a three-year collaboration between Carnegie Mellon University's Language Technologies Institute, the Programa de Educación Intercultural Bilingüe of the Chilean Ministry of Education, and Universidad de La Frontera (Temuco, Chile). We are currently constructing a spelling checker for Mapudungun, a polysynthetic language spoken by the Mapuche people in Chile and Argentina. The spelling checker will be built in MySpell, the spell checking system used by the open source office suite OpenOffice. This paper also describes the spoken language corpus that is used as a source of data for developing the spelling checker.
Towards Interactive and Automatic Refinement of Translation Rules
Although Machine Translation (MT) has advanced recently for language pairs with large amounts of parallel data, translation quality has not yet reached satisfactory levels, especially not for resource-poor languages with little if any parallel text to train statistical or example-based MT systems. Rule-based transfer MT systems are the only feasible solution for resource-poor scenarios. However, it can prove very costly and time consuming to have trained computational linguists with knowledge of both languages refine and extend translation rule sets manually. If the translation rules are written manually, no matter how many rules there are, coverage and accuracy can always be increased. If they are automatically learned, they might be either too general or too specific. Either way, in the face of unseen examples, the translation rules will need to be refined to account for new data. Thus, the goal of this thesis is to generalize post-editing efforts in an effective way, by identifying and correcting rules semi-automatically to improve coverage and overall translation quality, especially for resource-poor languages.

