Extraction of Multi-word Expressions from Small Parallel Corpora
Citations: 2 (1 self)
BibTeX
@MISC{Tsvetkov_extractionof,
author = {Yulia Tsvetkov},
title = {Extraction of Multi-word Expressions from Small Parallel Corpora},
year = {}
}
Abstract
We present a general methodology for extracting multi-word expressions (of various types), along with their translations, from small parallel corpora. We automatically align the parallel corpus and focus on misalignments; these typically indicate expressions in the source language that are translated to the target in a noncompositional way. We then use a large monolingual corpus to rank and filter the results. Evaluation of the quality of the extraction algorithm reveals significant improvements over naïve alignment-based methods. External evaluation shows an improvement in the performance of machine translation that uses the extracted dictionary.
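
The pipeline outlined in the abstract (align, collect misalignments, rank with monolingual statistics) could be sketched roughly as follows. This is an illustrative sketch only, not the author's implementation: the abstract names neither the aligner, the exact definition of a misalignment, nor the ranking score. Here a misalignment is assumed to be any source token lacking a 1:1 alignment link, and pointwise mutual information (PMI) over bigrams is assumed as the ranking measure. The function names `misaligned_spans`, `pmi`, and `rank_candidates` are hypothetical.

```python
import math
from collections import Counter


def misaligned_spans(src_tokens, alignment):
    """Return maximal runs (length >= 2) of source tokens without a 1:1 link.

    `alignment` is a set of (source_index, target_index) pairs, as produced
    by some word aligner (assumed here; the abstract does not name one).
    """
    src_links = Counter(s for s, _ in alignment)
    tgt_links = Counter(t for _, t in alignment)
    one_to_one = {
        s for s, t in alignment if src_links[s] == 1 and tgt_links[t] == 1
    }
    spans, current = [], []
    for i, tok in enumerate(src_tokens):
        if i in one_to_one:
            if len(current) >= 2:
                spans.append(tuple(current))
            current = []
        else:
            current.append(tok)
    if len(current) >= 2:
        spans.append(tuple(current))
    return spans


def pmi(bigram, unigram_counts, bigram_counts):
    """Pointwise mutual information of a bigram (assumed ranking score)."""
    w1, w2 = bigram
    joint = bigram_counts[bigram]
    if joint == 0 or unigram_counts[w1] == 0 or unigram_counts[w2] == 0:
        return float("-inf")  # unseen in the monolingual corpus
    n_uni = sum(unigram_counts.values())
    n_bi = sum(bigram_counts.values())
    p_joint = joint / n_bi
    p1 = unigram_counts[w1] / n_uni
    p2 = unigram_counts[w2] / n_uni
    return math.log2(p_joint / (p1 * p2))


def rank_candidates(spans, mono_tokens):
    """Rank candidate bigrams from the spans by PMI in a monolingual corpus."""
    unigram_counts = Counter(mono_tokens)
    bigram_counts = Counter(zip(mono_tokens, mono_tokens[1:]))
    bigrams = {bg for span in spans for bg in zip(span, span[1:])}
    return sorted(
        bigrams,
        key=lambda bg: pmi(bg, unigram_counts, bigram_counts),
        reverse=True,
    )
```

For instance, if the three source tokens of "kicked the bucket" all link to a single target word, none of them has a 1:1 link, so the whole run is returned as one candidate span. Filtering would then amount to dropping candidates whose score falls below a chosen threshold, which is likewise an assumption of this sketch.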







