Results 1 - 10 of 21
A Survey of Paraphrasing and Textual Entailment Methods, 2010
Cited by 57 (3 self)
Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
ACL’10 “Ask not what Textual Entailment can do for You...”
Cited by 21 (3 self)
We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual inference phenomena in textual entailment examples. We present the results of a pilot annotation study that show this model is feasible and the results immediately useful.
Example-based paraphrasing for improved phrase-based statistical machine translation, 2010
Cited by 7 (0 self)
In this article, an original view on how to improve phrase translation estimates is proposed. This proposal is grounded on two main ideas: first, that appropriate examples of a given phrase should participate more in building its translation distribution; second, that paraphrases can be used to better estimate this distribution. Initial experiments provide evidence of the potential of our approach and its implementation for effectively improving translation performance.
UAlacant: using online machine translation for crosslingual textual entailment
In Proceedings of the First Joint Conference on Lexical and Computational Semantics, 2012
Cited by 6 (4 self)
This paper describes a new method for crosslingual textual entailment (CLTE) detection based on machine translation (MT). We use sub-segment translations from different MT systems available online as a source of crosslingual knowledge. In this work we describe and evaluate different features derived from these sub-segment translations, which are used by a support vector machine classifier to detect CLTEs. We submitted this system to SemEval 2012 Task 8, obtaining an accuracy of up to 59.8% on the English–Spanish test set, the second-best-performing approach in the contest.
Learning an Expert from Human Annotations in Statistical Machine Translation: the Case of Out-of-Vocabulary Words
Cited by 3 (1 self)
We present a general method for incorporating an “expert” model into a Statistical Machine Translation (SMT) system, in order to improve its performance on a particular “area of expertise”, and apply this method to the specific task of finding adequate replacements for Out-of-Vocabulary (OOV) words. Candidate replacements are paraphrases and entailed phrases, obtained using monolingual resources. These candidate replacements are transformed into “dynamic biphrases”, generated at decoding time based on the context of each source sentence. Standard SMT features are enhanced with a number of new features aimed at scoring translations produced by using different replacements. Active learning is used to discriminatively train the model parameters from human assessments of the quality of translations. The learning framework yields an SMT system which is able to deal with sentences containing OOV words but also guarantees that the performance is not degraded for input sentences without OOV words. Results of experiments on English-French translation show that this method outperforms previous work addressing OOV words in terms of acceptability.
Quality Estimation for Machine Translation Using the Joint Method of Evaluation Criteria and Statistical Modeling
Cited by 2 (1 self)
This paper describes our participation in the WMT13 shared tasks on Quality Estimation for machine translation without using reference translations. We submitted results for Task 1.1 (sentence-level quality estimation), Task 1.2 (system selection) and Task 2 (word-level quality estimation). In Task 1.1, we used an enhanced version of the BLEU metric, without reference translations, to evaluate translation quality. In Task 1.2, we used the probabilistic Naïve Bayes (NB) model as a classification algorithm, with features borrowed from traditional evaluation metrics. In Task 2, to take contextual information into account, we employed a discriminative undirected probabilistic graphical model, the conditional random field (CRF), in addition to the NB algorithm. Training experiments on past WMT corpora showed that the methods designed in this paper yielded promising results, especially the statistical CRF and NB models. The official results show that our CRF model achieved the highest F-score, 0.8297, in the binary classification of Task 2.
Spell Checking Techniques for Replacement of Unknown Words and Data Cleaning for Haitian Creole SMS Translation
Cited by 2 (0 self)
We report results on translation of SMS messages from Haitian Creole to English. We show improvements by applying spell checking techniques to unknown words and creating a lattice with the best known spelling equivalents. We also used a small cleaned corpus to train a cleaning model that we applied to the noisy corpora.
Vs and OOVs: Two Problems for Translation between German and
Cited by 1 (1 self)
In this paper we report on experiments with three preprocessing strategies for improving translation output in a statistical MT system. In training, two reordering strategies were studied: (i) reorder on the basis of the alignments from Giza++, and (ii) reorder by moving all verbs to the end of segments. In translation, out-of-vocabulary words were preprocessed in a knowledge-lite fashion to identify a likely equivalent. All three strategies were implemented for our English↔German system submitted to the WMT10 shared task. Combining them led to improvements in both language directions.
Isoflavone content in
J. Agric. Food Chem, 1994
Cited by 1 (0 self)
I dedicate this dissertation to my parents. Acknowledgments: I would like to thank the following people who made this possible, my advisors, and my committee members. Without their guidance, I would never have been able to finish my PhD work. First, I would like to express my deepest gratitude to my two advisors, Prof. Carlotta Domeniconi, and Prof. Kathryn B. Laskey. Prof. Domeniconi introduced me to the research area of machine learning, and gave me the environment and guidance to do research. Prof. Laskey guided me through the challenges and difficulties of Bayesian statistics. Further, both Prof. Domeniconi and Prof. Laskey helped me to develop crucial research-related abilities, writing and presenting, which greatly increased my research capability. I am very grateful to the two other members in my committee, Prof. Jana Kosecka and Prof. Huzefa Rangwala. Prof. Kosecka helped me understand computer vision problems, and convinced me that a researcher should be more data-driven than model-driven. Prof. Rangwala taught me bioinformatics, and showed me how to conduct comprehensive experiments. I obtained valuable research experience from them, and with their help I learned how to apply my knowledge to solve practical problems. Also, I am very thankful to Prof. Sean Luke. Prof. Luke kept giving me advice on programming, engineering and other aspects, such as American culture. It is Prof. Luke who showed me how to be a good researcher and a good engineer at the same time. Additionally, I must thank all the members of the Data Mining lab and all the participants in the Krypton seminar. They gave me a lot of valuable comments on my dissertation and defense. Finally, I would like to thank my parents. They always supported me and encouraged me with their best wishes.
A Generate and Rank Approach to Sentence Paraphrasing
Cited by 1 (0 self)
We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality of the rules that produced each candidate, but also the extent to which they preserve grammaticality and meaning in the particular context of the input sentence, as well as the degree to which the candidate differs from the input. We experimented with both a Maximum Entropy classifier and an SVR ranker. Experimental results show that incorporating features from an existing paraphrase recognizer in the ranking component improves performance, and that our overall method compares well against a state-of-the-art paraphrase generator, when paraphrasing rules apply to the input sentences. We also propose a new methodology to evaluate the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. The paper is accompanied by a paraphrasing dataset we constructed for evaluations of this kind.