Results 1 - 10
of
252
Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
, 2003
"... We address the text-to-text generation problem of sentence-level paraphrasing --- a phenomenon distinct from and more difficult than word- or phrase-level paraphrasing. Our approach applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of para ..."
Abstract
-
Cited by 258 (2 self)
- Add to MetaCart
We address the text-to-text generation problem of sentence-level paraphrasing --- a phenomenon distinct from and more difficult than word- or phrase-level paraphrasing. Our approach applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of paraphrasing patterns represented by word lattice pairs and automatically determines how to apply these patterns to rewrite new sentences. The results of our evaluation experiments show that the system derives accurate paraphrases, outperforming baseline systems.
Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources
- Proceedings of the 20th International Conference on Computational Linguistics (Coling’04
, 2004
"... Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources ..."
Abstract
-
Cited by 214 (2 self)
- Add to MetaCart
(Show Context)
Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources
Paraphrasing with Bilingual Parallel Corpora
- In ACL-2005
, 2005
"... Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrasebased statistical machine translation, we show how paraphrases ..."
Abstract
-
Cited by 193 (16 self)
- Add to MetaCart
(Show Context)
Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrasebased statistical machine translation, we show how paraphrases in one language can be identified using a phrase in another language as a pivot. We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. We evaluate our paraphrase extraction and ranking methods using a set of manual word alignments, and contrast the quality with paraphrases extracted from automatic alignments. 1
Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences
- In Proceedings of HLT/NAACL
, 2003
"... We describe a syntax-based algorithm that automatically builds Finite State Automata (word lattices) from semantically equivalent translation sets. These FSAs are good representations of paraphrases. They can be used to extract lexical and syntactic paraphrase pairs and to generate new, unseen sente ..."
Abstract
-
Cited by 135 (5 self)
- Add to MetaCart
We describe a syntax-based algorithm that automatically builds Finite State Automata (word lattices) from semantically equivalent translation sets. These FSAs are good representations of paraphrases. They can be used to extract lexical and syntactic paraphrase pairs and to generate new, unseen sentences that express the same meaning as the sentences in the input sets. Our FSAs can also predict the correctness of alternative semantic renderings, which may be used to evaluate the quality of translations.
Improved statistical machine translation using paraphrases
- In Proceedings of HLT/NAACL-2006
, 2006
"... Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown. We show how techniques from paraphrasing can be used to deal with th ..."
Abstract
-
Cited by 117 (3 self)
- Add to MetaCart
(Show Context)
Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown. We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases. Our results show that augmenting a stateof-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48 % to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches. 1
Monolingual machine translation for paraphrase generation
- In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing
, 2004
"... We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is mea ..."
Abstract
-
Cited by 111 (6 self)
- Add to MetaCart
(Show Context)
We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A monotone phrasal decoder generates contextual replacements. Human evaluation shows that this system outperforms baseline paraphrase generation techniques and, in a departure from previous work, offers better coverage and scalability than the current best-of-breed paraphrasing approaches. 1
Automatic paraphrase acquisition from news articles
- In Proc. of HLT
, 2002
"... Paraphrases play an important role in the variety and complexity of natural language documents. However, they add to the difficulty of natural language processing. Here we describe a procedure for ob-taining paraphrases from news articles. Articles derived from dif-ferent newspapers can contain para ..."
Abstract
-
Cited by 97 (5 self)
- Add to MetaCart
(Show Context)
Paraphrases play an important role in the variety and complexity of natural language documents. However, they add to the difficulty of natural language processing. Here we describe a procedure for ob-taining paraphrases from news articles. Articles derived from dif-ferent newspapers can contain paraphrases if they report the same event on the same day. We exploit this feature by using Named Entity recognition. Our approach is based on the assumption that Named Entities are preserved across paraphrases. We applied our method to articles of two domains and obtained notable examples. Although this is our initial attempt at automatically extracting para-phrases from a corpus, the results are promising. 1.
Generating phrasal and sentential paraphrases: A survey of data-driven methods
- Computational Linguistics
, 2010
"... The task of paraphrasing is inherently familiar to speakers of all languages. Moreover, the task of automatically generating or extracting semantic equivalences for the various units of language— words, phrases, and sentences—is an important part of natural language processing (NLP) and is being inc ..."
Abstract
-
Cited by 74 (5 self)
- Add to MetaCart
(Show Context)
The task of paraphrasing is inherently familiar to speakers of all languages. Moreover, the task of automatically generating or extracting semantic equivalences for the various units of language— words, phrases, and sentences—is an important part of natural language processing (NLP) and is being increasingly employed to improve the performance of several NLP applications. In this article, we attempt to conduct a comprehensive and application-independent survey of data-driven phrasal and sentential paraphrase generation methods, while also conveying an appreciation for the importance and potential use of paraphrases in the field of NLP research. Recent work done in manual and automatic construction of paraphrase corpora is also examined. We also discuss the strategies used for evaluating paraphrase generation techniques and briefly explore some future trends in paraphrase generation. 1.
Syntactic Constraints on Paraphrases Extracted from Parallel Corpora
"... ccb cs jhu edu We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alo ..."
Abstract
-
Cited by 65 (10 self)
- Add to MetaCart
ccb cs jhu edu We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alongside bilingual phrase pairs. In order to retain broad coverage of non-constituent phrases, complex syntactic labels are introduced. A manual evaluation indicates a 19% absolute improvement in paraphrase quality over the baseline method. 1
Measuring semantic similarity by latent relational analysis.
- In Proceedings of the 19th International Joint Conference on Artificial Intelligence,
, 2005
"... ..."