Learning to paraphrase: an unsupervised approach using multiple-sequence alignment (0)

by R Barzilay, L Lee
Venue: In Proceedings of HLT-NAACL

Results 1 - 10 of 258

Opinion Mining and Sentiment Analysis

by Bo Pang, Lillian Lee, 2008
Cited by 749 (3 self)
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

LexRank: Graph-based lexical centrality as salience in text summarization

by Dragomir R. Radev - Journal of Artificial Intelligence Research, 2004
Cited by 266 (9 self)
We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.
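The thresholded variant described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain term-frequency vectors with whitespace tokenization, and the function names, the threshold of 0.1, and the damping factor of 0.85 are illustrative choices that do not come from the abstract.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iters=100):
    """Score sentences by eigenvector centrality on a thresholded similarity graph.

    Assumption: raw term frequencies stand in for whatever weighting
    the paper actually uses; threshold and damping are illustrative.
    """
    n = len(sentences)
    if n == 0:
        return []
    vecs = [Counter(s.lower().split()) for s in sentences]
    # Unweighted adjacency: an edge wherever similarity reaches the threshold.
    adj = [[1.0 if cosine(vecs[i], vecs[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalize so each row is a probability distribution.
    for row in adj:
        total = sum(row)
        for j in range(n):
            row[j] = row[j] / total if total > 0 else 1.0 / n
    # Power iteration with a uniform teleportation term.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1.0 - damping) / n
                  + damping * sum(adj[j][i] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores
```

In a toy input such as `["cat dog", "cat fish", "dog bird"]`, the first sentence overlaps with both of the others, so it receives the highest score.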

Citation Context

...anguage processing (NLP) has moved to a very firm mathematical foundation. Many problems in NLP, e.g., parsing (Collins, 1997), word sense disambiguation (Yarowsky, 1995), and automatic paraphrasing (Barzilay & Lee, 2003) have benefited significantly by the introduction of robust statistical techniques. Recently, robust graph-based methods for NLP have also been gaining a lot of interest, e.g., in word clustering (Bre...

Vector-based models of semantic composition

by Jeff Mitchell, Mirella Lapata - In Proceedings of ACL-08: HLT, 2008
Cited by 220 (5 self)
This paper proposes a framework for representing the meaning of phrases and sentences in vector space. Central to our approach is vector composition which we operationalize in terms of additive and multiplicative functions. Under this framework, we introduce a wide range of composition models which we evaluate empirically on a sentence similarity task. Experimental results demonstrate that the multiplicative models are superior to the additive alternatives when compared against human judgments.
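The two families of composition functions named in this abstract can be sketched in their simplest forms: element-wise addition and element-wise multiplication of word vectors. This is only an illustration under stated assumptions; the function names are hypothetical, and the abstract does not say which weighted or extended variants the paper evaluates.

```python
import math


def additive(u, v):
    """Compose two word vectors by element-wise addition."""
    return [a + b for a, b in zip(u, v)]


def multiplicative(u, v):
    """Compose two word vectors by element-wise multiplication."""
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity, usable for comparing two composed phrase vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Multiplication keeps only the dimensions on which both words are active, whereas addition keeps everything from both, which is one intuition for why the two can behave differently on a similarity task.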

Citation Context

...esented to judges who are asked to decide whether they are semantically equivalent, i.e., whether they can be generally substituted for one another in the same context without great information loss (Barzilay & Lee, 2003; Bannard & Callison-Burch, 2005). Participants are usually asked to rate the paraphrase pairs using a nominal scale (e.g., definitely similar, sometimes similar, never similar). In our experiments, w...

Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources

by Chris Quirk, Chris Brockett, Bill Dolan - Proceedings of the 20th International Conference on Computational Linguistics (Coling’04), 2004
Cited by 214 (2 self)

Citation Context

... from thousands of news sources over an extended period. While the idea of exploiting multiple news reports for paraphrase acquisition is not new, previous efforts (for example, Shinyama et al. 2002; Barzilay and Lee 2003) have been restricted to at most two news sources. Our work represents what we believe to be the first attempt to exploit the explosion of news coverage on the Web, where a single event can generate ...

Paraphrasing with Bilingual Parallel Corpora

by Colin Bannard, Chris Callison-Burch - In ACL-2005, 2005
Cited by 193 (16 self)
Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrase-based statistical machine translation, we show how paraphrases in one language can be identified using a phrase in another language as a pivot. We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. We evaluate our paraphrase extraction and ranking methods using a set of manual word alignments, and contrast the quality with paraphrases extracted from automatic alignments.
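The pivot idea can be sketched as follows. The abstract does not spell out the formula, so this assumes the commonly cited formulation in which the paraphrase probability marginalizes over foreign pivot phrases, p(e2 | e1) = Σ_f p(f | e1) · p(e2 | f). The function name, the table layout, and the toy probabilities are invented for illustration.

```python
def paraphrase_probs(e1, p_f_given_e, p_e_given_f):
    """Rank candidate paraphrases of e1 by summing over pivot phrases f.

    p_f_given_e: dict mapping an English phrase to {foreign phrase: prob}
    p_e_given_f: dict mapping a foreign phrase to {English phrase: prob}
    Both tables are hypothetical stand-ins for learned translation tables.
    """
    scores = {}
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:  # a phrase is not reported as its own paraphrase
                scores[e2] = scores.get(e2, 0.0) + p_f * p_e2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy translation tables (made-up numbers, for illustration only).
P_F_GIVEN_E = {"under control": {"unter kontrolle": 0.8, "im griff": 0.2}}
P_E_GIVEN_F = {
    "unter kontrolle": {"under control": 0.7, "in check": 0.3},
    "im griff": {"under control": 0.5, "in check": 0.5},
}

ranked = paraphrase_probs("under control", P_F_GIVEN_E, P_E_GIVEN_F)
```

With these toy tables the single candidate "in check" gets 0.8 · 0.3 + 0.2 · 0.5 = 0.34.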

Citation Context

...nce that an answer is correct (Ibrahim et al., 2003). In this paper we introduce a novel method for extracting paraphrases that uses bilingual parallel corpora. Past work (Barzilay and McKeown, 2001; Barzilay and Lee, 2003; Pang et al., 2003; Ibrahim et al., 2003) has examined the use of monolingual parallel corpora for paraphrase extraction. Examples of monolingual parallel corpora that have been used are multiple tra...

Composition in distributional models of semantics

by Jeffrey Mitchell, 2010
Cited by 148 (3 self)
Distributional models of semantics have proven themselves invaluable both in cognitive modelling of semantic phenomena and also in practical applications. For example, they have been used to model judgments of semantic similarity (McDonald, 2000) and association (Denhière and Lemaire, 2004; Griffiths et al., 2007) and have been shown to achieve human level performance on synonymy tests (Landauer and Dumais, 1997; Griffiths et al., 2007) such as those included in the Test of English as a Foreign Language (TOEFL). This ability has been put to practical use in automatic thesaurus extraction (Grefenstette, 1994). However, while there has been a considerable amount of research directed at the most effective ways of constructing representations for individual words, the representation of larger constructions, e.g., phrases and sentences, has received relatively little attention. In this thesis we examine this issue of how to compose meanings within distributional models of semantics to form representations of multi-word structures. Natural language data typically consists of such complex structures, rather than

Citation Context

...in semantics can often be investigated experimentally by simply asking subjects to make a comparison of the meanings of two expressions. For example, we can ask whether two sentences are paraphrases (Barzilay and Lee, 2003), i.e. share the same meaning, without requiring a philosophical theory of what meaning is. Similarly, we could study the extent of semantic similarity between words (Rubenstein and Goodenough, 1965)...

Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences

by Bo Pang, Kevin Knight, Daniel Marcu - In Proceedings of HLT/NAACL, 2003
Cited by 135 (5 self)
We describe a syntax-based algorithm that automatically builds Finite State Automata (word lattices) from semantically equivalent translation sets. These FSAs are good representations of paraphrases. They can be used to extract lexical and syntactic paraphrase pairs and to generate new, unseen sentences that express the same meaning as the sentences in the input sets. Our FSAs can also predict the correctness of alternative semantic renderings, which may be used to evaluate the quality of translations.

Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

by Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, Christopher D. Manning
Cited by 122 (6 self)
Paraphrase detection is the task of examining two sentences and determining whether they have the same meaning. In order to obtain high accuracy on this task, thorough syntactic and semantic analysis of the two statements is needed. We introduce a method for paraphrase detection based on recursive autoencoders (RAE). Our unsupervised RAEs are based on a novel unfolding objective and learn feature vectors for phrases in syntactic trees. These features are used to measure the word- and phrase-wise similarity between two sentences. Since sentences may be of arbitrary length, the resulting matrix of similarity measures is of variable size. We introduce a novel dynamic pooling layer which computes a fixed-sized representation from the variable-sized matrices. The pooled representation is then used as input to a classifier. Our method outperforms other state-of-the-art approaches on the challenging MSRP paraphrase corpus.
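The step from a variable-sized similarity matrix to a fixed-sized one can be sketched as a grid pooling operation. The abstract does not specify the operator or the way regions are formed, so the sketch below makes its own assumptions: roughly equal rectangular regions, `min` as the default pooling function, and reuse of a row or column when the input is smaller than the grid. The name `dynamic_pool` is hypothetical.

```python
def dynamic_pool(matrix, size, pool=min):
    """Pool a non-empty rows x cols matrix down (or up) to size x size.

    Assumptions: regions are near-equal index ranges, and each output
    cell applies `pool` (min by default) to the entries of its region.
    """
    rows, cols = len(matrix), len(matrix[0])

    def bounds(k, n):
        # Index range of the k-th of `size` regions along an axis of length n.
        start = k * n // size
        end = max(start + 1, (k + 1) * n // size)  # never an empty region
        return start, end

    pooled = []
    for i in range(size):
        r0, r1 = bounds(i, rows)
        out_row = []
        for j in range(size):
            c0, c1 = bounds(j, cols)
            out_row.append(pool(matrix[r][c]
                                for r in range(r0, r1)
                                for c in range(c0, c1)))
        pooled.append(out_row)
    return pooled
```

Whatever the sentence lengths, the output always has `size * size` entries, which is what allows it to feed a classifier with a fixed input dimension.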

Citation Context

...ation. This is also captured by our model. 5 Related Work The field of paraphrase detection has progressed immensely in recent years. Early approaches were based purely on lexical matching techniques [22, 23, 19, 24]. Since these methods are often based on exact string matches of n-grams, they fail to detect similar meaning that is conveyed by synonymous words. Several approaches [17, 18] overcome this problem by...

Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization

by Regina Barzilay, Lillian Lee, 2004
Cited by 122 (7 self)
We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear.

Monolingual machine translation for paraphrase generation

by Chris Quirk, Chris Brockett, William Dolan - In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004
Cited by 111 (6 self)
We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A monotone phrasal decoder generates contextual replacements. Human evaluation shows that this system outperforms baseline paraphrase generation techniques and, in a departure from previous work, offers better coverage and scalability than the current best-of-breed paraphrasing approaches.

Citation Context

...n, dialog, and question answering. Recent research has treated paraphrase acquisition and generation as a machine learning problem (Barzilay & McKeown, 2001; Lin & Pantel, 2002; Shinyama et al., 2002; Barzilay & Lee, 2003; Pang et al., 2003). We approach this problem as one of statistical machine translation (SMT), within the noisy channel model of Brown et al. (1993). That is, we seek to identify the optimal paraphra...
