A PDTB-Styled End-to-End Discourse Parser, 2012
Cited by 59 (5 self)
Since the release of the large discourse-level annotation of the Penn Discourse Treebank (PDTB), research work has been carried out on certain subtasks of this annotation, such as disambiguating discourse connectives and classifying Explicit or Implicit relations. We see a need to construct a full parser on top of these subtasks and propose a way to evaluate the parser. In this work, we have designed and developed an end-to-end discourse parser to parse free texts in the PDTB style in a fully data-driven approach. The parser consists of multiple components joined in a sequential pipeline architecture, which includes a connective classifier, argument labeler, explicit classifier, nonexplicit classifier, and attribution span labeler. Our trained parser first identifies all discourse and nondiscourse relations, locates and labels their arguments, and then classifies the sense of the relation between each pair of arguments. For the identified relations, the parser also determines the attribution spans, if any, associated with them. We introduce novel approaches to locate and label arguments, and to identify attribution spans. We also significantly improve on the current state-of-the-art connective classifier. We propose and present a comprehensive evaluation from both component-wise and error-cascading perspectives, in which we illustrate how each component performs in isolation, as well as how the pipeline performs with errors propagated forward. The parser gives an overall system F1 score of 46.80% for partial matching utilizing gold standard parses, and 38.18% with full automation.
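The sequential pipeline described in this abstract can be pictured with a minimal sketch. The stage order follows the component list above, but everything else is an assumption made for illustration: `ParseState`, `make_stub_stage`, and `run_pipeline` are hypothetical names, and each stage is a stub that only records that it ran, not a trained classifier from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ParseState:
    """Hypothetical container passed from stage to stage (not from the paper)."""
    text: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)


Stage = Callable[[ParseState], ParseState]

# Stage order follows the component list in the abstract.
STAGE_NAMES = [
    "connective_classifier",
    "argument_labeler",
    "explicit_classifier",
    "nonexplicit_classifier",
    "attribution_span_labeler",
]


def make_stub_stage(name: str) -> Stage:
    """Build a placeholder stage; a real one would run a trained model here."""
    def stage(state: ParseState) -> ParseState:
        # Store the names of the stages that ran before this one, which makes
        # the forward flow of information (and of errors) explicit.
        state.annotations[name] = list(state.annotations)
        return state
    return stage


def run_pipeline(text: str, stages: List[Stage]) -> ParseState:
    """Apply each stage in order, feeding its output to the next stage."""
    state = ParseState(text=text)
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline("Some free text.", [make_stub_stage(n) for n in STAGE_NAMES])
```

Because every stage consumes the output of the one before it, a mistake made early (for example, a missed connective) is visible to all later stages, which is why the abstract reports both component-wise and error-cascading evaluations.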
Revisiting Readability: A Unified Framework for Predicting Text Quality
Cited by 41 (3 self)
We combine lexical, syntactic, and discourse features to produce a highly predictive model of human readers' judgments of text readability. This is the first study to take into account such a variety of linguistic factors and the first to empirically demonstrate that discourse relations are strongly associated with the perceived quality of text. We show that various surface metrics generally expected to be related to readability are not very good predictors of readability judgments in our Wall Street Journal corpus. We also establish that readability predictors behave differently depending on the task: predicting text readability or ranking the readability. Our experiments indicate that discourse relations are the one class of features that exhibits robustness across these two tasks.
Easily identifiable discourse relations, 2008
Cited by 36 (6 self)
We present a corpus study of local discourse relations based on the Penn Discourse Tree Bank, a large manually annotated corpus of explicitly or implicitly realized relations. We show that while there is a large degree of ambiguity in temporal explicit discourse connectives, overall connectives are mostly unambiguous and allow high-accuracy prediction of discourse relation type. We achieve 93.09% accuracy in classifying the explicit relations and 74.74% accuracy overall. In addition, we show that some pairs of relations occur together in text more often than expected by chance. This finding suggests that global sequence classification of the relations in text can lead to better results, especially for implicit relations.
Using Syntax to Disambiguate Explicit Discourse Connectives in Text
Cited by 36 (3 self)
Discourse connectives are words or phrases such as once, since, and on the contrary that explicitly signal the presence of a discourse relation. There are two types of ambiguity that need to be resolved during discourse processing. First, a word can be ambiguous between discourse and non-discourse usage. For example, once can be either a temporal discourse connective or simply a word meaning “formerly”. Second, some connectives are ambiguous in terms of the relation they mark. For example, since can serve as either a temporal or a causal connective. We demonstrate that syntactic features improve performance in both disambiguation tasks. We report state-of-the-art results for identifying discourse vs. non-discourse usage and human-level performance on sense disambiguation.
Recognizing Implicit Discourse Relations in the Penn Discourse Treebank
Cited by 34 (1 self)
We present an implicit discourse relation classifier in the Penn Discourse Treebank (PDTB). Our classifier considers the context of the two arguments, word pair information, as well as the arguments' internal constituent and dependency parses. Our results on the PDTB yield a significant 14.1% improvement over the baseline. In our error analysis, we discuss four challenges in recognizing implicit relations in the PDTB.
Text-level discourse parsing with rich linguistic features - In Proc. 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Korea, 2012
Cited by 31 (4 self)
Automatically Evaluating Text Coherence Using Discourse Relations
Cited by 27 (3 self)
We present a novel model to represent and assess the discourse coherence of text. Our model assumes that coherent text implicitly favors certain types of discourse relation transitions. We implement this model and apply it towards the text ordering ranking task, which aims to discern an original text from a permuted ordering of its sentences. The experimental results demonstrate that our model is able to significantly outperform the state-of-the-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upper bounds. We further show that our model is synergistic with the previous approach, demonstrating an error reduction of 73% when the features from both models are combined for the task.
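To make the evaluation terms in this abstract concrete, here is a rough sketch of how accuracy on the ordering ranking task and a relative error reduction could be computed. It is not the authors' code: `toy_score` is a deliberately trivial stand-in for a coherence model, and the reading of "error" as the gap between an upper bound and the measured accuracy is an assumption based on the phrase "against human upper bounds".

```python
from typing import Callable, List, Sequence, Tuple

Ordering = Sequence[str]


def ranking_accuracy(
    pairs: List[Tuple[Ordering, Ordering]],
    score: Callable[[Ordering], float],
) -> float:
    """Fraction of (original, permuted) pairs where the original scores higher."""
    if not pairs:
        raise ValueError("need at least one (original, permuted) pair")
    correct = sum(1 for original, permuted in pairs
                  if score(original) > score(permuted))
    return correct / len(pairs)


def relative_error_reduction(
    baseline_accuracy: float,
    new_accuracy: float,
    upper_bound: float = 1.0,
) -> float:
    """Share of the baseline's error (measured against upper_bound) removed.

    Assumption: error = upper_bound - accuracy. Passing a human upper bound
    below 1.0 measures error against human performance instead of perfection.
    """
    baseline_error = upper_bound - baseline_accuracy
    if baseline_error <= 0:
        raise ValueError("baseline already reaches the upper bound")
    new_error = upper_bound - new_accuracy
    return (baseline_error - new_error) / baseline_error


def toy_score(ordering: Ordering) -> float:
    """Hypothetical scorer: counts adjacent items that are in sorted order."""
    return float(sum(1 for a, b in zip(ordering, ordering[1:]) if a < b))


# Toy data: each "sentence" is a single letter, so sorted order is "coherent".
toy_pairs = [
    (["a", "b", "c"], ["c", "a", "b"]),
    (["a", "b", "c"], ["b", "a", "c"]),
]
toy_accuracy = ranking_accuracy(toy_pairs, toy_score)
toy_reduction = relative_error_reduction(0.80, 0.90)
```

A relative reduction is sensitive to the choice of upper bound: the same accuracy gain removes a larger share of the error when the bound is a human score below 1.0 than when it is perfect accuracy.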
Modality and negation: An introduction to the special issue - Computational Linguistics, 2012
The Hindi Discourse Relation Bank
Cited by 19 (5 self)
We describe the Hindi Discourse Relation Bank project, aimed at developing a large corpus annotated with discourse relations. We adopt the lexically grounded approach of the Penn Discourse Treebank, and describe our classification of Hindi discourse connectives, our modifications to the sense classification of discourse relations, and some cross-linguistic comparisons based on the initial annotations carried out so far.
Realization of Discourse Relations by Other Means: Alternative
Cited by 16 (5 self)
Studies of discourse relations have not, in the past, attempted to characterize what serves as evidence for them, beyond lists of frozen expressions, or markers, drawn from a few well-defined syntactic classes. In this paper, we describe how the lexicalized discourse relation annotations of the Penn Discourse Treebank (PDTB) led to the discovery of a wide range of additional expressions, annotated as AltLex (alternative lexicalizations) in the PDTB 2.0. Further analysis of AltLex annotation suggests that the set of markers is open-ended, and drawn from a wider variety of syntactic types than currently assumed. As a first attempt towards automatically identifying discourse relation markers, we propose the use of syntactic paraphrase methods.