Results 1 - 10 of 16
Querying and updating treebanks: A critical survey and requirements analysis
- In Proceedings of the Australasian Language Technology Workshop, 2004
Cited by 27 (8 self)
Language technology makes extensive use of hierarchically annotated text and speech data. These databases are stored in flat files and manipulated using corpus-specific query tools or special-purpose scripts. While the size of these databases and the range of applications has grown rapidly in recent years, neither method for managing the data has led to reusable, scalable software. The formal properties of the query languages are not well understood. Hence established methods for indexing tree data and optimizing tree queries cannot be employed. We analyze a range of existing linguistic query languages, and adduce a set of requirements for a reusable, scalable linguistic query language.
Attribution and its annotation in the Penn Discourse Treebank
- Traitement Automatique des Langues, 2007
Cited by 23 (8 self)
ABSTRACT. In this paper, we describe an annotation scheme for the attribution of abstract objects (propositions, facts, and eventualities) associated with discourse relations and their arguments annotated in the Penn Discourse TreeBank. The scheme aims to capture both the source and degrees of factuality of the abstract objects through the annotation of text spans signalling the attribution, and of features recording the source, type, scopal polarity, and determinacy of attribution. RÉSUMÉ. Dans cet article, nous décrivons un schéma d’annotation pour l’encodage des objets abstraits (propositions, faits et possibilités) associés aux relations de discours et à leurs arguments tels qu’annotés dans le Penn Discourse TreeBank. Ce schéma a pour objet la capture de la source et du degré de factualité des objets abstraits. Les aspects clés de ce schéma comprennent l’annotation des intervalles textuels signalant l’attribution, ainsi que l’annotation des proprietés caractérisant la source, le type, la polarité de la portée, et le degré de détermination de l’attribution.
D-LTAG: Extending lexicalized TAG to discourse
- Cognitive Science
Cited by 22 (1 self)
Material in this article derives from talks I have given and from papers co-authored by mem-
The Penn Discourse TreeBank as a resource for natural language generation
- In Proc. of the Corpus Linguistics Workshop on Using Corpora for Natural Language Generation, 2005
Cited by 17 (8 self)
While many advances have been made in Natural Language Generation (NLG), the scope of the field has been somewhat restricted because of the lack of annotated corpora from which properties of texts can be automatically acquired and applied towards the development of generation systems. In this paper, we describe how the Penn Discourse TreeBank (PDTB) can serve as a valuable large scale annotated corpus resource for furthering research in NLG and for inducing models for the development of NLG systems. The PDTB is annotated for discourse relations, and encodes explicitly the elements of these relations: explicit and implicit discourse connectives, denoting the predicates of the relations, and text spans, denoting the arguments of the relations. Connectives and arguments are also annotated with features and spans related to attribution, and each connective will be annotated with labels standing for the projected discourse relation, including sense distinctions for polysemous connectives. We exemplify the use of the corpus for two tasks in NLG: the realization of discourse relations during sentence planning, and the representation and realization of attribution.
Annotating attribution in the Penn Discourse TreeBank
- In Proceedings of the COLING/ACL Workshop on Sentiment and Subjectivity in Text, 2006
Cited by 12 (4 self)
An emerging task in text understanding and generation is to categorize information as fact or opinion and to further attribute it to the appropriate source. Corpus annotation schemes aim to encode such distinctions for NLP applications concerned with such tasks, such as information extraction, question answering, summarization, and generation. We describe an annotation scheme for marking the attribution of abstract objects such as propositions, facts and eventualities associated with discourse relations and their arguments annotated in the Penn Discourse TreeBank. The scheme aims to capture the source and degrees of factuality of the abstract objects. Key aspects of the scheme are annotation of the text spans signalling the attribution, and annotation of features recording the source, type, scopal polarity, and determinacy of attribution.
Accounting for discourse relations: Constituency and dependency
- Intelligent Linguistic Architectures, 2006
Cited by 9 (1 self)
At the start of my career, I had the good fortune of working with
Complexity of Dependencies in Discourse: Are Dependencies in Discourse More Complex than in Syntax?
Cited by 8 (1 self)
This paper investigates the complexity of dependencies at the discourse level, in particular the dependencies between discourse connectives and their arguments. Our study is based on data from the Penn Discourse Treebank (PDTB) and is therefore an exploration into the ways treebanks can inform linguistic issues. We observe that, unlike in syntax, there is more uncertainty and flexibility with regards to the location and extent of discourse arguments. This leads to a variety of possible patterns of dependencies between pairs of discourse relations, including nested, crossed and a range of other non-tree-like configurations. Nevertheless, our main conclusion is that the types of discourse dependencies are highly restricted since the more complex cases can be factored out by appealing to discourse notions like anaphora and attribution. We conjecture that the complexity of dependencies is far more restricted at the discourse level as compared to the syntactic level.
Automatic Identification of Discourse Markers in Multiparty Dialogues
- 2006
Cited by 2 (0 self)
The lexical items that can serve as discourse markers (DMs) are often multi-functional. Like and well, in particular, play numerous other roles apart from DMs: for instance, the first one can also be a verb and the second one an adverb. The goal of the present study is the identification, on transcripts of multi-party dialogues, of the occurrences of like and well that play a discourse or pragmatic role. DM identification is a binary classification task over the set of all occurrences of tokens like and well. The importance of DMs to computational linguistics is first discussed, along with previous experiments in DM identification. Then, the data is briefly described, emphasizing the DM annotation procedure and an inter-annotator agreement study. The proposed method uses lexical, prosodic/positional and sociolinguistic features, together with machine learning algorithms, among which decision trees are preferred. The results obtained using a ten-fold cross-validation procedure are analysed at length, focussing first on overall performance, and then on the relevance of each type of features. Feature analysis using a range of techniques shows that lexical indicators are the most reliable features for DM identification, followed by prosodic/positional features. Sociolinguistic features are slightly correlated with the use of like as DM, while the dialogue act of the utterance containing a DM candidate
TOWARDS A DISCOURSE RESOURCE FOR ITALIAN: DEVELOPING AN ANNOTATION SCHEMA FOR ATTRIBUTION
Academic Year 2008/2009. Nobody believes the official spokesman... but everybody trusts an unidentified source. This thesis investigates the complex phenomenon of attribution and addresses the issue of annotating attribution relations, developing, by means of a pilot study, a possible annotation schema to be applied to the Italian Syntactic Semantic Treebank corpus of newspaper articles (ISST). Attribution is the relation occurring between assertions but also e.g. beliefs, feelings, intentions, and the agents they belong to (e.g. The minister says that taxes will rise in 2010). As this relation deeply affects the way we perceive information, this should not be considered in isolation. It is fundamental to recognise attribution in order to deal with the reliability of information and with opinions. The development of an annotation schema for attribution aims at providing a resource in which information is overtly linked to its source. Having this
unknown title
This paper surveys work on applying the insights of lexicalized grammars to low-level discourse, to show the value of positing an autonomous grammar for low-level discourse in which words (or idiomatic phrases) are associated with discourse-level predicate-argument structures or modification structures that convey their syntactic-semantic meaning and scope. It starts by describing a lexicalized Tree Adjoining Grammar for discourse (D-LTAG). It then reviews an initial experiment in parsing text automatically, using both a lexicalized TAG and D-LTAG, and then touches upon issues involved in how lexico-syntactic elements contribute to discourse semantics. The paper concludes with a brief description of the Penn Discourse TreeBank, a resource being developed for the study of discourse structure and semantics.