Assessing agreement on classification tasks: the kappa statistic (1996)

by J Carletta
Venue: Computational Linguistics
Results 1 - 10 of 846

TextTiling: Segmenting text into multi-paragraph subtopic passages

by Marti A. Hearst - Computational Linguistics, 1997
Cited by 458 (2 self)
TextTiling is a technique for subdividing texts into multi-paragraph units that represent passages, or subtopics. The discourse cues for identifying major subtopic shifts are patterns of lexical co-occurrence and distribution. The algorithm is fully implemented and is shown to produce segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts. Multi-paragraph subtopic segmentation should be useful for many text analysis tasks, including information retrieval and summarization.

Citation Context

...other segmentation strategy would. 6.1 Reader Judgments There is a growing concern surrounding issues of intercoder reliability when using human judgments to evaluate discourse-processing algorithms (Carletta 1996; Condon and Cech 1995). Proposals have recently been made for protocols for the collection of human discourse segmentation data (Nakatani et al. 1995) and for how to evaluate the validity of judgment...

Dialogue act modeling for automatic tagging and recognition of conversational speech

by Andreas Stolcke, Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Daniel Jurafsky, Paul Taylor, Rachel Martin, Carol Van Ess-Dykema, Marie Meteer - Computational Linguistics, 2000
Cited by 278 (14 self)
We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like ...

Inter-Coder Agreement for Computational Linguistics

by Ron Artstein, Massimo Poesio - Computational Linguistics, 2008
Cited by 243 (7 self)
This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff’s alpha as well as Scott’s pi and Cohen’s kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in Computational Linguistics, may be more appropriate for many corpus annotation tasks – but that their use makes the interpretation of the value of the coefficient even harder.

The reliability of a dialogue structure coding scheme

by Jean Carletta, Amy Isard, Jacqueline C. Kowtko - Computational Linguistics, 1997
Cited by 228 (16 self)
This paper describes the reliability of a dialogue structure coding scheme which is based on utterance function, game structure, and higher level transaction structure, and which has been applied to a corpus of spontaneous task-oriented spoken dialogues.

Citation Context

...us for both classification and segmentation, the basic question is what level of agreement coders reach under the reliability tests. 4.2 Interpreting reliability results It has been argued elsewhere (Carletta, 1996) that since the amount of agreement one would expect by chance depends on the number and relative frequencies of the categories under test, reliability for category classifications should be measured...
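
The dependence of chance agreement on category frequencies can be seen in a small worked example. This is only a sketch: it assumes, for illustration, that two coders label independently and draw from the same category distribution $p_c$, a model the excerpt above does not itself specify.

```latex
\begin{align*}
P(E) &= \sum_{c} p_c^{2} \\
\text{two equally likely categories:}\quad P(E) &= 0.5^{2} + 0.5^{2} = 0.50 \\
\text{four equally likely categories:}\quad P(E) &= 4 \times 0.25^{2} = 0.25 \\
\text{two skewed categories } (0.9,\ 0.1)\text{:}\quad P(E) &= 0.9^{2} + 0.1^{2} = 0.82
\end{align*}
```

Under these assumptions, the same raw agreement of, say, 0.85 is far above chance with four balanced categories but barely above chance with the skewed pair.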

Partially observable Markov decision processes with continuous observations for dialogue management

by Jason D. Williams - Computer Speech and Language, 2005
Cited by 217 (52 self)
This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a testbed simulated dialogue management problem, we show how recent optimization techniques are able to find a policy for this continuous POMDP which outperforms a traditional MDP approach. Further, we present a method for automatically improving handcrafted dialogue managers by incorporating POMDP belief state monitoring, including confidence score information. Experiments on the testbed system show significant improvements for several example handcrafted dialogue managers across a range of operating conditions.

Citation Context

...ser state is rather more thorny. A host of annotation schemes for dialog exist [58, 119, 114, 73], and established measurements such as Kappa show whether an annotation scheme can be reliably applied [15]. However, a reliable annotation scheme is only a necessary condition: a useful annotation methodology should ultimately result in increased average return, ...

Summarizing Scientific Articles - Experiments with Relevance and Rhetorical Status

by Simone Teufel, Marc Moens - Computational Linguistics, 2002
Cited by 199 (3 self)
In this paper we argue that scientific articles require a different summarization strategy than, for instance, news articles. We propose a strategy which concentrates on the rhetorical status of statements in the article: Material for summaries is selected in such a way that summaries can highlight the new contribution of the source paper and situate it with respect to earlier work. We provide a gold standard for summaries of this kind consisting of a substantial corpus of conference articles in computational linguistics with human judgements of rhetorical status and relevance. We present several experiments measuring our judges' agreement on these annotations. We also present an algorithm which, on the basis of the annotated training material, selects content and classifies it into a fixed set of seven rhetorical categories. The output of this extraction and classification system can be viewed as a single-document summary in its own right; alternatively, it can be used to generate task-oriented and user-tailored summaries designed to give users an overview of a scientific field.

Towards detecting emotions in spoken dialogs

by Chul Min Lee, Shrikanth S. Narayanan - IEEE Transactions on Speech and Audio Processing, 2005
Cited by 178 (22 self)
The importance of automatically recognizing emotions from human speech has grown with the increasing role of spoken language interfaces in human-computer interaction applications. This paper explores the detection of domain-specific emotions using language and discourse information in conjunction with acoustic correlates of emotion in speech signals. The specific focus is on a case study of detecting negative and non-negative emotions using spoken language data obtained from a call center application. Most previous studies in emotion recognition have used only the acoustic information contained in speech. In this paper, a combination of three sources of information (acoustic, lexical, and discourse) is used for emotion recognition. To capture emotion information at the language level, an information-theoretic notion of emotional salience is introduced. Optimization of the acoustic correlates of emotion with respect to classification error was accomplished by investigating different feature sets obtained from feature selection, followed by principal component analysis. Experimental results on our call center data show that the best results are obtained when acoustic and language information are combined. Results show that combining all the information, rather than using only acoustic information, improves emotion classification by 40.7% for males and 36.4% for females (linear discriminant classifier used for acoustic information). Index Terms: Acoustic correlates, dialog systems, emotion recognition, emotional salience, feature selection, information fusion, principal component analysis, spoken language processing.

Citation Context

...t most non-negative emotion utterances were neutral in nature, i.e., they had no apparent display of emotions. To measure the amount of agreement among the taggers, the kappa statistic was used [19], [20]. The kappa statistic provides a measure of agreement for categorical variables in subjective tests. The kappa coefficient, K, is the ratio of the proportion of times that the coders/taggers agree (co...
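
As a rough illustration of the coefficient described in this excerpt, the sketch below computes K = (P(A) - P(E)) / (1 - P(E)) for two coders. It assumes Cohen's formulation, where expected agreement P(E) comes from each coder's own label frequencies; other variants estimate P(E) differently, and the excerpt does not say which one the authors used. The function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)

    # Observed agreement P(A): fraction of items given the same label.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement P(E): chance that both coders choose the same
    # category if each labels independently at their own observed rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_a - p_e) / (1 - p_e)


# Hypothetical annotations: "neg" = negative, "non" = non-negative.
coder_1 = ["neg", "neg", "non", "non", "non", "non", "non", "non", "non", "non"]
coder_2 = ["neg", "non", "non", "non", "non", "non", "non", "non", "non", "neg"]
kappa = cohen_kappa(coder_1, coder_2)
```

On these made-up labels the coders agree on 8 of 10 items (P(A) = 0.8), but because both use "non" 80% of the time, P(E) is 0.68 and kappa comes out at about 0.375.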

Towards the Self-Annotating Web

by Philipp Cimiano, Siegfried Handschuh, Steffen Staab, 2004
Cited by 174 (11 self)
The success of the Semantic Web depends on the availability of ontologies as well as on the proliferation of web pages annotated with metadata conforming to these ontologies. Thus, a crucial question is where to acquire these metadata. In this paper we propose PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology. The approach is evaluated against the manual annotations of two human subjects. The approach is implemented in OntoMat, an annotation tool for the Semantic Web and shows very promising results.

Coding Dialogs with the DAMSL Annotation Scheme

by Mark G. Core, James F. Allen, 1997
Cited by 165 (1 self)
This paper describes the DAMSL annotation scheme for communicative acts in dialog. The scheme has three layers: Forward Communicative Functions, Backward Communicative Functions, and Utterance Features. Each layer allows multiple communicative functions of an utterance to be labeled. The Forward Communicative Functions consist of a taxonomy in a similar style as the actions of traditional speech act theory. The Backward Communicative Functions indicate how the current utterance relates to the previous dialog, such as accepting a proposal, confirming understanding, or answering a question. The Utterance Features include information about an utterance's form and content, such as whether an utterance concerns the communication process itself or deals with the subject at hand. The kappa inter-annotator reliability scores for the first test of DAMSL with human annotators show promise, but are on average 0.15 lower than the accepted kappa scores for such annotations. However, the slight revi...

A Corpus-Based Investigation of Definite Description Use

by Massimo Poesio, Renata Vieira - Computational Linguistics, 1998
Cited by 163 (43 self)
We present the results of a study of definite descriptions use in written texts aimed at assessing the feasibility of annotating corpora with information about definite description interpretation. We ran two experiments, in which subjects were asked to classify the uses of definite descriptions in a corpus of 33 newspaper articles, containing a total of 1412 definite descriptions. We measured the agreement among annotators about the classes assigned to definite descriptions, as well as the agreement about the antecedent assigned to those definites that the annotators classified as being related to an antecedent in the text. The most interesting result of this study from a corpus annotation perspective was the rather low agreement (K=0.63) that we obtained using versions of Hawkins' and Prince's classification schemes; better results (K=0.76) were obtained using the simplified scheme proposed by Fraurud that includes only two classes, first-mention and subsequent-mention. The agreement ...

Citation Context

...ose adopted in prior corpus-based studies such as (Prince, 1981; Fraurud, 1990). Our study is also different from these previous ones in that measuring the agreement among annotators became an issue (Carletta, 1996). We used for the experiments a set of randomly selected articles from the Wall Street Journal contained in the ACL/DCI CD-ROM, rather than a corpus of transcripts of spoken language corpora such as ...
