Results 1 - 6 of 6
Data Driven Grammatical Error Detection in Transcripts of Children’s Speech
"... We investigate grammatical error detec-tion in spoken language, and present a data-driven method to train a dependency parser to automatically identify and label grammatical errors. This method is ag-nostic to the label set used, and the only manual annotations needed for training are grammatical er ..."
Abstract
Cited by 1 (0 self)
We investigate grammatical error detection in spoken language, and present a data-driven method to train a dependency parser to automatically identify and label grammatical errors. This method is agnostic to the label set used, and the only manual annotations needed for training are grammatical error labels. We find that the proposed system is robust to disfluencies, so that a separate stage to elide disfluencies is not required. The proposed system outperforms two baseline systems on two different corpora that use different sets of error tags. It is able to identify utterances with grammatical errors with an F1-score as high as 0.623, as compared to a baseline F1 of 0.350 on the same data.
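For reference, the utterance-level F1-score reported above is the harmonic mean of precision and recall over utterances flagged as containing errors. A minimal sketch (the function name and counts are illustrative, not from the paper):

```python
def f1_score(tp, fp, fn):
    """Utterance-level F1: harmonic mean of precision and recall.

    tp: utterances correctly flagged as erroneous
    fp: utterances flagged but actually error-free
    fn: erroneous utterances the system missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```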
First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages, pages 74-81
The effect of disfluencies and learner errors on the parsing of spoken learner language
"... Abstract NLP tools are typically trained on written data from native speakers. However, research into language acquisition and tools for language teaching & proficiency assessment would benefit from accurate processing of spoken data from second language learners. In this paper we discuss manua ..."
Abstract
NLP tools are typically trained on written data from native speakers. However, research into language acquisition and tools for language teaching & proficiency assessment would benefit from accurate processing of spoken data from second language learners. In this paper we discuss manual annotation schemes for various features of spoken language; we also evaluate the automatic tagging of one particular feature (filled pauses), finding a success rate of 81%; and we evaluate the effect of using our manual annotations to 'clean up' the transcriptions for sentence parsing, resulting in a 25% improvement in parse success rate by completely cleaning the texts of disfluencies and errors. We discuss the need to adapt existing NLP technology to non-canonical domains such as spoken learner language, while emphasising the worth of continued integration of manual and automatic annotation.
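The filled-pause tagging evaluated above could be approximated by a naive token-inventory baseline. A hypothetical sketch — the pause inventory and function names are assumptions, not the paper's actual method:

```python
# Assumed filled-pause inventory; real transcription schemes (e.g. CHAT)
# mark these explicitly, and the paper's tagger is not specified here.
FILLED_PAUSES = {"uh", "um", "er", "erm", "mm"}

def tag_filled_pauses(tokens):
    """Tag each token 'FP' (filled pause) or 'O' (other)."""
    return [(tok, "FP" if tok.lower() in FILLED_PAUSES else "O")
            for tok in tokens]
```

A lexical baseline like this misses ambiguous cases (e.g. "well", "like" used as discourse markers), which is one reason manual annotation remains valuable.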
CS562/CS662 (Natural Language Processing): Dependency parsing
"... The context-free grammar (CFG) formalism does not directly encode the notion of headedness, defined as follows: X is the head of a constituent XP if and only if X is an immediate daughter of XP, and ..."
Abstract
The context-free grammar (CFG) formalism does not directly encode the notion of headedness, defined as follows: X is the head of a constituent XP if and only if X is an immediate daughter of XP, and
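The headedness notion defined above is conventionally recovered from a CFG parse with head-percolation rules. A toy sketch, assuming a hypothetical minimal rule table (the names and rules are illustrative, not from the course notes):

```python
# Toy head rules: for each phrase label XP, the category of the
# immediate daughter X that serves as its head.
HEAD_RULES = {"NP": "N", "VP": "V", "PP": "P"}

def find_head(label, daughters):
    """Return the word of the daughter whose category heads `label`.

    `daughters` is a list of (category, word) pairs, i.e. the
    immediate daughters of the constituent labelled `label`.
    """
    target = HEAD_RULES.get(label)
    for category, word in daughters:
        if category == target:
            return word
    return None  # no head rule matched
```

Dependency parsing makes this relation primary: instead of percolating heads through constituents, every word is attached directly to its head.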
Faculty of Linguistics and Literature
"... We present STIR (STrongly Incremen-tal Repair detection), a system that de-tects speech repairs and edit terms on transcripts incrementally with minimal la-tency. STIR uses information-theoretic measures from n-gram models as its principal decision features in a pipeline of classifiers detecting the ..."
Abstract
We present STIR (STrongly Incremental Repair detection), a system that detects speech repairs and edit terms on transcripts incrementally with minimal latency. STIR uses information-theoretic measures from n-gram models as its principal decision features in a pipeline of classifiers detecting the characteristic properties of different stages of repairs. Results on the Switchboard disfluency tagged corpus show utterance-final accuracy on a par with state-of-the-art incremental repair detection methods, but with better incremental accuracy, faster time-to-detection and less computational overhead. We evaluate its performance using incremental metrics and propose new repair processing evaluation standards.
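The information-theoretic n-gram measures mentioned above are typically surprisal values, which spike at the disfluent restarts that signal a repair. A minimal add-alpha-smoothed bigram surprisal sketch (illustrative only; STIR's actual features and models are not reproduced here):

```python
import math
from collections import Counter

def bigram_surprisal(corpus_tokens, prev, word, alpha=1.0):
    """Add-alpha smoothed bigram surprisal: -log2 P(word | prev).

    Higher values mean `word` is more unexpected after `prev`,
    the kind of signal a repair detector can threshold on.
    """
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab = len(unigrams)
    p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
    return -math.log2(p)
```

For example, in a repair like "I like, I want tea", the token following the restart tends to score higher surprisal than a fluent continuation would.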