Results 1–10 of 13
The Benefits of a Model of Annotation
"... This paper presents a case study of a difficult and important categorical annotation task (word sense) to demonstrate a probabilistic annotation model applied to crowdsourced data. It is argued that standard (chance-adjusted) agreement levels are neither necessary nor sufficient to ensure high quali ..."
Abstract
Cited by 13 (1 self)
This paper presents a case study of a difficult and important categorical annotation task (word sense) to demonstrate a probabilistic annotation model applied to crowdsourced data. It is argued that standard (chance-adjusted) agreement levels are neither necessary nor sufficient to ensure high-quality gold-standard labels. Compared to conventional agreement measures, application of an annotation model to instances with crowdsourced labels yields higher-quality labels at lower cost.
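The abstract does not spell out the model, but Dawid-Skene-style item-response models are the standard instance of this family. Below is a minimal EM sketch of that general idea (not necessarily the authors' exact model), with toy data:

```python
# Illustrative Dawid-Skene-style EM for aggregating crowdsourced labels.
# A sketch of the general idea of a probabilistic annotation model, not
# the specific model evaluated in the paper.
import numpy as np

def dawid_skene(triples, n_classes, n_iter=50):
    """triples: int array of (item, worker, observed_label) rows."""
    n_items = triples[:, 0].max() + 1
    n_workers = triples[:, 1].max() + 1

    # Initialize per-item class posteriors from raw vote proportions.
    post = np.zeros((n_items, n_classes))
    for i, w, l in triples:
        post[i, l] += 1
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices,
        # conf[w, t, l] = P(worker w says l | true class t), smoothed.
        prior = post.mean(axis=0)
        conf = np.full((n_workers, n_classes, n_classes), 1e-6)
        for i, w, l in triples:
            conf[w, :, l] += post[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: recompute posteriors given priors and confusions.
        log_post = np.tile(np.log(prior), (n_items, 1))
        for i, w, l in triples:
            log_post[i] += np.log(conf[w, :, l])
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
    return post

# Toy data: 3 items, 3 workers, binary senses; worker 2 disagrees on item 0.
votes = np.array([[0, 0, 1], [0, 1, 1], [0, 2, 0],
                  [1, 0, 0], [1, 1, 0], [1, 2, 0],
                  [2, 0, 1], [2, 1, 1], [2, 2, 1]])
print(dawid_skene(votes, n_classes=2).argmax(axis=1))
```

Unlike raw majority voting, the posteriors here are weighted by each worker's estimated confusion matrix, which is the property the paper exploits to get better labels from cheap, noisy annotations.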
Identifying Metaphorical Word Use with Tree Kernels
"... A metaphor is a figure of speech that refers to one concept in terms of another, as in “He is such a sweet person”. Metaphors are ubiquitous and they present NLP with a range of challenges for WSD, IE, etc. Identifying metaphors is thus an important step in language understanding. However, since alm ..."
Abstract
Cited by 10 (3 self)
A metaphor is a figure of speech that refers to one concept in terms of another, as in “He is such a sweet person”. Metaphors are ubiquitous, and they present NLP with a range of challenges for WSD, IE, etc. Identifying metaphors is thus an important step in language understanding. However, since almost any word can serve as a metaphor, they are impossible to list. To identify metaphorical use, we assume that it results in unusual semantic patterns between the metaphor and its dependencies. To identify these cases, we use SVMs with tree kernels on a balanced corpus of 3,872 instances, created by bootstrapping from available metaphor lists. We outperform two baselines, a sequential and a vector-based approach, and achieve an F1-score of 0.75.
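Tree-kernel SVMs are typically trained with a precomputed Gram matrix. The sketch below is only an illustration of that plumbing, using a crude production-counting kernel over toy nested-tuple trees with scikit-learn's SVC; it is not the paper's kernel, tree representation, or data:

```python
# Rough sketch: SVM classification of trees via a precomputed tree kernel.
# The kernel counts node -> children "productions" shared by two trees, a
# much simpler relative of the tree kernels used in the paper.
from collections import Counter
import numpy as np
from sklearn.svm import SVC

def productions(tree):
    """Collect node -> children productions of a nested-tuple tree."""
    label, children = tree
    out = Counter()
    if children:
        out[(label, tuple(c[0] for c in children))] += 1
        for c in children:
            out += productions(c)
    return out

def tree_kernel(t1, t2):
    """Dot product of production-count vectors (a valid, if crude, kernel)."""
    p1, p2 = productions(t1), productions(t2)
    return sum(p1[k] * p2[k] for k in p1.keys() & p2.keys())

# Toy dependency-like trees as (head, (children...)) tuples.
trees = [
    ("root", (("person", (("sweet", ()),)),)),
    ("root", (("person", (("warm", ()),)),)),
    ("root", (("cake", (("sweet", ()),)),)),
    ("root", (("cake", (("warm", ()),)),)),
]
y = np.array([1, 1, 0, 0])  # toy labels: 1 = metaphorical, 0 = literal

gram = np.array([[tree_kernel(a, b) for b in trees] for a in trees])
clf = SVC(kernel="precomputed").fit(gram, y)
print(clf.predict(gram))  # [1 1 0 0] on the training trees
```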
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), European Language Resources Association (ELRA) (2014), pp. 859–866
"... Abstract Crowdsourcing is an emerging collaborative approach that can be used for the acquisition of annotated corpora and a wide range of other linguistic resources. Although the use of this approach is intensifying in all its key genres (paid-for crowdsourcing, games with a purpose, volunteering- ..."
Abstract
Cited by 4 (1 self)
Crowdsourcing is an emerging collaborative approach that can be used for the acquisition of annotated corpora and a wide range of other linguistic resources. Although the use of this approach is intensifying in all its key genres (paid-for crowdsourcing, games with a purpose, volunteering-based approaches), the community still lacks a set of best-practice guidelines similar to the annotation best practices for traditional, expert-based corpus acquisition. In this paper we focus on the use of crowdsourcing methods for corpus acquisition and propose a set of best-practice guidelines based on our own experiences in this area and an overview of the related literature. We also introduce GATE Crowd, a plugin for the GATE platform that relies on these guidelines and offers tool support for using crowdsourcing in a more principled and efficient manner.
Annotation of regular polysemy and underspecification
"... We present the result of an annotation task on regular polysemy for a series of semantic classes or dot types in English, Danish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods: majorit ..."
Abstract
Cited by 3 (0 self)
We present the results of an annotation task on regular polysemy for a series of semantic classes, or dot types, in English, Danish, and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods: majority voting with a theory-compliant backoff strategy, and MACE, an unsupervised system for choosing the most likely sense from all the annotations.
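Of the two aggregation methods, majority voting with a backoff is simple to sketch. The version below is hypothetical: the abstract does not specify the theory-compliant backoff, so a tie-break to an assumed “underspecified” sense stands in for it:

```python
# Sketch of majority voting with a backoff for tied sense annotations.
# The backoff choice is a stand-in, not the paper's actual strategy.
from collections import Counter

UNDERSPECIFIED = "underspecified"  # hypothetical tie-breaking sense

def aggregate(annotations, backoff=UNDERSPECIFIED):
    """annotations: sense labels for one item from all annotators."""
    counts = Counter(annotations)
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:  # two or more senses tied for the lead
        return backoff
    return top

print(aggregate(["literal", "literal", "metonymic"]))  # -> literal
print(aggregate(["literal", "metonymic"]))             # -> underspecified
```

MACE, the second method, instead learns annotator reliabilities and picks the sense with the highest posterior, much like the annotation model sketched earlier in this listing.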
Augmenting English Adjective Senses with Supersenses
"... We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification. Results show that accuracy for automatic adjective type classification is high, but synsets are considerably more difficult ..."
Abstract
Cited by 3 (1 self)
We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification. Results show that accuracy for automatic adjective type classification is high, but synsets are considerably more difficult to classify, even for trained human annotators. We release the manually annotated data, the classifier, and the induced supersense labeling of 12,304 WordNet adjective synsets.
Practical Cost-Conscious Active Learning for Data Annotation in Annotator-Initiated Environments
, 2013
"... This Dissertation is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in All Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact scholarsarchive@byu.edu. ..."
Cited by 1 (1 self)
Adapting taggers to Twitter with not-so-distant supervision
"... We experiment with using different sources of distant supervision to guide unsupervised and semi-supervised adaptation of part-of-speech (POS) and named entity taggers (NER) to Twitter. We show that a particularly good source of not-so-distant supervision is linked websites. Specif-ically, with this ..."
Abstract
Cited by 1 (0 self)
We experiment with using different sources of distant supervision to guide unsupervised and semi-supervised adaptation of part-of-speech (POS) and named entity (NER) taggers to Twitter. We show that a particularly good source of not-so-distant supervision is linked websites. Specifically, with this source of supervision we are able to improve over the state of the art for Twitter POS tagging (89.76% accuracy, 8% error reduction) and NER (F1 = 79.4%, 10% error reduction).
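The abstract describes the recipe only at a high level; a common way to exploit such not-so-distant supervision is self-training on confident predictions. The sketch below is a toy version of that loop with a stand-in per-token classifier, illustrative data, and an arbitrary confidence threshold, not the paper's actual tagger:

```python
# Toy self-training loop: train a tagger on seed annotations, tag
# target-domain text (here, stand-ins for linked-website sentences),
# keep high-confidence predictions, and retrain on the union.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def feats(tok):
    return {"lower": tok.lower(), "isupper": tok[0].isupper(),
            "suffix2": tok[-2:]}

# Seed annotated tokens (toy newswire-style data).
seed_toks = ["the", "dog", "runs", "a", "cat", "sleeps"]
seed_tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "VERB"]

# Unlabeled tokens from the target domain (toy tweet-like data).
target_toks = ["the", "doggo", "runs", "lol"]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit([feats(t) for t in seed_toks], seed_tags)

# Keep only target-domain predictions above a confidence threshold.
probs = model.predict_proba([feats(t) for t in target_toks])
keep = probs.max(axis=1) >= 0.5  # arbitrary illustrative threshold
new_toks = [t for t, k in zip(target_toks, keep) if k]
if new_toks:  # retrain on seed data plus confidently self-labeled tokens
    new_tags = list(model.predict([feats(t) for t in new_toks]))
    model.fit([feats(t) for t in seed_toks + new_toks],
              seed_tags + new_tags)
print(model.predict([feats("doggo")]))
```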
Making the Most of . . . : Confused Supervised LDA
, 2015
"... Corpus labeling projects frequently use low-cost workers from microtask market-places; however, these workers are often inexperienced or have misaligned incentives. Crowdsourcing models must be robust to the resulting systematic and non-systematic inaccuracies. We introduce a novel crowdsourcing mod ..."
Abstract
Corpus labeling projects frequently use low-cost workers from microtask marketplaces; however, these workers are often inexperienced or have misaligned incentives. Crowdsourcing models must be robust to the resulting systematic and non-systematic inaccuracies. We introduce a novel crowdsourcing model that adapts the discrete supervised topic model sLDA to handle multiple corrupt, usually conflicting (hence “confused”) supervision signals. Our model achieves significant gains over previous work in the accuracy of deduced ground truth.
Crowdsourcing
"... Most annotated corpora of wide use in computational linguistics were created using traditional annotation methods, but such methods may not be appropriate for smaller scale annotation and tend to be too expensive for very large scale annotation. This chapter covers crowdsourcing, the use of web col ..."
Abstract
Most annotated corpora of wide use in computational linguistics were created using traditional annotation methods, but such methods may not be appropriate for smaller-scale annotation and tend to be too expensive for very large-scale annotation. This chapter covers crowdsourcing, the use of web collaboration for annotation. Both microtask crowdsourcing and games-with-a-purpose are discussed, as well as their use in computational linguistics.
Analysis and Modeling of “Focus” in Context
"... This paper uses a crowd-sourced definition of a speech phe-nomenon we have called “focus”. Given sentences, text and speech, in isolation and in context, we asked annotators to iden-tify what we term the “focus ” word. We present their consis-tency in identifying the focused word, when presented wit ..."
Abstract
This paper uses a crowd-sourced definition of a speech phenomenon we have called “focus”. Given sentences, text and speech, in isolation and in context, we asked annotators to identify what we term the “focus” word. We present their consistency in identifying the focused word when presented with text or speech stimuli. We then build models to show how well we can predict that focus word from lexical (and higher-level) features. Also, using spectral and prosodic information, we show the differences in these focus words when spoken with and without context. Finally, we show how we can improve speech synthesis of these utterances given focus information.