Results 1 - 10 of 252
Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise
"... Modern machine learning-based approaches to computer vision require very large databases of hand labeled images. Some contemporary vision systems already require on the order of millions of images for training (e.g., Omron face detector [9]). New Internet-based services allow for a large number of l ..."
Abstract
-
Cited by 169 (4 self)
- Add to MetaCart
(Show Context)
Modern machine learning-based approaches to computer vision require very large databases of hand labeled images. Some contemporary vision systems already require on the order of millions of images for training (e.g., Omron face detector [9]). New Internet-based services allow for a large number of labelers to collaborate around the world at very low cost. However, using these services brings interesting theoretical and practical challenges: (1) The labelers may have wide-ranging levels of expertise that are unknown a priori, and in some cases may be adversarial; (2) images may vary in their level of difficulty; and (3) multiple labels for the same image must be combined to provide an estimate of the actual label of the image. Probabilistic approaches provide a principled way to approach these problems. In this paper we present a probabilistic model and use it to simultaneously infer the label of each image, the expertise of each labeler, and the difficulty of each image. On both simulated and real data, we demonstrate that the model outperforms the commonly used “Majority Vote” heuristic for inferring image labels, and is robust to both noisy and adversarial labelers.
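To make the contrast with majority vote concrete, here is a minimal sketch of EM-style label aggregation in the spirit of the model the abstract describes, simplified to a single accuracy parameter per labeler and no per-image difficulty term; all names are illustrative assumptions, not the authors' code.

```python
# A minimal sketch (not the authors' code) of EM-style label aggregation:
# each labeler gets a single accuracy estimate and each image a posterior
# over its binary label. The paper's model also infers per-image difficulty.
import numpy as np

def em_aggregate(labels, n_iter=50):
    """labels: dict mapping (image, labeler) -> observed 0/1 label."""
    images = sorted({i for i, _ in labels})
    workers = sorted({j for _, j in labels})
    acc = {j: 0.7 for j in workers}  # start with mild belief in competence
    post = {}
    for _ in range(n_iter):
        # E-step: posterior that each image's true label is 1, weighting
        # each vote by the log-odds of its labeler's estimated accuracy.
        for i in images:
            log_odds = 0.0
            for (ii, j), l in labels.items():
                if ii == i:
                    p = min(max(acc[j], 1e-6), 1 - 1e-6)
                    log_odds += np.log(p / (1 - p)) * (1 if l == 1 else -1)
            post[i] = 1.0 / (1.0 + np.exp(-log_odds))
        # M-step: each labeler's accuracy is their expected agreement rate
        # with the current posterior over true labels.
        for j in workers:
            obs = [(post[i], l) for (i, jj), l in labels.items() if jj == j]
            acc[j] = sum(q if l == 1 else 1 - q for q, l in obs) / len(obs)
    return post, acc
```

Note that a labeler whose learned accuracy falls below 0.5 gets a negative log-odds weight, so their votes effectively count in reverse; this is the sense in which models of this family stay robust to adversarial labelers.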
The multidimensional wisdom of crowds
- In Proc. of NIPS, 2010
"... Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. We present a method for estimating the underlying value (e.g. the class) of each image from (noisy) annotations provided by multiple annotators. Our method is base ..."
Abstract
-
Cited by 147 (6 self)
- Add to MetaCart
(Show Context)
Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. We present a method for estimating the underlying value (e.g. the class) of each image from (noisy) annotations provided by multiple annotators. Our method is based on a model of the image formation and annotation process. Each image has different characteristics that are represented in an abstract Euclidean space. Each annotator is modeled as a multidimensional entity with variables representing competence, expertise and bias. This allows the model to discover and represent groups of annotators that have different sets of skills and knowledge, as well as groups of images that differ qualitatively. We find that our model predicts ground truth labels on both synthetic and real data more accurately than state-of-the-art methods. Experiments also show that our model, starting from a set of binary labels, may discover rich information, such as different “schools of thought” amongst the annotators, and can group together images belonging to separate categories.
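A toy generative sketch of the modeling idea, with a one-dimensional signal standing in for the paper's abstract Euclidean space (an assumed simplification): each annotator is a small bundle of parameters, competence and bias, rather than a single accuracy number.

```python
# Toy generative sketch (illustrative assumptions, not the paper's model):
# each image i has a scalar signal x_i; annotator j sees a noisy version,
# with noise shrunk by competence, and compares it to a personal threshold.
import numpy as np

rng = np.random.default_rng(0)
n_images, n_annotators = 1000, 8
x = rng.normal(size=n_images)                      # latent image signal
competence = rng.uniform(0.5, 3.0, n_annotators)   # higher = less noisy
bias = rng.normal(0, 0.3, n_annotators)            # personal threshold

# Observed binary labels: annotator j says 1 when the noisy signal
# exceeds that annotator's own threshold.
noise = rng.normal(size=(n_annotators, n_images)) / competence[:, None]
labels = ((x[None, :] + noise) > bias[:, None]).astype(int)
```

Inference in the paper runs this direction in reverse, recovering the latent signals and per-annotator parameters from the binary labels alone; clusters of annotators with similar (competence, bias) parameters are what the abstract calls “schools of thought”.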
Conducting behavioral research on Amazon’s Mechanical Turk
- Behav Res Methods, 2012, 44(1):1–23
"... Amazon’s Mechanical Turk is an online labor market where requesters post jobs and workers choose which jobs to do for pay. The central purpose of this paper is to demonstrate how to use this website for conducting behavioral research and lower the barrier to entry for re-searchers who could benefit ..."
Abstract
-
Cited by 136 (6 self)
- Add to MetaCart
Amazon’s Mechanical Turk is an online labor market where requesters post jobs and workers choose which jobs to do for pay. The central purpose of this paper is to demonstrate how to use this website for conducting behavioral research and lower the barrier to entry for researchers who could benefit from this platform. We describe general techniques that apply to a variety of types of research and experiments across disciplines. We begin by discussing some of the advantages of doing experiments on Mechanical Turk, such as easy access to a large, stable, and diverse subject pool, the low cost of running experiments, and the faster iteration between developing theory and executing experiments. We then discuss how the behavior of workers compares to that of experts and of laboratory subjects. Then, we illustrate the mechanics of putting a task on Mechanical Turk, including recruiting subjects, executing the task, and reviewing the work that was submitted. We also provide solutions to common problems that a researcher might face when executing their research on this platform, including techniques for conducting synchronous experiments, methods to ensure high-quality work, how to keep data private, and how to maintain code security.
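For the mechanics of posting a task, a present-day sketch using boto3's MTurk client (which postdates this paper); the sandbox endpoint, price, durations, and question form below are illustrative assumptions, so check the current AWS documentation before relying on any of them.

```python
# A minimal sketch of posting a HIT programmatically with boto3's MTurk
# client; all parameter values here are illustrative, and the sandbox
# endpoint means no real workers or money are involved.
import boto3

mturk = boto3.client(
    "mturk",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

question_xml = """<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[<html><head>
    <script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
  </head><body>
    <crowd-form>
      <p>Is the message below positive or negative?</p>
      <label><input type="radio" name="sentiment" value="positive"> Positive</label>
      <label><input type="radio" name="sentiment" value="negative"> Negative</label>
    </crowd-form>
  </body></html>]]></HTMLContent>
  <FrameHeight>400</FrameHeight>
</HTMLQuestion>"""

hit = mturk.create_hit(
    Title="Label the sentiment of a short message",
    Description="Read one sentence and pick positive or negative.",
    Keywords="labeling, sentiment, quick",
    Reward="0.05",                    # USD, passed as a string
    MaxAssignments=3,                 # collect three independent judgments
    LifetimeInSeconds=3600,           # HIT stays visible for one hour
    AssignmentDurationInSeconds=300,  # five minutes allotted per worker
    Question=question_xml,
)
print(hit["HIT"]["HITId"])
```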
Robust Sentiment Detection on Twitter from Biased and Noisy Data
"... In this paper, we propose an approach to automatically detect sentiments on Twitter messages (tweets) that explores some characteristics of how tweets are written and meta-information of the words that compose these messages. Moreover, we leverage sources of noisy labels as our training data. These ..."
Abstract
-
Cited by 125 (1 self)
- Add to MetaCart
(Show Context)
In this paper, we propose an approach to automatically detect sentiment in Twitter messages (tweets) that exploits characteristics of how tweets are written and meta-information about the words that compose them. Moreover, we leverage sources of noisy labels as our training data. These noisy labels were provided by a few sentiment detection websites over Twitter data. In our experiments, we show that since our features capture a more abstract representation of tweets, our solution is more effective than previous ones and also more robust to biased and noisy data, which is the kind of data provided by these sources.
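A minimal sketch of the general recipe, not the paper's exact feature set: train on noisy source labels using abstract features (lexicon hits, punctuation, emoticons) rather than raw words, so noise in any single training label or token matters less. The lexicon and example tweets are placeholders.

```python
# Sketch of distant supervision with abstract features; the tiny lexicon
# and the example tweets below are placeholders, not the paper's data.
import numpy as np
from sklearn.linear_model import LogisticRegression

POS_WORDS = {"happy", "great", "love"}   # placeholder lexicon
NEG_WORDS = {"worst", "hate", "awful"}

def abstract_features(tweet):
    toks = tweet.lower().split()
    return [
        sum(t in POS_WORDS for t in toks),  # lexicon hits instead of raw
        sum(t in NEG_WORDS for t in toks),  # words: more robust to noise
        tweet.count("!"),                   # writing-style signals
        tweet.count("#"),
        int(":)" in tweet) - int(":(" in tweet),
    ]

# Noisy training labels, e.g. scraped from sentiment-detection websites.
tweets = ["so happy with my new phone :)", "worst airline ever!!",
          "love this song #nowplaying", "I hate waiting, awful service"]
noisy_labels = [1, 0, 1, 0]

X = np.array([abstract_features(t) for t in tweets])
clf = LogisticRegression().fit(X, noisy_labels)
print(clf.predict(np.array([abstract_features("great day, love it :)")])))
```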
The Online Laboratory: Conducting Experiments in a Real Labor Market
- SSRN eLibrary, 2010
"... Online labor markets have great potential as platforms for conducting experiments. They provide immediate access to a large and diverse subject pool, and allow researchers to control the experimental context. Online experiments, we show, can be just as valid—both internally and externally—as laborat ..."
Abstract
-
Cited by 109 (6 self)
- Add to MetaCart
Online labor markets have great potential as platforms for conducting experiments. They provide immediate access to a large and diverse subject pool, and allow researchers to control the experimental context. Online experiments, we show, can be just as valid—both internally and externally—as laboratory and field experiments, while often requiring far less money and time to design and conduct. To demonstrate their value, we use an online labor market to replicate three classic experiments. The first finds quantitative agreement between levels of cooperation in a prisoner’s dilemma played online and in the physical laboratory. The second shows – consistent with behavior in the traditional laboratory – that online subjects respond to priming by altering their choices. The third demonstrates that when an identical decision is framed differently, individuals reverse their choice, thus replicating a famed Tversky-Kahneman result. Then we conduct a field experiment showing that workers have upward-sloping labor supply curves. Finally, we analyze the challenges to online experiments, proposing methods to cope with the unique threats to validity in an online setting, and examining the conceptual issues surrounding the external validity of online results. We conclude by presenting our views on the potential role that online experiments can play within the social sciences, and then recommend software development priorities and best practices.
Supervised Learning from Multiple Experts: Whom to trust when everyone lies a bit
"... We describe a probabilistic approach for supervised learning when we have multiple experts/annotators providing (possibly noisy) labels but no absolute gold standard. The proposed algorithm evaluates the different experts and also gives an estimate of the actual hidden labels. Experimental results i ..."
Abstract
-
Cited by 87 (6 self)
- Add to MetaCart
(Show Context)
We describe a probabilistic approach for supervised learning when we have multiple experts/annotators providing (possibly noisy) labels but no absolute gold standard. The proposed algorithm evaluates the different experts and also gives an estimate of the actual hidden labels. Experimental results indicate that the proposed method is superior to the commonly used majority voting baseline.
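A compact sketch of the "two-coin" flavor of this idea, under assumed simplifications (binary labels, every expert labels every item, and no classifier term, whereas the full approach learns a classifier jointly): each expert gets a sensitivity and a specificity, re-estimated by EM together with the hidden labels.

```python
# Two-coin EM sketch (assumed simplification of the approach): expert j
# has sensitivity alpha_j = P(label 1 | truth 1) and specificity
# beta_j = P(label 0 | truth 0). L is an (experts x items) 0/1 matrix.
import numpy as np

def two_coin_em(L, n_iter=50):
    mu = L.mean(axis=0)                  # init posteriors: majority vote
    for _ in range(n_iter):
        # M-step: sensitivity/specificity weighted by current posteriors.
        alpha = (L * mu).sum(axis=1) / mu.sum()
        beta = ((1 - L) * (1 - mu)).sum(axis=1) / (1 - mu).sum()
        # E-step: posterior that each item's hidden label is 1,
        # combining every expert's likelihood (uniform prior on truth).
        a = np.prod(np.where(L == 1, alpha[:, None], 1 - alpha[:, None]), axis=0)
        b = np.prod(np.where(L == 0, beta[:, None], 1 - beta[:, None]), axis=0)
        mu = a / (a + b)
    return mu, alpha, beta
```

Experts whose estimated sensitivity and specificity are both poor contribute little to the posterior, which is the sense in which the method decides whom to trust when everyone lies a bit.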
The labor economics of paid crowdsourcing
- In EC, 2010
"... Crowdsourcing is a form of “peer production ” in which work traditionally performed by an employee is outsourced to an “undefined, generally large group of people in the form of an open call. ” We present a model of workers supplying labor to paid crowdsourcing projects. We also introduce a novel me ..."
Abstract
-
Cited by 76 (5 self)
- Add to MetaCart
(Show Context)
Crowdsourcing is a form of “peer production” in which work traditionally performed by an employee is outsourced to an “undefined, generally large group of people in the form of an open call.” We present a model of workers supplying labor to paid crowdsourcing projects. We also introduce a novel method for estimating a worker’s reservation wage—the smallest wage a worker is willing to accept for a task and the key parameter in our labor supply model. Applying this method, we find that the reservation wages of a sample of workers from Amazon’s Mechanical Turk (AMT) are approximately log-normally distributed, with a median wage of $1.38/hour. At the median wage, the point elasticity of extensive labor supply is 0.43. We discuss how to use our calibrated model to make predictions in applied work. Two experimental tests of the model show that many workers respond rationally to offered incentives. However, a non-trivial fraction of subjects appear to set earnings targets. These “target earners” consider not just the offered wage—which is what the rational model predicts—but also their proximity to earnings goals. Interestingly, a number of workers clearly prefer earning total amounts evenly divisible by 5, presumably because these amounts make good targets.
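An illustrative check of how the quoted numbers fit together under a log-normal fit; the dispersion parameter sigma below is an assumed value chosen to reproduce the reported elasticity, not a figure from the paper. Extensive-margin supply at wage w is F(w), the share of workers whose reservation wage is at most w, so its point elasticity is w·f(w)/F(w).

```python
# Point elasticity of extensive labor supply under a log-normal
# reservation-wage distribution. The median is from the paper;
# sigma is an assumed value, not a reported estimate.
import math

median, sigma = 1.38, 1.86
mu = math.log(median)

def lognorm_pdf(w):
    return math.exp(-(math.log(w) - mu) ** 2 / (2 * sigma ** 2)) / (
        w * sigma * math.sqrt(2 * math.pi))

w = median                          # at the median wage, F(w) = 0.5
elasticity = w * lognorm_pdf(w) / 0.5
print(f"{elasticity:.2f}")          # ~0.43 with these parameters
```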
CrowdSearch: Exploiting Crowds for Accurate Real-time Image Search on Mobile Phones
"... Mobile phones are becoming increasingly sophisticated with a rich set of on-board sensors and ubiquitous wireless connectivity. However, the ability to fully exploit the sensing capabilities on mobile phones is stymied by limitations in multimedia processing techniques. For example, search using cel ..."
Abstract
-
Cited by 75 (1 self)
- Add to MetaCart
(Show Context)
Mobile phones are becoming increasingly sophisticated with a rich set of on-board sensors and ubiquitous wireless connectivity. However, the ability to fully exploit the sensing capabilities on mobile phones is stymied by limitations in multimedia processing techniques. For example, search using cellphone images often encounters high error rates due to low image quality. In this paper, we present CrowdSearch, an accurate image search system for mobile phones. CrowdSearch combines automated image search with real-time human validation of search results. Automated image search is performed using a combination of local processing on mobile phones and backend processing on remote servers. Human validation is performed using Amazon Mechanical Turk, where tens of thousands of people are actively working on simple tasks for monetary rewards. Image search with human validation presents a complex set of tradeoffs involving energy, delay, accuracy, and monetary cost. CrowdSearch addresses these challenges using a novel predictive algorithm that determines which results need to be validated, and when and how to validate them. CrowdSearch is implemented on Apple iPhones and Linux servers. We show that CrowdSearch achieves over 95% precision across multiple image categories, provides responses within minutes, and costs only a few cents.
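A toy decision rule in the spirit of the tradeoff described, not the paper's actual predictive algorithm; every threshold below is an assumption for illustration.

```python
# Toy validation policy (illustrative thresholds, not the paper's
# algorithm): send a candidate result for human validation only when
# the automated ranker is unsure, the expected human turnaround still
# fits the deadline, and the per-task price fits the remaining budget.
def should_validate(confidence, expected_delay_s, deadline_s,
                    price_per_task=0.02, budget=0.10):
    if confidence >= 0.9:              # ranker already confident: accept
        return False
    if expected_delay_s > deadline_s:  # human answer would arrive too late
        return False
    return price_per_task <= budget    # validate only if affordable
```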
Combining human and machine intelligence in large-scale crowdsourcing
- In AAMAS, 2012
"... We show how machine learning and inference can be harnessed to leverage the complementary strengths of humans and computational agents to solve crowdsourcing tasks. We construct a set of Bayesian predictive models from data and describe how the models operate within an overall crowdsourcing architec ..."
Abstract
-
Cited by 71 (15 self)
- Add to MetaCart
(Show Context)
We show how machine learning and inference can be harnessed to leverage the complementary strengths of humans and computational agents to solve crowdsourcing tasks. We construct a set of Bayesian predictive models from data and describe how the models operate within an overall crowdsourcing architecture that combines the efforts of people and machine vision on the task of classifying celestial bodies defined within a citizens’ science project named Galaxy Zoo. We show how learned probabilistic models can be used to fuse human and machine contributions and to predict the behaviors of workers. We employ multiple inferences in concert to guide decisions on hiring and routing workers to tasks so as to maximize the efficiency of large-scale crowdsourcing processes based on expected utility.
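A compact sketch of an expected-utility hiring rule of the kind described, in an assumed form (binary galaxy class, one vote at a time, 0/1 loss): hire another worker only when the expected reduction in misclassification risk is worth more than the payment.

```python
# Value-of-information hiring sketch (assumed form, not the paper's
# implementation): compare the Bayes risk of deciding now against the
# expected risk after buying one more worker's vote.
def hire_next_worker(p_class, p_vote_given_class, cost, utility_correct=1.0):
    """p_class: current posterior that the galaxy is, say, 'spiral'.
    p_vote_given_class: (P(vote=spiral|spiral), P(vote=spiral|not))."""
    risk_now = min(p_class, 1 - p_class)            # risk of deciding now
    p_s, p_n = p_vote_given_class
    p_vote = p_class * p_s + (1 - p_class) * p_n    # marginal P(vote=spiral)
    post_yes = p_class * p_s / p_vote               # posterior after 'spiral'
    post_no = p_class * (1 - p_s) / (1 - p_vote)    # posterior after 'not'
    exp_risk = (p_vote * min(post_yes, 1 - post_yes)
                + (1 - p_vote) * min(post_no, 1 - post_no))
    gain = (risk_now - exp_risk) * utility_correct  # expected utility gain
    return gain > cost
```

The same posterior machinery also supports routing: workers whose predicted vote likelihoods are most informative for a given task yield the largest expected gain per unit cost.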