Results 1 - 10 of 47
Dynamically switching between synergistic workflows for crowdsourcing. AAAI, 2012
"... To ensure quality results from unreliable crowdsourced work-ers, task designers often construct complex workflows and aggregate worker responses from redundant runs. Frequently, they experiment with several alternative workflows to accom-plish the task, and eventually deploy the one that achieves th ..."
Abstract
-
Cited by 28 (7 self)
- Add to MetaCart
To ensure quality results from unreliable crowdsourced workers, task designers often construct complex workflows and aggregate worker responses from redundant runs. Frequently, they experiment with several alternative workflows to accomplish the task, and eventually deploy the one that achieves the best performance during early trials. Surprisingly, this seemingly natural design paradigm does not achieve the full potential of crowdsourcing. In particular, using a single workflow (even the best) to accomplish a task is suboptimal. We show that alternative workflows can compose synergistically to yield much higher quality output. We formalize the insight with a novel probabilistic graphical model. Based on this model, we design and implement AGENTHUNT, a POMDP-based controller that dynamically switches between these workflows to achieve higher returns on investment. Additionally, we design offline and online methods for learning model parameters. Live experiments on Amazon Mechanical Turk demonstrate the superiority of AGENTHUNT for the task of generating NLP training data, yielding up to 50% error reduction and greater net utility compared to previous methods.
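To make the idea of switching between workflows concrete, here is a minimal Python sketch of a myopic controller for a binary task: it keeps a Bayesian belief over the answer, assumes each workflow has a fixed known accuracy and cost, and greedily picks the workflow with the best expected uncertainty reduction per dollar. This is only a toy stand-in for the paper's POMDP-based AGENTHUNT controller; the workflow names, accuracies, costs, and the `ask_crowd` callback are all assumptions.

```python
import math

# Hypothetical per-workflow parameters: (worker accuracy, cost per task). Both are assumptions.
WORKFLOWS = {"workflow_a": (0.75, 0.05), "workflow_b": (0.85, 0.10)}

def entropy(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def bayes_update(belief, accuracy, said_yes):
    """Posterior P(answer is 'yes') after one response from a worker with the given accuracy."""
    like_yes = accuracy if said_yes else 1 - accuracy
    like_no = 1 - accuracy if said_yes else accuracy
    return belief * like_yes / (belief * like_yes + (1 - belief) * like_no)

def expected_entropy(belief, accuracy):
    """Expected entropy of the belief after one more response from a workflow of this accuracy."""
    p_yes = belief * accuracy + (1 - belief) * (1 - accuracy)
    return (p_yes * entropy(bayes_update(belief, accuracy, True))
            + (1 - p_yes) * entropy(bayes_update(belief, accuracy, False)))

def control(ask_crowd, belief=0.5, threshold=0.95, budget=1.0):
    """Myopic controller: repeatedly pick the workflow with the best entropy drop per dollar."""
    spent = 0.0
    while max(belief, 1 - belief) < threshold and spent < budget:
        name, (acc, cost) = max(
            WORKFLOWS.items(),
            key=lambda kv: (entropy(belief) - expected_entropy(belief, kv[1][0])) / kv[1][1])
        said_yes = ask_crowd(name)  # assumed callback: posts one task via this workflow, returns True/False
        belief = bayes_update(belief, acc, said_yes)
        spent += cost
    return belief, spent
```

A full treatment would plan over whole sequences of questions (the POMDP view) rather than greedily one step at a time.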
Leveraging transitive relations for crowdsourced joins. In SIGMOD Conference, 2013
"... ABSTRACT The development of crowdsourced query processing systems has recently attracted a significant attention in the database community. A variety of crowdsourced queries have been investigated. In this paper, we focus on the crowdsourced join query which aims to utilize humans to find all pairs ..."
Abstract
-
Cited by 25 (3 self)
- Add to MetaCart
(Show Context)
The development of crowdsourced query processing systems has recently attracted significant attention in the database community. A variety of crowdsourced queries have been investigated. In this paper, we focus on the crowdsourced join query, which aims to utilize humans to find all pairs of matching objects from two collections. As a human-only solution is expensive, we adopt a hybrid human-machine approach which first uses machines to generate a candidate set of matching pairs, and then asks humans to label the pairs in the candidate set as either matching or non-matching. Given the candidate pairs, existing approaches will publish all pairs for verification to a crowdsourcing platform. However, they neglect the fact that the pairs satisfy transitive relations. As an example, if o1 matches with o2, and o2 matches with o3, then we can deduce that o1 matches with o3 without needing to crowdsource (o1, o3). To this end, we study how to leverage transitive relations for crowdsourced joins. We propose a hybrid transitive-relations and crowdsourcing labeling framework which aims to crowdsource the minimum number of pairs needed to label all the candidate pairs. We prove the optimal labeling order and devise a parallel labeling algorithm to efficiently crowdsource the pairs following that order. We evaluate our approaches in both a simulated environment and on a real crowdsourcing platform. Experimental results show that our approaches with transitive relations can save significantly more money and time than existing methods, with only a small loss in result quality.
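The transitivity argument in the abstract is easy to operationalize. The sketch below keeps a plain union-find over matched objects plus a record of non-matching cluster pairs, and asks the crowd only for pairs whose label cannot be deduced. The labeling order is simply the input order and `crowd_label` is an assumed callback, so this does not reproduce the paper's optimal ordering or parallel algorithm.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def label_with_transitivity(candidate_pairs, crowd_label):
    """Label candidate pairs, asking the crowd only when transitivity cannot decide.

    candidate_pairs: iterable of (o1, o2) pairs from the machine-generated candidate set.
    crowd_label(o1, o2): assumed callback returning True (match) or False (non-match).
    """
    uf = UnionFind()
    non_matches = set()  # frozensets of cluster representatives known not to match
    labels = {}
    for o1, o2 in candidate_pairs:
        r1, r2 = uf.find(o1), uf.find(o2)
        if r1 == r2:
            labels[(o1, o2)] = True       # deduced: already in the same match cluster
        elif frozenset((r1, r2)) in non_matches:
            labels[(o1, o2)] = False      # deduced: their clusters are known non-matches
        else:
            answer = crowd_label(o1, o2)  # only this pair is actually crowdsourced
            labels[(o1, o2)] = answer
            if answer:
                uf.union(o1, o2)
                # merging clusters invalidates old representatives in recorded non-matches,
                # so re-canonicalize them against the current representatives
                non_matches = {frozenset(uf.find(x) for x in pair) for pair in non_matches}
            else:
                non_matches.add(frozenset((r1, r2)))
    return labels
```

With the abstract's example, crowdsourcing (o1, o2) and (o2, o3) as matches lets (o1, o3) be answered for free.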
Deco: Declarative crowdsourcing. In CIKM, 2012
"... Crowdsourcing enables programmers to incorporate “human com-putation ” as a building block in algorithms that cannot be fully automated, such as text analysis and image recognition. Simi-larly, humans can be used as a building block in data-intensive applications—providing, comparing, and verifying ..."
Abstract
-
Cited by 25 (6 self)
- Add to MetaCart
Crowdsourcing enables programmers to incorporate “human computation” as a building block in algorithms that cannot be fully automated, such as text analysis and image recognition. Similarly, humans can be used as a building block in data-intensive applications: providing, comparing, and verifying data used by applications. Building upon the decades-long success of declarative approaches to conventional data management, we use a similar approach for data-intensive applications that incorporate humans. Specifically, declarative queries are posed over stored relational data as well as data computed on-demand from the crowd, and the underlying system orchestrates the computation of query answers. We present Deco, a database system for declarative crowdsourcing. We describe Deco’s data model, query language, and our prototype. Deco’s data model was designed to be general (it can be instantiated to other proposed models), flexible (it allows methods for data cleansing and external access to be plugged in), and principled (it has a precisely-defined semantics). Syntactically, Deco’s query language is a simple extension to SQL. Based on Deco’s data model, we define a precise semantics for arbitrary queries involving both stored data and data obtained from the crowd. We then describe the Deco query processor, which uses a novel push-pull hybrid execution model to respect the Deco semantics while coping with the unique combination of latency, monetary cost, and uncertainty introduced in the crowdsourcing environment. Finally, we describe our current prototype, and we experimentally explore the query processing alternatives provided by Deco.
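As a rough illustration of the data-model ideas mentioned above (on-demand crowd data plus pluggable cleansing), here is a small Python sketch; it is not Deco's actual API or query syntax. Missing attribute values are fetched from the crowd with some redundancy and collapsed by a pluggable resolution function, here majority vote. The `fetch_from_crowd` callback and the redundancy level are assumptions.

```python
from collections import Counter

def majority_resolution(raw_answers):
    """A simple resolution rule: collapse redundant crowd answers into one value by majority vote."""
    return Counter(raw_answers).most_common(1)[0][0] if raw_answers else None

def answer_query(stored_rows, attribute, fetch_from_crowd,
                 resolve=majority_resolution, redundancy=3):
    """Fill a missing attribute on demand from the crowd, then resolve the noisy answers.

    stored_rows: list of dicts (the stored relational data).
    fetch_from_crowd(row, attribute): assumed callback posting one task and returning one raw answer.
    """
    for row in stored_rows:
        if row.get(attribute) is None:
            raw = [fetch_from_crowd(row, attribute) for _ in range(redundancy)]
            row[attribute] = resolve(raw)   # pluggable cleansing step
    return stored_rows
```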
Crowdsourcing Control: Moving Beyond Multiple Choice
"... To ensure quality results from crowdsourced tasks, requesters often aggregate worker responses and use one of a plethora of strategies to infer the correct answer from the set of noisy responses. However, all current models assume prior knowledge of all possible outcomes of the task. While not an un ..."
Abstract
-
Cited by 19 (3 self)
- Add to MetaCart
(Show Context)
To ensure quality results from crowdsourced tasks, requesters often aggregate worker responses and use one of a plethora of strategies to infer the correct answer from the set of noisy responses. However, all current models assume prior knowledge of all possible outcomes of the task. While not an unreasonable assumption for tasks that can be posited as multiple-choice questions (e.g. n-ary classification), we observe that many tasks do not naturally fit this paradigm, but instead demand a free-response formulation where the outcome space is of infinite size (e.g. audio transcription). We model such tasks with a novel probabilistic graphical model, and design and implement LazySusan, a decision-theoretic controller that dynamically requests responses as necessary in order to infer answers to these tasks. We also design an EM algorithm to jointly learn the parameters of our model while inferring the correct answers to multiple tasks at a time. Live experiments on Amazon Mechanical Turk demonstrate the superiority of LazySusan at solving SAT Math questions, eliminating 83.2% of the error and achieving greater net utility compared to the state-of-the-art strategy, majority voting. We also show in live experiments that our EM algorithm outperforms majority voting on a visualization task that we design.
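The joint inference described in the abstract can be approximated with a very small EM loop. The sketch below restricts the (in principle infinite) outcome space to the answers actually observed per task and models each worker with a single accuracy parameter; it is a drastic simplification of the paper's model, and the 0.7 prior accuracy and clamping bounds are arbitrary assumptions.

```python
from collections import defaultdict

def em_aggregate(responses, iterations=20):
    """Jointly estimate worker accuracies and task answers from noisy free-text responses.

    responses: dict task_id -> list of (worker_id, answer) pairs.
    Returns (answers, accuracies).
    """
    accuracy = defaultdict(lambda: 0.7)   # assumed optimistic prior accuracy
    posteriors = {}
    for _ in range(iterations):
        # E-step: posterior over each task's answer, weighting votes by worker accuracy.
        for task, votes in responses.items():
            candidates = {a for _, a in votes}
            scores = {}
            for cand in candidates:
                score = 1.0
                for worker, answer in votes:
                    p = accuracy[worker]
                    # off-answer probability mass spread uniformly over the other observed candidates
                    score *= p if answer == cand else (1 - p) / max(len(candidates) - 1, 1)
                scores[cand] = score
            total = sum(scores.values())
            posteriors[task] = {c: s / total for c, s in scores.items()}
        # M-step: each worker's accuracy becomes their expected agreement with the posterior.
        agree, count = defaultdict(float), defaultdict(int)
        for task, votes in responses.items():
            for worker, answer in votes:
                agree[worker] += posteriors[task][answer]
                count[worker] += 1
        for worker in count:
            accuracy[worker] = min(max(agree[worker] / count[worker], 0.05), 0.95)  # clamp away from 0/1
    answers = {t: max(p, key=p.get) for t, p in posteriors.items()}
    return answers, dict(accuracy)
```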
Evaluating the crowd with confidence. In SIGKDD, 2013
"... Worker quality control is a crucial aspect of crowdsourcing sys-tems; typically occupying a large fraction of the time and money invested on crowdsourcing. In this work, we devise techniques to generate confidence intervals for worker error rate estimates, thereby enabling a better evaluation of wor ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
(Show Context)
Worker quality control is a crucial aspect of crowdsourcing systems, typically occupying a large fraction of the time and money invested in crowdsourcing. In this work, we devise techniques to generate confidence intervals for worker error rate estimates, thereby enabling a better evaluation of worker quality. We show that our techniques generate correct confidence intervals on a range of real-world datasets, and demonstrate wide applicability by using them to evict poorly performing workers and to provide confidence intervals on the accuracy of the answers.
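For the basic case where a worker's answers can be graded against gold-standard questions, a confidence interval on the error rate can be obtained with a standard Wilson score interval, as sketched below. The paper's techniques are more general; the eviction threshold here is an assumed parameter.

```python
import math

def wilson_interval(errors, n, z=1.96):
    """Wilson score confidence interval for an error rate from n graded answers (z=1.96 ~ 95%)."""
    if n == 0:
        return 0.0, 1.0
    p_hat = errors / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)

def should_evict(errors, n, max_error_rate=0.3):
    """Evict a worker only when even the optimistic end of the interval exceeds the threshold."""
    lower, _ = wilson_interval(errors, n)
    return lower > max_error_rate

# Example: 9 wrong out of 20 graded answers -> interval roughly (0.26, 0.66); not yet evicted at 0.3.
print(wilson_interval(9, 20), should_evict(9, 20))
```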
On the complexity of mining itemsets from the crowd using taxonomies, 2014
"... We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but only exists in human knowledge. We provide examples of such scenarios, and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset ..."
Abstract
-
Cited by 8 (7 self)
- Add to MetaCart
(Show Context)
We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but only exists in human knowledge. We provide examples of such scenarios, and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, which measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, which measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations.
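A minimal sketch of the oracle-with-taxonomy setup, restricted to single items for brevity: the taxonomy is explored top-down, one crowd question per visited node, and the subtree under any infrequent item is pruned without further questions. The `crowd_is_frequent` callback is an assumption, and the paper's actual algorithms handle general itemsets and come with complexity bounds that this toy search does not.

```python
from collections import deque

def frequent_items(taxonomy, roots, crowd_is_frequent):
    """Find frequent items using the crowd as an oracle, guided by a taxonomy.

    taxonomy: dict item -> list of child items (more specific items).
    crowd_is_frequent(item): assumed callback asking the crowd whether the item is frequent.
    Relies on monotonicity: a specialization of an infrequent item cannot be frequent,
    so whole subtrees are skipped without asking any questions about them.
    """
    frequent = set()
    queue = deque(roots)
    while queue:
        item = queue.popleft()
        if crowd_is_frequent(item):                  # one crowd question per explored node
            frequent.add(item)
            queue.extend(taxonomy.get(item, []))     # only children of frequent items are explored
        # infrequent item: its entire subtree is pruned
    return frequent
```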
Optimal Crowd-Powered Rating and Filtering Algorithms
"... We focus on crowd-powered filtering, i.e., filtering a large set of items using humans. Filtering is one of the most commonly used building blocks in crowdsourcing applications and systems. While solutions for crowd-powered filtering exist, they make a range of implicit assumptions and restrictions, ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
We focus on crowd-powered filtering, i.e., filtering a large set of items using humans. Filtering is one of the most commonly used building blocks in crowdsourcing applications and systems. While solutions for crowd-powered filtering exist, they make a range of implicit assumptions and restrictions, ultimately rendering them not powerful enough for real-world applications. We describe two approaches to discard these implicit assumptions and restrictions: one that carefully generalizes prior work, leading to an optimal but oftentimes intractable solution, and another that provides a novel way of reasoning about filtering strategies, leading to a sometimes suboptimal but efficiently computable solution (that is asymptotically close to optimal). We demonstrate that our techniques lead to significant reductions in error of up to 30% for fixed cost over prior work in a novel crowdsourcing application: peer evaluation in online courses.
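As a point of reference for what a filtering strategy looks like, here is a simple sequential baseline in Python, not one of the paper's optimal algorithms: each item is shown to workers until one side leads by a fixed vote margin or a question budget is exhausted. The `ask_worker` callback, margin, and budget are assumptions.

```python
def filter_items(items, ask_worker, margin=2, max_questions=5):
    """Decide pass/fail for each item with a simple sequential vote-margin rule.

    ask_worker(item): assumed callback returning True (item passes the filter) or False.
    Asks until one side leads by `margin` votes or `max_questions` is reached.
    """
    decisions = {}
    for item in items:
        yes = no = 0
        while abs(yes - no) < margin and yes + no < max_questions:
            if ask_worker(item):
                yes += 1
            else:
                no += 1
        decisions[item] = yes >= no   # ties resolved in favor of keeping the item
    return decisions
```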
dbTouch: Analytics at your Fingertips
"... As we enter the era of data deluge, turning data into knowledge has become the major challenge across most sciences and businesses that deal with data. In addition, as we increase our ability to create data, more and more people are confronted with data management problems on a daily basis for numer ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
(Show Context)
As we enter the era of data deluge, turning data into knowledge has become the major challenge across most sciences and businesses that deal with data. In addition, as we increase our ability to create data, more and more people are confronted with data management problems on a daily basis, in numerous aspects of everyday life. A fundamental need is data exploration through interactive tools, i.e., being able to quickly and effortlessly determine data and patterns of interest. However, modern database systems have not been designed with data exploration and usability in mind; they require users with expert knowledge and skills, and they react in a strict and monolithic way to every user request, resulting in correct answers but slow response times. In this paper, we introduce the vision of a new generation of data management systems, called dbTouch; our vision is to enable interactive and intuitive data exploration via database kernels which are tailored for touch-based exploration. No expert knowledge is needed. Data is represented in a visual format, e.g., a column shape for an attribute or a fat rectangle shape for a table, and users can touch those shapes and interact/query with gestures as opposed to firing complex SQL queries. The system does not try to consume all data; instead it analyzes only parts of the data at a time, continuously refining the answers and continuously reacting to user input. Every single touch on a data object can be seen as a request to run an operator or a collection of operators over part of the data. Users react to running results and continuously adjust the data exploration; they determine the data to be processed next by adjusting the direction and speed of a gesture, i.e., a collection of touches, and the database system no longer has control over the data flow. We discuss the various benefits that dbTouch systems bring for data analytics, as well as the new and unique challenges for database research in combination with touch interfaces. In addition, we provide an initial architecture, implementation, and evaluation (and demo) of a dbTouch prototype on iOS for iPad.
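The "analyze only parts of the data at a time" behavior can be pictured with a tiny generator that refines an aggregate one slice per interaction step. This is only an illustrative sketch of incremental refinement, not dbTouch's kernel; the chunk size and the random slice order are assumptions.

```python
import random

def incremental_average(column, chunk_size=1000):
    """Yield a progressively refined average, one data slice at a time.

    The idea mirrors the abstract: never consume all data at once; each touch or gesture
    step pulls the next slice and updates the running answer. Slices are shuffled so that
    early estimates are representative of the whole column.
    """
    values = list(column)
    random.shuffle(values)
    total, count = 0.0, 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield total / count   # refined estimate after each slice

# A gesture handler could pull one refinement per touch event:
# estimate = next(refiner)
```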
Query processing over crowdsourced data, 2012
"... We are building Deco, a comprehensive system for answering declarative queries posed over stored relational data together with data gathered from the crowd. In this paper we present Deco’s query processor, building on Deco’s data model and query language presented earlier. In general, it has been ob ..."
Abstract
-
Cited by 6 (5 self)
- Add to MetaCart
We are building Deco, a comprehensive system for answering declarative queries posed over stored relational data together with data gathered from the crowd. In this paper we present Deco’s query processor, building on Deco’s data model and query language presented earlier. In general, it has been observed that query processing over crowdsourced data must contend with issues and tradeoffs involving cost, latency, and uncertainty that don’t arise in traditional query processing. Deco’s overall objective in query execution is to maximize parallelism while fetching data from the crowd (to keep latency low), but only to the extent that the added parallelism does not issue too many tasks (which would increase cost). Meeting this objective requires a number of changes from traditional query execution. First, Deco’s query processor uses a hybrid execution model, which respects Deco semantics while enabling our objective. Our objective also requires prioritizing accesses to crowdsourced data, which turns out to be an interesting NP-hard problem. Finally, because Deco incorporates resolution functions to handle the uncertainty in crowdsourced data, query execution bears as much similarity to incremental view maintenance as to a traditional iterator model. The paper includes initial experimental results, focusing primarily on how our query execution model and access prioritization scheme maximize parallelism without increasing cost.
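A back-of-the-envelope Python sketch of the tension the abstract describes between parallelism and over-issuing tasks: post just enough new tasks so that, together with those already outstanding, the expected number of usable answers covers what the query still needs. The expected yield per task is an assumed constant, and this heuristic stands in for, rather than reproduces, Deco's access prioritization scheme (which the abstract notes involves an NP-hard problem).

```python
import math

def tasks_to_issue(answers_still_needed, tasks_outstanding, expected_yield_per_task=0.8):
    """Decide how many new crowd tasks to post right now.

    Issues just enough tasks, counting those already outstanding, so that the expected
    number of usable answers covers what the query still needs: parallelism for low
    latency without paying for tasks that are unlikely to be needed.
    """
    expected_from_outstanding = tasks_outstanding * expected_yield_per_task
    shortfall = answers_still_needed - expected_from_outstanding
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / expected_yield_per_task)

# Example: 10 answers still needed, 4 tasks already out -> issue ceil((10 - 3.2) / 0.8) = 9 more.
print(tasks_to_issue(10, 4))
```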
Uncertainty in crowd data sourcing under structural constraints. In DASFAA Workshops, 2014
"... ar ..."
(Show Context)