Results 1 - 10 of 1,147
Financial incentives and the “performance of crowds”
- Proc. HCOMP ’09
Cited by 192 (3 self)
The relationship between financial incentives and performance, long of interest to social scientists, has gained new relevance with the advent of web-based “crowd-sourcing” models of production. Here we investigate the effect of compensation on performance in the context of two experiments, conducted on Amazon’s Mechanical Turk (AMT). We find that increased financial incentives increase the quantity, but not the quality, of work performed by participants, where the difference appears to be due to an “anchoring” effect: workers who were paid more also perceived the value of their work to be greater, and thus were no more motivated than workers paid less. In contrast with compensation levels, we find the details of the compensation scheme do matter: specifically, a “quota” system results in better work for less pay than an equivalent “piece rate” system. Although counterintuitive, these findings are consistent with previous laboratory studies, and may have real-world analogs as well.
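To make the quota/piece-rate comparison concrete, here is a minimal Python sketch of the two pay schemes; the rates, quota size, and task counts are illustrative assumptions, not the parameters used in the experiments above.

    # Toy comparison of two compensation schemes; all numbers are hypothetical.
    def piece_rate_pay(tasks_done: int, rate: float = 0.01) -> float:
        """Pay a fixed amount per completed task."""
        return tasks_done * rate

    def quota_pay(tasks_done: int, quota: int = 10, batch_payment: float = 0.10) -> float:
        """Pay a fixed amount per completed batch of `quota` tasks."""
        return (tasks_done // quota) * batch_payment

    for n in (9, 10, 25, 30):
        print(n, round(piece_rate_pay(n), 2), round(quota_pay(n), 2))
    # At exact multiples of the quota the two schemes pay the same total,
    # which is the sense in which they are "equivalent"; the quota scheme
    # simply withholds pay for partial batches.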
Random effects structure for confirmatory hypothesis testing: Keep it maximal.
- Journal of Memory and Language, 2013
Cited by 151 (5 self)
Linear mixed-effects models (LMEMs) have become increasingly prominent in psycholinguistics and related areas. However, there is currently little understanding of how different random effects structures affect generalizability. Here, we argue that researchers using LMEMs for confirmatory hypothesis testing should minimally adhere to the standards that have been in place for many decades. Through theoretical arguments and Monte Carlo simulation, we show that LMEMs generalize best when they include the maximal random effects structure justified by the design. In contrast, LMEMs including the maximal random …
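As a rough illustration of a “maximal” random-effects structure, the sketch below fits by-subject random intercepts and random condition slopes to simulated data with Python’s statsmodels. All variable names and effect sizes are invented for the example, and statsmodels supports only one grouping factor, so crossed by-item effects, if present in a design, are omitted here.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulate 20 subjects x 30 trials with by-subject random intercepts
    # and random condition slopes (all parameters are made up).
    rng = np.random.default_rng(0)
    n_subj, n_trial = 20, 30
    subj = np.repeat(np.arange(n_subj), n_trial)
    cond = np.tile(np.arange(n_trial) % 2, n_subj)
    u0 = rng.normal(0, 50, n_subj)   # random intercepts
    u1 = rng.normal(0, 30, n_subj)   # random slopes
    rt = 500 + 40 * cond + u0[subj] + u1[subj] * cond + rng.normal(0, 80, subj.size)
    df = pd.DataFrame({"subject": subj, "condition": cond, "rt": rt})

    # re_formula="~condition" adds a random condition slope to the random
    # intercept -- the maximal structure for this one-factor design.
    m = smf.mixedlm("rt ~ condition", df, groups=df["subject"],
                    re_formula="~condition").fit()
    print(m.summary())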
Stochastic Variational Inference
- Journal of Machine Learning Research, 2013 (in press)
Cited by 131 (27 self)
We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
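The heart of the algorithm can be stated in one update; this is the standard stochastic natural-gradient step from the stochastic variational inference literature, with λ the global variational parameter, λ̂_t the intermediate estimate computed from a sampled data point as if it were replicated across the whole corpus, and (τ, κ) assumed step-size hyperparameters:

\[
\lambda^{(t)} \;=\; (1 - \rho_t)\,\lambda^{(t-1)} + \rho_t\,\hat{\lambda}_t,
\qquad
\rho_t = (t + \tau)^{-\kappa}, \quad \kappa \in (0.5, 1].
\]

The condition on κ ensures the step sizes satisfy the usual Robbins–Monro conditions, so the iterates converge to a local optimum of the variational objective.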
Standardizing the world income inequality database
- Social Science Quarterly, 90(2): 231–242, 2009. SWIID Version 3.1, December 2011. Retrieved April 21, 2012, from http://www.siuc.edu/~fsolt/swiid/swiid.html
Cited by 90 (2 self)
Objective. Cross-national research on the causes and consequences of income inequality has been hindered by the limitations of existing inequality datasets: greater coverage across countries and over time is available from these sources only at the cost of significantly reduced comparability across observations. The goal of the Standardized World Income Inequality Database (SWIID) is to overcome these limitations. Methods. A custom missing-data algorithm was used to standardize the United Nations University’s World Income Inequality Database; data collected by the Luxembourg Income Study served as the standard. Results. The SWIID provides comparable Gini indices of gross and net income inequality for 153 countries for as many years as possible from 1960 to the present, along with estimates of uncertainty in these statistics. Conclusions. By maximizing comparability for the largest possible sample of countries and years, the SWIID is better suited to broadly cross-national research on income inequality than previously available sources.
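As a loose sketch of the kind of standardization described under Methods, the toy Python function below rescales a secondary source’s Gini series toward a benchmark using country-years where both are observed. This is a simplified ratio adjustment, not Solt’s actual missing-data algorithm, and all column names are hypothetical.

    import pandas as pd

    def standardize(df: pd.DataFrame) -> pd.DataFrame:
        """Toy rescaling of gini_source toward the LIS benchmark.

        Expects columns: country, year, gini_source, gini_lis
        (gini_lis is NaN where no benchmark observation exists).
        """
        both = df.dropna(subset=["gini_source", "gini_lis"])
        ratio = both["gini_lis"] / both["gini_source"]
        per_country = ratio.groupby(both["country"]).mean()
        # Fall back to the overall ratio for countries never observed in LIS.
        adj = df["country"].map(per_country).fillna(ratio.mean())
        return df.assign(gini_std=df["gini_source"] * adj)

    demo = pd.DataFrame({
        "country": ["A", "A", "B"],
        "year": [1990, 1995, 1990],
        "gini_source": [40.0, 42.0, 35.0],
        "gini_lis": [36.0, None, None],
    })
    print(standardize(demo))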
A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases
- In Proceedings of the First Workshop on Social Media Analytics, SOMA ’10
Cited by 61 (4 self)
Political scientists lack methods to efficiently measure the priorities political actors emphasize in statements. To address this limitation, I introduce a statistical model that attends to the structure of political rhetoric when measuring expressed priorities: statements are naturally organized by author. The expressed agenda model exploits this structure to simultaneously estimate the topics in the texts, as well as the attention political actors allocate to the estimated topics. I apply the method to a collection of over 64,000 press releases from senators from 2005–2007, which I demonstrate is an ideal medium to measure how senators explain their work in Washington to constituents. A set of examples validates the estimated priorities and demonstrates that the additional information included in the model provides better classification than expert human coders or statistical models for clustering that ignore the author of a document. The statistical model and its extensions will be made available in a forthcoming free software package for the R computing language, and the press release data will be made available for download.
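One plausible reading of the model’s structure, written as a generative sketch; the Dirichlet and multinomial choices below are generic stand-ins for illustration, and the paper’s actual distributional assumptions may differ. Each author a has an agenda π_a, each press release d receives a single topic τ_d drawn from its author’s agenda, and words follow the topic:

\[
\pi_a \sim \mathrm{Dirichlet}(\alpha), \qquad
\tau_d \sim \mathrm{Multinomial}(\pi_{a(d)}), \qquad
w_{d,n} \mid \tau_d \sim \mathrm{Multinomial}(\beta_{\tau_d}).
\]

Estimating π_a then recovers the attention each senator allocates across the estimated topics.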
Cultural cognition of scientific consensus.
- Journal of Risk Research, September 2010
Cited by 57 (6 self)
Why do members of the public disagree, sharply and persistently, about facts on which expert scientists largely agree? We designed a study to test a distinctive explanation: the cultural cognition of scientific consensus. The “cultural cognition of risk” refers to the tendency of individuals to form risk perceptions that are congenial to their values. The study presents both correlational and experimental evidence confirming that cultural cognition shapes individuals’ beliefs about the existence of scientific consensus, and the process by which they form such beliefs, relating to climate change, the disposal of nuclear wastes, and the effect of permitting concealed possession of handguns. The implications of this dynamic for science communication and public policy-making are discussed.
Struggles with Survey Weighting and Regression Modeling
- Statistical Science, 2007
Cited by 53 (3 self)
The general principles of Bayesian data analysis imply that models for survey responses should be constructed conditional on all variables that affect the probability of inclusion and nonresponse, which are also the variables used in survey weighting and clustering. However, such models can quickly become very complicated, with potentially thousands of poststratification cells. It is then a challenge to develop general families of multilevel probability models that yield reasonable Bayesian inferences. We discuss these issues in the context of several ongoing public health and social surveys. This work is currently open-ended, and we conclude with thoughts on how research could proceed to solve these problems. Key words and phrases: multilevel modeling, poststratification, sampling weights.
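For reference, the poststratification step has a simple closed form (standard notation, not specific to this paper): with cells c = 1, …, C, population counts N_c, and model-based cell estimates θ̂_c,

\[
\hat{\theta}^{\mathrm{PS}} \;=\; \frac{\sum_{c=1}^{C} N_c \, \hat{\theta}_c}{\sum_{c=1}^{C} N_c}.
\]

This is why the cell count explodes once every variable affecting inclusion or nonresponse must be conditioned on, and why multilevel models are needed to stabilize the θ̂_c.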
Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus
- Journal of Eye Movement Research, 2008
Cited by 53 (15 self)
The surprisal of a word on a probabilistic grammar constitutes a promising complexity metric for human sentence comprehension difficulty. Using two different grammar types, surprisal is shown to have an effect on fixation durations and regression probabilities in a sample of German readers’ eye movements, the Potsdam Sentence Corpus. A linear mixed-effects model was used to quantify the effect of surprisal while taking into account unigram frequency and bigram frequency (transitional probability), word length, and empirically-derived word predictability; the so-called “early” and “late” measures of processing difficulty both showed an effect of surprisal. Surprisal is also shown to have a small but statistically non-significant effect on empirically-derived predictability itself. This work thus demonstrates the importance of including parsing costs as a predictor of comprehension difficulty in models of reading, and suggests that a simple identification of syntactic parsing costs with early measures, and of late measures with durations of post-syntactic events, may be difficult to uphold.
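For reference, surprisal has a standard definition (general form, not specific to the grammars used here): the surprisal of word w_i is its negative log conditional probability given the preceding words,

\[
s(w_i) \;=\; -\log P\left(w_i \mid w_1, \ldots, w_{i-1}\right),
\]

where, for a probabilistic grammar, the conditional probability can be computed as the ratio of the prefix probabilities of the first i and i−1 words.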
Yes, But What’s the Mechanism? (Don’t Expect an Easy Answer)
Cited by 52 (0 self)
Psychologists increasingly recommend experimental analysis of mediation. This is a step in the right direction because mediation analyses based on nonexperimental data are likely to be biased and because experiments, in principle, provide a sound basis for causal inference. But even experiments cannot overcome certain threats to inference that arise chiefly or exclusively in the context of mediation analysis—threats that have received little attention in psychology. The authors describe 3 of these threats and suggest ways to improve the exposition and design of mediation tests. Their conclusion is that inference about mediators is far more difficult than previous research suggests and is best tackled by an experimental research program that is specifically designed to address the challenges of mediation analysis.
Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists
- Biological Reviews of the Cambridge Philosophical Society, 85: 935–956, 2010
Cited by 51 (3 self)
Repeatability (more precisely the common measure of repeatability, the intra-class correlation coefficient, ICC) is an important index for quantifying the accuracy of measurements and the constancy of phenotypes. It is the proportion of phenotypic variation that can be attributed to between-subject (or between-group) variation. As a consequence, the non-repeatable fraction of phenotypic variation is the sum of measurement error and phenotypic flexibility. There are several ways to estimate repeatability for Gaussian data, but there are no formal agreements on how repeatability should be calculated for non-Gaussian data (e.g. binary, proportion and count data). In addition to point estimates, appropriate uncertainty estimates (standard errors and confidence intervals) and statistical significance for repeatability estimates are required regardless of the types of data. We review the methods for calculating repeatability and the associated statistics for Gaussian and non-Gaussian data. For Gaussian data, we present three common approaches for estimating repeatability: correlation-based, analysis of variance (ANOVA)-based and linear mixed-effects model (LMM)-based methods, while for non-Gaussian data, we focus on generalised linear mixed-effects models (GLMM) that allow the estimation of repeatability on the original and on the underlying latent scale. We also address a number of methods for calculating standard errors, confidence intervals and statistical significance; the most accurate and recommended methods are parametric bootstrapping, randomisation tests and Bayesian approaches. We advocate the use of LMM- …
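For Gaussian data the LMM-based repeatability has the standard variance-components form (general notation, not specific to this paper):

\[
R \;=\; \frac{\sigma^{2}_{\alpha}}{\sigma^{2}_{\alpha} + \sigma^{2}_{\varepsilon}},
\]

where σ²_α is the between-group variance and σ²_ε the residual within-group variance. For a binomial GLMM with logit link, the commonly used latent-scale version adds the link-scale distribution variance: R = σ²_α / (σ²_α + σ²_ε + π²/3).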