Results 1–10 of 1,124
Categorical Data Analysis: Away from ANOVAs (transformation or not) and towards Logit Mixed Models
Cited by 252 (7 self)
This paper identifies several serious problems with the widespread use of ANOVAs for the analysis of categorical outcome variables such as forced-choice variables, question-answer accuracy, choice in production (e.g. in syntactic priming research), et cetera. I show that even after applying the arcsine-square-root transformation to proportional data, ANOVA can yield spurious results. I discuss conceptual issues underlying these problems and alternatives provided by modern statistics. Specifically, I introduce ordinary logit models (i.e. logistic regression), which are well-suited to analyze categorical data and offer many advantages over ANOVA. Unfortunately, ordinary logit models do not include random effect modeling. To address this issue, I describe mixed logit models (Generalized Linear Mixed Models for binomially distributed outcomes; Breslow & Clayton, 1993), which combine the advantages of ordinary logit models with the ability to account for random subject and item effects in one step of analysis. Throughout the paper, I use a psycholinguistic data set to compare the different statistical methods.
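The ordinary logit model the abstract recommends over transformed-proportion ANOVA can be sketched in a few lines. The following is a minimal illustration on simulated forced-choice data, fit by Newton-Raphson; the intercept (-0.5) and condition effect (+1.0) are assumed values for illustration, not taken from the paper.

```python
import numpy as np

# Simulated forced-choice data: one binary condition predictor,
# binary "correct" outcome. All parameter values are assumptions.
rng = np.random.default_rng(0)
n = 500
condition = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([np.ones(n), condition])      # design matrix: intercept + condition
true_beta = np.array([-0.5, 1.0])
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = rng.binomial(1, p).astype(float)              # 1 = correct response

# Fit the logit model by Newton-Raphson on the logistic log-likelihood.
beta = np.zeros(2)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
    grad = X.T @ (y - mu)                         # score vector
    hess = X.T @ (X * (mu * (1.0 - mu))[:, None]) # information matrix
    beta += np.linalg.solve(hess, grad)

print(beta)  # estimates on the log-odds scale
```

The fitted coefficients are directly interpretable as effects on the log-odds of a correct response, which is what the arcsine-square-root transformation only approximates.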
multcomp: Simultaneous Inference in General Parametric Models
2008
Cited by 234 (6 self)
Simultaneous inference is a common problem in many areas of application. If multiple null hypotheses are tested simultaneously, the probability of erroneously rejecting at least one of them increases beyond the pre-specified significance level. Simultaneous inference procedures therefore have to be used which adjust for multiplicity and thus control the overall type I error rate. In this paper we describe simultaneous inference procedures in general parametric models, where the experimental questions are specified through a linear combination of elemental model parameters. The framework described here is quite general and extends the canonical theory of multiple comparison procedures in ANOVA models to linear regression problems, generalized linear models, linear mixed effects models, the Cox model, robust linear models, etc. Several examples using a variety of different statistical models illustrate the breadth of the framework.
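The multiplicity adjustment described here can be illustrated with the Holm step-down procedure, one of the classical methods that frameworks like multcomp generalize. This is a hand-rolled sketch with made-up p-values, not multcomp's own interface.

```python
import numpy as np

def holm_adjust(pvals):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    adjusted = np.empty(m)
    running_max = 0.0
    # Visit p-values from smallest to largest, scaling by the number of
    # hypotheses not yet rejected, and enforce monotonicity.
    for rank, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, (m - rank) * p[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

print(holm_adjust([0.01, 0.04, 0.03, 0.20]))  # → [0.04 0.09 0.09 0.2]
```

Holm's procedure is uniformly more powerful than Bonferroni while still controlling the overall type I error rate that the abstract refers to.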
Random effects structure for confirmatory hypothesis testing: Keep it maximal.
Journal of Memory and Language, 2013
Cited by 151 (5 self)
Linear mixed-effects models (LMEMs) have become increasingly prominent in psycholinguistics and related areas. However, there is currently little understanding of how different random effects structures affect generalizability. Here, we argue that researchers using LMEMs for confirmatory hypothesis testing should minimally adhere to the standards that have been in place for many decades. Through theoretical arguments and Monte Carlo simulation, we show that LMEMs generalize best when they include the maximal random effects structure justified by the design.
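In lme4 notation the contrast at issue is between rt ~ condition + (1 | subject) and the maximal rt ~ condition + (condition | subject). A rough Python analogue, assuming statsmodels is available and using simulated data with assumed parameter values, might look like:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated within-subject RT data (all parameter values are assumptions):
# by-subject random intercepts (sd 50) and random condition slopes (sd 30).
rng = np.random.default_rng(1)
n_subj, n_trial = 20, 30
subject = np.repeat(np.arange(n_subj), n_trial)
condition = np.tile(np.repeat([0.0, 1.0], n_trial // 2), n_subj)
subj_intercept = rng.normal(0, 50, n_subj)[subject]
subj_slope = rng.normal(0, 30, n_subj)[subject]
rt = (500 + 40 * condition + subj_intercept + subj_slope * condition
      + rng.normal(0, 80, n_subj * n_trial))
df = pd.DataFrame({"rt": rt, "condition": condition, "subject": subject})

# Random-intercept-only model, analogous to rt ~ condition + (1 | subject)
m1 = smf.mixedlm("rt ~ condition", df, groups=df["subject"]).fit()
# "Maximal" model with a by-subject random slope for condition,
# analogous to rt ~ condition + (condition | subject)
m2 = smf.mixedlm("rt ~ condition", df, groups=df["subject"],
                 re_formula="~condition").fit()
print(m2.fe_params)
```

When the design licenses a by-subject slope, omitting it (as in m1) understates the uncertainty of the condition effect, which is the core of the paper's argument.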
Analyzing “visual world” eyetracking data using multilevel logistic regression.
Journal of Memory and Language, 2008
Cited by 71 (2 self)
A new framework is offered that uses multilevel logistic regression (MLR) to analyze data from 'visual world' eyetracking experiments used in psycholinguistic research. The MLR framework overcomes some of the problems with conventional analyses, making it possible to incorporate time as a continuous variable and gaze location as a categorical dependent variable. The multilevel approach minimizes the need for data aggregation and thus provides a more statistically powerful approach. With MLR, the researcher builds a mathematical model of the overall response curve that separates the response into different temporal components. The researcher can test hypotheses by examining the impact of independent variables and their interactions on these components. A worked example using MLR is provided.
Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus
Journal of Eye Movement Research, 2008
Cited by 53 (15 self)
The surprisal of a word on a probabilistic grammar constitutes a promising complexity metric for human sentence comprehension difficulty. Using two different grammar types, surprisal is shown to have an effect on fixation durations and regression probabilities in a sample of German readers’ eye movements, the Potsdam Sentence Corpus. A linear mixed-effects model was used to quantify the effect of surprisal while taking into account unigram frequency and bigram frequency (transitional probability), word length, and empirically derived word predictability; the so-called “early” and “late” measures of processing difficulty both showed an effect of surprisal. Surprisal is also shown to have a small but statistically nonsignificant effect on empirically derived predictability itself. This work thus demonstrates the importance of including parsing costs as a predictor of comprehension difficulty in models of reading, and suggests that a simple identification of syntactic parsing costs with early measures, and of late measures with durations of post-syntactic events, may be difficult to uphold.
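Surprisal itself is simple to compute once a probability model is fixed: the surprisal of word w in context c is -log2 P(w | c), measured in bits. A toy sketch with made-up bigram probabilities (not the paper's grammars):

```python
import math

# Toy bigram probabilities; the values are assumptions for illustration only.
bigram_p = {("the", "dog"): 0.10, ("the", "idea"): 0.01}

def surprisal(context, word, p=bigram_p):
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(p[(context, word)])

print(surprisal("the", "dog"))   # ≈ 3.32 bits
print(surprisal("the", "idea"))  # ≈ 6.64 bits
```

Less predictable continuations carry more bits of surprisal, and on this metric should show longer fixation durations, which is the relationship the mixed-effects analysis quantifies.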
Analyzing reaction times
International Journal of Psychological Research, 2010
Cited by 44 (7 self)
Reaction times (RTs) are an important source of information in experimental psychology. Classical methodological considerations pertaining to the statistical analysis of RT data are optimized for analyses of aggregated data, based on subject or item means (cf. Forster & Dickinson, 1976). Mixed-effects modeling (see, e.g., Baayen, Davidson, & Bates, 2008) does not require prior aggregation and allows the researcher to pursue the more ambitious goal of predicting individual responses. Mixed-effects modeling calls for a reconsideration of the classical methodological strategies for analysing RTs. In this study, we argue for empirical flexibility with respect to the choice of transformation for the RTs. We advocate minimal a priori data trimming, combined with model criticism. We also show how trial-to-trial, longitudinal dependencies between individual observations can be brought into the statistical model. These strategies are illustrated for a large dataset with a non-trivial random-effects structure. Special attention is paid to the evaluation of interactions involving fixed-effect factors that partition the levels sampled by random-effect factors.
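The trimming-plus-transformation strategy argued for here can be sketched as follows; the data are simulated log-normal RTs, and the cut-offs of 100 ms and 5000 ms are assumptions for illustration, not the paper's recommendation:

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated reaction times in ms: log-normal, since RT distributions
# are right-skewed (parameter values are assumptions).
rts = np.exp(rng.normal(6.2, 0.3, 1000))

# Minimal a priori trimming: remove only physically implausible values,
# leaving finer screening to model criticism on the fitted residuals.
rts = rts[(rts > 100) & (rts < 5000)]
log_rts = np.log(rts)   # one candidate transformation toward normality

print(float(np.mean(log_rts)))
```

The point of minimal trimming is that overly aggressive a priori cut-offs can remove exactly the observations a mixed-effects model with trial-to-trial dependencies could otherwise explain.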
Is syntactic knowledge probabilistic? Experiments with the English dative alternation
Linguistics in search of its evidential base, Studies in Generative Grammar. Mouton de Gruyter, 2006
Cited by 33 (4 self)
Theoretical linguistics traditionally relies on linguistic intuitions such as grammaticality judgments for data. But the massive growth of language technologies has made the spontaneous use of language in natural settings a rich and easily accessible alternative source of data. Moreover, studies of usage as well as intuitive judgments have shown that linguistic intuitions of grammaticality are deeply flawed, because (1) they seriously underestimate the space of grammatical possibility by ignoring the effects of multiple conflicting formal, semantic, and contextual constraints, and (2) they may reflect probability instead of grammaticality. Both of these points are richly exemplified by studies of the English dative alternation (Green 1971; Gries
Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box
Journal of Statistical Software, 2009
Cited by 32 (3 self)
Our mi package in R has several features that allow the user to get inside the imputation process and evaluate the reasonableness of the resulting models and imputations. These features include: choice of predictors, models, and transformations for chained imputation models; standard and binned residual plots for checking the fit of the conditional distributions used for imputation; and plots for comparing the distributions of observed and imputed data. In addition, we use Bayesian models and weakly informative prior distributions to construct more stable estimates of imputation models. Our goal is to have a demonstration package that (a) avoids many of the practical problems that arise with existing multivariate imputation programs, and (b) demonstrates state-of-the-art diagnostics that can be applied more generally and can be incorporated into the software of others.
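The flavor of imputation-with-diagnostics can be conveyed with a single-variable stochastic regression imputation in Python (mi itself is an R package; all names and parameter values below are illustrative assumptions, not mi's interface):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(0, 1, n)
y = 2.0 * x + rng.normal(0, 1, n)       # assumed true slope: 2.0
missing = rng.random(n) < 0.3           # ~30% of y missing at random
y_obs = np.where(missing, np.nan, y)

# Stochastic regression imputation: fit y ~ x on the observed rows, then
# draw imputations from the fitted conditional distribution (not the mean).
obs = ~missing
beta, intercept = np.polyfit(x[obs], y_obs[obs], 1)
resid_sd = np.std(y_obs[obs] - (beta * x[obs] + intercept))
m = 5                                    # number of completed data sets
imputations = [
    np.where(missing, beta * x + intercept + rng.normal(0, resid_sd, n), y_obs)
    for _ in range(m)
]

# Diagnostic in the spirit of mi: compare observed vs. imputed distributions.
print(np.mean(y_obs[obs]), np.mean(imputations[0][missing]))
```

Drawing from the conditional distribution rather than imputing the mean preserves variability across the m completed data sets, and the observed-versus-imputed comparison is the kind of diagnostic the package automates.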
Task Search in a Human Computation Market
2010
Cited by 31 (3 self)
In order to understand how a labor market for human computation functions, it is important to know how workers search for tasks. This paper uses two complementary methods to gain insight into how workers search for tasks on Mechanical Turk. First, we perform a high-frequency scrape of 36 pages of search results and analyze it by looking at the rate of disappearance of tasks across key ways Mechanical Turk allows workers to sort tasks. Second, we present the results of a survey in which we paid workers for self-reported information about how they search for tasks. Our main findings are that on a large scale, workers sort by which tasks are most recently posted and which have the largest number of tasks available. Furthermore, we find that workers look mostly at the first page of the most recently posted tasks and the first two pages of the tasks with the most available instances, but in both categories the position on the results page is unimportant to workers. We observe that at least some employers try to manipulate the position of their task in the search results to exploit the tendency to search for recently posted tasks. On an individual level, we observed workers searching by almost all the possible categories and looking more than 10 pages deep. For a task we posted to Mechanical Turk, we confirmed that a favorable position in the search results does matter: our task with favorable positioning was completed 30 times faster and for less money than when its position was unfavorable.