Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy
Psychological Methods, 2000
Null hypothesis significance testing (NHST) is arguably the most widely used approach to hypothesis evaluation among behavioral and social scientists. It is also very controversial. A major concern expressed by critics is that such testing is misunderstood by many of those who use it. Several other objections to its use have also been raised. In this article the author reviews and comments on the claimed misunderstandings as well as on other criticisms of the approach, and he notes arguments that have been advanced in support of NHST. Alternatives and supplements to NHST are considered, as are several related recommendations regarding the interpretation of experimental data. The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data. Null hypothesis statistical testing (NHST) is arguably the most widely used method of analysis of data collected in psychological experiments and has been so for about 70 years. One might think that a method that had been embraced by an entire research community would be well understood and noncontroversial after many decades of constant use. However, NHST is very controversial. Criticism of the method, which essentially began with the introduction of the technique (Pearce, 1992), has waxed and waned over the years; it has been intense in the recent past. Apparently, controversy regarding the idea of NHST more generally extends back more than two and a half
What future quantitative social science research could look like: Confidence intervals for effect sizes
Educational Researcher, 2002
presents a self-canceling mixed message. To present an “encouragement” in the context of strict absolute standards regarding the esoterics of author note placement, pagination, and margins is to send the message, “these myriad requirements count, this encouragement doesn’t.”
The insignificance of statistical significance testing.
Journal of Wildlife Management, 1999
Abstract: Despite their wide use in scientific journals such as The Journal of Wildlife Management, statistical hypothesis tests add very little value to the products of research. Indeed, they frequently confuse the interpretation of data. This paper describes how statistical hypothesis tests are often viewed, and then contrasts that interpretation with the correct one. I discuss the arbitrariness of P-values, conclusions that the null hypothesis is true, power analysis, and distinctions between statistical and biological significance. Statistical hypothesis testing, in which the null hypothesis about the properties of a population is almost always known a priori to be false, is contrasted with scientific hypothesis testing, which examines a credible null hypothesis about phenomena in nature. More meaningful alternatives are briefly outlined, including estimation and confidence intervals for determining the importance of factors, decision theory for guiding actions in the face of uncertainty, and Bayesian approaches to hypothesis testing and other statistical practices.
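The distinction between statistical and biological significance drawn in this abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes a two-sample z-test with a known common standard deviation, and the function name and all numbers are hypothetical. It holds a tiny mean difference fixed and varies only the sample size.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means.

    Assumes (for illustration only) two equal-sized groups and a known
    common standard deviation, so a z-test applies.
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


if __name__ == "__main__":
    # Hypothetical effect: a difference of 0.02 standard deviations,
    # which would rarely matter in practice.
    tiny_difference = 0.02
    for n in (100, 10_000, 1_000_000):
        p = two_sample_z_p_value(tiny_difference, sd=1.0, n_per_group=n)
        print(f"n per group = {n:>9,}  difference = {tiny_difference}  p = {p:.4g}")
```

The estimated difference is identical in every row; only the p-value changes, which is why the abstract treats a small P-value as uninformative about the importance of an effect.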
How to estimate and interpret various effect sizes
Journal of Counseling Psychology, 2004
The present article presents a tutorial on how to estimate and interpret various effect sizes. The 5th edition
Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests
Psychological Methods, 2001
Null hypothesis statistical testing (NHST) has been debated extensively but always successfully defended. The technical merits of NHST are not disputed in this article. The widespread misuse of NHST has created a human factors problem that this article intends to ameliorate. This article describes an integrated, alternative inferential confidence interval approach to testing for statistical difference, equivalence, and indeterminacy that is algebraically equivalent to standard NHST procedures and therefore exacts the same evidential standard. The combined numeric and graphic tests of statistical difference, equivalence, and indeterminacy are designed to avoid common interpretive problems associated with NHST procedures. Multiple comparisons, power, sample size, test reliability, effect size, and cause-effect ratio are discussed. A section on the proper interpretation of confidence intervals is followed by a decision rule summary and caveats. The longstanding controversy surrounding null hypothesis statistical testing (NHST) has typically been argued on its technical merits, and they are not dis
Significance Tests Harm Progress in Forecasting
International Journal of Forecasting, 2007
Based on a summary of prior literature, I conclude that tests of statistical significance harm scientific progress. Efforts to find exceptions to this conclusion have, to date, turned up none. Even when done correctly, significance tests are dangerous. I show that summaries of scientific research do not require tests of statistical significance. I illustrate the dangers of significance tests by examining an application to the M3-Competition. Although the authors of that reanalysis conducted a proper series of statistical tests, they suggest that the original M3 was not justified in concluding that combined forecasts reduce errors and that the selection of the best method is dependent upon the selection of a proper error measure. I show that the original conclusions were justified and that they are correct. Authors should try to avoid tests of statistical significance, journals should discourage them, and readers should ignore them. Instead, to analyze and communicate findings from empirical studies, one should use effect sizes, confidence intervals, replications/extensions, and meta-analyses.
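As a rough illustration of reporting an effect size with a confidence interval, the sketch below computes Cohen's d from two samples using the pooled standard deviation, plus a large-sample normal-approximation interval. It is not drawn from any of the articles listed here: the function names and data are hypothetical, and the standard-error formula is a common approximation and not an exact noncentral-t interval.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def cohens_d_ci(group_a, group_b, level=0.95):
    """Approximate confidence interval for d (large-sample normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    d = cohens_d(group_a, group_b)
    # Common large-sample approximation to the standard error of d.
    se = sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z_crit * se, d + z_crit * se


if __name__ == "__main__":
    # Hypothetical scores for two groups, invented for this example.
    treatment = [23.0, 25.5, 21.0, 27.5, 24.0, 26.0, 22.5, 28.0]
    control = [20.0, 22.5, 19.0, 24.0, 21.5, 23.0, 18.5, 22.0]
    d = cohens_d(treatment, control)
    low, high = cohens_d_ci(treatment, control)
    print(f"Cohen's d = {d:.2f}, approximate 95% CI [{low:.2f}, {high:.2f}]")
```

With samples this small the interval is wide, which is itself informative: the interval conveys both the size of the effect and the precision of its estimate, where a bare p-value conveys neither.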
The epistemology of mathematical and statistical modeling: A quiet methodological revolution
American Psychologist, 2010
A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the modeling revolution obviated the NHST argument. I begin with a history of NHST and modeling and their relation to one another. Next, I define and illustrate principles involved in developing and evaluating mathematical models. Following, I discuss the difference between using statistical procedures within a rule-based framework and building mathematical models from a scientific epistemology. Only the former is treated carefully in most psychology graduate training. The pedagogical implications of this imbalance and the revised pedagogy required to account for the modeling revolution are described. To conclude, I discuss how attention to modeling implies shifting statistical practice in certain progressive ways. The epistemological basis of statistics has moved away from being a set of procedures, applied mechanistically, and moved toward building and evaluating statistical and scientific models.
Typology of analytical and interpretational errors in quantitative and qualitative educational research
2003
The purpose of this paper is to identify and to discuss major analytical and interpretational errors that occur regularly in quantitative and qualitative educational research. A comprehensive review of the literature discussing various problems was conducted. With respect to quantitative data analyses, common analytical and interpretational misconceptions are presented for data-analytic techniques representing each major member of the general linear model, including hierarchical linear modeling. Common errors associated with many of these approaches include (a) no evidence provided that statistical assumptions were checked; (b) no power/sample size considerations discussed; (c) inappropriate treatment of multivariate data; (d) use of stepwise procedures; (e) failure to report reliability indices for either previous or present samples; (f) no control for Type I error rate; and (g) failure to report effect sizes. With respect to qualitative research studies, the most common errors are failure to provide evidence for judging the dependability (i.e., reliability) and credibility (i.e., validity) of findings, generalizing findings beyond the sample, and failure to
Statistical significance and effect size reporting: Portrait of a possible future
Research in the Schools, 1998
The present paper comments on the matters raised regarding statistical significance tests by three sets of authors in this issue. These articles are placed within the context of contemporary literature. Next, additional empirical evidence is cited showing that the APA publication manual's "encouraging" effect size reporting has had no appreciable effect. Editorial policy will be required to effect change, and some model policies are quoted. Science will move forward to the extent that both effect size and replicability evidence of one or more sorts are finally seriously considered within our inquiry. I appreciate the opportunity to comment on matters raised by Daniel (1998), McLean and Ernest (1998), and Nix and Barnette (1998) as regards statistical significance tests. Theme issues of journals such as the present one (see also Thompson (1993)) allow various perspectives to be articulated and help slowly but inexorably move the field toward improved practices. Of course, an important recent book (Harlow, Mulaik, & Steiger, 1997) also presents diverse perspectives regarding these continuing controversies (for reviews see
What if there were no more bickering about statistical significance tests
Research in the Schools, 1998
Questions and concerns are directed to those who advocate replacing statistical hypothesis testing with alternative data-analysis strategies. It is further suggested that: (1) commonly recommended hypothesis-testing alternatives are anything but perfect, especially when allowed to stand alone without an accompanying inferential filtering device; (2) various hypothesis-testing modifications can be implemented to make the hypothesis-testing process and its associated conclusions more credible; and (3) hypothesis testing, when implemented intelligently, adds importantly to the storytelling function of a published empirical research investigation. From the local pubs to our professional "pubs," everyone in social-science academic circles seems to be talking about it these days. Not that there's anything wrong with talking about it, mind you, even to a more practically oriented crowd such as the readership of this journal. But as with the "gates" of Washington politics on the one coast and the Gates of Washington state on the other, when do we stand up and say "Enough already!"? When do we decide that ample arguments