### SHOULD PSYCHOLOGY ABANDON p VALUES AND TEACH CIs INSTEAD? EVIDENCE-BASED REFORMS IN STATISTICS EDUCATION

Abstract

Several editorial and institutional interventions in psychology have aimed to improve statistical reporting in journals. These efforts have sought to de-emphasise statistical significance and encourage alternative analyses, especially effect sizes and confidence intervals (CIs), but the interventions to date have had short-lived and superficial impact, if any impact at all. I review some of these interventions in psychology and discuss possible reasons for their lack of success. I give an inter-disciplinary context by discussing reform efforts in medicine (in which useful reform has already been achieved) and ecology. I then identify statistics education as the next major challenge for reformers, and report data on students’ understanding of CIs and the difficulties they have in interpreting CIs appropriately. I explain the need for further evidence on which to base improved statistics education in psychology.

WHAT IS STATISTICAL REFORM?

There is clear evidence that researchers in psychology have many serious misconceptions about null hypothesis significance testing (NHST; Oakes, 1986). Haller and Krauss (2002) confirmed that the problems persist, and are exhibited even by many teachers of statistics in psychology. Nickerson (2000) provided a review of NHST problems and of efforts by reformers

### Mindless statistics (The Journal of Socio-Economics 33, 2004, 587–606)

Abstract

Statistical rituals largely eliminate statistical thinking in the social sciences. Rituals are indispensable for identification with social groups, but they should be the subject rather than the procedure of science. What I call the “null ritual” consists of three steps: (1) set up a statistical null hypothesis, but do not specify your own hypothesis nor any alternative hypothesis, (2) use the 5% significance level for rejecting the null and accepting your hypothesis, and (3) always perform this procedure. I report evidence of the resulting collective confusion and fears about sanctions on the part of students and teachers, researchers and editors, as well as textbook writers. © 2004 Elsevier Inc. All rights reserved.

### UNDERSTANDING, TEACHING AND USING p VALUES

Abstract

There are many problems with the p value. Is it an indicator of strength of evidence (Fisher), or only to be compared with α (Neyman-Pearson)? Many researchers and even statistics teachers have misconceptions about p, although p has been little studied, and we know little about how textbooks present it, and how researchers think about it, react to it, and use it in practice. The p value varies dramatically because of sampling variability, but textbooks do not mention this and researchers do not appreciate how widely it varies. I discuss the problems of p and advantages of confidence intervals, and identify research needed to guide the design of improved statistics education about p. I suggest the most promising teaching approach may be to focus throughout on estimation, use confidence intervals wherever possible, give p only a minor role, and explain p mainly as indicating where the confidence interval falls in relation to the null hypothesised value.

Many disciplines rely on the p value to draw conclusions, yet p is often misunderstood and poorly used. It is at the heart of research, so it is surprising and disappointing how little it has been studied. We know little about how researchers think and feel about p, and little about how textbooks explain p and how that relates to what researchers do. The very large variation in p over replication is not widely appreciated, or mentioned in textbooks. I discuss these problems of p, and
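The claim that p varies widely over replication, and the suggestion to read p as a statement about where the confidence interval falls relative to the null value, can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the abstract: a normal population with known standard deviation (so a z test applies), a true mean of 0.5, a null value of 0, and 32 observations per study. The names `replicate` and `run` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # about 1.96, for a 95% CI


def replicate(rng, n=32, true_mean=0.5, sigma=1.0, null_value=0.0):
    """Simulate one study and return (p, ci_low, ci_high).

    All parameter values are illustrative assumptions.
    """
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    mean = fmean(sample)
    se = sigma / math.sqrt(n)  # sigma is treated as known (z test)
    z = (mean - null_value) / se
    p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))  # two-sided p value
    return p, mean - Z_CRIT * se, mean + Z_CRIT * se


def run(replications=25, seed=1):
    """Repeat the same study many times with a fixed random seed."""
    rng = random.Random(seed)
    return [replicate(rng) for _ in range(replications)]


if __name__ == "__main__":
    results = run()
    for p, low, high in results:
        print(f"p = {p:.4f}   95% CI [{low:.2f}, {high:.2f}]")
    ps = [p for p, _, _ in results]
    print(f"smallest p = {min(ps):.4f}, largest p = {max(ps):.4f}")
```

Under these assumptions, every replication samples the same population, yet the printed p values differ from study to study, while each p is below .05 exactly when the 95% CI excludes the null value of 0.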

### USING DATA TO MAKE SENSE OF STATISTICS: THE ROLE OF TECHNOLOGY IN SCAFFOLDING UNDERSTANDING

Abstract

Research and classroom experience identify topics with which students in introductory statistics struggle such as interpreting box plots, standard deviation or z-scores and the normal curve. One reason is that many core statistical concepts are subtle and difficult to sort out. Dynamic interactive technology can provide opportunities for learners to begin to make sense of these concepts by enabling them to generate large amounts of data, explore distributions, examine probability models and investigate the nuances that often seem to obscure reasoning and sense making in statistics. Interactive technology allows learners, using real and motivating data that stem from questions about ways of reasoning in statistics, to move between representations, looking for patterns and generating models related to hypotheses and to informed decision making.

### THE USE OF A HIERARCHICAL CONSTRUCT TO INVESTIGATE STUDENTS’ LEARNING OF INFERENTIAL STATISTICS

Abstract

At present, there is still a need for more research on the teaching and learning of inferential statistics, because the literature in this area of statistics education is limited. Moreover, there is continuing evidence of students’ partial or unsuccessful learning of many aspects of inferential statistics. This is one of the concerns addressed in my postgraduate research, part of which involved the development of a hierarchical construct to identify the different levels of students’ learning of inferential statistics. This paper discusses in particular the use of this hierarchical construct to investigate the learning of inferential statistics among students.

### Risk as an Explanatory Factor for Researchers’ Inferential Interpretations (2015 © The Author(s) & Dept. of Mathematical Sciences, The University of Montana)

Abstract

Logical reasoning is crucial in science, but we know that this is not something that humans are innately good at. It becomes even harder to reason logically about data when there is uncertainty, because there is always a chance of being wrong. Dealing with uncertainty is inevitable, for example, in situations in which the evaluation of sample outcomes with respect to some population is required. Inferential statistics is a structured way of reasoning rationally about such data. One could therefore expect that using well-known statistical techniques protects their users against misinterpretations regarding uncertainty. Unfortunately, this does not seem to be the case. Researchers often pretend to be too certain about the presence or absence of an effect, and data are analysed in a selective way, which impacts the validity of conclusions that can be drawn from the techniques that are used. In this paper, the concept of risk is used to explain why unwanted behaviour may not be as unreasonable as it seems, once the risks that researchers face are taken into account.

### Informal Inferential Reasoning: a Computer-based Training Environment

Abstract

The logic behind statistical inference is difficult for students to understand. Recent research in statistics education focuses on learners’ informal and intuitive ideas of inferential reasoning rather than on the mastery of formal mathematical procedures. We introduce a computer-based training environment (a “data game”) to shape intuition for inference in the context of a particular type of statistical decision problem. In change-point detection tasks, one must decide whether a process is running smoothly or is out of control. In our data game, a mechanism produces data sequentially with a likely built-in shift in location at a random time. The task for the students is to detect if and when the mechanism has changed the level of the produced data, as early as possible but without raising false alarms. The data game is embedded in a data analysis environment. We discuss the relation between change-point detection and informal inference, the game itself, and its relation to inferential reasoning.
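The change-point task described above can be sketched in a few lines. This is not the authors' training environment: the generator, its parameter values, and the function names `generate_sequence` and `cusum_alarm` are hypothetical, and the one-sided CUSUM rule is a standard automatic detector swapped in here for the student's judgement, purely as a baseline for comparison.

```python
import random


def generate_sequence(rng, length=100, shift=1.5, shift_probability=0.8):
    """Return (data, change_index); change_index is None if no shift occurs.

    Values are drawn around level 0 before the change and around
    `shift` from change_index onward. All defaults are assumptions.
    """
    change_index = None
    if rng.random() < shift_probability:
        change_index = rng.randrange(10, length - 10)
    data = []
    for t in range(length):
        shifted = change_index is not None and t >= change_index
        data.append(rng.gauss(shift if shifted else 0.0, 1.0))
    return data, change_index


def cusum_alarm(data, reference=0.75, threshold=5.0):
    """One-sided CUSUM for an upward shift.

    Returns the index at which the cumulative sum first exceeds
    `threshold`, or None if no alarm is raised.
    """
    s = 0.0
    for t, x in enumerate(data):
        s = max(0.0, s + x - reference)
        if s > threshold:
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(3)
    data, change_index = generate_sequence(rng)
    alarm = cusum_alarm(data)
    print(f"true change index: {change_index}, alarm raised at: {alarm}")
```

Raising `threshold` makes false alarms rarer but delays detection, which mirrors the trade-off the game asks students to manage: an alarm before the true change index is a false alarm, and an alarm long after it is a late detection.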

### TEACHING HYPOTHESIS TESTING: A NECESSARY CHALLENGE

Abstract

In recent decades, a debate has arisen about the use of hypothesis testing. This has led some teachers to think that confidence intervals and effect sizes need to be taught instead of formal hypothesis testing with p-values. Although we recognise the shortcomings of p-values in statistical inference and the difficulty of really understanding hypothesis tests, we take a different view. We think that it is essential to understand the fundamental principles behind hypothesis testing in order to obtain correct statistical inference by interpreting confidence intervals (and p-values). In our course “Applied Statistics” for graduate students, we designed course material in which we explain the three main approaches to hypothesis testing (Fisher, Neyman-Pearson and Bayesian), using a popular chance game as illustration. In this paper, we briefly present the highlights of the course material, the results of the evaluation of our teaching, and suggestions for extensions.

### Editorial: Beyond the Significance Test Ritual: What Is There?

Abstract

The mindless use of null-hypothesis significance testing – the significance test ritual (e.g., Salsburg, 1985) – has long been criticized. The main component of the ritual can be characterized as follows: Once you have collected your data, try to refute your null hypothesis (e.g., no mean difference, zero correlation, etc.) in an automatized manner. Often the ritual is complemented by the “star procedure”: If p < .05, assign one star to your results (*), if p < .01 give two stars (**), and if p < .001 you have earned yourself three stars (***). If you have obtained at least one star, the ritual has been successfully performed; if not, your results are not worth much. The stars, or the corresponding numerical values, have been door-openers to prestigious psychology journals and, therefore, the ritual has received strong reinforcement. The ritual does not have a firm theoretical grounding; it seems to have arisen as a badly understood hybrid mixture of the approaches of Ronald A. Fisher, Jerzy Neyman, Egon S. Pearson, and (at least in some variations of the