Results 1–10 of 54
Inference by eye: Confidence intervals and how to read pictures of data
American Psychologist, 2005
Cited by 118 (14 self)
Wider use in psychology of confidence intervals (CIs), especially as error bars in figures, is a desirable development. However, psychologists seldom use CIs and may not understand them well. The authors discuss the interpretation of figures with error bars and analyze the relationship between CIs and statistical significance testing. They propose 7 rules of eye to guide the inferential use of figures with error bars. These include general principles: Seek bars that relate directly to effects of interest, be sensitive to experimental design, and interpret the intervals. They also include guidelines for inferential interpretation of the overlap of CIs on independent group means. Wider use of interval estimation in psychology has the potential to improve research communication substantially. Inference by eye is the interpretation of graphically presented data. On first seeing Figure 1, what questions should spring to mind and what inferences are justified? We discuss figures with means and confidence intervals (CIs), and propose rules of eye to guide the interpretation of such figures. We believe it is timely to consider inference by eye because psychologists are now being encouraged to make greater use of CIs. Many who seek reform of psychologists' statistical practices advocate a change in emphasis from null hypothesis significance testing (NHST) to CIs, among other techniques
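The overlap guideline from this paper's rules of eye can be illustrated numerically: for two independent group means with 95% CIs, overlap of roughly half the average margin of error or less corresponds to p ≤ ~.05. The sketch below is only an approximation of that rule (the helper names are ours, and it uses the normal critical value 1.96 rather than a t critical value):

```python
import math
from statistics import mean, stdev

def ci95(sample):
    """Approximate 95% CI for a mean (normal critical value 1.96;
    a t critical value would be more accurate for small n)."""
    m = mean(sample)
    moe = 1.96 * stdev(sample) / math.sqrt(len(sample))
    return m - moe, m + moe, moe

def overlap_suggests_difference(a, b):
    """Rule of eye for two INDEPENDENT group means: CI overlap of at
    most about half the average margin of error suggests p <= ~.05."""
    lo_a, hi_a, moe_a = ci95(a)
    lo_b, hi_b, moe_b = ci95(b)
    overlap = min(hi_a, hi_b) - max(lo_a, lo_b)  # negative if disjoint
    return overlap <= 0.5 * (moe_a + moe_b) / 2
```

Note that the heuristic applies only to independent means; for paired designs the CI on the mean difference itself is the interval to inspect, as the paper's "be sensitive to experimental design" principle warns.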
Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy
Psychological Methods, 2000
Cited by 97 (0 self)
Null hypothesis significance testing (NHST) is arguably the most widely used approach to hypothesis evaluation among behavioral and social scientists. It is also very controversial. A major concern expressed by critics is that such testing is misunderstood by many of those who use it. Several other objections to its use have also been raised. In this article the author reviews and comments on the claimed misunderstandings as well as on other criticisms of the approach, and he notes arguments that have been advanced in support of NHST. Alternatives and supplements to NHST are considered, as are several related recommendations regarding the interpretation of experimental data. The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data. Null hypothesis statistical testing (NHST) is arguably the most widely used method of analysis of data collected in psychological experiments and has been so for about 70 years. One might think that a method that had been embraced by an entire research community would be well understood and noncontroversial after many decades of constant use. However, NHST is very controversial. Criticism of the method, which essentially began with the introduction of the technique (Pearce, 1992), has waxed and waned over the years; it has been intense in the recent past. Apparently, controversy regarding the idea of NHST more generally extends back more than two and a half
What future quantitative social science research could look like: Confidence intervals for effect sizes
Educational Researcher, 2002
Cited by 94 (2 self)
presents a self-canceling mixed message. To present an “encouragement” in the context of strict absolute standards regarding the esoterics of author note placement, pagination, and margins is to send the message, “these myriad requirements count, this encouragement doesn’t.”
The insignificance of statistical significance testing.
Journal of Wildlife Management, 1999
Cited by 92 (0 self)
Abstract: Despite their wide use in scientific journals such as The Journal of Wildlife Management, statistical hypothesis tests add very little value to the products of research. Indeed, they frequently confuse the interpretation of data. This paper describes how statistical hypothesis tests are often viewed, and then contrasts that interpretation with the correct one. I discuss the arbitrariness of P-values, conclusions that the null hypothesis is true, power analysis, and distinctions between statistical and biological significance. Statistical hypothesis testing, in which the null hypothesis about the properties of a population is almost always known a priori to be false, is contrasted with scientific hypothesis testing, which examines a credible null hypothesis about phenomena in nature. More meaningful alternatives are briefly outlined, including estimation and confidence intervals for determining the importance of factors, decision theory for guiding actions in the face of uncertainty, and Bayesian approaches to hypothesis testing and other statistical practices.
Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests
Psychological Methods, 2001
Cited by 37 (0 self)
Null hypothesis statistical testing (NHST) has been debated extensively but always successfully defended. The technical merits of NHST are not disputed in this article. The widespread misuse of NHST has created a human factors problem that this article intends to ameliorate. This article describes an integrated, alternative inferential confidence interval approach to testing for statistical difference, equivalence, and indeterminacy that is algebraically equivalent to standard NHST procedures and therefore exacts the same evidential standard. The combined numeric and graphic tests of statistical difference, equivalence, and indeterminacy are designed to avoid common interpretive problems associated with NHST procedures. Multiple comparisons, power, sample size, test reliability, effect size, and cause-effect ratio are discussed. A section on the proper interpretation of confidence intervals is followed by a decision rule summary and caveats. The longstanding controversy surrounding null hypothesis statistical testing (NHST) has typically been argued on its technical merits, and they are not dis
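The abstract's claim of algebraic equivalence can be sketched: shrinking each mean's interval by a common factor E makes non-overlap of the resulting inferential CIs coincide exactly with a standard two-sample z test at the same critical value. A minimal sketch assuming known standard errors (the function name is ours, not the article's):

```python
import math

def inferential_cis(m1, se1, m2, se2, z=1.96):
    """Tryon-style inferential CIs (illustrative sketch).
    Each interval is shrunk by E = z * sqrt(se1^2 + se2^2) / (se1 + se2),
    so the intervals fail to overlap exactly when
    |m1 - m2| > z * sqrt(se1^2 + se2^2),
    i.e. when a two-sample z test rejects at the same critical value."""
    e = z * math.sqrt(se1 ** 2 + se2 ** 2) / (se1 + se2)
    return (m1 - e * se1, m1 + e * se1), (m2 - e * se2, m2 + e * se2)
```

Non-overlap then signals statistical difference, overlap within an equivalence bound signals equivalence, and anything else is indeterminate, which is the three-way decision structure the abstract describes.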
Bilingualism, biliteracy, and learning to read: interactions among languages and writing systems
Scientific Studies of Reading, 2005
Cited by 26 (3 self)
Four groups of children in first grade were compared on early literacy tasks. Children in three of the groups were bilingual, each group representing a different combination of language and writing system, and children in the fourth group were monolingual speakers of English. All the bilingual children used both languages daily and were learning to read in both languages. The children solved decoding and phonological awareness tasks, and the bilinguals completed all tasks in both languages. Initial differences between the groups in factors that contribute to early literacy were controlled in an analysis of covariance, and the results showed a general increment in reading ability for all the bilingual children but a larger advantage for children learning two alphabetic systems. Similarly, bilinguals transferred literacy skills across languages only when both languages were written in the same system. Therefore, the extent of the bilingual facilitation for early reading depends on the relation between the two languages and writing systems. Learning to read is indisputably the premier academic achievement of early schooling. It prepares children for their educational futures and is the key to the possibilities that their futures hold for them. Thus, if knowing two languages at the time that literacy is introduced, or learning to read in a language that is not the child’s dominant one, or acquiring literacy simultaneously in two languages affects the outcome of literacy instruction, then it would be important to know that. These possibilities affect a sizable portion of the world’s children: A significant number are bilingual at the time they begin reading, many are instructed in a language they do not speak at home, and some number of those are expected to acquire this skill in two languages. Requests for reprints should be sent to Ellen Bialystok, Department of Psychology, York University,
Sample Size for Multiple Regression: Obtaining Regression Coefficients That Are Accurate, Not Simply Significant
Cited by 26 (8 self)
An approach to sample size planning for multiple regression is presented that emphasizes accuracy in parameter estimation (AIPE). The AIPE approach yields precise estimates of population parameters by providing necessary sample sizes in order for the likely widths of confidence intervals to be sufficiently narrow. One AIPE method yields a sample size such that the expected width of the confidence interval around the standardized population regression coefficient is equal to the width specified. An enhanced formulation ensures, with some stipulated probability, that the width of the confidence interval will be no larger than the width specified. Issues involving standardized regression coefficients and random predictors are discussed, as are the philosophical differences between AIPE and the power analytic approaches to sample size planning. Sample size estimation from a power analytic perspective is often performed by mindful researchers in order to have a reasonable probability of obtaining parameter estimates that are statistically significant. In general, the social sciences have slowly become more aware of the problems associated with underpowered studies and their corresponding Type II errors, which can yield misleading results in a given
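The AIPE idea can be sketched as a search for the smallest n whose expected CI width meets a target. This is an illustrative simplification, not the article's exact procedure: it assumes a large-sample normal critical value and a common SE approximation for a standardized coefficient, with r2_full the squared multiple correlation of the model and r2_xj that of predictor j regressed on the other predictors (all parameter names are ours):

```python
import math

def aipe_n(desired_width, r2_full, r2_xj, p, z=1.96):
    """Smallest n whose EXPECTED 95% CI for a standardized regression
    coefficient is no wider than desired_width (illustrative only).
    Uses SE(beta_j) ~= sqrt((1 - r2_full) / ((1 - r2_xj) * (n - p - 1))),
    where p is the number of predictors."""
    n = p + 2  # smallest n with positive error degrees of freedom
    while 2 * z * math.sqrt((1 - r2_full) / ((1 - r2_xj) * (n - p - 1))) > desired_width:
        n += 1
    return n
```

Because expected width only meets the target on average, the article's enhanced formulation adds sample size so the realized width is narrow enough with a stipulated probability; that refinement is omitted here.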
Methods for the Behavioral, Educational, and Social Sciences (MBESS) [Computer software and manual]. Retrievable from www.cran.r-project.org, 2007
Cited by 22 (8 self)
package for R (R Development Core Team, 2007b), an open source statistical programming language and environment. MBESS implements methods that are not widely available elsewhere, yet are especially helpful for the idiosyncratic techniques used within the behavioral, educational, and social sciences. The major categories of functions are those that relate to confidence interval formation for noncentral t, F, and χ2 parameters, confidence intervals for standardized effect sizes (which require noncentral distributions), and sample size planning issues from the power analytic and accuracy in parameter estimation perspectives. In addition, MBESS contains collections of other functions that should be helpful to substantive researchers and methodologists. MBESS is a long-term project that will continue to be updated and expanded so that important methods can continue to be made available to researchers in the behavioral, educational, and social sciences. R is an open source statistical programming language and environment for (essentially) all operating systems that has gained a widespread following in quantitative disciplines (R Development Core Team, 2007b). This following is perhaps most prevalent in the statistical sciences, where many published works now provide R routines
Problems with Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say?
The Journal of Experimental Education, 2002
Cited by 19 (0 self)
ABSTRACT. The first of 3 objectives in this study was to address the major problem with Null Hypothesis Significance Testing (NHST) and 2 common misconceptions related to NHST that cause confusion for students and researchers. The misconceptions are (a) a smaller p indicates a stronger relationship and (b) statistical significance indicates practical importance. The second objective was to determine how this problem and the misconceptions were treated in 12 recent textbooks used in education research methods and statistics classes. The third objective was to examine how the textbooks' presentations relate to current best practices and how much help they provide for students. The results show that almost all of the textbooks fail to acknowledge that there is controversy surrounding NHST. Most of the textbooks dealt, at least minimally, with the alleged misconceptions of interest, but they provided relatively little help for students. Key words: effect size, NHST, practical importance, research and statistics textbooks THERE HAS BEEN AN INCREASE in resistance to null hypothesis significance testing (NHST) in the social sciences during recent years. The intensity of these objections to NHST has increased, especially within the disciplines of psy
Measuring Progress towards a Goal: Estimating Teacher Productivity using a Multivariate Multilevel Model for Value-Added Analysis
Sociological Methods and Research, 2001
Cited by 17 (4 self)
This paper develops a procedure for measuring how much is gained, and at what precision, by students in a pretest and posttest situation against a target score on the posttest. We define our productivity index, M_j, for teacher j as the ratio of estimated gains to an estimated standard that is the distance between an estimate of the pretest score and the target score. Using language, mathematics, and reading scores on the SAT 9 for 1999 and 2000 from 75 public elementary classrooms (grades 3, 4, 5, and 6 in 2000), we employ a Bayesian implementation of a multivariate mixed model for repeated test scores from individual students who in turn are nested within teachers. Our analyses point to statistically significant gains on the whole for grades 3, 4, and 6. The strength of the approach lies in a straightforward estimation of the productivity index. Using the simulated sampling distribution of the posterior mean of the productivity index, we introduce a fuller depiction of progress in the productivity curve, or productivity profile, by calculating the probability that the index exceeds set proportions of the estimated standard. The basic model employed in this study thus contributes three essential components for sound accountability decisions. First, it estimates correlated measurement errors when using multiple measures. In doing so, we take full advantage of the informational redundancy in the measures. Second, it estimates initial status and value-added gains simultaneously. Lastly, it proposes a productivity index along with new procedures for representing the uncertainty in individual productivity estimates in the form of a productivity profile. This approach also facilitates a Bayesian effect-size analysis free from frequentist appeals to noncentral t or F d...
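Stripped of the multilevel Bayesian machinery, the productivity index reduces to a ratio of gain to the distance remaining to the target. The function below is our simplified scalar illustration of that definition, not the paper's estimator:

```python
def productivity_index(pretest_mean, posttest_mean, target):
    """Simplified scalar version of the paper's M_j (function name is
    ours): estimated gain divided by the distance from the pretest
    estimate to the target score. A value of 1 means the class
    reached the target; values between 0 and 1 mean partial progress."""
    return (posttest_mean - pretest_mean) / (target - pretest_mean)
```

The paper's productivity profile extends this by computing, from the posterior distribution of M_j, the probability that the index exceeds each of several set proportions of the estimated standard.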