
CerCo guidelines for data analysis and statistics usage

Foreword

The goal of this page is to propose practical guidelines and to help develop new statistical standards and methods for use in cognitive neuroscience and, more generally, in psychology and biomedical research. Its goal is certainly not to propose recipes or to tell researchers how they should work, but to clarify important issues about the misuse of statistical tools, which has devastating consequences for scientific progress. We hope it will serve as a reference page for researchers and students at CerCo when writing papers and presenting results, as well as for researchers visiting the lab. This page will be indefinitely “under construction”, so please visit it from time to time. You are welcome to write to us with suggestions, improvements, discussion or even objections.

Main statement: A ban on p-values and ‘statistical significance’. Prefer confidence intervals.

For decades we have been teaching students Null Hypothesis Significance Testing (NHST) to analyze their data. Yet these tools have been criticized since their inception (Meehl 1967), because they are only appropriate for making decisions relative to a null hypothesis (as in manufacturing), not for making inferences about behavioral and neuronal processes (Kline 2004). But the time seems ripe for a radical change. In neuroscience, Ioannidis and colleagues revealed that most published studies were severely underpowered (Button et al. 2013, Nature Reviews Neuroscience 14) and, therefore, basically useless. Following Psychological Science, which now embraces the “New Statistics” defended by Cumming (2013), CerCo encourages our community to publish Confidence Intervals (CI) of effect sizes instead of p-values. Such a small change of practice may foster revolutionary changes in the sociology of our science.

The mathematics is the same: whenever p ≥ 0.05, the 95% CI of the effect size includes zero. But the story told by one presentation or the other differs a lot. In many studies, CIs are “embarrassingly large” (Cohen 1994) and, if shown, would have prevented publication. Precision is what we should strive for, not ill-named statistical “significance”. Precise inference or estimation may often require much more data, and therefore lead to fewer publications. On the other hand, insisting on precision erases the difference between formerly “null” and “significant” results, and should help fight publication bias, the file-drawer problem and even fraud.

In 1998, Robert Matthews wrote in the Sunday Telegraph: “The plain fact is that 70 years ago Ronald Fisher gave scientists a mathematical machine for turning baloney into breakthroughs, and flukes into funding. It is time to pull the plug.” CerCo wants to put an end to “our generations-long obsession with p values and the statistical buffoonery” (Lambdin, 2012).
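To make this duality between p-values and confidence intervals concrete, here is a minimal sketch in Python (an illustration of ours, not part of the original guidelines; the simulated data and variable names are arbitrary). It runs a two-sided one-sample t-test and computes the matching 95% CI of the mean, then checks that p < 0.05 exactly when the interval excludes zero:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.3, scale=1.0, size=20)  # e.g., per-subject effect measures

    # Classical two-sided one-sample t-test against zero.
    t, p = stats.ttest_1samp(x, popmean=0.0)

    # 95% confidence interval of the mean, built from the same t distribution.
    m, sem = x.mean(), stats.sem(x)
    ci_lo, ci_hi = stats.t.interval(0.95, df=len(x) - 1, loc=m, scale=sem)

    print(f"mean = {m:.3f}, 95% CI = [{ci_lo:.3f}, {ci_hi:.3f}], p = {p:.3f}")

    # Same mathematics, different story: p < 0.05 exactly when the CI excludes 0.
    assert (p < 0.05) == (not (ci_lo <= 0.0 <= ci_hi))

Reporting the interval rather than the verdict shows at a glance how precise (or embarrassingly imprecise) the estimate actually is.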

Must-read short list (papers that no scientist is supposed to ignore)

  • Meehl PE (1967) Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science 34: 103-115

  • Cohen J (1994) The earth is round (p < .05). American Psychologist 49: 997-1003

  • Kline RB (2004) Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research. Washington, DC: APA Books

  • Lambdin C (2012) Significance tests as sorcery: Science is empirical – significance tests are not. Theory & Psychology 22: 67-90

  • Cumming G (2013) The new statistics: why and how. Psychological Science 25: 7-29

See also the videos of Cumming’s pedagogical tutorial at the 2014 APS Annual Convention in San Francisco.

This page is written on behalf of CerCo and moderated by a CerCo scientific committee led by Jean-Michel Hupé and including Robin Baures, Benoit Cottereau and Muriel Mescam. It was first published in January 2015.
Please send messages to jean-michel.hupe@cerco.ups-tlse.fr.



Update (May 10th, 2016)

Authoritative criticism of p-values by the American Statistical Association

(Excerpt from Lauren Richardson in the PLOS Biologue, March 16, 2016):

“For the first time in its 177-year history, the American Statistical Association (ASA) has voiced its opinion and made specific recommendations for a statistical practice. The subject of their ire? The (arguably) most common statistical output, the p-value. The p-value has long been the primary metric for demonstrating that study results are “statistically significant,” usually by achieving the semi-arbitrary value of p<0.05. However, the ASA notes, the importance of the p-value has been greatly overstated and the scientific community has become over-reliant on this one – flawed – measure.”

Ronald L. Wasserstein & Nicole A. Lazar (2016) The ASA’s statement on p-values: context, process, and purpose. The American Statistician, DOI: 10.1080/00031305.2016.1154108.

What about multiple comparisons?

The ASA indicated that they could not reach a consensus about the problem of multiple comparisons. This problem is especially critical for MRI and EEG data. For MRI, Jean-Michel Hupé discussed this problem here, together with many other (unresolved) problems of MRI analysis.
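To see why this matters at the scale of brain imaging, here is a small simulation in Python (ours, not from the ASA statement; the numbers are arbitrary): with thousands of tests run on pure noise, an uncorrected p < 0.05 threshold yields hundreds of false positives, while a family-wise correction such as Bonferroni removes nearly all of them, at the cost of power:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # 10,000 "voxels" measured in 20 subjects, all pure noise:
    # every null hypothesis is true by construction.
    n_tests, n_subjects = 10_000, 20
    data = rng.normal(size=(n_tests, n_subjects))

    # One one-sample t-test per voxel against zero.
    _, p = stats.ttest_1samp(data, popmean=0.0, axis=1)

    print("false positives, uncorrected p < 0.05:", int((p < 0.05).sum()))            # ~500
    print("false positives, Bonferroni threshold:", int((p < 0.05 / n_tests).sum()))  # ~0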

Should we go Bayesian?

People interested in the answer should probably read this book by Kruschke. There won’t be any universal recipe, of course, but our impression so far is that Bayesian “Highest Density Intervals” (HDIs) may be a necessary alternative to frequentist “Confidence Intervals” only for complex analyses, when we cannot find any tractable statistical model. For most of “us”, users of t-tests and ANOVAs, CIs and HDIs perform just the same, as long as the conditions of validity are carefully verified, of course. The advantage of CIs, in our opinion, is that they encourage us to think of the data as “one possible sample among many others”, and therefore encourage replications and meta-analyses, as advocated by Cumming. The other practical advantage is that we can keep using our familiar statistical tools. Bayes factors, on the other hand, provide a valuable alternative only when you have real prior information, which is rare in “our” field, unlike in clinical studies.
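As a sanity check of the claim that CIs and HDIs agree in simple cases, here is a short Python sketch (ours; the choice of a non-informative Jeffreys prior is an explicit assumption): for a normal sample under that prior, the posterior of the mean is a scaled, shifted Student t, so a 95% HDI computed from posterior samples lands essentially on the frequentist 95% CI:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(loc=0.3, scale=1.0, size=20)  # simulated measurements
    n, m, sem = len(x), x.mean(), stats.sem(x)

    # Frequentist 95% confidence interval of the mean.
    ci = stats.t.interval(0.95, df=n - 1, loc=m, scale=sem)

    # Posterior of the mean under the Jeffreys prior for a normal with
    # unknown variance: m + sem * t(n - 1). Draw samples and sort them.
    post = np.sort(m + sem * rng.standard_t(df=n - 1, size=200_000))

    # 95% HDI from samples: the narrowest window containing 95% of them.
    k = int(0.95 * post.size)
    i = np.argmin(post[k:] - post[:-k])
    hdi = (post[i], post[i + k])

    print("95% CI :", np.round(ci, 3))
    print("95% HDI:", np.round(hdi, 3))  # numerically almost identical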

Last updated: May 11th, 2016