I am thrilled to report that, after considerable discussion, the University of Amsterdam has agreed to ban the teaching of p-values for first-year students at the Faculty of Social and Behavioural Sciences. The new policy will take effect at the start of the new academic year, and its introduction was facilitated by the fact that I am taking over from Ingmar Visser as Director of Teaching at the Psychology Department starting September 1st 2024.
I have discussed the drawbacks of p-values many times on this blog. For students, the main problem with the p-value is that it is a difficult measure to understand, since it does not map onto anything that a reasonable person would wish to learn. In the immortal words of Harold Jeffreys (1939, p. 316; 1961, p. 385):
What the use of P implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred. This seems a remarkable procedure.
And Sellke, Bayarri, and Berger (2001, p. 71) conclude:
(…) for testing “precise” hypotheses, p values should not be used directly, because they are too easily misinterpreted. The standard approach in teaching—of stressing the formal definition of a p value while warning against its misinterpretation—has simply been an abysmal failure.
My own view on this: p-values can be useful in the sense that they correlate with evidence (i.e., the degree to which the data change the plausibility of the hypotheses under consideration). So if you test 50 participants and find p = .00001, then the data will generally offer evidence against H0, and it will usually be the case that H0 is now less plausible than H1. But p-values are not the same as evidence or plausibility, in the same way that height is not the same as weight. From an educational perspective, the only excuse for teaching students the p-value is that it is used in academic research. Not only is this a poor argument, but it also perpetuates a vicious cycle in which students who are taught a dysfunctional methodology later go on to apply that methodology in their research, which further bolsters the argument that the method ought to be taught because it is popular. There exist helpful and straightforward calibrations of p-values (e.g., Sellke et al., 2001; Edwards, Lindman, & Savage, 1963; Wagenmakers, 2022), and these would make for a perfect educational replacement.
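To give a flavor of what such a calibration looks like, here is a minimal Python sketch of the bound from Sellke et al. (2001): for p < 1/e, the Bayes factor in favor of H0 can be no smaller than -e · p · ln(p). The function names and the default prior probability of 0.5 for H0 are my own illustrative choices, not something prescribed by the paper, and the result is a bound on the evidence rather than the evidence itself.

```python
import math


def sellke_bound(p):
    """Lower bound on the Bayes factor BF01 (evidence for H0 over H1).

    Uses the -e * p * ln(p) calibration, which applies only for 0 < p < 1/e.
    """
    if not 0 < p < 1 / math.e:
        raise ValueError("the calibration applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def max_bf10(p):
    """Upper bound on the Bayes factor BF10 (evidence for H1 over H0)."""
    return 1 / sellke_bound(p)


def min_posterior_h0(p, prior_h0=0.5):
    """Lower bound on the posterior probability of H0.

    prior_h0 is an assumed prior probability for H0 (illustrative default).
    """
    prior_odds = prior_h0 / (1 - prior_h0)
    posterior_odds = sellke_bound(p) * prior_odds
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001, 0.00001):
        print(
            f"p = {p:<8g} BF10 at most {max_bf10(p):10.2f}   "
            f"P(H0 | data) at least {min_posterior_h0(p):.4f}"
        )
```

With equal prior probabilities, p = .05 corresponds to a Bayes factor of at most about 2.5 against H0, so H0 retains a posterior probability of at least roughly .29. This is the kind of number a student can actually interpret.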
The statistical arguments against the use of p-values are numerous and compelling (for an overview see Wagenmakers et al., 2018; quick summary for rushed readers: p-values do not quantify evidence, p-values violate the likelihood principle, p-values cannot be monitored as the data accumulate, p-values do not take into account the predictions from the alternative hypothesis, p-values cannot discriminate between absence of evidence and evidence of absence, etc.). One might say that the p-value has been severely tested (van Dongen et al., 2023) and found wanting, again and again. Now we need to step up and act: it is time that we reject not the null hypothesis, but the p-value itself. I am excited that the University of Amsterdam is leading the charge, and I am hopeful that other universities will soon follow suit.
References
Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193-242.
Jeffreys, H. (1939/1961). Theory of Probability (1st/3rd ed.). Oxford: Oxford University Press.
Sellke, T., Bayarri, M. J., & Berger, J. O. (2001). Calibration of p values for testing precise null hypotheses. The American Statistician, 55, 62-71.
van Dongen, N., Wagenmakers, E.-J., & Sprenger, J. (2023). A Bayesian perspective on severity: Risky predictions and specific hypotheses. Psychonomic Bulletin & Review, 30, 516-533.
Wagenmakers, E.-J., Marsman, M., Jamil, T., Ly, A., Verhagen, A. J., Love, J., Selker, R., Gronau, Q. F., Šmíra, M., Epskamp, S., Matzke, D., Rouder, J. N., & Morey, R. D. (2018). Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications. Psychonomic Bulletin & Review, 25, 35-57.
Wagenmakers, E.-J. (2022). Approximate objective Bayes factors from p-values and sample size: The 3p√n rule. Manuscript submitted for publication.