Rationale and Origin of the One-Sided Bayes Factor Hypothesis Test

The default Bayes factor hypothesis test compares the predictive performance of two rival models, the point-null hypothesis \mathcal{H}_0: \delta = 0 and the alternative hypothesis \mathcal{H}_1: \delta \sim f(\cdot). In the case of the t-test, a popular choice for the prior distribution f is a Cauchy centered on zero with scale parameter 0.707 (for informed alternatives see Gronau et al., in press).

One of the problems with such default prior distributions is that they are not directional; in frequentist lingo, they instantiate a two-sided test. Two-sided tests are sometimes useful, but the theories that inspire experimental work generally come with strong directional predictions. In fact, a directional prediction is often all that a ‘theory’ provides. For example, the facial feedback hypothesis states that people who hold a pen between their teeth find cartoons funnier (not less funny) than people who hold a pen with their lips (e.g., Strack et al., 1988; but see Wagenmakers et al., 2016). Similarly, arguments have been put forward as to why lonely people take showers that are hotter (not colder) than people who are not lonely (e.g., Bargh & Shalev, 2012; but see Donnellan et al., 2015).

Consequently, when the purpose is to assess the relative predictive performance of \mathcal{H}_1, a prior distribution that is symmetric around zero misrepresents the fundamental directional nature of \mathcal{H}_1. To respect the fact that \mathcal{H}_1 makes a directional prediction, the easiest approach is to use a folded version of the two-sided prior distribution, in which all mass from the anomalous direction is relocated to the direction predicted by the hypothesis under scrutiny.
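The folding operation can be sketched in a few lines of Python. This is a minimal illustration, assuming the default zero-centered Cauchy prior with scale 0.707 and a positive predicted direction; the function names are hypothetical and not taken from any package.

```python
import math

def cauchy_pdf(delta, scale=0.707):
    """Density of a zero-centered Cauchy prior on effect size delta."""
    return 1.0 / (math.pi * scale * (1.0 + (delta / scale) ** 2))

def folded_cauchy_pdf(delta, scale=0.707):
    """Prior under H+: the mass from delta < 0 is relocated to delta >= 0."""
    if delta < 0:
        return 0.0  # the anomalous direction receives no prior mass
    return 2.0 * cauchy_pdf(delta, scale)  # doubled so the density still integrates to 1

def folded_cauchy_cdf(delta, scale=0.707):
    """CDF of the folded prior, using the closed-form Cauchy CDF (arctangent)."""
    if delta < 0:
        return 0.0
    return (2.0 / math.pi) * math.atan(delta / scale)

# Compare the two-sided and the folded density at a few effect sizes.
for d in (-0.5, 0.0, 0.5, 1.0):
    print(d, cauchy_pdf(d), folded_cauchy_pdf(d))
```

Because the two-sided prior is symmetric, folding simply doubles the density in the predicted direction and sets it to zero elsewhere. One consequence is that the median of the folded prior equals the scale parameter.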

By eliminating the anomalous predictions, \mathcal{H}_+ (the alternative hypothesis when the predicted direction is positive) makes more specific claims, and this enables the same data to provide more diagnostic evidence. Examples of the properties and benefits of one-sided Bayes factor tests are provided in, for instance, Wetzels et al. (2009), Wagenmakers et al. (2010), and Wagenmakers et al. (2016).

I’ve always found the one-sided versions of the Bayes factor hypothesis test highly appropriate for hypotheses in experimental psychology, and I was happy to see this intuition validated by our Bayesian hero Harold Jeffreys. In Theory of Probability, Jeffreys mentions the one-sided test in the third edition (1961) on page 283 (and also in Jeffreys, 1948, p. 256; but not, it seems, in the original 1939 edition). Specifically, in section 5.43, “Test of whether a standard error has a suggested value \sigma_0”, Jeffreys writes:

“But where there is a predicted standard error the type of disturbance chiefly to be considered is one that will make the actual one larger [italics mine], and verification is desirable before the predicted value is accepted. Hence we consider also the case where \zeta is restricted to be non-negative [italics mine]. The result is to change \sqrt{2} in (8) to 2\sqrt{2} and make the lower limit 0 [this is the folding process — EJ]. The approximations now fall into three types according as \zeta = z lies well within the range of integration, well outside it, or near 0”.

Jeffreys then elaborates: when the unrestricted posterior is symmetric around zero, the one-sided Bayes factor equals the two-sided Bayes factor; when the unrestricted posterior is located almost entirely in the restricted region, the one-sided Bayes factor against the null hypothesis is twice as high as the two-sided Bayes factor (the alternative hypothesis predicts better as its anomalous predictions have been removed); and when the unrestricted posterior is located outside the restricted region, the one-sided Bayes factor strongly supports the null (but also raises doubts about the data and/or the models). These three cases are also the ones discussed here and here.
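These three cases follow from a standard identity for order-restricted priors: the one-sided Bayes factor equals the two-sided Bayes factor multiplied by the ratio of posterior to prior mass in the predicted direction, with both masses computed under the unrestricted \mathcal{H}_1. The identity is supplied here as background rather than quoted from Jeffreys, and the function name and example numbers in the sketch below are hypothetical.

```python
def one_sided_bf(bf10, posterior_mass_predicted, prior_mass_predicted=0.5):
    """One-sided Bayes factor against the null (BF_{+0}).

    bf10: two-sided Bayes factor for H1 over H0.
    posterior_mass_predicted: posterior probability, under the unrestricted H1,
        that the effect lies in the predicted direction.
    prior_mass_predicted: the same probability under the prior; it is 0.5
        for any prior that is symmetric around zero.
    """
    return bf10 * posterior_mass_predicted / prior_mass_predicted

bf10 = 3.0  # illustrative two-sided Bayes factor, not taken from real data

# Case 1: posterior symmetric around zero, so half its mass is in the predicted direction.
print(one_sided_bf(bf10, 0.5))

# Case 2: posterior almost entirely in the predicted direction.
print(one_sided_bf(bf10, 0.999))

# Case 3: posterior almost entirely in the anomalous direction.
print(one_sided_bf(bf10, 0.001))
```

With a symmetric prior, the multiplier ranges from 0 to 2, which is why the one-sided test can at most double the evidence against the null but can shift the evidence toward the null without bound.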

References

Bargh, J. A., & Shalev, I. (2012). The substitutability of physical and social warmth in daily life. Emotion, 12, 154-162.

Donnellan, M. B., Lucas, R. E., & Cesario, J. (2015). On the association between loneliness and bathing habits: Nine replications of Bargh and Shalev (2012) Study 1. Emotion, 15, 109-119.

Gronau, Q. F., Ly, A., & Wagenmakers, E.-J. (in press). Informed Bayesian t-tests. The American Statistician. ArXiv: https://arxiv.org/abs/1704.02479

Jeffreys, H. (1948). Theory of Probability (2nd ed.). Oxford: Oxford University Press.

Strack, F., Martin, L. L., & Stepper, S. (1988). Inhibiting and facilitating conditions of the human smile: A nonobtrusive test of the facial feedback hypothesis. Journal of Personality and Social Psychology, 54, 768-777.

Wetzels, R., Raaijmakers, J. G. W., Jakab, E., & Wagenmakers, E.-J. (2009). How to quantify support for and against the null hypothesis: A flexible WinBUGS implementation of a default Bayesian t test. Psychonomic Bulletin & Review, 16, 752-760.

Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage-Dickey method. Cognitive Psychology, 60, 158-189.

Wagenmakers, E.-J., Verhagen, A. J., & Ly, A. (2016). How to quantify the evidence for the absence of a correlation. Behavior Research Methods, 48, 413-426.

Wagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., et al. (2016). Registered Replication Report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11, 917-928.

About The Author

Eric-Jan Wagenmakers

Eric-Jan (EJ) Wagenmakers is professor at the Psychological Methods Group at the University of Amsterdam.