Accordingly, our critique of Wagenmakers et al.’s (2011) analysis is that their choice of H1 was unrealistic. In particular, they assumed that we have no prior knowledge of the likely effect sizes that the experiments were explicitly designed to detect. As Utts et al. (2010) argued,
It is rare that we have no information about a situation before we collect data. If we want to estimate the proportion of a community that is infected with HIV, do we really believe it is equally likely to be anything from 0 to 1? If we want to estimate the mean change in blood pressure after 10 weeks of meditation, do we really believe it could be anything from −∞ to ∞? Even the choice of what hypotheses to test, and whether to make them one-sided or two-sided is an illustration of using prior knowledge. (p. 2)
In general, we know that effect sizes in psychology typically fall in the range of 0.2 to 0.3. A survey of “one hundred years of social psychology” that cataloged 25,000 studies of eight million people yielded a mean effect size (r) of .21 (Richard, Bond, & Stokes-Zoota, 2003). An example relevant to Bem’s (2011) retroactive habituation experiments is Bornstein’s (1989) meta-analysis of 208 mere exposure studies, which yielded an effect size (r) of .26.
We even have some knowledge about previous psi experiments. The Bayesian meta-analysis of 56 telepathy studies, cited above, revealed a Cohen’s h effect size of approximately 0.18 (Utts et al., 2010), and the meta-analysis of 38 presentiment studies, also cited above, yielded a mean effect size of 0.28 (Mossbridge et al., 2011).
Consequently, no reasonable observer would ever expect effect sizes in laboratory psi experiments to be greater than 0.8, which Cohen (1988) terms a large effect. Cohen noted that even a medium effect of 0.5 “is large enough to be visible to the naked eye” (p. 26). Yet the prior distribution for H1 that Wagenmakers et al. (2011) adopted places a probability of .57 on effect sizes that equal or exceed 0.8. It even places a probability of .06 on effect sizes exceeding 10. If effect sizes were really that large, there would be no debate about the reality of psi. Thus, the prior distribution Wagenmakers et al. placed on the possible effect sizes under H1 is wildly unrealistic.
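The two tail probabilities quoted above can be checked with a few lines of Python. This is only a sketch under an assumption the passage itself does not spell out: that the H1 prior on the standardized effect size is a Cauchy distribution centered at 0 with scale 1, and that "effect sizes" refers to absolute values. The function name `cauchy_two_sided_tail` is made up for this illustration.

```python
import math

def cauchy_two_sided_tail(threshold, scale=1.0):
    """P(|delta| >= threshold) when delta ~ Cauchy(0, scale).

    Uses the Cauchy CDF, F(x) = 1/2 + atan(x / scale) / pi,
    so the two-sided tail is 1 - (2 / pi) * atan(threshold / scale).
    """
    return 1.0 - (2.0 / math.pi) * math.atan(threshold / scale)

# Assumed prior: Cauchy(0, 1) on the standardized effect size.
p_large = cauchy_two_sided_tail(0.8)   # Cohen's "large" effect or beyond
p_huge = cauchy_two_sided_tail(10.0)   # absurdly large effects

print(f"P(|delta| >= 0.8) = {p_large:.2f}")
print(f"P(|delta| >  10)  = {p_huge:.2f}")
```

Under that assumed prior the two values round to .57 and .06, matching the figures in the text.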
Their unfortunate choice has major consequences for their conclusions about Bem’s data. Whenever the null hypothesis is sharply defined but the prior distribution on the alternative hypothesis is diffused over a wide range of values, as it is in the distribution adopted by Wagenmakers et al. (2011), it boosts the probability that any observed data will be higher under the null hypothesis than under the alternative. This is known as the Lindley–Jeffreys paradox: A frequentist analysis that yields strong evidence in support of the experimental hypothesis can be contradicted by a misguided Bayesian analysis that concludes that the same data are more likely under the null. Christensen, Johnson, Branscum, and Hanson (2011) discussed an analysis comparable to that of Wagenmakers et al., noting that “the moral of the Lindley–Jeffreys paradox is that if you pick a stupid prior, you can get a stupid posterior” (p. 60).
[/QUOTE]
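To make the Lindley–Jeffreys paradox described in the quoted passage concrete, here is a minimal sketch. It is not the analysis from either paper. For simplicity it swaps in a normal prior on the effect size under H1 (so the marginal likelihood has a closed form) rather than the prior Wagenmakers et al. used, and the data (`d_obs = 0.25`, `n = 100`) and both prior widths are hypothetical numbers chosen purely for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of Normal(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def bf01(d_obs, se, prior_sd):
    """Bayes factor for H0 (delta = 0) over H1 (delta ~ Normal(0, prior_sd^2)).

    Model: d_obs ~ Normal(delta, se^2). Integrating delta out under H1
    gives d_obs ~ Normal(0, se^2 + prior_sd^2).
    """
    like_h0 = normal_pdf(d_obs, 0.0, se)
    marg_h1 = normal_pdf(d_obs, 0.0, math.sqrt(se ** 2 + prior_sd ** 2))
    return like_h0 / marg_h1

# Hypothetical data: observed standardized effect 0.25 from n = 100.
d_obs, n = 0.25, 100
se = 1.0 / math.sqrt(n)

# Frequentist two-sided p value for the same data (z test).
z = d_obs / se
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical prior widths: one matched to typical effect sizes,
# one spread over an enormous range.
bf01_informed = bf01(d_obs, se, prior_sd=0.3)
bf01_diffuse = bf01(d_obs, se, prior_sd=10.0)

print(f"two-sided p           = {p_two_sided:.4f}")
print(f"BF01, prior sd = 0.3  = {bf01_informed:.2f}")
print(f"BF01, prior sd = 10   = {bf01_diffuse:.2f}")
```

With these made-up numbers the z test gives p of about .012, the prior with standard deviation 0.3 gives a BF01 of about 0.19 (the data favor H1 by roughly 5 to 1), and the prior with standard deviation 10 gives a BF01 of about 4.4 (the same data now favor the null). Nothing changed except how widely the H1 prior was spread.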