Was Bem's "Feeling the Future" paper exploratory?

But I don't think that is a fair criticism.

To clarify, my point was not meant as a criticism of Bem himself or his paper. Bem was following the analytical standards of experimental psychology. Unfortunately, those standards allow for, indeed encourage, researcher degrees of freedom, which invalidate most published research in the field. Francis (2013a) found excess success in 82% of multi-experiment papers in Psychological Science, and Francis et al. (2014) found excess success in 83% of multi-experiment psychology papers in Science. What Francis, I, and others are criticizing are the accepted standards themselves, not the practitioners who follow them.

If there were a predetermined hypothesis, then the p-value for that hypothesis would be valid, whatever additional exploratory comparisons were made using the same data.

That's true: if there were a specific predetermined hypothesis that admitted a single statistical test, then the p-value for that test would be valid, but only for that test. The problem is that each experiment is also testing an overarching, more general hypothesis for which many tests could be conducted. If the investigator would conduct these other tests whenever his "main" hypothesis test was not significant, and would claim that the overarching hypothesis was supported if any one of these tests came out significant, then none of these p-values (not even the "main" one) would be a valid p-value for the experiment. This would be true even if each p-value were valid for its own test individually.

To put it another way, a valid p-value for the whole experiment would have to be calculated by taking into account how many options the investigators had, and which options they would take, depending on the outcome of their "main" test. This would likely be an impossible calculation to perform, because I doubt that experimenters themselves know ahead of time what they would do. This is why predetermined analysis protocols, which I suspect have rarely been employed in experimental psychology, are important.

And every experiment conducted using both male and female subjects has the potential for an additional exploratory male-female comparison.

I agree that exploratory research is important, but investigators need to clearly separate their confirmatory hypotheses from their exploratory ones. If researchers would substitute an exploratory or secondary hypothesis for their main hypothesis whenever the main hypothesis test was non-significant, then they inflate the Type I error rate for the experiment, and no p-value from any of the tests will be a valid p-value for the experiment. It's really just a multiple comparison problem. What is subtle is that the problem exists even if the main hypothesis test is significant and none of the secondary tests are actually performed. The Type I error rate is a long-run probability; hence a valid p-value for the experiment would have to take into account what would have been done had the main test been non-significant (even when the main test is significant).
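To make the long-run point concrete, here is a minimal simulation sketch (my own toy example, nothing to do with Bem's actual data) of a researcher who claims support for the overarching hypothesis if either the main test or a fallback secondary test comes out significant, even though both true effects are zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_subjects, n_experiments = 0.05, 50, 20_000

def one_sided_p(x):
    """One-sided, one-sample t-test of mean > 0."""
    t = x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))
    return stats.t.sf(t, df=len(x) - 1)

false_claims = 0
for _ in range(n_experiments):
    main = rng.normal(0.0, 1.0, n_subjects)       # "main" measure, true effect is zero
    fallback = rng.normal(0.0, 1.0, n_subjects)   # secondary measure, true effect is zero
    # The overarching hypothesis is "supported" if the main test succeeds,
    # or if it fails and the fallback test succeeds instead.
    if one_sided_p(main) < alpha or one_sided_p(fallback) < alpha:
        false_claims += 1

print(false_claims / n_experiments)   # roughly 0.10, about double the nominal 0.05
```

The experimentwise error rate is inflated regardless of which of the two tests happens to succeed in any particular run, which is the point about the long-run probability above.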
 

I agree with most of what you say, and looking more closely at this work I think there are more potential causes for concern than I had noticed before.

The thing is, though, that Bem says he did have a pre-defined hypothesis for each experiment. Granted, we can't necessarily just take that at face value. But equally, we can't just assume it's untrue, purely because we can think of a number of hypotheses that can be tested against the data.

Experiment 1 is complicated, because it includes several different classes of targets. But for most of the experiments, to my mind the hypothesis given by Bem does seem like the obvious one to be tested.
 
Continuing from discussion of Bem's experiments on another thread:
http://www.skeptiko-forum.com/threads/parapsychology-science-or-pseudoscience.2468/page-6
http://www.skeptiko-forum.com/threads/parapsychology-science-or-pseudoscience.2468/page-7
http://www.skeptiko-forum.com/threads/parapsychology-science-or-pseudoscience.2468/page-8
http://www.skeptiko-forum.com/threads/parapsychology-science-or-pseudoscience.2468/page-9

Here's a blog post in which someone examines the variance of the z-scores in Bem's 10 experiments. According to this analysis, the variance is far too low, and is associated with a p value of 0.005. That is, it would occur by chance only one time in 200:
https://replicationindex.wordpress....detection-of-questionable-research-practices/

This is a much smaller p value than the ones Francis obtained (just under 0.1) by considering the likelihood of obtaining as many as 9 significant results out of 10.
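For anyone who wants to check the arithmetic, here is my understanding of how the variance test in that post works; the z-scores below are placeholders for illustration, not Bem's actual values. If each reported z-score is an honest, independent result, its sampling variance should be about 1, so (k-1) times the observed variance can be compared with a chi-square distribution with k-1 degrees of freedom, and an unusually small variance gives a small left-tail probability:

```python
import numpy as np
from scipy import stats

# Hypothetical z-scores for k = 10 experiments (placeholders, not Bem's numbers).
z = np.array([1.8, 2.0, 1.7, 1.9, 2.1, 1.8, 2.0, 1.9, 1.7, 2.2])

k = len(z)
var_z = z.var(ddof=1)                 # observed variance of the z-scores
test_stat = (k - 1) * var_z           # compare with chi-square(k - 1), since an honest
                                      # z-score has sampling variance of about 1
p_insufficient = stats.chi2.cdf(test_stat, df=k - 1)   # left tail: variance "too low"

print(var_z, p_insufficient)
```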

In theory, the same objection discussed on the other thread could apply. It could be that psi effects - such as an experimenter effect - invalidate the statistical analysis that's been used. But if this person has got their sums right, it does seem to be strong evidence that something very odd happened in Bem's experiments - either questionable research practices of some kind, or else some kind of over-arching psi effect that means the experiment can't be analysed in the usual way as a sampling of statistically independent phenomena.
 
I suppose this unexpectedly tight clustering of the z-scores around their mean value reflects the same aspect of Bem's studies illustrated by Daniel Lakens in his review of the meta-analysis by Bem et al.:
http://daniellakens.blogspot.nl/2015/04/why-meta-analysis-of-90-precognition.html

The figure with a grey triangle nearly halfway down the web page shows nearly all the data points concentrated near the right-hand edge of the grey triangle, outside of which lies the region of statistical significance. Lakens concludes that if data points all lie on the edge of the triangle, "there is a clear indication the studies were selected based on their (one-sided) p-value." (But if that were the case, shouldn't the distribution tend to spread to the right of the triangle, rather than just following its edge?)

Lakens thinks the higher-than-expected number of significant p values in Bem's experiments indicates "a much more substantial file-drawer, or p-hacking." The trouble with that - if it's meant to explain away the significance of Bem's results altogether - is that it would require something like 97% of Bem's results to be in the file drawer, or else it would require Bem to have had 30 or 40 independent hypotheses to choose from for each experiment. I find that difficult to believe.
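For what it's worth, here is the rough arithmetic I have in mind, under the simplifying assumption that every unreported study was a true-null study with a 5% chance of a one-sided significant result (the exact percentage depends on how the selection is modelled):

```python
alpha = 0.05        # one-sided significance level under a true null
significant = 9     # significant experiments Bem reported (out of 10 published)

studies_needed = significant / alpha            # ~180 null studies to yield 9 hits by chance
file_drawer = (studies_needed - 10) / studies_needed

print(studies_needed, round(file_drawer, 2))    # 180.0 and roughly 0.94
```

That lands in the same ballpark as the figure quoted above.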
 
Here's a blog post in which someone examines the variance of the z-scores in Bem's 10 experiments. According to this analysis, the variance is far too low, and is associated with a p value of 0.005. That is, it would occur by chance only one time in 200:

https://replicationindex.wordpress....detection-of-questionable-research-practices/

This is a much smaller p value than the ones Francis obtained (just under 0.1) by considering the likelihood of obtaining as many as 9 significant results out of 10.

Thanks for finding that post. I was not aware of Schimmack's Test of Insufficient Variance (TIVA). The TIVA is much more sensitive than the Test of Excess Success, which Francis used.

In theory, the same objection discussed on the other thread could apply. It could be that psi effects - such as an experimenter effect - invalidate the statistical analysis that's been used. But if this person has got their sums right, it does seem to be strong evidence that something very odd happened in Bem's experiments - either questionable research practices of some kind, or else some kind of over-arching psi effect that means the experiment can't be analysed in the usual way as a sampling of statistically independent phenomena.

Your idea that an "over-arching psi effect" could create dependency among the observations is difficult to fathom. Perhaps you are thinking back to our discussion of ganzfeld studies, in which the overall hit rate for the experiment was computed by pooling trials across subjects and analyzed using a binomial test. I complained that such an analysis is invalid because it assumes independence from trial to trial, not allowing for non-independence of trials performed by the same subject. You disagreed with me, saying that any such non-independence could only be due to psi and that, since the null hypothesis was "no psi," the statistical test was therefore valid. If this is what you're thinking when you say that psi could cause non-independence in Bem's statistics, then you're mistaken, because for Bem's statistics in Table 7, the ones that Francis analyzed, Bem did not pool individual trials across participants. Rather, he computed a hit rate for each participant and performed his statistical analysis on these participant hit rates.

The dependence in pooled analyses (like the typical ganzfeld analysis) results from heterogeneity of hit rates among subjects. This heterogeneity implies that the variance of the overall hit rate comprises two components: a binomial component and an additional, between-participants component, which a standard binomial analysis does not take into account. A set of per-participant hit rates, by contrast, directly reflects this heterogeneity: the between-participant differences simply show up as variation among the participant-level hit rates themselves. So there is no issue of dependency among the observations Bem used in Table 7. These participant-level observations are independent, and Bem's analyses are valid in this respect, as is Francis's analysis of Bem's results.
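Here is a small simulation sketch of the difference, with made-up numbers rather than Bem's or any ganzfeld study's actual design. When per-subject hit probabilities vary but average out to chance, the pooled binomial test rejects a true null far more often than its nominal rate, while a t-test on the participant-level hit rates stays close to the nominal rate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subj, n_trials, alpha, n_sims = 40, 36, 0.05, 5_000
pooled_rejects = subject_level_rejects = 0

for _ in range(n_sims):
    # Heterogeneous per-subject hit probabilities that still average to chance (0.5).
    p_subj = np.clip(rng.normal(0.5, 0.1, n_subj), 0.0, 1.0)
    hits = rng.binomial(n_trials, p_subj)            # hits per subject

    # (a) Pooled binomial test: treats all n_subj * n_trials trials as independent.
    p_pooled = stats.binomtest(hits.sum(), n_subj * n_trials, 0.5,
                               alternative="greater").pvalue
    pooled_rejects += p_pooled < alpha

    # (b) Participant-level test: one hit rate per subject, one-sample t-test vs 0.5.
    p_t = stats.ttest_1samp(hits / n_trials, 0.5, alternative="greater").pvalue
    subject_level_rejects += p_t < alpha

print(pooled_rejects / n_sims, subject_level_rejects / n_sims)
# The pooled rate comes out well above 0.05; the participant-level rate stays near 0.05.
```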

The other possibility is that you really think that an unhypothesized psi effect could have affected Bem's participant-level statistics in some spooky, capricious manner. Bem's hypothesis was not that psi could do anything, such as induce dependence between observations on unrelated subjects; it was that psi would cause the mean hit rate to be greater than predicted by chance. But if psi can induce unhypothesized dependence in a precognition experiment, then why couldn't psi induce unhypothesized dependence in a conventional psychology experiment? In that case, every statistical investigation of questionable research practices in experimental psychology would be invalid. Moreover, researchers do more than just report p-values; they report effect estimates. But if their experiments are affected by psi-induced dependency, then their effect estimates are incorrect. If psi-induced dependency is possible, then every psychological experiment's results are suspect. Furthermore, once you go down this road, what is to stop unhypothesized psi effects from reaching beyond psychology studies? Why not clinical trials, experiments in particle physics, and so on? Believing that such capricious psi effects are realistically possible implies that you believe that ordinary physical or psychological phenomena are inherently unpredictable and hence not amenable to systematic study, a profoundly unscientific worldview.
 

Yes, it's the "spooky, capricious" idea I'm talking about. If there is an "experimenter effect" - I mean a paranormal one - then by definition the responses of individual trials/subjects won't be independent of one another, because they will all depend on the experimenter. So the idea shouldn't be difficult to fathom. Nor is it particularly outlandish in the context of such experiments, because the possibility of an experimenter effect has been discussed so often.

Of course, you're right that such effects wouldn't necessarily be limited to psychical research experiments. But statistical analyses have indicated similar problems in psychological experiments, so that in itself isn't particularly implausible. Certainly such effects would make experimental results more difficult to analyse statistically, but I think it's going too far to say that one would have to adopt a "profoundly unscientific worldview". After all, some extreme sceptics say that about the very possibility of psychical phenomena.
 
Bem's hypothesis was not that psi could do anything, such as induce dependence between observations on unrelated subjects; it was that psi would cause the mean hit rate to be greater than predicted by chance.

Obviously we have to distinguish between the null hypothesis Bem used to analyse his experiments statistically - which was a no psi hypothesis - and what ideas he might have had in his head about the phenomena he was looking for. I'm not sure he said anywhere that he expected it to be limited to a pure increase in the hit rate and nothing more complicated. But even if he did say that, there's always the possibility that he found something he hadn't expected to find!
 
Yes, it's the "spooky, capricious" idea I'm talking about. If there is an "experimenter effect" - I mean a paranormal one - then by definition the responses of individual trials/subjects won't be independent of one another, because they will all depend on the experimenter.

Not by any definition of independent I've ever seen. If there were only one experimenter, we would expect the subject-level responses to still be independent. If there were a random sample of experimenters, and they had different experimenter effects (whatever those supposedly are), then the subject-level responses would be conditionally independent given the experimenter, but unconditionally dependent.
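Here is a small sketch of that distinction (my own toy model, not anything Bem proposed): if each experimenter shifts all of their subjects' scores by a shared random amount, two subjects' scores are correlated when the experimenter is drawn at random, but independent once the experimenter is fixed, which is exactly the single-experimenter case:

```python
import numpy as np

rng = np.random.default_rng(2)
n_pairs = 200_000

# Toy "experimenter effect": each experimenter adds a shared random shift to every
# subject they run; subjects also have their own independent noise.
experimenter_shift = rng.normal(0, 1, n_pairs)           # one experimenter per pair
subj_a = experimenter_shift + rng.normal(0, 1, n_pairs)  # subject A's score
subj_b = experimenter_shift + rng.normal(0, 1, n_pairs)  # subject B's score, same experimenter

# Unconditionally (experimenter drawn at random), the two subjects are correlated.
print(np.corrcoef(subj_a, subj_b)[0, 1])    # about 0.5

# Fixing a single experimenter makes the shift a constant, so the correlation vanishes.
fixed_shift = 0.7                           # one experimenter's shift, whatever it happens to be
a_fixed = fixed_shift + rng.normal(0, 1, n_pairs)
b_fixed = fixed_shift + rng.normal(0, 1, n_pairs)
print(np.corrcoef(a_fixed, b_fixed)[0, 1])  # about 0.0
```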

Nor is it particularly outlandish in the context of such experiments, because the possibility of an experimenter effect has been discussed so often.

I don't think outlandishness depends on frequency of discussion.

Of course, you're right that such effects wouldn't necessarily be limited to psychical research experiments. But statistical analyses have indicated similar problems in psychological experiments, so that in itself isn't particularly implausible.

So you think it plausible that the reproducibility problems that experimental psychology has been experiencing are due to paranormal influences. Um, okay.

Certainly such effects would make experimental results more difficult to analyse statistically, but I think it's going too far to say that one would have to adopt a "profoundly unscientific worldview".

If you believe paranormal forces affect the results of experiments in psychology and physics, then you have already adopted a profoundly unscientific worldview.

Obviously we have to distinguish between the null hypothesis Bem used to analyse his experiments statistically - which was a no psi hypothesis - and what ideas he might have had in his head about the phenomena he was looking for. I'm not sure he said anywhere that he expected it to be limited to a pure increase in the hit rate and nothing more complicated.

His statistical tests imply that he expected to find a difference in mean hit rates (or in "DR%" in the last two experiments).

But even if he did say that, there's always the possibility that he found something he hadn't expected to find!

Oh, Bem expected to find everything he found. After all, all his hypotheses were preplanned. ;-)
 
His statistical tests imply that he expected to find a difference in mean hit rates (or in "DR%" in the last two experiments).

Just read what I said:
Obviously we have to distinguish between the null hypothesis Bem used to analyse his experiments statistically - which was a no psi hypothesis - and what ideas he might have had in his head about the phenomena he was looking for.

And much the same goes for all your comments. You're just not thinking.
 
Just read what I said:
Obviously we have to distinguish between the null hypothesis Bem used to analyse his experiments statistically - which was a no psi hypothesis - and what ideas he might have had in his head about the phenomena he was looking for.

And much the same goes for all your comments. You're just not thinking.

No, I am thinking. What I'm not doing is making up ad hoc hypotheses. Bem's hypothesis was that people have precognition, not that people's responses will be non-independent. The latter you have just made up.

But it doesn't matter anyway. You were wrong that an experimenter effect in a single-experimenter study implies that the subject-level responses will be dependent.

And as if that weren't enough, even if the responses were dependent, that would increase the variance of Bem's effect estimates, increase the p-values of his results, and result in less statistical significance.

You're barking up the wrong tree.
 
Bem's hypothesis was that people have precognition, not that people's responses will be non-independent. The latter you have just made up.

I'll repeat it again:
Obviously we have to distinguish between the null hypothesis Bem used to analyse his experiments statistically - which was a no psi hypothesis - and what ideas he might have had in his head about the phenomena he was looking for.

I've said nothing whatsoever about Bem having a hypothesis about non-independence. For the umpteenth time, I'm talking about the null hypothesis he used for his statistical calculations.

You were wrong that an experimenter effect in a single-experimenter study implies that the subject-level responses will be dependent.

Your problem is that you're making all manner of assumptions, and I don't think you're even aware that you're making them.
 
I'll repeat it again:
Obviously we have to distinguish between the null hypothesis Bem used to analyse his experiments statistically - which was a no psi hypothesis - and what ideas he might have had in his head about the phenomena he was looking for.

I guess you're going to have to remind me what the relevance of that statement is.

I've said nothing whatsoever about Bem having a hypothesis about non-independence. For the umpteenth time, I'm talking about the null hypothesis he used for his statistical calculations.

You have yet to address any argument I've made. You just repeat statements from your original argument, which I thought I had already shown was invalid. Being charitable to both of us, it seems that one of us has missed something. I'll give you the benefit of the doubt and assume that it's me. So, since I think I've already shown why your argument is wrong for several reasons, all I can do is ask that you restate your argument. If I think it's valid, I'll say so; otherwise, I will try again to explain why it is not.

Your problem is that you're making all manner of assumptions, and I don't think you're even aware that you're making them.

That's too general to comment on. I'll say this, though: you have to assume something. As a mathematician you should know that better than anyone. I think your argument comes down to this: We can't assume anything ever about anything. Therefore, anything could have happened in Bem's experiments. Therefore, no statistical analysis could show Bem's results are invalid. And that goes for every analysis that has ever been done: psi could do anything; therefore no experiment can ever rule out psi. Therefore, we might as well not bother doing experiments. Therefore, all science is pseudoscience.

I guess you should just make your argument explicitly (again). I will try one more time to address it point by point.
 
I guess you're going to have to remind me what the relevance of that statement is.

Reread the discussion above if you can't remember it.

And if you can't remember what I posted just a few hours ago, perhaps you are not in the best position to tell me what I am arguing.

Can threads be made Mod+ retrospectively?
 
Reread the discussion above if you can't remember it.

And if you can't remember what I posted just a few hours ago, perhaps you are not in the best position to tell me what I am arguing.

Can threads be made Mod+ retrospectively?

Like I said, I think I showed why your argument was wrong, whereas, as far as I can tell, you have ignored all my objections to it. I guess we're at an impasse. But that's to be expected since, as far as I can tell, you think that psi can do anything to anything, and therefore scientific study of anything is futile.
 
... as far as I can tell, you think that psi can do anything to anything, and therefore scientific study of anything is futile.

That's a bit rich coming from someone whose hobby - as far as I can tell - is eating babies. :D
 
If there is an "experimenter effect" - I mean a paranormal one - then by definition the responses of individual trials/subjects won't be independent of one another, because they will all depend on the experimenter.

And of course - now I come to think of it - once the no psi hypothesis is dropped, there's no reason to suppose that successive trials by the same subject are independent of one another either.
 
If there is an "experimenter effect" - I mean a paranormal one - then by definition the responses of individual trials/subjects won't be independent of one another, because they will all depend on the experimenter.
And of course - now I come to think of it - once the no psi hypothesis is dropped, there's no reason to suppose that successive trials by the same subject are independent of one another either.

First you quote yourself about something that I have already explained is wrong. To repeat, an "experimenter effect" does not imply that responses of different subjects will be dependent. Secondly, successive trials by the same subject would be expected to be dependent, whether one has "dropped" (whatever that means) the no psi hypothesis or not.
 
Reread the discussion above if you can't remember it.

And if you can't remember what I posted just a few hours ago, perhaps you are not in the best position to tell me what I am arguing.

Can threads be made Mod+ retrospectively?

Not in the critical discussions subforum. This is the area where skeptics are always allowed to participate.
I've followed the discussion and I can understand your frustration. Jay seems incapable of understanding your very clear statement about differentiating between statistical analyses and what ideas someone might have in their head about what psi is.
 
First you quote yourself about something that I have already explained is wrong. To repeat, an "experimenter effect" does not imply that responses of different subjects will be dependent. Secondly, successive trials by the same subject would be expected to be dependent, whether one has "dropped" (whatever that means) the no psi hypothesis or not.

Wrong on all counts. As I said, you're not thinking at all.
 