
I have very little understanding of statistics, and as a layperson I'm very curious about this. Thanks in advance.

Regular non-random strings in a huge sample like the GCP should, as I understand it, result in an overall database that is many standard deviations off from expectation.

Search for your favorite number in the digits of pi. '111111111' appears first at position 812,432,526.

http://www.subidiom.com/pi/

Note that whether pi, or any of the other familiar irrational constants, is normal is still an open problem in mathematics.

~~ Paul
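
For anyone who wants to try this at home, here is a minimal sketch of the search, assuming the Python mpmath library and only a modest number of digits (a string like '111111111' would need hundreds of millions of digits, far more than generated here).

```python
# Minimal sketch: search for a digit string in the first N decimal digits of pi.
# N is kept small for illustration; long strings need vastly more digits.
from mpmath import mp

N = 100_000                        # number of decimal digits to generate
mp.dps = N + 10                    # working precision, with a few guard digits
digits = mp.nstr(+mp.pi, N)[2:]    # "3.1415..." -> "1415..."

target = "999999"                  # any favourite digit string
pos = digits.find(target)
if pos >= 0:
    print(f"'{target}' first appears at decimal position {pos + 1}")
else:
    print(f"'{target}' does not appear in the first {N} digits")
```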

I believe you can have windows where the Z-score deviates from the null without skewing the overall database. I may be wrong. But how are they measuring the randomness of the overall database?

Here's my basic understanding: if the non-random string happens once, or not very often, then sure, it's going to get eaten up by variance. But if what we're talking about is regular insertions of non-random elements into the stream - which is what I understand is being alleged in the GCP - then it is going to show up in the overall results. My understanding of this comes from discussions about online poker with some pretty knowledgeable stats guys. I could probably dig up some very old posts if anyone is interested enough, where one poster demonstrated mathematically how even small - but regular - insertions of non-random strings would eventually result in large deviations from expectation over large samples (and you don't get much larger, in terms of samples, than the entire GCP stream!).
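
A rough simulation of that argument, with made-up illustrative numbers (nothing here is taken from the GCP or the poker posts): if a small bias is inserted regularly, the overall Z-score against the fair-coin null keeps growing roughly like the square root of the sample size.

```python
# Toy simulation: a fixed fraction of trials carry a small bias toward heads.
# Because the bias recurs regularly, the overall Z-score grows roughly like
# sqrt(N) and eventually becomes very large.  Bias size and spacing are
# arbitrary illustrative values.
import random

def overall_z(n_trials, p_biased=0.55, every=10):
    """Z-score of the total head count against a fair-coin null, when every
    `every`-th trial lands heads with probability `p_biased` instead of 0.5."""
    heads = 0
    for i in range(n_trials):
        p = p_biased if i % every == 0 else 0.5
        heads += random.random() < p
    mean = n_trials * 0.5
    sd = (n_trials * 0.25) ** 0.5
    return (heads - mean) / sd

random.seed(1)
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9} trials: Z = {overall_z(n):.2f}")
```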

Let's take an example. Suppose we have 2000 samples. Now suppose the first 1000 samples are nonrandom, but the first 1500, taken together, are not. Isn't the overall measure then random?

What do I know? We need a statistician. Oh, but how do we know that the overall sequence is statistically random?

~~ Paul

Doesn't this depend on how the measure of nonrandomness for substrings interacts with the overall measure?

I'm not sure I followed your example, but the point as I understand it is that as long as the insertions of non-randomness occur often enough, even if they are tiny, the overall database will be many standard deviations from expectation over a large sample. They actually block that poker website here at work, but I'll try to remember when I get home to search for the post I'm talking about. It was several years ago, but hopefully I can find it.

~~ Paul

Assume we have 2000 samples in the entire database. We find an anomaly in the first 1000 samples, so they are not random. But when we consider the first 1500 samples, they are random. There is nothing special about the final 500 samples. So isn't the overall database random? Perhaps not.

~~ Paul
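
For what it's worth, here is a toy numerical version of that 2000-sample example, under the assumption that each "sample" is just a fair coin flip: an excess confined to the first 1000 samples is diluted, not erased, when the full 2000 samples are scored together.

```python
# Toy version of the 2000-sample example, assuming each sample is a fair coin
# flip.  A one-off excess in the first 1000 samples shrinks by roughly
# sqrt(1000/2000) when the whole database is scored at once.
import math

def z(heads, n, p=0.5):
    """Z-score of a head count against a binomial(n, p) null."""
    return (heads - n * p) / math.sqrt(n * p * (1 - p))

# Say the first 1000 samples show 550 heads and the last 1000 sit exactly
# at expectation (500 heads).
print("first-1000 window Z:", round(z(550, 1000), 2))        # about 3.16
print("overall 2000 Z:     ", round(z(550 + 500, 2000), 2))  # about 2.24
```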

http://goodmath.blogspot.com/2006/05/repearing-bad-math.html

http://goodmath.blogspot.com/2006/05/pear-yet-again-theory-behind.html

~~ Paul

You seem to be talking about a one-off in the first 1000? Or non-random elements in the first 1000 but not after that? I'm not sure whether that would necessarily affect the entire database or not, but what I'm referring to are regular insertions of non-randomness throughout the entire database.

I should think it would have something to do with the percentage of the database that is nonrandom. But then I don't know what to do about overlapping sequences. Consider looking at subsequences of pi and calculating the mean of the digits. The expectation is 4.5. How many subsequences can we find that are significantly different from 4.5?

~~ Paul
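
One way to make that question concrete (my own rough sketch, assuming mpmath again and treating "subsequence" as a fixed-length sliding window): under the uniform null the digit mean is 4.5 with variance 8.25, so a window of length L has standard error sqrt(8.25/L), and even genuinely random digits will push something like 5% of windows past two standard errors, though overlapping windows are not independent, so that figure is only rough.

```python
# Count fixed-length sliding windows of pi's digits whose mean differs from
# 4.5 by more than 2 standard errors.  Digit count and window length are
# arbitrary illustrative choices; overlapping windows are not independent.
from mpmath import mp
import math

N, L = 20_000, 100
mp.dps = N + 10
digits = [int(c) for c in mp.nstr(+mp.pi, N)[2:]]

se = math.sqrt(8.25 / L)           # SE of the mean of L uniform digits
window_sum = sum(digits[:L])
hits = 0
for start in range(len(digits) - L):
    if abs(window_sum / L - 4.5) > 2 * se:
        hits += 1
    window_sum += digits[start + L] - digits[start]   # slide the window

total = len(digits) - L
print(f"{hits} of {total} windows ({hits / total:.1%}) deviate by > 2 SE")
```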

I think the researchers have found that one cannot specify beforehand which eggs (the RNG devices) will turn out to be affected, over which time period, or what form the non-randomness will take, except for strictly recurring events. The patterns found, in which windows and for which eggs, seem to vary for Burning Man, for example.

Of course, this unfortunately gives the appearance that the patterns found are the result of the post hoc pattern search (as described in the link which Paul gave).

It would make sense to focus on the New Years data, as it wouldn't be as subject to this problem (assuming there is never any change in which eggs and which window are analyzed for which pattern). Has this been published?

Hmmm...I found this.

http://noosphere.princeton.edu/newyear.2014.html

Doesn't look at all promising for prediction.

Linda

I guess we probably should stay out of it. I think you're asking, how would Radin and Nelson explain the inability to predict, not how I (or Paul or Arouet) would.

Linda

I'm not asking how Radin and Nelson would explain an inability to predict, because I don't yet know that that's the reality. Have they been unable to make predictions?

Let's work off of the notion that the GCP is working as they claim it does. Is there some way in which it is possible that relevant data is only discoverable after the fact?

Let's take a step back for a minute. Say a big event happens, like a tsunami or WTC. What happens then? They use a protocol to go back to the data and locate statistically significant anomalies?

~~ Paul

Yes. Part of the problem is that they can vary the position and size of the analysis window to find deviations from random. How much they do that is not clear to me.
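
To illustrate why that freedom matters (a toy sketch of my own, not the GCP's actual procedure, with arbitrary stream length and window sizes): if you scan genuinely random data with windows of several positions and sizes and keep the most extreme one, you will usually find something that looks "significant" on its own.

```python
# Toy illustration of post-hoc window selection: scan purely random data with
# windows of several sizes and positions and report the most extreme Z-score.
import math
import random

random.seed(7)
data = [random.randint(0, 1) for _ in range(5_000)]   # a fair-coin stream

best_z, best_start, best_size = 0.0, None, None
for size in (50, 100, 200, 500):
    for start in range(0, len(data) - size + 1, 25):
        heads = sum(data[start:start + size])
        z = (heads - size * 0.5) / math.sqrt(size * 0.25)
        if abs(z) > abs(best_z):
            best_z, best_start, best_size = z, start, size

print(f"most extreme window: start={best_start}, size={best_size}, Z={best_z:+.2f}")
```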

In general I have no problem with the GCP working backward by finding the data only after an event. That doesn't make or break the project for me. But I'm a bit puzzled that no one seems to understand why. If the GCP could ramp it up to the next level, it would have tremendous practical value, such as acting as an early warning system.

~~ Paul

Linda