PCG Passes PractRand
There were various questions that people asked when PCG first started being discussed on the Internet; one those questions was whether I had also tested PCG with PractRand. I hadn't. I had done my testing using TestU01, which is considered by most to be the “gold standard”, with the broadest range of tests. Back in 2014, PractRand hadn't been on my radar, but looking into it I discovered that it is TestU01's closest competitor. I put testing with PractRand on my to-do list, but it wasn't a very high priority because I knew that all my PCG family members had been designed to pass TestU01's BigCrush test battery with plenty of headroom, which gave me good reason to believe they would do well in any set of statistical tests.
Finally, this summer I was discussing the best practices for testing random number generators with a colleague and I realized that it was high time for me to finally get around to testing PCG (and a few other recent PRNGs, too!). The short version is, it passes just fine. But for the details, read on…
How Random Number Generator Testing Works
PRNG test suites work by performing some statistically well-understood task using a candidate generator and then checking the plausibility of the results. For example, we could simulate the Birthday Problem, where we keep adding people to a room until two people have the same birthday, and then count how many people we needed. The mathematics of this problem is well understood, so we know exactly how probable every outcome is. A generator that repeated “42” forever would fail this test because repeats (“the same birthday”) occur far too often, as would a generator which never allowed any number to repeat (not enough repeats), as would a generator where repeats happened with the exact same gap between every repeat (incorrect distribution for repeats).
There are a large number of possible tests, some based on familiar ideas like the birthday problem or hands of poker, and others that are more abstruse. It is not uncommon for a generator to pass many tests in a test battery but be tripped up by just one family of tests.
Much like experiments in other fields of science, these tests typically produce a p-value, but whereas scientists usually desire results where the null hypothesis—that the observations were merely due to chance—is ruled out, we desire the opposite: a result confirming that the observed behavior is consistent with chance.
It's in the nature of random number generation that if you run enough tests enough times, you'll sometimes have test outcomes that amount to a “one in a thousand” or even “one in a million” event, but these kinds of failures are easily separated from genuine problems because real problems get worse and worse the more we test, whereas the occasional improbable result won't recur through extended or repeated testing.
Installation and Use; PractRand vs TestU01
It turns out that TestU01 and PractRand are quite complementary.
TestU01
TestU01 comes with an academic paper and two user manuals (a basic
one and an advanced one). It compiles to a library and users must
understand the TestU01 API in order to use it. In practice, to use
the various test batteries TestU01 provides, we just need to
understand one function, unif01_CreateExternGenBits
. We pass it a
function that generates a 32-bit unsigned int and the name of the
generator, and then pass the unif01_Gen*
object it returns on to
the test battery we wish to use, such as bbattery_BigCrush
.
TestU01 is not very hard to interface with in theory, but in practice
there are a few gotchas. It is designed to test at most 32 bits and
some tests use fewer. And it tends to be most sensitive to issues in the
high/most-significant bits rather than the low/least-significant bits.
(Some users have thought that because it also provides a function
unif01_CreateExternGen01
for generators that return floating point
double
value in the range of 0–1 that this function would use all 48
bits of the floating-point mantissa but that is not the case.)
Thus even when testing 32-bit values, it is important to run two tests, a regular test and a test with a modified version of the PRNG that returns its bits flipped into reverse order.
When the Crush batteries finish, they write a summary. For example, a successful pair of Small Crush runs looks like this:
========= Summary results of SmallCrush =========
Version: TestU01 1.2.3
Generator: 32-bit WELLRNG512a
Number of statistics: 15
Total CPU time: 00:00:07.82
All tests were passed
========= Summary results of SmallCrush =========
Version: TestU01 1.2.3
Generator: 32-bit WELLRNG512a [Reversed]
Number of statistics: 15
Total CPU time: 00:00:09.10
All tests were passed
and a failure looks like this:
========= Summary results of SmallCrush =========
Version: TestU01 1.2.3
Generator: 32-bit MINSTD LCG
Number of statistics: 15
Total CPU time: 00:00:10.05
The following tests gave p-values outside [0.001, 0.9990]:
(eps means a value < 1.0e-300):
(eps1 means a value < 1.0e-15):
Test p-value
----------------------------------------------
1 BirthdaySpacings eps
2 Collision 1 - eps1
6 MaxOft eps
----------------------------------------------
All other tests were passed
For generators that produce 64-bit values, we must generate at least four variations to return the high 32 bits, the low 32 bits, and each of these reversed. Even there, it is not clear whether there might be some unfortunate correlation between the top half and the bottom half of the 64-bit value that this “half at a time” approach examines, so perhaps we should also add two more variants, a version that alternates the high-32 and low-32 bits and its reversed version.
Because 64-bit generators are difficult to test with much certainty in TestU01, I strongly recommend that creators of 64-bit generators make sure that their generation scheme is scalable to smaller sizes, allowing them to test cut-down 32-bit versions that use half as much memory for generator state. That way TestU01 can have a more holistic view of a smaller-scale generator. If the smaller-scale generator passes TestU01's tests, we have much more reason to be confident about its larger sibling.
TestU01 is capable of far more than just running its crush batteries. There are more detailed interfaces to individual tests. For example, as I mentioned in the PCG paper, the linear complexity test (which a number of generators fail, including some of L'Ecuyer's generators) was inexplicably omitted from Small Crush, so my testing program has an option to directly call the linear complexity test, running it at four different sizes, instead of running one of the crush benchmarks. Here's are the results of the first of those calls:
***********************************************************
HOST = satsuki.local, Darwin
32-bit WELLRNG512a
scomp_LinearComp test:
-----------------------------------------------
N = 1, n = 5000, r = 0, s = 1
-----------------------------------------------
Number of degrees of freedom : 5
Chi2 statistic for size of jumps : 3.76
p-value of test : 0.58
-----------------------------------------------
Normal statistic for number of jumps : -40.14
p-value of test : 1 - eps1 *****
-----------------------------------------------
CPU time used : 00:00:00.00
Generator state:
As you can see, individual test results are more detailed than the Crush test battery results. The failure of the test is marked by the asterisks. The other three tests fail similarly. (Running all four tests took less than 0.1 of a second to run, so it would not have significantly slowed to Small Crush to have included some linear-complexity testing.)
In summary, TestU01 is well documented, powerful, and fairly easy to use, but it is also a little tricky to use well and a little sketchy when it comes to testing 64-bit generators.
PractRand
In contrast, PractRand is more sparsely documented and connecting a preexisting random number generator into its C++ code is more complex, but doing so is not actually necessary because it will happily read a binary stream of random numbers from standard input.
Thus, it is even easier to run tests than it is with TestU01. In
fact, we can test macOS's /dev/random
(a special “device file” that
outputs nondeterministic cryptographically secure randomness using the
Yarrow PRNG) by just
running a simple command:
macOS$ ./RNG_test stdin < /dev/random
RNG_test using PractRand version 0.93
RNG = RNG_stdin, seed = 0x906c8753
test set = normal, folding = standard(unknown format)
rng=RNG_stdin, seed=0x906c8753
length= 32 megabytes (2^25 bytes), time= 3.9 seconds
no anomalies in 130 test result(s)
rng=RNG_stdin, seed=0x906c8753
length= 64 megabytes (2^26 bytes), time= 8.3 seconds
no anomalies in 139 test result(s)
rng=RNG_stdin, seed=0x906c8753
length= 128 megabytes (2^27 bytes), time= 16.4 seconds
Test Name Raw Processed Evaluation
Gap-16:A R= +4.9 p = 1.1e-3 unusual
...and 150 test result(s) without anomalies
rng=RNG_stdin, seed=0x906c8753
length= 256 megabytes (2^28 bytes), time= 32.5 seconds
no anomalies in 162 test result(s)
rng=RNG_stdin, seed=0x906c8753
length= 512 megabytes (2^29 bytes), time= 63.0 seconds
no anomalies in 171 test result(s)
rng=RNG_stdin, seed=0x906c8753
length= 1 gigabyte (2^30 bytes), time= 122 seconds
no anomalies in 183 test result(s)
^C
Compared to TestU01, there are a few things to notice. TestU01's various Crush test batteries run silently for a period of time (a few hours for Big Crush) and then output their results. In contrast, PractRand starts telling you what it's seeing almost immediately and issues a report every time it has seen twice as much data as the last time it made a report.
Also, PractRand is oriented towards receiving a bytestream. It doesn't mind if the generators are 32-bit, 64-bit, 128-bit, or something more unusual like 72-bit or 96-bit. (Although it does benefit from knowing that the generator is producing 32- or 64-bit chunks.)
Finally, we can see that one test reported a result that was slightly unlikely from a statistical perspective but the blip vanished as testing continued.
I stopped the test interactively after 1 GB by pressing Control-C. By default PractRand will keep going until it has tested 32 TB of output. If 1 GB takes 122 seconds, then it'll take 46 days to finish the test! Might as well stop it now.
For fun let's try another Unix special file, /dev/zero
, which
outputs zeros forever:
macOS$ $ ./RNG_test stdin < /dev/zero
RNG_test using PractRand version 0.93
RNG = RNG_stdin, seed = 0xbbfee72
test set = normal, folding = standard(unknown format)
rng=RNG_stdin, seed=0xbbfee72
length= 128 megabytes (2^27 bytes), time= 2.4 seconds
Test Name Raw Processed Evaluation
BCFN(2+0,13-3,T) R=+17706724 p = 0 FAIL !!!!!!!!
BCFN(2+1,13-3,T) R=+8369402 p = 0 FAIL !!!!!!!!
BCFN(2+2,13-3,T) R=+4018669 p = 0 FAIL !!!!!!!!
BCFN(2+3,13-3,T) R=+1951923 p = 0 FAIL !!!!!!!!
BCFN(2+4,13-4,T) R=+1217378 p = 0 FAIL !!!!!!!!
BCFN(2+5,13-5,T) R=+754738 p = 0 FAIL !!!!!!!!
BCFN(2+6,13-5,T) R=+373439 p = 0 FAIL !!!!!!!!
BCFN(2+7,13-6,T) R=+229795 p = 0 FAIL !!!!!!!!
BCFN(2+8,13-6,T) R=+114286 p = 0 FAIL !!!!!!!!
BCFN(2+9,13-7,T) R=+69277 p = 0 FAIL !!!!!!!!
BCFN(2+10,13-8,T) R=+41033 p = 0 FAIL !!!!!!!!
BCFN(2+11,13-8,T) R=+20468 p = 1e-5195 FAIL !!!!!!!!
BCFN(2+12,13-9,T) R=+11745 p = 3e-2640 FAIL !!!!!!!!
BCFN(2+13,13-9,T) R= +5857 p = 4e-1317 FAIL !!!!!!!!
DC6-9x1Bytes-1 R=+10806727 p = 0 FAIL !!!!!!!!
Gap-16:A R=+3583824 p = 0 FAIL !!!!!!!!
Gap-16:B R=+16048500 p = 0 FAIL !!!!!!!!
FPF-14+6/16:cross R=+294195018 p = 0 FAIL !!!!!!!!
BRank(12):128(4) R= +5300 p~= 1e-2819 FAIL !!!!!!!!
BRank(12):256(4) R=+10811 p~= 1e-5750 FAIL !!!!!!!!
BRank(12):384(1) R= +8161 p~= 1e-2457 FAIL !!!!!!!!
BRank(12):512(2) R=+15438 p~= 2e-4648 FAIL !!!!!!!!
BRank(12):768(1) R=+16427 p~= 3e-4946 FAIL !!!!!!!!
BRank(12):1K(2) R=+31025 p~= 1e-9340 FAIL !!!!!!!!
BRank(12):1536(1) R=+32960 p~= 4e-9923 FAIL !!!!!!!!
[Low1/8]BCFN(2+0,13-5,T) R=+3546530 p = 0 FAIL !!!!!!!!
[Low1/8]BCFN(2+1,13-5,T) R=+1676322 p = 0 FAIL !!!!!!!!
[Low1/8]BCFN(2+2,13-5,T) R=+804897 p = 0 FAIL !!!!!!!!
[Low1/8]BCFN(2+3,13-5,T) R=+390941 p = 0 FAIL !!!!!!!!
[Low1/8]BCFN(2+4,13-6,T) R=+237394 p = 0 FAIL !!!!!!!!
[Low1/8]BCFN(2+5,13-6,T) R=+116954 p = 0 FAIL !!!!!!!!
[Low1/8]BCFN(2+6,13-7,T) R=+70419 p = 0 FAIL !!!!!!!!
[Low1/8]BCFN(2+7,13-8,T) R=+41511 p = 0 FAIL !!!!!!!!
[Low1/8]BCFN(2+8,13-8,T) R=+20637 p = 2e-5238 FAIL !!!!!!!!
[Low1/8]BCFN(2+9,13-9,T) R=+11814 p = 1e-2655 FAIL !!!!!!!!
[Low1/8]BCFN(2+10,13-9,T) R= +5881 p = 1e-1322 FAIL !!!!!!!!
[Low1/8]DC6-9x1Bytes-1 R=+1591281 p = 0 FAIL !!!!!!!!
[Low1/8]Gap-16:A R=+604230 p = 0 FAIL !!!!!!!!
[Low1/8]Gap-16:B R=+2770854 p = 0 FAIL !!!!!!!!
[Low1/8]FPF-14+6/16:cross R=+33904284 p = 0 FAIL !!!!!!!!
[Low1/8]BRank(12):128(4) R= +5300 p~= 1e-2819 FAIL !!!!!!!!
[Low1/8]BRank(12):256(2) R= +7644 p~= 3e-2302 FAIL !!!!!!!!
[Low1/8]BRank(12):384(1) R= +8161 p~= 1e-2457 FAIL !!!!!!!!
[Low1/8]BRank(12):512(2) R=+15438 p~= 2e-4648 FAIL !!!!!!!!
[Low1/8]BRank(12):768(1) R=+16427 p~= 3e-4946 FAIL !!!!!!!!
[Low4/32]BCFN(2+0,13-5,T) R=+3546530 p = 0 FAIL !!!!!!!!
[Low4/32]BCFN(2+1,13-5,T) R=+1676322 p = 0 FAIL !!!!!!!!
[Low4/32]BCFN(2+2,13-5,T) R=+804897 p = 0 FAIL !!!!!!!!
[Low4/32]BCFN(2+3,13-5,T) R=+390941 p = 0 FAIL !!!!!!!!
[Low4/32]BCFN(2+4,13-6,T) R=+237394 p = 0 FAIL !!!!!!!!
[Low4/32]BCFN(2+5,13-6,T) R=+116954 p = 0 FAIL !!!!!!!!
[Low4/32]BCFN(2+6,13-7,T) R=+70419 p = 0 FAIL !!!!!!!!
[Low4/32]BCFN(2+7,13-8,T) R=+41511 p = 0 FAIL !!!!!!!!
[Low4/32]BCFN(2+8,13-8,T) R=+20637 p = 2e-5238 FAIL !!!!!!!!
[Low4/32]BCFN(2+9,13-9,T) R=+11814 p = 1e-2655 FAIL !!!!!!!!
[Low4/32]BCFN(2+10,13-9,T) R= +5881 p = 1e-1322 FAIL !!!!!!!!
[Low4/32]DC6-9x1Bytes-1 R=+1591281 p = 0 FAIL !!!!!!!!
[Low4/32]Gap-16:A R=+604230 p = 0 FAIL !!!!!!!!
[Low4/32]Gap-16:B R=+2770854 p = 0 FAIL !!!!!!!!
[Low4/32]FPF-14+6/16:cross R=+33904284 p = 0 FAIL !!!!!!!!
[Low4/32]BRank(12):128(4) R= +5300 p~= 1e-2819 FAIL !!!!!!!!
[Low4/32]BRank(12):256(2) R= +7644 p~= 3e-2302 FAIL !!!!!!!!
[Low4/32]BRank(12):384(1) R= +8161 p~= 1e-2457 FAIL !!!!!!!!
[Low4/32]BRank(12):512(2) R=+15438 p~= 2e-4648 FAIL !!!!!!!!
[Low4/32]BRank(12):768(1) R=+16427 p~= 3e-4946 FAIL !!!!!!!!
[Low1/32]BCFN(2+0,13-6,T) R=+1099294 p = 0 FAIL !!!!!!!!
[Low1/32]BCFN(2+1,13-6,T) R=+519590 p = 0 FAIL !!!!!!!!
[Low1/32]BCFN(2+2,13-6,T) R=+249477 p = 0 FAIL !!!!!!!!
[Low1/32]BCFN(2+3,13-6,T) R=+121164 p = 0 FAIL !!!!!!!!
[Low1/32]BCFN(2+4,13-7,T) R=+72212 p = 0 FAIL !!!!!!!!
[Low1/32]BCFN(2+5,13-8,T) R=+42258 p = 0 FAIL !!!!!!!!
[Low1/32]BCFN(2+6,13-8,T) R=+20899 p = 5e-5305 FAIL !!!!!!!!
[Low1/32]BCFN(2+7,13-9,T) R=+11920 p = 1e-2679 FAIL !!!!!!!!
[Low1/32]BCFN(2+8,13-9,T) R= +5918 p = 6e-1331 FAIL !!!!!!!!
[Low1/32]DC6-9x1Bytes-1 R=+483643 p = 0 FAIL !!!!!!!!
[Low1/32]Gap-16:A R=+223041 p = 0 FAIL !!!!!!!!
[Low1/32]Gap-16:B R=+888483 p = 0 FAIL !!!!!!!!
[Low1/32]FPF-14+6/16:cross R=+7961869 p = 0 FAIL !!!!!!!!
[Low1/32]BRank(12):128(2) R= +3748 p~= 3e-1129 FAIL !!!!!!!!
[Low1/32]BRank(12):256(2) R= +7644 p~= 3e-2302 FAIL !!!!!!!!
[Low1/32]BRank(12):384(1) R= +8161 p~= 1e-2457 FAIL !!!!!!!!
[Low1/32]BRank(12):512(1) R=+10916 p~= 3e-3287 FAIL !!!!!!!!
We can see that PractRand is very unhappy with tons of failures. PractRand actually varies the number of exclamation marks based on the severity of the failure. Obviously all zeros all the time causes very severe failures. The names of the tests are a little cryptic, but they are described on the PractRand website.
That example was a pretty ridiculously extreme fail, so I'll also show you the results from a test of a couple of bad PRNGs to show what more typical fails might look like:
linux$ ./bad-64bit-rng1 | ./RNG_test stdin64
RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0xfecb5683
test set = normal, folding = standard (64 bit)
rng=RNG_stdin64, seed=0xfecb5683
length= 128 megabytes (2^27 bytes), time= 2.1 seconds
no anomalies in 148 test result(s)
rng=RNG_stdin64, seed=0xfecb5683
length= 256 megabytes (2^28 bytes), time= 4.8 seconds
no anomalies in 159 test result(s)
rng=RNG_stdin64, seed=0xfecb5683
length= 512 megabytes (2^29 bytes), time= 10.0 seconds
no anomalies in 169 test result(s)
rng=RNG_stdin64, seed=0xfecb5683
length= 1 gigabyte (2^30 bytes), time= 19.8 seconds
no anomalies in 180 test result(s)
rng=RNG_stdin64, seed=0xfecb5683
length= 2 gigabytes (2^31 bytes), time= 36.5 seconds
Test Name Raw Processed Evaluation
[Low4/64]FPF-14+6/16:(0,14-0) R= +6.3 p = 1.9e-5 unusual
[Low4/64]FPF-14+6/16:(1,14-0) R= +6.2 p = 2.6e-5 unusual
[Low4/64]FPF-14+6/16:all R= +7.5 p = 1.5e-6 suspicious
[Low4/64]FPF-14+6/16:all2 R= +9.6 p = 3.1e-5 mildly suspicious
...and 187 test result(s) without anomalies
rng=RNG_stdin64, seed=0xfecb5683
length= 4 gigabytes (2^32 bytes), time= 71.7 seconds
Test Name Raw Processed Evaluation
[Low16/64]FPF-14+6/16:all R= +7.8 p = 8.4e-7 very suspicious
[Low4/64]FPF-14+6/16:(0,14-0) R= +10.4 p = 3.0e-9 very suspicious
[Low4/64]FPF-14+6/16:(1,14-0) R= +9.5 p = 2.0e-8 very suspicious
[Low4/64]FPF-14+6/16:all R= +12.3 p = 4.9e-11 VERY SUSPICIOUS
[Low4/64]FPF-14+6/16:all2 R= +33.1 p = 1.7e-14 FAIL
[Low1/64]FPF-14+6/16:cross R= +9.3 p = 1.7e-8 very suspicious
...and 195 test result(s) without anomalies
linux$ ./bad-64bit-rng2 | ./RNG_test stdin64
RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0x72e95cf7
test set = normal, folding = standard (64 bit)
rng=RNG_stdin64, seed=0x72e95cf7
length= 128 megabytes (2^27 bytes), time= 2.1 seconds
no anomalies in 148 test result(s)
rng=RNG_stdin64, seed=0x72e95cf7
length= 256 megabytes (2^28 bytes), time= 4.7 seconds
no anomalies in 159 test result(s)
rng=RNG_stdin64, seed=0x72e95cf7
length= 512 megabytes (2^29 bytes), time= 9.3 seconds
no anomalies in 169 test result(s)
rng=RNG_stdin64, seed=0x72e95cf7
length= 1 gigabyte (2^30 bytes), time= 18.3 seconds
no anomalies in 180 test result(s)
rng=RNG_stdin64, seed=0x72e95cf7
length= 2 gigabytes (2^31 bytes), time= 36.3 seconds
no anomalies in 191 test result(s)
rng=RNG_stdin64, seed=0x72e95cf7
length= 4 gigabytes (2^32 bytes), time= 71.4 seconds
Test Name Raw Processed Evaluation
[Low16/64]Gap-16:B R= +4.1 p = 2.0e-3 unusual
...and 200 test result(s) without anomalies
rng=RNG_stdin64, seed=0x72e95cf7
length= 8 gigabytes (2^33 bytes), time= 141 seconds
no anomalies in 212 test result(s)
rng=RNG_stdin64, seed=0x72e95cf7
length= 16 gigabytes (2^34 bytes), time= 292 seconds
no anomalies in 223 test result(s)
rng=RNG_stdin64, seed=0x72e95cf7
length= 32 gigabytes (2^35 bytes), time= 578 seconds
no anomalies in 233 test result(s)
rng=RNG_stdin64, seed=0x72e95cf7
length= 64 gigabytes (2^36 bytes), time= 1124 seconds
no anomalies in 244 test result(s)
rng=RNG_stdin64, seed=0x72e95cf7
length= 128 gigabytes (2^37 bytes), time= 2193 seconds
Test Name Raw Processed Evaluation
Gap-16:B R= +4.1 p = 1.9e-3 unusual
[Low16/64]FPF-14+6/16:all R= +5.3 p = 1.7e-4 unusual
...and 253 test result(s) without anomalies
rng=RNG_stdin64, seed=0x72e95cf7
length= 256 gigabytes (2^38 bytes), time= 4239 seconds
Test Name Raw Processed Evaluation
[Low16/64]FPF-14+6/16:all R= +6.4 p = 1.6e-5 mildly suspicious
[Low16/64]FPF-14+6/16:all2 R= +8.9 p = 2.1e-5 mildly suspicious
...and 263 test result(s) without anomalies
rng=RNG_stdin64, seed=0x72e95cf7
length= 512 gigabytes (2^39 bytes), time= 8493 seconds
Test Name Raw Processed Evaluation
[Low16/64]FPF-14+6/16:(3,14-0) R= +11.1 p = 7.2e-10 very suspicious
[Low16/64]FPF-14+6/16:(4,14-0) R= +9.0 p = 6.6e-8 suspicious
[Low16/64]FPF-14+6/16:(5,14-0) R= +6.9 p = 5.4e-6 unusual
[Low16/64]FPF-14+6/16:all R= +10.8 p = 1.3e-9 VERY SUSPICIOUS
[Low16/64]FPF-14+6/16:all2 R= +38.1 p = 9.0e-19 FAIL !
...and 271 test result(s) without anomalies
One thing you can see from these results is that the time it takes a generator to fail can vary significantly. The first failed under two minutes, whereas the second took 2 hours, 22 minutes to fail. You also get to see the way things fail, with problems building over time and becoming more obvious as we take more data. One nice thing about PractRand is that unless you tell it to keep going regardless, it stops the moment the generator shows serious failures. In contrast, TestU01 waits to complete the test battery.
One other nice thing about PractRand is that all the tests are being applied at once (sharing the same PRNG output). In contrast, TestU01 runs each test one after the other. In other words, PractRand makes better use of the (purportedly) random data it receives.
In summary, PractRand is a great complement to TestU01. It doesn't have as many statistical tests, but the ones it does have are good ones. It gives you results sooner, and it can test more of the output of a PRNG.
Testing PCG
Here are the results of testing pcg64
.
linux$ ./pcg64 | ./RNG_test stdin64
RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0x78041158
test set = normal, folding = standard (64 bit)
rng=RNG_stdin64, seed=0x78041158
length= 128 megabytes (2^27 bytes), time= 2.1 seconds
no anomalies in 148 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 256 megabytes (2^28 bytes), time= 4.8 seconds
no anomalies in 159 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 512 megabytes (2^29 bytes), time= 9.3 seconds
no anomalies in 169 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 1 gigabyte (2^30 bytes), time= 17.8 seconds
no anomalies in 180 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 2 gigabytes (2^31 bytes), time= 33.9 seconds
no anomalies in 191 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 4 gigabytes (2^32 bytes), time= 64.5 seconds
no anomalies in 201 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 8 gigabytes (2^33 bytes), time= 128 seconds
Test Name Raw Processed Evaluation
[Low4/64]BCFN(2+1,13-1,T) R= -7.6 p =1-5.7e-4 unusual
...and 211 test result(s) without anomalies
rng=RNG_stdin64, seed=0x78041158
length= 16 gigabytes (2^34 bytes), time= 254 seconds
no anomalies in 223 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 32 gigabytes (2^35 bytes), time= 504 seconds
no anomalies in 233 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 64 gigabytes (2^36 bytes), time= 998 seconds
no anomalies in 244 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 128 gigabytes (2^37 bytes), time= 1989 seconds
no anomalies in 255 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 256 gigabytes (2^38 bytes), time= 3946 seconds
no anomalies in 265 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 512 gigabytes (2^39 bytes), time= 7932 seconds
no anomalies in 276 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 1 terabyte (2^40 bytes), time= 16853 seconds
no anomalies in 287 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 2 terabytes (2^41 bytes), time= 37027 seconds
no anomalies in 297 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 4 terabytes (2^42 bytes), time= 75378 seconds
no anomalies in 308 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 8 terabytes (2^43 bytes), time= 147024 seconds
no anomalies in 319 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 16 terabytes (2^44 bytes), time= 292970 seconds
no anomalies in 329 test result(s)
rng=RNG_stdin64, seed=0x78041158
length= 32 terabytes (2^45 bytes), time= 563979 seconds
no anomalies in 339 test result(s)
This is exactly the kind of result we might hope for. It takes about a week to run, but in the end completes with no problems. As we would expect, during the test there was a brief moment one of the test results looked unusual but as testing continued it proved to be nothing.
It's a similar story for pcg64_fast
. In the interest of not filling
your screen with junk, I'll switch to abbreviating the rest of the
results to show just the beginnings and the ends of the tests.
linux$ ./pcg64_fast | ./RNG_test stdin64
RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0xf24baca7
test set = normal, folding = standard (64 bit)
rng=RNG_stdin64, seed=0xf24baca7
length= 128 megabytes (2^27 bytes), time= 2.2 seconds
no anomalies in 148 test result(s)
[...]
rng=RNG_stdin64, seed=0xf24baca7
length= 8 terabytes (2^43 bytes), time= 132005 seconds
no anomalies in 319 test result(s)
rng=RNG_stdin64, seed=0xf24baca7
length= 16 terabytes (2^44 bytes), time= 263825 seconds
no anomalies in 329 test result(s)
rng=RNG_stdin64, seed=0xf24baca7
length= 32 terabytes (2^45 bytes), time= 526487 seconds
no anomalies in 339 test result(s)
Of course, pcg64
and pcg64_fast
have large periods
(2128 and 2126, respectively) and large state
spaces (2255 and 2127, respectively). 32 TB of
output seems like a lot, but these are 64-bit generators outputting 8
bytes at a time, so we've only used 242 numbers from each
generator. We haven't even reached the square root of the period;
we're closer to the cube root. In my opinion, any generator with 128
bits of state (or more) that fails statistical tests must have some
serious flaws. More on that another time.
A greater challenge is to test PCG's smaller generators that produce
32-bit output. Because they only produce four bytes of output, 32 TB
represents 243 outputs. pcg32
and pcg32_fast
have
periods of 264 and 262 and state spaces of
2127 and 263, respectively. Some older papers
on random number generation suggest that you should never use more
numbers from a generator than the square root of its period, otherwise
statistical problems may occur. Those comments date back to an
earlier era of generation when generators were lower quality, but I
will admit that I had a moment of doubt. Was I wrong to have waited
so long to test with PractRand? Would it push these generators past
their limits? In theory, based on my prior headroom-based
pronouncements with TestU01, no, but what about in practice?
(TheorRand vs PractRand!) Let's see...
linux$ ./pcg32 | ./RNG_test stdin32
RNG_test using PractRand version 0.93
RNG = RNG_stdin32, seed = 0x16692278
test set = normal, folding = standard (32 bit)
rng=RNG_stdin32, seed=0x16692278
length= 128 megabytes (2^27 bytes), time= 2.3 seconds
no anomalies in 117 test result(s)
[...]
rng=RNG_stdin32, seed=0x16692278
length= 8 terabytes (2^43 bytes), time= 130996 seconds
no anomalies in 244 test result(s)
rng=RNG_stdin32, seed=0x16692278
length= 16 terabytes (2^44 bytes), time= 260545 seconds
no anomalies in 252 test result(s)
rng=RNG_stdin32, seed=0x16692278
length= 32 terabytes (2^45 bytes), time= 519966 seconds
no anomalies in 260 test result(s)
linux$ ./pcg32_fast | ./RNG_test stdin32
RNG_test using PractRand version 0.93
RNG = RNG_stdin32, seed = 0x3d5f2c7e
test set = normal, folding = standard (32 bit)
rng=RNG_stdin32, seed=0x3d5f2c7e
length= 128 megabytes (2^27 bytes), time= 2.3 seconds
no anomalies in 117 test result(s)
[...]
rng=RNG_stdin32, seed=0x3d5f2c7e
length= 8 terabytes (2^43 bytes), time= 131096 seconds
no anomalies in 244 test result(s)
rng=RNG_stdin32, seed=0x3d5f2c7e
length= 16 terabytes (2^44 bytes), time= 261300 seconds
no anomalies in 252 test result(s)
rng=RNG_stdin32, seed=0x3d5f2c7e
length= 32 terabytes (2^45 bytes), time= 518888 seconds
no anomalies in 260 test result(s)
One interesting thing to note here is that pcg32_fast
doesn't seem
to be significantly faster than pcg32
. Most of the time is due to
PractRand itself, and the small inefficiency of sending numbers over a
Unix pipe, not the random number generation process itself. In other
contexts, pcg32_fast
can be faster.
I could have stopped there, but PractRand provides the option to
apply more variations of its tests and try much harder to find
flaws. These extended tests make PractRand run more slowly and aren't
enabled by default, but it seemed worthwhile to really hammer on
pcg32_fast
as the most vulnerable generator of the ones I recommend
for general-purpose use.
linux$ ./pcg32_fast | time ./RNG_test stdin32 -te 1 -tf 2
RNG_test using PractRand version 0.93
RNG = RNG_stdin32, seed = 0xb5f0bbc4
test set = expanded, folding = extra
rng=RNG_stdin32, seed=0xb5f0bbc4
length= 32 megabytes (2^25 bytes), time= 2.3 seconds
no anomalies in 796 test result(s)
[...]
rng=RNG_stdin32, seed=0xb5f0bbc4
length= 2 terabytes (2^41 bytes), time= 153323 seconds
no anomalies in 1723 test result(s)
rng=RNG_stdin32, seed=0xb5f0bbc4
length= 4 terabytes (2^42 bytes), time= 282780 seconds
no anomalies in 1770 test result(s)
rng=RNG_stdin32, seed=0xb5f0bbc4
length= 8 terabytes (2^43 bytes), time= 539058 seconds
no anomalies in 1816 test result(s)
rng=RNG_stdin32, seed=0xb5f0bbc4
length= 16 terabytes (2^44 bytes), time= 1064442 seconds
no anomalies in 1859 test result(s)
rng=RNG_stdin32, seed=0xb5f0bbc4
length= 32 terabytes (2^45 bytes), time= 2178649 seconds
no anomalies in 1901 test result(s)
./RNG_test stdin32 -te 1 -tf 2 2170392.00s user 9792.10s system 100% cpu 605:11:07.57 total
linux$
These extended-test results, which represent more than twenty-five days of continuous testing, reveal no problems up to 32 TB (the standard stopping point for PractRand).
Making PCG Fail
One of my key arguments about testing is that it isn't enough to proudly say “my generator passes statistical tests” (especially if it has a large period or state space). Instead we should squeeze it down smaller and smaller until it fails.
So, this time we'll test pcg_engines::setseq_xsh_rr_32_16
and
pcg_engines::mcg_xsh_rs_32_16
, which are half-size versions of
pcg32
and pcg32_fast
. Their periods are 232 and 230,
respectively, and they only produce 16-bit outputs (2 bytes), so it
would be utterly unrealistic to expect them to generate 32 TB of
data. They must fail. The question is, when? Let's find out!
We'll run PractRand so that it starts telling is about what it is doing a little sooner so we can enjoy some successes before we fail.
linux$ ./pcg_setseq_xsh_rr_32_16 | ./RNG_test stdin16 -tlmin 16
RNG_test using PractRand version 0.93
RNG = RNG_stdin16, seed = 0xe7dc4836
test set = normal, folding = standard (16 bit)
rng=RNG_stdin16, seed=0xe7dc4836
length= 64 kilobytes (2^16 bytes), time= 0.1 seconds
no anomalies in 39 test result(s)
rng=RNG_stdin16, seed=0xe7dc4836
length= 128 kilobytes (2^17 bytes), time= 0.7 seconds
no anomalies in 43 test result(s)
rng=RNG_stdin16, seed=0xe7dc4836
length= 256 kilobytes (2^18 bytes), time= 1.2 seconds
no anomalies in 50 test result(s)
rng=RNG_stdin16, seed=0xe7dc4836
length= 512 kilobytes (2^19 bytes), time= 1.7 seconds
no anomalies in 56 test result(s)
rng=RNG_stdin16, seed=0xe7dc4836
length= 1 megabyte (2^20 bytes), time= 2.3 seconds
no anomalies in 64 test result(s)
rng=RNG_stdin16, seed=0xe7dc4836
length= 2 megabytes (2^21 bytes), time= 2.8 seconds
Test Name Raw Processed Evaluation
[Low4/16]DC6-9x1Bytes-1 R= +5.7 p = 5.3e-3 unusual
...and 71 test result(s) without anomalies
rng=RNG_stdin16, seed=0xe7dc4836
length= 4 megabytes (2^22 bytes), time= 3.4 seconds
no anomalies in 79 test result(s)
rng=RNG_stdin16, seed=0xe7dc4836
length= 8 megabytes (2^23 bytes), time= 4.0 seconds
no anomalies in 87 test result(s)
rng=RNG_stdin16, seed=0xe7dc4836
length= 16 megabytes (2^24 bytes), time= 4.6 seconds
no anomalies in 95 test result(s)
rng=RNG_stdin16, seed=0xe7dc4836
length= 32 megabytes (2^25 bytes), time= 5.4 seconds
no anomalies in 103 test result(s)
rng=RNG_stdin16, seed=0xe7dc4836
length= 64 megabytes (2^26 bytes), time= 6.6 seconds
no anomalies in 111 test result(s)
rng=RNG_stdin16, seed=0xe7dc4836
length= 128 megabytes (2^27 bytes), time= 8.7 seconds
Test Name Raw Processed Evaluation
FPF-14+6/16:all R= -5.3 p =1-1.0e-4 mildly suspicious
...and 118 test result(s) without anomalies
rng=RNG_stdin16, seed=0xe7dc4836
length= 256 megabytes (2^28 bytes), time= 12.4 seconds
Test Name Raw Processed Evaluation
FPF-14+6/16:all R= -6.4 p =1-7.5e-6 suspicious
...and 126 test result(s) without anomalies
rng=RNG_stdin16, seed=0xe7dc4836
length= 512 megabytes (2^29 bytes), time= 19.0 seconds
Test Name Raw Processed Evaluation
FPF-14+6/16:(0,14-0) R= -9.3 p =1-2.7e-8 very suspicious
FPF-14+6/16:(1,14-0) R= -8.5 p =1-1.6e-7 suspicious
FPF-14+6/16:all R= -11.5 p =1-7.5e-11 VERY SUSPICIOUS
FPF-14+6/16:all2 R= +31.0 p = 7.9e-14 FAIL
...and 131 test result(s) without anomalies
So, for pcg_engines::pcg_setseq_xsh_rr_32_16
, we begin to see
intimations of problems at 227 bytes (i.e., 226
16-bit outputs) and a clear fail at 229 bytes (i.e.,
228 16-bit outputs). If we extrapolate from that result,
pcg32
might start to show issues sometime around 252
outputs (i.e., 254 bytes; 4 petabytes), and would probably
have failed by the time we reach 256 outputs
(258 bytes; 256 petabytes). In other words, we can
estimate that pcg32 will fail PractRand if increase the test that took
a week to complete by a factor somewhere between 512 and 8192. In
other words, we might see issues with pcg32
after 10 years of
testing, and we can be pretty sure that if we test for 150 years,
it'll have failed by then.
I should also point out that any extrapolation from behavior at small
sizes is going to be imperfect. The only sure way to find out whether
the full size pcg32
generator takes a thousand years to fail, or
somehow fails unexpectedly after just a few of months, would be
actually run tests that might take years to complete. What these
small-scale tests provide is a back-of-the-envelope way to make an
informed guess about whether testing beyond 32TB is worthwhile. (The
headroom analysis in the PCG paper is a little more rigorous than the
quite simple test we've done here, but that too is necessarily
extrapolatory.)
Now let's move on to the tiny sibling of pcg32_fast
. We should
expect it to do worse, and it does.
linux$ ./pcg_mcg_xsh_rs_32_16 | ./RNG_test stdin16 -tlmin 16
seeding with 0xdd44ecf0
RNG_test using PractRand version 0.93
RNG = RNG_stdin16, seed = 0x1aac2a7b
test set = normal, folding = standard (16 bit)
rng=RNG_stdin16, seed=0x1aac2a7b
length= 64 kilobytes (2^16 bytes), time= 0.1 seconds
no anomalies in 39 test result(s)
rng=RNG_stdin16, seed=0x1aac2a7b
length= 128 kilobytes (2^17 bytes), time= 0.7 seconds
no anomalies in 43 test result(s)
rng=RNG_stdin16, seed=0x1aac2a7b
length= 256 kilobytes (2^18 bytes), time= 1.2 seconds
no anomalies in 50 test result(s)
rng=RNG_stdin16, seed=0x1aac2a7b
length= 512 kilobytes (2^19 bytes), time= 1.8 seconds
no anomalies in 56 test result(s)
rng=RNG_stdin16, seed=0x1aac2a7b
length= 1 megabyte (2^20 bytes), time= 2.3 seconds
no anomalies in 64 test result(s)
rng=RNG_stdin16, seed=0x1aac2a7b
length= 2 megabytes (2^21 bytes), time= 2.8 seconds
no anomalies in 72 test result(s)
rng=RNG_stdin16, seed=0x1aac2a7b
length= 4 megabytes (2^22 bytes), time= 3.4 seconds
no anomalies in 79 test result(s)
rng=RNG_stdin16, seed=0x1aac2a7b
length= 8 megabytes (2^23 bytes), time= 4.0 seconds
no anomalies in 87 test result(s)
rng=RNG_stdin16, seed=0x1aac2a7b
length= 16 megabytes (2^24 bytes), time= 4.6 seconds
Test Name Raw Processed Evaluation
Gap-16:A R= +7.3 p = 2.5e-5 suspicious
...and 94 test result(s) without anomalies
rng=RNG_stdin16, seed=0x1aac2a7b
length= 32 megabytes (2^25 bytes), time= 5.4 seconds
Test Name Raw Processed Evaluation
Gap-16:A R= +9.1 p = 5.9e-7 very suspicious
...and 102 test result(s) without anomalies
rng=RNG_stdin16, seed=0x1aac2a7b
length= 64 megabytes (2^26 bytes), time= 6.5 seconds
Test Name Raw Processed Evaluation
Gap-16:A R= +17.4 p = 1.3e-14 FAIL !
Gap-16:B R= +9.5 p = 3.4e-8 very suspicious
FPF-14+6/16:all R= -6.3 p =1-9.0e-6 suspicious
...and 108 test result(s) without anomalies
Here we see PractRand becoming suspicious at 224 bytes (i.e.,
223 outputs) and failing at 226 (i.e.,
225 outputs). This generator has a period of
230, whereas pcg32_fast
has a period of 262. Thus
the equivalent points for pcg32_fast
would be 2(23/30)×62
and 2(25/30)×62, or somewhere between 0.72 petabytes of
output and 12.7 petabytes, or somewhere between 23 weeks and 7.8 years
of testing.
Comparison with SplitMix
To set these results into context, it helps to compare them with another PRNG. Popular choices that Internet folks might want to see are comparisons with SplitMix and Xoroshiro128+. XoroShiro isn't a good choice for comparison because it is already known to fail some of PractRand's statistical tests, as acknowledged by its authors in its source code, although the extent of the issues are larger, so that comparison will have to wait for another day (coming soon!).
In some ways I quite like SplitMix because it has some similar ideas to PCG in that it avoids having a trivial output function, but it does have a trivial state transition function. If we test a full-sized version of SplitMix, we find it passes PractRand.
linux$ ./splitmix | ./RNG_test stdin64
RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0xbb88093
test set = normal, folding = standard (64 bit)
rng=RNG_stdin64, seed=0xbb88093
length= 128 megabytes (2^27 bytes), time= 2.4 seconds
no anomalies in 148 test result(s)
[...]
rng=RNG_stdin64, seed=0xbb88093
length= 8 terabytes (2^43 bytes), time= 132138 seconds
no anomalies in 319 test result(s)
rng=RNG_stdin64, seed=0xbb88093
length= 16 terabytes (2^44 bytes), time= 265163 seconds
Test Name Raw Processed Evaluation
[Low1/64]FPF-14+6/16:all R= +5.4 p = 1.6e-4 unusual
...and 328 test result(s) without anomalies
rng=RNG_stdin64, seed=0xbb88093
length= 32 terabytes (2^45 bytes), time= 530227 seconds
Test Name Raw Processed Evaluation
[Low4/64]BCFN(2+2,13-0,T) R= +7.6 p = 1.4e-3 unusual
...and 338 test result(s) without anomalies
Notice that SplitMix is not (necessarily) starting to fail here. We already know that something statistically “unusual” must be expected to happen occasionally. The fact that different tests fail means that we're not (yet) seeing any systemic breakdown.
So we could say, SplitMix passes, PCG passes, all is good. But let's instead test a cut-down version of SplitMix equivalent to the cut-down versions of pcg32 we tested earlier. For a fair comparison, we'll test its 16-bit output function.
linux$ ./splitmix32 | ./RNG_test stdin16 -tlmin 16
RNG_test using PractRand version 0.93
RNG = RNG_stdin16, seed = 0xdfafcc2c
test set = normal, folding = standard (16 bit)
rng=RNG_stdin16, seed=0xdfafcc2c
length= 64 kilobytes (2^16 bytes), time= 0.1 seconds
no anomalies in 39 test result(s)
rng=RNG_stdin16, seed=0xdfafcc2c
length= 128 kilobytes (2^17 bytes), time= 0.7 seconds
no anomalies in 43 test result(s)
rng=RNG_stdin16, seed=0xdfafcc2c
length= 256 kilobytes (2^18 bytes), time= 1.2 seconds
no anomalies in 50 test result(s)
rng=RNG_stdin16, seed=0xdfafcc2c
length= 512 kilobytes (2^19 bytes), time= 1.8 seconds
no anomalies in 56 test result(s)
rng=RNG_stdin16, seed=0xdfafcc2c
length= 1 megabyte (2^20 bytes), time= 2.3 seconds
Test Name Raw Processed Evaluation
[Low4/16]Gap-16:B R= +4.1 p = 3.0e-3 unusual
...and 63 test result(s) without anomalies
rng=RNG_stdin16, seed=0xdfafcc2c
length= 2 megabytes (2^21 bytes), time= 2.8 seconds
Test Name Raw Processed Evaluation
Gap-16:A R= +5.4 p = 4.9e-4 unusual
...and 71 test result(s) without anomalies
rng=RNG_stdin16, seed=0xdfafcc2c
length= 4 megabytes (2^22 bytes), time= 3.4 seconds
Test Name Raw Processed Evaluation
Gap-16:A R= +6.6 p = 7.3e-5 mildly suspicious
...and 78 test result(s) without anomalies
rng=RNG_stdin16, seed=0xdfafcc2c
length= 8 megabytes (2^23 bytes), time= 4.0 seconds
Test Name Raw Processed Evaluation
Gap-16:A R= +8.4 p = 1.6e-6 very suspicious
...and 86 test result(s) without anomalies
rng=RNG_stdin16, seed=0xdfafcc2c
length= 16 megabytes (2^24 bytes), time= 4.6 seconds
Test Name Raw Processed Evaluation
Gap-16:A R= +16.8 p = 7.3e-13 FAIL
...and 94 test result(s) without anomalies
Here, we can see SplitMix32 showing a clear tell by 222
bytes (221 outputs) and failing by 224 bytes
(223 outputs). That result is clearly much worse than
pcg32_fast
, which is the weakest PCG generator I would recommend for
general purpose use.
If performance at small sizes really is a proxy to performance at larger sizes, the full-size SplitMix is actually very close to failing a regular run of PractRand. I've set a longer test in motion and we'll see what happens. If it does fail, it validates the approach of testing at small sizes. On the other hand, if it doesn't fail, it shows that the small-size results may sometimes be too pessimistic.
Regardless, it seems that in a PractRand statistical quality contest,
pcg32
and pcg32_fast
beat SplitMix (and Xoroshiro128+, because it
doesn't pass PractRand's tests to begin with).
Conclusion
Results from PractRand do indeed validate the statistical quality claims I've made about the PCG family. Yay!