Simple Portable C++ Seed Entropy
In two recent posts, I've looked at seeding random number generators in in C++11 (looking at what's wrong with std::seed_seq
and developing something that avoids those flaws), but seed_seq
exists as a mechanism to “mix up” entropy that you give it. You still need to get that entropy from somewhere. So where?
Sadly std::random_device
Is Not Your Friend
The obvious source for external randomness we can use in seeding is std::random_device
. But as I mentioned in this post,
- It's hard to make
std::random_device
conform to the requirements of a seed sequence. - It's unspecified just how costly this “device” is to read from.
- It may not be nondeterministic at all, rendering it unfit for our purposes.
Portable code needs to look to other sources of entropy for RNG seeding. And even if std::random_device
works well, mixing in entropy from other sources can have a variety of benefits, including performance.
Other (Cheap) Sources of Entropy
If we can't rely on std::random_device
, what can we use? We will survey our options, focusing on low-cost and highly portable choices (rather than, say, trying to connect to random.org on the Internet).
Entropy from the Time
The current time has a long history as a source of seed entropy. Vast numbers of C programs use
srand(time(NULL));
to initialize the C RNG. In C++11 we can do better than the venerable C time
function though, because we have std::chrono::high_resolution_clock
. In fact, if we use the LLVM or GNU C++ libraries, we find that the (claimed) resolution of this clock is one nanosecond. Nanosecond resolution means that no calls that occur one after the other will ever see the same value, and makes it extremely unlikely that calls in programs executing at (almost) the same moment will observe the same value. With the low-order 32 bits cycling through 232 values every 4.3 seconds we can easily claim that a high-resolution clock can give a typical program 32 bits of entropy.
Seeding with Just the Clock
Let's see what happens if we seed with just a nanosecond resolution timer. First, let's try std::seed_seq
and see if there seems to be any pattern to its output:
uint32_t seedseq_random_using_clock()
{
uint64_t seed = std::chrono::high_resolution_clock::
now().time_since_epoch().count();
std::seed_seq seeder{uint32_t(seed),uint32_t(seed >> 32)};
++seed;
int out;
seeder.generate(&out,&out+1);
return out;
}
If we test this generator with TestU01's SmallCrush battery, we get these rather disappointing results:
========= Summary results of SmallCrush =========
Version: TestU01 1.2.3
Generator: std::seedseq (using std::chrono::high_resolution_clock)
Number of statistics: 15
Total CPU time: 00:01:15.25
The following tests gave p-values outside [0.001, 0.9990]:
(eps means a value < 1.0e-300):
(eps1 means a value < 1.0e-15):
Test p-value
----------------------------------------------
1 BirthdaySpacings eps
2 Collision eps
3 Gap eps
4 SimpPoker eps
5 CouponCollector eps
6 MaxOft AD 1 - 2.1e-11
7 WeightDistrib eps
----------------------------------------------
All other tests were passed
In other words, if we use std::seed_seq
to seed two RNGs at nearby times, there will be a discernible pattern in their seeds.
Now let's try randutils::seed_seq_fe128
instead. not only does it pass SmallCrush, it passes Crush and BigCrush as well:
========= Summary results of BigCrush =========
Version: TestU01 1.2.3
Generator: randutils::seed_seq_fe128 (using std::chrono::high_resolution_clock)
Number of statistics: 160
Total CPU time: 12:02:01.31
All tests were passed
Suitably scrambled, sampled at high enough resolution, the time alone is good enough to provide workable statistically good seeds. But before we celebrate, we need to look a little closer.
Troubles with the Time
There are some issues with using C++11's high_resolution_clock
. One is that libraries lie. The GNU libstdc++ library claims that its high-resolution clock ticks every nanosecond, but on some platforms, including OS X, the underlying implementation uses a POSIX gettimeofday
system call which only has microsecond resolution (even though better alternatives exist).
And even if your C++ library tells the truth, there is no requirement that the high-resolution clock be especially high resolution. You could be using a C++ library where the clock only ticks once a second (or even once a year!).
So std::chrono::high_resolution_clock
might be a good (if limited) source of entropy (perhaps even able to do all the heavy lifting by itself), but it might be merely mediocre. It can't be our only source.
Entropy from the Memory Configuration
In C and C++ we can find out where in memory our variables and functions are stored. Thus, we can write programs like this one, which prints out where things are in memory:
#include <iostream>
#include <iomanip>
#include <cstddef>
#include <cstdlib>
static void lf()
{
// Do nothing
}
static int zsv;
static int isv = 1;
int main()
{
int lv;
void* mem = malloc(1);
free(mem);
std::cout << std::hex;
std::cout << "Local Variable " << (void*)&lv << std::endl;
std::cout << "Heap Memory " << mem << std::endl;
std::cout << "Static Variable (Zeroed) " << (void*)&zsv << std::endl;
std::cout << "Static Variable (Inited) " << (void*)&isv << std::endl;
std::cout << "Local Function " << (void*)&lf << std::endl;
std::cout << "Library Function " << (void*)&_Exit << std::endl;
}
If we run this program on OS X, we see the following results for three different runs:
Kind | Run 1 | Run 2 | Run 3 |
---|---|---|---|
Local Variable | 0x7fff5ea636f4 |
0x7fff5dd726f4 |
0x7fff50caf6f4 |
Heap Memory | 0x7fa7d1c04bb0 |
0x7f8eb3404bb0 |
0x7ffb6ac04bb0 |
Static Variable (Zeroed) | 0x10119e0cc |
0x101e8f0cc |
0x10ef520cc |
Static Variable (Inited) | 0x10119e0c8 |
0x101e8f0c8 |
0x10ef520c8 |
Local Function | 0x10119d8a0 |
0x101e8e8a0 |
0x10ef518a0 |
Library Function | 0x7fff94c4256a |
0x7fff94c4256a |
0x7fff94c4256a |
Notice that almost everything is in a different location each run. A few years ago, the results would have been different—the memory addresses of everything would be the same every time—but today's operating systems improve security with address-space–layout randomization (ASLR). Not everything changes—we can see that in all cases the high-order bits and the low-order bits often stay the same. In addition, the address of the library function stayed constant, but even that would change if we rebooted (or if we set the environment variable DYLD_SHARED_REGION=avoid
).
On Linux, by default, the memory map isn't quite as random:
Kind | Run 1 | Run 2 | Run 3 |
---|---|---|---|
Local Variable | 0x7fffa0b322cc |
0x7fff8cff3bdc |
0x7fffc8e07e4c |
Heap Memory | 0x2320c20 |
0x7aac20 |
0x249cc20 |
Static Variable (Zeroed) | 0x6021b8 |
0x6021b8 |
0x6021b8 |
Static Variable (Inited) | 0x602098 |
0x602098 |
0x602098 |
Local Function | 0x400db0 |
0x400db0 |
0x400db0 |
Library Function | 0x400950 |
0x400950 |
0x400950 |
In these Linux tests, the stack and the heap have moved around but all the code has stayed in the exact same place. Linux can move libraries and application code around as well, but does not do so by default, choosing to do so only for applications where security is considered to be important. The claimed reason for limiting ASLR and keeping code at fixed addresses is one of efficiency—position-independent executables purportedly adds a small amount of additional overhead. Although both OS X and OpenBSD consider the overhead acceptable for the benefit of making attacks difficult, mainstream Linux distributions apparently do not.
But we can override Linux's defaults and compile position-independent code by running by using the compiler arguments -pie -fpie
:
Kind | Run 1 | Run 2 | Run 3 |
---|---|---|---|
Local Variable | 0x7ffd6d03e18c |
0x7ffec84f456c |
0x7fff745b05bc |
Heap Memory | 0x7f47ef07ac20 |
0x7f89b0f24c20 |
0x7fbddf6e4c20 |
Static Variable (Zeroed) | 0x7f47edf2f09c |
0x7f89b002009c |
0x7fbddf50f09c |
Static Variable (Inited) | 0x7f47edf2f090 |
0x7f89b0020090 |
0x7fbddf50f090 |
Local Function | 0x7f47edd2df70 |
0x7f89afe1ef70 |
0x7fbddf30df70 |
Library Function | 0x7f47ecf6b2d0 |
0x7f89af05c2d0 |
0x7fbdde54b2d0 |
Here we can see that everything is moving around, but it doesn't actually seem like the total entropy has changed, since the heap, static data, and functions all seem to be similarly located.
(Looking at startup performance with LD_DEBUG=statistics
[and DYLD_PRINT_STATISTICS
on OS X], I could find no appreciable performance difference between the position-independent and regular versions, which makes me think that OS X and OpenBSD have it right, and most Linux distributions have it wrong.)
Of course, it's possible that our code won't run on a system that has ASLR. But even on those systems, what is on the stack and on the heap may vary depending on exactly what has previously happened during the program's execution. And different threads will have different stacks, so it provides a mechanism to give different threads different entropy even if other parts of the entropy stay the same (e.g., checking the time at the exact same moment).
Thus the memory configuration will provide some entropy on all systems, and on systems with ASLR, we can hope it's providing about 32 bits worth (even if some of that is baked in at compile time rather than runtime).
Entropy from the CPU
Many CPUs have some kind of cycle counter. Although C++ provides no portable way to read such a counter, there are compiler-specific mechanisms.
- Clang provides the intrinsic function
__builtin_readcyclecounter
as a platform-independent way to access such a counter. (Although on platforms that do not provide an accessible cycle counter, such as ARM, it always returns zero.) - Intel provides a header file
<immintrin.h>
that declares several useful X86-specific intrinsics:-
__rdtsc
reads the timestamp counter (essentially equivalent to__builtin_readcyclecounter
). -
_rdrand64_step
provides a 64-bit cryptographically secure random number. Unfortunately, this facility only exists in Ivy Bridge (and later) Intel CPUs. Also some (possibly rather paranoid) people have expressed concern that this facility may have been somehow subverted. -
_rdseed64_step
provides a 64-bit “true” random number for use in seeding (generated from thermal noise). Unfortunately, this facility only exists in Broadwell (and later) Intel CPUs.
-
On the one hand, some of these sources seem to be truly excellent. On the other, the best ones are the least likely to be available. Even if we're only developing for Intel machines, we can't necessarily assume our hardware has Intel's randomness instructions (Sandy Bridge machines are still commonplace, for example, the previous generation Mac Pro).
For portable code, perhaps we can count on 32 bits of entropy from a cycle counter, but not much else. But, like nanosecond-resolution time, the Intel cycle counter alone is sufficient to pass TestU01's BigCrush battery provided we scramble the bits sufficiently (e.g., with seed_seq_fe128
).
Entropy from the Operating System
Although std::random_device
seems like the obvious source for operating system for entropy, there are some other traditional options. One of the classics for seeding RNGs on a Unix system is to use getpid()
to get the process id (PID), a small integer which should vary from run to run. Although getpid
is a Unix-ism, Microsoft Windows provides a similar call.
C++11 does, however, provide something remarkably similar for portable code—the thread id—which we can obtain via std::this_thread::get_id()
. Unfortunately, for some compilers/platforms, its value doesn't seem to vary from run to run as much as the PID does.
Entropy from Local State
Some of our entropy sources change every time we run the program, but remain the same for the duration of the run. Thus it's useful to also have an entropy source that explicitly changes every time we examine it. A simple counter can do that task.
Design Goals & Resulting Class
Ordinary programmers shouldn't have to think about all these considerations. The whole point of libraries should be to shield everyday programmers from worrying about unnecessary details. Ideally, std::random_device
would have been designed to serve as an entropy source (and work as a Seed Sequence), but since that's not the case, we can at least provide the missing functionality.
My way of wrapping up the process of gathering entropy for RNG seeding is to provide a class template auto_seeded
that can initialize any Seed Sequence using a combination of the entropy sources I've mentioned in this blog post. For some class S
, auto_seeded<S>
derives from S
but replaces the default constructor with a constructor that initializes S with some appropriately chosen entropy.
You can find the code in randutils.hpp
, which includes the code from this post and others in this series.
Thus, to create a Mersenne Twister engine with an arbitrary seed that varies from run to run, we can write
randutils::auto_seeded<std::seed_seq> seeds;
std::mt19937 rng{seeds};
As we've previously learned, std::seed_seq
has some issues, so we might want to prefer the alternative, seed_seq_fe
, that we developed in the previous blog post. Thus the code also provides auto_seed_128
and auto_seed_256
as handy aliases for auto_seeded<seed_seq_fe128>
and auto_seeded<seed_seq_fe256>
, allowing us to write:
randutils::auto_seed_128 seeds;
std::mt19937 rng{seeds};
Getting Technical
You might hope that we could compress the two lines of our previous example into a single line, and write it as
std::mt19937 rng{randutils::auto_seed_128{}};
Unfortunately, trying to use a temporary object here doesn't work, due to another minor glitch in the C++11 standard. If we try it, we'll get this error:
error: no matching constructor for initialization of 'std::mt19937'
std::mt19937 rng{randutils::auto_seed_128{}};
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/v1/random:2113:18: note:
candidate constructor [with _Sseq = randutils::auto_seed_128] not viable:
expects an l-value for 1st argument
The glitch in the standard is that Seed Sequences are passed by l-value reference, and you can't pass a temporary object as an l-value reference (or at least you can't without doing some extra work). The C++ committee should have made the seed parameter a universal reference and then we'd have been all good.
But there is actually a bigger issue than this one. The two lines of code I showed you are (technically) wrong. Once again, C++11's strange requirements of Seed Sequences bite us. For any legitimate Seed Sequence S
, auto_seeded<S>
is technically not a Seed Sequence, because the requirements state that the default constructor for a Seed Sequence
Creates a seed sequence with the same initial state as all other default-constructed seed sequences [of that type]
In other words, all default constructed seed sequences must have the same state! The whole point of our default constructor was to create seed sequences with (nondeterministically) different states.
Thankfully, there is a solution—we can just cast the object back to its base class which does have a dumb always-the-same default constructor. I thus provide a base
method to do just that. And, as a bonus, we get something nice for free, the ability to do all the initialization on one line using a temporary object (because the base
method does return a l-value reference!).
Thus the code actually becomes
std::mt19937 rng{randutils::auto_seed_128{}.base()};
Not the most beautiful thing, perhaps, but it is technically correct.
Testing and Performance
Here's a quick test program to show off the resulting class:
#include "randutils.hpp"
#include <cstdio>
#include <cstdint>
#include <cstddef>
int main()
{
uint32_t seeds[8];
for (int i = 0; i < 16; ++i) {
randutils::auto_seed_256 seeder;
seeder.generate(seeds, seeds+8);
bool first = true;
for (uint32_t seed : seeds) {
printf("%s%08x", first ? "" : ", ", seed);
first = false;
}
printf("\n");
}
}
Here's one run:
670e5a45, 9ac9ea9c, 88853620, c4c836f9, 07ab5640, b20b313e, 7b945051, 37f50e84
51cb6dd9, b2e34e8a, 8cc4f78e, 6530b128, 1a5eab3c, 0adae353, e3c1587b, dec6000e
45f869e9, 9c802b0a, db790af6, 6ced810c, e8a2a513, 09e6de7c, 87a55cea, 3b67b4a6
7d171d66, e53dd7e9, cc09c87e, 32672c79, 87f427a5, 079b721f, 7e844793, 4729f207
a0db8507, 4ed656d4, 4bc9cc88, bd723730, bb11303d, 979ac968, 34bdc61c, 389d75af
b4840033, e7176d4a, c0a01431, 52e58522, 2ff42863, b4bde340, 34177446, 56b100b6
fc555da1, 56888d72, 2a9c5000, a5a1a01a, 07fa8cfe, a949b563, 1c31c4bd, 37d45aef
4aad1d58, 5cfae50e, e37ecc55, efe03ba6, 6dc0f415, 93fa0f9e, 84c9fa62, b340c696
6767ca87, b0e54c11, 89045654, b889850a, 9aabe2bb, 45628c06, 46d666d6, 0eb4ad37
3273cef7, d0469f31, 759e4dd3, 6fecd66b, b7d350a7, b758b04f, 230c3361, 83cc1a0b
eb0a4104, b99d0441, c69d8737, 40fd935a, 695f7321, 0f71b9bd, 7b05e99f, 02e60d01
0ee38480, 5649ed46, b770579c, c627516b, 6510e5f3, 11df41f0, 5d287c54, 771f7a23
af81b6bb, e437c6df, f062227e, 165653f2, 4be238a8, 8f19f0c1, 001aa816, 2fef1cfb
e088cb5f, 126fb9b6, 4e43d64e, 9cd96242, e0dc26ab, a6b95e20, c989ea24, 9f414b5a
0fdd89ce, 4bea7340, 28a2f1b3, 20a92dec, 5729581a, d20a9670, c0deea0f, 6f0120c4
53f35b6d, 96031a00, 8ac98dc3, 96942558, 0376dd98, aa2e418d, b596fe6c, be355589
and here's another:
cad830e2, 7a572603, 67153026, 15e2b642, 9cbe2af4, d0f7acf3, 11fda8ee, b8cde0bd
9e5b8acd, 1b097e56, cbb3630c, 51b11bd5, 41738595, 4cc7fc18, 125bcf04, 932570f1
fd8854f5, 5c66ebbc, 807b3b14, e747bbf0, 0698eba5, 4875823f, 25c40a6b, 74238874
6fd21e7c, 728f027a, 3c8cc9aa, bf89f104, 5388e256, 7e5cc849, fd3718e1, 3c72215f
a18ec556, 0b523c1e, 6570c3e0, 7cbf8f70, 20f390db, f8d52009, 6851ed91, 55e5d56a
b6460b40, 54659aa8, 8a0ecf2b, fb5cffdc, 49143b62, 22c6c0cb, bb9e47db, 2fc20892
a6df5533, ae12f233, da955e9d, 0abc43df, f3e416f8, 6de0f041, 3d73b1ab, 2b6d860e
0466466c, f5284293, 9d642904, 7ef5f2a9, a4afded5, 0d098b32, 9f8669b6, 51b9c355
7f3222ef, 742f80fc, b3182b2c, 8d6b5fba, 4947df9a, f7ad0a7e, 93e27446, 89d9dfc8
c7cf0252, 755b253a, d8e2c647, 43c7272c, dafc95e1, dd65d784, ca1101d4, 74bdb176
eb37b834, 71509f10, ca168b17, 45855a2d, 4a2c9d6f, 29bece0e, 0d92f673, bc927e6a
daac0c7f, 851aa6b9, 84903a8b, df36a790, 92d87bab, 6096da95, bc79a802, 9b01dd37
5ad0b74e, f75188b6, e8ced8a5, 140c31d8, 3b937a7d, 086fb548, b20a7261, b4e31307
4a20ba89, 58827d51, e8d0fbd9, d052ecb4, e818f611, d97adda9, 33cbe5a2, 863cbc4a
39cb4142, 438fd23d, 7e2aa6bf, 58fdc4c1, 37f3e4da, ad449f06, 5aa3cc36, 3cf6a63a
ec75e458, cda85e15, d60eba81, 9b34dc4e, 6e39d9b4, aa742c69, 311f1950, 027bf991
Of course, as satisfying as it is to see some random output, we can't tell how good this output is just by looking, but, as we did earlier, we can abuse our seed sequence to act as a random number generator. Since we could pass TestU01 using only the high-resolution time, we should expect it to pass TestU01's test batteries. It does. Here are the results from running SmallCrush and BigCrush.
========= Summary results of SmallCrush =========
Version: TestU01 1.2.3
Generator: auto_seed_128
Number of statistics: 15
Total CPU time: 00:01:09.76
All tests were passed
========= Summary results of BigCrush =========
Version: TestU01 1.2.3
Generator: auto_seed_128
Number of statistics: 160
Total CPU time: 27:34:38.55
All tests were passed
Although its statistical performance is excellent, this time doesn't look stellar compared to actual random number generators. So let's look more closely at time performance. Here I'll use cycle counts rather than seconds (and take the median of 101 samples), and consider four possible ways to provide our seed data,
randutils::auto_seed_256 seeder;
randutils::auto_seed_128 seeder;
randutils::seed_seq_fe128 seeder{rdev(),rdev()};
std::seed_seq seeder{rdev(),rdev()};
where rdev
in the last two is an instance of std::random_device
.
auto_seed_128 |
auto_seed_256 |
seed_seq_fe_128 |
std::seed_seq |
|
---|---|---|---|---|
First Call (LLVM 3.6) | 114580 | 115232 | 20900 | 113796 |
First Call (GCC 4.9) | 31724 | 31153 | 25424 | 32132 |
Subsequent Calls (LLVM 3.6) | 440 | 704 | 8316 | 9928 |
Subsequent Calls (GCC 4.9) | 552 | 808 | 1252 | 1516 |
As I mentioned in a previous post the first-call performance numbers can be hard to make sense of. There's a lot going on, such as page faults bringing library code into memory and so forth. For LLVM, much of the overhead somehow relates to the libc++ C++ library—if we use Clang with the GNU C++ library (not shown above), it actually outperforms the GCC version. Although we could spend more time investigating, it's worth remembering that a one-time cost of 115,000 cycles on my test machine works out to be a mere 34 µs.
Looking beyond the strange world of first-call performance, the auto_seed_128
and auto_seed_256
classes outperform everything else, despite the number entropy sources they consult (and all the mixing done by seed_seq_fe
, which they're built on). In fact, auto_seed_128
can produce eight words of output in the time it takes to read a single word from GCC's rdrand
-based std::random_device
.
(These kind of benchmarks also show that between GCC and LLVM, there's no clear performance winner. Each has scenarios where it shines.)
Conclusions
With auto_seed_128
and auto_seed_256
we can generate high-quality nondeterministic seed entropy, and do so quickly. It works around various questionable decisions in the C++11 specification, from the possibility that std::random_device
isn't fit for its purpose, to the strange requirements of C++11 Seed Sequences, to the ungainly ways they must be passed to C++11 RNGs.
Of course, it's still a bit annoying that someone who wants to generate a random letter grade 'A' through 'D' needs to write
std::mt19937 rng_engine{randutils::auto_seed_128{}.base()};
std::uniform_int_distribution<char> dist{'A','D'};
char grade = dist(rng_engine);
That's a lot of visible machinery. But finally we have the pieces in place to fix that, and that'll be the topic for my next post.