## 4.1 Pseudorandom number generators

Most simulation algorithms are based on algorithms for generating
*pseudorandom* uniformly distributed variables in \((0, 1)\). They arise from
deterministic integer sequences initiated by a *seed*. A classical
example of a pseudorandom integer generator is the
linear congruential generator. A sequence of numbers from
this generator is computed iteratively by
\[x_{n+1} = (a x_n + c) \text{ mod m}\]
for integer parameters \(a\), \(c\) and \(m\). The seed \(x_1\) is a
number between \(0\) and \(m - 1\), and the resulting
sequence is in the set \(\{0, \ldots, m - 1\}\). The ANSI C standard specifies
the choices \(m = 2^{31}\), \(a = 1,103,515,245\) and \(c = 12,345\). The generator
is simple to understand and implement but has been superseded by much better
generators.

Pseudorandom number generators are generally defined in terms of a finite state space \(\mathcal{Z}\) and a one-to-one map \(f : \mathcal{Z} \to \mathcal{Z}\). The generator produces a sequence in \(\mathcal{Z}\) iteratively from the seed \(\mathbf{z}_1 \in \mathcal{Z}\) by \[\mathbf{z}_n = f(\mathbf{z}_{n-1}).\] Pseudorandom integers are typically obtained as \[x_n = h(\mathbf{z}_n)\] for a transformation \(h : \mathcal{Z} \mapsto \mathbb{Z}\). If the image of \(h\) is in the set \(\{0, 1, \ldots, 2^{w} - 1\}\) of \(w\)-bit integers, pseudorandom numbers in \([0, 1)\) are typically obtained as \[x_n = 2^{-w} h(\mathbf{z}_n).\]

In R (3.6.0), the default pseudorandom number generator is the 32-bit
*Mersenne Twister*, which generates integers in the range
\[\{0, 1, \ldots, 2^{32} -1\}.\] The state space is
\[\mathcal{Z} = \{0, 1, \ldots, 2^{32} -1\}^{624},\]
that is, a state is a 624 dimensional vector of 32-bit integers.
The function \(f\) is of the form
\[f(\mathbf{z}) = (z_2, z_3, \ldots, z_{623}, f_{624}(z_1, z_2, z_{m + 1})),\]
for \(1 \leq m < 624\), and \(h\) is a function of \(z_{624}\) only.
The standard choice \(m = 397\) is used in the R implementation.
The function \(f_{624}\) is a bit complicated, it includes
what is known as the *twist transformation*, and it requires
additional parameters. The period of the generator
is the astronomical number
\[2^{32 \times 624 - 31} - 1 = 2^{19937} - 1,\]
which is a Mersenne prime.
Moreover, all combinations of consecutive integers
up to dimension 623 occur equally often in a period, and empirical tests of the
generator demonstrate that it has good statistical properties, though it
is known to fail some tests.

In R you can set the seed using the function `set.seed`

that takes an
integer argument and produces an element in the state space. The argument
given to `set.seed`

is not the actual seed, and `set.seed`

computes a
valid seed for any pseudorandom number generator that R is using,
whether it is the Mersenne Twister or not. Thus the use of `set.seed`

is
the safe and recommended way of setting a seed.

The actual seed (together with some additional information) can be accessed
via the vector `.Random.seed`

. Its first entry, `.Random.seed[1]`

, encodes the
pseudorandom number generator used as well as the generator for
Gaussian variables and discrete uniform variables. This
information is decoded by `RNGkind()`

.

`## [1] "Mersenne-Twister" "Inversion" "Rejection"`

For the Mersenne Twister,
`.Random.seed[3:626]`

contains the vector in the state space, while
`.Random.seed[2]`

contains the “current position” in the state vector.
The implementation needs a position variable because it does 624 updates of the
state vector at a time and then runs through those values sequentially
before the next update. This is equivalent to but more efficient than e.g.
implementing the position shifts explicitly as in the definition of \(f\) above.

```
set.seed(27112015) ## Computes a new seed from an integer
oldseed <- .Random.seed[-1] ## The actual seed
.Random.seed[1] ## Encoding of generators used, will stay fixed
```

`## [1] 10403`

`## [1] 624`

`## [1] 0.7793288`

Every time a random number is generated, e.g. by `runif`

above, the same underlying
sequence of pseudorandom numbers is used, and the state vector stored in
`.Random.seed`

is updated accordingly.

`## [1] 624 -1660633125 -1167670944 1031453153 815285806`

`## [1] 1 -696993996 -1035426662 -378189083 -745352065`

`## [1] 0.7793288 0.5613179`

`## [1] 2 -696993996 -1035426662 -378189083 -745352065`

Resetting the seed will restart the pseudorandom number generator with the same seed and result in the same sequence of random numbers.

`## [1] 624 -1660633125 -1167670944 1031453153 815285806`

`## [1] 624 -1660633125 -1167670944 1031453153 815285806`

`## [1] 0.7793288`

Note that when using any of the standard R generators, any value of \(0\) or \(1\) returned by the underlying pseudorandom uniform generator is adjusted to be in \((0,1)\). Thus uniform random variables are guaranteed to be in \((0, 1)\).

Some of the random number generators implemented in R use more than one pseudorandom number per variable. This is, for instance, the case when we simulate Gamma distributed random variables.

`## [1] 1.192619`

`## [1] 2 -696993996 -1035426662 -378189083 -745352065`

`## [1] 0.2794622`

`## [1] 5 -696993996 -1035426662 -378189083 -745352065`

In the example above, the first Gamma variable required two pseudorandom numbers, while the second required three pseudorandom numbers. The detailed explanation is given in Section 4.3, where it is shown how to generate random variables from the Gamma distribution via rejection sampling. This requires as a minimum two pseudorandom numbers for every Gamma variable generated.

### 4.1.1 Implementing a pseudorandom number generator

The development of high quality pseudorandom number generators is a research field in itself. This is particularly true if one needs theoretical guarantees for randomized algorithms or cryptographically secure generators. For scientific computations and simulations correct statistical properties, reproducibility and speed are more important than cryptographic security, but even so, it is not trivial to invent a good generator, and the field is still developing. For a generator to be seriously considered, its mathematical properties should be well understood, and it should pass (most) tests in standardized test suites such as TestU01, see L’Ecuyer and Simard (2007).

R provides a couple of alternatives to the Mersenne Twister,
see `?RNG`

, but there is no compelling reason to switch to any of those for ordinary
use. They are mostly available for historical reasons.
One exception is the L’Ecuyer-CMRG generator, which is useful when
independent pseudorandom sequences are needed for parallel computations.

Though the Mersenne Twister is a widely used pseudorandom number generator, it has well known shortcomings. There are high quality alternatives that are simpler and faster, such as the family of shift-register generators and their variations, but they are not currently available from the base R package.

Shift-register generators are based on linear transformations of the bit representation of integers. Three particular transformations are typically composed; the \(\mathrm{Lshift}\) and \(\mathrm{Rshift}\) operators and the bitwise \(\mathrm{xor}\) operator. Let \(z = [z_{31}, z_{30}, \ldots, z_0]\) with \(z_i \in \{0, 1\}\) denote the bit representation of a 32-bit (unsigned) integer \(z\) (ordered from most significant bit to least significant bit). That is, \[z = z_{31} 2^{31} + z_{30} 2^{30} + \ldots + z_2 2^2 + z_1 2^{1} + z_0.\] Then the left shift operator is defined as \[\mathrm{Lshift}(z) = [z_{30}, z_{29}, \ldots, z_0, 0],\] and the right shift operator is defined as \[\mathrm{Rshift}(z) = [0, z_{31}, z_{30}, \ldots, z_1].\] The bitwise xor operator is defined as \[\mathrm{xor}(z, z') = [\mathrm{xor}(z_{31},z_{31}') , \mathrm{xor}(z_{30}, z_{30}'), \ldots, \mathrm{xor}(z_0, z_0')]\] where \(\mathrm{xor}(0, 0) = \mathrm{xor}(1, 1) = 0\) and \(\mathrm{xor}(1, 0) = \mathrm{xor}(0, 1) = 1\). Thus a transformation could be of the form \[\mathrm{xor}(z, \mathrm{Rshift}^2(z)) = [\mathrm{xor}(z_{31}, 0) , \mathrm{xor}(z_{30}, 0), \mathrm{xor}(z_{29}, z_{31}), \ldots, \mathrm{xor}(z_0, z_2)].\]

One example of a shift-register based generator is Marsaglia’s xorwow algorithm, Marsaglia (2003). In addition to the shift and xor operations, the output of this generator is perturbed by a sequence of integers with period \(2^{32}\). The state space of the generator is \[\{0, 1, \ldots, 2^{32} -1\}^{5}\] with \[f(\mathbf{z}) = (z_1 + 362437 \ (\mathrm{mod}\ 2^{32}), f_1(z_5, z_2), z_2, z_3, z_4),\] and \[h(\mathbf{z}) = 2^{-32} (z_1 + z_2).\] The number 362437 is Marsaglia’s choice for generating what he calls a Weyl sequence, but any odd number will do. The function \(f_1\) is given as \[f_1(z, z') = \mathrm{xor}(\mathrm{xor}(z, \mathrm{Rshift}^2(z)), \mathrm{xor}(z', \mathrm{xor}(\mathrm{Lshift}^4(z'), \mathrm{Lshift}(\mathrm{xor}(z, \mathrm{Rshift}^2(z)))))).\]

This may look intimidating, but all the operations are very elementary. Take the number \(z = 123456\), say, then the intermediate value \(\overline{z} = \mathrm{xor}(z, \mathrm{Rshift}^2(z))\) is computed as follows:

\[ \begin{array}{ll} z & \texttt{00000000 00000001 11100010 01000000} \\ \mathrm{Rshift}^2(z) & \texttt{00000000 00000000 01111000 10010000} \\ \hline \mathrm{xor} & \texttt{00000000 00000001 10011010 11010000} \end{array} \]

And if \(z' = 87654321\) the value of \(f_1(z, z')\) is computed like this:

\[ \begin{array}{ll} \mathrm{Lshift}^4(z') & \texttt{01010011 10010111 11111011 00010000} \\ \mathrm{Lshift}(\overline{z}) & \texttt{00000000 00000011 00110101 10100000} \\ \hline \mathrm{xor} & \texttt{01010011 10010100 11001110 10110000} \\ z' & \texttt{00000101 00111001 01111111 10110001} \\ \hline \mathrm{xor} & \texttt{01010110 10101101 10110001 00000001} \\ \overline{z} & \texttt{00000000 00000001 10011010 11010000} \\ \hline \mathrm{xor} & \texttt{01010110 10101100 00101011 11010001} \end{array} \]

Converted back to a 32-bit integer, the result is \(f_1(z, z') = 1454123985\). The shift and xor operations are tedious to do by hand but extremely fast on modern computer architectures, and shift-register based generators are some of the fastest generators with good statistical properties.

To make R use the xorwow generator we need to implement it as a user supplied
generator. This requires writing the C code that implements the generator,
compiling the code into a shared object file, loading it into
R with the `dyn.load`

function, and finally calling `RNGkind("user")`

to make R use this pseudorandom number generator. See `?Random.user`

for some details and an example.

Using the Rcpp package, and `sourceCpp`

, in particular, is usually much preferred
over manual compiling and loading. However, in this case we need to make
functions available to the internals of R rather than exporting functions to be
callable from the R console. That is, nothing needs to be exported from C/C++.
If nothing is exported, `sourceCpp`

will actually not load the shared object
file, so we need to trick `sourceCpp`

to do so anyway. In the implementation
below we achieve this by simply exporting a direct interface to the xorwow generator.

```
#include <Rcpp.h>
#include <R_ext/Random.h>
/* The Random.h header file contains the function declarations for the functions
that R rely on internally for a user defined generator, and it also defines
the type Int32 as an unsigned int. */
static Int32 z[5]; // The state vector
static double res;
static int nseed = 5; // Length of the state vector
// Implementation of xorwow from Marsaglia's "Xorshift RNGs"
// modified so as to return a double in [0, 1). The '>>' and '<<'
// operators in C are bitwise right and left shift operators, and
// the caret, '^', is the xor operator.
double * user_unif_rand()
{
Int32 t = z[4];
Int32 s = z[1];
z[0] += 362437;
z[4] = z[3];
z[3] = z[2];
z[2] = s;
// Right shift t by 2, then bitwise xor between t and its shift
t ^= t >> 2;
// Left shift t by 1 and s by 4, xor them, xor with s and xor with t
t ^= s ^ (s << 4) ^ (t << 1);
z[1] = t;
res = (z[0] + t) * 2.32830643653869e-10;
return &res;
}
// A seed initializer using Marsaglia's congruential PRNG
void user_unif_init(Int32 seed_in) {
z[0] = seed_in;
z[1] = 69069 * z[0] + 1;
z[2] = 69069 * z[1] + 1;
z[3] = 69069 * z[2] + 1;
z[4] = 69069 * z[3] + 1;
}
// Two functions to make '.Random.seed' in R reflect the state vector
int * user_unif_nseed() { return &nseed; }
int * user_unif_seedloc() { return (int *) &z; }
// Wrapper to make 'user_unif_rand' callable from R
double xor_runif() {
return *user_unif_rand();
}
// This module will export two functions to be directly available from R.
// Note: if nothing is exported, `sourceCpp` will not load the shared
// object file generated by the compilation of the code, and
// 'user_unif_rand' will not become available to the internals of R.
RCPP_MODULE(xorwow) {
Rcpp::function("xor_set.seed", &user_unif_init, "Seeds Marsaglia's xorwow");
Rcpp::function("xor_runif", &xor_runif, "A uniform from Marsaglia's xorwow");
}
```

We first test the direct interface to the xorwow algorithm.

`## [1] 0.9090892`

Then we set R’s pseudorandom number generator to be our user supplied generator.

All R’s standard random number generators will after the call `RNGkind("user")`

rely on the user provided generator, in this case the xorwow generator.
Note that R does an “initial scrambling” of the argument given to `set.seed`

before it is passed on to our user defined initializer. This
scrambling turns 24102019 used below into 3573076633 used above.

`## [1] -721890663 9136518 -310030769 1191753796 194708085`

`## [1] 0.9090892`

`## [1] -721528226 331069150 9136518 -310030769 1191753796`

The code above shows the state vector of the xorwow algorithm when seeded
by the `user_unif_init`

function, and it also shows the
update to the state vector after a single iteration of the xorwow algorithm.

Though the xorwow algorithm is fast and simple, a benchmark study (not shown)
reveals that using xorwow instead of the Mersenne Twister doesn’t impact the
run time in a notable way when using e.g. `runif`

. The generator is simply
not the bottleneck. As the implementation
of xorwow above is experimental and has not been thoroughly tested, we will
not rely on it and quickly reset the random number generator to its default value.

### 4.1.2 Pseudorandom number packages

To benefit from the recent developments in pseudorandom number generators
we can turn to R packages such as the dqrng
package. It implements pcg64 from the PCG family of
generators as well as Xoroshiro128+ and Xoshiro256+
that are shift-register algorithms. Xoroshiro128+ is the default and other
generators can be chosen using `dqRNGkind`

. The usage of generators from dqrng
is similar to the usage of base R generators.

`## [1] 0.6172152`

Using the generators from dqrng does not interfere with the base R generators as the state vectors are completely separated.

In addition to uniform pseudorandom variables generated by `dqrunif`

the
dqrng package can generate exponential (`dqrexp`

) and Gaussian (`dqrnorm`

)
random variables as well as uniform discrete distributions (`dqsample`

and
`dqsample.int`

). All based on the fast pseudorandom integer generators that
the package includes. In addition, the package has a C++ interface that makes it
possible to use its generators in compiled code as well.

```
## Unit: milliseconds
## expr min lq mean median uq max neval
## runif(1e+06) 25.3 26.8 28.3 28.5 29.2 32 100
## dqrunif(1e+06) 3.7 5.8 7.7 6.6 6.9 49 100
```

As the benchmark above shows, `dqrunif`

is about six times faster than
`runif`

when generating one million variables. The other generators provided
by the dqrng package show similar improvements over the base R generators.

### References

L’Ecuyer, Pierre, and Richard Simard. 2007. “TestU01: A c Library for Empirical Testing of Random Number Generators.” *ACM Trans. Math. Softw.* 33 (4): 22:1–22:40. https://doi.org/10.1145/1268776.1268777.

Marsaglia, George. 2003. “Xorshift Rngs.” *Journal of Statistical Software* 8 (14): 1–6. https://doi.org/10.18637/jss.v008.i14.