People tend to trivialize the concept of randomness in computing. While true randomness cannot be replicated by a computer, the occurrences of randomness in nature have given us considerable insight into the properties of a random sequence of numbers. We can replicate these properties in deterministic numerical sequences that behave tolerably like random sequences.
In this post, we will implement and examine a simple yet well-known pseudorandom number generation algorithm known as the linear congruential method.
Randomness isn’t really something that can be defined; we can only observe certain properties of randomness, such as every number within the domain being equally likely to occur. To satisfy this property in our generator, we also have to make sure that each random number is drawn from a countable domain. Beyond that, however, randomness is more or less an ambiguous concept rather than anything tangible. It’s not at all insane (meh, maybe a little) to theorize that randomness is in fact a byproduct of a snowman pissing out a rainbow of low energy spectrum.
But weird pictures of snowmen aside, we can actually define what a random sequence of numbers may look like and fool the user into thinking that a non-random sequence is random simply because it appears to be. For example,
0.0012512588885159 0.56358531449324 0.19330423902097 0.80874050111393 0.58500930814539 0.47987304300058 0.3502914517655 0.89596240119633 0.82284005249184 0.7466048158208 0.17410809656056 0.85894344920194 0.71050141911069
may look like a sequence of random numbers, but if you type in
math.randomseed(1)
for i = 1, 13 do print(math.random()) end
you will always get this sequence of numbers.
Now I’m not saying that random sequences are unique; true randomness just cannot be generated from a preset pattern. But if all we’re asking for is something that looks like a random sequence of numbers, then these pseudorandom number generators work pretty well.
1. The Linear Congruential PRNG
This simple, efficient, and still somewhat effective algorithm (for a PRNG that produces arbitrary numbers and can cycle through every element of a domain) was developed by D. Lehmer in 1951. It computes each successive random number from the previous one, starting from an initial seed. The initial formulation is as follows:
X[i+1] = (b*X[i] + 1) mod m
This formulation has the property that the resulting X can never be larger than m-1. Also, for well-chosen values of b and m, the PRNG will have a long period. Knuth suggests the following rules when choosing the b and m parameters:
- m should be large, commonly 2^32 (one more than the largest unsigned 32-bit integer) or some other power of 2, so that the modulus operation can be optimized into a bitwise AND.
- b shouldn’t be too large or too small; it is usually one digit shorter than m.
- b should end with the digits …x21, where x is an even digit. Admittedly this is a very weird requirement, but it has been shown to prevent the occurrence of a few “bad cases” uncovered by mathematical analysis.
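To make the “long period” claim concrete, here is a tiny sketch that counts how many steps the recurrence takes to return to its seed. The function name period and the toy parameters b = 5, c = 1, m = 16 are mine, chosen only for illustration:

```lua
-- Count the steps until the LCG state returns to its seed.
function period(b, c, m, seed)
  local x, count = seed, 0
  repeat
    x = (b * x + c) % m
    count = count + 1
  until x == seed
  return count
end

-- With these small parameters the recurrence happens to visit
-- all 16 values of the domain before cycling back to 0.
print(period(5, 1, 16, 0)) --> 16
```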
Following these rules, here is a simple implementation of an LCG:
a = 0 -- Seed
function rand()
  a = (385934821*a + 1) % (2^32)
  return a
end
which returns the following list:
1 385934822 192177344 2638077888 1098901248 17605376 1042075393 1810960640 853095680 3367332096 3509841152 2524184832 1521083648
Meh, looks kinda “random” to me. For all intents and purposes, this could pass off as a stream of random numbers in everyday use. (LCGs are not usually used for cryptography, as they’re not inherently secure.)
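As a quick illustration of why LCGs aren’t secure: the stream is completely determined by the parameters, so anyone who knows b and m can compute the next output from the current one. A sketch using the generator above and the first value of its output list:

```lua
-- With known parameters, one observed output reveals the next one.
local b, m = 385934821, 2^32
local observed = 1                       -- first output of the generator above
local predicted = (b * observed + 1) % m
print(predicted) --> 385934822, exactly the second output of the stream
```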
2. A Slight Variation
Modern lightweight PRNGs commonly implement a variation of the 1951 algorithm that introduces an increment parameter, c, which defines how the sequence advances.
X[i+1] = (b*X[i] + c) mod m
In 1961, Greenberger also formalized a set of rules under which the generator achieves its maximum period (i.e., it visits every value in the domain before regenerating the seed itself, which causes the PRNG to cycle back):
- m and c are relatively prime; that is, they have no common factors other than 1.
- Every prime number q that divides m also divides (b-1).
- If 4 divides m, then 4 also divides (b-1).
The rationale behind these and other requirements is discussed in greater detail elsewhere; just google the keywords “linear congruential generator”.
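These conditions can also be checked mechanically. Below is a rough sketch; the function names fullPeriod, primeFactors, and gcd are my own, and the trial-division factoring is only practical for small or power-of-two m:

```lua
local function gcd(x, y)
  while y ~= 0 do x, y = y, x % y end
  return x
end

-- Distinct prime factors of m, found by trial division.
local function primeFactors(m)
  local fs, d = {}, 2
  while d * d <= m do
    if m % d == 0 then
      fs[#fs + 1] = d
      while m % d == 0 do m = m / d end
    end
    d = d + 1
  end
  if m > 1 then fs[#fs + 1] = m end
  return fs
end

-- True if (m, b, c) satisfy all three full-period conditions.
local function fullPeriod(m, b, c)
  if gcd(c, m) ~= 1 then return false end             -- c, m relatively prime
  for _, q in ipairs(primeFactors(m)) do
    if (b - 1) % q ~= 0 then return false end         -- q | m implies q | (b-1)
  end
  if m % 4 == 0 and (b - 1) % 4 ~= 0 then return false end -- 4 | m implies 4 | (b-1)
  return true
end

print(fullPeriod(2^31, 1103515245, 12345)) --> true
print(fullPeriod(8, 3, 2))                 --> false (c shares a factor with m)
```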
Shamelessly stolen from Wikipedia:
| Source | m | b (multiplier) | c (increment) |
| Microsoft Visual/Quick C/C++ | 2^32 | 214013 | 2531011 |
| MMIX by Donald Knuth | 2^64 | 6364136223846793005 | 1442695040888963407 |
| Random class in Java API | 2^48 | 25214903917 | 11 |
So to replicate the PRNG used by glibc (the C library behind gcc), we can simply write
a = 0 -- Seed
function rand()
  a = (1103515245*a + 12345) % (2^32)
  return a -- Note: we'll also need to mask off the highest bit (via mod 2^31), but for all intents and purposes...
end
And to create a function that generates these PRNGs, we can write a factory function:
a = 0 -- Seed (shared global state)
function randFactory(m, b, c)
  return function()
    a = (b*a + c) % m
    return a
  end
end
However, a crucial advantage of the LCG is its ability to run multiple RNG streams without any significant overhead, and every generator returned by the factory above reads and writes the same global a. Hence, in order to run concurrent instances of rand, we cache the state of each RNG in a local variable that is an upvalue with respect to the actual rand function:
function randFactory(m, b, c, a) -- a is the seed
  if not a then a = 0 end
  return function()
    a = (b*a + c) % m
    return a
  end
end
so that here
rand = randFactory(2^32, 1103515245, 12345)
for i = 1, 10 do print(rand()) end
rand2 = randFactory(2^32, 1103515245, 12345)
for i = 1, 10 do print(rand2()) end
would result in
12345 3554416254 2802067456 1358247936 2138638336 1459132416 1445521408 2518349824 4044081152 3908327424
12345 3554416254 2802067456 1358247936 2138638336 1459132416 1445521408 2518349824 4044081152 3908327424 -- rand2() starts here, an identical but independent stream
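The seed argument also lets each stream start from a different point. A standalone sketch (repeating the factory so the snippet runs on its own; the seed 7 is arbitrary):

```lua
function randFactory(m, b, c, a) -- same factory as above; a is the seed
  if not a then a = 0 end
  return function()
    a = (b * a + c) % m
    return a
  end
end

local r1 = randFactory(2^32, 1103515245, 12345)    -- default seed 0
local r2 = randFactory(2^32, 1103515245, 12345, 7) -- seed 7
print(r1()) --> 12345
print(r2()) --> 3429651764, i.e. (1103515245*7 + 12345) % 2^32
```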
The glibc implementation actually emits only 31 bits; this can be easily replicated via
a = 0 -- Seed
function rand()
  a = (1103515245*a + 12345) % (2^32)
  return a % (2^31) -- The internal state is not pruned; it stays 32-bit
end
In order to improve the quality of the generator, some implementations keep more bits of state than they emit. When m is a power of two, the low-order bits of an LCG have much shorter periods than the high-order bits, so emitting only the higher bits noticeably improves the output. In Lua, a bitwise shift to the right by n bits can be implemented as division by 2^n (discarding the fractional part), so we can replicate Java’s RNG via
a = 0 -- Seed
function rand()
  a = (25214903917*a + 11) % (2^48)
  return math.floor(a / 2^32) -- emit only the high bits; again, the internal state is not modified
end
And of course, the tongue-in-cheek solution:
rand = math.random