I wanted to write this as a sequel to "What is a property?", where I would talk about how Property-Based Testing libraries generate random structures. Don't worry, I'll still do that! But I realized that without talking about randomness in computers, the writing would be incomplete. So the first part of this article goes into a core problem that we consider "solved" in PBT, designing good random number generators, and the latter part is about implementing complex random data generators on top of such RNGs.
Random Number Generation in Computers
Randomness is a noisy topic. At a first approximation, it signifies uncertainty, or orderlessness, or more formally, high entropy. Of course, in computer programs, random generation is fake; we use PRNGs (pseudo-random number generators), functions for which the relationship between the seed and the generated random number is as arbitrary as possible. A very simple example of a PRNG is a linear congruential generator (LCG):
(Example LCG run: 1083814273, 380859148, 2338846736.)
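To make the recurrence concrete, here is a minimal LCG sketch in Java. The parameters are the classic Numerical Recipes constants, chosen purely for illustration; they are not necessarily the ones behind the numbers above.

// Minimal linear congruential generator: next = (a * state + c) mod m.
// The constants below are the "Numerical Recipes" parameters with m = 2^32,
// picked only as an example; any choice of (a, c, m) defines some LCG.
final class Lcg {
    private static final long A = 1664525L;
    private static final long C = 1013904223L;
    private static final long M = 1L << 32;

    private long state;

    Lcg(long seed) { this.state = Math.floorMod(seed, M); }

    long next() {
        state = (A * state + C) % M;
        return state;
    }

    public static void main(String[] args) {
        Lcg lcg = new Lcg(42);
        System.out.println(lcg.next()); // some arbitrary-looking number
        System.out.println(lcg.next()); // another one
        System.out.println(lcg.next()); // and another
    }
}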
The randomness of the output of an LCG depends on its parameter choices; for instance, a = 1 and c = 1 results in a simple counter modulo m. The precursor of the LCG (according to Wikipedia) is apparently the Lehmer random number generator, where c = 0, m is either a prime or a power of a prime, the initial seed must be coprime to m, and a must be "an element of high multiplicative order modulo m". As I've said, I'm also learning about random number generation as I write this post, so I created a visualization. There are some presets, and you can also try other numbers if you would like.
Visualizing LCG quality
Pick a preset or tweak the parameters and watch what a, c, and m do to the output.
The left panel plots consecutive triples (xₙ, xₙ₊₁, xₙ₊₂) (or pairs (xₙ, xₙ₊₁) if
you switch the view); if visible structure shows up there, the generator is leaking
patterns the spectral test would catch. The right panel is a histogram over [0, 1],
which tells you whether the output is at least uniform on the surface.
patterns here mean weak randomness; try RANDU in 3D
There is a whole lineage of random number generators, with new ones published as recently as 2023. It's an active area of research whose results we use constantly: many of the algorithms show up in the random-generation libraries and standard libraries of many programming languages.
Of course, the discussion would be incomplete without the famous lava lamps!

Once we have an algorithm that can take an arbitrary seed and produce a high-entropy random number out of it, we want a high-entropy source of those seeds too. For Cloudflare, it's apparently their pattern-absent lava lamps; for many of us, it's the OS source of randomness, /dev/urandom, which uses the only real source of randomness available to a computer, the physical world, to produce seeds.
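In Java, for example, you can pull a seed from the OS entropy source and use it to seed a fast deterministic generator; a minimal sketch:

import java.security.SecureRandom;
import java.util.Random;

class Seeding {
    public static void main(String[] args) {
        // SecureRandom draws from the OS entropy source (e.g. /dev/urandom on Linux).
        SecureRandom entropy = new SecureRandom();
        long seed = entropy.nextLong();

        // The cheap, deterministic PRNG is then seeded with that high-entropy value.
        Random rng = new Random(seed);
        System.out.println(rng.nextLong());
    }
}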
Armed with true randomness and well-designed algorithms, we can generate uniform random 64-bit numbers, so are we done? Well, the issue is, we rarely want raw 64-bit numbers, do we? We want to generate booleans, floats, bounded numbers, collections, complex structures... We also don't want uniform randomness per se; we want to be able to bias the random generator when needed, perhaps to fit certain other distributions. Luckily, we can build all of this from our simple primitive of 64 random bits.
Generating Floats
Random floating-point numbers are typically generated by producing a random float in the interval [0.0, 1.0) and scaling it to the desired range. Nima Badizadegan has a much better article on random floating-point generation, so I'll just give the gist here; the following is from the Haskell random library:
-- | Generates uniformly distributed 'Double' in the range \([0, 1]\).
-- Numbers are generated by generating uniform 'Word64' and dividing
-- it by \(2^{64}\). It's used to implement 'UniformRange' instance for
-- 'Double'.
--
-- @since 1.2.0
uniformDouble01M g = do
  w64 <- uniformWord64 g
  return $ fromIntegral w64 / m
  where
    m = fromIntegral (maxBound :: Word64) :: Double
Essentially, we generate a random 64-bit integer and divide it by the largest 64-bit integer, with the division carried out in the floating-point domain: x = w64 / (2^64 - 1), which lands in [0, 1].
If we want a different range, we can just scale [0, 1) up to [a, b): y = a + x * (b - a).
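As a sketch, assuming a uniform [0, 1) source such as Java's Random.nextDouble() (uniformRange is a hypothetical helper name, not a library function):

import java.util.Random;

class Floats {
    // Scale a uniform double in [0, 1) up to the interval [a, b).
    static double uniformRange(Random rng, double a, double b) {
        double x = rng.nextDouble(); // uniform in [0, 1)
        return a + x * (b - a);
    }
}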
Generating Booleans
Generating a uniform random boolean is easy: rand() gives us 64 random bits, and we can just check any one of them. But do you have the courage to generate biased booleans? That is where we can rely on random float generation:
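A minimal sketch of both, assuming a 64-bit source and the [0, 1) float generator from above (helper names are mine):

import java.util.Random;

class Booleans {
    // Uniform boolean: inspect a single bit of the 64 random bits.
    static boolean uniformBool(Random rng) {
        return (rng.nextLong() & 1L) == 1L;
    }

    // Biased boolean: true with probability p, via a uniform double in [0, 1).
    static boolean biasedBool(Random rng, double p) {
        return rng.nextDouble() < p;
    }
}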
Generating Bounded Integers
We can easily model randomInt(a, b) as a + randomInt(0, b - a), so we can focus on solving random generation for [0, a). A simple approximation is rand() % a, which has a classic problem: bias. When a is not a power of 2, part of the interval will be hit a little more often, because the bins (k % a) will not all have the same number of elements. Let's assume we have 4-bit integers and our bound is 3, as you can see in the widget.
Here's how the distribution comes out:
0: 6/16
1: 5/16
2: 5/16
As you can see, we have a 6/16 chance to get 0, but only a 5/16 chance to get 1 or 2. There's a 1/16 (0.0625) bias towards generating zeroes. Of course this bias gets smaller as the interval we pick from gets larger, because the denominator grows every time; at 64 bits, the effect is mostly negligible. However, we still want to solve it. How? We just keep iterating until we get a random number we like!
rand() mod n: pink bars are favored buckets
The way "favored" numbers work is: for a given sampling space 2^N and a bound a, the favored numbers are those i < 2^N - k * a, where k * a is the largest multiple of a that is smaller than or equal to 2^N. For the simple case of N=4, a=3, k is 5, so i < 2^4 - 5 * 3 gives us only 0. For any (N, a), though, the biased region can never exceed 50% of all inputs. The effect is largest at a = 2^(N-1) + 1, such as N=4 and a=9. Here, all but 2 numbers within the bound are favored, and every number above the bound within the randomness source contributes to the bias. Even in this worst case, there is slightly less than a 50% chance that we'll hit a biased part of the sampling space.
reject any biased sampling, and keep sampling with minimal cost here's a an example in
Java
(taken from):
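Since the linked snippet isn't included here, below is a minimal sketch of the same rejection loop using java.util.Random (boundedRand is a hypothetical name, not the linked code):

import java.util.Random;

class Bounded {
    // Uniform int in [0, n): reject draws that land in the incomplete last copy
    // of [0, n) at the top of the 32-bit range, so every residue class mod n is
    // kept equally often.
    static int boundedRand(Random rng, int n) {
        long space = 1L << 32;                // size of the sampling space, 2^32
        long limit = space - (space % n);     // largest multiple of n <= 2^32
        long r;
        do {
            r = Integer.toUnsignedLong(rng.nextInt()); // uniform in [0, 2^32)
        } while (r >= limit);                 // biased tail: retry (< 50% of draws)
        return (int) (r % n);
    }
}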
Now that we can randomly generate simple types, let's see how we actually use these building blocks in Property-Based Testing.
Generating Random Data for Testing Programs
The first thing we need when generating structures is the ability to generate multiple things, dependently. Well, we could technically just write:
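Something like the following, where every generator quietly reaches for one global RNG (genPerson, genName, and genAge are made-up names for illustration):

import java.util.Random;

class NaiveGen {
    // One hidden, global source of randomness that every generator uses.
    static final Random RNG = new Random();

    static int genAge() { return RNG.nextInt(120); }

    static String genName() {
        int len = 1 + RNG.nextInt(10);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            sb.append((char) ('a' + RNG.nextInt(26)));
        }
        return sb.toString();
    }

    // Dependent generation: just call one generator after the other.
    static String genPerson() {
        return genName() + " is " + genAge() + " years old";
    }
}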
There's a small issue here, which is that the RNG is hidden. We would ideally like to have the randomness state around so we can explicitly manipulate it. That's not that hard: we can just pass the RNG to every function that uses it:
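Continuing the same made-up example, with the RNG passed explicitly:

import java.util.Random;

class ExplicitGen {
    // The caller owns the RNG, so seeding and reproducing a run is trivial.
    static int genAge(Random rng) { return rng.nextInt(120); }

    static String genName(Random rng) {
        int len = 1 + rng.nextInt(10);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            sb.append((char) ('a' + rng.nextInt(26)));
        }
        return sb.toString();
    }

    static String genPerson(Random rng) {
        return genName(rng) + " is " + genAge(rng) + " years old";
    }
}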
Another popular technique (originating from QuickCheck, I assume) is to instead have the generators return a recipe for generation (Gen<T>), rather than being functions that expect an RNG to compute a T.
Then there is another function, sample, of type Rng -> Gen<T> -> T, that allows you to pull out a T cleanly after producing the generator, without needing to thread in the RNG yourself. Another benefit of this approach is that generator composition is now hidden under bind, so you can actually do more than just generate random structures. For instance, you can switch between different sources of randomness and styles of randomness.
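A minimal sketch of this recipe style in Java, using java.util.Random as the Rng (the interface and method names are mine, not any particular library's):

import java.util.Random;
import java.util.function.Function;

// A generator is a recipe: it only produces a value once you hand it an RNG.
interface Gen<T> {
    T run(Random rng);

    // Composition hides the RNG plumbing behind bind/map.
    default <U> Gen<U> bind(Function<T, Gen<U>> f) {
        return rng -> f.apply(this.run(rng)).run(rng);
    }

    default <U> Gen<U> map(Function<T, U> f) {
        return rng -> f.apply(this.run(rng));
    }

    // sample : Rng -> Gen<T> -> T
    static <T> T sample(Random rng, Gen<T> gen) {
        return gen.run(rng);
    }
}

Usage looks like Gen<Integer> len = rng -> rng.nextInt(10); Gen<String> word = len.map(n -> "a".repeat(n)); and only at the very end does Gen.sample(new Random(42), word) actually consume randomness. Under the hood, the recipe can be run against different styles of RNG.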
The simplest is using an in-place mutable RNG:
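For example, generating two children from one shared, mutable RNG (a sketch):

import java.util.Random;

class MutableRngStyle {
    // One shared, mutable RNG: generating c1 advances the state in place,
    // so generating c2 right after automatically sees fresh randomness.
    static int[] genPair(Random rng) {
        int c1 = rng.nextInt(100); // c1 mutates rng
        int c2 = rng.nextInt(100); // c2 reads the updated state
        return new int[] { c1, c2 };
    }
}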
Using a splittable RNG is somewhat safer; splittable RNGs are pure and parallelizable, at the cost of being a bit slower.
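A sketch using the JDK's SplittableRandom (in a fully pure setting, think of Haskell's split, the parent generator isn't mutated at all):

import java.util.SplittableRandom;

class SplittableStyle {
    // Split off an independent stream per child; the two children no longer
    // contend on one shared state and could be generated in parallel.
    static int[] genPair(SplittableRandom rng) {
        SplittableRandom left = rng.split();
        SplittableRandom right = rng.split();
        int c1 = left.nextInt(100);
        int c2 = right.nextInt(100);
        return new int[] { c1, c2 };
    }
}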
The last popular option is to use a choice buffer or sequence, Hypothesis style:
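A sketch of the choice-buffer idea (names are mine; Hypothesis's real implementation is far more involved):

class ChoiceBuffer {
    // Randomness comes from a pre-drawn (or fuzzer-provided) byte sequence;
    // each generator reads some bytes and moves the cursor forward.
    private final byte[] buffer;
    private int cursor = 0;

    ChoiceBuffer(byte[] buffer) { this.buffer = buffer; }

    int nextByte() {
        if (cursor >= buffer.length) return 0; // overrun policy: pad with zeros
        return Byte.toUnsignedInt(buffer[cursor++]);
    }

    int genSmallInt() { return nextByte() % 10; }

    boolean genBool() { return (nextByte() & 1) == 1; }
}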
In the randomness-buffer approach, each generator moves the cursor along the buffer by reading a number of random bytes from it; the corollary is that you get internal shrinking, and you can use an existing fuzzer to drive your tests. However, any one of these approaches gives you roughly the same capabilities: generate something, update the RNG state, generate the next thing, and so on... So let's assume we want to write a list generator; here's what a uniform random list looks like:
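A sketch, again with java.util.Random (maxLen is an arbitrary cap I'm adding so the length has a finite range):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ListGen {
    // Uniform list: a uniformly random length, filled with uniformly random elements.
    static List<Integer> genList(Random rng, int maxLen) {
        int len = rng.nextInt(maxLen + 1); // uniform length in [0, maxLen]
        List<Integer> xs = new ArrayList<>(len);
        for (int i = 0; i < len; i++) {
            xs.add(rng.nextInt());         // uniform element
        }
        return xs;
    }
}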
The issue here is that, however theoretically sound, we do not want uniform lists distributed over uniform lengths. There are two competing forces here. One is that generating and testing small lists is much faster: we can generate and test 1000 lists of length 10 in the time it takes to generate and test one list of length 10000. This is coupled with the "small-scope hypothesis", which asserts that many of the bugs we can find via large test cases also show up for small cases; it is the basis for enumerative testing techniques that exhaust the search space from small to large inputs. Of course, not all bugs emerge at small scales, and exhaustive search is very expensive. The competing observation is that large inputs exhibit behaviors you would otherwise only observe across a variety of smaller tests, so generating and testing one large input can give you more information about the system than N small inputs can.
These competing theories, both of which can be correct in different circumstances, mean we want to generate both small and large inputs at the same time, and we probably don't want to generate uniformly either; we want to bias towards specific parts of the search space. For instance, Hypothesis has _constant_floats and _constant_strings that it injects into an otherwise very random set of inputs (sneak peek).
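The idea is easy to sketch (this is an illustration of the technique, not Hypothesis's actual code or constant list):

import java.util.Random;

class ConstantInjection {
    // A pool of "nasty" known constants that tend to expose edge cases.
    static final double[] CONSTANT_FLOATS = {
        0.0, -0.0, 1.0, -1.0, 0.5,
        Double.MIN_VALUE, Double.MAX_VALUE,
        Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN
    };

    // Occasionally return a constant instead of a freshly generated value.
    static double genDouble(Random rng) {
        if (rng.nextDouble() < 0.05) { // 5% of the time, pick from the pool
            return CONSTANT_FLOATS[rng.nextInt(CONSTANT_FLOATS.length)];
        }
        return rng.nextDouble();       // otherwise, the usual uniform generation
    }
}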
Another typical approach is the use of "size" for biasing. In the case of Hypothesis (as far as I know), it starts with a small number of small inputs and then moves to uniform randomness (with the exception of the constants). In the case of QuickCheck, the size function is a somewhat weird one:
computeSize MkState{replayStartSize = Just s,numSuccessTests = 0,numRecentlyDiscardedTests=0} = s
-- NOTE: Beware that changing this means you also have to change `prop_discardCoverage` as that currently relies
-- on the sequence produced by this function.
computeSize MkState{maxSuccessTests = ms, maxTestSize = mts, maxDiscardedRatio = md,numSuccessTests=n,numRecentlyDiscardedTests=d}
  -- e.g. with maxSuccess = 250, maxSize = 100, goes like this:
  -- 0, 1, 2, ..., 99, 0, 1, 2, ..., 99, 0, 2, 4, ..., 98.
  | n `roundTo` mts + mts <= ms ||
    n >= ms ||
    ms `mod` mts == 0 = (n `mod` mts + d `div` dDenom) `min` mts
  | otherwise =
    ((n `mod` mts) * mts `div` (ms `mod` mts) + d `div` dDenom) `min` mts
  where
    -- The inverse of the rate at which we increase size as a function of discarded tests
    -- if the discard ratio is high we can afford this to be slow, but if the discard ratio
    -- is low we risk bowing out too early
    dDenom
      | md > 0 = (ms * md `div` 3) `clamp` (1, 10)
      | otherwise = 1 -- Doesn't matter because there will be no discards allowed
    n `roundTo` m = (n `div` m) * m
Essentially, size is the number of tests so far modulo the maximum size, with a small edge case: the last cycle would be incomplete, so they increase the step size to make it complete too. In QuickChick, or in my other forks, I've usually just used log2(N) with some downward bias to make sure we're not glossing over small inputs (log2(16) = 4, so sizes up to 3 are only used for the first 15 tests, which is not great). I don't think the PBT literature really agrees on the right way to do this; it's also very hard to decide on what size even means. Here's the issue:
data Tree = Leaf Int | Node Tree Tree

genTree :: Int -> Gen Tree
genTree n =
  if n == 0 then
    Leaf <$> arbitrary
  else
    oneof [Leaf <$> arbitrary, Node <$> genTree (n-1) <*> genTree (n-1)]

genTree' :: Int -> Gen Tree
genTree' n =
  if n == 0 then
    Leaf <$> arbitrary
  else
    oneof [Leaf <$> arbitrary, Node <$> genTree' (n `div` 2) <*> genTree' (n `div` 2)]
Both of these generators are technically "correct"; however, the first one treats the size variable as its depth, while the second treats it as the number of nodes. The corollary is that the trees generated by the first one will be exponentially larger than the second one's; they'll be exponentially slower to generate and execute, and for something like maxSize = 100, that slowness will be colossal. Some frameworks, again such as Hypothesis or Hedgehog, use a recursive combinator that auto-handles the depth, hiding the sizing from the user. This is probably a good starting point, but when you think about trees a bit you'll also probably realize that dividing the size as n/2 is not the greatest idea: we'll end up with so many balanced trees. Ideally we want right-skewed, left-skewed, and balanced trees that can exercise a variety of behaviors; one way is to split the size budget at a random point, as sketched at the end of this section. This also goes back to the idea of Swarm Testing, where we essentially want to randomize the choices of randomness; otherwise you'll end up with random-looking structures, but, for instance, you'll never end up with a list of all 1's. Will Wilson has a great Papers We Love talk on this that I will link when it's available online. Another approach I've seen is to use Boltzmann sampling for diversifying generation.
(1, 2, 3, 4).
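Here's a small sketch of that random-split idea in Java (my own illustration, not any framework's combinator): instead of giving each subtree exactly half of the budget, pick the split point at random so skewed shapes show up too.

import java.util.Random;

class SkewedTrees {
    interface Tree {}
    record Leaf(int value) implements Tree {}
    record Node(Tree left, Tree right) implements Tree {}

    // Split the remaining size budget at a random point instead of exactly in half,
    // so left-skewed, right-skewed, and balanced trees all get generated.
    static Tree genTree(Random rng, int size) {
        if (size <= 0 || rng.nextInt(4) == 0) {  // small chance of stopping early
            return new Leaf(rng.nextInt());
        }
        int leftBudget = rng.nextInt(size);      // random split in [0, size)
        int rightBudget = size - 1 - leftBudget; // one unit spent on this Node
        return new Node(genTree(rng, leftBudget), genTree(rng, rightBudget));
    }
}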