
I wanted to write this as a sequel to "What is a property?", where I would talk about how Property-Based Testing libraries generate random structures. Don't worry, I'll still do that! But I realized that without talking about randomness in computers, the writing would be incomplete. So the first part of this article goes into a core problem that we consider "solved" in PBT, designing good random number generators, with the latter part talking about implementing complex random data generators on top of such RNGs.

Random Number Generation in Computers

Randomness is a noisy topic. At a first approximation, it signifies uncertainty, or orderlessness, or more formally, high entropy. Of course, in computer programs, random generation is fake; we use PRNGs (pseudo-random number generators), high-entropy functions in which the relationship between the seed and the generated random number is as arbitrary as possible. A very simple example of a PRNG is a linear congruential generator (LCG):

function lcg(seed: number): () => number {
  const a = 1664525;
  const c = 1013904223;
  const m = 2 ** 32;
  let state = seed;
  return () => {
    state = (a * state + c) % m;
    return state;
  };
}

const rand = lcg(42);
rand(); // 1083814273
rand(); // 378494188
rand(); // 2479403867

The randomness of the output of an LCG depends on its parameter choices; for instance, a = 1 and c = 1 results in a counter modulo m. The precursor of the LCG (according to Wikipedia) is apparently the Lehmer random number generator, where c = 0, m is either a prime or a power of a prime, the initial seed must be coprime to m, and a must be "an element of high multiplicative order modulo m". As I've said, I'm also learning about random number generation as I write this post, so I created a visualization. There are some presets, and you can also try other numbers if you would like.
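
To make the degenerate case concrete, here's a small sketch (the lcgWith helper is my own, just the LCG above with its constants turned into parameters): with a = 1 and c = 1, the "generator" collapses into a counter modulo m.

```typescript
// Same skeleton as lcg above, but with the constants as parameters.
function lcgWith(a: number, c: number, m: number, seed: number): () => number {
  let state = seed;
  return () => {
    state = (a * state + c) % m;
    return state;
  };
}

// Degenerate parameters: a = 1, c = 1 just counts 1, 2, 3, ... modulo m.
const counter = lcgWith(1, 1, 16, 0);
counter(); // 1
counter(); // 2
counter(); // 3
```

Perfectly uniform output, zero entropy: uniformity alone tells you nothing about quality, which is why the visualizations below look at structure, not just histograms.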

Visualizing LCG quality

Pick a preset or tweak the parameters and watch what a, c, and m do to the output. The left panel plots consecutive triples (xโ‚™, xโ‚™โ‚Šโ‚, xโ‚™โ‚Šโ‚‚) (or pairs (xโ‚™, xโ‚™โ‚Šโ‚) if you switch the view) โ€” if visible structure shows up there, the generator is leaking patterns the spectral test would catch. The right panel is a histogram over [0, 1], which tells you whether the output is at least uniform on the surface.


[Interactive widget. Left: consecutive points (patterns here mean weak randomness; try RANDU in 3D). Right: distribution over [0, 1].]

There is a whole lineage of random number generators, with new ones right up to 2023. It's an active area of research whose results we frequently use, since many of these algorithms show up in the random generation libraries and standard libraries of many programming languages.

Of course, the discussion would be incomplete without the famous lava lamps!

CF Lava Lamps

Once we have an algorithm that can take an arbitrary seed and produce a high-entropy random number out of it, we want a high-entropy source of those seeds too. For Cloudflare, it's apparently their pattern-free wall of lava lamps; for many of us, it's the OS source of randomness, /dev/urandom, which taps the only real source of randomness available to a computer, the physical world, to produce seeds.

Armed with true randomness and well-designed algorithms, we can generate uniform random 64-bit numbers, so are we done? Well, the issue is, we rarely want raw 64-bit numbers, do we? We want to generate booleans, floats, bounded numbers, collections, complex structures... We also don't want uniform randomness per se; we want to be able to bias the random generator if needed, perhaps to fit certain other distributions. Luckily, we can build all of this from our simple primitive of 64 random bits.

Generating Floats

Random floating-point numbers are typically generated by producing a random float in the interval [0.0, 1.0) and scaling it to the target range. Nima Badizadegan has a much better article on random floating-point generation, so I'll just give the gist here; the following is from the Haskell random library:

-- | Generates uniformly distributed 'Double' in the range \([0, 1]\).
--   Numbers are generated by generating uniform 'Word64' and dividing
--   it by \(2^{64}\). It's used to implement 'UniformRange' instance for
--   'Double'.
--
-- @since 1.2.0
uniformDouble01M :: forall g m. StatefulGen g m => g -> m Double
uniformDouble01M g = do
  w64 <- uniformWord64 g
  return $ fromIntegral w64 / m
  where
    m = fromIntegral (maxBound :: Word64) :: Double

Essentially, we generate a random 64-bit integer and divide it by the largest 64-bit value, with the division carried out in the domain of floating-point numbers:

// assuming rand() yields a uniform 64-bit value
function randomFloat(): number {
  return rand() / 2 ** 64;
}

If we have a different range, we can just scale up [0, 1) to [a, b):

function randomFloatInRange(a: number, b: number): number {
  return a + (b - a) * randomFloat();
}

Generating Booleans

Generating a uniform random boolean is easy: rand gives us 64 random bits, and we can just check any one of them;

function randomBoolean(): boolean {
  return (rand() & 1) === 0;
}

but do you have the courage to generate biased booleans? That is where we can rely on the random float generation:

function randomBiasedBoolean(p: number): boolean {
  return randomFloat() < p;
}
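
As a quick sanity check, here's a sketch that drives the biased boolean with a seeded LCG (reproduced inline so the snippet stands alone; makeRand is my own name): with p = 0.25, roughly a quarter of the draws come out true.

```typescript
// The LCG from earlier, normalized to [0, 1) by dividing by the modulus.
function makeRand(seed: number): () => number {
  let state = seed;
  return () => {
    state = (1664525 * state + 1013904223) % 2 ** 32;
    return state / 2 ** 32;
  };
}

const randomFloat = makeRand(42);
const randomBiasedBoolean = (p: number): boolean => randomFloat() < p;

// Count how often a p = 0.25 coin lands true over 100k flips.
let trues = 0;
for (let i = 0; i < 100_000; i++) {
  if (randomBiasedBoolean(0.25)) trues++;
}
// trues / 100_000 should hover around 0.25
```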

Generating Bounded Integers

We can easily model randomInt(a, b) as a + randomInt(0, b - a), so we can focus on solving random generation for [0, a). A simple approximation is rand() % a, which has a classic problem: bias. When a is not a power of 2, part of the interval gets hit a little more often, because the bins (k % a) do not all have the same number of elements. Let's assume we have 4-bit integers and our bound is 3, as you can see in the widget.

Here's how the distribution works out:

0: [0, 3, 6, 9, 12, 15]
1: [1, 4, 7, 10, 13]
2: [2, 5, 8, 11, 14]

As you can see, we have a 6/16 chance to get 0, but only a 5/16 chance to get 1 or 2; there's a 1/16 (0.0625) bias towards generating zeroes. Of course, this bias shrinks as the intervals we pick numbers from get larger, because the denominator gets greater every time; at 64 bits, the effect is mostly negligible. However, we still want to solve it. How? We just keep sampling until we get a random number we like!
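
We can also compute the toy example directly. This little helper (my own, for illustration) counts how many inputs from a 2^bits sampling space land in each bucket of k % bound:

```typescript
// Count how many k in [0, 2^bits) fall into each residue class k % bound.
function bucketSizes(bits: number, bound: number): number[] {
  const sizes = new Array<number>(bound).fill(0);
  for (let k = 0; k < 2 ** bits; k++) {
    sizes[k % bound]++;
  }
  return sizes;
}

bucketSizes(4, 3); // [6, 5, 5]: zero gets one extra hit
```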

[Interactive widget. Left: rand() mod n (pink bars are favored buckets). Right: rejection sampling (flat within Monte Carlo noise).]

The way "favored" numbers work: for a given sampling space 2^N and a bound a, the favored numbers are those i < 2^N - k * a, where k * a is the largest multiple of a that is smaller than or equal to 2^N. For the simple case of N=4, a=3, k is 5, so i < 2^4 - 5 * 3 gives us only 0. For any (N, a), though, this biased portion can never exceed 50% of all inputs. The effect is largest at a = 2^(N-1) + 1, such as N=4 and a=9: here, all but 2 numbers within the bound carry the bias, and all numbers at or above the bound within the randomness source "are" the bias. Even in this worst case, there is slightly less than a 50% chance of hitting a biased part of the sampling space. So we can just reject any biased sample and keep sampling, at minimal cost. Here's an example in C++ (taken from):

static uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    uint32_t x, r;
    do {
        x = rng();
        r = x % range;
    } while (x - r > (-range));
    return r;
}
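
For completeness, here's the same rejection loop sketched in TypeScript. JavaScript has no unsigned 32-bit arithmetic, so the (-range) wraparound trick is written out as 2 ** 32 - range:

```typescript
// Rejection sampling for [0, range): retry whenever x lands in the final,
// partial bucket of the 32-bit sampling space.
function boundedRand(rng: () => number, range: number): number {
  let x: number, r: number;
  do {
    x = rng() >>> 0;                 // force an unsigned 32-bit sample
    r = x % range;
  } while (x - r > 2 ** 32 - range); // x - r is the start of x's bucket
  return r;
}
```

Since the rejected region is always less than half the sampling space, the expected number of iterations stays below 2.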

Now that we can randomly generate simple types, let's see how we actually use these building blocks in Property-Based Testing.

Generating Random Data for Testing Programs

The first thing we need when generating structures is the ability to generate multiple things, dependently. Well, we could technically just write:

function genNum(): number {
  let b = randBool(0.5);
  let c = b ? randBounded(0, 10) : randBounded(10, 20);
  return c;
}

There's a small issue here, which is that the RNG is hidden. We would ideally like to have the randomness state around so we can explicitly manipulate it. That's not that hard, we can just pass the rng to every function that uses it:

function genNum(rng): number {
  let b = randBool(rng, 0.5);
  let c = b ? randBounded(rng, 0, 10) : randBounded(rng, 10, 20);
  return c;
}

Another popular technique (originating from QuickCheck, I assume) is to instead have the generators return a recipe for generation (Gen<T>) instead of being functions that expect an rng to compute T.

function genNum(): Gen<number> {
  return bind(randBool(0.5), (b) =>
    bind(b ? randBounded(0, 10) : randBounded(10, 20), (c) =>
      ret(c)
    )
  );
}

Then there is another function, sample, of type Rng -> Gen<T> -> T, that allows you to pull out a T cleanly after producing the generator, without needing to thread the RNG through yourself. Another benefit of this approach is that generator composition is now hidden under bind, so you can actually do more than just generate random structures. For instance, you can switch between different sources and styles of randomness.

The simplest is using an in-place mutable RNG:

// c1 mutates rng in place
type Gen<T> = (rng: Rng) => T;

function bind<T1, T2>(c1: Gen<T1>, c2: (t1: T1) => Gen<T2>): Gen<T2> {
  return (rng) => {
    const t1 = c1(rng);
    return c2(t1)(rng);
  };
}
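
To round out this interface, here's a self-contained sketch of the mutable-RNG flavor with ret and sample filled in; the primitive generators and the lcg01 helper are illustrative stand-ins, not any library's API:

```typescript
type Rng = () => number;        // assumed: uniform floats in [0, 1)
type Gen<T> = (rng: Rng) => T;

// ret lifts a plain value into Gen; sample runs a recipe against an RNG.
const ret = <T>(t: T): Gen<T> => () => t;
const sample = <T>(rng: Rng, g: Gen<T>): T => g(rng);

function bind<T1, T2>(c1: Gen<T1>, c2: (t1: T1) => Gen<T2>): Gen<T2> {
  return (rng) => c2(c1(rng))(rng);
}

// Illustrative primitive generators built on the [0, 1) source.
const randBool = (p: number): Gen<boolean> => (rng) => rng() < p;
const randBounded = (a: number, b: number): Gen<number> =>
  (rng) => a + Math.floor((b - a) * rng());

// The genNum recipe from before, as a first-class value.
const genNum: Gen<number> = bind(randBool(0.5), (b) =>
  bind(b ? randBounded(0, 10) : randBounded(10, 20), (c) => ret(c))
);

// Running it with a seeded LCG normalized to [0, 1):
function lcg01(seed: number): Rng {
  let state = seed;
  return () => {
    state = (1664525 * state + 1013904223) % 2 ** 32;
    return state / 2 ** 32;
  };
}
const n = sample(lcg01(42), genNum); // an integer in [0, 20)
```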

Using a splittable RNG is somewhat safer: it's pure and parallelizable, at the cost of being a bit slower.

type Gen<T> = (rng: Rng) => T;

function bind<T1, T2>(c1: Gen<T1>, c2: (t1: T1) => Gen<T2>): Gen<T2> {
  return (rng) => {
    const [r1, r2] = split(rng);
    return c2(c1(r1))(r2);
  };
}

The last popular option is to use a choice buffer or sequence, Hypothesis style:

type Gen<T> = (buf: ChoiceBuffer) => T;

function bind<T1, T2>(c1: Gen<T1>, c2: (t1: T1) => Gen<T2>): Gen<T2> {
  return (buf) => {
    const t1 = c1(buf);
    return c2(t1)(buf);
  };
}

In the randomness-buffer approach, each generator moves the cursor along the buffer by reading a number of random bytes from it; the corollary is that you get internal shrinking, and you can use an existing fuzzer to drive your tests. Still, any one of these approaches gives you roughly the same capabilities: generate something, update the RNG state, generate the next thing, and so on... So let's assume we want to write a list generator; here's what a uniformly random list looks like:

function genList<T>(rng, g: Gen<T>): T[] {
  const len = randPositiveInteger(rng);
  const out: T[] = [];
  for (let i = 0; i < len; i++) {
    out.push(g(rng));
  }
  return out;
}

The issue here is that, however theoretically sound, we do not want lists uniformly distributed over uniform lengths. There are two competing forces here. One is that generating and testing small lists is much faster: we can generate and test 1000 lists of length 10 in the time it takes to generate and test one list of length 10000. This is coupled with the "small scope hypothesis", which asserts that many of the bugs we can find via large test cases also show up for small cases, and which is the basis for enumerative testing techniques that exhaust the search space from small to large inputs. Of course, not all bugs emerge at small scales, and exhaustive search is very expensive. We can also couple this fact with another theory, which says that large inputs exhibit behaviors you would observe across a variety of smaller tests, so generating and testing one large input can give you more information about the system than N small inputs can.

These competing theories, both of which can be correct in different circumstances, mean we want to generate both small and large inputs at the same time, and we probably don't want to generate uniformly; we want bias towards specific parts of the search space. For instance, Hypothesis has _constant_floats and _constant_strings that it injects into an otherwise very random set of inputs (sneak peek):

_constant_floats = (
    [
        ...
        3.402823466e38,
        9007199254740992.0,
        1 - 10e-6,
        2 + 10e-6,
        1.192092896e-07,
        2.2204460492503131e-016,
    ]
    + [2.0**-n for n in (24, 14, 149, 126)]  # minimum (sub)normals for float16,32
    + [float_info.min / n for n in (2, 10, 1000, 100_000)]  # subnormal in float64
)

_constant_strings = {
    ...
    # readable variations on text (bold/italic/script)
    "๐“๐ก๐ž ๐ช๐ฎ๐ข๐œ๐ค ๐›๐ซ๐จ๐ฐ๐ง ๐Ÿ๐จ๐ฑ ๐ฃ๐ฎ๐ฆ๐ฉ๐ฌ ๐จ๐ฏ๐ž๐ซ ๐ญ๐ก๐ž ๐ฅ๐š๐ณ๐ฒ ๐๐จ๐ ",
    "๐•ฟ๐–๐–Š ๐––๐–š๐–Ž๐–ˆ๐– ๐–‡๐–—๐–”๐–œ๐–“ ๐–‹๐–”๐– ๐–๐–š๐–’๐–•๐–˜ ๐–”๐–›๐–Š๐–— ๐–™๐–๐–Š ๐–‘๐–†๐–Ÿ๐–ž ๐–‰๐–”๐–Œ",
    "๐‘ป๐’‰๐’† ๐’’๐’–๐’Š๐’„๐’Œ ๐’ƒ๐’“๐’๐’˜๐’ ๐’‡๐’๐’™ ๐’‹๐’–๐’Ž๐’‘๐’” ๐’๐’—๐’†๐’“ ๐’•๐’‰๐’† ๐’๐’‚๐’›๐’š ๐’…๐’๐’ˆ",
    "๐“ฃ๐“ฑ๐“ฎ ๐“บ๐“พ๐“ฒ๐“ฌ๐“ด ๐“ซ๐“ป๐“ธ๐”€๐“ท ๐“ฏ๐“ธ๐” ๐“ณ๐“พ๐“ถ๐“น๐“ผ ๐“ธ๐“ฟ๐“ฎ๐“ป ๐“ฝ๐“ฑ๐“ฎ ๐“ต๐“ช๐”ƒ๐”‚ ๐“ญ๐“ธ๐“ฐ",
    "๐•‹๐•™๐•– ๐•ข๐•ฆ๐•š๐•”๐•œ ๐•“๐•ฃ๐• ๐•จ๐•Ÿ ๐•—๐• ๐•ฉ ๐•›๐•ฆ๐•ž๐•ก๐•ค ๐• ๐•ง๐•–๐•ฃ ๐•ฅ๐•™๐•– ๐•๐•’๐•ซ๐•ช ๐••๐• ๐•˜",
    # upsidown text
    "ส‡วษฏษ ส‡แด‰s ษนolop ษฏnsdแด‰ ษฏวษนoหฅ",
    # reserved strings in windows
    "NUL",
    "COM1",
    "LPT1",
    # scunthorpe problem
    "Scunthorpe",
    # zalgo text
    "แนฐฬบฬบฬ•oอž ฬทiฬฒฬฌอ‡ฬชอ™nฬฬ—อ•vฬŸฬœฬ˜ฬฆอŸoฬถฬ™ฬฐฬ kรจอšฬฎฬบฬชฬนฬฑฬค ฬ–tฬอ•ฬณฬฃฬปฬชอžhฬผอ“ฬฒฬฆฬณฬ˜ฬฒeอ‡ฬฃฬฐฬฆฬฌอŽ ฬขฬผฬปฬฑฬ˜hอšอŽอ™ฬœฬฃฬฒอ…iฬฆฬฒฬฃฬฐฬคvฬปอeฬบฬญฬณฬชฬฐ-mฬขiอ…nฬ–ฬบฬžฬฒฬฏฬฐdฬตฬผฬŸอ™ฬฉฬผฬ˜ฬณ ฬžฬฅฬฑฬณฬญrฬ›ฬ—ฬ˜eอ™pอ rฬผฬžฬปฬญฬ—eฬบฬ ฬฃอŸsฬ˜อ‡ฬณอฬอ‰eอ‰ฬฅฬฏฬžฬฒอšฬฌอœวนฬฌอŽอŽฬŸฬ–อ‡ฬคtอฬฌฬคอ“ฬผฬญอ˜อ…iฬชฬฑnอ gฬดอ‰ ออ‰อ…cฬฌฬŸhอกaฬซฬปฬฏอ˜oฬซฬŸฬ–อฬ™ฬอ‰sฬ—ฬฆฬฒ.ฬจฬนอˆฬฃ",
    #
    ...
}
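
Outside of Hypothesis, the same trick is easy to sketch yourself. Here, with probability 5%, we draw from a pool of historically troublesome floats instead of the uniform generator; the pool contents and the 5% rate are my own made-up choices, not Hypothesis's:

```typescript
// A pool of floats with a track record of breaking code.
const NASTY_FLOATS: number[] = [
  0, -0, 1, -1,
  Number.EPSILON, Number.MIN_VALUE, Number.MAX_VALUE,
  Number.MAX_SAFE_INTEGER + 1, // 2^53, where integer precision ends
  Infinity, -Infinity, NaN,
];

// randomFloat is assumed to be a uniform source over [0, 1).
function genFloat(randomFloat: () => number): number {
  if (randomFloat() < 0.05) {
    // constant injection: bias towards the known-troublesome values
    return NASTY_FLOATS[Math.floor(randomFloat() * NASTY_FLOATS.length)];
  }
  // otherwise, an ordinary uniform draw over a wide range
  return (randomFloat() - 0.5) * 2 ** 64;
}
```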

Another typical approach is to use "size" for biasing. Hypothesis (as far as I know) starts with a small number of small inputs and then promotes to uniform randomness (with the exception of the constants). In the case of QuickCheck, the size function is a somewhat weird one:

computeSize :: State -> Int
computeSize MkState{replayStartSize = Just s,numSuccessTests = 0,numRecentlyDiscardedTests=0} = s
-- NOTE: Beware that changing this means you also have to change `prop_discardCoverage` as that currently relies
-- on the sequence produced by this function.
computeSize MkState{maxSuccessTests = ms, maxTestSize = mts, maxDiscardedRatio = md,numSuccessTests=n,numRecentlyDiscardedTests=d}
    -- e.g. with maxSuccess = 250, maxSize = 100, goes like this:
    -- 0, 1, 2, ..., 99, 0, 1, 2, ..., 99, 0, 2, 4, ..., 98.
    | n `roundTo` mts + mts <= ms ||
      n >= ms ||
      ms `mod` mts == 0 = (n `mod` mts + d `div` dDenom) `min` mts
    | otherwise =
      ((n `mod` mts) * mts `div` (ms `mod` mts) + d `div` dDenom) `min` mts
  where
    -- The inverse of the rate at which we increase size as a function of discarded tests
    -- if the discard ratio is high we can afford this to be slow, but if the discard ratio
    -- is low we risk bowing out too early
    dDenom
      | md > 0 = (ms * md `div` 3) `clamp` (1, 10)
      | otherwise = 1 -- Doesn't matter because there will be no discards allowed
    n `roundTo` m = (n `div` m) * m

Essentially, size is a counter of the number of tests modulo the maximum size, with a small edge case: the last cycle would be incomplete, so the step size is increased to make it complete too. In QuickChick, or in my other forks, I've usually just used log2(N) with some downward bias to make sure we're not glossing over small inputs (log2(16) = 4, which means sizes up to 3 are only exercised by the first 15 tests; not great). I don't think the PBT literature really agrees on the right way to do this, and it's also very hard to decide what size even means. Here's the issue:

data Tree = Leaf Int | Node Tree Tree

genTree :: Int -> Gen Tree
genTree n =
  if n == 0 then
    Leaf <$> arbitrary
  else
    oneof [Leaf <$> arbitrary, Node <$> genTree (n-1) <*> genTree (n-1)]

genTree' :: Int -> Gen Tree
genTree' n =
  if n == 0 then
    Leaf <$> arbitrary
  else
    oneof [Leaf <$> arbitrary, Node <$> genTree' (n `div` 2) <*> genTree' (n `div` 2)]

Both of these generators are technically "correct"; however, the first one treats the size variable as its depth, while the second treats it as (roughly) the number of nodes. The corollary is that the trees generated by the first one will be exponentially larger than the second's; they'll be exponentially slower to generate and execute, and for something like maxSize = 100, that slowness will be colossal. Some frameworks, again such as Hypothesis or Hedgehog, use a recursive combinator that auto-handles the depth, hiding the sizing from the user. This is probably a good starting point, but when you think about trees a bit, you'll also realize that dividing the size n/2 is not the greatest idea: we'll end up with so many balanced trees. Ideally we want right-skewed, left-skewed, and balanced trees that can exercise a variety of behaviors. This goes back to the idea of Swarm Testing, where we essentially want to randomize the choices of randomness itself; otherwise you'll end up with random-looking structures, but you'll never, for instance, end up with a list of all 1's. Will Wilson has a great Papers We Love talk on this that I will link when it's available online. Another approach I've seen is to use Boltzmann sampling to diversify generation. (1, 2, 3, 4)
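
One way out of the balanced-tree trap is sketched below (my own illustration, not taken from any particular library): keep size as a node budget, but split the remaining budget at a random point, so left-skewed, right-skewed, and balanced shapes all occur.

```typescript
type Rng = () => number; // assumed: uniform floats in [0, 1)
type Tree =
  | { tag: "leaf"; value: number }
  | { tag: "node"; left: Tree; right: Tree };

const randInt = (rng: Rng, lo: number, hi: number): number =>
  lo + Math.floor(rng() * (hi - lo + 1)); // uniform integer in [lo, hi]

function genTree(rng: Rng, size: number): Tree {
  if (size === 0 || rng() < 0.25) {
    return { tag: "leaf", value: randInt(rng, -100, 100) };
  }
  // Draw the split point instead of always using size / 2: sometimes the
  // whole remaining budget goes left, sometimes right, sometimes in between.
  const leftBudget = randInt(rng, 0, size - 1);
  return {
    tag: "node",
    left: genTree(rng, leftBudget),
    right: genTree(rng, size - 1 - leftBudget),
  };
}

const countNodes = (t: Tree): number =>
  t.tag === "leaf" ? 1 : 1 + countNodes(t.left) + countNodes(t.right);
```

Since each Node consumes one unit of budget and leaves are free, a tree built with budget n has at most n internal nodes, so the exponential blow-up of the depth-based generator can't happen, while the random split still produces a variety of shapes.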