Property-Based Testing with StreamData
Beyond example-based tests: generators, shrinking, and finding edge cases
Your test suite is lying to you.
Not maliciously. It's doing exactly what you asked; you wrote tests for the cases you thought of, and those cases pass. The problem is everything else---the edge cases hiding in the combinatorial explosion of possible inputs that no human could enumerate by hand.
I've watched production systems fail on inputs that seemed obvious in hindsight. A Unicode character in a name field. A negative timestamp. An empty list where the code assumed at least one element. Each failure had a test suite that passed with flying colors; the tests verified what the developers imagined, not what users would actually do.
Property-based testing inverts the whole approach. Instead of testing specific examples, you define properties that should hold for all inputs, then let the computer generate thousands of test cases trying to break them. It's the difference between checking a few points on a curve and verifying the equation that defines it.
The Limits of Example-Based Testing
Traditional unit tests are example-based---you pick an input, call your function, assert the output:
test "adds two numbers" do
assert Calculator.add(2, 3) == 5
assert Calculator.add(0, 0) == 0
assert Calculator.add(-1, 1) == 0
end
Three cases. Your function handles integers from negative infinity to positive infinity. Three divided by infinity is not great coverage.
You could add more examples---maybe ten, maybe a hundred. But you're still guessing which inputs matter; you're almost certainly missing the weird ones, the inputs that expose integer overflow or rounding errors or the assumption buried three functions deep that a list is never empty.
Property-based testing asks a different question: what should be true for any valid input? For addition, several properties come to mind immediately---commutativity, identity, associativity:
- Commutativity: add(a, b) should equal add(b, a)
- Identity: add(a, 0) should equal a
- Associativity: add(add(a, b), c) should equal add(a, add(b, c))
These properties don't depend on specific values; they should hold whether you're adding 2 and 3 or 999,999 and -42. Generate random integers, verify the properties across thousands of combinations, and you get far more confidence than three hand-picked examples ever provide.
StreamData: Elixir's Property Testing Library
StreamData is Elixir's property-based testing library, maintained by the Elixir core team. It provides two things: generators for creating random data, and the check all macro for running property tests.
Add it to your mix.exs:
defp deps do
[
{:stream_data, "~> 1.0", only: [:dev, :test]}
]
end
Then import it in your test files:
defmodule MyApp.CalculatorTest do
use ExUnit.Case
use ExUnitProperties
property "addition is commutative" do
check all a <- integer(),
b <- integer() do
assert Calculator.add(a, b) == Calculator.add(b, a)
end
end
end
The check all macro generates random integers for a and b, then runs the assertion. By default it runs 100 cases per property. If any case fails, StreamData reports the failing input and shrinks it to the minimal reproducer.
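The other two addition properties from the list above translate the same way. A standalone sketch---the Calculator module here is a stand-in defined inline so the example runs on its own; substitute your real module:

```elixir
ExUnit.start()

# Stand-in for the article's Calculator; any add/2 implementation works here.
defmodule Calculator do
  def add(a, b), do: a + b
end

defmodule CalculatorPropertyTest do
  use ExUnit.Case
  use ExUnitProperties

  property "zero is the additive identity" do
    check all a <- integer() do
      assert Calculator.add(a, 0) == a
    end
  end

  property "addition is associative" do
    check all a <- integer(), b <- integer(), c <- integer() do
      assert Calculator.add(Calculator.add(a, b), c) ==
               Calculator.add(a, Calculator.add(b, c))
    end
  end
end
```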
Writing Properties That Actually Test Something
The hard part isn't the syntax---it's figuring out what properties to test. Commutativity works for math; most business logic doesn't have clean mathematical properties. Four patterns show up again and again across different domains.
Round-trip properties: encode then decode; you should get the original value back.
property "JSON encoding round-trips" do
check all map <- map_of(string(:alphanumeric), integer()) do
assert map == map |> Jason.encode!() |> Jason.decode!()
end
end
Invariant properties: certain conditions should always hold after an operation. No exceptions.
property "sorting produces ordered output" do
check all list <- list_of(integer()) do
sorted = Enum.sort(list)
pairs = Enum.zip(sorted, Enum.drop(sorted, 1))
assert Enum.all?(pairs, fn {a, b} -> a <= b end)
end
end
Oracle properties: compare your implementation against a known-correct reference---slow but trusted.
property "my_sort matches Enum.sort" do
check all list <- list_of(integer()) do
assert MySorter.sort(list) == Enum.sort(list)
end
end
Idempotence: applying an operation twice produces the same result as applying it once. This one catches more bugs than you'd expect; normalization functions are notorious for breaking on the second pass.
property "normalizing email is idempotent" do
check all email <- email_generator() do
once = Email.normalize(email)
twice = Email.normalize(once)
assert once == twice
end
end
Generators: Built-In and Custom
StreamData ships with generators for the common types:
# Primitives
integer() # Any integer
positive_integer() # 1, 2, 3, ...
float() # Any float
boolean() # true or false
binary() # Random binary data
string(:alphanumeric) # Letters and numbers
atom(:alphanumeric) # Atoms from alphanumeric strings
# Collections
list_of(integer()) # [1, -3, 42, ...]
map_of(atom(:alphanumeric), string(:alphanumeric))
tuple({integer(), string(:alphanumeric)})
# Choosing from options
member_of([:pending, :active, :cancelled])
one_of([integer(), float()])
Real applications need custom generators. You build them by composing primitives with gen all---same shape as check all but it returns a generator instead of running assertions:
def user_generator do
gen all name <- string(:alphanumeric, min_length: 1, max_length: 100),
email <- email_generator(),
age <- integer(18..120) do
%User{name: name, email: email, age: age}
end
end
def email_generator do
gen all local <- string(:alphanumeric, min_length: 1, max_length: 64),
domain <- string(:alphanumeric, min_length: 1, max_length: 63) do
"#{local}@#{domain}.com"
end
end
A word of caution on that email generator---it only produces .com addresses with alphanumeric local parts. Good enough for testing most validation logic, but it won't catch bugs triggered by plus-addressing, internationalized domains, or the truly bizarre edge cases that RFC 5321 allows.
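If you need wider coverage, the generator composes further. A sketch using one_of/1, member_of/1, and constant/1 to mix in plus-tags and a few extra TLDs---the function name, tag scheme, and TLD list are illustrative, and this is still nowhere near full RFC 5321 coverage:

```elixir
defmodule MyApp.EmailGenerators do
  # use ExUnitProperties brings in the gen all macro and the
  # StreamData generator functions (assumes :stream_data as a dep).
  use ExUnitProperties

  def wider_email_generator do
    gen all local <- string(:alphanumeric, min_length: 1, max_length: 64),
            # constant(nil) makes the plus-tag optional
            tag <- one_of([constant(nil), string(:alphanumeric, min_length: 1, max_length: 10)]),
            domain <- string(:alphanumeric, min_length: 1, max_length: 63),
            tld <- member_of(["com", "org", "io", "co.uk"]) do
      local_part = if tag, do: "#{local}+#{tag}", else: local
      "#{local_part}@#{domain}.#{tld}"
    end
  end
end
```

Because StreamData generators are enumerable, you can eyeball the output with `MyApp.EmailGenerators.wider_email_generator() |> Enum.take(10)` in IEx.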
property "users can be serialized" do
check all user <- user_generator() do
assert {:ok, _} = User.to_json(user)
end
end
Shrinking: Finding the Minimal Failure
When a property fails, the random input that triggered it is usually large and noisy. Shrinking strips it down to the smallest input that still fails.
Consider this buggy function:
defmodule Buggy do
def process(list) when length(list) > 5 do
raise "Can't handle more than 5 elements"
end
def process(list), do: list
end
A property test might generate [42, -7, 999, 0, 8, -3, 100] as the failing input. Seven elements, arbitrary values, noise everywhere. StreamData shrinks this to [0, 0, 0, 0, 0, 0]---six zeros. The minimal list that triggers the bug; every element reduced to its simplest form.
Shrinking works by trying progressively simpler values. For integers, it moves toward zero. For lists, it removes elements and shrinks what remains. For custom generators built with gen all, shrinking composes automatically from the component generators---you don't have to define shrink behavior yourself.
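The core idea is easy to sketch without StreamData at all: propose simpler candidates, keep the simplest one that still fails, repeat. A deliberately naive integer shrinker---an illustration only, not StreamData's actual algorithm:

```elixir
defmodule ToyShrink do
  # Naive integer shrinker: try zero, half, and one step toward zero;
  # keep the first candidate for which the property still fails,
  # and recurse until nothing simpler fails.
  def shrink_int(value, property) do
    candidates = Enum.uniq([0, div(value, 2), value - sign(value)])

    case Enum.find(candidates, fn c -> c != value and not property.(c) end) do
      nil -> value
      simpler -> shrink_int(simpler, property)
    end
  end

  defp sign(n) when n > 0, do: 1
  defp sign(n) when n < 0, do: -1
  defp sign(_n), do: 0
end

# The property "n < 100" fails for any n >= 100; the minimal failing value is 100.
ToyShrink.shrink_int(7_432, fn n -> n < 100 end)
# → 100
```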
The output looks like this:
1) property users can be serialized (MyApp.UserTest)
test/my_app/user_test.exs:10
Failed with generated values (after 3 successful runs):
* Clause: user <- user_generator()
Generated: %User{name: "", email: "@.com", age: 18}
That shrunk result tells you exactly where to look. Empty name, malformed email. The original randomly generated user might have had a 50-character name obscuring what actually matters.
Testing Ecto Schemas and Changesets
Property testing is at its most useful when validating Ecto changesets. Instead of checking a handful of invalid inputs by hand, generate thousands:
defmodule MyApp.AccountTest do
use ExUnit.Case
use ExUnitProperties
alias MyApp.Accounts.User
property "valid users pass changeset validation" do
check all attrs <- valid_user_attrs() do
changeset = User.changeset(%User{}, attrs)
assert changeset.valid?, "Expected valid changeset for: #{inspect(attrs)}"
end
end
property "empty email fails validation" do
check all attrs <- valid_user_attrs() do
bad_attrs = Map.put(attrs, :email, "")
changeset = User.changeset(%User{}, bad_attrs)
refute changeset.valid?
assert Keyword.has_key?(changeset.errors, :email)
end
end
property "age under 18 fails validation" do
check all attrs <- valid_user_attrs(),
bad_age <- integer(-1000..17) do
bad_attrs = Map.put(attrs, :age, bad_age)
changeset = User.changeset(%User{}, bad_attrs)
refute changeset.valid?
end
end
defp valid_user_attrs do
gen all name <- string(:alphanumeric, min_length: 1, max_length: 100),
email <- email_generator(),
age <- integer(18..120) do
%{name: name, email: email, age: age}
end
end
defp email_generator do
gen all local <- string(:alphanumeric, min_length: 1, max_length: 64),
domain <- string(:alphanumeric, min_length: 1, max_length: 63) do
"#{local}@#{domain}.com"
end
end
end
This finds edge cases that manual tests miss. Maybe your email regex rejects single-character local parts. Maybe age validation allows nil when it shouldn't. Maybe there's a length constraint you forgot about that only surfaces when StreamData generates a 100-character name. Property tests surface these by throwing variety at your code; they're doing the tedious work of exploring the input space so you don't have to.
Testing Business Logic
Business logic is where property-based testing earns its keep. Financial calculations, state machines, pricing engines---anywhere the logic is complex enough that enumerating cases by hand is a losing proposition.
A shopping cart:
defmodule MyApp.CartTest do
use ExUnit.Case
use ExUnitProperties
alias MyApp.Commerce.Cart
property "cart total equals sum of line items" do
check all items <- list_of(cart_item_generator(), min_length: 1) do
cart = Cart.new(items)
expected_total = items |> Enum.map(&(&1.price * &1.quantity)) |> Enum.sum()
assert Cart.total(cart) == expected_total
end
end
property "removing an item decreases total" do
check all items <- list_of(cart_item_generator(), min_length: 2),
index <- integer(0..(length(items) - 1)) do
cart = Cart.new(items)
item_to_remove = Enum.at(items, index)
original_total = Cart.total(cart)
updated_cart = Cart.remove_item(cart, item_to_remove.sku)
new_total = Cart.total(updated_cart)
assert new_total < original_total or item_to_remove.quantity == 0
end
end
property "applying discount never increases total" do
check all items <- list_of(cart_item_generator(), min_length: 1),
discount_percent <- integer(0..100) do
cart = Cart.new(items)
original_total = Cart.total(cart)
discounted = Cart.apply_discount(cart, discount_percent)
discounted_total = Cart.total(discounted)
assert discounted_total <= original_total
end
end
defp cart_item_generator do
gen all sku <- string(:alphanumeric, length: 8),
price <- positive_integer(),
quantity <- integer(0..100) do
%{sku: sku, price: price, quantity: quantity}
end
end
end
Those properties capture business rules without hard-coding prices or quantities. If someone changes the discount calculation and accidentally makes it increase prices for certain inputs---say, a rounding error on quantities above 50---the property test catches it. That's the kind of bug that slips through three hand-picked examples and shows up six months later in a customer invoice.
Practical Considerations
Property-based tests run slower than example-based tests; they're generating and evaluating hundreds of cases instead of three. You'll want to tune iteration counts---the default is 100, which works for most things. Lower it for expensive operations, raise it for critical paths:
property "critical calculation is correct" do
check all input <- integer(), max_runs: 1000 do
# ...
end
end
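StreamData also reads these options from the application environment, so you can change the default for the whole suite instead of per property. A sketch for config/test.exs---the CI environment-variable check is just a common convention, not part of StreamData:

```elixir
# config/test.exs
import Config

# Raise the per-property budget in CI; keep local runs fast.
config :stream_data,
  max_runs: if(System.get_env("CI"), do: 1_000, else: 100)
```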
When a property fails, StreamData prints the seed that generated the failing case. Pin it with initial_seed to reproduce the failure deterministically:
property "my property" do
check all value <- integer(), initial_seed: 12345 do
# ...
end
end
One thing I've learned the hard way: start with simple generators. The instinct is to build generators that produce "realistic" data---proper names, well-formed emails, sensible prices. Resist it, at least initially. A generator that spits out empty strings and zeros finds more bugs than one producing polished test fixtures. The realistic generator feels more satisfying; the dumb one catches more issues.
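When you do eventually add realistic data, StreamData's frequency/1 lets you keep the degenerate values in circulation. A sketch---module name and weights are illustrative:

```elixir
defmodule MyApp.Generators do
  # Biased name generator (assumes :stream_data as a dep): roughly 3 in 4
  # generated names look "normal", but empty, whitespace-only, and oversized
  # names keep showing up so the nasty cases never disappear from the suite.
  def name_generator do
    StreamData.frequency([
      {3, StreamData.string(:alphanumeric, min_length: 1, max_length: 100)},
      {1, StreamData.member_of(["", " ", "\t", String.duplicate("a", 255)])}
    ])
  end
end
```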
Mix property tests and example tests freely. Properties verify general behavior across the input space; example tests document specific scenarios and edge cases you've already encountered in production. They're not competing approaches. Use both.
What Changes When You Think in Properties
The first few property tests take longer to write than example tests. You have to think differently---not "what's the right answer for this input?" but "what should always be true, regardless of input?" That shift is uncomfortable. It forces you to articulate invariants you've been relying on implicitly; assumptions buried so deep in the code that nobody thought to write them down.
But once you have generators for your core domain types, new properties come fast. And they catch things example tests never would---the Unicode character that breaks your parser, the boundary condition at integer limits, the empty collection that shouldn't have been empty but was.
I've seen property tests catch bugs that survived months in production, hiding behind edge cases nobody had thought to check. That's the gap between testing what you imagine and testing what's actually possible. The cases you think of are the easy ones. The hard ones---the ones that wake you up at 3 AM---are exactly the cases you'd never write by hand.