TDD with generative testing: an example in Ruby

Say I’m in retail, and the marketing team has an app that helps them evaluate the sales of various items. I’m working on a web service that, given an item, tells them how many purchases were influenced by various advertising channels: the mobile app, web ads, and spam email.

The service will look up item purchase records from one system, then access the various marketing systems that know about ad clicks, mobile app usage, and email sends. It returns how many purchases were influenced by each, and uses some magic formula to calculate the relevance of each channel to this item’s sales.

My goal is to test this thoroughly at the API level. I can totally write an example-based test for this, with a nice happy-path input and hard-coded expected output. And then, I need to test edge cases and error cases. When no channels have impact; when they all have the same impact; when one fails; when they timeout; et cetera et cetera.

Instead, I want to write a few generative tests. What might they look like?

When I’m using test-driven development with generative testing, I start at the outside. What can I say about the output of this service? For each channel, the number of influenced purchases can’t be bigger than the total purchases. And the relevance number should be between 0 and 100, inclusive. I can assert that in rspec.

expect(influenced_purchases).to be <= total_purchases
expect(relevance).to be >= 0
expect(relevance).to be <= 100

These kinds of assertions are called “properties”. Here, “property” has NOTHING TO DO with a field on a class. In this context, a property is something that is always true for specified circumstances. The generated input will specify the circumstances.

To test this, I’ll need to run some input through my service and then make these checks for each output circle. It needs some way to query the purchase and marketing services, and I’m not going to make real calls a hundred times. Therefore my service will use adapters to access the outside world, and test adapters will serve up data.

result =,

result.channels.each do |(channel, influence)|
  expect(influence.influenced_purchases).to be <= total_purchases
  expect(influence.relevance).to be >= 0
  expect(influence.relevance).to be <= 100

(relatively complete code sample here.)
To do this, I need purchases, events on each channel, and an item. My test needs to generate these 100 times, and then do the assertions 100 times. I can use rantly for this. The test looks like this:

it “returns a reasonable amount of influence” do
 property_of {
  … return an array [purchases, channel_events, item] …
 }.check do |(purchaseschannel_eventsitem)|
  total_purchases = purchases.size
  result =,

  result.channels.each do |(channel, influence)|
   expect(influence.influenced_purchases).to be <= total_purchases
   expect(influence.relevance).to be >= 0
   expect(influence.relevance).to be <= 100

(Writing generators needs a post of its own.)
Rantly will call the property_of block, and pass its result into the check block, 100 times or until it finds a failure. Its objective is to disprove the property (which we assert is true for all input) by finding an input value that makes the assertions fail. If it does, it prints out that input value, so you can figure out what’s failing.

It does more than that, actually: it attempts to find the simplest input that makes the property fail. This makes finding the problem easier. It also helps me with TDD, because it boils this general test into the simplest case, the same place I might have started with traditional TDD.

In my TDD cycle, I make this test compile. Then it fails, and rantly reports the simplest case: no purchases, no events, any old item. After I make that test pass, rantly reports another simple input case that fails. Make that work. Repeat.

Once this test passes, all I have is a stub implementation. Now what? It’s time to add properties gradually. Usually at this point I sit back and think for a while. Compared to example-based TDD, generative testing is a lot more thinking and less typing. How can I shrink the boundaries?

It’s time for another post on relative properties.

TDD is Dead! Long Live TDD!

Imagine that you’re writing a web service. It is implemented with a bunch of classes. Pretend this circle represents your service, and the shapes inside it are classes.

The way I learned test-driven development[1], we wrote itty-bitty tests around every itty-bitty method in each class. Then maybe a few acceptance tests around the outside. This was supposed to help us drive design, and it was supposed to give us safety in refactoring. These automated tests would give us assurance, and make changing the code easier.

It doesn’t work out that way. Tests don’t enable change. Tests prevent change! In particular, when I want to refactor the internals of my service, any class I change means umpteen test changes. And all these tests include example == actual, and I’ve gotta figure out the new magic values that should pass. No fun! These method- or class-level tests are like bars in a cage preventing refactoring.

Tests prevent change, and there’s a place I want to prevent unintentional change: it’s at the service API level. At the outside, where other systems interact with this service, where a change in behavior could be a nasty surprise for some other team. Ideally, that’s where I want to put my automated tests.

Whoa, that is an ugly cage. At the service level, there are often many possible input scenarios. Testing every single one of them is painful. We probably can’t even think of every relevant combination and all the various edge cases. Much easier to zoom in to the class level and test one edge case at a time. Besides, even if we did write the dozens of tests to cover all the possibilities, what happens when the requirements change? Then we have great big tests with long expected == actual assertions, and we have to rework all of those. Bars in a cage, indeed.

Is TDD dead? Maybe it’s time to crown a new TDD. There’s a style of testing that addresses both of the difficulties in API-level testing: it finds all the scenarios and tames the profusion of hard-coded expectations. It’s called generative testing.[2]

Generative testing says, “I’m not gonna think of all the possible scenarios. I’m gonna write code that does it for me.” We write generators, which are objects that know how to produce random valid instances of various input types. The testing framework uses these to produce a hundred different random input scenarios, and runs all of them through the test.

Generative testing says, “I’m not gonna hard-code the output. I’m gonna make sure whatever comes out is good enough.” We can’t hard-code the output when we don’t know what the input is going to be. Instead, assertions are based on the relationship between the output and input. Sometimes we can’t be perfectly specific because we refuse to duplicate the code under test. In these cases we can establish boundaries around the output. Maybe, it should be between these values. It should go down as this input value goes up. It should never return more items than requested, that kind of thing.

With these, a few tests can cover many scenarios. Fortify with a few hard-coded examples if needed, and now half a dozen tests at the API level cover all the combinations of all the edge cases, as well as the happy paths.

This doesn’t preclude small tests that drive our class design. Use them, and then delete them. This doesn’t preclude example tests for documentation. Example-based, expected == actual tests, are stories, and people think in stories. Give them what they want, and give the computer what it wants: lots of juicy tests in one.

There are obstacles to TDD in this style. It’s way harder. It’s tough to find the assertions that draw a boundary around the acceptable results. There’s more thinking, less typing here. Lots more thinking, to find the assertions that draw a boundary around the acceptable output. That’s the hardest part, and it’s also the best part, because the real benefit of TDD is that it stops you from coding a solution to a problem you don’t understand.

look for more posts on this topic, to go along with my talks on it. See also my video about Property Based Testing in Scala

[1] The TDD I learned, at the itty-bitty level with mock all the things, was wrong. It isn’t what Kent Beck espoused. But it’s the easiest. [2] Or property-based testing, but that has NOTHING to do with properties on a class, so that name confuses people. Aside from that confusion I prefer “property-based”, which speaks about WHY we do this testing, over “generative”, which speaks about how.