a Rug story: adding test cases

These days I work on Rug, Atomist’s library for writing code that modifies code.

Adding a feature, I start by creating a test. While it’s tempting to create a narrow test around the piece of code I want to change, it’s better to create an API-level test. Testing at the outside has a few benefits: it tells the story of why this feature is needed; it drives pleasing API design; and it places minimum constraints on the implementation. The cost is, it’s more work.

The API level of Rug is in TypeScript, where people write programs to modify other programs. The test compiles the TypeScript to JavaScript, and rug executes that inside the JVM, where our Scala code does the tricky work of implementing the Rug programming model — navigating the project, parsing code, and making atomic modifications (it all works or none of it is saved). This means that my API-level tests include a TypeScript file and a Scala file, plus a bunch of wiring to hook them together. I get tired of remembering how to do this. Plus, we’re constantly improving the programming model in TypeScript, so “the right way” is a shifting target.

Last year, I would have copied an existing test (which one is up to date? I don’t know! guess and hope it works), modified parts of it for my needs (and forgotten some), embedded the TypeScript code as a string in Scala (seems easier than making a .ts file), and tried to abstract away some of the repetitive bits that are shared between tests (even though that obscures the storyline of the test).

This year, I have a new tool. About the third time I needed a new test, I wrote a program to create it for me. I wrote a Rug! My AddTypeScriptTest Rug editor creates a new TypeScript file in test/resources, and a new Scala file in test/scala. It bases these off of sample files that exemplify the current standard in Rugs and their tests, performing all the modifications that I mess up in the copy-paste-modify strategy.

me:

rug edit -l AddTypeScriptTest class_under_test=com.atomist.rug.NewFeature

my Rug program:

  • copies SampleTypeScriptTest.scala to a new location. Changes the package name, the class name, and the location of the TypeScript file it will load.
  • copies SampleTypeScriptTest.ts to a new location. Changes the name of the class and the exported instance.

SampleTypeScriptTest.scala and SampleTypeScriptTest.ts form a real test in rug’s test suite, so I know that my baseline continues to work. When I update the style of them (as I did today), I can run the sample test to be sure it works (caught two errors today). I design them to best tell the story of how rug goes from a TypeScript file to a Rug archive to running that program on a separate project and seeing the results. This helps people spinning up on Rug understand it. Repetition (of the Scala package name and the path to the test program, for instance) doesn’t hurt because a program is modifying them consistently (bonus: IntelliJ will ctrl-click into the referenced file on the classpath. It didn’t when that repetition was abstracted). If I want to change the way all these tests work, I can do that with a Rug editor too, since they’re consistent. Ahhhh the consistency: when a test breaks, and it looks exactly like the other tests except for meaningful differences, debugging is easier.

I created this Rug editor inside the rug project itself, since it’s only relevant to this particular project. Then I run the rug CLI in local mode, on the local project, and poof. I’ve used rug to modify rug using a Rug inside rug. Super meta! (It doesn’t have to be so incestuous. Other days, I use rug to modify any project using a Rug in any Rug archive.)

If you want to create a Rug to automate your own frequent tasks, install the Rug CLI and, from your project root, use this Rug:

rug edit atomist-rugs:rug-editors:AddLocalEditor editorName=WhatDoYouWantToCallIt

Find your starting point in .atomist/editors/WhatDoYouWantToCallIt.ts

Pop into Atomist community slack with questions and we will be soooo happy to help you.

TDD with generative testing: an example in Ruby

Say I’m in retail, and the marketing team has an app that helps them evaluate the sales of various items. I’m working on a web service that, given an item, tells them how many purchases were influenced by various advertising channels: the mobile app, web ads, and spam email.

The service will look up item purchase records from one system, then access the various marketing systems that know about ad clicks, mobile app usage, and email sends. It returns how many purchases were influenced by each, and uses some magic formula to calculate the relevance of each channel to this item’s sales.

My goal is to test this thoroughly at the API level. I can totally write an example-based test for this, with a nice happy-path input and hard-coded expected output. And then I need to test edge cases and error cases: when no channels have impact; when they all have the same impact; when one fails; when they time out; et cetera, et cetera.

Instead, I want to write a few generative tests. What might they look like?

When I’m using test-driven development with generative testing, I start at the outside. What can I say about the output of this service? For each channel, the number of influenced purchases can’t be bigger than the total purchases. And the relevance number should be between 0 and 100, inclusive. I can assert that in rspec.

expect(influenced_purchases).to be <= total_purchases
expect(relevance).to be >= 0
expect(relevance).to be <= 100

These kinds of assertions are called “properties”. Here, “property” has NOTHING TO DO with a field on a class. In this context, a property is something that is always true for specified circumstances. The generated input will specify the circumstances.

To test this, I’ll need to run some input through my service and then make these checks on each channel in the output. The service needs some way to query the purchase and marketing systems, and I’m not going to make real calls a hundred times. Therefore my service will use adapters to access the outside world, and test adapters will serve up data.

result = InfluenceService.new(TestPurchaseAdapter.new(purchases),
                make_adapters(channel_events)).investigate(item)

result.channels.each do |(channel, influence)|
  expect(influence.influenced_purchases).to be <= total_purchases
  expect(influence.relevance).to be >= 0
  expect(influence.relevance).to be <= 100
end

(relatively complete code sample here.)
To do this, I need purchases, events on each channel, and an item. My test needs to generate these 100 times, and then do the assertions 100 times. I can use rantly for this. The test looks like this:

it "returns a reasonable amount of influence" do
  property_of {
    # … return an array [purchases, channel_events, item] …
  }.check do |(purchases, channel_events, item)|
    total_purchases = purchases.size
    result = InfluenceService.new(TestPurchaseAdapter.new(purchases),
                                  make_adapters(channel_events)).investigate(item)

    result.channels.each do |(channel, influence)|
      expect(influence.influenced_purchases).to be <= total_purchases
      expect(influence.relevance).to be >= 0
      expect(influence.relevance).to be <= 100
    end
  end
end

(Writing generators needs a post of its own.)
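Still, for a rough idea, the property_of block might compose Rantly’s built-in generators something like this. The shapes of the purchases and channel events here are illustrative guesses, not the real domain model:

property_of {
  item = choose("ball", "lamp", "book")              # which item we are investigating
  purchases = array(range(0, 20)) {                  # zero to twenty purchase records for it
    { item: item, customer_id: integer }
  }
  channel_events = {                                 # events seen by each marketing channel
    mobile_app: array(range(0, 20)) { { customer_id: integer } },
    web_ads:    array(range(0, 20)) { { customer_id: integer } },
    email:      array(range(0, 20)) { { customer_id: integer } }
  }
  [purchases, channel_events, item]
}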
Rantly will call the property_of block, and pass its result into the check block, 100 times or until it finds a failure. Its objective is to disprove the property (which we assert is true for all input) by finding an input value that makes the assertions fail. If it does, it prints out that input value, so you can figure out what’s failing.

It does more than that, actually: it attempts to find the simplest input that makes the property fail. This makes finding the problem easier. It also helps me with TDD, because it boils this general test into the simplest case, the same place I might have started with traditional TDD.

In my TDD cycle, I make this test compile. Then it fails, and rantly reports the simplest case: no purchases, no events, any old item. After I make that test pass, rantly reports another simple input case that fails. Make that work. Repeat.
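A sketch of the kind of stub that satisfies those first few cases. The Result and Influence shapes here are my guesses, consistent with the test above, not the real code:

Influence = Struct.new(:influenced_purchases, :relevance)
Result    = Struct.new(:channels)

class InfluenceService
  CHANNELS = [:mobile_app, :web_ads, :email]

  def initialize(purchase_adapter, channel_adapters)
    @purchase_adapter = purchase_adapter
    @channel_adapters = channel_adapters
  end

  def investigate(_item)
    # Claim zero influenced purchases and zero relevance on every channel:
    # trivially within all the bounds the properties assert.
    Result.new(CHANNELS.map { |channel| [channel, Influence.new(0, 0)] })
  end
end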

Once this test passes, all I have is a stub implementation. Now what? It’s time to add properties gradually. Usually at this point I sit back and think for a while. Compared to example-based TDD, generative testing is a lot more thinking and less typing. How can I shrink the boundaries?

It’s time for another post on relative properties.

TDD is Dead! Long Live TDD!

Imagine that you’re writing a web service, implemented with a bunch of classes. Picture the service as a circle, with the classes as shapes inside it.

The way I learned test-driven development[1], we wrote itty-bitty tests around every itty-bitty method in each class. Then maybe a few acceptance tests around the outside. This was supposed to help us drive design, and it was supposed to give us safety in refactoring. These automated tests would give us assurance, and make changing the code easier.

It doesn’t work out that way. Tests don’t enable change. Tests prevent change! In particular, when I want to refactor the internals of my service, any class I change means umpteen test changes. And all these tests include expected == actual, and I’ve gotta figure out the new magic values that should pass. No fun! These method- or class-level tests are like bars in a cage, preventing refactoring.

Tests prevent change, and there’s a place I want to prevent unintentional change: it’s at the service API level. At the outside, where other systems interact with this service, where a change in behavior could be a nasty surprise for some other team. Ideally, that’s where I want to put my automated tests.

Whoa, that is an ugly cage. At the service level, there are often many possible input scenarios. Testing every single one of them is painful. We probably can’t even think of every relevant combination and all the various edge cases. Much easier to zoom in to the class level and test one edge case at a time. Besides, even if we did write the dozens of tests to cover all the possibilities, what happens when the requirements change? Then we have great big tests with long expected == actual assertions, and we have to rework all of those. Bars in a cage, indeed.

Is TDD dead? Maybe it’s time to crown a new TDD. There’s a style of testing that addresses both of the difficulties in API-level testing: it finds all the scenarios and tames the profusion of hard-coded expectations. It’s called generative testing.[2]

Generative testing says, “I’m not gonna think of all the possible scenarios. I’m gonna write code that does it for me.” We write generators, which are objects that know how to produce random valid instances of various input types. The testing framework uses these to produce a hundred different random input scenarios, and runs all of them through the test.

Generative testing says, “I’m not gonna hard-code the output. I’m gonna make sure whatever comes out is good enough.” We can’t hard-code the output when we don’t know what the input is going to be. Instead, assertions are based on the relationship between the output and the input. Sometimes we can’t be perfectly specific, because we refuse to duplicate the code under test. In these cases we can establish boundaries around the output: maybe it should fall between certain values; it should go down as this input value goes up; it should never return more items than requested, that kind of thing.
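A tiny sketch of that last kind of property, in RSpec with rantly. The top_items function and its signature are hypothetical, not from any code above:

property_of {
  [array(range(0, 50)) { integer }, range(1, 10)]       # some items, and how many we ask for
}.check do |(items, n)|
  result = top_items(items, n)                          # hypothetical function under test
  expect(result.size).to be <= n                        # never more items than requested
  result.each { |item| expect(items).to include(item) } # and nothing we didn’t supply
end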

With these, a few tests can cover many scenarios. Fortify with a few hard-coded examples if needed, and now half a dozen tests at the API level cover all the combinations of all the edge cases, as well as the happy paths.

This doesn’t preclude small tests that drive our class design. Use them, and then delete them. This doesn’t preclude example tests for documentation. Example-based, expected == actual tests are stories, and people think in stories. Give them what they want, and give the computer what it wants: lots of juicy tests in one.

There are obstacles to TDD in this style. It’s way harder. There’s more thinking and less typing here, and most of the thinking goes into finding the assertions that draw a boundary around the acceptable output. That’s the hardest part, and it’s also the best part, because the real benefit of TDD is that it stops you from coding a solution to a problem you don’t understand.

Look for more posts on this topic, to go along with my talks on it. See also my video about Property Based Testing in Scala.


[1] The TDD I learned, at the itty-bitty level with mock all the things, was wrong. It isn’t what Kent Beck espoused. But it’s the easiest.

[2] Or property-based testing, but that has NOTHING to do with properties on a class, so that name confuses people. Aside from that confusion I prefer “property-based”, which speaks about WHY we do this testing, over “generative”, which speaks about how.

Gradle is easy: Groovy tests

Today in Unit Testing, an evil framework class hid my important return value away in a private field.

Groovy to the rescue!

So far, all our code and tests are in Java. It took only two lines in build.gradle, as specified in the Gradle documentation, and poof.
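Those two lines amount to applying the Groovy plugin and adding a Groovy dependency for the tests; something like this (the version number is just an example):

apply plugin: 'groovy'   // compiles and runs anything under src/test/groovy alongside the Java tests

dependencies {
    testCompile 'org.codehaus.groovy:groovy-all:2.4.7'   // any recent Groovy will do
}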

I put my little test class in src/test/groovy/package/structure, typed “gradle build” and voilà! The results from the Groovy test show up right along with the Java tests. Sweet.


import org.junit.Before
import org.junit.Test

import static org.junit.Assert.assertEquals
import static org.mockito.Mockito.mock
import static org.mockito.Mockito.when

class SomethingTest {
    def email = "blah@blah.com"

    UsefulClass usefulClass
    Something subject

    @Before
    void setup() {
        usefulClass = mock(UsefulClass.class)
        subject = new Something(usefulClass)
        subject.setEmailAddress(email)
    }

    @Test
    void falseWhenEmailFound() {
        when(usefulClass.find(email)).thenReturn(new Thingie())

        Reply result = subject.methodUnderTest()

        assertEquals Boolean.FALSE, result.entity // see? that mean old private field is right there for the testing.
    }
}