Develop before define

First the loose thinking and the building up of a structure on unsound foundations and then the correction to stricter thinking and the substitution of new underpinning beneath the already constructed mass.

Gregory Bateson on the advance of science. (From Steps to an Ecology of Mind)

This expresses a process I have observed in developers. We can develop something faster than we can define it.

That loose thinking includes the construction of loose code. We think with our fingers and eyes, keyboards and screens, editors and runtimes, as well as with our brains. We try things; we draw them out or code them up. This eliminates a lot of impossible paths.

Afterward, we shore up the useful ones. We put an API around each one, error handling within, types throughout. We describe its interface and behavior in documentation.
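A minimal Java sketch of what shoring up can look like. The scenario (reading a port number from configuration) and the names PortConfig, Port, sketchPort, and parsePort are hypothetical, chosen only to contrast a happy-path sketch with a version that has a named type, validation, and error handling. It assumes Java 16 or later for the record.

```java
import java.util.Optional;

// Hypothetical example: reading a TCP port from a configuration string.
final class PortConfig {

    // The happy-path sketch: fine for exploring, but it throws on
    // non-numeric input and happily accepts values that are not valid ports.
    static int sketchPort(String raw) {
        return Integer.parseInt(raw);
    }

    // The shored-up version starts with a named type that enforces its range
    // (assumed here to be 1..65535).
    record Port(int value) {
        Port {
            if (value < 1 || value > 65535) {
                throw new IllegalArgumentException("port out of range: " + value);
            }
        }
    }

    // Validation and error handling: bad input becomes an explicit
    // "no valid port" result instead of an exception.
    static Optional<Port> parsePort(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            int n = Integer.parseInt(raw.trim());
            return (n >= 1 && n <= 65535) ? Optional.of(new Port(n)) : Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

The sketch is one line; the defined version has to decide what counts as valid, what happens on bad input, and what type callers get back.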

Bateson grants permission to code loosely as an extension to thinking loosely, with the responsibility to return with rigor before we rope in other teams.

So do this: play in code the way we play in thought.

Then please realize that putting the foundations under it, defining the functionality so others can use it, is 10-100 times more time-consuming than your happy-path sketch.

Correctness

How important is correctness?

This is a raging debate in our industry today. I think the answer depends strongly on the kind of problem a developer is trying to solve: is the problem contracting or expanding? A contracting problem is well-defined, or has the potential to be well-defined with enough rigorous thought. An expanding problem cannot be; as soon as you’ve defined “correct,” you’re wrong, because the context has changed.

A contracting problem: the more you think about it, the clearer it becomes. This includes anything you can define with math or a stable specification: image conversion, compression (making files smaller for storage). Others are problems we’ve solved so many times, or used in so many ways, that they have stabilized: web servers, grep. The problem space is inherently specified, or it has become well-defined over time.

Correctness is possible here, because there is such a thing as “correct.” These programs are useful to many people, so correctness is worth the effort. Using such a program or library is freeing: it scales up the capacity of the industry as a whole, because it becomes something we don’t have to think about.
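Compression shows what “correct” can mean for a contracting problem: whatever you compress, decompressing it must give back exactly the original bytes. This Java sketch uses the JDK’s java.util.zip.Deflater and Inflater; the class and method names (DeflateRoundTrip, roundTrips) are my own, and a real test suite would check the property over many more inputs.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Compression is a contracting problem: "correct" has a precise meaning.
final class DeflateRoundTrip {

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish(); // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and not finished: the stream is truncated or needs a dictionary.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-dependent input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    // The specification, stated as a property: decompressing the compressed
    // bytes gives back exactly the original bytes, for every input.
    static boolean roundTrips(byte[] original) {
        return Arrays.equals(original, decompress(compress(original)));
    }
}
```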

An expanding problem: the more you think about it, the more ways it can go. This includes pretty much all business software; we want our businesses to grow, so we want our software to do more and different things with time. It includes almost all software that interacts directly with humans. People change, culture changes, expectations get higher. I want my software to drive change in people, so it will need to change with us.

There is no complete specification here. No amount of thought and care can get this software perfect. It needs to be good enough, it needs to be safe enough, and it needs to be amenable to change. It needs to give us the chance to learn what the next definition of “good” might be.

Safety

I propose we change our aim for correctness to an aim for safety. Safety means that nothing terrible happens (for your business’s definition of terrible). Correctness is an extreme form of safety. Performance is a component of safety. Security is part of safety.

Tests don’t provide correctness, yet they do provide safety. They tell us that certain things aren’t broken yet. Process boundaries provide safety. Error handling, monitoring, everything we do to compensate for the inherent uncertainty of running software in production, all of these help enforce safety constraints.
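As a minimal Java sketch of error handling that enforces a safety constraint rather than correctness: if a call fails, fall back to a harmless default instead of failing the whole request. The scenario (a product page asking a recommendation service for suggestions) and the names SafeCall and withFallback are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical helper: a product page asks a recommendation service for
// suggestions, and a failure there must not take down the whole page.
final class SafeCall {

    // If the call throws or returns null, log it and return a safe default.
    // This does not make the service correct; it bounds the damage when it is wrong.
    static <T> T withFallback(Supplier<T> call, T fallback) {
        try {
            T result = call.get();
            return result != null ? result : fallback;
        } catch (RuntimeException e) {
            System.err.println("falling back after: " + e);
            return fallback;
        }
    }
}
```

Whether an empty list of recommendations is “safe enough” is a business decision; the code only makes that decision explicit.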

In an expanding software system, business matters (like profit) determine what is “good enough.” Risk tolerance goes into what is “safe enough.” Optimizing for the future means optimizing our ability to change.

In a contracting problem, we can progress through degrees of safety toward correctness and optimal performance. Break out the formal specification; write great documentation.

Any piece of our expanding system that we can break out into a contracting problem space is a win. We can solve it with rigor, even make it eligible for reuse.

For the rest of it: embrace uncertainty, keep the important parts working, and make the code readable so we can change it. In an expanding system, where tests are limited and limiting and documentation becomes more wrong every day, the code is the specification. Aim for change.

Fun with Optional Typing: narrowing errors

After moving from Scala to Clojure, I miss the types. Lately I’ve been playing with Prismatic Schema, a sort of optional typing mechanism for Clojure. It has some surprising benefits, sometimes even over Scala’s typing. I plan to write about some of those, but first, a more ordinary use of types: locating errors.

Today I got an error in a test, and struggled to figure it out. It looked like this:[1]

expected: (= [expected-conversion] result)
  actual: (not (= [{:click {:who {:uuid "aeiou"}, :when #}, :outcome {:who {:uuid "aeiou"}, :when #, :what "bought 3 things"}}] ([{:click {:who {:uuid "aeiou"}, :when #}, :outcome {:who {:uuid "aeiou"}, :when #, :what "bought 3 things"}}])))

Hideous, right? It’s super hard to see what’s different between the expected and actual there. (The colors help, but the terminal doesn’t give me those.)

It’s hard to find the difference because the difference isn’t content: it’s type. I expected a vector of a map, and got a list of a vector of a map. Joy.

I went back and added a few schemas to my functions, and the error changed to:

  actual: clojure.lang.ExceptionInfo: Output of calculate-conversions-since does not match schema: [(not (map? a-clojure.lang.PersistentVector))]

This says an element of my function’s output was a vector where a map should have been. (This is one of Schema’s more readable error messages.)

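It turns out that (concat (something that returns a vector)) doesn’t flatten anything: with a single argument, concat just returns a seq of that vector’s elements, which were themselves vectors. I needed (apply concat to-the-vector).[2]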
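For readers more at home in Java, here is a rough analog, not the code from the commit: flatMap removes one level of nesting the way (apply concat ...) does, and a declared return type lets the compiler catch the extra layer that Schema caught at run time. The class and method names (Flatten, flattenOnce) are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class Flatten {

    // Roughly like Clojure's (apply concat batches): remove one level of nesting,
    // turning a list of lists of maps into a list of maps.
    static List<Map<String, Object>> flattenOnce(List<List<Map<String, Object>>> batches) {
        return batches.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    // Writing `return batches;` in flattenOnce would not compile: a
    // List<List<Map<String, Object>>> is not a List<Map<String, Object>>.
    // The declared return type plays the role the output schema played.
}
```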

Clojure lets me keep the types in my head for as long as I want. Schema lets me write them down when they start to get out of hand, and uses them to narrow down where an error is. Even after I spotted the extra layer of sequence in my output, the mistake could have been in any of a few places. Adding schemas pointed me directly to the function that wasn’t doing what I expected.
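Schema’s runtime checks can be imitated by hand when data is loosely typed. This Java sketch is a hypothetical helper (requireListOfMaps is not a real library function) that checks the shape of a function’s output and names the function in its error, in the spirit of the Schema message above. It assumes Java 16 or later for pattern matching with instanceof.

```java
import java.util.List;
import java.util.Map;

// A hand-rolled, Schema-like output check for loosely typed data.
final class OutputCheck {

    // Verifies that the output is a list whose elements are all maps.
    // The error names the function, so a failure points at where the shape went wrong.
    static List<?> requireListOfMaps(String fnName, Object output) {
        if (!(output instanceof List<?> list)) {
            throw new IllegalStateException(
                    "Output of " + fnName + " is not a list: " + describe(output));
        }
        for (Object element : list) {
            if (!(element instanceof Map<?, ?>)) {
                throw new IllegalStateException(
                        "Output of " + fnName + " does not match: expected map, got " + describe(element));
            }
        }
        return list; // unchanged, so the check can wrap a return value
    }

    private static String describe(Object value) {
        return value == null ? "null" : value.getClass().getName();
    }
}
```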

The real point of types is that they clarify my thinking and document it at the same time. They are a skeleton for my program. I like Clojure+Schema because it lets me start with a flexible pile of clay, and add bones as they’re needed.

-----
[1] It would be less ugly if humane-test-output were activated, but I’m having technical difficulties with that at the moment.
[2] Here’s the commit with the schemas and the fix.

One level deeper

It is often said that the developer should understand one level deeper than she’s working. If she’s writing Java, she should know how the JVM works. If he’s using a container, he should know conceptually what’s going on inside the container.

This applies to more than runtimes and frameworks: it applies to all the abstractions and innovations we’re building on. If we understand why our language provides certain features, then we can know when to use those features.

For instance, if we know the purposes of static typing, then we can know when to use it to the varying degrees available.
These purposes include:
1) Guarantee certain errors will not happen.
2) Document what each value means.
3) Identify places where a change impacts the code.
4) Prevent other coders from accidentally misusing our abstractions.

Depending on how much each of these purposes matters to our project (How bad is a runtime error? How many other developers need to use this? How confusing is it?), we might choose different levels of static typing.

In Java, for instance, we might shove information into a List of Strings all day long. Or, we might create custom datatypes to express an ordered collection of property names vs a nonempty collection of error messages. The more specifically we express our types, the more checking the compiler can do.
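A minimal Java sketch of the custom datatypes described above. PropertyNames, ErrorMessages, and Reports are hypothetical names; the point is that each type states something a List of Strings cannot, such as “order matters” or “never empty.” It assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical: an ordered collection of property names. Order is part of the meaning.
record PropertyNames(List<String> inOrder) {
    PropertyNames {
        inOrder = List.copyOf(inOrder); // unmodifiable copy that keeps the order
    }
}

// Hypothetical: a collection of error messages that can never be empty,
// because a first message is required to construct one.
record ErrorMessages(String first, List<String> rest) {
    ErrorMessages {
        Objects.requireNonNull(first, "at least one error message is required");
        rest = List.copyOf(rest);
    }

    List<String> all() {
        List<String> all = new ArrayList<>();
        all.add(first);
        all.addAll(rest);
        return List.copyOf(all);
    }
}

final class Reports {
    // Only ErrorMessages is accepted here; passing a PropertyNames or a plain
    // List<String> is a compile-time error.
    static String firstError(ErrorMessages errors) {
        return errors.first(); // always present, so no emptiness check
    }
}
```

Because ErrorMessages always has a first element, code that reads it needs no emptiness check, and the compiler refuses a PropertyNames value where error messages are expected.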

Yesterday I got bitten in the butt when I instantiated an (unmodifiable) empty map and later tried to add to it. Why is there only one interface for Map? Why do I get UnsupportedOperationException instead of a red underline in my IDE saying “method ‘put’ does not exist on interface ReadableMap”? When I have a read-only Map, I would like its type to express that.
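java.util offers no read-only Map interface, but one is easy to sketch. ReadableMap and ReadOnlyView below are hypothetical, not part of the JDK. The demo shows that with java.util.Map the mistake compiles and fails only at run time, while code holding a ReadableMap cannot call put at all.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical read-only interface: it simply has no put method.
interface ReadableMap<K, V> {
    V get(K key);
    int size();
}

// Hypothetical adapter exposing only the reading half of a java.util.Map.
// It is a read-only view: the backing map can still change through other references.
final class ReadOnlyView<K, V> implements ReadableMap<K, V> {
    private final Map<K, V> backing;

    ReadOnlyView(Map<K, V> backing) {
        this.backing = backing;
    }

    public V get(K key) {
        return backing.get(key);
    }

    public int size() {
        return backing.size();
    }
}

final class EmptyMapDemo {
    // With java.util.Map, this compiles and fails only at run time.
    static boolean putOnEmptyMapFails() {
        Map<String, Integer> empty = Collections.emptyMap();
        try {
            empty.put("count", 1);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

With `ReadableMap<String, Integer> view`, a call like `view.put("count", 1)` is the red underline the post asks for: the method does not exist on the interface.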

“Obtaining maximum benefit from the type system generally involves some attention on the part of the programmer, as well as a willingness to make good use of the facilities provided by the language.” (Pierce, Types and Programming Languages)

The designers of Collections.emptyMap() did not put this level of attention into the type system.

Yet in some cases, typing at this level of specificity is overkill. Pierce again:

The tension between conservativity and expressiveness is a fundamental fact of life in the design of type systems.

This means that static type checking sometimes prevents you from doing things that are perfectly valid. Maybe it’s just fine to pass in a constant string as a FirstName, but if your method expects a FirstName instead of a String, Java’s type system requires extra code.
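A minimal sketch of that extra code, assuming a hypothetical FirstName record (Java 16 or later) and a hypothetical Greeter.greet method: a String literal that would be perfectly fine must still be wrapped before the compiler accepts it.

```java
// Hypothetical wrapper type for the FirstName example.
record FirstName(String value) {
    FirstName {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("first name must not be blank");
        }
    }
}

final class Greeter {
    // Accepts only FirstName, not String.
    static String greet(FirstName name) {
        return "Hello, " + name.value() + "!";
    }
}
```

`Greeter.greet("Ada")` is rejected at compile time even though "Ada" is a perfectly good first name; the caller has to write `Greeter.greet(new FirstName("Ada"))`.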
Yesterday I took advantage of erasure to return a raw type when I didn’t need the data governed by the type parameter. Dirty, I know, but the statically checked type was overly restrictive in that case.
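The post doesn’t show that code, so here is a hypothetical illustration of the same kind of trick in Java: a shared “not found” response has no body, so its type parameter doesn’t matter, and returning the raw type lets one instance serve every Response&lt;T&gt;. Response and Responses are invented names; the compiler accepts this with rawtypes and unchecked warnings, which the annotations suppress.

```java
// Hypothetical generic response: a status plus a payload of type T.
final class Response<T> {
    final int status;
    final T body;

    Response(int status, T body) {
        this.status = status;
        this.body = body;
    }
}

final class Responses {
    // A "not found" response carries no body, so T is irrelevant.
    // One raw instance is shared by every caller, whatever T they expect.
    @SuppressWarnings({"rawtypes", "unchecked"})
    private static final Response NOT_FOUND = new Response(404, null);

    // Erasure makes this safe in practice: there is no T value to misuse.
    @SuppressWarnings("unchecked")
    static <T> Response<T> notFound() {
        return NOT_FOUND;
    }

    // Code that only reads the status can accept any Response via a wildcard.
    static int statusOf(Response<?> response) {
        return response.status;
    }
}
```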

Understanding the purposes of static type checking can help us decide how much effort to put into making our types as expressive as possible.

This is one example of getting past how to use the type system, into the why of its existence, so that we know when to use it. Understand one level of abstraction below where we work.