Horizonal goals

Video version here

There’s this great, short book by John Kay called Obliquity. It’s about goals that you can’t achieve by aiming for them directly; you have to look for an oblique goal that will happen to get you there. Like, you can’t aim for “happiness;” you have to find something such that aiming for it makes you happy, like raising children or writing or helping people who are hurting.

This book gives a name to some parts of my seamaps. The star at the top is the “high-level objective,” the unquantifiable goal which can never be achieved. Aiming for it sends us in a direction which happens to obliquely fulfill a goal such as “happiness” or “profit.” Consider goals such as “change the way development is done” or “find the optimal combination of music and words” or “address the observability needs of modern architectures.” These are horizonal goals; as we make progress, the state of the art moves. We can never reach the horizon, but aiming for it takes us interesting places.

The mountains in the seamap are milestones. They’re achievable, measurable goals that we work toward because they’re in the direction of our high-level objective. Periodically we climb up and look around, take stock of whether our current direction is still going toward our star, and if not, change our milestone goals.

There are many smaller milestones on the way to the bigger one. Each offers an opportunity to take stock and possibly shift direction. There are actions that we take to move toward these goals. This is us in the boat, rowing.

Obliquity adds another element: necessary states. A necessary state to moving toward the next feature is: tests are passing. A necessary state for teamwork is that we are getting along with each other. Many of the actions we take are aimed at maintaining or restoring necessary states. These are like the whirlpools in my seamap; we have to smooth them out before we can row in the direction of our choice.

For example, here is a seamap for my current activity:

high-level objective: change the way people think about programming. Goal: explain Symmathecist. Subgoal: explain Horizonal. Necessary state: don't be too drunk. Action: type this post before opening wine.  

I will now hit “publish” and go open a bottle of wine.


Developers have a love-hate relationship with code re-use. As in, we used to love it. We love our code and we want it to run everywhere and help everyone. We want to get faster with time by harnessing the work of our former selves.
And yet, we come to hate it. Reuse means dependencies. It means couplings. It means surprises, when changing code impacts something we did not expect, or else it means don’t touch it, it’s too scary. It means trusting code we don’t understand because it’s code we didn’t write.

Here’s the thing: sharing code is dangerous. Do it sparingly.

When reuse is bad

Let’s talk about sharing code. Take a business, developing software for its employees or its customers. Let’s talk about code within an organization that is referenced in more than one service, or by multiple flows in a monolith. (Monolith is defined as “one deployable unit maintained by more than one small team.”)

Let’s see some pictures. Purple Service here has some classes or functions that it finds useful, and the team thinks these would be useful elsewhere. Purple team breaks this code out into a library, the peachy circle.

purple circle, peach circle inside

Then someone from Purple team joins Blue team, and uses that library in Blue Service. You think it looks like this:

peach circle under blue and purple circles

Nah, it’s really more like this:

purple circle with peach circle inside. Blue circle has a line to peach circle

This is called coupling. When Purple team changes their library, Blue team is affected. (If it’s a monolith, their code changed underneath them. I hope they have good tests.)
Now, you could say, Blue team doesn’t have to update their version. The level of reuse is the release, we broke out the library, so this is fine.

picture of purple with orange circle, blue with peach circle.

At that point you’ve basically forked, the code isn’t shared anymore. When Blue team needs to make their own changes, they first must upgrade, so they get surprised some unpredictable time later. (This happened to us at Outpace all the time with our shared “util” libraries and it was the worst. So painful. Those “timesavers” cost us a lot of time and frustration.)

This shared code is a coupling between two services that otherwise have nothing to do with each other. The whole point of microservices was to decouple! To make it so our changes impact only code that our team operates! dead. and for what?

To answer that, consider the nature of the shared code. Why is it shared?
Perhaps it is unrelated to the business: it is general utilities that would otherwise be duplicated, but we’re being DRY and avoiding the extra work of writing and testing and debugging them a second time. In this case, I propose: cut and paste. Or fork. Or best of all, try a more formalized reuse-without-sharing procedure [link to my next post].

What if this is business-related code? What if we had good reason to DRY it out, because it would be wrong for this code to be different in Purple Service and Blue Service? Well sorry, it’s gonna be different. Purple and Blue do not have the same deployment schedules, that’s the point of decoupling into services. In this case, either you’ve made yourself a distributed monolith (requiring coordinated deployments), or you’re ignoring reality. If the business requires exactly one version of this code, then make it its own service.

picture with yellow, purple, and blue circles separate, dotty lines from yellow to purple and to blue.

Now you’re not sharing code anymore. You’re sharing a service. Changes to Peachy can impact Purple and Blue at the same time, because that’s inherent in this must-be-consistent business logic.

It’s easier with a monolith; that shared code stays consistent in production, because there is only one deployment. Any surprises happen immediately, hopefully in testing. In a monolith, if Peachy is utility classes or functions, and Purple (or Blue) team wants to change them, the safest strategy is: make a copy, use the copy, and change that copy. Over time, this results in less shared code.
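The copy-then-change strategy can be sketched in a few lines. This is only an illustration: the function names and the "Last, First" requirement are made up for the example, not taken from any real codebase.

```python
# Hypothetical shared utility that Purple and Blue both started with.
def format_customer_name(first: str, last: str) -> str:
    return f"{first} {last}"


# Blue team keeps calling the original, so its behavior does not move.
def blue_invoice_header(first: str, last: str) -> str:
    return "Invoice for " + format_customer_name(first, last)


# Purple team needs "Last, First". Instead of editing the shared function,
# they make a copy, use the copy, and change that copy.
def purple_format_customer_name(first: str, last: str) -> str:
    return f"{last}, {first}"


def purple_mailing_label(first: str, last: str) -> str:
    return purple_format_customer_name(first, last)
```

Blue's output is untouched by Purple's change, and there is now one less piece of shared code.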

This crucial observation is #2 in Modern Software Over-engineering Mistakes by RMX.

“Shared logic and abstractions tend to stabilise over time in natural systems. They either stay flat or relatively go down as functionality gets broader.”

Business software is an expanding problem. It will always grow, and not with more of the same: it will grow in ways you didn’t plan for. This kind of code must optimize for change. Reuse is the enemy of change. (I’m talking about reuse of internal code.)

Back in the beginning, Blue team reused the peach library and saved time writing code. But writing code isn’t the expensive part, compared to changing code. We don’t add features faster as our systems get larger and we have more code hypothetically available for re-use. We add features more slowly, because every change has more impacts and is less safe. Shared code makes change less safe. The only code safe to share is code that doesn’t change. Which means no versioning. Heck, you might as well have cut and pasted it.

When reuse is good

We didn’t advance as an industry by rewriting, or cut and pasting, everything we need over and over. We build on libraries published by developers and companies all over the globe. They release them, we reuse them. Yes, we get into dependency hell, but it beats writing your own web framework. We get reuse not only of the code, but of understanding: Rails knowledge transfers between employers.

There is a tipping point where reuse is magical.

I argue that this point is well past a release, past a separate jar.
It is past a stable API
past a coherent abstraction
past automated tests
past solid documentation…

All these might be achieved within the organization if responsibility for the shared utilities lives in a separate team; you can try to use Conway’s Law to enforce architectural boundaries, but within an org, those boundaries are soft. And this code isn’t your business, and you don’t have incentives to spend the time on these. Why have backwards compatibility when you can perform human coordination instead? It isn’t worth it. In my past organizations, shared code has instead been the responsibility of no one. What starts out as “leverage” becomes baggage, as all the Ruby code is tied to an old version of Sinatra. Some switch to Go to get a clean slate.
Break those chains! Copy the pieces you need out of that internal library and make them yours.

At the level of winning reuse, that code has its own marketing department
its own sales team
its own office manager
its own stock price.

The level of reuse is the company.

(Pay for software.)

When the responsible organization succeeds by making its code stable and backwards-compatible and easy to work with and well-documented and extensively tested, that is code I want to reuse!

In addition to SaaS companies and vendors, there are organizations built around open-source software. This is why we look for packages and frameworks with a broad community around them. Or better, a foundation for keeping shared projects healthy. (Contribute to them.)


Reuse is dangerous because it introduces coupling. Share business code only when that coupling is inherent to the business domain. Share library and utility code only when it is maintained by an organization dedicated to publishing that code. (Same with services. If you can pay for infrastructure-level tools, you’ll get better tools without distracting your organization.)

Why did we want to reuse internal code anyway?
For speed, but speed of change is more important.
For consistency, but that means coupling. Don’t hold your teams back with it.
For propagation of bug fixes, which I’ve not seen happen.

All three of these can be automated [LINK to my next post] without dependencies.

Next time you consider making your code reusable, ask “who will I sell this to?”
Next time someone (including you) suggests you reuse their code, ask “who publishes that?” and if they say “me,” copy it instead.

It’s Atomist Time!

I’m hella excited to get to work on Atomist full-time starting now (January 2017). Why? What do they do? Oh let me tell you!
I love developing software, not least because we (as an industry) have not yet figured out how to software. We know it’s powerful, but not yet how powerful. Software is like engineering except that the constraints aren’t in what we can build, but what we can design and specify. Atomist is expanding this capacity.
Atomist builds tooling to smooth some bumps in software development. There are three components that I’m excited about, three components that open new options in how we develop software.

Component 1: Code that changes code

First, there are code editors, called Rugs. On the surface, these automate the typing part. Like code generators, except they continue to work with the code after you modify it. Like refactorings in an IDE, except they appear as a pull request, and then you can continue development on that branch. If you have some consistent code structure (and if you use a framework, you do), Rugs can perform common feature-adding or upgrading or refactoring operations. Use standard Rugs to, say, add graph database support to an existing Spring Boot project. Customize Rugs to set up your Travis build uniformly in your projects. Create your own Rugs to implement metrics integration according to your company’s standards — and to upgrade existing code when those standards change.
On the surface this is an incremental improvement over existing code generation and IDE refactoring tools. Yet, I see it as something more. I see it as a whole new answer to the question of “indirection or repetition?” in code. Take for instance: adding a field to a Rails app makes us change the controller, the model, and four other places. Or creating a new service means changing deployment configuration, provisioning, and service discovery. Whenever a single conceptual change requires code changes in multiple spots, we complain about the work and we make mistakes. Then we start to get clever with it: we introduce some form of indirection that localizes that change to one place. Configuration files get generated in the build, Ruby metaprogramming introduces syntax that I can’t even figure out how it’s executable — magic happens. The code gets less explicit, so that we can enforce consistency and make changing it … well, I’m not gonna say “easier” because learning to cast the spell is tricky, but it is less typing.
Atomist introduces a third alternative: express that single intention (“create a new service” or “add this field”) as a Rug editor. This makes writing it one step, and then the editor makes all those code changes in a single commit in a branch. From there, customize your field or your new service; each commit that you make shows how your feature is special. The code remains explicit, without additional magic. When I come back and read it, I have some hope of understanding what it’s doing. When I realize that I forgot something (“oops! I also need to add that service to the list of log sources”) then I fix it once, in the NewService.rug editor. Now I never forget, and I never have to remember.
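To make the idea concrete, here is a rough sketch of "one intention, many file changes" in Python. It is not Rug syntax and not Atomist's API. The project layout, the file names, and the add_field function are all assumptions invented for illustration.

```python
from typing import Dict

# A project is modeled as a mapping from file path to file contents.
Project = Dict[str, str]


def add_field(project: Project, model: str, field: str) -> Project:
    """Return a copy of the project with `field` added everywhere `model` needs it.

    The two paths below are a hypothetical layout; a real editor would
    know the structure that your framework dictates.
    """
    updated = dict(project)
    model_path = f"models/{model}.py"
    form_path = f"forms/{model}_form.txt"
    updated[model_path] = project[model_path] + f"    {field} = None\n"
    updated[form_path] = project[form_path] + f"input: {field}\n"
    return updated


project = {
    "models/user.py": "class User:\n    name = None\n",
    "forms/user_form.txt": "input: name\n",
}
changed = add_field(project, "user", "email")
```

The result is still explicit code in every file. If a third place turns out to need the field, the fix goes into add_field once.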
I love this about developing with Rugs: as I code, I’m asking myself, “how could I automate this?” and then what I learn is encoded in the Rug, for the benefit of future-me and (if I publish it) of future-everyone-else. That is when I feel productive.

Component 2: Coordination between projects

Editors are cute when applied to one project. When applied across an organization, they start to look seriously useful. Imagine: A library released a security update, and we need to upgrade it across the organization. Atomist creates a pull request on every project that uses that library. The build runs, maybe we even auto-merge it when the build passes. Or perhaps there are breaking changes; the editor can sometimes be taught how to make those changes in our code.
And if a Rug can change the way we use a library, then it can change the way we use ours. This is cross-repository refactoring: I publish an internal library, and I want to rename this function in the next version. Here’s my game: I publish not only the new version of my library, but an editor – and then I ask Atomist to create pull requests across the organization. Now it is a quick code review and “accept” for teams to upgrade to the new version.
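A toy version of that game, sketched in Python under heavy assumptions: repositories are plain dictionaries, the rename is textual (so it would also touch comments and strings), and "creating a pull request" is reduced to returning the files that would change. None of these names come from Atomist.

```python
import re
from typing import Dict

Files = Dict[str, str]  # file path -> contents


def rename_function(source: str, old: str, new: str) -> str:
    # \b keeps us from touching longer names that merely contain `old`.
    return re.sub(rf"\b{re.escape(old)}\b", new, source)


def propose_changes(repos: Dict[str, Files], old: str, new: str) -> Dict[str, Files]:
    """For each repo, collect the files the rename would change.

    Each entry in the result stands in for one pull request.
    """
    proposals: Dict[str, Files] = {}
    for repo, files in repos.items():
        changed = {}
        for path, text in files.items():
            rewritten = rename_function(text, old, new)
            if rewritten != text:
                changed[path] = rewritten
        if changed:
            proposals[repo] = changed
    return proposals


repos = {
    "billing": {"app.py": "user = fetch_user(user_id)\n"},
    "search": {"main.py": "user = fetch_user_by_id(1)\n"},
}
proposals = propose_changes(repos, "fetch_user", "load_user")
```

Only billing gets a proposal here; search calls a different function and is left alone.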
Atomist coordinates with teams in GitHub and in Slack. Ask Atomist in Slack to start that new feature for you, or to check all repositories in the organization and create pull requests. Atomist can also coordinate with continuous integration. It ties these pieces together across repositories, and including humans. It can react to issues, to build results, to merges; and it can ping you in Slack if it needs more information to act appropriately. I have plans to use this functionality to link libraries to the services that use them: when the build passes on my branch, go build the app that uses my library with this new version, and tell me whether those tests pass.
This is cross-repository refactoring and cross-repository build coordination. This gives companies an alternative to the monorepo, to loading all their libraries and services into one giant repository in order to test them together. The monorepo is a lie: our deployments are heterogeneous, so while the monorepo is like “look at this lovely snapshot of a bunch of code that works together” the production environment is something different. The monorepo is also painful because git gets slow when the repository gets large; because it’s hard to tell which commits affect which deployed units; and because application owners lose control over when library upgrades are integrated. Atomist will provide a layer on top of many repositories, letting us coordinate change while our repositories reflect production realities.
Atomist tooling will make multirepo development grow with our codebases.

Component 3: is still a secret

I’m not sure I can talk about the third piece of possibility-expanding tooling yet. So have this instead:
Automated coordination among systems and people who interact with code — this is useful everywhere, but it’s a lot of work to create our own bots for this. Some companies put the resources into creating enough automation for their own needs. No one business-software-building organization has a reason to develop, refine, and publish a general solution for this kind of development-process automation. Atomist does.
When it becomes easy for any developer to script this coordination and the reactions just happen — “Tell me when an issue I reported was closed” “Create a new issue for this commit and then mark it closed as soon as this branch is merged” — then we can all find breakages earlier and we can all keep good records. This automates my work at a higher level than coding. This way whenever I feel annoyed by writing a status report, or when I forget to update the version in one place to match the version in another, my job is not to add an item to a checklist. My job is to create an Atomist handler script to make that happen with no attention from me.
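As a sketch of what "the reactions just happen" could look like, here is a tiny event dispatcher in Python. The Automation class, the event shapes, and the handler are hypothetical; this is not Atomist's handler API, only the general pattern of registering a reaction to an event.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class Automation:
    """Routes each incoming event to the handlers registered for its type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def dispatch(self, event: Event) -> None:
        for handler in self._handlers[event["type"]]:
            handler(event)


# Stand-in for a chat message; a real bot would post to Slack instead.
notifications: List[str] = []


def notify_reporter(event: Event) -> None:
    notifications.append(f"@{event['reporter']}: issue {event['issue']} was closed")


bot = Automation()
bot.on("issue_closed", notify_reporter)
bot.dispatch({"type": "issue_closed", "issue": "#42", "reporter": "jess"})
bot.dispatch({"type": "build_passed", "branch": "main"})  # no handler, nothing happens
```

The point is the shape of the work: the annoyance becomes a handler written once, not a checklist item remembered forever.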

My secret

I love shaving yaks. Shaving them deeply, tenderly, finding the hidden wisdom under their hair. I love adding a useful feature, and then asking “How could that be easier?” and then “How could making that easier be easier?” This is Atomist’s level of meta: We are making software to make it easier for you to make your work easier, as you work to make software to make your customers’ lives easier.
I think we’re doing this in depths and ways other development tools don’t approach. At this level of meta (software for building software for building software for doing work), there’s a lot of leverage, a lot of potential. This level of meta is where orders-of-magnitude changes happen. Software changes the world. I want to be part of changing the software world again, so we can change the real world even faster.
With Atomist, I get to design and specify my own reality, the reality of my team’s work. (Atomist does the operations bit.) Without spending tons of time on it! Well, I get to spend tons of time on it because I get to work for Atomist, because that’s my thing. But you don’t have to spend tons of time on it! You get to specify what you want to happen, in the simplest language we can devise.
We’re looking for teams to work with us on alpha-testing, if you’re interested now. (join our slack, or email me) Let’s learn together the next level of productivity and focus in software development.

Scaling Intelligence

You can watch the full keynote from Scala eXchange 2015 (account creation required, but free). The talk includes examples and details; this post is a summary of one thread.

Scala is a scalable language, from small abstractions to large ones. This helps with the one scaling problem every software system has: scaling the feature set while still fitting it in our heads. Scaling our own intelligence.

Scala offers complicated powerful language features built from combinations of simpler language features. The aim is a staircase of learning: gradually learn features as you need them. The staircase starts in the green grass of toy programs, moves through the blue sky of useful business software, and finally into the outer space of abstract libraries and frameworks. (That dark blob is supposed to represent outer space.)

This is not how people experience the language.

The green grass is great: Odersky’s Coursera courses, Atomic Scala. Next, we want to write something useful for work: the blue sky. It is time to use libraries and frameworks. I want a web app, so I bring in Spray. Suddenly I need to understand typeclasses and the magnet pattern. The magnet pattern? The docs link to a post on this. It’s five thousand words long. I’m shooting into outer space — I don’t want to be an astronaut yet!

The middle of the staircase is missing.

Who can repair this? Not the astronauts, the compiler and library authors. They can write posts around programming language theory, defining one feature in terms of a bunch of other concepts I don’t understand yet. I need explanations by people who share my objectives, people a little bit ahead of me in the blue sky, who recently learned how to use Spray themselves. I don’t need research papers, I need StackOverflow. Blog posts, not textbooks.

This is where we need each other. As a community, we can fill this staircase. At a macro level, we scale intelligence with teaching.

Scala as a language is not enough. We don’t work in languages, especially not in the blue sky. We work in language systems, including all the libraries and tooling and all the people. The resources we create, and the personal interactions in real life and online. When we teach each other, we scale our collective intelligence, we scale our community.

Scaling the community is important, because only a large, diverse group can answer two crucial questions. To make the language and libraries great, we need to know about each feature: is this useful? and to make this staircase solid, we need to know about each source and document: is this clear?

Useful isn’t determined by the library author, but by its users. Clear isn’t determined by the writer, but by the reader. If you read the explanation of Futures on the official Scala site and you don’t get it, if you feel stupid, that is not your fault. When documentation is not clear to you, its maintainers fail. Teaching means entering the context of the learner, and starting there. It means reaching for the person a step or two down, and pulling them up to where you are.

Michael Bernstein described his three years of learning Haskell. “I tried over and over again to turn my self doubt into a pure functional program, and eventually, it clicked.”
Ouch. Not everyone has this tenacity. Not everyone has three years to spend becoming an astronaut. Teaching makes the language accessible to more people. At the same time, it makes everyone’s life easier. What might Mr Bernstein have accomplished during those years?

Scala, the language system, does not belong to Martin Odersky.  It belongs to everyone who makes Scala useful. We can each be part of this.

Ask and answer questions on StackOverflow. Blog about what you learned, especially about why it was useful.[1] Request more detail — if something is not clear to you, then it is not clear. Speak at your local user group.[2] The less type theory you understand, the more people you can help!

Publish your useful Scala code. We need examples from the blue sky. If you do, tweet about it with #blueSkyScala.

It is up to all of us to teach each other, to scale our intelligence. Then we can make use of those abstractions that Scala builds up. Then it will be a scalable language.

[1] example: Remco Beckers’s post on Option and Either and Try.
[2] example: Heather Miller’s talk compensates for bad documentation around Scala Futures.

Logs are like onions

Or, What underlying implementation is clojure.tools.logging using?

Today I want to change the logging configuration of a Clojure program. Where is that configuration located? Changing the obvious resources/log4j.properties doesn’t seem to change the program’s behavior.

The program uses clojure.tools.logging, but that’s a wrapper around four different underlying implementations. Each of those implementations has its own ideas about configuration. How can I find out which one it uses?

Add a println to your program[1] to output this:

(.name clojure.tools.logging/*logger-factory*)         

In my case the output is:


This is clojure logging’s first choice of factories. If it can instantiate this, it’ll use it. Now I can google slf4j and find that it… is also a facade on top of multiple logging implementations.
Digging into the slf4j source code reveals this trick:

(class (org.slf4j.LoggerFactory/getILoggerFactory)) 

which prints:


so hey! I am using log4j after all! Now why doesn’t it pick up resources/log4j.properties?
Crawling through the log4j 1.2 (slf4j seems to use this version) source code suggests this[2]:

(org.apache.log4j.helpers.Loader/getResource "log4j.properties")

which gives me


So hey, I finally have a way to trace where logging configuration comes from! 
In the end, my guess of resources/log4j.properties was correct. I forgot to rebuild the uberjar that I was running. The uberjar found the properties file in itself:


Bet I’d have realized that a few hours earlier if I were pairing today. And then I wouldn’t have made this lovely post.

[1] or run it in the cider REPL in emacs, in your namespace
[2] actually it checks for log4j.xml first; if that’s found it’ll choose the xml file over the .properties.

A victory for abstraction, re-use, and small libraries

The other day at Outpace, while breaking some coupling, Eli and I decided to retain some information from one run of our program to another. We need to bookmark how far we read in each input data table. How can we persist this small piece of data?

Let’s put it in a file. Sure, that’ll work.[1] 

Next step, make an abstraction. Each of three configurations needs its own “how to read the bookmark” and “how to write the bookmark.”[2] What can we name it?

After some discussion we notice this is basically a Clojure atom – a “place” to store data that can change – except persistent between runs.

Eli googles “clojure persist atom to disk” and bam! He finds a library. Enduro, by @alandipert. Persistent atoms for Clojure, backed by a file or Postgres. Complete with an in-memory implementation for testing. And thread safety, which we would not have bothered with. Hey, come to think of it, Postgres is a better place to store our bookmarks.
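For readers who have not met the idea, here is a minimal sketch of a file-backed atom in Python. It is not Enduro's API or implementation; the FileAtom class, its method names, and the JSON file format are assumptions made for illustration. The lock only protects threads within one process.

```python
import json
import os
import tempfile
import threading
from typing import Any, Callable


class FileAtom:
    """A value that survives between runs by living in a JSON file."""

    def __init__(self, path: str, initial: Any) -> None:
        self._path = path
        self._lock = threading.Lock()
        if not os.path.exists(path):
            self._write(initial)

    def _write(self, value: Any) -> None:
        # Write to a temp file and rename it into place, so a crash
        # never leaves a half-written bookmark behind.
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(value, handle)
        os.replace(tmp_path, self._path)

    def deref(self) -> Any:
        with open(self._path) as handle:
            return json.load(handle)

    def swap(self, update: Callable[[Any], Any]) -> Any:
        # Read, apply the update function, and persist, all under the lock.
        with self._lock:
            new_value = update(self.deref())
            self._write(new_value)
            return new_value


workdir = tempfile.mkdtemp()
bookmark_path = os.path.join(workdir, "bookmarks.json")

bookmarks = FileAtom(bookmark_path, {"orders": 0})
bookmarks.swap(lambda seen: {**seen, "orders": 120})

# A later run constructs the atom again and picks up where we left off.
next_run = FileAtom(bookmark_path, {"orders": 0})
```

Even this small version needs a decision about partial writes and locking, which is part of why finding an existing library felt like a win.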

From a need to an abstraction to an existing implementation! with better ideas! win!

Enduro has no commits in the last year, but who cares? When a library is small enough, it reaches feature-completion. For a solid abstraction, there is such a thing as “done.”

Now, it happens that the library isn’t as complete as we hoped. There are no tests for the Postgres implementation. The release! method mentioned in the README doesn’t exist.

But hey, we can add these to the library faster and with less risk than implementing it all ourselves. Alan’s design is better than ours. Building on a solid foundation from an expert is more satisfying than building from scratch. And with pull requests, everybody wins!

This is re-use at its best. We paused to concentrate on abstraction before implementation, and it paid off.

[1] If something happens to the file, our program will require a command-line argument to tell it where to start.

[2] In OO, I’d put that in an object, implementing two single-method interfaces for ISP, since each function is needed in a different part of the program. In Clojure, I’m more inclined to create a pair of functions. Without types, though, it’s hard to see the meaning of the two disparate elements of the pair. The best we come up with is JavaScript-object-style: a map containing :read-fn and :write-fn. At least that gives them names.

GOTO Amsterdam: Respect the past, renew the present

GOTO Amsterdam started with a retrospective on Java, and ended with the admonition that even Waterfall was an advancement in its time. The conference encouraged building on the past, renewing and adding for growth.

As our bodies renew themselves, replacing cells so we’re never quite the same organism, so our software wants renewal and gradual improvement: small, frequent changes, all the way out to deployment. Chad Fowler reports that the average uptime of a node at Wunderlist is in hours, and he’d like to make it shorter. Code modules are small, data is small, and servers are small. Components fail and the whole continues. Don’t optimize the time between failures: optimize the time to recovery. And monitor! Testing helps us develop with confidence, but monitoring lets us deploy with confidence.
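The arithmetic behind "optimize the time to recovery" can be sketched with the standard steady-state availability formula, uptime divided by uptime plus recovery time. The hour figures below are invented for illustration and do not come from the talk.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is up, given the mean time between
    failures (MTBF) and the mean time to recovery (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


baseline = availability(mtbf_hours=100.0, mttr_hours=1.0)
fail_half_as_often = availability(mtbf_hours=200.0, mttr_hours=1.0)
recover_twice_as_fast = availability(mtbf_hours=100.0, mttr_hours=0.5)

# Halving recovery time buys the same availability as doubling the
# time between failures.
same_payoff = math.isclose(fail_half_as_often, recover_twice_as_fast)
```

With small nodes that are replaced every few hours, recovery time is the lever that stays within reach.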

At Etsy as well, changes are small and deployments frequent. Bits of features are deployed to production before they’re ever activated. Then they’re activated gradually, a separate process from deployment, and carefully monitored. Etsy has one giant monolithic PHP app, and yet they’ve achieved 50/day deployments with great uptime. Monitoring removes fear: “It’s not about how often you deploy your app. It’s do you feel comfortable deploying from trunk, right now?”
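Gradual activation, separate from deployment, is often done with a percentage rollout. The sketch below shows one common way to do it. It is not Etsy's implementation; the function name, the feature name, and the hashing scheme are assumptions.

```python
import hashlib


def is_active(feature: str, user_id: str, rollout_percent: int) -> bool:
    """Decide whether a deployed feature is switched on for this user.

    Hashing the feature and user together gives each user a stable
    bucket from 0 to 99, so raising the percentage only ever adds
    users; nobody flips back and forth between requests.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent


users = [f"user-{n}" for n in range(1000)]
early = [u for u in users if is_active("new_checkout", u, 10)]
later = [u for u in users if is_active("new_checkout", u, 50)]
```

The code ships dark at 0 percent, and the number is turned up while someone watches the dashboards.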

That doesn’t happen all at once. It builds up through careful tooling and through careful consideration of each production outage, in post-mortems where everyone listens and root causes are investigated to levels impossible in a blame-assigning culture.  Linda Rising said, “The real problem in our organizations is nobody wants to talk about how we might be doing things wrong.” At Etsy, they talk about failure.

Even as we’re deploying small changes and gradually improving our code, there’s another way our code can renew and grow: the platform underneath.  Georges Saab told us part of the magic of Java, the way the JIT compiler team works with the people creating the latest hardware. By the time that hardware is released, the JVM is optimized for the latest improvements. Even beyond the platform, Java developers moved the industry away from build-it-yourself toward finding an open-source solution, building on the coding and testing and design efforts of others. And if we update those libraries, they’re renewing as well. We are not doing this alone.

And now in Java 8, there are even more opportunities for library-level optimization, as Stream processing raises the level of abstraction, letting us declare our intentions with a lambda expression instead of specifying the steps. Tell the language what you want it to do, not how, and it can optimize. Dan North used this technique back when he invented DevOps (I’m mostly kidding): look at the outcome you want, and ask how to get there. The steps you’ve used before are clues, not the plan.
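The difference between specifying steps and declaring intent shows up in most languages. Here is the shape of it in Python, as an analogue and not as Java's Stream API; the order data is made up, and Python does not perform the kind of library-level optimization described above.

```python
orders = [
    {"total": 40, "paid": True},
    {"total": 15, "paid": False},
    {"total": 60, "paid": True},
]

# Specifying the steps: loop, test, accumulate.
paid_total_steps = 0
for order in orders:
    if order["paid"]:
        paid_total_steps += order["total"]

# Declaring the intention: the sum of the totals of paid orders.
paid_total_declared = sum(order["total"] for order in orders if order["paid"])
```

Both produce the same number. The second leaves the "how" to the language and its libraries.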

Yet be careful with higher levels of abstraction: Horia Dragomir reminded us this can also hurt performance. This happens when the same code compiles for Android and iPhone. There’s a Japanese concept called bokeh (pronounced like bouquet) of blurring parts of an image to bring others into focus. Abstraction can do that for us, if we’re careful as the photographer.

In the closing keynote, Linda Rising reminded us, to our chagrin: people don’t make decisions based on data. We make decisions based on stories. What stories are we telling ourselves and each other? Do our processes really work? There aren’t empirical studies about our precise situation. The best we can do is to keep trying new tweaks and different methods, and find out what works for us. Like a baby putting everything in their mouth.

We can acquire more data, and choose to use this for better decisions. At Etsy every feature implementation comes with monitoring: How will you know it’s working? How will you know if it breaks? Each feature has a dashboard. And then in the post-mortems, a person can learn “how immensely hard it is to fight biases.” If we discard blame, we can reveal our mistakes, and build on each other’s experiences.

Overcome fear: experience the worst-case scenario. Keep changing our code, and ourselves: “As a person, if you can’t change, you might as well be dead.” It’s OK to be wrong, when you don’t keep being wrong.

As Horia said, “You’re there, you’re on the shoulders of giants. You need to do your own thing now. Add your own twist.”

This post is based on talks by Linda Rising, Chad Fowler, Georges Saab and Paul Sandoz, Horia Dragomir, and Daniel Schauenberg, and on conversations with Silvana Wasitova and Kevlin Henney, all at GOTO Amsterdam 2014. Some of these may be online eventually.

The Silver Pill

There is no silver bullet. What if there is a silver pill?

It is no single change that can rocket our productivity. It is a change in the rate of change.
There are two outputs of everything we write: some code, and a new version of ourselves. If we stop thinking of our product as the code, and focus also on improving ourselves with everything we write, then we increase our own productivity in all future code. Then our abilities grow with compound interest.
The other day, I asserted that our code should be concrete, because it is more clear and maintainable. Daniel Spiewak argued, abstract early! This policy has benefited him: once he has formed the abstraction, then the next time a seemingly disparate requirement comes up that he can boil down to the same abstraction, he can tell immediately and without experimentation what problems are inside. 
He was right, because what we do in ourselves is more valuable than what we do in the code. So what if that carefully abstracted code gets deleted two days later? The patterns created in the brain pay off for the rest of his life. And he can build to higher-level abstractions he’d never reach without that investment.
I’ve lived this payoff in another way: when I start a job, I’m less productive than other new developers are for 2-4 months. They want to jump right in and be productive. They’re focused on their current code output. I want to understand the system, so I ask a ton of questions and dig around in the code to find the root cause of problems. This makes me slower at first, but by 6 months in, I’m one of the most productive people on the whole team, and still improving. The code we write pays off today, but learning pays off every day for the rest of our career.

It’s the difference between building wheels, and building a machine that can make wheels. When we keep improving the builder of the machine, then production accelerates. From position to velocity to acceleration: raise the second derivative and the limit is infinity.
Trivial example: today, git merge came back with a pile of conflicts. I flipped through git documentation and asked a friend, learning about git’s concepts of file status. This cost twenty minutes today, and it makes all future dealings with merge conflicts a bit easier. Now I know that git status -s will give me a grep-friendly summary.
Daniel is right — spending time on code that never deploys to production is wasteful only if we learn nothing while writing it. The silver pill is: time spent coding is wasted if we learn nothing from it. The return value of our day is the self we become for the next day, while code is a handy side effect.

Causality: tougher than it looks, but we can take it on

We like to take a hunk of data, graph one factor against another, demonstrate correlation, and infer causality. This naive form of analysis is appealing in its simplicity, but it doesn’t cut it in the real world. With Big Data, we can identify correlation out the wazoo, but it’s time to get way more sophisticated in our causality analysis.

With data as big as we can get it today, the scientific method doesn’t work anymore. (Don’t take my word for it. Listen to Sandy Pentland.)

A correlation between two factors is judged statistically significant if there is less than a 5%, or 1%, or 0.5% probability that the results would come out this way by chance. Even at the strictest of these levels, 1 in 200 false hypotheses will show up as true out of randomness. With tremendous data, we can test effectively infinite hypotheses. Plenty of them will look significant when they are not. As Sandy puts it, you can learn that people who drive Fords on Thursdays are more likely to get the flu. The correlation exists, but it’s bullshit.
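
To put rough numbers on that, here is a small Python sketch. It assumes every hypothesis we test is actually false, and it relies on the textbook property that a false hypothesis passes a threshold of alpha with probability alpha. The function names and the count of 10,000 hypotheses are made up for illustration.

```python
import random


def expected_false_positives(num_hypotheses, alpha):
    """Average number of false hypotheses that pass the threshold by luck."""
    return num_hypotheses * alpha


def chance_of_any_false_positive(num_hypotheses, alpha):
    """Probability that at least one false hypothesis looks significant."""
    return 1 - (1 - alpha) ** num_hypotheses


def simulate_false_positives(num_hypotheses, alpha, seed=0):
    """Count lucky passes in one simulated batch of false hypotheses.

    Assumption: each false hypothesis independently passes with
    probability alpha, modeled here as a uniform random draw.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(num_hypotheses) if rng.random() < alpha)


if __name__ == "__main__":
    alpha = 0.005  # the strictest threshold mentioned above
    print(expected_false_positives(10_000, alpha))
    print(chance_of_any_false_positive(200, alpha))
    print(simulate_false_positives(10_000, alpha))
```

At the 0.5% threshold, ten thousand false hypotheses yield about fifty "significant" results on average, and with only two hundred hypotheses the odds of at least one fluke are already better than even.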
With big data, it’s time to bring the word “significant” back to its regular-people meaning. We have to look for causality. We have to look for the micropatterns that lead to better health, smoother traffic, lower energy use. No more “this happened and this happened to the same people, so they must be related!” Causality delineates the difference between truth and publishability of an academic paper.

How can we find that causality? It is complex: many influences together trigger each event, and each of these factors is triggered by many influences including each other. How are we to analyze this?

A painfully simplified example: Jay’s new web site

Manufacturing has a tool that could be useful. Quality Function Deployment, and in particular the House of Quality tool, addresses the chains and webs of causality. As Chad Fowler explained yesterday at 1DevDayDetroit, the House of Quality starts with desired product characteristics. It identifies the relative importance of each characteristic; a list of measurable factors that influence the characteristics; and which factors influence which characteristics, how much, and in what direction. Magic multiplication formulas then calculate which factors are the most important to the final product.
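
The "magic multiplication" can be sketched in a few lines of Python. This is one common way to do it, not necessarily the exact formula Chad showed: multiply each characteristic's importance by the strength of each factor's influence on it, and add up the results per factor. The web site characteristics, factors, and numbers below are invented for illustration. The sign of a strength records direction, and the ranking uses only its size.

```python
def factor_priorities(importance, influence):
    """Score each factor by summing importance * |strength| over characteristics.

    importance: characteristic -> relative importance
    influence:  characteristic -> {factor: signed strength}
    """
    scores = {}
    for characteristic, weight in importance.items():
        for factor, strength in influence.get(characteristic, {}).items():
            # abs(): a strong negative influence matters as much as a positive one
            scores[factor] = scores.get(factor, 0) + weight * abs(strength)
    return scores


# Hypothetical inputs: what we want from the site, and how much it matters.
importance = {"loads quickly": 5, "easy to navigate": 3}

# Hypothetical relationships; negative means "more of this factor hurts".
influence = {
    "loads quickly": {"page weight": -9, "number of images": -3},
    "easy to navigate": {"number of images": 3, "menu depth": -9},
}

priorities = factor_priorities(importance, influence)

if __name__ == "__main__":
    for factor, score in sorted(priorities.items(), key=lambda kv: -kv[1]):
        print(factor, score)
```

With these made-up numbers, page weight scores 45, menu depth 27, and number of images 24, so page weight is the first thing to go after.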

But don’t stop there. Take the factors and turn them into the target characteristics in the next House of Quality. Find factors that influence this new, more detailed set of characteristics. Repeat the determination of what factors influence what characteristics and how much.

The factors from Iteration 1 become the goals in Iteration 2.

Iterate until you get down to factors specific enough that they can be controlled in a production facility. Actionable, measurable steps are then apparent, along with a priority for each based on how much they influence the highest-level product characteristics. Meanwhile, you have created a little network of causalities.
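
The iteration can be sketched in Python as a loop where the scores that come out of one house go in as the importances of the next. This is a minimal sketch under my own assumptions: the scoring is a plain weighted sum, and every name and number is invented for illustration.

```python
def next_level(importance, influence):
    """One House of Quality: weighted sum of influence strengths per factor."""
    scores = {}
    for goal, weight in importance.items():
        for factor, strength in influence.get(goal, {}).items():
            scores[factor] = scores.get(factor, 0) + weight * abs(strength)
    return scores


def cascade(importance, houses):
    """Run the houses in order; each level's factors become the next level's goals."""
    levels = []
    for influence in houses:
        importance = next_level(importance, influence)
        levels.append(importance)
    return levels


# Hypothetical top-level goals for a web site.
goals = {"loads quickly": 5, "easy to navigate": 3}

# Iteration 1: goals -> measurable factors (made-up strengths).
house_1 = {
    "loads quickly": {"page weight": 9},
    "easy to navigate": {"menu depth": 9, "page weight": 1},
}

# Iteration 2: those factors -> things we can control directly.
house_2 = {
    "page weight": {"images per page": 9, "script bundle size": 3},
    "menu depth": {"top-level menu items": 9, "images per page": 1},
}

levels = cascade(goals, [house_1, house_2])

if __name__ == "__main__":
    for depth, scores in enumerate(levels, start=1):
        print("Iteration", depth, scores)
```

The last level gives a priority for each controllable factor, traced back through the chain to the original goals. Real analyses usually normalize scores between levels; this sketch skips that to keep the arithmetic visible.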

This kind of causality analysis is a lot of work. Creating this sad little example made my brain hurt. This analysis is no simple graph of heart attacks vs strawberry consumption across populations. On the upside, Big Data drastically expands our selection of measurable factors. If we can identify causality at a level this detailed, we can get a deeper level of information. We can get closer to truth.

Abstraction goes both ways

Abstraction is critical to programming. It is the core activity we use to make more interesting and complex programs.

Most of us understand abstraction as finding commonalities between different concepts, and modeling these as an inheritance hierarchy. If you are a retail store, then toy cars, trash cans, and pillows are all sellable items. Thus, abstraction is finding similarities in disparate things. Identifying patterns.

But there is another side: breaking concepts down into components. While the trash cans and toy cars are the same at the cash register, they are different when stocked on shelves. That GUI button has a view, a model, and some event triggers. Thus, abstraction is also finding differences in what appears to be the same. Breaking what appears to be an indivisible whole into disparate components. Design is the process of breaking things apart.

This second application of abstraction is important in analyzing cause and effect. We started an agile process, and then a bug made it into production. Agile is a failure! But really, did our process changes have anything to do with it? Perhaps the two events were unrelated. Or perhaps the same frustrations led to both. As humans we get overexcited about X happened, and then Y happened; therefore X led to Y. We are too quick to form patterns where none exist. Laurie won both poker nights, therefore Laurie is skilled at poker. It is challenging to break Laurie’s play into components of luck and skill – skill we can control, so it’s a much more appealing cause. Breaking a cause or effect into multiple components requires abstraction, and if we use it, we will be better at programming and at life.