Morning Stance

It is 7:09. One child is out, and I have returned to bed. Alexa will wake me at 7:15.

Six minutes: I could make my bed or do tiny morning yoga. Six minutes of rest is useless; I’ll feel worse afterward. What am I likely to do?

I picture the probability space in front of me. Intention, habit, and a better start to the day push me toward yoga. Yet there’s a boundary there, a blockage: it is my current stance.

At 7:09, if I were standing, I’d likely do yoga. But at 7:09 and horizontal, I’m gonna stay horizontal. Only a change in surrounding conditions (beep, beep, beep!) will trigger motion.

Cat Swetel talks about stances. By changing your stance, you change your inclinations.

It is 7:10. I choose to change my stance. I stand up.

I make my bed.

One deliberate change of stance, and positive habits and intentions take it from there.

Developer aesthetic: a command line

Today I typed psql to start a database session. That put me in the wrong place, so I typed \connect org_viz to get into the database I wanted.

But then I stopped myself, quit psql, and typed psql -d org_viz at the command prompt.

Why?

It smooths my work. I knew I would exit and re-enter that database session several times today, and this way pushing up-arrow to get to the last command would get me to the right command. No more “oh, right, I have to \connect” for today.

It makes my work more reproducible. As a dev, every command I type at a shell or REPL is either an experiment or an action. If it’s an experiment, I’ll do different things as fast as I can. If it’s an action, I want it to express my intention.

What I’m not doing is meandering around a toilsome path to complete some task that I know perfectly well how to do. Once known, all those steps belong in one repeatable, intention-expressing automation.

Correcting the command I typed is a tiny thing. It expresses a development aesthetic: repeatability. If I’m not exploring, I’m executing, and I execute in a repeatable fashion. I executed that tiny command to open the database I wanted. Then I re-used it a dozen times. Frustration saved, check. Developer aesthetic satisfied, check.

Don’t build systems. Build subsystems.

Always consider your design a subsystem.

Jabe Bloom

When we build software, we aren’t building it in a vacuum. We aren’t building a closed system that doesn’t interact with its environment. We aren’t building it for our own computer (unless we are; personal automation is fun). We are building it for a purpose. Chances are, we build it for a unique purpose — because why else would they pay us to do it?

Understanding that surrounding system, the “why” of our product and each feature, makes a big difference in making good design decisions within the system.

It’s like, the system we’re building is our own house. We build on a floor of infrastructure other people have created (language, runtime, dependency manager, platform), making use of materials that we find in the world (libraries, services, tools). We want to understand how those work, and how our own software works. This is all inside our house.

To do that well, keep the windows open. Look outside, ask questions of the world. What purpose is our system serving? What effects does it have, and what effects from other subsystems does it strengthen?

Whenever you’re designing something, the first step is: What is the system my system lives in? I need to understand that system to understand what my system does.

Jabe Bloom

It is a big world out there, and these are questions we can never answer completely. It’s tempting to stay indoors where it’s warm. We can’t know everything, but we gotta try for more.

Nested learning loops at Netflix

Today in a keynote at Spring One, Tom Gianos from Netflix talked about their internal data platform. He listed several components, ending with a quick mention of the “Insights Services” team, which studies how the platform is used inside Netflix. A team of people that learns about how internal teams use an internal platform to learn about whatever they’re doing. This is some higher-order learning going on.

It’s like, a bunch of teams are making shows for customers. They want to get better at that, so they need data about how the shows are being watched.

So, Netflix builds a data platform, and some teams work on that. The data platform helps the shows teams (and whatever other teams, I’m making this up) complete a feedback loop, so they can get better at making shows.

diagram: customers get shows from the show team; that interaction sends something to the data platform, which sends something to the shows team. That interaction (between the shows team and the data platform) sends something to the Insights Services team, which sends info to the data platform team.

Then the data platform teams want to make a better data platform, so an Insights Services team collects data about how the data platform itself is used. I’m betting they use the data platform for that. I also bet they talk to people on the shows teams. Then Insights Services closes that feedback loop with the data platform team, so that Netflix can get better at getting better at making shows.

Essential links in these loops include telemetry in all these platforms. The software that delivers shows to customers is emitting events. The data platform jobs are emitting events about what they’re doing and for whom.
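In code, that can be as small as one structured event per job run. A minimal TypeScript sketch with invented field names (I have no idea what Netflix’s actual event schema looks like):

```typescript
// A hypothetical telemetry event for a data platform job:
// what ran, for whom, and how it went.
interface JobEvent {
  job: string;         // which job ran
  team: string;        // who it ran for
  durationMs: number;  // how long it took
  succeeded: boolean;
}

function emitEvent(event: JobEvent): void {
  // A real platform would send this to an event bus or metrics
  // endpoint; structured logging stands in for that here.
  console.log(JSON.stringify({ ...event, at: new Date().toISOString() }));
}

emitEvent({
  job: "nightly-viewing-aggregation",
  team: "shows",
  durationMs: 5230,
  succeeded: true,
});
```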

When a human does a job, reporting what they’re doing is extra work for them. (Usually flight attendants write drink orders on paper, or keep them in memory. The other day I saw them entering orders into iPads. Guess which was faster.) In any human system, gathering information costs money, time, and customer service. In a software system, it’s a little extra network traffic. Woo.

Software systems give us the ability to study them. To really find out what’s going on, what’s working, and what isn’t. The Insights Services team, as part of the data platform organization, can form hypotheses and then test them, adding telemetry as needed. As a team with internal customers, they can talk to the humans to find out what they’re missing. They can get both the data they think they need, and a glimpse into everything else.

Software organizations are a beautiful opportunity for learning about systems. We can do science here: a kind of science where we don’t try to find universal laws, and instead try to find the forces at work in our local situation, learn them and then sometimes change them.

When we get better at getting better — wow. That adds up to some serious acceleration over time. With learning loops about learning loops, Netflix has impressive and growing advantages over competitors.

Don’t just keep trying; report your limits

The other day we had problems with a service dying. It ran out of memory, crashing and failing to respond to all open requests. That service was running analyses of repositories, digging through their files to report on the condition of the code.

It ran out of memory trying to analyze a particular large repository with hundreds of projects within it. This is a monorepo, and Atomist is built to help with microservices — code scattered across many repositories.

This particular day, Rod Johnson and I paired on this problem, and we found a solution that neither of us would have found alone. His instinct was to work on the program, tweaking the way it caches data, until it could handle this particular repository. My reaction was to widen the problem: we’re never going to handle every repository, so how do we fail more gracefully?

The infrastructure of the software delivery machine (the program that runs the analyses) can limit the number of concurrent analyses, but it can’t know how big a particular one will be.

However, the particular analysis can get an idea of how big it is going to be. In this case, one Aspect finds interior projects within the repository under scrutiny. My idea was: make that one special, run it first, and if there are too many projects, decline to do the analysis.

Rod, as a master of abstraction, saw a cleaner way to do it. He added a veto functionality, so that any Aspect can declare itself smart enough to know whether analysis should continue. We could add one that looks at the total number of files, or the size of the files.

We added a step to the analysis that runs these Vetoing Aspects first. We made them return not only “please stop,” but a reason for that stop. Then we put that into the returned analysis.
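In code, the shape might look something like this. This is a minimal TypeScript sketch with invented names and thresholds, not the real Atomist API:

```typescript
// Hypothetical sketch of the veto idea. Names and numbers are invented.
interface Veto {
  reason: string; // why analysis should not continue
}

interface ProjectInfo {
  paths: string[]; // interior projects found in the repository
}

interface Aspect {
  name: string;
  // An Aspect may veto: return a reason to stop, or undefined to proceed.
  veto?(repo: ProjectInfo): Veto | undefined;
}

const MAX_PROJECTS = 50; // an assumed limit

const projectCountVeto: Aspect = {
  name: "project-count",
  veto(repo) {
    if (repo.paths.length > MAX_PROJECTS) {
      return {
        reason: `There are too many projects inside this repository (${repo.paths.length}): ${repo.paths.join(", ")}`,
      };
    }
    return undefined;
  },
};

// Run the vetoing Aspects first; a veto becomes the (short) returned analysis.
function analyze(repo: ProjectInfo, aspects: Aspect[]): { vetoed?: Veto } {
  for (const aspect of aspects) {
    const veto = aspect.veto?.(repo);
    if (veto) {
      return { vetoed: veto };
    }
  }
  return {}; // ...the full analysis would continue from here
}
```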

The result is: for too-large repositories, we can give back a shorter analysis that communicates: “There are too many projects inside this repository, and here is the list of them.” That’s the only information you get, but at least you know why that’s all you got.

And nothing else dies. The service doesn’t crash.

When a program identifies a case it can’t handle and stops, then it doesn’t take out a bunch of innocent-bystander requests. It gives a useful message to humans, who can then make the work easier, or optimize the program until it can handle this case, or add a way to override the precaution. This is a collaborative automation.

When you can’t solve a problem completely, step back and ask instead: can I know when to stop? “FYI, I can’t do this because…” is more useful than OutOfMemoryError.

Stick with “good enough,” until it isn’t

In business, we want to focus on our core domain, and let everything else be “good enough.” We need accounting, payroll, travel. But we don’t need those to be special if our core business is software for hospitals.

As developers, we want to focus on changing our software, because that is our core work. We want other stuff, such as video conferencing, email, and blog platforms to be “good enough.” It should just work, and get out of our way.

The thing is: “good enough” doesn’t stay good enough. Who wants to use Concur for booking travel? No one. It’s incredibly painful and way behind the modern web applications that we use for personal travel. Forcing your people into an outdated travel booking system holds them back and makes recruiting a little harder.

When we rent software as a service, it can keep improving. I shuddered the last time I got invited to a WebEx, but it’s better than it used to be. WebEx is not as slick as Zoom, but it was fine.

There is a lot of value in continuing with the same product that your other systems and people integrate with, and having it improve underneath you. Switching is expensive, especially in the focus it takes. But it beats keeping the anachronism.

DevOps says, “If it hurts, do it more.” This drives you to improve processes that are no longer good enough. Now and then you can turn a drag into a competitive advantage. Now and then, like with deployment, you find out that what you thought was your core business (writing code) is not core after all. (Operating useful software is.)

Limiting what you focus on is important. Let everything else be “good enough,” but check it every once in a while to make sure it still is. Ask the new employee, “What around here seems out of date compared to other places you’ve worked?” Or try a full week of mob programming, and notice when it gets embarrassing to have six people in the same drudgery.

You might learn something important.

Library vs service: who controls change?

When you have a common piece of functionality to share between two apps, do you make a library for them to share, or break out a service?

The biggest difference between publishing a library and operating a service is: who controls the pace of change.

If you publish a library for people to use, you can put out new versions, but it is up to each application’s team to incorporate that new version. Upgrades are gradual and may never fully happen.

If you operate a service, you control when upgrades happen. You put the new code in production, and poof, everyone is using it. You can upgrade multiple times per day, and you control when each is complete.

If you need certain logic to be consistent between applications, consider making a service. Control the pace of change.
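The difference shows up right in the consuming code. A sketch of both sides, using a hypothetical pricing example (the package name, endpoint, and response shape are all invented):

```typescript
// Library: each application pins a version in its manifest, e.g.
//   "dependencies": { "pricing-rules": "2.3.1" }
// and that team decides when (or whether) to upgrade.

// Service: every caller gets whatever is in production right now.
// The service team controls when each upgrade is complete.
async function priceFromService(itemId: string): Promise<number> {
  // Hypothetical internal endpoint.
  const res = await fetch(`https://pricing.internal.example/items/${itemId}`);
  const body = await res.json();
  return body.price as number;
}
```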

(For one flipside, see: From Complicated to Complex)

From complicated to complex

Avdi Grimm describes how the book Vehicles illustrates how simple parts can compose a very complex system.

Another example is Conway’s Game of Life: a nice organized grid, uniform time steps, and four tiny rules. These simple pieces combine to make all kinds of wild patterns, and even look alive!
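Those rules are small enough to fit in one function. A sketch of one cell’s next state, in TypeScript:

```typescript
// The four rules of Conway's Game of Life, for one cell.
function nextState(alive: boolean, liveNeighbors: number): boolean {
  if (alive && liveNeighbors < 2) return false;                           // underpopulation
  if (alive && (liveNeighbors === 2 || liveNeighbors === 3)) return true; // survival
  if (alive && liveNeighbors > 3) return false;                           // overpopulation
  return !alive && liveNeighbors === 3;                                   // reproduction
}
```

That’s the whole ruleset; everything wild grows out of applying it to every cell on every tick.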

These systems are complex: hard to predict; forces interact to produce surprising behavior.

They are not complicated: they do not have a bunch of different parts, nor a lot of little details.

In software, a monolith is complicated. It has many interconnected parts, each full of details. Conditionals abound, forming intricate rules that may surprise us when we read them.

But we can read them.

As Avdi points out, microservices are different. Break that monolith into tiny pieces, and each piece can be simple. Put those pieces together in a distributed system, and … crap. Distributed systems are inherently complex. Interactions can’t be predicted. Surprising behaviors occur.

Complicated programs can be understood; it’s just a lot to hold in your head. Build a good enough model of the relevant pieces, and you can reason out the consequences of change.

Complex systems can’t be fully understood. We can notice patterns, but we can’t guarantee what will happen when we change them.

Be careful in your aim for simplicity: don’t trade a little complication for a lot of complexity.

A lot of rules

A Swede, on American Football: “Are there any rules? It looks like they stand in two lines, someone throws the ball backwards, and then it’s a big pile.”

Me: “141 pages, last I checked. It takes a lot of rules to look like there aren’t any.”

Later, people talked about error messages. There are so many different ones! … or there should be. Actually they all fall back to “Something went wrong” when the server responds with non-success.

Was it a transient error? Maybe the client should retry, or ask the person to try again later. Was it a partial failure? Maybe the client should refresh its data. Was it invalid input? Please, please tell the person what they can do about it. In all supported languages.
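A sketch of the distinction, with invented failure categories; the point is that each class of failure deserves its own response, so “Something went wrong” becomes the last resort instead of the only answer:

```typescript
type FailureKind = "transient" | "partial" | "invalid-input" | "unknown";

// Hypothetical client-side handling: each failure class gets its own advice.
function adviseUser(kind: FailureKind, detail?: string): string {
  switch (kind) {
    case "transient":
      return "That didn't go through. Please try again in a moment.";
    case "partial":
      return "Some of this data may be stale. Refreshing now.";
    case "invalid-input":
      return `Please check your input: ${detail ?? "see the highlighted field."}`;
    default:
      return "Something went wrong."; // the fallback, now a last resort
  }
}
```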

The happy path may be the most frequent path, but it is one of thousands through your application. As software gets more complicated, the happy path becomes unlikely. (With 99.9% success on each of a thousand different calls, the all-happy path occurs only 0.999^1000 ≈ 37% of the time, even if the person gets everything right.) What makes an app smooth isn’t lack of failure, it’s accommodation of these alternate paths. Especially when people are being humans and don’t follow your instructions.

Each error is special. Each chance to smooth its handling is precious. People rarely report confusion, so jump on it when they do.

This alternate-path code gets gnarly. You may have 141 pages of it, and growing every year.

It takes a lot of rules to make an app “just work.”

In defense of rationality and dynamic programming

Karl Popper defines rationality as: basing beliefs on logical arguments and evidence. Irrationality is everything else.

He also defines comprehensive rationality as: only logical arguments and evidence are a valid basis for belief. But this belief itself can only be accepted by choice or faith, so comprehensive rationality is self-contradictory. It also excludes a lot of useful knowledge. This, he says, is worse than irrationality.

It reminds me of some arguments for typed functional programming. We must have proof our programs are correct! We must make incorrect programs impossible to represent in compiling code! But this excludes a lot of useful programs. And do we even know what ‘correct’ is? Not in UI development, that’s for sure.

Pure functional programming eschews side effects (like printing output or writing data). Yet it is these side effects that make programs useful, that let them impact the world. Therefore, exclusively pure functional programming is worse than irrationality (all dynamic, side-effecting programming).

Popper argues instead for critical rationality: Start with tentative faith in premises, and then apply logical argument and evidence to see if they hold up. Consider new possible tenets, and see whether they hold up better. This kind of rationality accepts that there is no absolute truth, no perfect knowledge, only better. Knowledge evolves through critical argument. We can’t get to Truth, but we can free ourselves from some falsehoods.

This works in programming too. Sometimes we do know what ‘correct’ is. In those cases, it’s extremely valuable to have code that you know is solid. You know what it does, and even better, you know what it won’t do. Local guarantees of no mutation and no side effects are beautiful. Instead of ‘we don’t mutate parameters around here, so I’ll be surprised if that happens’ we get ‘it is impossible for that to happen so I put it out of my mind.’ This is what functional programmers mean when they talk about reasoning about code. There is no ‘probably’ about it.
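A small taste of that guarantee in TypeScript (ReadonlyArray is real; the function is an invented example):

```typescript
// With ReadonlyArray, mutating the argument is a compile error,
// not a convention. Callers can put the possibility out of mind.
function total(amounts: ReadonlyArray<number>): number {
  // amounts.push(0); // does not compile: push does not exist on ReadonlyArray
  return amounts.reduce((sum, n) => sum + n, 0);
}
```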

Where we can use logical arguments and evidence, let’s. For everything else, there’s JavaScript.

Everything is a big circle. Inside are some blobs of nonsense and a nice rectangle of purely functional programs. Some of the area between FP and nonsense is useful.