How can we develop and operate increasingly useful software?

Most software gets harder to change as it ages. When we make modern applications, it is not enough to write a system and put it out there. We need continual improvement and adaptation to a growing world.

How can we develop and operate increasingly useful software?

To answer this, I need a mental model of how software teams work.

My model is symmathesy: a learning system of learning parts. A great software team is made of people, software, and tools. The people are always learning, the software is changing, the tools are improving.

With this view, let’s look at the question.

How can we (the people, tools, and software in a symmathesy) develop and operate (for learning, these must happen together) increasingly (the ongoing quest) useful (the purpose; to learn, we need a connection to its use) software (the output of our system to the wider world)?

People are products of our interactions, and the future matters, so we need healthy growth. Code is a product of our interactions with it, so the route we take, the process, matters too.

The external output of the team is: useful software, running in production. To make it increasingly useful, grow the symmathesy. Growth in a system is defined as an increase in flow. In a symmathesy, that means growth is an increase in learning.

The people learn about the software by operating it, mediated by tools. Then we teach the software and tools by developing. Unless development & operations are in the same team, learning is blocked.

To increase usefulness, the team needs to learn about use. We can get input from outside, and study events emitted by the running software, mediated by tools.

What we know about usefulness leads the team to design change, and move the system to the next version of itself. That next version may be more useful to the outside, or it may be better at learning and teaching, the better to increase knowledge inside the team.

Someone today told me, “All great things in life come from compounding interest” — from feedback loops. (Compound interest is the simplest familiar example of a feedback loop.)
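Compound interest makes the shape of a feedback loop easy to see. Here is a minimal sketch with arbitrary illustrative numbers (a 10% rate over 20 periods, not measurements of anything): linear growth adds a fixed gain each period, while compounding adds a gain proportional to what has already accumulated, so each period's output becomes the next period's input.

```python
# Toy comparison of linear and compounding growth.
# The starting value, rate, and number of periods are arbitrary.

def linear_growth(start: float, gain: float, periods: int) -> float:
    """Add a fixed amount each period: no feedback loop."""
    return start + gain * periods


def compound_growth(start: float, rate: float, periods: int) -> float:
    """Each period's gain is proportional to what we already have:
    the output feeds back into the next input."""
    value = start
    for _ in range(periods):
        value += value * rate
    return value


start, rate, periods = 100.0, 0.10, 20
print(f"linear:   {linear_growth(start, start * rate, periods):.1f}")
print(f"compound: {compound_growth(start, rate, periods):.1f}")
```

Both curves start with the same first-period gain; the compounding one pulls away because its gains are fed back in, and the gap keeps widening the longer the loop runs.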

In a great software team, that “compound interest” is our relevant knowledge. Of the software, of its environment, of the people and systems who use it.

Maximize learning to increase the usefulness of software, at an accelerating pace.

Basements and galleries

In a big new project, where do you start?

My dad bought this old building, and he’s turning it into an antique mall. It’s a huge project: two floors of shopping plus a gallery and museum at the top. Yesterday he showed me what he’s done so far.

In the basement, he added support beams and repaired the sprinkler system. They painted all the water pipes red, air blue, and electric gray. Paint is stored where it won’t freeze. The elevator machinery is oiled and accessible.

On the third floor, where art and artifacts will draw people through the store, he has crafted walls of sunrise, sunset, and nightfall from wood siding in dozens of different stains.

He started with infrastructure and with beauty. The underpinnings of the whole store, made observable with color coded pipes. And the unique features that will make this store different from any other.

In a big project, there’s not one place to start; usually we need to make little circles. Foundations, surface, various points in between.

Emphasize the parts that make the work smoother: a clean workspace, visibility into what’s happening, fire prevention. And the parts that make the work fulfilling: the beauty, so we remember why we’re doing this project at all.

Increase capacity, move slower

If the highways are crowded, and they build more lanes, the highways get more crowded.

If development is slow, and you add resources, development gets slower.

Adding people to a project increases the capacity for activity. Activity doesn’t translate to outcomes.

In these cases, you’re adding capacity to the system for cars, or for work, but those aren’t what make the system run faster. Instead, adding capacity for traffic or for activity leads the system to change in ways that generate more traffic or activity, which gets in the way of flow.

(lots more examples in this article)

What you want instead is to make flow easier. Add trains, intersperse commerce and residential. Add continuous delivery, add support structures to make progress easier. Don’t add more capacity for work! Doing work isn’t the point! Make the path shorter, instead.

Game science, programming science

In Pokémon Go last summer, a new feature popped up: bad guys. Some Pokéstops turn dark, Team Rocket appears there, and they want to fight me. And they always win, dammit!

[Image: the Team Rocket Grunt gloats over his victory.]

This much I can discover within the game. If I want to know more — like how to win the fights — I turn to the internet.

People on the internet have cataloged all the Pokémon. Each has a type, and each type is weak to attacks of certain other types. They’ve also noticed that when the Team Rocket Grunt boasts, they reveal something about the type of Pokémon they have. After they boast, I get to choose which of my Pokémon to fight their Pokémon with.

People observed these properties of the in-game world, and they reasoned about which Pokémon will be effective against which Team Rocket Grunts, and they’ve tested these guesses in-game, and they’ve posted the results online. I can use their analysis to increase my effectiveness in the world of Pokémon Go.
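The reasoning step can be sketched as a lookup: an observation (the grunt's type) plus a known table yields a hypothesis (which of my Pokémon to send in), and the battle is the test. This is a minimal sketch assuming a hand-picked three-type subset of the matchups (water beats fire, fire beats grass, grass beats water); the real chart has many more types, and the game applies its own damage multipliers, which this ignores. The team below is hypothetical.

```python
# Illustrative subset of type matchups: each attacking type maps to the
# types it is super effective against. NOT the full Pokémon Go chart,
# and damage multipliers are deliberately left out.
SUPER_EFFECTIVE_AGAINST = {
    "water": {"fire"},
    "fire": {"grass"},
    "grass": {"water"},
}


def counters(enemy_type: str, my_team: dict[str, str]) -> list[str]:
    """Return the names of my Pokémon whose type is super effective
    against the enemy's type: the hypothesis to test in battle."""
    return [
        name
        for name, my_type in my_team.items()
        if enemy_type in SUPER_EFFECTIVE_AGAINST.get(my_type, set())
    ]


# Hypothetical team; suppose the grunt's boast hints at a fire type.
team = {"Squirtle": "water", "Bulbasaur": "grass", "Charmander": "fire"}
print(counters("fire", team))  # ['Squirtle']
```

If the battle goes badly, that is evidence the table (or the reading of the boast) is wrong, which is exactly what the community's posted results correct over time.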

People take it even farther: they’ve deduced the hidden talent levels of individual Pokémon based on observable properties, and created calculators so that you can evaluate your Pokémon. There is systematic testing and measurement here, and logical deduction, and math.

This is science. People do science to describe a Nature that exists inside this game world. And then they publish their results.

It’s a different culture from the Science we learn in school, that institution/body of knowledge/discipline/people that studies Nature in the physical world. But it is science: it is people using observation, causal hypotheses, tests, and discussion to increase a shared body of knowledge about how a world works.

In software development, StackOverflow works this way too. Questions and answers on StackOverflow are scientific publications, sharing observations and knowledge and recommendations about a Nature that exists inside a particular programming language, tool, or library.

Every piece of software constructs its own little reality. Collaborating online, we study them together. We do science.

Work with the business, not for it

Scientists should be on tap but not on top.

Winston Churchill

In the Cold War, political and technical considerations were no longer separable. The President got a Science Advisory Committee, but “apparently… scientists must not concern themselves with devising and proposing policies; they ought to limit themselves to answering such technical questions as they may be asked.” (Leo Szilard, physicist)

Yikes! Sounds like old style software development, where the programmers receive the requirements from the business.

I think we’ve learned better than that. Many of the most successful companies are led by technical people. We need the business experts and software developers working together. The business doesn’t know all the questions developers can answer, and devs don’t know what questions to ask the business — until we start implementing. Then, necessary questions rise to the surface, and lead to discussions which include more useful questions.

If developers are “on tap” as a resource, we can’t create anything better than you can specify (and believe me, that list of requirements is no specification). Our collective imagination is better than either alone.

Asking useful questions is the hard part. Collaborate on it.

Layers in software: from data to value

Then

Back in the 2000s, we wrote applications in layers.

Presentation layer, client, data transfer, API, business logic, data access, database. We maintained strict separation between these layers, even though every new feature changed all of them. Teams organized around these layers. Front end, back end, DBAs.

[Diagram: each layer of software (frontend stuff, backend stuff, database) is a wide box next to its team, stacked on top of each other. Customers sit at the top. Value flows from them to the database and back, crossing all the layers: business value exists only by flowing through every one of them.]

Layers crisscrossed the flow of data.

Responsibility for any one thing to work fell across many teams.

Interfaces between teams updated with every application change.

Development was slow and painful.

Now

In 2019, we write applications in layers.

A business unit is supported by a feature team. Feature teams are supported by platforms, tooling, UI components. All teams are supported by software as a service from outside the company.

[Diagram: feature teams sit at the top, their software multicolored with multiple components. Under them are platform and component teams, each different, and under those, neat square boxes of external services. Business value flows through the top layer (feature teams), staying close to the business people; developer value flows from the feature teams through the internal teams to external services and back.]

Back in the day, front end, back end, operations, and DBAs separated because they needed different skills. Now we accept that a software team needs all the skills. We group by responsibility instead — responsibility for business value, not for activities.

Supporting teams provide pieces in which consistency is essential: UI components and some internal libraries.

Interfaces between teams change less frequently than the software changes.

Layers crisscross the flow of value.

DevEx

From the old perspective, feature teams need to do everything. That’s too hard for one team, so we make it easier.

This is where Developer Experience (DevEx) teams come in. (a.k.a. Developer Productivity, Platform and Tools, or inaccurately DevOps Teams.) These undergird the feature teams, making their work smoother. Self-service infrastructure, smooth setup of visibility and control for production software. Tools and expertise to help developers learn and do everything necessary to fulfill each team’s purpose.

Internal services are supported by external services. Managed services like Kubernetes, databases, queueing, observability, logging: we have outsourced the deep expertise of operating these components. Meanwhile, internal service teams like DevEx have enough understanding of the details, plus enough company-specific context, to mediate between what the outside world provides and what feature teams need.

This makes development smoother, and therefore faster and safer.

We once layered by serving data to software. Now we layer by serving value to people.

Morning Stance

It is 7:09. One child is out, and I have returned to bed. Alexa will wake me at 7:15.

Six minutes: I could make my bed or do tiny morning yoga. Six minutes of rest is useless; I’ll feel worse afterward. What am I likely to do?

I picture the probability space in front of me. Intention, habit, and a better start to the day push me toward yoga. Yet there’s a boundary there, a blockage: it is my current stance.

At 7:09, if I were standing, I’d likely do yoga. But at 7:09 and horizontal, I’m gonna stay horizontal. Only a change in surrounding conditions (beep, beep, beep!) will trigger motion.

Cat Swetel talks about stances. By changing your stance, you change your inclinations.

It is 7:10. I choose to change my stance. I stand up.

I make my bed.

One deliberate change of stance, and positive habits and intentions take it from there.

Developer aesthetic: a command line

Today I typed psql to start a database session. That put me in the wrong place, so I typed \connect org_viz to get into the database I wanted.

But then I stopped myself, quit psql, and typed psql -d org_viz at the command prompt.

Why?

It smooths my work. I knew I would exit and re-enter that database session several times today, and this way, pressing up-arrow to recall the last command would bring back the right one. No more “oh, right, I have to \connect” for today.

It makes my work more reproducible. As a dev, every command I type at a shell or REPL is either an experiment or an action. If it’s an experiment, I’ll do different things as fast as I can. If it’s an action, I want it to express my intention.

What I’m not doing is meandering around a toilsome path to complete some task that I know perfectly well how to do. Once known, all those steps belong in one repeatable, intention-expressing automation.
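One way to capture a known task as a single intention-expressing action is to give it a name. A minimal sketch, assuming Python as the automation language: the database name org_viz and the psql -d flag come from the text above, while the helper names and the dry-run switch are hypothetical, added so the example runs without psql installed.

```python
import shlex
import subprocess


# Hypothetical helpers; only "psql -d <dbname>" comes from the text.
def psql_command(dbname: str) -> list[str]:
    """Build the one command that opens psql directly in the right
    database, instead of starting psql and then typing \\connect."""
    return ["psql", "-d", dbname]


def open_db_session(dbname: str, dry_run: bool = False) -> list[str]:
    """Open the session, or with dry_run=True just print the command."""
    cmd = psql_command(dbname)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


open_db_session("org_viz", dry_run=True)  # prints: psql -d org_viz
```

For a one-line command, shell history already does this job; a named helper like this earns its keep once the known steps grow past one command.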

Correcting the command I typed is a tiny thing. It expresses a development aesthetic: repeatability. If I’m not exploring, I’m executing, and I execute in a repeatable fashion. I executed that tiny command to open the database I wanted. Then I re-used it a dozen times. Frustration saved, check. Developer aesthetic satisfied, check.

Don’t build systems. Build subsystems.

Always consider your design a subsystem.

Jabe Bloom

When we build software, we aren’t building it in nowhere. We aren’t building a closed system that doesn’t interact with its environment. We aren’t building it for our own computer (unless we are; personal automation is fun). We are building it for a purpose. Chances are, we build it for a unique purpose — because why else would they pay us to do it?

Understanding that surrounding system, the “why” of our product and each feature, makes a big difference in making good design decisions within the system.

It’s like, the system we’re building is our own house. We build on a floor of infrastructure other people have created (language, runtime, dependency manager, platform), making use of materials that we find in the world (libraries, services, tools). We want to understand how those work, and how our own software works. This is all inside our house.

To do that well, keep the windows open. Look outside, ask questions of the world. What purpose is our system serving? What effects does it have, and what effects from other subsystems does it strengthen?

Whenever you’re designing something, the first step is: What is the system my system lives in? I need to understand that system to understand what my system does.

Jabe Bloom

It is a big world out there, and these are questions we can never answer completely. It’s tempting to stay indoors where it’s warm. We can’t know everything, but we gotta try for more.

Nested learning loops at Netflix

Today, in a keynote at SpringOne, Tom Gianos from Netflix talked about their internal data platform. He listed several components, ending with a quick mention of the “Insights Services” team, which studies how the platform is used inside Netflix. That’s a team of people that learns how internal teams use an internal platform to learn about whatever they’re doing. This is some higher-order learning going on.

It’s like, a bunch of teams are making shows for customers. They want to get better at that, so they need data about how the shows are being watched.

So, Netflix builds a data platform, and some teams work on that. The data platform helps the shows teams (and whatever other teams, I’m making this up) complete a feedback loop, so they can get better at making shows.

[Diagram: customers get shows from the shows team; that interaction sends something to the data platform, which sends something back to the shows team. The interaction between the shows team and the data platform sends something to the Insights Services team, which sends information to the data platform team.]

Then the data platform teams want to make a better data platform, so an Insights Services team collects data about how the data platform itself is used. I’m betting they use the data platform for that. I also bet they talk to people on the shows teams. Then Insights Services closes that feedback loop with the data platform team, so that Netflix can get better at getting better at making shows.

Essential links in these loops include telemetry in all these platforms. The software that delivers shows to customers is emitting events. The data platform jobs are emitting events about what they’re doing and for whom.
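Telemetry at each level is what lets the next loop up learn. A minimal sketch of that idea, with everything hypothetical: the event fields, job names, and team names are invented for illustration and are not Netflix’s design. Data platform jobs emit structured events saying what they did and for whom; an insights-style analysis then counts usage per requesting team from those same events.

```python
import json
import time
from collections import Counter


# Hypothetical event shape: "job" and "requested_by" are assumptions.
def emit(event: dict, sink: list[str]) -> None:
    """Record one structured event. A real system would send this over
    the network to a telemetry pipeline; here we append JSON to a list."""
    sink.append(json.dumps({"timestamp": time.time(), **event}))


def usage_by_team(sink: list[str]) -> Counter:
    """The second-order loop: learn which teams use the data platform,
    using the platform's own events."""
    return Counter(json.loads(line)["requested_by"] for line in sink)


platform_events: list[str] = []
emit({"job": "viewing_report", "requested_by": "shows-team-a"}, platform_events)
emit({"job": "viewing_report", "requested_by": "shows-team-a"}, platform_events)
emit({"job": "retention_model", "requested_by": "shows-team-b"}, platform_events)

print(usage_by_team(platform_events).most_common())
# [('shows-team-a', 2), ('shows-team-b', 1)]
```

The cost of each observation is one small record, which is the contrast with human reporting drawn in the next paragraph.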

When a human does a job, reporting what they’re doing is extra work for them. (Usually flight attendants write drink orders on paper, or keep them in memory. The other day I saw them entering orders into iPads. Guess which was faster.) In any human system, gathering information costs money, time, and customer service. In a software system, it’s a little extra network traffic. Woo.

Software systems give us the ability to study them. To really find out what’s going on, what was working, and what wasn’t. The Insights Services team, as part of the data platform organization, can form hypotheses and then test them, adding telemetry as needed. As a team with internal customers, they can talk to the humans to find out what they’re missing. They can get both the data they think they need, and a glimpse into everything else.

Software organizations are a beautiful opportunity for learning about systems. We can do science here: a kind of science where we don’t try to find universal laws, and instead try to find the forces at work in our local situation, learn them and then sometimes change them.

When we get better at getting better — wow. That adds up to some serious acceleration over time. With learning loops about learning loops, Netflix has impressive and growing advantages over competitors.
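A minimal sketch of why getting better at getting better accelerates, under deliberately toy assumptions: capability grows by some rate each period, and in the nested case the rate itself also improves each period. The numbers are arbitrary; this shows the shape of the curve, not anything measured at Netflix.

```python
# Toy model of nested learning loops. All numbers are arbitrary.

def first_order(capability: float, rate: float, periods: int) -> float:
    """One learning loop: capability compounds at a fixed rate."""
    for _ in range(periods):
        capability *= 1 + rate
    return capability


def second_order(capability: float, rate: float,
                 rate_gain: float, periods: int) -> float:
    """Nested loops: capability compounds, and the rate of improvement
    itself compounds too (getting better at getting better)."""
    for _ in range(periods):
        capability *= 1 + rate
        rate *= 1 + rate_gain
    return capability


print(round(first_order(1.0, 0.05, 20), 2))
print(round(second_order(1.0, 0.05, 0.05, 20), 2))
```

With rate_gain set to zero the two functions agree exactly; any positive rate_gain makes the nested version pull ahead, and by a wider margin the longer it runs.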