Step 1: On one team, put the people with the knowledge and control necessary to change the software, see the results, change it, see the results.
Step 2: Use automation to take extrinsic cognitive load off this team, so that it needs fewer people.
That’s it, that’s DevOps.
Step 1 describes the cultural change that leads to flow. Delivering change requires no handoffs or approvals from outside the team; the impact of change flows back to the team. Act and learn.
Step 2 is where tools come in. If all you do is improve your tooling, well, that helps a little, but it doesn’t get you the qualitative change in flow. That comes from Step 1. The serious value of automation is that it enables Step 1, a single team with all the relevant knowledge.
Our job as developers is making decisions. DevOps gets us the knowledge we need to make good decisions, the authority to implement them, and the feedback to make better ones in the future.
The other day, I asked my twelve-year-old daughter for recommendations for drawing programs. She told me about one (FireAlpaca?): “It’s free, and it updates pretty often.” She contrasted that with one that cost money “but isn’t as good. It never updates.”
The next generation appreciates that good software is updated regularly. Anything that doesn’t update is eventually terrible.
Software that doesn’t change falls behind. People’s standards rise, their needs change. At best, old software looks dumb. At worst, it doesn’t run on modern devices.
Software that doesn’t change is dead. You might say, if it still runs, it is not dead. Oh, sure, it’s moving around — but it’s a zombie. If it isn’t learning, it’ll eventually fall over, and it might eat your face.
I want to use software that’s alive. And when I make software, I want it to stay alive as long as it’s in use. I want it to be “done” when it’s out of production.
Software is like people. The only “done” is death.
Alive software belongs to a team.
What’s the alternative? Keep learning to keep living. Software needs to keep improving, at least in small ways, for as long as it is running.
We have to be able to change it, easily. If Customer Service says, “Hey, this text is unclear, can you change it to this?” then pushing that out should be as easy as updating some text. It should not be harder than when the software was in constant iteration.
This requires automated delivery, of course. And you have to know that delivery works. So you have to have run it recently.
But it takes more than that. Someone has to know — or find out quickly — where that text lives. They have to know how to trigger the deployment and how to check whether it worked.
More than that, someone has to know what that text means. A developer needs to understand that application. Probably, this is a developer who was part of its implementation, or the last major set of changes.
For the software to be alive, it has to be alive in someone’s head.
And one head will never do; the unit of delivery is the team. That’s more resilient.
Alive software is owned and cared for by an active team. Some people keep learning, keep teaching the software, and the shared sociotechnical system keeps living. The team and software form a symmathesy.
How do we keep all our software alive, while still growing more?
Okay, but what if the software is good enough right now? How do we keep it alive when there’s no big initiative to change it?
Hmm. We can ask, what kind of code is easy to change?
Code needs to be clean and modern.
Well, it’s consistent. It is up-to-date with the language versions and frameworks and libraries that we currently use for development.
It is “readable” by our current selves. It uses familiar styles and idioms.
What you don’t want is to come at the “simple” (from an outside perspective) task of updating some text, only to find you need to install a bunch of old tools; oh wait, there are security patches that need to happen before anything will pass pre-deployment checks; oh, and now more libraries have to be upgraded to modern versions before it all works again. You don’t want to have to resuscitate the software before you can breathe new life into it.
If changing the software isn’t easy enough, we won’t do it. And then it gets terrible.
So all those tool upgrades, security patches, library updates gotta have been done already, in the regular course of business.
Keeping those up to date gives us an excuse to change the code, trigger a release, and then notice any problems in the deployment pipeline. We keep confidence that we can deploy it, because we deploy it every week whether we need to or not.
People need to be on stable teams with customer contact.
More interesting than properties of the code: what are some properties of people who can keep code alive?
The team is stable. There’s continuity of knowledge.
The team understands the reason the software exists. The business significance of that text and everything else.
And we still care. We have contact with people who use this software, so we can check in on whether this text change works for them. We continue to learn.
Code belongs to one team.
More interesting still: what kind of relationship does the alive-keeping team have with the still-alive code?
Ownership. The code is under the care of a single team.
Good communication. We can teach the code (by changing it), so we have good deployment automation and we understand the programming language, etc. And the code can teach us — it has good tests, so we know when we broke something. It is accountable to us, in the sense that it can tell us the story of what happens. This means observability. With this, we can learn (or re-learn) how it works while it’s running. Keep the learning moving, keep the system living.
The team is a learning system, within a learning system.
Finally: what kind of environment can hold such a relationship?
It’s connected; the teams are in touch with the people who use software, or with customer support. The culture accepts continued iteration as good, it doesn’t fear change. Learning flows into and out of the symmathesy.
It supports learning. Software is funded as a profit center, as operational costs, not as capital expenditure, where a project is “done” and gets depreciated over years. How the accounting works around development teams is a good indication of whether a company is powered by software, or subject to software.
Then there’s the tricky one: the team doesn’t have too much else on their plate.
How do we keep adding code to our responsibilities?
The team that owns this code also owns other code. We don’t want to update libraries all day across various systems we’ve written before. We want to do new work.
It’s like a garden; we want to keep the flowers we planted years ago healthy, and we also want to plant new flowers. How do we increase the number of plants we can care for?
And, at a higher level — how can we, as people who think about DevOps, make every team in our organization able to keep code alive?
Teams are limited by cognitive load.
This is not: how do we increase the amount of work that we do. If all we did was type the same stuff all the time, we know what to do — we automate it.
Our work is not typing; it’s making decisions. Our limitation is not what we can do, it is what we can know.
In Team Topologies, Manuel Pais and Matthew Skelton emphasize: the unit of delivery is the team, and the limitation of a team is cognitive load.
We have to know what that software is about, and what the next software we’re working on is about, and the programming languages they’re in, and how to deploy them, and how to fill out our timesheets and which kitchen has the best bubbly water selection, and who just had a baby, and — it takes a lot of knowledge to do our work well.
Team Topologies lists three categories of cognitive load.
The germane cognitive load, we want that.
Germane cognitive load is the business domain. It is why our software exists. We want complexity here, because the more complex work our software does, the less the people who use it have to bother with. Maximize the percentage of our cognitive load taken up by this category.
So which software systems a team owns matters; group by business domain.
Intrinsic cognitive load increases if we let code get out of date.
Intrinsic cognitive load is essential to the task. This is our programming language and frameworks and libraries. It is the quirks of the systems we integrate with. How to write a healthy database query. How the runtime works: browser behavior, or sometimes the garbage collector.
The fewer languages we have to know, the better. I used to be all about “the best language for the problem.” Now I recommend “the language your team knows best, as long as it’s good enough.”
And “fewer” includes versions of the language, so again, consistency in the code matters.
Extrinsic cognitive load is a property of the work environment. Work on this.
Finally, extrinsic cognitive load is everything else. It’s the timesheet system. The health insurance forms. It’s our build tools. It’s Kubernetes. It’s how to get credentials to the database to test those queries. It’s who has to review a pull request, and when it’s OK to merge.
This is not the stuff we want to spend our brain on. The less extrinsic cognitive load on the team, the more we have room for the business and systems knowledge, the more responsibility we can take on.
And this is a place where carefully integrated tools can help.
DevOps is about moving system boundaries to work better. How can we do that?
We can move knowledge within the team, and we can move knowledge out to a different team.
We can move work below the line.
Within the team, we can move knowledge from the social side to the technical side of the symmathesy. We can package up our personal knowledge into code that can be shared.
Automations encapsulate knowledge of how to do something
Automate bits of our work. I do this with scripts.
The trick is, can we make sharing it with the team almost as easy as writing it for ourselves?
Especially automate anything we want to remain consistent.
For instance, when I worked on the docs at Atomist, I wrote the deployment automation for them. I made a glossary, and I wanted it in alphabetical order. I didn’t want to put it in alphabetical order once; I wanted it to constantly be alphabetical. This is a case for automation.
I wrote a function to alphabetize the markdown sections, and told it to run with every build and push the changes back to the repository.
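A minimal sketch of that kind of autofix might look like this in Python. This is not Atomist’s actual implementation; the heading level and sample document are assumptions for illustration. The idea is the same: the script, not a human, enforces the ordering on every build.

```python
import re

def alphabetize_sections(markdown: str) -> str:
    """Sort the '## '-level sections of a markdown document alphabetically.

    Text before the first level-2 heading (e.g. the page title) stays put.
    """
    # Split at each level-2 heading, keeping each heading with its body.
    parts = re.split(r"(?m)^(?=## )", markdown)
    preamble, sections = parts[0], parts[1:]
    # Sort sections by their heading line, case-insensitively.
    sections.sort(key=lambda s: s.splitlines()[0].lower())
    return preamble + "".join(sections)

# Hypothetical glossary content, just to show the effect.
doc = "# Glossary\n\n## Push\nA git push.\n\n## Build\nA CI run.\n"
print(alphabetize_sections(doc))
```

In a build step, you would read the glossary file, run it through a function like this, and commit the result back if anything changed.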
Autofixes like this also keep the third-party licenses up to date (all the npm dependencies and their licenses). That’s a legal requirement that no human is going to keep up with manually. Another one puts the standard license header on any code that’s committed without it. So I never copied the headers, I just let the automation do that. Formatting and linting, same thing.
If you care about consistency, put it in code. Please don’t nag a human.
Some of that knowledge can help with keeping code alive
Then there’s all that drudgery of updating versions and code styles etc etc — weeding the section of the garden we planted last year and earlier. how much of that can we automate?
We can write code to do some of our coding for us. To find the inconsistencies, and then fix some of them.
Encapsulate knowledge about “when” to do something
Often the work is more than knowledge of “how” to do something. It is also “when,” and that requires attentiveness. Attentiveness is very expensive for humans. When my pull request has been approved, then I need to merge it. Then I need to wait for a build, and then I need to use that new artifact in some other repository.
Can we make a computer wait, instead of a person?
This is where you need an event stream to run automations in response to.
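To make the pattern concrete, here is a toy event dispatcher in Python. The event names and payload fields are invented for illustration; a real event hub (GitHub webhooks, Devhose, etc.) defines its own schema. The point is that a handler, not a person, watches for the “when.”

```python
# Registry mapping event types to the handlers that react to them.
handlers = {}
merged = []  # record of the merges the automation performed

def on(event_type):
    """Register a function to run whenever an event of this type arrives."""
    def register(fn):
        handlers.setdefault(event_type, []).append(fn)
        return fn
    return register

def dispatch(event):
    """Deliver one event from the stream to every registered handler."""
    for fn in handlers.get(event["type"], []):
        fn(event)

@on("pull_request_approved")
def merge_when_green(event):
    # The computer does the waiting and watching, not a person.
    if event["checks_passed"]:
        merged.append((event["repo"], event["number"]))

dispatch({"type": "pull_request_approved",
          "repo": "docs", "number": 42, "checks_passed": True})
dispatch({"type": "pull_request_approved",
          "repo": "docs", "number": 43, "checks_passed": False})
```

Only the approved-and-green pull request gets merged; the other event is observed and ignored, with no human attentiveness spent on either.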
Galo Navarro has an excellent description of how this helped smooth the development experience at Adevinta. They created an event hub for software development and operations related activities, called Devhose. (This is what Atomist works to let everyone do, without implementing the event hub themselves.)
We can move some of that to a platform team.
Yet, every automation we build is code that we need to keep alive.
We can move knowledge across team boundaries, with a platform team. I want my team’s breadth of responsibility to increase, as we keep more software alive, so I want its depth to be reduced.
Team Topologies describes this structure. The business software teams are called “stream aligned” because they’re working in a particular value stream, keeping software alive for someone else. We want to thin out their extrinsic cognitive load.
Move some of it to a platform team. That team can take responsibility for a lot of those automations, and for deep knowledge of delivery and operational tooling. Keep the human judgement of what to deploy when in the stream-aligned teams, and a lot of the “how” and “some common things to watch out for” in the platform team.
Some things a platform team can do:
onboarding of code (delivery setup)
checks every team needs, like licenses
And then, all of this needs to stay alive, too. Your delivery process needs to keep updating for every repository. If delivery is event-based, and the latest delivery logic responds to every push (instead of what the repo was last configured for), then this keeps happening.
But keep thinning our platforms.
Platforms are not business value, though. We don’t really want more and more software there, in the platform.
We do want to keep adding services and automation that helps the team. But growing the platform team is not a goal. Instead, we need to make our platforms thinner.
There is such a thing as “done”
The best way to thin our software is outsourcing to another company. Not the development work, not the decisions. But software as a service, IaaS, logging, tooling of all sorts — hire a professional. Software someone else runs is tech debt you don’t have.
So maybe Galo could move Devhose on top of Atomist and retire some code.
So yeah. There is such a thing as done. “Done” is death. You don’t want it for your value-producing code. You do want it for all other code you run.
Don’t do boring work.
If keeping software alive sounds boring, then let’s change that. Go up a level of abstraction and ask, how much of this can we automate?
Writing code to change code is hard. Automating is hard.
That will challenge your knowledge of your own job, as you try to encode it into a computer. Best case, you get the computer doing the boring bits for you. Worst case, you learn that your job really is hard, and you feel smart.
Keep learning to keep living. Works for software, and it works for us.
Today my Docker build failed on Windows because apt-get update failed because some release files were not valid yet. It said they’d be valid in about 3.5 hours. WAT.
I don’t care about your release files! Do not exit with code 100! This is not what I want to think about right now!
Spoiler: restarting my computer fixed it. 😤
This turned out to be a problem with the system time. The Ubuntu docker containers thought it was 19:30 UTC, which is like 8 hours ago. Probably five hours ago, someone updated the release files wherever apt-get calls home to. My Docker container considered that time THE FUTURE. The scary future.
Windows had the time right, 21:30 CST (which is 6 hours behind UTC). Ubuntu in WSL was closer; it thought it was 19:30 CST. But Docker containers were way off. This included Docker on Windows and Docker on Ubuntu.
Entertainingly, the Docker build worked on Ubuntu in WSL. I’m pretty sure that’s because I ran this same build there long ago, and Docker had the layers cached. Each line in the Dockerfile results in a layer, so Docker starts the build operation at the first line that has changed. So it didn’t even run the apt-get update.
This is one of the ways that Docker builds are not reproducible. apt-get calls out to the world, so it doesn’t do the same thing every time. When files were updated matters, and (now I know) what time your computer thinks it is matters.
Something on the internet suggested restarting the VM that Docker uses. It seems likely that Docker on WSL and Docker on Windows (in linux-container mode) are using the same VM under the hood somewhere. I don’t know how to restart that explicitly, so I restarted the computer. Now all the clocks are right (Windows, Ubuntu in WSL, and Ubuntu containers from both Docker daemons). Now the build works fine.
I’m not currently worried about running containers in production. (I just want to develop this website without installing python’s package manager locally. This is our world.) Still, working in Docker challenges me to understand more about operating systems, their package managers, networking, system clocks, etc.
Superlatives are dangerous. They pressure you to line up the alternatives in a row, judge them all, make the perfect decision.
Take “What’s the best database?” This implies that databases can be compared based on internal qualities. Performance, reliability, scalability, integrity: what else is there?
We know better than that. We recognize that relational databases, graph databases, document stores etc serve different use cases.
“What are you going to do with it?”
This is progress; we are now considering the larger problem. Can we define the best database per use case?
In 7 Rules of Effective Change, Esther Derby recommends balance among (1) inner needs and capabilities, (2) the needs and capabilities of those near you, and (3) the larger context and problem. (This is called “congruence.”)
“What is the best?” considers only (1) internal qualities. Asking “What are you doing with it?” adds (3) the problem we’re solving. What about (2) the people and systems nearby?
Maybe you want to store JSON blobs and the high-scale solution is Mongo. But is that the “best” for your situation?
does anyone on the team know how to use and administer it?
are there good drivers and client libraries for your language?
does it run well in the production environment you already run?
do the frameworks you use integrate with it?
do you have diagnostic tools and utilities for it?
Maybe you’ve never used Mongo before. Maybe you already know PostgreSQL, and it already runs at your company. Can you store your JSON blobs there instead? Maybe we don’t need anything “better.”
The people we are and systems we have matter. If a component works smoothly with them, and is good enough to do the task at hand, great. What works for you is better than anyone else’s “best.”
Today my daughter overslept and I had to take her to school. Usually when that happens, I get all grouchy and resentful. Dammit, I thought I’d get a nice quiet coffee but NO, I’m scraping ice off the car.
Today I didn’t mind. It was fine. It’s the first day back after winter break, so I expected her to oversleep. I expected to take her to school.
I expected to scrape ice off the car… OK no, I forgot about that part and did get a little grouchy.
Our feelings surface from the difference between expectations and reality. Not from reality by itself.
In our careers, people may ask us what we want in our next role. “Where do you want to be in five years?” Answering this question hurts in several ways.
The more specific our vision of our future selves, the more we set ourselves up for disappointment.
We hide from ourselves all the other possibilities that we didn’t know about.
We tie our future self to the imagined desires of our present self.
In the next five years, I will gain new interests, new friends, and new ideas. Let those, along with the new situations I find myself in, guide my future steps.
Expectations are dangerous. We need some of them, especially in the near future, to take action. The less specific they can be, the more potential remains open to us, and the happier we can be.
Tomorrow, I will have a quiet coffee or take my daughter to school; I don’t have to know which until it happens. Five years from now, I have no idea what I’ll be doing — probably something cooler than my present self can imagine.
It’s difficult for an executive to criticize a budget when most line items are for mysterious high technology activities. It’s easier to tackle the more understandable portions, like postage, janitorial services, and consulting.
We want to help. We want to do stuff. We look for matches between what needs done and what we know how to do, and then we do it.
But does that always help?
Today in his newsletter, Marcus Blankenship told the story of the three bricklayers. What are you doing? “Laying bricks.” “Making a wall.” “Building a cathedral.”
We want to do something. We want to lay some bricks. As programmers, we want to write some tests, make a class, throw out an API.
After we break down a feature implementation into tasks, it is tempting to get started on the parts we know how to do. Knock those out, advance that progress bar. Make the wall taller.
Those are the least helpful parts to start with! In software development, our job is making decisions. What we need most is knowledge.
The most amorphous pieces of the task, the kinda vague ones, the ones our brains want to slide past with hand-waves — those are the tasks that will give us more than apparent progress.
Integrate with that new service. Get authorization working. Pick the database and get familiar with it.
Digging into the uncertain tasks gives us information. We will learn how the API needs to be different than we thought. The data we didn’t know we needed, the unhappy-paths we didn’t know we needed to pave.
Lay the bricks slowly. Consider, what is going to hold this wall up? and what will this wall hold up?
In the opening quote, an executive tries to manage costs. They are drawn to the little nitpicky items that look approachable. But the easy ones are already pretty good! The giant Megatechnology items with big dollars next to them, these provoke handwaves. What might the executive gain by digging in and learning more about these? Maybe not cost-cutting, but definitely better decision making.
Today I brought up a load of laundry. When doing chores, I practice keeping WIP (work in progress) to a minimum. Finish each thing, then do another one. This is good training for code.
For instance, on the way up the stairs with the basket, I saw a tumbleweed of cat hair. I didn’t pick it up. Right now I’m doing laundry.
I put the basket on the bed, pulled out a pair of pants, folded it, and then stopped.
Do I put the pants on the bed and fold the rest? Or do I put the pants away right now, then start the next piece of clothing?
It depends which one thing I’m doing: folding this load of laundry? or putting a piece of clothing in its place?
It’s like in software development. Do we slice work by functionality or by layer?
Feature slicing, where we do all components (front end, back end, database, etc) of one small change and release that before moving on: this is like folding the pants and putting them away before picking up another item.
Layered work, where we first make the whole database schema, then develop the back end, then create the front end: this is like folding all the clothes and then putting them all away.
Pants on the bed are WIP. When clothes are on the bed, the cat sits on them and I can’t put them away. Then when I want to nap, my bed still has clothes on it. WIP is liability, not value. I can’t access my bed, and no one has clean pants.
Yet, folding the laundry and then putting it away is more efficient. I might fold three pairs of pants, and then put them away all at once. Four towels, one trip to the bathroom closet. The process as a whole is faster and smoother (excluding the cats).
Is layered work more efficient in software? NO. It always takes far longer, with worse results. A lot of rework happens, and then we settle for something that isn’t super slick.
Why is laundry different?
Because I’ve folded and put away this same pair of pants many times before. On the same shelf. Household chores are rarely a discovery process.
If I hadn’t done this before, then I might fold all of Evelyn’s pants in fourths. That is standard practice, and my pants fit nicely in my cabinet that way. When I go to put Evelyn’s pants away, I’d find that her shelf is deeper. It’s just right for pants folded in thirds. Folded in fourths, they don’t all fit; I run out of height.
Now it’s time for rework: fold all her pants again, in thirds this time.
With feature slicing, I would fold one pair of pants in fourths, put it on the shelf, notice that it doesn’t fit well, refold it in thirds, and find that it fits perfectly. Every subsequent pair of her pants, I’d fold in thirds to begin with.
Completing a thin feature slice brings learning back to all future feature slices.
For repetitive chores, we can choose efficiency. For new work, such as all software development, aim for learning. That will make us fast.
Games aren’t much “fun” when rules, rather than relationships, dominate the activity, when there is no attention to “flow,” “fairness,” “respect” and “nice.”
Dr. Linda Hughes, “Beyond the Rules of the Game: Why Are Rooie Rules Nice?”
At a past job, we played Hearts every day at lunch. Out of a core group of 6-8, there were always at least four to participate. I worked there for about five years; we played over a thousand games together. We wore out dozens of decks of cards.
On top of the core rules of Hearts, we accrued a whole culture. There was the “Slone shooter” (the worst possible score, named for a former member of the group). We said “Panda Panda” (winning a trick with the highest card) and “Where’s the Jerboa?” (the two of clubs comes out to start the game) — both originated in our favorite deck, which had a different animal on each card. Long after that deck retired, the cards retained their animal names.
We had rules of etiquette. The player in the lead was the target, and everyone else worked together to damage their score. Everyone was expected to make a logically justifiable play, except when succumbing to other players chanting “Panda Panda!” to summon the Ace of Hearts.
I’ve never since had that much fun at cards. The rules of the game are only the beginning.
See, you can think about games, or you can observe gaming. There are the rules as written, and then there’s the experience of the players.
It’s the same with work: you can talk about work-as-imagined, or you can look at work-as-done.
Work-as-imagined is the official process. It is how you’re supposed to get your work done. Work-as-done is real life.
This is why tabletop board games are more fun than their electronic equivalents. Everyone sees how the rules work, because we execute them ourselves. House rules evolve. When there’s ambiguity, the group decides what’s fair. Lovely traditions grow, jokes get funnier with repetition, and the game becomes richer than its rules.
It’s the same at work. Post-its on the wall give shared physical context, and they’re more flexible than any ticket-tracking software. (Software usually limits reality to what was imagined by its developers.)
Each collaborating team eventually develops its own small culture. Vocabulary, jokes, etiquette. These exist on top of (sometimes in spite of) decreed processes of work.
These interactions make work “fun.” They care about “fairness,” “respect,” and “nice.” They also lead to “flow” — flow of work through the team. Communication is smooth, collaboration is joyful and productive. This is how we win.
Throwing food in the trash feels wasteful. Sometimes I feel compelled to eat the food instead. Then I feel lethargic, I gain weight, and everything I do is slower.
Sometimes waste isn’t a problem. The world is not better for that food passing through my digestive system. Sometimes it’s preventing waste that hurts us.
Inefficient code uses compute and costs money. Even when the response time and throughput don’t impact the user, that’s still wasteful, right?
Waste that speeds you up
Say we optimize all our code. But then it’s harder to read and to change. We slow ourselves down with our fear of waste.
Duplicate code takes time to write and test. Maybe many teams use some handy functions for formatting strings and manipulating lists. It’s a waste to maintain this more than once!
Say we put that in a shared utility library. Now every change I make impacts a bunch of other teams, and every change they make surprises me. To save a bit of typing, we’ve taken on coupling and mushed up the code ownership. Everything we do is slower.
Waste that slows you down
On the other hand, duplicated business logic means every future change takes more work and coordination. That is some waste that will bite you.
In Office Space, there’s that one person who takes the requirements from one floor to another. His salary is a waste. Much worse: he wants to preserve his meager responsibilities, so he’ll prevent the people who use the software from talking to the developers. Everything is slower forever.
When you spot a distasteful waste, ask: does this waste speed me up, or does this waste slow me down forever?
I can be wasteful, and it’s okay sometimes. I’ll waste compute for faster and safer code changes. I’ll spend time duplicating utility code to avoid tripping up other teams.
Some waste is just waste. It is time spent once, or money spent ongoing, and that’s it. Some waste makes us more effective, saves us cognitive load of having to think about another thing. Some inefficiencies let us be more effective.
Other waste is sticky. It drags you into spending more time in the future. It pulls you into handoffs and queues and coupling.
Fight the sticky waste that spirals into more drag in the future. Let the other waste be. Throw the extra food in the trash; your future self will move lightly for it.
Logic and culture have nothing to do with one another.
Jerry Weinberg, The Secrets of Consulting
A friend of mine works for a large government organization that runs a dam, extracting electricity from water and gravity.
They have internal software development, of course, and my friend described some impressive obstacles to change. Deploying a new lambda function is a challenge. My friend calls on contacts throughout the department, including friends from previous jobs that now also work there. They help each other through procedural barriers.
I said, wow. He said, yeah, we have a mantra: “We make power, not sense.”
Culture doesn’t make sense, to anyone from outside. Culture is common sense, to anyone embedded in it.
To understand and work with a large organization, let go of trying to make sense of it. Observe it and see what’s there. After that, logic might help in finding ways to work skillfully inside it, maybe even to change it.
Simon Wardley always says, map what is. Then think about moving it toward what you want.