Smaller pieces, lower pain

Part of XP is breaking work down into the smallest possible pieces. Kent Beck teaches us to make teeny tiny changes, changes so small that you don’t mind starting over when you get things wrong. Llewellyn Falco breaks work down into bits so tiny that most of them are provable refactorings: minute changes like putting “if (true){}” around some code, adding an empty else statement, or calling a function that does nothing.
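Here is a minimal sketch of what those tiny, provable steps can look like (the `greet` function is invented just for illustration; it isn’t from Beck’s or Falco’s examples):

```typescript
// A hypothetical function we want to start changing.
function greet(name: string): void {
  console.log(`Hello, ${name}`);
}

// Tiny step 1: wrap the body in "if (true) { }". Provably the same behavior;
// we have only made room for a branch.
function greetAfterStep1(name: string): void {
  if (true) {
    console.log(`Hello, ${name}`);
  }
}

// Tiny step 2: add an empty else. Still provably the same behavior.
// New behavior can grow inside it in a later, equally tiny step.
function greetAfterStep2(name: string): void {
  if (true) {
    console.log(`Hello, ${name}`);
  } else {
    // deliberately empty
  }
}

greetAfterStep2("world"); // prints "Hello, world", exactly as before
```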

When changing a complex system, it helps to make each change as simple as possible.

When our limitation is cognitive load, the difficulty of the task is not the sum of the difficulty of the steps. It is the maximum difficulty of any one step.

At home, when I get stressed out, when the kids are talking to me and I’m trying to get ready to go and the kitchen needs cleaning and what is that sticky thing I just stepped on — I catch myself, and start breaking down my work into the smallest steps possible. One tiny thing at a time.

Put down what I’m holding. Listen to the child. Ask the child to wait. Fetch a washcloth. Get it wet. Wipe the sticky floor. Put the washcloth in the bin. Put one mug in the dishwasher, and call the kitchen “cleaner.” One at a time, put away the things I was holding. Walk past the closet where my coat is, put on my shoes. Now get the coat. Put it on. Now get a hat. Put it on. Now get car keys. Now put the keys down. Put on gloves. Pick up the car keys. Ask the child to repeat the question.

This might take more clock time than if I try to answer the child and put away the stuff I’m holding and pick up the stuff I need while optimizing the route to avoid doubling back in the hallway. Maybe.

The tiny steps lower my cognitive load. This leaves me enough attention to hear the child’s question. It lets me handle the hardest single step (leaving the rest of the kitchen alone) without bitching.

Even easy things get hard when we lump them together. Add stress, and cognitive load is exceeded, leading to more stress, leading to things getting harder. Soon I’m yelling at the children, dropping a mug, and leaving without a hat.

Our limitation is not what we can do. It’s how much we can hold in our heads. So don’t push it!

In programming, it’s dangerous to work near your working memory threshold. You make more mistakes and write more complicated code. In life, it’s stressful to optimize for fewer steps or fewer seconds on the clock. Do that when you’re bored; keep yourself entertained by straining your working memory. Only at home, please, not at work.

Great software teaches

Great software solves a problem that you have — plus problems you didn’t know you had.

Here’s an example: today on Twitter, a friend let me know about a broken link to one of my old posts:

The broken link lives in someone else’s blog post, so I can’t update the source. It looks like the link has been broken since I migrated from Blogger to WordPress.com some months ago. Darn!

Ideally, the old link would redirect to the correct one, the one Gary found after a bunch of looking. This would fix the internet (just a tiny bit).

How hard is it to make that work?

Look at this beautiful plugin that appears right at the top when I search for “redirect”:

Redirection plugin. It has a million zillion installations

Perfect. I installed it (thanks to the $23/month I pay WordPress.com for my business site) and entered the two URLs and poof, the redirect worked. That took under 10 minutes.

But wait! There’s more!

During the installation it asked me whether I wanted logs of 301s (requests to my site that got redirected to the right place) and 404s (requests to my site for a page that is not found). Yeah, sure.

After entering the one redirect I knew about, I saw it work in the log of 301s. Then I clicked on the 404 report, and in those couple of minutes it had already noticed two more broken requests!

part of the 404 report. It shows the link I just fixed plus two others I did not expect

So I fixed those too! Hovering over the link in the report even gives an “Add redirect” option that populates half of it for me. Amazing.

This Redirection plugin is great software. It worked to solve my specific problem. It teaches me more about the wider problems I’m having, and helps me solve those too. Brilliant.

Understanding, inside and out

Every company, every team is its own system and works in its own ways. There are universal abstractions, but these are only useful when someone can translate them into the particular context of one company or team.

Corporate anthropologists do this. First, they adopt the role of “participant observer.” They get deep into the context of teams and workers at many levels of an organization. They stand on the factory floor, they ride along, they share kitchens and coffee breaks. All the time with “the ever-present notebook in your pocket, jotting down observations.”[1]

They learn the inside perspective: they can explain how the system works in the language of the people inside it, the way they experience it.

Then, the anthropologist considers what they saw from the outside perspective, in the frames of various theories. How do these particulars fit into universals of leadership, organization, and work?

This combination of inside and outside perspective provides insights invisible from either one. They might see where the intentions of leaders are lost in communication. They might see that work gets done only by circumventing certain rules.

I am not an anthropologist, but I want this kind of understanding. I want to see the workings of my team both from inside and from outside, to recognize what is particular and what is universal. All the time while doing work.

I am an observing participant.

Developing software with a notebook in my back pocket, I notice how my team gets work done, what rules they circumvent and what unstated conventions they enforce. I notice when I feel surprise or frustration and when others do — clues to deviance from unspoken patterns.

As a member of the team, the inside perspective is natural. I take conscious steps to learn the outside perspective.

I talk with other people at meetups and in Slack communities. Read books and dig into conferences. Seek frameworks and theories of work in online materials and workshops.

Combining this outside view with my natural inside view lets me think about the wider purpose of our work, identify paths that can help us reach the goal more usefully, and flex when the wider system’s needs change.

Do the work, and watch the work while doing it. Seek outside perspective. Afterward, reflect. Be an observing participant.

[1] source: Danielle Braun and Jitske Kramer, The Corporate Tribe. Technically they talk about the two perspectives as emic (inside) and etic (outside).

Illusions of commonsense perception

Learning is a struggle against “the illusions of commonsense perception” (Maria Popova). When it was obvious that the Earth stood still and the Moon moved, Kepler wrote a novel about traveling to the Moon and meeting a civilization that believed, based on their commonsense perception, that the Moon stood still and the Earth moved. If a person can imagine the perspective of a moon-based being, maybe they can see that their belief is tied to their Earthbound context.

The phrase “That’s common sense,” like “That’s obvious,” means “I believe this, and the people I trust all believe this, and I can’t explain it.”

Many people in America have a commonsense notion: “Men have penises and women have vaginas.” It’s all they’ve personally seen. Can you imagine someone with a different perspective? An intersex person, perhaps, even if you don’t know that you’ve met one?

A woman shaving in dramatic light

This frame is self-fulfilling. Anyone who is intersex ain’t gonna tell you about it, when you are sure that’s not a thing. Common sense shapes our perspective of the world in multiple ways: in what we can perceive, and in what people show us.

Especially if you have power! Then people make extra sure to show you what you want to see and nothing else. Power inhibits perception.

Common sense is contextual. In the Midwest, obviously everybody drives. (Scientifically, it’s incredibly dangerous.) In embedded device drivers, obviously you test your software thoroughly, and statically typed languages are superior. In web apps for ad campaigns, obviously that’s all a waste of time. Can we imagine situations where people have different perspectives?

If you imagine someone with a different perspective, then you can gain insight into your own perspective. You might get more accurate theories — be able to predict eclipses, which a lifetime of seeing the moon come up at night can’t prepare you for. You might see that your perspective isn’t true everywhere, and that you could move — maybe not to the moon, but to a city where there’s a community of queer or genderfluid people (and maybe even public transport). Or to a team that treats testing differently, where you can expand your experience.

But this kind of imagination takes work. Was it useful to the average human in 1600 to know that the Earth revolves around the sun? Heck, is it to the average person today? It isn’t directly relevant to my daily life, but I believe it by default, because everyone around me does. Most of the commonsense beliefs we grow up with aren’t worth questioning.

It takes effort, research, emotional energy and brainspace to adopt new frames. We can’t all do it for everything. Some of us learn that the gender binary is nonsense. Some of us learn many programming languages. Some of us study astronomy.

When you meet a person who still holds an outdated notion, it doesn’t mean they’re an idiot. It means they haven’t put in the effort to break this one yet. We can’t all understand everything. And most of the time we don’t need to. It’s a bonus when someone does break out of the default beliefs.

When you do gain new understanding and alter the beliefs you started with, it stays with you forever. Wisdom comes with age, or with accumulation of shattered assumptions.

Kepler understood this, and he worked to make it easier for people to understand that maybe Earth isn’t the only place in the universe, and therefore not the center of the universe. Thank you to people who share their stories, lowering the effort it takes me to realize that a belief that’s been good enough for this long is not good enough for everyone.

Capturing the World in Software

TL;DR – we can get a complete, consistent model of a small piece of the world using Event Sourcing. This is powerful but expensive.

Today on Twitter, Jimmy Bogard on the tradeoffs of Event Sourcing:

If event sourcing is not scalable, faster, or simpler, why use it?

Event Sourcing gives you a complete, consistent model of the slice of the world modeled by your software. That’s pretty attractive.

We want to model the real world in software.

You can think about the present world as a sum of everything that happened before. Looking around my room, I can say that my bookshelf is the sum of various purchases, some moving around, a set of decisions about what to read and what to keep.

my bookshelf has philosophy, math, visualization, and a hippo

I can think of myself as the sum of everything that has happened, plus the stories I told myself about that. My next action is an outcome of this, plus my present surroundings, plus incoming events. That action itself is an event in the world.

In life, in biology, we don’t get to see all these inputs. We don’t get to change the response algorithm and try again. But in software, we can!

Of course we want perfect modeling and traceability of decisions! This way we can always answer “why,” and we can improve our understanding and decisionmaking strategies as we learn.

This is what Event Sourcing offers.

We want our model to be complete and consistent.

It’s impossible to model the entire world. Completeness and consistency are in conflict, sadly. Still, if we limit “complete” to a business domain, and to the boundaries of our company, this is possible. Theoretically.

Event Sourcing offers a way to do that.

In event sourcing, every piece of input is an event. Someone requests a counseling appointment, event. Provider signs up for available hours, event. Appointment scheduled, event. Customer notified, event. Customer shows up, event. Session report filed, event.

We can sum past events to get the current state

Skim the timeline of all events for the relevant ones. Sum these up (there are other definitions of “sum” besides adding numbers). From this we calculate the state of the world.
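As a sketch of that “sum,” here is a fold over a stream of hypothetical appointment events (the event shapes are invented for illustration, not taken from any real scheduling system):

```typescript
// Hypothetical event shapes for the counseling-center example.
type AppointmentEvent =
  | { kind: "scheduled"; appointmentId: string; providerId: string; at: string }
  | { kind: "cancelled"; appointmentId: string }
  | { kind: "attended"; appointmentId: string };

interface Appointment {
  appointmentId: string;
  providerId: string;
  at: string;
  status: "scheduled" | "cancelled" | "attended";
}

// "Summing" the past events: fold over the stream to compute current state.
function currentAppointments(events: AppointmentEvent[]): Map<string, Appointment> {
  const state = new Map<string, Appointment>();
  for (const event of events) {
    switch (event.kind) {
      case "scheduled":
        state.set(event.appointmentId, {
          appointmentId: event.appointmentId,
          providerId: event.providerId,
          at: event.at,
          status: "scheduled",
        });
        break;
      case "cancelled":
      case "attended": {
        const appointment = state.get(event.appointmentId);
        if (appointment) {
          appointment.status = event.kind;
        }
        break;
      }
    }
  }
  return state;
}
```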

From appointment-was-scheduled events, we construct a provider’s calendar for the day.

At the end of the month, we construct reports on customers served and provider utilization. Based on that, we might seek more providers or have a talk with the less active ones. Headquarters ranks the performance of our office compared with others.

We need to allow corrections

To accurately model the real world, we need to allow for all the stuff that happens in the real world.

Appointments are cancelled. Customers don’t show up. Session reports are filed late. (“Where’s that session report from last week?” “Oh right, they were too late, because the gate to the parking lot malfunctioned. Don’t charge them for it.”)

Data is late or lost. If you insist that this doesn’t happen (“Every provider must enter the session reports by the end of the day”) then your model is blind to reality. The weather turns bad, people go home. There’s a bomb threat, or an active shooter. Reality intrudes.

Events outside your careful model will happen. Accommodate corrections, incorporate events that arrive late, accept partial data. The more of reality you allow into your model, the more accurate it can be.

We can evaluate past decisions based on the information available at the time

When data arrives late, reports change after they are printed. An event sourced system handles this.

As new data comes in about past days, it gets summed in with the data about those days. Reports get more accurate.

A friend of mine works at a counseling center, and he gets calls from headquarters like “Why is your utilization so low for December?” and he’s like “What? It was fine” and then he runs the report again and sure enough, it’s different. After he ran the report, more data about December came in, and now the totals are different. He can’t reproduce the reports he saw, which makes it hard to explain his actions to HQ.

If their software used event sourcing, he could say, “Please run the report as of January 2, and you’ll see why I didn’t take any action.”

Each event records a received timestamp, for when we learned about it, and an effective timestamp, for the real-world happening it represents. Then the software can sum only the events received before January 2 to reproduce the report as it was seen that day.
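A small sketch of that idea (field names invented for illustration):

```typescript
// Every event carries two timestamps: when it happened in the real world,
// and when our system learned about it.
interface BitemporalEvent {
  effectiveAt: string; // ISO date of the real-world happening
  receivedAt: string;  // ISO date when the event arrived in our system
}

// Reproduce a report "as seen on" some earlier day: ignore everything we
// learned after that day, even if it describes earlier happenings.
function eventsAsSeenOn<E extends BitemporalEvent>(events: E[], reportDate: string): E[] {
  return events.filter((event) => event.receivedAt <= reportDate);
}

// e.g. sum only these to reproduce the December report as it looked on January 2:
// const decemberAsSeen = eventsAsSeenOn(allDecemberEvents, "2020-01-02");
```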

We can re-evaluate the world with new logic

Not only can an event-sourced system reproduce the same report as on an earlier day, we can ask: what if we changed the report logic? Then what would it look like?

Maybe we want to report unreported appointments as “possibly cancelled” to reflect uncertainty. We can run the new logic against the same events and compare it to the old results.

This means we can run tests against the event stream and detect behavior changes.
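That can be as simple as replaying the same events through both versions of the logic and diffing the results (a hypothetical sketch; the report shape is invented):

```typescript
// A report, here just a map of labels to numbers (e.g. utilization per provider).
type Report = Record<string, number>;

// Replay the same event stream through the old and new report logic,
// and print every place the numbers change.
function compareReportLogic<E>(
  events: E[],
  oldLogic: (events: E[]) => Report,
  newLogic: (events: E[]) => Report,
): void {
  const before = oldLogic(events);
  const after = newLogic(events);
  const allKeys = new Set([...Object.keys(before), ...Object.keys(after)]);
  for (const key of allKeys) {
    if (before[key] !== after[key]) {
      console.log(`${key}: ${before[key] ?? 0} -> ${after[key] ?? 0}`);
    }
  }
}
```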

We need to record externally-visible decisions for consistency

When we change the software, we endanger consistency.

If we update the report logic in February, then when HQ runs the report “as of January 2” they’ll see something different from what my friend saw when he ran it on that date. For consistency, both the data and the code need to match what existed on January 2.

Or, we can model the report itself as an event. “On January 2, I said this about December.” Then we can incorporate that into the reporting logic.

Anything our system does that is visible to the outside world is itself an event, because it changes the way external people and software act. To reproduce our behavior consistently, our system can either record its own behavior, or retain all the data and the code that went into choosing it.

So far, this is nice and deterministic. But the real world isn’t.

Reproducing behavior is possible in an event-sourced system, if that behavior is deterministic. In human behavior, we don’t get that luxury. Our choices come from many influences, some of them contradictory. One tweet inspired me to write this article. Thousands of other tweets distract me from it.

Conflicting information comes in from real life.

Event sourcing gets tricky when the real world we are modeling is inconsistent, according to the events that come in.

Now say we’re a shipping company. We model the movement of goods in containers as they move across the world. It is an event when a container is loaded on a ship, and an event when it is unloaded. An event when a ship’s itinerary is scheduled, and when it arrives at each port.

One event says that container 1418 was loaded onto the vessel Enceladus in Auckland. Another event says that Enceladus is scheduled for its next stop in Beijing. Another event says that container 1418 was unloaded in San Francisco. Another says that container 1418 was emptied in Beijing. Which do you believe?

This example comes from a real story. Weird things happen. Does your system let people report reality? Is there a fallback for “Ask a person to go look for that container. Is it really 1418?”

Decisions made in ambiguity are events

Whatever decision the system makes, it needs to record that as an event. Perhaps that shows up as a footnote in reports about Enceladus, Beijing, and San Francisco. Does anybody hear about it in Auckland?

We can see the provenance of each report and decision

If some report comes out uneven, and that feeds back to the development team as a bug, then event sourcing gives us excellent tools for tracking it down.

Each “I made this decision” or “I produced this report” event can record the set of events that were input, and the version of code that ran to produce the output. You can have complete provenance.
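One way to picture such a record (a sketch; the fields are invented for illustration):

```typescript
// A decision or report, recorded with its full provenance: every event that
// went into it, and the version of code that produced it.
interface ProvenanceRecord<Output> {
  producedAt: string;      // when we made this decision or produced this report
  codeVersion: string;     // e.g. the git commit SHA of the logic that ran
  inputEventIds: string[]; // the events that were summed to produce the output
  output: Output;          // the decision or report itself
}
```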

This kind of software is accountable. It can tell the story of its decisions, what it did and why. What its world was like at that time.

This is a beautiful property. With full provenance, we can understand what happened. We can tell the story to each other. With replayability, we can change the code and see whether we can improve it for next time.

Recording everything gets ridiculous

Yet, data about provenance gets big very quickly. Each report consumed thousands of events. Each decision that was based on a current-state sum of events now has a dependency on all of those past events, plus the code that defines the current state, plus all the other states it took input from, plus their code and set of events.

Meanwhile some of those events are old, and no longer fit the format expected by the latest code. Meanwhile, we’re still ignoring everything that happened outside the system, so we’re completely blind to a lot of causality. “A person clicked this button.” Why? What information did they see on the screen as input to their decision to click “Container 1418 is in San Francisco”?

In real life, most information is lost. History will never be fully written; the writing is itself history. We’re always generating new actions. The system could theoretically report on all the reports it has reported. It never ends.

Completeness is limited to very small systems. Be careful where you invest this effort. Consciously select the boundaries, outside of which you don’t know what happened. You don’t know what really happened in the shipyard, or in a person’s head, or in the software that another company runs. The slice of the world we see is tiny.

Provenance is precious but difficult. Then again, it is at least as hard to do well in designs other than event sourcing. The realities that make event sourcing painful are painful in other models, too.

There are reasons we don’t model the whole world.

Event sourcing makes a best effort to model the world in its fullness. We try to remember everything significant that happens, sum that up into our own current-state world in the software, make decisions and act.

But events come in out of order. Events are lost. Events contradict each other. Events have partial data, or old data formats. Logic changes. We can’t remember everything.

Sometimes it pays to think about what you would do in an event-sourced system, and then implement just enough of that. Keep copies of produced reports, so that people can retrieve them without re-generating them. Record difficult decisions in a place that lives longer than logs.

Event sourcing is powerful. But it is not easy. Expect to think really hard about edge cases you didn’t want to handle. Expect to deal with storage and speed and up-to-dateness tradeoffs. Allow a human to enter corrections, because the real world will always surprise you.

In the real world, we don’t have all the information, and that’s OK. We can’t model everything in our heads, because our heads are inside everything. This keeps it interesting.

Definition of DevOps

Step 1: On one team, put the people with the knowledge and control necessary to change the software, see the results, change it, see the results.

Step 2: Use automation to take extrinsic cognitive load off this team, so that it needs fewer people.

That’s it, that’s DevOps.

Step 1 describes the cultural change that leads to flow. Delivering change requires no handoffs or approvals from outside the team; the impact of change flows back to the team. Act and learn.

Step 2 is where tools come in. If all you do is improve your tooling, well, that helps a little, but it doesn’t get you the qualitative change in flow. That comes from Step 1. The serious value of automation is that it enables Step 1, a single team with all the relevant knowledge.

Our job as developers is making decisions. DevOps gets us the knowledge we need to make good decisions, the authority to implement them, and the feedback to make better ones in the future.

Taking care of code … more and more code

(This is a shorter version of my talk for DeliveryConf, January 2020. Video of slides+audio; Slides as pdf)

Good software is still alive.

The other day, I asked my twelve year old daughter for recommendations of drawing programs. She told me about one (FireAlpaca?): “It’s free, and it updates pretty often.” She contrasted that with one that cost money “but isn’t as good. It never updates.”

The next generation appreciates that good software is updated regularly. Anything that doesn’t update is eventually terrible.

Software that doesn’t change falls behind. People’s standards rise, their needs change. At best, old software looks dumb. At worst, it doesn’t run on modern devices.

Software that doesn’t change is dead. You might say, if it still runs, it is not dead. Oh, sure, it’s moving around — but it’s a zombie. If it isn’t learning, it’ll eventually fall over, and it might eat your face.

I want to use software that’s alive. And when I make software, I want it to stay alive as long as it’s in use. I want it to be “done” when it’s out of production.

Software is like people. The only “done” is death.

Alive software belongs to a team.

What’s the alternative? Keep learning to keep living. Software needs to keep improving, at least in small ways, for as long as it is running.

We have to be able to change it, easily. If Customer Service says, “Hey, this text is unclear, can you change it to this?” then pushing that out should be as easy as updating some text. It should not be harder than when the software was in constant iteration.

This requires automated delivery, of course. And you have to know that delivery works. So you have to have run it recently.

But it takes more than that. Someone has to know — or find out quickly — where that text lives. They have to know how to trigger the deployment and how to check whether it worked.

More than that, someone has to know what that text means. A developer needs to understand that application. Probably, this is a developer who was part of its implementation, or the last major set of changes.

For the software to be alive, it has to be alive in someone’s head.

And one head will never do; the unit of delivery is the team. That’s more resilient.

Alive software is owned and cared for by an active team. Some people keep learning, keep teaching the software, and the shared sociotechnical system keeps living. The team and software form a symmathesy.

How do we keep all our software alive, while still growing more?

Okay, but what if the software is good enough right now? How do we keep it alive when there’s no big initiative to change it?

Hmm. We can ask, what kind of code is easy to change?

Code needs to be clean and modern.

Well, it’s consistent. It is up-to-date with the language versions and frameworks and libraries that we currently use for development.

It is “readable” by our current selves. It uses familiar styles and idioms.

What you don’t want is to come at the “simple” (from the outside perspective) task of updating some text, and find you need to install a bunch of old tools; oh wait, there are security patches that need to happen before this will pass pre-deployment checks; oh, now we have to upgrade more stuff to work with the modern versions of those libraries. You don’t want to have to resuscitate the software before you can breathe new life into it.

If changing the software isn’t easy enough, we won’t do it. And then it gets terrible.

So all those tool upgrades, security patches, library updates gotta have been done already, in the regular course of business.

Keeping those up to date gives us an excuse to change the code, trigger a release, and then notice any problems in the deployment pipeline. We keep confidence that we can deploy it, because we deploy it every week whether we need to or not.

People need to be on stable teams with customer contact.

More interesting than properties of the code: what are some properties of people who can keep code alive?

The team is stable. There’s continuity of knowledge.

The team understands the reason the software exists. The business significance of that text and everything else.

And we still care. We have contact with people who use this software, so we can check in on whether this text change works for them. We continue to learn.

Code belongs to one team.

More interesting still: what kind of relationship does the alive-keeping team have with the still-alive code?

Ownership. The code is under the care of a single team.

Good communication. We can teach the code (by changing it), so we have good deployment automation and we understand the programming language, etc. And the code can teach us — it has good tests, so we know when we broke something. It is accountable to us, in the sense that it can tell us the story of what happens. This means observability. With this, we can learn (or re-learn) how it works while it’s running. Keep the learning moving, keep the system living.

The team is a learning system, within a learning system.

Finally: what kind of environment can hold such a relationship?

(diagram of code, people, relationship, environment)

It’s connected; the teams are in touch with the people who use the software, or with customer support. The culture accepts continued iteration as good; it doesn’t fear change. Learning flows into and out of the symmathesy.

It supports learning. Software is funded as a profit center, as operational costs, not as capital expenditure, where a project is “done” and gets depreciated over years. How the accounting works around development teams is a good indication of whether a company is powered by software, or subject to software.

Then there’s the tricky one: the team doesn’t have too much else on their plate.

How do we keep adding code to our responsibilities?

The team that owns this code also owns other code. We don’t want to update libraries all day across various systems we’ve written before. We want to do new work.

It’s like a garden; we want to keep the flowers we planted years ago healthy, and we also want to plant new flowers. How do we increase the number of plants we can care for?

And, at a higher level — how can we, as people who think about DevOps, make every team in our organization able to keep code alive?

Teams are limited by cognitive load.

This is not: how do we increase the amount of work that we do. If all we did was type the same stuff all the time, we know what to do — we automate it.

Our work is not typing; it’s making decisions. Our limitation is not what we can do, it is what we can know.

In Team Topologies, Manuel Pais and Matthew Skelton emphasize: the unit of delivery is the team, and the limitation of a team is cognitive load.

We have to know what that software is about, and what the next software we’re working on is about, and the programming languages they’re in, and how to deploy them, and how to fill out our timesheets and which kitchen has the best bubbly water selection, and who just had a baby, and — it takes a lot of knowledge to do our work well.

Team Topologies lists three categories of cognitive load.

The germane cognitive load, we want that.

Germane cognitive load is the business domain. It is why our software exists. We want complexity here, because the more complex work our software does, the less the people who use it have to bother with. Maximize the percentage of our cognitive load taken up by this category.

So which software systems a team owns matters; group by business domain.

Intrinsic cognitive load increases if we let code get out of date.

Intrinsic cognitive load is essential to the task. This is our programming language and frameworks and libraries. It is the quirks of the systems we integrate with. How to write a healthy database query. How the runtime works: browser behavior, or sometimes the garbage collector.

The fewer languages we have to know, the better. I used to be all about “the best language for the problem.” Now I recommend “the language your team knows best, as long as it’s good enough.”

And “fewer” includes versions of the language, so again, consistency in the code matters.

Extrinsic cognitive load is a property of the work environment. Work on this.

Finally, extrinsic cognitive load is everything else. It’s the timesheet system. The health insurance forms. It’s our build tools. It’s Kubernetes. It’s how to get credentials to the database to test those queries. It’s who has to review a pull request, and when it’s OK to merge.

This is not the stuff we want to spend our brain on. The less extrinsic cognitive load on the team, the more we have room for the business and systems knowledge, the more responsibility we can take on.

And this is a place where carefully integrated tools can help.

DevOps is about moving system boundaries to work better. How can we do that?

We can move knowledge within the team, and we can move knowledge out to a different team.

We can move work below the line.

Within the team, we can move knowledge from the social side to the technical side of the symmathesy. We can package up our personal knowledge into code that can be shared.

Automations encapsulate knowledge of how to do something

Automate bits of our work. I do this with scripts.

The trick is, can we make sharing it with the team almost as easy as writing it for ourselves?

Especially automate anything we want to remain consistent.

For instance, when I worked on the docs at Atomist, I wrote the deployment automation for them. I made a glossary, and I wanted it in alphabetical order. I didn’t want to put it in alphabetical order once; I wanted it to constantly be alphabetical. This is a case for automation.

I wrote a function to alphabetize the markdown sections, and told it to run with every build and push the changes back to the repository.
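Roughly, the idea looks like this (a simplified sketch, not the actual Atomist autofix; it assumes each glossary entry is a “## Term” section):

```typescript
// Simplified sketch, not the actual Atomist autofix: split a markdown glossary
// into its "## Term" sections and rewrite them in alphabetical order.
function alphabetizeGlossary(markdown: string): string {
  const lines = markdown.split("\n");
  const preamble: string[] = [];
  const sections: { heading: string; body: string[] }[] = [];
  let current: { heading: string; body: string[] } | undefined;

  for (const line of lines) {
    if (line.startsWith("## ")) {
      current = { heading: line, body: [] };
      sections.push(current);
    } else if (current) {
      current.body.push(line);
    } else {
      preamble.push(line); // everything before the first term stays put
    }
  }

  sections.sort((a, b) => a.heading.localeCompare(b.heading));
  return [...preamble, ...sections.flatMap((s) => [s.heading, ...s.body])].join("\n");
}
```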

Autofixes like this also keep the third party licenses up to date (all the npm dependencies and their licenses). This is a legal requirement that a human is not going to keep up with. Another one puts the standard license header on any code that’s committed without it. So I never copied the headers, I just let the automation do that. Formatting and linting, same thing.

If you care about consistency, put it in code. Please don’t nag a human.

Some of that knowledge can help with keeping code alive

Then there’s all that drudgery of updating versions and code styles etc etc — weeding the section of the garden we planted last year and earlier. How much of that can we automate?

We can write code to do some of our coding for us. To find the inconsistencies, and then fix some of them.

Encapsulate knowledge about -when- to do something

Often the work is more than knowledge of -how- to do something. It is also -when-, and that requires attentiveness. Very expensive for humans. When my pull request has been approved, then I need to push the merge button. Then I need to wait for a build, and then I need to use that new artifact in some other repository.

Can we make a computer wait, instead of a person?

This is where you need an event stream to run automations in response to.
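As a sketch (hypothetical event and handler shapes, not any particular product’s API), the waiting looks something like this: when a “pull request approved” event arrives, merge; when the build that follows succeeds, bump the artifact in the downstream repository.

```typescript
// Hypothetical event shapes for a development-activity event stream.
type DevEvent =
  | { kind: "pull-request-approved"; repo: string; prNumber: number }
  | { kind: "build-succeeded"; repo: string; artifact: string; version: string };

// Each handler encodes the "when": it waits on the stream so a person doesn't have to.
async function onEvent(event: DevEvent): Promise<void> {
  switch (event.kind) {
    case "pull-request-approved":
      await mergePullRequest(event.repo, event.prNumber);
      break;
    case "build-succeeded":
      await updateDependency("downstream-repo", event.artifact, event.version);
      break;
  }
}

// Stubs standing in for real integrations (source control API, dependency bumps, etc.).
async function mergePullRequest(repo: string, prNumber: number): Promise<void> {
  console.log(`merging ${repo}#${prNumber}`);
}
async function updateDependency(repo: string, artifact: string, version: string): Promise<void> {
  console.log(`bumping ${artifact} to ${version} in ${repo}`);
}
```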

Galo Navarro has an excellent description of how this helped smooth the development experience at Adevinta. They created an event hub for software development and operations related activities, called Devhose. (This is what Atomist works to let everyone do, without implementing the event hub themselves.)

We can move some of that to a platform team.

Yet, every automation we build is code that we need to keep alive.

We can move knowledge across team boundaries, with a platform team. I want my team’s breadth of responsibility to increase, as we keep more software alive, so I want its depth to be reduced.

Team Topologies describes this structure. The business software teams are called “stream aligned” because they’re working in a particular value stream, keeping software alive for someone else. We want to thin out their extrinsic cognitive load.

Move some of it to a platform team. That team can take responsibility for a lot of those automations, and for deep knowledge of delivery and operational tooling. Keep the human judgement of what to deploy when in the stream-aligned teams, and a lot of the “how” and “some common things to watch out for” in the platform team.

Some things a platform team can do:

  • onboarding
  • onboarding of code (delivery setup)
  • delivery
  • checks every team needs, like licenses

And then, all of this needs to stay alive, too. Your delivery process needs to keep updating for every repository. If delivery is event-based, and the latest delivery logic responds to every push (instead of what the repo was last configured for), then this keeps happening.

But keep thinning our platforms.

Platforms are not business value, though. We don’t really want more and more software there, in the platform.

We do want to keep adding services and automation that helps the team. But growing the platform team is not a goal. Instead, we need to make our platforms thinner.

There is such a thing as “done”

The best way to thin our software is outsourcing to another company. Not the development work, not the decisions. But software as a service, IaaS, logging, tooling of all sorts — hire a professional. Software someone else runs is tech debt you don’t have.

So maybe Galo could move Devhose on top of Atomist and retire some code.

Because any code that isn’t describing business complexity, we do want it to die. As soon as we can move onto someone else’s service, win. Kill it, take it out of production. Then, finally, it’s done.

So yeah. There is such a thing as done. “Done” is death. You don’t want it for your value-producing code. You do want it for all other code you run.

Don’t do boring work.

If keeping software alive sounds boring, then let’s change that. Go up a level of abstraction and ask, how much of this can we automate?

Writing code to change code is hard. Automating is hard.

That will challenge your knowledge of your own job, as you try to encode it into a computer. Best case, you get the computer doing the boring bits for you. Worst case, you learn that your job really is hard, and you feel smart.

Keep learning to keep living. Works for software, and it works for us.

Fun with Docker: “Release file… is not valid yet”

Today my Docker build failed on Windows because apt-get update failed because some release files were not valid yet. It said they’d be valid in about 3.5 hours. WAT.

I don’t care about your release files! Do not exit with code 100! This is not what I want to think about right now!

Spoiler: restarting my computer fixed it. 😤

This turned out to be a problem with the system time. The Ubuntu docker containers thought it was 19:30 UTC, which is like 8 hours ago. Probably five hours ago, someone updated the release files wherever apt-get calls home to. My Docker container considered that time THE FUTURE. The scary future.

Windows had the time right, 21:30 CST (which is 6 hours earlier than UTC). Ubuntu in WSL was closer; it thought it was 19:30 CST. But Docker containers were way off. This included Docker on Windows and Docker on Ubuntu.

Entertainingly, the Docker build worked on Ubuntu in WSL. I’m pretty sure that’s because I ran this same build there long ago, and Docker had the layers cached. Each line in the Dockerfile results in a layer, so Docker starts the build operation at the first line that has changed. So it didn’t even run the apt-get update.

This is one of the ways that Docker builds are not reproducible. apt-get calls out to the world, so it doesn’t do the same thing every time. When files were updated matters, and (now I know) what time your computer thinks it is matters.

Something on the internet suggested restarting the VM that Docker uses. It seems likely that Docker on WSL and Docker on Windows (in linux-container mode) are using the same VM under the hood somewhere. I don’t know how to restart that explicitly, so I restarted the computer. Now all the clocks are right (Windows, Ubuntu in WSL, and Ubuntu containers from both Docker daemons). Now the build works fine.

I’m not currently worried about running containers in production. (I just want to develop this website without installing Python’s package manager locally. This is our world.) Still, working in Docker challenges me to understand more about operating systems, their package managers, networking, system clocks, etc.

Docker: it puts the Ops in DevOps. That’s my day.

I don’t want the best

“Which database is the best?”

Superlatives are dangerous. They pressure you to line up the alternatives in a row, judge them all, make the perfect decision.

This implies that databases can be compared based on internal qualities. Performance, reliability, scalability, integrity, what else is there?

We know better than that. We recognize that relational databases, graph databases, document stores etc serve different use cases.

“What are you going to do with it?”

This is progress; we are now considering the larger problem. Can we define the best database per use case?

In 7 Rules for Positive, Productive Change, Esther Derby recommends balance among (1) inner needs and capabilities, (2) the needs and capabilities of those near you, and (3) the larger context and problem. (This is called “congruence.”)

“What is the best?” considers only (1) internal qualities. Asking “What are you doing with it?” adds (3) the problem we’re solving. What about (2) the people and systems nearby?

Maybe you want to store JSON blobs and the high-scale solution is Mongo. But is that the “best” for your situation?

Consider:

  • does anyone on the team know how to use and administer it?
  • are there good drivers and client libraries for your language?
  • does it run well in the production environment you already run?
  • do the frameworks you use integrate with it?
  • do you have diagnostic tools and utilities for it?

Maybe you’ve never used Mongo before. Maybe you already know PostgreSQL, and it already runs at your company. Can you store your JSON blobs there instead? Maybe we don’t need anything “better.”
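A minimal sketch of that option, assuming the node `pg` client and a `documents` table with a `jsonb` column (the table name and connection details are invented for illustration):

```typescript
import { Client } from "pg";

// Minimal sketch: keep JSON blobs in the PostgreSQL you already run,
// instead of introducing a new document store.
async function storeBlob(blob: object): Promise<void> {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    await client.query(
      "CREATE TABLE IF NOT EXISTS documents (id serial PRIMARY KEY, data jsonb NOT NULL)"
    );
    await client.query("INSERT INTO documents (data) VALUES ($1)", [blob]);
    // jsonb still lets you query inside the blob later,
    // e.g. SELECT * FROM documents WHERE data->>'status' = 'active'
  } finally {
    await client.end();
  }
}
```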

The people we are and systems we have matter. If a component works smoothly with them, and is good enough to do the task at hand, great. What works for you is better than anyone else’s “best.”

Avoid Specificity in Expectations

Today my daughter overslept and I had to take her to school. Usually when that happens, I get all grouchy and resentful. Dammit, I thought I’d get a nice quiet coffee but NO, I’m scraping ice off the car.

Today I didn’t mind. It was fine. It’s the first day back after winter break, so I expected her to oversleep. I expected to take her to school.

I expected to scrape ice off the car… OK no, I forgot about that part and did get a little grouchy.

Our feelings surface from the difference between expectations and reality. Not from reality by itself.

In our career, people may ask us what we want in our next role.
“Where do you want to be in five years?” Answering this question hurts in several ways.

The more specific our vision of our future selves, the more we set ourselves up for disappointment.

We hide from ourselves all the other possibilities that we didn’t know about.

We tie our future self to the imagined desires of our present self.

In the next five years, I will gain new interests, new friends, and new ideas. Let those, along with the new situations I find myself in, guide my future steps.

Expectations are dangerous. We need some of them, especially in the near future, to take action. The less specific they can be, the more potential remains open to us, and the happier we can be.

Tomorrow, I will have a quiet coffee or take my daughter to school; I don’t have to know which until it happens. Five years from now, I have no idea what I’ll be doing — probably something cooler than my present self can imagine.