Layers in software: from data to value

Then

Back in the 2000s, we wrote applications in layers.

Presentation layer, client, data transfer, API, business logic, data access, database. We maintained strict separation between these layers, even though every new feature changed all of them. Teams organized around these layers. Front end, back end, DBAs.

Each layer of software is a wide box next to its team: frontend, backend, database, stacked on top of each other.
At the top are some customers. Business value exists only by flowing from them through all the layers to the DB and back.

Layers crisscrossed the flow of data.

Responsibility for making any one thing work was spread across many teams.

Interfaces between teams updated with every application change.

Development was slow and painful.

Now

In 2019, we write applications in layers.

A business unit is supported by a feature team. Feature teams are supported by platforms, tooling, UI components. All teams are supported by software as a service from outside the company.

Feature teams sit at the top, multicolored, with multiple components in their software.
Under them are platform and component teams, each different; under those, neat square boxes of external services.
Business value flows through the top layer (feature teams), staying close to the business people. Developer value flows from the feature teams, through the internal teams, to external services and back.

Back in the day, front end, back end, operations, and DBAs were separated because they needed different skills. Now we accept that a software team needs all the skills. We group by responsibility instead — responsibility for business value, not for activities.

Supporting teams provide pieces in which consistency is essential: UI components and some internal libraries.

Interfaces between teams change less frequently than the software changes.

Layers crisscross the flow of value.

DevEx

Feature teams need to do everything, from the old perspective. But that’s too hard for one team — so we make it easier.

This is where Developer Experience (DevEx) teams come in. (a.k.a. Developer Productivity, Platform and Tools, or inaccurately DevOps Teams.) These undergird the feature teams, making their work smoother. Self-service infrastructure, smooth setup of visibility and control for production software. Tools and expertise to help developers learn and do everything necessary to fulfill each team’s purpose.

Internal services are supported by external services. Managed services like Kubernetes, databases, queueing, observability, logging: we have outsourced the deep expertise of operating these components. Meanwhile, internal service teams like DevEx have enough understanding of the details, plus enough company-specific context, to mediate between what the outside world provides and what feature teams need.

This makes development smoother, and therefore faster and safer.

We once layered by serving data to software. Now we layer by serving value to people.

Change at different timescales

On recommendation from @mtnygard and others, I have acquired a copy of How Buildings Learn (Stewart Brand, 1994). Highlights so far:

Buildings are designed not to adapt, but they adapt anyway, because the usages are changing constantly. “The idea is crystalline, the fact fluid.” They’re designed not to adapt because “‘Form ever follows function’ … misled a century of architects into believing that they could really anticipate function.”

We think of buildings as static, because they change at a timescale slower than our notice. They change over generations. Humans operate at some natural timescale; we see anything that changes more slowly than that as unchanging, and anything faster as transient, insignificant. We are uncomfortable with change in stuff we think of as permanent, like buildings or culture or language. It isn’t permanent, just slower than us.

A jar of honey on its side will leak. The lid doesn’t trap the honey, just slows it down. Imperceptible movement is invisible, until the whole cupboard is sticky.

Software’s pace of change can be unnaturally fast, the opposite of buildings. That makes people uncomfortable. Updating buildings, “we deal with decisions taken long ago for remote reasons.” In software, “long ago” might be last year.

As usages change, so must our environs, in brick and in computers.

What changes faster than usages? Fashion. “Buildings are treated by fashion as big, difficult clothing, always lagging embarrassingly behind the mode of the day. This issue has nothing to do with function.” The latest hot tech may not improve on the value of your legacy software. “The meaningless change of fashion often obstructs necessary change.”

Hyperproductive development

TL;DR: the most productive development happens when one person knows the system intimately because they wrote it; this is in conflict with growing a system beyond what one person maintains.

Let’s talk about why some developers, in some situations, are ten times more productive than others.

hint: it isn’t the developers, so much as the situation.

When do we get that exhilarating feeling of hyperproductivity, when new features flow out of our fingertips? It happens when we know our tools like the back of our hands, and more crucially, when we know the systems we are changing. Know them intimately, like I know the contents of my backpack, because I packed it and have tuned the items in each pouch over years of travel. Know the contents of every module, both what they are and what we’d like them to be if we ever finish that refactoring. Know the edges, who uses every API and which changes will break whom, and we’re friends with all of the stakeholders. Know the underpinnings, which database fields are indexed and which are obsolete and which have quirky special values. Know the infrastructure, where it runs in production and how to ssh in; where it runs in test and what version is deployed and when it is safe to push a new one. Know the output, what looks normal in the logs and what’s a clue. We have scripts, one-liners that tail the logs in all three prod instances to our terminals so our magic eyes can spot the anomaly.

We know it because we wrote it, typically. It is extremely difficult to establish this level of intimacy with an existing system. Braitenberg calls this the Law of Downhill Invention, Uphill Analysis. Complex systems are easier to build than to figure out after they’re working.

We know it because we are changing it. The system is alive in our head. It’s a kind of symbiosis: we help the system run and grow, and the system works the way we wish. If we walk away for a month or two, begin a relationship with a different system, the magic is lost. It takes time to re-establish familiarity.

Except, I’m suspicious of this description. “We” is a plural pronoun. This depth of familiarity and comfort with a system is personal: it’s usually one person. The one person who has conceived this solution, who holds in their head both the current state and where they are aiming.

If you are this person, please realize that no one else experiences this work the way you do. For other people, every change is scary, because they don’t know what effect it will have. They spend hours forming theories about how a piece works, and then struggle to confirm this with experiment; they don’t have the testing setup you do. They study every log message instead of skimming over the irrelevant ones, the ones you’ve skipped over so often you don’t even see them anymore. By the time they do figure something out, you’ve changed it; they can’t gain comprehension of the system as quickly as you can alter it. When they do make a change, they spend lots of time limiting the scope of it, because they don’t know which changes will cause problems. They get it wrong, because they don’t know the users personally; communication is hard.

If you are this person, please go easy on everyone else. You are a synthetic biologist and can alter the DNA of the system from within; they are xenosurgeons and have to cut in through the skin and try not to damage unfamiliar organs.

If you work with this person, I’m sorry. This is a tough position to be in, to always feel inferior and like you’re breaking everything you touch. I’m there now, in some parts of our system. It’s okay for me because the host symbiont, the author and manipulator of that software, is super nice and helpful. He doesn’t expect me to work with it the same way he does.

If your team looks like this, here are some steps to take:

  1. Consider: don’t change it. This really is the fastest way to develop software. One person, coordinating with no other developers, can move faster than a whole team when the system is small enough. Until! we need the system to grow bigger. Or! the system is crucial to the business (it’s an unacceptable risk for only one person to have power over it).
  2. As the host symbiont who lives and breathes the system: strike the words “just”, “easy,” “obvious,” “simple,” and “straightforward” from your vocabulary. These words are contextual, and no other human shares your context.
  3. Please write tests. Tests give people who are afraid of unintentional breakages a way to test their theories. Experimentation is crucial to learning how a system works, and tests make experiments possible. They also serve as documentation of intended behavior. (A sketch of such a test follows this list.)
  4. Pair program! By far the best way to transfer understanding of the system to another human is to change it together. (Or boost a whole team at once: mob program!)
  5. Make a README for other developers. Describe the purpose of the system briefly, and document how you develop, test, and troubleshoot the system. Specify the command lines for running tests, for deployment, for accessing logs. Describe in detail how to obtain the necessary passwords. Write down all the environments where it runs, and the protocol around changing them.
  6. Do you know your users better than anyone else? Remedy that. Bring other team members into the discussion. (There’s a sweet spot of a single developer-type who works within a business unit. When the software becomes too important for this scale, it gets harder.) Let all the devs get to know the users. Have happy hours. Form redundant communication channels. It’ll pay off in ways you never detect.
  7. Slow down. Like seriously, if one person is developing at maximum speed on a project, no one else can get traction. You can’t move at full speed and also add symbionts. When it is important to bring in new people, don’t do anything alone. Pair on everything. Yes, this will slow you down. It will speed them up. Net, we’ll still be slower than you working alone. This is an inherent property of the larger system, which now includes interhuman coordination. There’s more overhead than when it was just you and your program. That’s OK; we’re trading sheer speed for safety and scale.
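
For item 3, here is roughly the kind of test that helps: a characterization test that pins down what the code does today, so a newcomer can experiment and find out immediately whether a change altered behavior. The function and numbers below are hypothetical stand-ins, defined inline so the sketch runs on its own; in real life the test would import the module the host symbiont owns.

```python
# A sketch of a characterization test (hypothetical function and values).
# It records current behavior, including the surprising cases, so that
# newcomers' experiments get fast, trustworthy feedback.
import pytest

def late_fee(days_late: int, balance_cents: int) -> int:
    """Stand-in for a quirky legacy rule; imagine this lives in the real codebase."""
    if days_late <= 0:
        return 0
    fee = 500                       # flat fee from day one
    if days_late > 30:
        fee += balance_cents // 10  # plus 10% after a month; nobody remembers why
    return fee

@pytest.mark.parametrize(
    "days_late, balance_cents, expected",
    [
        (0, 10_000, 0),       # on time: no fee
        (1, 10_000, 500),     # flat fee kicks in immediately
        (45, 10_000, 1_500),  # the surprising case, captured as-is
    ],
)
def test_late_fee_matches_current_behavior(days_late, balance_cents, expected):
    assert late_fee(days_late, balance_cents) == expected
```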

Let’s acknowledge that there really are developer+situations that are 10x more productive than others. Let’s acknowledge that they don’t scale. Make choices about when we can take advantage of the sweet spot of local or individual automation, and when the software we’re building is too important for a bus factor of one.
Distinguish an experimental prototype, where speed of change and redirection is crucial, from a production app, which needs backwards compatibility guarantees and documentation and all that seriousness; the latter requires a solid team.

Recognize that the most productive circumstance for development is a rare circumstance. Most of the time, I need to work on a system that someone else wrote. (that “someone else” could be “me, months or years ago.”) The temptation to rewrite is strong, because if I rewrite it then I’ll understand it.

There’s a time for 10x development, and a time for team development. When the software needs to get serious, the lone 10x developer gets in the way of team development. If that’s your situation, please consider the suggestions in this post.

Code and Coders: components of the sociotechnical system

TL;DR: Study all the interactions between people, code, and our mental models; gather data and we can make real improvements instead of guessing in our retros.

Software is hard to change. Even when it’s clean, well-factored, and everyone working on it is sharp and nice. Why?

Consider a software team and its software. It’s a sociotechnical system; people create the code and the code affects the people.

a blob of code and several people, with two-way arrows between the code and the people, and among the people

When we want to optimize this system to produce more useful code, what do we do? How do we make the developer⟺code interactions more productive?

the sociotechnical system, highlight on each person

As a culture, we started by focusing on the individual people: hire those 10x developers! As the software gets more complex, that doesn’t go far. An individual can only do so much.

the sociotechnical system, highlight on the arrows between people

The Agile software movement shifted the focus to the interactions between the people. This lets us make improvements at the team level.

the sociotechnical system, highlight on the blob of code

The technical debt metaphor let us focus on how the code influences the developers. Some code is easier to change than other code.

We shape our tools, and thereafter our tools shape us. – McLuhan

the sociotechnical system, highlight on the arrows reaching the code

Test-driven development focuses on a specific aspect of the developer⟺code interaction: tightening the feedback loop on “will this work as I expected?” Continuous Integration has a similar effect: tightening the feedback loop on “will this break anything else?”

All of these focuses are useful in optimizing this system. How can we do more?

There’s a component in this system that we haven’t explicitly called out yet. It lives in the heads of the coders. It’s the developer’s mental model of the software.

a blob of code and two people. The people have small blobs in their heads. Two-way arrows run between the code and the small blobs, and between the people.
Each developer’s mental model of the software matches the code (or doesn’t)

Every controller must contain a model of the process being controlled.
– Nancy Leveson, Engineering a Safer World

When you write a program, you have a model of it in your head. When you come to modify someone else’s code, you have to build a mental model of it first, through reading and experimenting. When someone else changes your code, your mental model loses accuracy. Depending on the completeness and accuracy of your mental model of the target software, adding features can be fun and productive or full of pain.

Janelle Klein models the developer⟺code interaction in her book Idea Flow. We want to make a change, so we look around for a bit, then try something. If that works, we move forward (the Confirm loop). If it doesn’t work, we shift into troubleshooting mode: we investigate, then experiment until we figure it out (the Conflict loop). We update our mental model. When we’re familiar with the software, we make forward progress (Confirm). When we’re not, pain! From the book:

to make a change, start with learn; modify; validate. If the validation works, Confirm! back to learn. If the validation is negative, Conflict! on to troubleshooting; rework; validate.

That 10x developer is the one with a strong mental model of this software. Probably they wrote it, and no one else understands it. Agile (especially pairing) lets us transfer our mental model to others on the team. Readable code makes it easier for others to construct an accurate mental model. TDD makes that Confirm loop happen many more times, so that Conflict loops are smaller.

We can optimize this developer⟺code interaction by studying it further. Which parts of the code cause a lot of conflict pain? Focus refactoring there. Who has a strong mental model of each part of the system, and who needs that model? Pair them up.

Idea Flow includes tools for measuring friction, for collecting data on the developer⟺code interaction so we can address these problems directly. Recording the switch from Confirm to Conflict tells us how much of our work is forward progress and how much is troubleshooting, so we can recognize when we’re grinding.
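
This bookkeeping doesn’t require fancy tooling to start. Here is a minimal sketch, not Idea Flow’s actual instrumentation: given hand-logged switches between Confirm and Conflict, report what fraction of the session was troubleshooting. The timestamps and names are made up for illustration.

```python
# Minimal sketch of friction bookkeeping (not Idea Flow's actual tooling):
# from a chronological log of mode switches, compute how much of the session
# was spent in the Conflict (troubleshooting) loop.
from datetime import datetime

def conflict_fraction(events: list[tuple[str, str]]) -> float:
    """events: chronological (ISO timestamp, mode) pairs; mode is 'confirm' or 'conflict'."""
    total = conflict = 0.0
    for (start, mode), (end, _) in zip(events, events[1:]):
        seconds = (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
        total += seconds
        if mode == "conflict":
            conflict += seconds
    return conflict / total if total else 0.0

log = [
    ("2019-05-01T09:00", "confirm"),
    ("2019-05-01T09:40", "conflict"),  # surprise: into troubleshooting
    ("2019-05-01T11:10", "confirm"),
    ("2019-05-01T12:00", "confirm"),   # end-of-session marker
]
print(f"{conflict_fraction(log):.0%} of the morning was troubleshooting")  # 50%
```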

Even better, we have data on the causes of the grinding.

We can reflect and choose actions based on what’s causing the most pain, rather than on gut feel of what we remember on the day of the retrospective.

Picturing those internal models as part of the sociotechnical system changes my actions in subtle ways. For instance I now:

  • observe which of my coworkers are familiar with each part of the system.
  • refactor and then throw it away, because that improves my mental model without damaging anyone else’s.
  • avoid writing flexible code if I don’t need it yet, because alternatives inflate the mental model other people have to build.
  • spend more time reviewing PRs to keep my model up to date.

We can’t do this by focusing on people or code alone. We have to optimize for learning. Well-factored code can help, but it isn’t everything. Positive personal interactions help, but they aren’t everything. Tests are only one way to minimize conflict. No individual skill or familiarity can overcome these challenges.

If we capture and optimize our conflict loops, consciously and with data, we can optimize the entire sociotechnical system. We can make collaborative decisions that let us change our software faster and faster.

Reuse

Developers have a love-hate relationship with code re-use. As in, we used to love it. We love our code and we want it to run everywhere and help everyone. We want to get faster with time by harnessing the work of our former selves.
And yet, we come to hate it. Reuse means dependencies. It means couplings. It means surprises, when changing code impacts something we did not expect, or else it means don’t touch it, it’s too scary. It means trusting code we don’t understand because it’s code we didn’t write.

Here’s the thing: sharing code is dangerous. Do it sparingly.

When reuse is bad

Let’s talk about sharing code. Take a business, developing software for its employees or its customers. Let’s talk about code within an organization that is referenced in more than one service, or by multiple flows in a monolith. (Monolith is defined as “one deployable unit maintained by more than one small team.”)

Let’s see some pictures. Purple Service here has some classes or functions that it finds useful, and the team thinks these would be useful elsewhere. Purple team breaks this code out into a library, the peachy circle.

purple circle, peach circle inside

Then someone from Purple team joins Blue team, and uses that library in Blue Service. You think it looks like this:

peach circle under blue and purple circles

Nah, it’s really more like this:

purple circle with peach circle inside. Blue circle has a line to peach circle

This is called coupling. When Purple team changes their library, Blue team is affected. (If it’s a monolith, their code changed underneath them. I hope they have good tests.)
Now, you could say, Blue team doesn’t have to update their version. The level of reuse is the release; we broke out the library, so this is fine.

picture of purple with orange circle, blue with peach circle.

At that point you’ve basically forked, the code isn’t shared anymore. When Blue team needs to make their own changes, they first must upgrade, so they get surprised some unpredictable time later. (This happened to us at Outpace all the time with our shared “util” libraries and it was the worst. So painful. Those “timesavers” cost us a lot of time and frustration.)

This shared code is a coupling between two services that otherwise have nothing to do with each other. The whole point of microservices was to decouple! To make it so our changes impact only code that our team operates! That decoupling is now dead. And for what?

To answer that, consider the nature of the shared code. Why is it shared?
Perhaps it is unrelated to the business: it is general utilities that would otherwise be duplicated, but we’re being DRY and avoiding the extra work of writing and testing and debugging them a second time. In this case, I propose: cut and paste. Or fork. Or best of all, try a more formalized reuse-without-sharing procedure [link to my next post].

What if this is business-related code? What if we had good reason to DRY it out, because it would be wrong for this code to be different in Purple Service and Blue Service? Well sorry, it’s gonna be different. Purple and Blue do not have the same deployment schedules, that’s the point of decoupling into services. In this case, either you’ve made yourself a distributed monolith (requiring coordinated deployments), or you’re ignoring reality. If the business requires exactly one version of this code, then make it its own service.

picture with yellow, purple, and blue circles separate, dotty lines from yellow to purple and to blue.

Now you’re not sharing code anymore. You’re sharing a service. Changes to Peachy can impact Purple and Blue at the same time, because that’s inherent in this must-be-consistent business logic.
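
A minimal sketch of that move, with hypothetical names and only the standard library: the must-be-consistent rule lives behind one small service, and Purple and Blue call the endpoint instead of each importing (and eventually forking) their own copy.

```python
# Hypothetical "Peachy" pricing service: the one place the must-be-consistent
# business rule lives. Purple and Blue POST to it rather than sharing a library.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def discounted_price(list_price_cents: int, customer_tier: str) -> int:
    """The rule that must be identical everywhere."""
    discount = {"gold": 0.10, "silver": 0.05}.get(customer_tier, 0.0)
    return round(list_price_cents * (1 - discount))

class PricingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        price = discounted_price(body["list_price_cents"], body["customer_tier"])
        payload = json.dumps({"price_cents": price}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("", 8080), PricingHandler).serve_forever()
```

Now a change to the rule is one deployment, and its blast radius is visible: it is the service’s contract, not a surprise inside someone else’s build.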

It’s easier with a monolith; that shared code stays consistent in production, because there is only one deployment. Any surprises happen immediately, hopefully in testing. In a monolith, if Peachy is utility classes or functions, and Purple (or Blue) team wants to change them, the safest strategy is: make a copy, use the copy, and change that copy. Over time, this results in less shared code.

This crucial observation is #2 in Modern Software Over-engineering Mistakes by RMX.

“Shared logic and abstractions tend to stabilise over time in natural systems. They either stay flat or relatively go down as functionality gets broader.”

Business software is an expanding problem. It will always grow, and not with more of the same: it will grow in ways you didn’t plan for. This kind of code must optimize for change. Reuse is the enemy of change. (I’m talking about reuse of internal code.)

Back in the beginning, Blue team reused the peach library and saved time writing code. But writing code isn’t the expensive part, compared to changing code. We don’t add features faster as our systems get larger and we have more code hypothetically available for re-use. We add features more slowly, because every change has more impacts and is less safe. Shared code makes change less safe. The only code safe to share is code that doesn’t change. Which means no versioning. Heck, you might as well have cut and pasted it.

When reuse is good

We didn’t advance as an industry by rewriting, or cut and pasting, everything we need over and over. We build on libraries published by developers and companies all over the globe. They release them, we reuse them. Yes, we get into dependency hell, but it beats writing your own web framework. We get reuse not only of the code, but of understanding: Rails knowledge transfers between employers.

There is a tipping point where reuse is magical.

I argue that this point is well past a release, past a separate jar.
It is past a stable API
past a coherent abstraction
past automated tests
past solid documentation…

All these might be achieved within the organization if responsibility for the shared utilities lives in a separate team; you can try to use Conway’s Law to enforce architectural boundaries, but within an org, those boundaries are soft. And this code isn’t your business, so you don’t have much incentive to spend time on any of this. Why have backwards compatibility when you can perform human coordination instead? It isn’t worth it. In my past organizations, shared code has instead been the responsibility of no one. What starts out as “leverage” becomes baggage, as all the Ruby code is tied to an old version of Sinatra. Some switch to Go to get a clean slate.
Break those chains! Copy the pieces you need out of that internal library and make them yours.

At the level of winning reuse, that code has its own marketing department
its own sales team
its own office manager
its own stock price.

The level of reuse is the company.

(Pay for software.)

When the responsible organization succeeds by making its code stable and backwards-compatible and easy to work with and well-documented and extensively tested, that is code I want to reuse!

In addition to SaaS companies and vendors, there are organizations built around open-source software. This is why we look for packages and frameworks with a broad community around them. Or better, a foundation for keeping shared projects healthy. (Contribute to them.)

Conclusion

Reuse is dangerous because it introduces coupling. Share business code only when that coupling is inherent to the business domain. Share library and utility code only when it is maintained by an organization dedicated to publishing that code. (Same with services. If you can pay for infrastructure-level tools, you’ll get better tools without distracting your organization.)

Why did we want to reuse internal code anyway?
For speed, but speed of change is more important.
For consistency, but that means coupling. Don’t hold your teams back with it.
For propagation of bug fixes, which I’ve not seen happen.

All three of these can be automated [LINK to my next post] without dependencies.

Next time you consider making your code reusable, ask “who will I sell this to?”
Next time someone (including you) suggests you reuse their code, ask “who publishes that?” and if they say “me,” copy it instead.

Tradeoffs in Coordination Among Teams

The other day in Budapest, Jez Humble and I wondered, what is the CAP theorem for teams? In distributed database systems, the CAP theorem says: choose two of Consistency, Availability, and Partition tolerance, and you must choose Partition tolerance, because network partitions will happen.
Consider a system for building software together. Unless the software is built by exactly one person, we have to accept partitions. We can’t meld minds, and talking is slow.
In databases we choose between Consistency (the data is the same everywhere) and Availability (we can always get the data). As teams grow, we choose between Consensus (doing things for the same reasons in the same way) and Actually-getting-things-done.
Or, letting go of the CAP acronym: we balance Moving Together against Moving Forward.

Moving Together

A group of 1 is the trivial case. Decision-making is the same as consensus. All work is forward work, but output is very limited, and when one person is sick everything stops.
A group of 2-7 is ideal: the communication comes with interplay of ideas, and whole new outputs of dialogue make up for the time cost of talking to each other. It is still feasible for everyone in the group to have a mental model of each other person, to know what that person needs to know. Consensus is easy to reach when every stakeholder is friends with every other stakeholder.
Beyond one team, the tradeoffs begin. Take one team of 2-7 people working closely together. Represent their potential output with this tall, hollow arrow pointing up.
This team is building software to run an antique store. Look at them go, full forward motion. (picture: tall, filled arrow.)
Next we add more to the web site while continuing development on the register point-of-sale tools. We break into two teams. We’re still working with the same database of items, and building the same brand, so we coordinate closely. We leverage each other’s tools. More people means more coordination overhead, but we all like each other, so it’s not much burden. We are a community, after all.
A green arrow and a red arrow, each connected by many lines of communication, are filled about halfway up with work.
Now the store is doing well. The web site attracts more retail business, the neighboring antique stores want to advertise their items on our site, everything is succeeding and we add more people. A team for partnerships, which means we need externally-facing reports, which means we need a data pipeline.
A purple arrow and a blue arrow join the red and green ones. Lines crisscross between them, a snarly web. The arrows are filled only a little because of these coordination costs. The purple arrow is less connected, and a bit more full, but it’s pointed thirty degrees to the left.
The same level of consensus and coordination isn’t practical anymore. Coordination costs weigh heavily. New people coming in don’t get to build a mental model of everyone who already works there. They don’t know what other people know, or which other people need to know something. If the partnerships team touches the database, it might break point of sale or the web site, so they are hamstrung. Everyone needs to check everything, so the slowest-to-release team sets the pace. The purple team here is spending less time on coordination, so the data pipeline is getting built, but without any ties to the green team, it’s going in a direction that won’t work for point of sale.
This mess scales up in the direction of mess. How do we scale forward progress instead?

Moving Forward

The other extreme is decoupling. Boundaries. A very clear API between the data pipeline, point of sale, and web. Separate databases, duplicating data when necessary. This is a different kind of overhead: more technical, less personal. Break the back-end coupling at the database; break the front-end (API) coupling with backwards compatibility. Teams operate on their own schedules, releases are not coordinated. This is represented by wider arrows, because backwards compatibility and graceful degradation are expensive. 
Four arrows, each wide and short. A few lines connect them. They’re filled, but the work went to width (solidness) rather than height (forward progress).
These teams are getting about as far as the communication-burdened teams. The difference is: this does scale out. We can add more teams before coordination becomes a limitation again.
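
Roughly what that technical overhead looks like in code, with hypothetical field names: the provider evolves its API only by adding fields, and the consumer is a tolerant reader that degrades gracefully when a newer field hasn’t shipped yet. A sketch:

```python
# Provider side: evolve a response by *adding* fields, never renaming or
# removing the ones existing consumers rely on.
def render_item(item: dict) -> dict:
    return {
        "id": item["id"],
        "price": item["price_cents"] / 100,       # legacy field, kept forever
        "price_cents": item["price_cents"],       # newer, more precise field
        "currency": item.get("currency", "USD"),  # new field with a safe default
    }

# Consumer side (another team, another release schedule): a tolerant reader
# that takes what it needs, ignores unknown fields, and falls back when a
# newer field is not there yet.
def display_price(payload: dict) -> str:
    cents = payload.get("price_cents")
    if cents is None:                        # older provider: use the legacy field
        cents = round(payload["price"] * 100)
    return f"{cents / 100:.2f} {payload.get('currency', 'USD')}"

print(display_price({"id": 7, "price": 12.5}))                     # old provider
print(display_price(render_item({"id": 7, "price_cents": 1250})))  # new provider
```
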
Amazon is an extreme example of this: backwards compatible all the things. Each team Moving Forward in full armor. Everything fully separate, so no team can predict what other teams depend on. This made the AWS products possible. However, this is a ton of technical overhead, and maybe also not the kindest culture to work in.
Google takes another extreme. Their monorepo allows more coupling between teams. Libraries are shared. They make up for this with extreme tooling. Tests, refactoring tools, custom version control and build systems — even whole programming languages. Thousands of engineers work on infrastructure at Google, so that they can Move Together using technical overhead.

Balance

For the rest of us, in companies with 7-1000 engineers, we can’t afford one extreme or the other. We have to ask: where is consensus important? and where is consensus holding us back?
Consensus is crucial in objectives and direction. We are building the same business. The business results we are aiming for had better be the same. We all need to agree on “Which way is up?”
Consensus is crippling at the back end. When we require any coordination of releases. When I can’t upgrade a library without impacting other teams in ways I can’t predict. When my database change could break a system more production-critical than mine. This is when we are paralyzed. Don’t make teams share databases or libraries.
What about leveraging shared tools and expertise? If every team runs its own database, those arrows get really wide really fast, unless they skimp on monitoring and redundancy — so they will skimp and the system will be fragile. We don’t want to reinvent everything in every team.
The answer is to have a few wide arrows. Shared tools are great when they’re maintained as internal services, by teams with internal customers. Make the data pipeline serve the partnership and reporting teams. Make a database team to supply well-supported database instances to the other teams. (They’re still separate databases, but now we have shared tools to work with them, and hey look, a data pipeline for syncing between them.)
The green, red, and blue arrows are narrow and tall, and mostly full of work, with some lines connecting them. The purple arrow and a new black arrow are wide and short and full of work. The wide arrows (internal services) are connected to the tall arrows (product teams) through their tips.
Re-use helps only when there is a solid API, when there is no coupling of schedules, and when the providing team focuses on customer service.

Conclusions

Avoid shared code libraries, unless you’re Google and have perfect test coverage everywhere, or you’re Amazon and have whole teams supporting those libraries with backwards compatibility.
Avoid shared database instances, but build internal teams around supporting common database tools.
Encourage shared ideas. Random communication among people across an organization has huge potential. Find out what other teams are doing, and that can refine your own direction and speed your development — as long as everything you hear is information, not obligation.
Reach consensus on what we want to achieve, why we are doing it, and how (at a high level) we might achieve it. Point in the same direction, move independently.
Every organization is a distributed system, even when we sit right next to each other. Coordination makes joint activity possible, but not free. Be conscious of the tradeoffs as your organization grows, as consensus becomes less useful and more expensive. Recognize that new people entering the organization experience higher coordination costs than you do. Let teams move forward in their own way, as long as we move together in a common direction. Distributed systems are hard, and we can do it.


Bonus material 

Here is a picture of Jez in Budapest.

And here is a paper about coordination costs:
Common Ground and Coordination in Joint Activity

Patterns and details

Christopher Alexander, the architect who wrote the ORIGINAL original patterns book, contributed a foreword to Patterns of Software. He reports that in the years since Patterns in Architecture:

“We have begun to make buildings which really do have the quality I sought for all those years… This has come about in large part because, since 1983, our group has worked as architects and general contractors. Combining these two aspects of construction in a single office, we have achieved what was impossible when one accepts the split between design and construction.”

This applies to software architecture as well. Intimacy with the details of construction and a view of the broad picture are not separable, not if you’re looking to build something with life in it.

The foreword references an article I can’t find a copy of, Alexander’s “Manifesto 1991.” I did find some excerpts. Here are some bits from his “Hippocratic Oath for Architecture” that apply to software as well as buildings:

No matter how big the building is, the architect does some craft work on every building, with his (or her) own hands.

The involvement of users in the process is necessary – and widespread.

Engineering is part of architecture, and building is conceived while being engineered.

Make each building small in importance in relation to the life of the surrounding world which it supports.

process, not design, is the crux, and that the beauty and functional harmony of the building comes from a thousand small steps

The architect is committed to daily work and experimentation with techniques of making, forming, fabrication, and construction, with an understanding that new methods of building are essential to the creation of harmony and beauty.

The life of the buildings will never be profound or worthwhile unless the life of the construction workers, and their spiritual evolution, is important.

Post-agile: microservices and heads-up development

Notes from Craft Conference 2015, Budapest.

Craft conference was all about microservices this year.[1] Yet, it was about so much more at the same time — even when it was talking about microservices.

lobby of the venue. Very cool, and always packed

Dan and I went on about microservices in our opening keynote,[2] about how it’s not about size, it’s about each service being a responsible adult and taking care of its own data and dependencies. And being about one bounded context, so that it has fewer conflicting cross-cutting concerns (security, durability, resilience, availability, etc) to deal with at any one time.

But it was Mary Poppendieck, in her Friday morning keynote,[3] who showed us why microservices aren’t going away, not any more than the internet is going away. This is how systems grow: through federation and wide participation. (I wish “federated system” wasn’t taken by some 1990s architecture; I like it better than “microservices.”) Our job is no longer to control everything all the computers do, to make it perfectly predictable.[a]

Instead, we need to adapt to the sociotechnical system around us and our code. No one person can understand all the consequences of their decision, according to Michael Nygard.[4] We can’t SMASH our will upon a complex system, Mary says, but we can poke-poke-poke it; see how it responds; and adjust it to our purposes.

What fun is this?? I went into programming because physics became unsatisfying once I hit quantum mechanics, and I couldn’t know everything all at once anymore. Now I’m fascinated by systems; to work with a system is to be part of something bigger than me, bigger than my own mental model. This is going to be a tough transition for many programmers, though. We spent our training time learning to control computers, and now we are exhorted to give up control, to experiment instead.

And worse: as developers must adapt, so must our businesses. In the closing keynote,[5] Marty Cagan made it very clear that our current model is broken. When most ideas come from executives, implemented according to the roadmap, it doesn’t matter how efficient our agile teams are: we’re wasting our time. Most ideas fail to make money. And the ones that do make money usually take far longer than expected. He ridicules the business case: “How much revenue will it generate? How much will it cost?” We don’t know, and we don’t know! Instead of measuring the impact of an idea after months of development, product teams need to measure in hours or days. And instead of a few ideas from upper management, we need to try out many ideas from the most fruitful source: developers. Because we’re most in the position to see what has just become possible.

Exterior of the venue! (after the tent is down.)

I’d say “developers are a great source of innovation,” except Alf Rehn reports that the word has been drained of meaning.[6] Marty Cagan corroborates that by using “ideas” throughout his keynote instead of “innovation.” So where do these ideas come from? Diversity, says Pieter Hintjens,[7] let people try lots of things. Discovery, says Mike Nygard, let them see what other teams are doing.

Ideas come from having our heads up, not buried only in the code. They come from the first objective of software architecture: understanding the business problem. They come from handing teams an objective, NOT a roadmap. Marty Cagan made that point very clear. Adrian Trenaman concurred,[8] describing how Gilt teams went from a single IT to a team per line of business to a team per initiative. It is about results, measured outcomes.

All these measurements, of results, of expectations, of production service activity, come down to my favorite question – “How do we know what we know?”[b]

Property-based (aka generative) testing is experiencing a resurgence (maybe its first major surgence) lately, as black-box testing around service-level components. In my solo talk,[10] I proposed a possible design for lowering the risk around interacting components. Mary had some other ideas in her talk too, which I will check out. Considering properties of a service can help us find the seams that align simplicity with options.
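
For readers who haven’t seen the style, here is a tiny property-based test in Python using the Hypothesis library; the encode/decode pair is a made-up stand-in for a service boundary. The point is the shape of the test: state a property that must hold for all inputs and let the library hunt for counterexamples.

```python
# A tiny property-based (generative) test: Hypothesis generates the records,
# including the weird ones, and checks that a round trip through the
# (hypothetical) wire format gives back exactly what went in.
import json
from hypothesis import given, strategies as st

def encode(record: dict) -> bytes:
    return json.dumps(record, sort_keys=True).encode("utf-8")

def decode(payload: bytes) -> dict:
    return json.loads(payload.decode("utf-8"))

records = st.dictionaries(
    keys=st.text(min_size=1),
    values=st.one_of(st.integers(), st.text(), st.booleans(), st.none()),
)

@given(records)
def test_round_trip(record):
    assert decode(encode(record)) == record
```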

Mike Nygard remarked that the most successful microservices implementations he’s seen started as a monolith, where refactoring identified those seams. There’s nothing wrong with a monolith when that supports the business objectives; Randy Shoup said that microservices solve scaling problems, not business problems.[9] Mike and Adrian both pointed out that a target architecture is not a revolution, but an evolving direction. Architecture is like a city: as we build microservices in the new, hip part of town, those legacy tenements are still useful. The architecture is done only when the company goes out of business. Instead of working to a central plan, we want to develop situational awareness (“knowing what’s happening in time to do something about it”[3]), and choose to work on what’s most important right now.

It isn’t enough to be good at coding anymore. The new “full-stack” is from network to customer. Marty: if your developers are only coding, you’re not getting half their value. I want to do heads-up development. “Software Craftsmanship is less about internal efficiency, and more about engaging with the world around us,” says Alf Rehn. “Creators need an immediate connection to what they are creating,” quotes Mary Poppendieck.

As fun as it is to pop the next story off the roadmap and sit down and code it, we can have more impact. We can look up, as developers, as organizations. We can look at results, not requirements. We can learn from consequences, as well as conferences.

This transition won’t be easy. It’s the next step after agile. Microservices are a symptom of this kind of focus, the way good retrospectives are associated with constant improvement. Sure, it’s all about microservices – in that microservices are about reducing friction and lowering risk. The faster we can learn, the farther we can get.


I’ll add the links as Gergely posts the videos.

[1] Maciej was starting to get bored
[2] my keynote with Dan, “Complexity is Outside the Code”
[3] Mary Poppendieck’s keynote, “The New New Software Development Game”
[a] Viktor Klang: “Writing software that is completely deterministic is nonsense because no machine is completely deterministic,” much less the network.
[4] Mike Nygard’s talk, “Architecture Without an End State”
[5] Marty’s keynote
[6] Alf Rehn (ah!  what a beautiful speaker! such rhythm!) keynote. Maybe he didn’t allow recording?
[7] Pieter’s talk
[8] Adrian’s talk, “Scaling Micro-services at Gilt”
[b] OK my real favorite question is “What is your favorite color?” but this is a deep second.
[9] Randy’s talk, “From the Monolith to Microservices”
[10] my talk, “Contracts in Clojure: a compromise between types and tests”

The opposite of simple is not complex

Studying biology or economics, one finds organisms, ecosystems, and economies that are more than the sum of their parts. Somehow many interacting agents with limited information produce increasing organization, creating amazing complexity out of relatively simple components.

In computing, if we want to harvest this potential for surprise, see results this interesting, we have to write complex systems.
This doesn’t mean we want to write complicated systems.
Wait, isn’t that contradictory? Complex systems that are not complicated?
Ah, but that is the whole point of complexity theory. 
Herbert Simon described a complex system as being composed of hierarchy and near-decomposable parts.[1] Just as an economy is composed of many humans making decisions independently, a piece of software can be composed of many parts that can’t see inside each other. This preserves the potential for producing complex behavior, solving a complex problem. Yet, each part within the system may provide a simple abstraction.
The difference between complex and complicated – or, as Rich Hickey calls it, complected – is intertwining the different modules in the system. An application may consist of many uniform modules, each of which knows about the inner workings of the others — complicated. Or it may consist of even more modules, each of which interacts only with the abstractions exposed by the others — simpler, and yet with potentially complex results.
The key is breaking the software into independent modules, each of which says to the others “I don’t know how you do what you do, and I don’t care.” Each module or component can be developed by a separate team with different coding standards or in a different language. Uniformity is compromised *gasp*. No single architect understands all parts of the system. Strict top-down, God-like orchestration is sacrificed. Instead, component interactions are abstract. Teams interact at higher levels, without knowledge of each others’ code.
Each component handles versioning, security, releases, backwards compatibility – everything, in its own way. Reuse is at the component level, at the level of release, not at the class or function level.
A system based on independent components like these probably contains more code than a functionally equivalent system that is orchestrated top-down. That’s okay. It has potential to support much greater solution complexity.
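
A minimal sketch of what “I don’t know how you do what you do, and I don’t care” looks like at the code level, with hypothetical names: one component depends on a small abstraction, not on another component’s internals, so each team can implement its side however it likes.

```python
# Components interacting only through an abstraction (hypothetical names).
# Checkout neither knows nor cares how inventory is managed; any object that
# satisfies the Inventory protocol will do.
from typing import Protocol

class Inventory(Protocol):
    def reserve(self, sku: str, quantity: int) -> bool: ...

class WarehouseInventory:
    """One team's implementation; its internals are its own business."""
    def __init__(self) -> None:
        self._stock = {"LAMP-19C": 3}

    def reserve(self, sku: str, quantity: int) -> bool:
        if self._stock.get(sku, 0) >= quantity:
            self._stock[sku] -= quantity
            return True
        return False

def checkout(inventory: Inventory, sku: str, quantity: int) -> str:
    """Another team's component, written against the abstraction only."""
    return "confirmed" if inventory.reserve(sku, quantity) else "backordered"

print(checkout(WarehouseInventory(), "LAMP-19C", 2))  # confirmed
```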

A fact of software development: 

For every 25 percent increase in problem complexity, there is a 100 percent increase in complexity of the software solution. That’s not a condition to try to change (even though reducing complexity is always a desirable thing to do); that’s just the way it is. [2]

This is true because every feature we add potentially impacts every feature we already support. The increase in solution complexity is combinatorial, not linear. The way to combat this is to reduce the complectedness of our software. Stop intertwining all the bits — a coherent architecture is more efficient in lines of code, but it is inherently limiting. It is limited by what the God Architect can hold in his head.

Break the software into simpler components. Some of these may be open-source components, which someone has to learn well enough to configure and implement. Others will be custom. Break them off into teams and give those teams autonomy. Let each team implement its piece in the simplest way possible for that particular problem.

Meanwhile, the systemwide architects don’t know the particulars of each solution. In fact, they should not know the particulars. That would allow them to base solutions on implementation details. Don’t let that happen! Raise the level of abstraction. Components must interact with abstractions, not with each other. Knowledge of inner workings of other modules is a negative.

Jeff Bezos forced this strategy at Amazon around 2002 by decree: all teams will interact via service interfaces only; they can use whatever technology they want; and all interfaces must be externalizable, ready to be exposed to the outside world. The result? The largest online bookseller became the largest vendor of cloud computing. Did anyone predict that ten years ago?

When you and I interact, our brains do not interact directly. Rather, we both interact with the physical world. I say something, you hear it. Abstractions on both sides reduce the granularity of the interaction, but maintain the independence of our individual brains. No mind-melds allowed. While this seems limiting, observation shows that amazing unpredictable structures – nations, communities, economies – emerge from these limited interactions.

Complicated systems are limited in growth. Complex systems have even greater potential for growth than the designers of the components conceived. Keep your components simple and independent, and there is no end to the problems we can solve.

[1] Complexity, A Guided Tour, by Melanie Mitchell, chapter 7
[2] Facts and Fallacies of Software Engineering, by Robert L. Glass, p. 58

Don Quixote was an Enterprise Architect

We need to invent a word:

?: (n) The goal that you aim for, not because you expect to hit it, but because aiming for it points you in the right direction.

Julian Browne in The Death of Architecture accepts that while an Enterprise Architecture will never be achieved, it can direct our efforts: “Great ideas and plans are what inspires and guides people to focus on making those tactical changes well.”

Having a plan is good. Sticking religiously to the plan is bad. The business is constantly changing, and so the architecture should align with the current status and the latest goals. Architecture is a map, but we operate in Street View. The map is always out of date.

So it is in life. As people we are constantly changing. There are objectives to shoot for, some concrete and achievable, others unrealistic but worth trying. Aim for enlightenment, if you like, but redefine what enlightenment means with each learning. Embrace change, and direct it by choosing where to focus your attention. If our goal is readable code, we’re never going to transform our entire legacy codebase to our current coding standards. Our coding standards will have changed by then. Instead, as we modify pieces of the code, we make these parts the best they can be according to the direction we’ve set.

Aim for the mountaintop, but recalibrate the location of that top. Appreciate each foot of ascent, as the good-software mountain is constantly shifting, and each improvement — each refactor or new test or simplification — counts as progress. An architecture achieved is an architecture out of date – touch the sky, find that it is made of paper, tear through it and aim again for the highest thing you can see.