Library vs service: who controls change?

When you have a common piece of functionality to share between two apps, do you make a library for them to share, or break out a service?

The biggest difference between publishing a library and operating a service is who controls the pace of change.

If you publish a library for people to use, you can put out new versions, but it is up to each application’s team to incorporate that new version. Upgrades are gradual and may never fully happen.

If you operate a service, you control when upgrades happen. You put the new code in production, and poof, everyone is using it. You can upgrade multiple times per day, and you control when each is complete.
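
To make the contrast concrete, here is a minimal sketch (the package name and endpoint are hypothetical): with a library, each consumer pins a version and upgrades on its own schedule; with a service, the operating team upgrades every consumer at once.

```python
# Library reuse: each consuming app pins its own version.
#
#   # requirements.txt in App A
#   pricing-rules==1.4.2    # upgrades when App A's team chooses
#
#   # requirements.txt in App B
#   pricing-rules==1.1.0    # may lag behind for months
#
# Service reuse: every consumer calls the running service, so the
# operating team controls when new logic takes effect for everyone.
import json
import urllib.request

def quote_price(item_id: str) -> float:
    # Hypothetical internal endpoint; the team that deploys it
    # sets the pace of change for all callers at once.
    url = f"https://pricing.internal.example.com/quote/{item_id}"
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp)["price"]
```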

If you need certain logic to be consistent between applications, consider making a service. Control the pace of change.

(For one flipside, see: From Complicated to Complex)

From complicated to complex

Avdi Grimm describes how the book Vehicles illustrates simple parts composing a very complex system.

Another example is Conway’s Game of Life: a nice organized grid, uniform time steps, and four tiny rules. These simple pieces combine to make all kinds of wild patterns, and even look alive!
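
Those rules really are tiny; here is a minimal sketch of the standard rule set, with a glider to show the liveliness:

```python
from collections import Counter

# Conway's Game of Life: four tiny rules on a uniform grid.
def step(live: set[tuple[int, int]]) -> set[tuple[int, int]]:
    # Count live neighbors for every cell adjacent to a live cell.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {
        cell
        for cell, n in neighbor_counts.items()
        # Birth: exactly 3 live neighbors. Survival: 2 or 3.
        # Everything else dies of loneliness or overcrowding.
        if n == 3 or (n == 2 and cell in live)
    }

# A glider: five cells that crawl across the grid forever.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    glider = step(glider)  # after 4 steps: same shape, shifted diagonally
```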

These systems are complex: hard to predict; forces interact to produce surprising behavior.

They are not complicated: they do not have a bunch of different parts, nor a lot of little details.

In software, a monolith is complicated. It has many interconnected parts, each full of details. Conditionals abound, forming intricate rules that may surprise us when we read them.

But we can read them.

As Avdi points out, microservices are different. Break that monolith into tiny pieces, and each piece can be simple. Put those pieces together in a distributed system, and … crap. Distributed systems are inherently complex. Interactions can’t be predicted. Surprising behaviors occur.

Complicated programs can be understood; it’s just a lot to hold in your head. Build a good enough model of the relevant pieces, and you can reason out the consequences of change.

Complex systems can’t be fully understood. We can notice patterns, but we can’t guarantee what will happen when we change them.

Be careful in your aim for simplicity: don’t trade a little complication for a lot of complexity.

Reuse

Developers have a love-hate relationship with code re-use. As in, we used to love it. We love our code and we want it to run everywhere and help everyone. We want to get faster with time by harnessing the work of our former selves.
And yet, we come to hate it. Reuse means dependencies. It means coupling. It means surprises, when changing code impacts something we did not expect, or else it means don’t touch it, it’s too scary. It means trusting code we don’t understand because it’s code we didn’t write.

Here’s the thing: sharing code is dangerous. Do it sparingly.

When reuse is bad

Let’s talk about sharing code within an organization: a business developing software for its employees or its customers, with code that is referenced in more than one service, or by multiple flows in a monolith. (Monolith is defined as “one deployable unit maintained by more than one small team.”)

Let’s see some pictures. Purple Service here has some classes or functions that it finds useful, and the team thinks these would be useful elsewhere. Purple team breaks this code out into a library, the peachy circle.

(picture: purple circle with a peach circle inside)

Then someone from Purple team joins Blue team, and uses that library in Blue Service. You think it looks like this:

(picture: peach circle under the blue and purple circles)

Nah, it’s really more like this:

(picture: purple circle with the peach circle inside; the blue circle has a line to the peach circle)

This is called coupling. When Purple team changes their library, Blue team is affected. (If it’s a monolith, their code changed underneath them. I hope they have good tests.)
Now, you could say: Blue team doesn’t have to update their version. The level of reuse is the release; we broke out the library, so this is fine.

(picture: purple circle with an orange circle inside; blue circle with a peach circle inside)

At that point you’ve basically forked; the code isn’t shared anymore. When Blue team needs to make their own changes, they first must upgrade, so they get surprised at some unpredictable time later. (This happened to us at Outpace all the time with our shared “util” libraries, and it was the worst. So painful. Those “timesavers” cost us a lot of time and frustration.)

This shared code is a coupling between two services that otherwise have nothing to do with each other. The whole point of microservices was to decouple! To make it so our changes impact only code that our team operates! That promise is dead. And for what?

To answer that, consider the nature of the shared code. Why is it shared?
Perhaps it is unrelated to the business: it is general utilities that would otherwise be duplicated, but we’re being DRY and avoiding the extra work of writing and testing and debugging them a second time. In this case, I propose: cut and paste. Or fork. Or best of all, try a more formalized reuse-without-sharing procedure [link to my next post].

What if this is business-related code? What if we had good reason to DRY it out, because it would be wrong for this code to be different in Purple Service and Blue Service? Well sorry, it’s gonna be different. Purple and Blue do not have the same deployment schedules, that’s the point of decoupling into services. In this case, either you’ve made yourself a distributed monolith (requiring coordinated deployments), or you’re ignoring reality. If the business requires exactly one version of this code, then make it its own service.

(picture: yellow, purple, and blue circles, all separate, with dotted lines from yellow to purple and to blue)

Now you’re not sharing code anymore. You’re sharing a service. Changes to Peachy can impact Purple and Blue at the same time, because that’s inherent in this must-be-consistent business logic.
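
Here’s a minimal sketch of that move, assuming Flask and a made-up tax rule: the must-be-consistent logic is deployed in exactly one place, and Purple and Blue call it instead of linking a shared library.

```python
from flask import Flask, jsonify

app = Flask(__name__)

TAX_RATE = 0.27  # hypothetical; the point is there's exactly one in production

@app.route("/tax/<int:subtotal_cents>")
def tax(subtotal_cents: int):
    # Purple Service and Blue Service both call this endpoint, so a
    # deploy here changes the answer for both at the same moment.
    return jsonify({
        "subtotal_cents": subtotal_cents,
        "tax_cents": round(subtotal_cents * TAX_RATE),
    })

if __name__ == "__main__":
    app.run(port=8080)
```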

It’s easier with a monolith; that shared code stays consistent in production, because there is only one deployment. Any surprises happen immediately, hopefully in testing. In a monolith, if Peachy is utility classes or functions, and Purple (or Blue) team wants to change them, the safest strategy is: make a copy, use the copy, and change that copy. Over time, this results in less shared code.

This crucial observation is #2 in Modern Software Over-engineering Mistakes by RMX.

“Shared logic and abstractions tend to stabilise over time in natural systems. They either stay flat or relatively go down as functionality gets broader.”

Business software is an expanding problem. It will always grow, and not with more of the same: it will grow in ways you didn’t plan for. This kind of code must optimize for change. Reuse is the enemy of change. (I’m talking about reuse of internal code.)

Back in the beginning, Blue team reused the peach library and saved time writing code. But writing code isn’t the expensive part, compared to changing code. We don’t add features faster as our systems get larger and we have more code hypothetically available for re-use. We add features more slowly, because every change has more impacts and is less safe. Shared code makes change less safe. The only code safe to share is code that doesn’t change. Which means no versioning. Heck, you might as well have cut and pasted it.

When reuse is good

We didn’t advance as an industry by rewriting, or cut and pasting, everything we need over and over. We build on libraries published by developers and companies all over the globe. They release them, we reuse them. Yes, we get into dependency hell, but it beats writing your own web framework. We get reuse not only of the code, but of understanding: Rails knowledge transfers between employers.

There is a tipping point where reuse is magical.

I argue that this point is well past a release, past a separate jar.
It is past a stable API,
past a coherent abstraction,
past automated tests,
past solid documentation…

All these might be achieved within the organization if responsibility for the shared utilities lives in a separate team; you can try to use Conway’s Law to enforce architectural boundaries, but within an org, those boundaries are soft. Besides, this code isn’t your business, and you don’t have incentives to spend the time on them. Why have backwards compatibility when you can perform human coordination instead? It isn’t worth it. In my past organizations, shared code has instead been the responsibility of no one. What starts out as “leverage” becomes baggage, as all the Ruby code is tied to an old version of Sinatra. Some switch to Go to get a clean slate.
Break those chains! Copy the pieces you need out of that internal library and make them yours.

At the level of winning reuse, that code has its own marketing department,
its own sales team,
its own office manager,
its own stock price.

The level of reuse is the company.

(Pay for software.)

When the responsible organization succeeds by making its code stable and backwards-compatible and easy to work with and well-documented and extensively tested, that is code I want to reuse!

In addition to SaaS companies and vendors, there are organizations built around open-source software. This is why we look for packages and frameworks with a broad community around them. Or better, a foundation for keeping shared projects healthy. (Contribute to them.)

Conclusion

Reuse is dangerous because it introduces coupling. Share business code only when that coupling is inherent to the business domain. Share library and utility code only when it is maintained by an organization dedicated to publishing that code. (Same with services. If you can pay for infrastructure-level tools, you’ll get better tools without distracting your organization.)

Why did we want to reuse internal code anyway?
For speed, but speed of change is more important.
For consistency, but that means coupling. Don’t hold your teams back with it.
For propagation of bug fixes, which I’ve not seen happen.

All three of these can be automated [LINK to my next post] without dependencies.

Next time you consider making your code reusable, ask “who will I sell this to?”
Next time someone (including you) suggests you reuse their code, ask “who publishes that?” and if they say “me,” copy it instead.

Tradeoffs in Coordination Among Teams

The other day in Budapest, Jez Humble and I wondered, what is the CAP theorem for teams? In distributed database systems, the CAP theorem says: choose two of Consistency, Availability, and Partition tolerance — and you must choose Partition tolerance.
Consider a system for building software together. Unless the software is built by exactly one person, we have to choose Partition tolerance: we can’t meld minds, and talking is slow.
In databases we choose between Consistency (the data is the same everywhere) and Availability (we can always get the data). As teams grow, we choose between Consensus (doing things for the same reasons in the same way) and Actually-getting-things-done.
Or, letting go of the CAP acronym: we balance Moving Together against Moving Forward.

Moving Together

A group of 1 is the trivial case. Decision-making is the same as consensus. All work is forward work, but output is very limited, and when one person is sick everything stops.
A group of 2-7 is ideal: communication comes with an interplay of ideas, and the new output of dialogue makes up for the time cost of talking to each other. It is still feasible for everyone in the group to have a mental model of each other person, to know what that person needs to know. Consensus is easy to reach when every stakeholder is friends with every other stakeholder.
Beyond one team, the tradeoffs begin. Take one team of 2-7 people working closely together. Represent their potential output with this tall, hollow arrow pointing up.
This team is building software to run an antique store. Look at them go, full forward motion. (picture: tall, filled arrow.)
Next we add more to the web site while continuing development on the register point-of-sale tools. We break into two teams. We’re still working with the same database of items, and building the same brand, so we coordinate closely. We leverage each other’s tools. More people means more coordination overhead, but we all like each other, so it’s not much burden. We are a community, after all.
(picture: a green arrow and a red arrow, each connected by many lines of communication, filled about halfway up with work.)
Now the store is doing well. The web site attracts more retail business, the neighboring antique stores want to advertise their items on our site, everything is succeeding and we add more people. A team for partnerships, which means we need externally-facing reports, which means we need a data pipeline.
(picture: a purple arrow and a blue arrow join the red and green ones. Lines crisscross between them, a snarly web. The arrows are filled only a little, because of coordination costs. The purple arrow is less connected and a bit more full, but it’s pointed thirty degrees to the left.)
The same level of consensus and coordination isn’t practical anymore. Coordination costs weigh heavily. New people coming in don’t get to build a mental model of everyone who already works there. They don’t know what other people know, or which other people need to know something. If the partnerships team touches the database, it might break point of sale or the web site, so they are hamstrung. Everyone needs to check everything, so the slowest-to-release team sets the pace. The purple team here is spending less time on coordination, so the data pipeline is getting built, but without any ties to the green team, it’s going in a direction that won’t work for point of sale.
This mess scales up in the direction of mess. How do we scale forward progress instead?

Moving Forward

The other extreme is decoupling. Boundaries. A very clear API between the data pipeline, point of sale, and web. Separate databases, duplicating data when necessary. This is a different kind of overhead: more technical, less personal. Break the back-end coupling at the database; break the front-end (API) coupling with backwards compatibility. Teams operate on their own schedules, releases are not coordinated. This is represented by wider arrows, because backwards compatibility and graceful degradation are expensive. 
(picture: four arrows, each wide and short, with a few lines connecting them. They’re filled, but the work went to width (solidness) rather than height (forward progress).)
These teams are getting about as far as the communication-burdened teams. The difference is: this does scale out. We can add more teams before coordination becomes a limitation again.
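
Concretely, breaking the front-end coupling with backwards compatibility can be as mundane as one endpoint accepting both the old and the new request shape. A sketch, with hypothetical field names:

```python
def parse_sale(payload: dict) -> dict:
    """Accept both the old and the new wire format for a sale."""
    if "price_cents" in payload:          # new clients
        cents = payload["price_cents"]
    elif "price" in payload:              # old clients: dollars as a float
        cents = round(payload["price"] * 100)
    else:
        raise ValueError("sale needs price_cents or price")
    return {"item_id": payload["item_id"], "price_cents": cents}

# Old and new callers keep working while each team releases on its own schedule.
assert parse_sale({"item_id": "a1", "price": 12.50}) == \
       parse_sale({"item_id": "a1", "price_cents": 1250})
```
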
Amazon is an extreme example of this: backwards compatible all the things. Each team Moving Forward in full armor. Everything fully separate, so no team can predict what other teams depend on. This made the AWS products possible. However, this is a ton of technical overhead, and maybe also not the kindest culture to work in.
Google takes another extreme. Their monorepo allows more coupling between teams. Libraries are shared. They make up for this with extreme tooling. Tests, refactoring tools, custom version control and build systems — even whole programming languages. Thousands of engineers work on infrastructure at Google, so that they can Move Together using technical overhead.

Balance

For the rest of us, in companies with 7-1000 engineers, we can’t afford one extreme or the other. We have to ask: where is consensus important? and where is consensus holding us back?
Consensus is crucial in objectives and direction. We are building the same business. The business results we are aiming for had better be the same. We all need to agree on “Which way is up?”
Consensus is crippling at the back end. When we require any coordination of releases. When I can’t upgrade a library without impacting other teams in ways I can’t predict. When my database change could break a system more production-critical than mine. This is when we are paralyzed. Don’t make teams share databases or libraries.
What about leveraging shared tools and expertise? If every team runs its own database, those arrows get really wide really fast, unless they skimp on monitoring and redundancy — so they will skimp, and the system will be fragile. We don’t want to reinvent everything in every team.
The answer is to have a few wide arrows. Shared tools are great when they’re maintained as internal services, by teams with internal customers. Make the data pipeline serve the partnership and reporting teams. Make a database team to supply well-supported database instances to the other teams. (They’re still separate databases, but now we have shared tools to work with them, and hey look, a data pipeline for syncing between them.)
(picture: the green, red, and blue arrows are narrow and tall, and mostly full of work, with some lines connecting them. The purple arrow and a new black arrow are wide and short and full of work. The wide arrows (internal services) connect to the tall arrows (product teams) through their tips.)
Re-use helps only when there is a solid API, when there is no coupling of schedules, and when the providing team focuses on customer service.

Conclusions

Avoid shared code libraries, unless you’re Google and have perfect test coverage everywhere, or you’re Amazon and have whole teams supporting those libraries with backwards compatibility.
Avoid shared database instances, but build internal teams around supporting common database tools.
Encourage shared ideas. Random communication among people across an organization has huge potential. Find out what other teams are doing, and that can refine your own direction and speed your development — as long as everything you hear is information, not obligation.
Reach consensus on what we want to achieve, why we are doing it, and how (at a high level) we might achieve it. Point in the same direction, move independently.
Every organization is a distributed system, even when we sit right next to each other. Coordination makes joint activity possible, but not free. Be conscious of the tradeoffs as your organization grows, as consensus becomes less useful and more expensive. Recognize that new people entering the organization experience higher coordination costs than you do. Let teams move forward in their own way, as long as we move together in a common direction. Distributed systems are hard, and we can do it.


Bonus material 

Here is a picture of Jez in Budapest.

And here is a paper about coordination costs:
Common Ground and Coordination in Joint Activity

Post-agile: microservices and heads-up development

Notes from Craft Conference 2015, Budapest.

Craft conference was all about microservices this year.[1] Yet, it was about so much more at the same time — even when it was talking about microservices.

(picture: lobby of the venue. Very cool, and always packed.)

Dan and I went on about microservices in our opening keynote,[2] about how it’s not about size, it’s about each service being a responsible adult and taking care of its own data and dependencies. And being about one bounded context, so that it has fewer conflicting cross-cutting concerns (security, durability, resilience, availability, etc.) to deal with at any one time.

But it was Mary Poppendieck, in her Friday morning keynote,[3] who showed us why microservices aren’t going away, not any more than the internet is going away. This is how systems grow: through federation and wide participation. (I wish “federated system” wasn’t taken by some 1990s architecture; I like it better than “microservices.”) Our job is no longer to control everything all the computers do, to make it perfectly predictable.[a]

Instead, we need to adapt to the sociotechnical system around us and our code. No one person can understand all the consequences of their decision, according to Michael Nygard.[4] We can’t SMASH our will upon a complex system, Mary says, but we can poke-poke-poke it; see how it responds; and adjust it to our purposes.

What fun is this?? I went into programming because physics became unsatisfying once I hit quantum mechanics, and I couldn’t know everything all at once anymore. Now I’m fascinated by systems; to work with a system is to be part of something bigger than me, bigger than my own mental model. This is going to be a tough transition for many programmers, though. We spent our training time learning to control computers, and now we are exhorted to give up control, to experiment instead.

And worse: as developers must adapt, so must our businesses. In the closing keynote,[5] Marty Cagan made it very clear that our current model is broken. When most ideas come from executives, implemented according to the roadmap, it doesn’t matter how efficient our agile teams are: we’re wasting our time. Most ideas fail to make money. And the ones that do make money usually take far longer than expected. He ridicules the business case: “How much revenue will it generate? How much will it cost?” We don’t know, and we don’t know! Instead of measuring the impact of an idea after months of development, product teams need to measure in hours or days. And instead of a few ideas from upper management, we need to try out many ideas from the most fruitful source: developers. Because we’re most in the position to see what has just become possible.

(picture: exterior of the venue, after the tent is down.)

I’d say “developers are a great source of innovation,” except Alf Rehn reports that the word has been drained of meaning.[6] Marty Cagan corroborates that by using “ideas” throughout his keynote instead of “innovation.” So where do these ideas come from? Diversity, says Pieter Hintjens,[7] let people try lots of things. Discovery, says Mike Nygard, let them see what other teams are doing.

Ideas come from having our heads up, not buried only in the code. They come from the first objective of software architecture: understanding the business problem. They come from handing teams an objective, NOT a roadmap. Marty Cagan made that point very clear. Adrian Trenaman concurred,[8] describing how Gilt went from a single IT organization to a team per line of business to a team per initiative. It is about results, measured outcomes.

All these measurements, of results, of expectations, of production service activity, come down to my favorite question – “How do we know what we know?”[b] Property-based (aka generative) testing is experiencing a resurgence (maybe its first major surgence) lately, as black-box testing around service-level components. In my solo talk,[10] I proposed a possible design for lowering the risk around interacting components. Mary had some other ideas in her talk too, which I will check out. Considering the properties of a service can help us find the seams that align simplicity with options.
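
For a taste of what that black-box, property-based testing looks like, here is a minimal sketch using the Hypothesis library against a made-up price-formatting component:

```python
from hypothesis import given, strategies as st

def format_cents(cents: int) -> str:
    # The component under test: illustrative only.
    return f"{cents // 100}.{cents % 100:02d}"

def parse_dollars(text: str) -> int:
    dollars, cents = text.split(".")
    return int(dollars) * 100 + int(cents)

@given(st.integers(min_value=0, max_value=10**12))
def test_roundtrip(cents):
    # Property: formatting then parsing returns the original amount,
    # for any amount the generator can dream up.
    assert parse_dollars(format_cents(cents)) == cents

test_roundtrip()  # Hypothesis runs the test with many generated inputs
```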

Mike Nygard remarked that the most successful microservices implementations he’s seen started as a monolith, where refactoring identified those seams. There’s nothing wrong with a monolith when that supports the business objectives; Randy Shoup said that microservices solve scaling problems, not business problems.[9] Mike and Adrian both pointed out that a target architecture is not a revolution, but an evolving direction. Architecture is like a city: as we build microservices in the new, hip part of town, those legacy tenements are still useful. The architecture is done only when the company goes out of business. Instead of working to a central plan, we want to develop situational awareness (“knowing what’s happening in time to do something about it”[3]), and choose to work on what’s most important right now.

It isn’t enough to be good at coding anymore. The new “full-stack” is from network to customer. Marty: if your developers are only coding, you’re not getting half their value. I want to do heads-up development. “Software Craftsmanship is less about internal efficiency, and more about engaging with the world around us,” says Alf Rehn. “Creators need an immediate connection to what they are creating,” quotes Mary Poppendieck.

As fun as it is to pop the next story off the roadmap and sit down and code it, we can have more impact. We can look up, as developers, as organizations. We can look at results, not requirements. We can learn from consequences, as well as conferences.

This transition won’t be easy. It’s the next step after agile. Microservices are a symptom of this kind of focus, the way good retrospectives are associated with constant improvement. Sure, it’s all about microservices – in that microservices are about reducing friction and lowering risk. The faster we can learn, the farther we can get.


I’ll add the links as Gergely posts the videos.

[1] Maciej was starting to get bored
[2] my keynote with Dan, “Complexity is Outside the Code”
[3] Mary Poppendieck’s keynote, “The New New Software Development Game”
[a] Viktor Klang: “Writing software that is completely deterministic is nonsense because no machine is completely deterministic,” much less the network.
[4] Mike Nygard’s talk, “Architecture Without an End State”
[5] Marty’s keynote
[6] Alf Rehn (ah!  what a beautiful speaker! such rhythm!) keynote. Maybe he didn’t allow recording?
[7] Pieter’s talk
[8] Adrian’s talk, “Scaling Micro-services at Gilt”
[b] OK my real favorite question is “What is your favorite color?” but this is a deep second.
[9] Randy’s talk, “From the Monolith to Microservices”
[10] my talk, “Contracts in Clojure: a compromise between types and tests”

Microservices, Microbusinesses

To avoid duplication of effort, we can build software uniformly. Everything in one monolithic system or (if that gets unwieldy) in services that follow the same conventions, in the same language, developed with uniform processes, released at the same time, connecting to the same database.

That may avoid duplication, but it doesn’t make great software. Excellence in architecture these days involves specialization — not by task, but by purpose. Microservices divide a software system into fully independent pieces. Instead of limiting which pieces talk to which, they specify clear APIs and let both the interactions and the innards vary as needed.

A good agile team is like a microservice: centered around a common purpose. In each retro, the team looks for ways to optimize their workings, ways particular to that team’s members, context, objectives.

When a service has a single purpose[1], it can focus on the problems important to that purpose. Authentication? Availability, consistency, and security matter most. Search? Speed is crucial; repeatability is not. Each microservice can use the database suited to its priorities, and change it out when growth exceeds capacity. The business logic at the core of each service is optimized for efficient and clear execution.

Independence has a cost. Each service is a citizen in the ecosystem. That means accepting requests that come in, with backwards compatibility. It means sending requests and output in the format other services require, not overloading services it depends on, and handling failure of any downstream service. Basically, everybody is a responsible adult.

That’s a lot of overhead and glue code. Every service has to do translation from input to its internal format, and then to whatever output format someone else requires. Error handling, caching or throttling, failover and load balancing and monitoring, contract testing, maintaining multiple interface versions, database interaction details. Most of the code is glue, layers of glue protecting a small core of business logic. These strong boundaries allow healthy relationships with other services, including new interactions that weren’t designed into the architecture from the beginning. For all this work, we get freedom on the inside. We get the opportunity to exceed expectations, rather than straining standardized languages and tools to meet requirements.
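
As a sketch of those layers (the endpoint and field names are hypothetical): translation at the edges, failure handling around the downstream call, and a small core in the middle.

```python
import json
import urllib.request
from urllib.error import URLError

def core_discount(total_cents: int, member_years: int) -> int:
    # The small core of business logic.
    return total_cents * min(member_years, 10) // 100

def lookup_member_years(member_id: str) -> int:
    try:  # glue: handle failure of the downstream service
        url = f"https://members.internal.example.com/{member_id}"
        with urllib.request.urlopen(url, timeout=2) as resp:
            return json.load(resp)["years"]
    except (URLError, KeyError, TimeoutError):
        return 0  # degrade gracefully: no discount rather than an error

def handle_request(raw_body: bytes) -> dict:
    payload = json.loads(raw_body)  # glue: translate input to internal format
    years = lookup_member_years(payload["member_id"])
    discount = core_discount(payload["total_cents"], years)
    return {"discount_cents": discount}  # glue: the output format others require
```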

Do the teams in your company have the opportunity to optimize in the same way?

I’ve worked multiple places where management decreed that all teams would track work in JIRA, with umpteen required fields, because they like how progress tracking rolls up. They can see high-level numbers or drill down into each team’s work. This is great for legibility.[3] All the work fits into nice little boxes that add together. However, what suits the organizational hierarchy might not suit the work of an individual team.

Managers love to have a standard programming language, standard testing tools with reports that roll up, standard practices. This gives them the feeling that they can move people from team to team with less impact to performance. If that feeling is accurate, it’s only because the level of performance is constrained everywhere.

Like software components, teams have their own problems to solve. Communication between individuals matters most, so optimize for that[2]. Given the freedom to vary practices and tools, a healthy agile team gets better and better. The outcome of a given day’s work is not only a task completed: it includes ideas about how to do this better next time.

Tom Marsh says, “Make everyone in your organisation believe that they are working in a business unit about 10 people big.” A team can learn from decisions and make better ones and spiral upward into exceptional performance (plus innovation), when internal consensus is enough to implement a change. Like a microbusiness.

Still, a team exists as a citizen inside a larger organization. There are interfaces to fulfill. Management really does need to know about progress. Outward collaboration is essential. We can do this the same way our code does: with glue. Glue made of people. One team member, taking the responsibility of translating cards on the wall into JIRA, can free the team to optimize communication while filling management’s strongest needs.

Management defines an API. Encapsulate the inner workings of the team, and expose an interface that makes sense outside. By all means, provide a reference implementation: “Other teams use Target Process to track work.” Have default recommendations: “We use Clojure here unless there’s some better solution. SQL Server is our first choice database for these reasons.” Give teams a strong process and technology stack to start from, and let them innovate from there.

On a healthy team, people are accomplishing something together, and that’s motivating. When we feel agency to share and try out ideas, when the external organization is only encouraging, then a team can gel, excel and innovate. This cohesive team culture (plus pairing) brings new members up to speed faster than any familiarity with tooling.

As in microservices, it takes work to fulfill external obligations and allow internal optimization. This duplication is not waste. Sometimes a little overhead unleashes all the potential.


[1] The “micro” in microservices is a misnomer: the only size restriction on a microservice is Conway’s Law: each is maintained by a single team, 10 or fewer people. A team may maintain one or more system components, and one team takes full responsibility for the functioning of the piece of software.

[2] Teams work best when each member connects with each other member (NYTimes)

[3] Seeing Like a State points out how legibility benefits the people in charge, usually at the detriment of the people on the ground. Sacrificing local utility for upward legibility is … well, it’s efficiency over innovation.