Thursday, February 23, 2017

Reuse

Developers have a love-hate relationship with code re-use. As in, we used to love it. We love our code and we want it to run everywhere and help everyone. We want to get faster with time by harnessing the work of our former selves.
And yet, we come to hate it. Reuse means dependencies. It means couplings. It means surprises, when changing code impacts something we did not expect, or else it means don't touch it, it's too scary. It means trusting code we don't understand because it's code we didn't write.

Here's the thing: sharing code is dangerous. Do it sparingly.

When reuse is bad


Let's talk about sharing code. Take a business, developing software for its employees or its customers. Let's talk about code within an organization that is referenced in more than one service, or by multiple flows in a monolith. (Monolith is defined as "one deployable unit maintained by more than one small team.")

Let's see some pictures. Purple Service here has some classes or functions that it finds useful, and the team thinks these would be useful elsewhere. Purple team breaks this code out into a library, the peachy circle.

purple circle, peach circle inside

Then someone from Purple team joins Blue team, and uses that library in Blue Service. You think it looks like this:
peach circle under blue and purple circles


Nah, it's really more like this:
purple circle with peach circle inside. Blue circle has a line to peach circle


This is called coupling. When Purple team changes their library, Blue team is affected. (If it's a monolith, their code changed underneath them. I hope they have good tests.)
Now, you could say, Blue team doesn't have to update their version. The level of reuse is the release, we broke out the library, so this is fine.
picture of purple with orange circle, blue with peach circle.

At that point you've basically forked; the code isn't shared anymore. When Blue team needs to make their own changes, they first must upgrade, so they get surprised at some unpredictable later time. (This happened to us at Outpace all the time with our shared "util" libraries and it was the worst. So painful. Those "timesavers" cost us a lot of time and frustration.)

This shared code is a coupling between two services that otherwise have nothing to do with each other. The whole point of microservices was to decouple! To make it so our changes impact only code that our team operates! Dead. And for what?

To answer that, consider the nature of the shared code. Why is it shared?
Perhaps it is unrelated to the business: it is general utilities that would otherwise be duplicated, but we're being DRY and avoiding the extra work of writing and testing and debugging them a second time. In this case, I propose: cut and paste. Or fork. Or best of all, try a more formalized reuse-without-sharing procedure [link to my next post].

What if this is business-related code? What if we had good reason to DRY it out, because it would be wrong for this code to be different in Purple Service and Blue Service? Well sorry, it's gonna be different. Purple and Blue do not have the same deployment schedules, that's the point of decoupling into services. In this case, either you've made yourself a distributed monolith (requiring coordinated deployments), or you're ignoring reality. If the business requires exactly one version of this code, then make it its own service.
picture with yellow, purple, and blue circles separate, dotty lines from yellow to purple and to blue.


Now you're not sharing code anymore. You're sharing a service. Changes to Peachy can impact Purple and Blue at the same time, because that's inherent in this must-be-consistent business logic.
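
Here is a rough sketch of that shape, in Python. Everything in it is made up for illustration: the discount rule, the names DiscountService, purple_checkout_total and blue_quote_text, and the in-process class standing in for what would really be a separately deployed service behind a network call. The point is only that Purple and Blue each hold a client, not a copy of the rule.

```python
from typing import Protocol


class DiscountService(Protocol):
    """Hypothetical contract that Purple and Blue both depend on."""

    def discount_percent(self, customer_tier: str) -> int: ...


class InProcessDiscountService:
    """Stand-in for the single deployed Peachy service.

    Assumption: in real life this lives in its own deployment and
    callers reach it over the network. It is in-process here only so
    the sketch runs on its own.
    """

    def __init__(self) -> None:
        self._rules = {"gold": 20, "silver": 10}

    def discount_percent(self, customer_tier: str) -> int:
        # Unknown tiers get no discount.
        return self._rules.get(customer_tier, 0)

    def update_rule(self, customer_tier: str, percent: int) -> None:
        # One change here is visible to every caller at the same moment.
        self._rules[customer_tier] = percent


def purple_checkout_total(cents: int, tier: str, svc: DiscountService) -> int:
    """Purple Service: applies the discount to an order total (in cents)."""
    return cents * (100 - svc.discount_percent(tier)) // 100


def blue_quote_text(tier: str, svc: DiscountService) -> str:
    """Blue Service: describes the same discount in a sales quote."""
    return f"{tier} customers save {svc.discount_percent(tier)}%"
```

Neither Purple nor Blue contains the rule, so there is no version of it for them to fall behind on. When the rule changes in the service, both see the change together, which is exactly the coupling the business asked for.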

It's easier with a monolith; that shared code stays consistent in production, because there is only one deployment. Any surprises happen immediately, hopefully in testing. In a monolith, if Peachy is utility classes or functions, and Purple (or Blue) team wants to change them, the safest strategy is: make a copy, use the copy, and change that copy. Over time, this results in less shared code.
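
A minimal sketch of the copy-then-change move, in Python. The helper and both callers are hypothetical (format_amount, purple_format_amount, and the two line-building functions are names I invented), and it assumes non-negative amounts in cents. Purple wants thousands separators; Blue never asked for them.

```python
def format_amount(cents: int) -> str:
    """Shared Peachy helper: render non-negative cents as dollars."""
    return f"${cents // 100}.{cents % 100:02d}"


def purple_format_amount(cents: int) -> str:
    """Purple's own copy of the helper, changed to add thousands separators.

    The shared format_amount is left alone, so Blue's output does not
    change underneath them.
    """
    return f"${cents // 100:,}.{cents % 100:02d}"


def blue_invoice_line(cents: int) -> str:
    # Blue keeps using the original helper, untouched.
    return "Total: " + format_amount(cents)


def purple_report_line(cents: int) -> str:
    # Purple uses its copy and is free to keep changing it.
    return "Revenue: " + purple_format_amount(cents)
```

Purple got its change and Blue got no surprise. Every copy like this is one less piece of shared code.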

This crucial observation is #2 in Modern Software Over-engineering Mistakes by RMX.
"Shared logic and abstractions tend to stabilise over time in natural systems. They either stay flat or relatively go down as functionality gets broader."
Business software is an expanding problem. It will always grow, and not with more of the same: it will grow in ways you didn't plan for. This kind of code must optimize for change. Reuse is the enemy of change. (I'm talking about reuse of internal code.)

Back in the beginning, Blue team reused the peach library and saved time writing code. But writing code isn't the expensive part, compared to changing code. We don't add features faster as our systems get larger and we have more code hypothetically available for re-use. We add features more slowly, because every change has more impacts and is less safe. Shared code makes change less safe. The only code safe to share is code that doesn't change. Which means no versioning. Heck, you might as well have cut and pasted it.

When reuse is good


We didn't advance as an industry by rewriting, or cut and pasting, everything we need over and over. We build on libraries published by developers and companies all over the globe. They release them, we reuse them. Yes, we get into dependency hell, but it beats writing your own web framework. We get reuse not only of the code, but of understanding: Rails knowledge transfers between employers.

There is a tipping point where reuse is magical.

I argue that this point is well past a release, past a separate jar.
It is past a stable API
past a coherent abstraction
past automated tests
past solid documentation...

All these might be achieved within the organization if responsibility for the shared utilities lives in a separate team; you can try to use Conway's Law to enforce architectural boundaries, but within an org, those boundaries are soft. And this code isn't your business, and you don't have incentives to spend the time on these. Why have backwards compatibility when you can perform human coordination instead? It isn't worth it. In my past organizations, shared code has instead been the responsibility of no one. What starts out as "leverage" becomes baggage, as all the Ruby code is tied to an old version of Sinatra. Some switch to Go to get a clean slate.
Break those chains! Copy the pieces you need out of that internal library and make them yours.

At the level of winning reuse, that code has its own marketing department
its own sales team
its own office manager
its own stock price.

The level of reuse is the company.

(Pay for software.)

When the responsible organization succeeds by making its code stable and backwards-compatible and easy to work with and well-documented and extensively tested, that is code I want to reuse!

In addition to SaaS companies and vendors, there are organizations built around open-source software. This is why we look for packages and frameworks with a broad community around them. Or better, a foundation for keeping shared projects healthy. (Contribute to them.)

Conclusion


Reuse is dangerous because it introduces coupling. Share business code only when that coupling is inherent to the business domain. Share library and utility code only when it is maintained by an organization dedicated to publishing that code. (Same with services. If you can pay for infrastructure-level tools, you'll get better tools without distracting your organization.)

Why did we want to reuse internal code anyway?
For speed, but speed of change is more important.
For consistency, but that means coupling. Don't hold your teams back with it.
For propagation of bug fixes, which I've not seen happen.

All three of these can be automated [LINK to my next post] without dependencies.

Next time you consider making your code reusable, ask "who will I sell this to?"
Next time someone (including you) suggests you reuse their code, ask "who publishes that?" and if they say "me," copy it instead.

3 comments:

  1. One of the biggest reasons to reuse code is to help prevent new bugs from being introduced, and to allow code to be debugged once, not 10 times.

    The number of bugs per line of code is constant, so copying code w/o the ability to integrate bug fixes will only compound the bug issue.

    1. What stops you from copying the bug fixes too?

      If you're in a monolith, you already get the bug fix for free.

      If you're using a library, you only get the bug fix if you notice the new release and upgrade to it. That implies watching the library for changes. But if you're watching it for changes, you can just copy the fix when it appears.

    2. There can also be a tendency to allow reused code to handle the different needs of the different places where it will be used, so you end up pulling in a library where you only need about half the paths through the code. This can lead to more code and higher complexity within that module; so building off of the constant number of bugs per line idea, this means that the likelihood of a bug within that module could increase because of reuse. I tend to find that the number of bugs is generally more proportional to [cyclomatic] complexity than to the more simplified LOC perspective, which would mean a more dramatic increase in bug likelihood. In these cases you're spreading the likelihood of bugs around, including those that may affect an app even though it doesn't care about the logic with which the bugs are associated. Now you may have to update the module to get the bugfix when you wouldn't have had to otherwise...but of course that shouldn't be a problem since I'm sure libraries are being regularly kept up to date and also that they're all being worked on in a way that bugfixes can be safely pulled in without worrying about any other code changes (simple).

      Everything in moderation
