Monday, January 16, 2017

Dependencies

Dependency management.
Nobody wants to think about it. We just want this stuff to work.
It is one of the nasty sneaky unsolved problems in software.

Each language system says, we've got a package manager and a build tool. This is how dependencies work. Some are better than others (I <3 Elm; npm OMG) but none of them are complete. We avert our eyes.

Dependencies are important. They're the edges in the software graph, and edges are always where the meaning lies. Edges are also harder to focus on than nodes.

They can be relatively explicit, declared in a pom.xml or package.json.
They can be hard to discover, like HTTP calls to URLs constructed from configuration + code + input.

Dependencies describe how we hook things together. Which means, they also determine our options for breaking things apart. And breaking things apart is how we scale a software system -- scale in our heads, that is; scale in terms of complexity, not volume.

If we look carefully at them, stop wishing they would stop being such a bother, maybe we can get this closer to right. There's a lot more to this topic than is covered in this post, but it's a start.

Libraries v Services


The biggest distinction. Definitions:

Libraries are compiled in. They're separate modules, in different repositories (or directories in a giant repository of doom (aka monorepo)). They are probably maintained by different companies or at least teams. Code re-use is achieved by compiling the same code into multiple applications (aka services). [I'm talking about compile-scope libraries here. Provided-scope, and things like .dll's (what is that even called) are another thing that should probably be a separate category in this post but isn't included.]

Services: one application calls another over the network (or sockets on the same machine); the code runs in different processes. There's some rigmarole in how to find each other: service discovery is totally a problem of its own, with DNS as the most common solution.

Tradeoffs


Visibility


Libraries are declared explicitly, although not always specifically. Something physically brings their code into my code, whether as a jar or as code explicitly.

Service dependencies are declared informally if at all. They may be discovered in logging. They may be discernible from security groups, if you're picky about which applications are allowed to access which other ones.

Deployment


Here's a crucial difference IMO. Libraries: you can release it and ask people to upgrade. If your library is internal, you may even upgrade the version in other teams' code. But it's the users of your library that decide when that new version goes into production. Your new code is upgraded when your users choose to deploy the upgraded code.

Services: You choose when it's upgraded. You deploy that new code, you turn off the old code, and that's it. Everyone who uses your service is using the new code. Bam. You have the power.
This also means you can choose how many of them are running at a time. This is the independent-scalability thing people get excited about.

If your library/service has data backing it, controlling code deployment means a lot for the format of the data. If your database is accessed only by your service, then you can any necessary translations into the code. If your database is accessed by a library that other people incorporate, you'd better keep that schema compatible.

Versioning


There's a lovely Rich Hickey talk, my notes here, about versioning libraries. Much of it also applies to services.

If you change the interface to a library, what you have is a different library. If you name it the same and call it a new version, then what you have is a different library that refuses to compile with the other one and will fight over what gets in. Then you get into the whole question of version conflicts, which different language systems resolve in different ways. Version conflicts occur when the application declares dependencies on two libraries, each of which declares a dependency on the same-name fourth library. In JavaScript, whatever, we can compile in both of them, it's just code copied in anyway. In Java, thou mayest have only one definition of each class name in a given ClassLoader, so the tools choose the newest version and hope everyone can cope.

Services, you can get complicated and do some sort of routing by version; you can run multiple versions of a service in production at the same time. See? You call it two versions of the same service, but it's actually two different services. Same as the libraries. Or, you can support multiple versions of the API within the same code. Backwards compatibility, it's all the pain for you and all the actual-working-software for your users.

API Changes and Backwards Compatibility


So you want to change the way users interact with your code. There's an important distinction here: changing your code (refactoring, bug fix, complete rewrite) is very different from requiring customers to change their code in order to change yours correctly. That's a serious impact.

Services: who uses it? Maybe it's an internal service and you have some hope of grepping all company code for your URL. You have the option of personally coordinating with those teams to change the usage of your service.
Or it's a public-facing service. DON'T CHANGE IT. You can never know who is using it. I mean maybe you don't care about your users, and you're OK with breaking their code. Sad day. Otherwise, you need permanent backwards-compatibility forever, and yes, your code will be ugly.

Libraries: if your package manager is respectable (meaning: immutable, if it ever provides a certain library-version is will continue to provide the same download forever), then your old versions are still around, they can stay in production. You can't take that code away. However, you can break any users who aren't ultra-specific about their version numbers. That's where semantic versioning comes in; it's rude to change the API in anything short of a major version, and people are supposed to be careful about picking up a new major version of your library.
But if you're nice you could name it something different, instead of pretending it's a different number of the same thing.

Isolation


A trick about libraries: it's way harder to know "what is an API change?"
With services it's clear; we recognize certain requests, and provide certain responses.
With libraries, there's all the public methods and public classes and packages and ... well, as a Java/Scala coder, I've never been especially careful about what I expose publicly. But library authors need to be if they're ever going to safely change anything inside the library.

Services are isolated: you can't depend on my internals because you physically can't access them. In order to expose anything to external use I have to make an explicit decision. This is much stronger modularity. It also means you can write them in different languages. That's a bonus.

There are a few companies that sell libraries. Those are some serious professionals, there. They have to test versions from way-back, on every OS they could run on. They have to be super aware of what is exposed, and test the new versions against a lot of scenarios. Services are a lot more practical to throw out there - even though backwards compatibility is a huge pain, at least you know where it is.

Failure


Libraries: it fails, your code fails. It runs out of memory, goodbye process. Failures are communicated synchronously, and if it fails, the app knows it.

Services: it fails, or it doesn't respond, you don't really know that it fails ... ouch. Partial failures, indeterminate failures, are way harder. Even on the same machine coordinating over a socket, we can't guarantee the response time or whether responses are delivered at all. This is all ouch, a major cost of using this modularization mechanism.

Conclusions


I think the biggest consideration in choosing whether to use libraries or services for distribution of effort / modularization is that choice of who decides when it deploys. Who controls which code is in production at a given time.

Libraries are more efficient and easier to handle failures. That's a thing. In-process communication is faster and failures are much easier to handle and consistency is possible.

Services are actual decoupling. They let a team be responsible for their own software, writing it and operating it. They let a team choose what is in production at a given time -- which means there's hope of ever changing data sources or schemas. Generally, I think the inertia present in data, data which has a lot of value, is underemphasized in discussion of software architecture. If you have a solid service interface guarding access to your data, you can (with a lot of painful work) move it into a different database or format. Without that, data migrations may be impossible.

Decoupling of time-of-deployment is essential for maintaining forward momentum as an organization grows from one team to many. Decoupling of features and of language systems, versions, tools helps too. To require everyone use the same tools (or heaven forbid, repository) is to couple every team to another in ways that are avoidable. I want my teams and applications coupled (integrated) in ways that streamline the customer's experience. I don't need them coupled in ways that streamline the development manager's experience.

Overall: libraries are faster until coordination is the bottleneck. Services add more openings to your bottle. That can make your bottle harder to understand.

There's a lot more to the problems of dependency management. This is one crucial distinction. All choices are valid, when made consciously in context. Try to focus through your tears.


4 comments:

  1. And then there are libbits of course! https://github.com/jessitron/microlib

    ReplyDelete
  2. services also have the advantage that you can track usage. Who uses it? Is it being used? check your logs. With an API you have no idea if feature is still being used.

    This also allows you to make informed backward compatibility breaks: only 10 users are using this, ok kill it. 100,000 users are, let's keep it around.

    ReplyDelete
    Replies
    1. Good point! A public service, you don't know -who- is using it, but you can know how many. Libraries, you don't know about use, only possibly downloads.

      Delete
    2. You can do instrumentation on your libs and log the usage if that's something you need.

      Delete