Do Things Right with npm install

Lately I’ve been wrestling with npm. Here are some rules I’ve learned:

Use `npm ci` rather than `npm install`

`npm ci` will bring down exactly the dependencies specified in package-lock.json. `npm install` does more than that; it also tries to update some libraries to a more recent version. Sometimes it updates URLs or other nonsense in package.json so that my `git status` is dirty. Sometimes it does deduping. Sometimes it sticks with the version you have lying around. I haven't figured it out. It seems to be pretty dependent on the current circumstances on my file system.

Now I only use `npm install` if I specifically want to change the dependencies in my filesystem.
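
My rule of thumb, then, looks like this (both are standard npm commands; chalk stands in for whatever dependency I actually mean to change):

$ npm ci                  # reproduce exactly what package-lock.json says
$ npm install chalk       # only when I deliberately want to change my dependencies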

Use `npm install --save-exact`

Especially for snapshots. Semver does not work for snapshots or branches or anything but releases, and npm only works with semver. If you are not using a release, if you publish with build tags or branch tags or anything like that, do not give npm any sort of flexibility. It will not work. Specify a precise version, or else it will give you nasty surprises, like deciding some alphabetically-later branch is better than the master-branch version you specified.
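
Concretely (chalk again; the version is whatever you installed):

$ npm install chalk --save-exact

That writes "chalk": "2.4.1" into package.json, instead of the default "^2.4.1" that leaves npm room to improvise.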

Use `npm view` to check the status of a library

This is quite useful. Try `npm view <package>` and it brings to your command line the info you can get from the npm website. You can ask it for specific fields. To get the latest version of chalk:

$ npm view chalk dist-tags.latest
2.4.1

If you want to do anything programmatic with this info, the “do things right” flag for `npm view` is `--json`.
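
For instance, the same dist-tags lookup as JSON (output shape from the npm version I'm on; yours may differ slightly):

$ npm view chalk dist-tags --json
{
  "latest": "2.4.1"
}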

Try `npm ls` but then dig around on the filesystem

Exploring the dependency tree, `npm ls` is really cool: it shows the whole thing to you. You can see where you’re getting a specific library with `npm ls <package>`, except that it doesn’t always work. In the end, I dig around in my node_modules directory, using `find . -name <package>` to look for the real thing.
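
The two steps side by side, with chalk as the example:

$ npm ls chalk                  # where npm thinks chalk comes from
$ find . -name chalk -type d    # where chalk actually sits under node_modules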

Other times I use my little node-dependency dungeon explorer game to see what version of stuff is where. 

These are just a few of the nasty surprises I’ve found moving from Java to TypeScript, from maven dependencies to npm. Dependency management is an unsolved problem, and the people working on npm have made huge improvements in the last few years. I look forward to more.

Reuse

Developers have a love-hate relationship with code re-use. As in, we used to love it. We love our code and we want it to run everywhere and help everyone. We want to get faster with time by harnessing the work of our former selves.
And yet, we come to hate it. Reuse means dependencies. It means couplings. It means surprises, when changing code impacts something we did not expect, or else it means don’t touch it, it’s too scary. It means trusting code we don’t understand because it’s code we didn’t write.

Here’s the thing: sharing code is dangerous. Do it sparingly.

When reuse is bad

Let’s talk about sharing code. Take a business, developing software for its employees or its customers. Let’s talk about code within an organization that is referenced in more than one service, or by multiple flows in a monolith. (Monolith is defined as “one deployable unit maintained by more than one small team.”)

Let’s see some pictures. Purple Service here has some classes or functions that it finds useful, and the team thinks these would be useful elsewhere. Purple team breaks this code out into a library, the peachy circle.

[Figure: purple circle, peach circle inside]

Then someone from Purple team joins Blue team, and uses that library in Blue Service. You think it looks like this:

[Figure: peach circle under blue and purple circles]

Nah, it’s really more like this:

[Figure: purple circle with peach circle inside; blue circle has a line to the peach circle]

This is called coupling. When Purple team changes their library, Blue team is affected. (If it’s a monolith, their code changed underneath them. I hope they have good tests.)
Now, you could say, Blue team doesn’t have to update their version. The level of reuse is the release, we broke out the library, so this is fine.

[Figure: purple with an orange circle inside, blue with a peach circle inside]

At that point you’ve basically forked, the code isn’t shared anymore. When Blue team needs to make their own changes, they first must upgrade, so they get surprised some unpredictable time later. (This happened to us at Outpace all the time with our shared “util” libraries and it was the worst. So painful. Those “timesavers” cost us a lot of time and frustration.)

This shared code is a coupling between two services that otherwise have nothing to do with each other. The whole point of microservices was to decouple! To make it so our changes impact only code that our team operates! That’s dead now. And for what?

To answer that, consider the nature of the shared code. Why is it shared?
Perhaps it is unrelated to the business: it is general utilities that would otherwise be duplicated, but we’re being DRY and avoiding the extra work of writing and testing and debugging them a second time. In this case, I propose: cut and paste. Or fork. Or best of all, try a more formalized reuse-without-sharing procedure [link to my next post].

What if this is business-related code? What if we had good reason to DRY it out, because it would be wrong for this code to be different in Purple Service and Blue Service? Well sorry, it’s gonna be different. Purple and Blue do not have the same deployment schedules, that’s the point of decoupling into services. In this case, either you’ve made yourself a distributed monolith (requiring coordinated deployments), or you’re ignoring reality. If the business requires exactly one version of this code, then make it its own service.

[Figure: yellow, purple, and blue circles separate, dotted lines from yellow to purple and to blue]

Now you’re not sharing code anymore. You’re sharing a service. Changes to Peachy can impact Purple and Blue at the same time, because that’s inherent in this must-be-consistent business logic.

It’s easier with a monolith; that shared code stays consistent in production, because there is only one deployment. Any surprises happen immediately, hopefully in testing. In a monolith, if Peachy is utility classes or functions, and Purple (or Blue) team wants to change them, the safest strategy is: make a copy, use the copy, and change that copy. Over time, this results in less shared code.

This crucial observation is #2 in Modern Software Over-engineering Mistakes by RMX.

“Shared logic and abstractions tend to stabilise over time in natural systems. They either stay flat or relatively go down as functionality gets broader.”

Business software is an expanding problem. It will always grow, and not with more of the same: it will grow in ways you didn’t plan for. This kind of code must optimize for change. Reuse is the enemy of change. (I’m talking about reuse of internal code.)

Back in the beginning, Blue team reused the peach library and saved time writing code. But writing code isn’t the expensive part, compared to changing code. We don’t add features faster as our systems get larger and we have more code hypothetically available for re-use. We add features more slowly, because every change has more impacts and is less safe. Shared code makes change less safe. The only code safe to share is code that doesn’t change. Which means no versioning. Heck, you might as well have cut and pasted it.

When reuse is good

We didn’t advance as an industry by rewriting, or cut and pasting, everything we need over and over. We build on libraries published by developers and companies all over the globe. They release them, we reuse them. Yes, we get into dependency hell, but it beats writing your own web framework. We get reuse not only of the code, but of understanding: Rails knowledge transfers between employers.

There is a tipping point where reuse is magical.

I argue that this point is well past a release, past a separate jar.
It is past a stable API
past a coherent abstraction
past automated tests
past solid documentation…

All these might be achieved within the organization if responsibility for the shared utilities lives in a separate team; you can try to use Conway’s Law to enforce architectural boundaries, but within an org, those boundaries are soft. And this code isn’t your business, and you don’t have incentives to spend the time on these. Why have backwards compatibility when you can perform human coordination instead? It isn’t worth it. In my past organizations, shared code has instead been the responsibility of no one. What starts out as “leverage” becomes baggage, as all the Ruby code is tied to an old version of Sinatra. Some switch to Go to get a clean slate.
Break those chains! Copy the pieces you need out of that internal library and make them yours.

At the level of winning reuse, that code has its own marketing department
its own sales team
its own office manager
its own stock price.

The level of reuse is the company.

(Pay for software.)

When the responsible organization succeeds by making its code stable and backwards-compatible and easy to work with and well-documented and extensively tested, that is code I want to reuse!

In addition to SaaS companies and vendors, there are organizations built around open-source software. This is why we look for packages and frameworks with a broad community around them. Or better, a foundation for keeping shared projects healthy. (Contribute to them.)

Conclusion

Reuse is dangerous because it introduces coupling. Share business code only when that coupling is inherent to the business domain. Share library and utility code only when it is maintained by an organization dedicated to publishing that code. (Same with services. If you can pay for infrastructure-level tools, you’ll get better tools without distracting your organization.)

Why did we want to reuse internal code anyway?
For speed, but speed of change is more important.
For consistency, but that means coupling. Don’t hold your teams back with it.
For propagation of bug fixes, which I’ve not seen happen.

All three of these can be automated [LINK to my next post] without dependencies.

Next time you consider making your code reusable, ask “who will I sell this to?”
Next time someone (including you) suggests you reuse their code, ask “who publishes that?” and if they say “me,” copy it instead.

Dependencies

Dependency management.
Nobody wants to think about it. We just want this stuff to work.
It is one of the nasty sneaky unsolved problems in software.

Each language system says, we’ve got a package manager and a build tool. This is how dependencies work. Some are better than others (I ❤ Elm; npm OMG) but none of them are complete. We avert our eyes.

Dependencies are important. They’re the edges in the software graph, and edges are always where the meaning lies. Edges are also harder to focus on than nodes.

They can be relatively explicit, declared in a pom.xml or package.json.
They can be hard to discover, like HTTP calls to URLs constructed from configuration + code + input.

Dependencies describe how we hook things together. Which means, they also determine our options for breaking things apart. And breaking things apart is how we scale a software system — scale in our heads, that is; scale in terms of complexity, not volume.

If we look carefully at them, stop wishing they would stop being such a bother, maybe we can get this closer to right. There’s a lot more to this topic than is covered in this post, but it’s a start.

Libraries v Services

The biggest distinction. Definitions:

Libraries are compiled in. They’re separate modules, in different repositories (or directories in a giant repository of doom (aka monorepo)). They are probably maintained by different companies or at least teams. Code re-use is achieved by compiling the same code into multiple applications (aka services). [I’m talking about compile-scope libraries here. Provided-scope, and things like .dll’s (what is that even called) are another thing that should probably be a separate category in this post but isn’t included.]

Services: one application calls another over the network (or sockets on the same machine); the code runs in different processes. There’s some rigmarole in how to find each other: service discovery is totally a problem of its own, with DNS as the most common solution.

Tradeoffs

Visibility

Libraries are declared explicitly, although not always specifically. Something physically brings their code into my code, whether as a jar or as code explicitly.

Service dependencies are declared informally if at all. They may be discovered in logging. They may be discernible from security groups, if you’re picky about which applications are allowed to access which other ones.

Deployment

Here’s a crucial difference IMO. Libraries: you can release it and ask people to upgrade. If your library is internal, you may even upgrade the version in other teams’ code. But it’s the users of your library that decide when that new version goes into production. Your new code is upgraded when your users choose to deploy the upgraded code.

Services: You choose when it’s upgraded. You deploy that new code, you turn off the old code, and that’s it. Everyone who uses your service is using the new code. Bam. You have the power.
This also means you can choose how many of them are running at a time. This is the independent-scalability thing people get excited about.

If your library/service has data backing it, controlling code deployment means a lot for the format of the data. If your database is accessed only by your service, then you can put any necessary translations into the code. If your database is accessed by a library that other people incorporate, you’d better keep that schema compatible.

Versioning

There’s a lovely Rich Hickey talk, my notes here, about versioning libraries. Much of it also applies to services.

If you change the interface to a library, what you have is a different library. If you name it the same and call it a new version, then what you have is a different library that refuses to compile with the other one and will fight over what gets in. Then you get into the whole question of version conflicts, which different language systems resolve in different ways. Version conflicts occur when the application declares dependencies on two libraries, each of which declares a dependency on the same-named third library. In JavaScript, whatever, we can compile in both of them, it’s just code copied in anyway. In Java, thou mayest have only one definition of each class name in a given ClassLoader, so the tools choose the newest version and hope everyone can cope.
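
Here’s the shape of the problem, with hypothetical names (roughly the tree `npm ls` would print):

app@1.0.0
├─┬ lib-a@1.0.0
│ └── shared-util@1.0.0
└─┬ lib-b@1.0.0
  └── shared-util@2.0.0

JavaScript can nest both copies of shared-util; a Java classpath has to pick one and hope.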

Services, you can get complicated and do some sort of routing by version; you can run multiple versions of a service in production at the same time. See? You call it two versions of the same service, but it’s actually two different services. Same as the libraries. Or, you can support multiple versions of the API within the same code. Backwards compatibility, it’s all the pain for you and all the actual-working-software for your users.
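
Routing by version often means something as simple as a version segment in the path (one convention among several; paths invented):

GET /v1/orders    routed to the old code
GET /v2/orders    routed to the new code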

API Changes and Backwards Compatibility

So you want to change the way users interact with your code. There’s an important distinction here: changing your code (refactoring, bug fix, complete rewrite) is very different from changing it in a way that requires customers to change their code as well. That’s a serious impact.

Services: who uses it? Maybe it’s an internal service and you have some hope of grepping all company code for your URL. You have the option of personally coordinating with those teams to change the usage of your service.
Or it’s a public-facing service. DON’T CHANGE IT. You can never know who is using it. I mean maybe you don’t care about your users, and you’re OK with breaking their code. Sad day. Otherwise, you need permanent backwards-compatibility forever, and yes, your code will be ugly.

Libraries: if your package manager is respectable (meaning: immutable; if it ever provides a certain library-version, it will continue to provide the same download forever), then your old versions are still around, and they can stay in production. You can’t take that code away. However, you can break any users who aren’t ultra-specific about their version numbers. That’s where semantic versioning comes in; it’s rude to change the API in anything short of a major version, and people are supposed to be careful about picking up a new major version of your library.
But if you’re nice you could name it something different, instead of pretending it’s a different number of the same thing.
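
In package.json terms, the semver contract looks roughly like this (chalk and its version borrowed from earlier):

"chalk": "2.4.1"     exactly this version, no surprises
"chalk": "~2.4.1"    patch updates only (2.4.x)
"chalk": "^2.4.1"    minor and patch updates (2.x), on the promise that the API holds
chalk 3.0.0          a new major version: the API may break, users opt in deliberately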

Isolation


A tricky thing about libraries: it’s way harder to know “what is an API change?”
With services it’s clear; we recognize certain requests, and provide certain responses.
With libraries, there’s all the public methods and public classes and packages and … well, as a Java/Scala coder, I’ve never been especially careful about what I expose publicly. But library authors need to be if they’re ever going to safely change anything inside the library.

Services are isolated: you can’t depend on my internals because you physically can’t access them. In order to expose anything to external use I have to make an explicit decision. This is much stronger modularity. It also means you can write them in different languages. That’s a bonus.

There are a few companies that sell libraries. Those are some serious professionals, there. They have to test versions from way-back, on every OS they could run on. They have to be super aware of what is exposed, and test the new versions against a lot of scenarios. Services are a lot more practical to throw out there – even though backwards compatibility is a huge pain, at least you know where it is.

Failure

Libraries: it fails, your code fails. It runs out of memory, goodbye process. Failures are communicated synchronously, and if it fails, the app knows it.

Services: it fails, or it doesn’t respond, you don’t really know that it fails … ouch. Partial failures, indeterminate failures, are way harder. Even on the same machine coordinating over a socket, we can’t guarantee the response time or whether responses are delivered at all. This is all ouch, a major cost of using this modularization mechanism.

Conclusions

I think the biggest consideration in choosing whether to use libraries or services for distribution of effort / modularization is that choice of who decides when it deploys. Who controls which code is in production at a given time.

Libraries are more efficient, and their failures are easier to handle. That’s a thing. In-process communication is faster, failures are much easier to handle, and consistency is possible.

Services are actual decoupling. They let a team be responsible for their own software, writing it and operating it. They let a team choose what is in production at a given time — which means there’s hope of ever changing data sources or schemas. Generally, I think the inertia present in data, data which has a lot of value, is underemphasized in discussion of software architecture. If you have a solid service interface guarding access to your data, you can (with a lot of painful work) move it into a different database or format. Without that, data migrations may be impossible.

Decoupling of time-of-deployment is essential for maintaining forward momentum as an organization grows from one team to many. Decoupling of features and of language systems, versions, tools helps too. To require everyone use the same tools (or heaven forbid, repository) is to couple every team to another in ways that are avoidable. I want my teams and applications coupled (integrated) in ways that streamline the customer’s experience. I don’t need them coupled in ways that streamline the development manager’s experience.

Overall: libraries are faster until coordination is the bottleneck. Services add more openings to your bottle. That can make your bottle harder to understand.

There’s a lot more to the problems of dependency management. This is one crucial distinction. All choices are valid, when made consciously in context. Try to focus through your tears.

It’s Atomist Time!

I’m hella excited to get to work on Atomist full-time starting now (January 2017). Why? What do they do? Oh let me tell you!
I love developing software, not least because we (as an industry) have not yet figured out how to software. We know it’s powerful, but not yet how powerful. Software is like engineering except that the constraints aren’t in what we can build, but what we can design and specify. Atomist is expanding this capacity.
Atomist builds tooling to smooth some bumps in software development. There are three components that I’m excited about, three components that open new options in how we develop software.

Component 1: Code that changes code

First, there are code editors, called Rugs. On the surface, these automate the typing part. Like code generators, except they continue to work with the code after you modify it. Like refactorings in an IDE, except they appear as a pull request, and then you can continue development on that branch. If you have some consistent code structure (and if you use a framework, you do), Rugs can perform common feature-adding or upgrading or refactoring operations. Use standard Rugs to, say, add graph database support to an existing Spring Boot project. Customize Rugs to set up your Travis build uniformly in your projects. Create your own Rugs to implement metrics integration according to your company’s standards — and to upgrade existing code when those standards change.
On the surface this is an incremental improvement over existing code generation and IDE refactoring tools. Yet, I see it as something more. I see it as a whole new answer to the question of “indirection or repetition?” in code. Take for instance: adding a field to a Rails app makes us change the controller, the model, and four other places. Or creating a new service means changing deployment configuration, provisioning, and service discovery. Whenever a single conceptual change requires code changes in multiple spots, we complain about the work and we make mistakes. Then we start to get clever with it: we introduce some form of indirection that localizes that change to one place. Configuration files get generated in the build, Ruby metaprogramming introduces syntax that I can’t even figure out how it’s executable — magic happens. The code gets less explicit, so that we can enforce consistency and make changing it … well, I’m not gonna say “easier” because learning to cast the spell is tricky, but it is less typing.
Atomist introduces a third alternative: express that single intention (“create a new service” or “add this field”) as a Rug editor. This makes writing it one step, and then the editor makes all those code changes in a single commit in a branch. From there, customize your field or your new service; each commit that you make shows how your feature is special. The code remains explicit, without additional magic. When I come back and read it, I have some hope of understanding what it’s doing. When I realize that I forgot something (“oops! I also need to add that service to the list of log sources”) then I fix it once, in the NewService.rug editor. Now I never forget, and I never have to remember.
I love this about developing with Rugs: as I code, I’m asking myself, “how could I automate this?” and then what I learn is encoded in the Rug, for the benefit of future-me and (if I publish it) of future-everyone-else. That is when I feel productive.

Component 2: Coordination between projects

Editors are cute when applied to one project. When applied across an organization, they start to look seriously useful. Imagine: A library released a security update, and we need to upgrade it across the organization. Atomist creates a pull request on every project that uses that library. The build runs, maybe we even auto-merge it when the build passes. Or perhaps there are breaking changes; the editor can sometimes be taught how to make those changes in our code.
And if a Rug can change the way we use a library, then it can change the way we use ours. This is cross-repository refactoring: I publish an internal library, and I want to rename this function in the next version. Here’s my game: I publish not only the new version of my library, but an editor – and then I ask Atomist to create pull requests across the organization. Now it is a quick code review and “accept” for teams to upgrade to the new version.
Atomist coordinates with teams in GitHub and in Slack. Ask Atomist in Slack to start that new feature for you, or to check all repositories in the organization and create pull requests. Atomist can also coordinate with continuous integration. It ties these pieces together across repositories, and including humans. It can react to issues, to build results, to merges; and it can ping you in Slack if it needs more information to act appropriately. I have plans to use this functionality to link libraries to the services that use them: when the build passes on my branch, go build the app that uses my library with this new version, and tell me whether those tests pass.
This is cross-repository refactoring and cross-repository build coordination. This gives companies an alternative to the monorepo, to loading all their libraries and services into one giant repository in order to test them together. The monorepo is a lie: our deployments are heterogeneous, so while the monorepo is like “look at this lovely snapshot of a bunch of code that works together” the production environment is something different. The monorepo is also painful because git gets slow when the repository gets large; because it’s hard to tell which commits affect which deployed units; and because application owners lose control over when library upgrades are integrated. Atomist will provide a layer on top of many repositories, letting us coordinate change while our repositories reflect production realities.
Atomist tooling will make multirepo development grow with our codebases.

Component 3: is still a secret

I’m not sure I can talk about the third piece of possibility-expanding tooling yet. So have this instead:
Automated coordination among systems and people who interact with code — this is useful everywhere, but it’s a lot of work to create our own bots for this. Some companies put the resources into creating enough automation for their own needs. No one business-software-building organization has a reason to develop, refine, and publish a general solution for this kind of development-process automation. Atomist does.
When it becomes easy for any developer to script this coordination and the reactions just happen — “Tell me when an issue I reported was closed” “Create a new issue for this commit and then mark it closed as soon as this branch is merged” — then we can all find breakages earlier and we can all keep good records. This automates my work at a higher level than coding. This way whenever I feel annoyed by writing a status report, or when I forget to update the version in one place to match the version in another, my job is not to add an item to a checklist. My job is to create an Atomist handler script to make that happen with no attention from me.

My secret

I love shaving yaks. Shaving them deeply, tenderly, finding the hidden wisdom under their hair. I love adding a useful feature, and then asking “How could that be easier?” and then “How could making that easier be easier?” This is Atomist’s level of meta: We are making software to make it easier for you to make your work easier, as you work to make software to make your customers’ lives easier.
I think we’re doing this in depths and ways other development tools don’t approach. At this level of meta (software for building software for building software for doing work), there’s a lot of leverage, a lot of potential. This level of meta is where orders-of-magnitude changes happen. Software changes the world. I want to be part of changing the software world again, so we can change the real world even faster.
With Atomist, I get to design and specify my own reality, the reality of my team’s work. (Atomist does the operations bit.) Without spending tons of time on it! Well, I get to spend tons of time on it because I get to work for Atomist, because that’s my thing. But you don’t have to spend tons of time on it! You get to specify what you want to happen, in the simplest language we can devise.
We’re looking for teams to work with us on alpha-testing, if you’re interested now. (join our slack, or email me) Let’s learn together the next level of productivity and focus in software development.

May I please have inlining for small dependencies?

Franklin Webber[1] went on a quest to find out whether utility code was better or worse than core code in open source projects. Utility code is defined as anything necessary but not directly related to the application’s mission. Franklin found that the overall quality of utility code was higher, perhaps because this code changes less. He also found a lot of repetition among projects, many people doing the same thing. A function to find the home directory in several possible env vars, for instance. HashWithIndifferentAccess, which doesn’t care whether you pass the key as string or symbol (keyword in Clojure). He asks, why not break these out into tiny little gems?

Dependency management is an unsolved problem in our industry. We want single-responsibility, and we want re-use. Reuse means separately released units. But then when we compile those into one unit, transitive dependencies conflict. Only one version of a particular library can make it into the final executable. In the example below, the Pasta library might not be compatible with v2.1.0 of Stir, but that’s what it’ll be calling after it’s compiled into Dinner.
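
Sketched out (the exact constraint on Pasta’s side is my assumption):

Dinner
├── Pasta           (written against Stir v1.x)
└── Stir v2.1.0     (the one copy in the executable; Pasta ends up calling this)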

Programs like WebLogic went to crazy ends to provide different classloaders, so that they could run apps in the same process without this dependency conflict. That’s a mess!

We could say all libraries should be backwards compatible. Compatibility is a great thing, but it’s also limiting. It prevents API improvements, and makes code ugly. There’s more than API concerns as well: if the home-directory-finding function adds a new place to look, its behavior can change, surprising existing systems. We want stability and progress, in different places.

With these itty-bitty libraries that Franklin proposes we break out, compatibility is a problem. So instead of dependency hell, we have chosen duplication, copying them into various projects.

What if, for little ones, we could inline the dependency? Like #include in C: copy in the version we want, and link whatever calls it to that specific implementation. Then other code could depend on other versions, inlining those separately. These transitive dependencies would then not be available at the top level.
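
On disk, that might look something like this (layout invented; npm’s nested node_modules is the nearest existing analogue):

dinner/
└── vendor/
    ├── pasta/
    │   └── vendor/
    │       └── stir/     # the older Stir that Pasta wants, linked only into Pasta
    └── stir/             # v2.1.0, what Dinner itself calls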

Seems like this could help with dependency hell. We would then be free to incorporate tiny utility libraries, individually, without concern about conflicting versions later. I wonder if it’s something that could be built into Bundler, or Leiningen?

This isn’t the right solution for all our dependency problems: not large libraries, and not external-connection clients. Yet if we had this choice, if a library could flag itself as suitable for inlining, then microlibraries become practical, and that could be good practice.

——-
[1] Talk from Steel City Ruby 2014