Monday, January 16, 2017

Dependencies

Dependency management.
Nobody wants to think about it. We just want this stuff to work.
It is one of the nasty sneaky unsolved problems in software.

Each language system says, we've got a package manager and a build tool. This is how dependencies work. Some are better than others (I <3 Elm; npm OMG) but none of them are complete. We avert our eyes.

Dependencies are important. They're the edges in the software graph, and edges are always where the meaning lies. Edges are also harder to focus on than nodes.

They can be relatively explicit, declared in a pom.xml or package.json.
They can be hard to discover, like HTTP calls to URLs constructed from configuration + code + input.

Dependencies describe how we hook things together. Which means, they also determine our options for breaking things apart. And breaking things apart is how we scale a software system -- scale in our heads, that is; scale in terms of complexity, not volume.

If we look carefully at them, stop wishing they would stop being such a bother, maybe we can get this closer to right. There's a lot more to this topic than is covered in this post, but it's a start.

Libraries v Services


The biggest distinction. Definitions:

Libraries are compiled in. They're separate modules, in different repositories (or directories in a giant repository of doom (aka monorepo)). They are probably maintained by different companies or at least teams. Code re-use is achieved by compiling the same code into multiple applications (aka services). [I'm talking about compile-scope libraries here. Provided-scope, and things like .dll's (what is that even called) are another thing that should probably be a separate category in this post but isn't included.]

Services: one application calls another over the network (or sockets on the same machine); the code runs in different processes. There's some rigmarole in how to find each other: service discovery is totally a problem of its own, with DNS as the most common solution.

Tradeoffs


Visibility


Libraries are declared explicitly, although not always specifically. Something physically brings their code into my code, whether as a jar or as code explicitly.

Service dependencies are declared informally if at all. They may be discovered in logging. They may be discernible from security groups, if you're picky about which applications are allowed to access which other ones.

Deployment


Here's a crucial difference IMO. Libraries: you can release it and ask people to upgrade. If your library is internal, you may even upgrade the version in other teams' code. But it's the users of your library that decide when that new version goes into production. Your new code is upgraded when your users choose to deploy the upgraded code.

Services: You choose when it's upgraded. You deploy that new code, you turn off the old code, and that's it. Everyone who uses your service is using the new code. Bam. You have the power.
This also means you can choose how many of them are running at a time. This is the independent-scalability thing people get excited about.

If your library/service has data backing it, controlling code deployment means a lot for the format of the data. If your database is accessed only by your service, then you can any necessary translations into the code. If your database is accessed by a library that other people incorporate, you'd better keep that schema compatible.

Versioning


There's a lovely Rich Hickey talk, my notes here, about versioning libraries. Much of it also applies to services.

If you change the interface to a library, what you have is a different library. If you name it the same and call it a new version, then what you have is a different library that refuses to compile with the other one and will fight over what gets in. Then you get into the whole question of version conflicts, which different language systems resolve in different ways. Version conflicts occur when the application declares dependencies on two libraries, each of which declares a dependency on the same-name fourth library. In JavaScript, whatever, we can compile in both of them, it's just code copied in anyway. In Java, thou mayest have only one definition of each class name in a given ClassLoader, so the tools choose the newest version and hope everyone can cope.

Services, you can get complicated and do some sort of routing by version; you can run multiple versions of a service in production at the same time. See? You call it two versions of the same service, but it's actually two different services. Same as the libraries. Or, you can support multiple versions of the API within the same code. Backwards compatibility, it's all the pain for you and all the actual-working-software for your users.

API Changes and Backwards Compatibility


So you want to change the way users interact with your code. There's an important distinction here: changing your code (refactoring, bug fix, complete rewrite) is very different from requiring customers to change their code in order to change yours correctly. That's a serious impact.

Services: who uses it? Maybe it's an internal service and you have some hope of grepping all company code for your URL. You have the option of personally coordinating with those teams to change the usage of your service.
Or it's a public-facing service. DON'T CHANGE IT. You can never know who is using it. I mean maybe you don't care about your users, and you're OK with breaking their code. Sad day. Otherwise, you need permanent backwards-compatibility forever, and yes, your code will be ugly.

Libraries: if your package manager is respectable (meaning: immutable, if it ever provides a certain library-version is will continue to provide the same download forever), then your old versions are still around, they can stay in production. You can't take that code away. However, you can break any users who aren't ultra-specific about their version numbers. That's where semantic versioning comes in; it's rude to change the API in anything short of a major version, and people are supposed to be careful about picking up a new major version of your library.
But if you're nice you could name it something different, instead of pretending it's a different number of the same thing.

Isolation


A trick about libraries: it's way harder to know "what is an API change?"
With services it's clear; we recognize certain requests, and provide certain responses.
With libraries, there's all the public methods and public classes and packages and ... well, as a Java/Scala coder, I've never been especially careful about what I expose publicly. But library authors need to be if they're ever going to safely change anything inside the library.

Services are isolated: you can't depend on my internals because you physically can't access them. In order to expose anything to external use I have to make an explicit decision. This is much stronger modularity. It also means you can write them in different languages. That's a bonus.

There are a few companies that sell libraries. Those are some serious professionals, there. They have to test versions from way-back, on every OS they could run on. They have to be super aware of what is exposed, and test the new versions against a lot of scenarios. Services are a lot more practical to throw out there - even though backwards compatibility is a huge pain, at least you know where it is.

Failure


Libraries: it fails, your code fails. It runs out of memory, goodbye process. Failures are communicated synchronously, and if it fails, the app knows it.

Services: it fails, or it doesn't respond, you don't really know that it fails ... ouch. Partial failures, indeterminate failures, are way harder. Even on the same machine coordinating over a socket, we can't guarantee the response time or whether responses are delivered at all. This is all ouch, a major cost of using this modularization mechanism.

Conclusions


I think the biggest consideration in choosing whether to use libraries or services for distribution of effort / modularization is that choice of who decides when it deploys. Who controls which code is in production at a given time.

Libraries are more efficient and easier to handle failures. That's a thing. In-process communication is faster and failures are much easier to handle and consistency is possible.

Services are actual decoupling. They let a team be responsible for their own software, writing it and operating it. They let a team choose what is in production at a given time -- which means there's hope of ever changing data sources or schemas. Generally, I think the inertia present in data, data which has a lot of value, is underemphasized in discussion of software architecture. If you have a solid service interface guarding access to your data, you can (with a lot of painful work) move it into a different database or format. Without that, data migrations may be impossible.

Decoupling of time-of-deployment is essential for maintaining forward momentum as an organization grows from one team to many. Decoupling of features and of language systems, versions, tools helps too. To require everyone use the same tools (or heaven forbid, repository) is to couple every team to another in ways that are avoidable. I want my teams and applications coupled (integrated) in ways that streamline the customer's experience. I don't need them coupled in ways that streamline the development manager's experience.

Overall: libraries are faster until coordination is the bottleneck. Services add more openings to your bottle. That can make your bottle harder to understand.

There's a lot more to the problems of dependency management. This is one crucial distinction. All choices are valid, when made consciously in context. Try to focus through your tears.


Friday, January 13, 2017

Today's Rug: maven executable jar

I like being a polyglot developer, even though it's painful sometimes. I use lots of languages, and in every one I have to look stuff up. That costs me time and concentration.

Yesterday I wanted to promote my locally-useful project from "I can run it in the IDE" to "I can run it at the command line." It's a Scala project built in maven, so I need an executable jar. I've looked this up and figured this out at least twice before. There's a maven plugin you have to add, and then I have to remember how to run an executable jar, and put that in a script. All this feels like busywork.

What's more satisfying than cut-and-pasting into my pom.xml and writing another script? Automating these! So I wrote a Rug editor. Rug editors are code that changes code. There's a Pom type in Rug already, with a method for adding a build plugin, so I cut and paste the example from the internet into my Rug. Then I fill in the main class; that's the only thing that changes from project to project so it's a parameter to my editor. Then I make a script that calls the jar. (The script isn't executable. I submitted an issue in Rug to add that function.) The editor prints out little instructions for me, too.

$ rug edit -lC ~/code/scala/org-dep-graph MakeExecutableJar main_class=com.jessitron.jessMakesAPicture.MakeAPicture

Resolving dependencies for jessitron:scattered-rugs:0.1.0 ← local completed
Loading jessitron:scattered-rugs:0.1.0 ← local into runtime completed
run `mvn package` to create an executable jar
Find a run script in your project's bin directory. You'll have to make it executable yourself, sorry.
Running editor MakeExecutableJar of jessitron:scattered-rugs:0.1.0 ← local completed

→ Project
  ~/code/scala/org-dep-graph/ (8 mb in 252 files)

→ Changes
  ├── pom.xml updated 2 kb
  ├── pom.xml updated 2 kb
  ├── bin/run created 570 bytes
  └── .atomist.yml created 702 bytes

Successfully edited project org-dep-graph

It took a few iterations to get it working, probably half an hour more than doing the task manually.
It feels better to do something permanently than to do it again.

Encoded in this editor is knowledge:
* what is that maven plugin that makes an executable jar? [1]
* how do I add it to the pom? [2]
* what's the maven command to build it? [3]
* how do I get it to name the jar something consistent? [4]
* how do I run an executable jar? [5]
* how do I find the jar in a relative directory from the script? [6]
* how do I get that right even when I call the script from a symlink? [7]

It's like saving my work, except it's saving the work instead of the results of the work. This is going to make my brain scale to more languages and build tools.

--------------------------------------------
below the fold: the annotated editor. source here, instructions here in case you want to use it -> or better, change it -> or even better, make your own.

@description "teaches a maven project how to make an executablejar"
@tag "maven"
editor MakeExecutableJar

@displayName "Main Class"
@description "Fully qualified Java classname"
@minLength 1
@maxLength 100
param main_class: ^.*$


let pluginContents = """<plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.3</version>
        <configuration>
          <outputFile>target/executable.jar</outputFile>
[4]         </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>__I_AM_THE_MAIN__</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
""" [2]

let runScript = """#!/bin/bash

# http://stackoverflow.com/questions/59895/getting-the-source-directory-of-a-bash-script-from-within
SOURCE="${BASH_SOURCE[0]}" 
while [ -h "$SOURCE" ]; do # resolve $SOURCE until the file is no longer a symlink
  DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
  SOURCE="$(readlink "$SOURCE")"
  [[ $SOURCE != /* ]] && SOURCE="$DIR/$SOURCE" # if $SOURCE was a relative symlink, we need to resolve it relative to the path where the symlink file was located
done [7]
DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
[6]

java -jar $DIR/../target/executable.jar "$@" [5]
"""


with Pom p
  do addOrReplaceBuildPlugin "org.apache.maven.plugins" "maven-shade-plugin" pluginContents [1]

with File f when path = "pom.xml" begin
  do replace "__I_AM_THE_MAIN__" main_class
  do eval { print("run `mvn package` to create an executable jar")[3]
end

with Project p begin
  do eval { print("Find a run script in your project's bin directory. You'll have to make it executable yourself, sorry") }
  do addFile "bin/run" runScript
end




Monday, January 9, 2017

It's Atomist Time!

I'm hella excited to get to work on Atomist full-time starting now (January 2017). Why? What do they do? Oh let me tell you!

I love developing software, not least because we (as an industry) have not yet figured out how to software. We know it's powerful, but not yet how powerful. Software is like engineering except that the constraints aren't in what we can build, but what we can design and specify. Atomist is expanding this capacity.
Atomist builds tooling to smooth some bumps in software development. There are three components that I'm excited about, three components that open new options in how we develop software.

Component 1: Code that changes code


First, there are code editors, called Rugs. On the surface, these automate the typing part. Like code generators, except they continue to work with the code after you modify it. Like refactorings in an IDE, except they appear as a pull request, and then you can continue development on that branch. If you have some consistent code structure (and if you use a framework, you do), Rugs can perform common feature-adding or upgrading or refactoring operations. Use standard Rugs to, say, add graph database support to an existing Spring Boot project. Customize Rugs to set up your Travis build uniformly in your projects. Create your own Rugs to implement metrics integration according to your company's standards -- and to upgrade existing code when those standards change.

On the surface this is an incremental improvement over existing code generation and IDE refactoring tools. Yet, I see it as something more. I see it as a whole new answer to the question of "indirection or repetition?" in code. Take for instance: adding a field to a Rails app makes us change the controller, the model, and four other places. Or creating a new service means changing deployment configuration, provisioning, and service discovery. Whenever a single conceptual change requires code changes in multiple spots, we complain about the work and we make mistakes. Then we start to get clever with it: we introduce some form of indirection that localizes that change to one place. Configuration files get generated in the build, Ruby metaprogramming introduces syntax that I can't even figure out how it's executable -- magic happens. The code gets less explicit, so that we can enforce consistency and make changing it ... well, I'm not gonna say "easier" because learning to cast the spell is tricky, but it is less typing.

Atomist introduces a third alternative: express that single intention ("create a new service" or "add this field") as a Rug editor. This makes writing it one step, and then the editor makes all those code changes in a single commit in a branch. From there, customize your field or your new service; each commit that you make shows how your feature is special. The code remains explicit, without additional magic. When I come back and read it, I have some hope of understanding what it's doing. When I realize that I forgot something ("oops! I also need to add that service to the list of log sources") then I fix it once, in the NewService.rug editor. Now I never forget, and I never have to remember.

I love this about developing with Rugs: as I code, I'm asking myself, "how could I automate this?" and then what I learn is encoded in the Rug, for the benefit of future-me and (if I publish it) of future-everyone-else. That is when I feel productive.

Component 2: Coordination between projects


Editors are cute when applied to one project. When applied across an organization, they start to look seriously useful. Imagine: A library released a security update, and we need to upgrade it across the organization. Atomist creates a pull request on every project that uses that library. The build runs, maybe we even auto-merge it when the build passes. Or perhaps there are breaking changes; the editor can sometimes be taught how to make those changes in our code.

And if a Rug can change the way we use a library, then it can change the way we use ours. This is cross-repository refactoring: I publish an internal library, and I want to rename this function in the next version. Here's my game: I publish not only the new version of my library, but an editor - and then I ask Atomist to create pull requests across the organization. Now it is a quick code review and "accept" for teams to upgrade to the new version.

Atomist coordinates with teams in GitHub and in Slack. Ask Atomist in Slack to start that new feature for you, or to check all repositories in the organization and create pull requests. Atomist can also coordinate with continuous integration. It ties these pieces together across repositories, and including humans. It can react to issues, to build results, to merges; and it can ping you in Slack if it needs more information to act appropriately. I have plans to use this functionality to link libraries to the services that use them: when the build passes on my branch, go build the app that uses my library with this new version, and tell me whether those tests pass.

This is cross-repository refactoring and cross-repository build coordination. This gives companies an alternative to the monorepo, to loading all their libraries and services into one giant repository in order to test them together. The monorepo is a lie: our deployments are heterogenous, so while the monorepo is like "look at this lovely snapshot of a bunch of code that works together" the production environment is something different. The monorepo is also painful because git gets slow when the repository gets large; because it's hard to tell which commits affect which deployed units; and because application owners lose control over when library upgrades are integrated. Atomist will provide a layer on top of many repositories, letting us coordinate change while our repositories reflect production realities.

Atomist tooling will make multirepo development grow with our codebases.

Component 3: is still a secret


I'm not sure I can talk about the third piece of possibility-expanding tooling yet. So have this instead:

Automated coordination among systems and people who interact with code -- this is useful everywhere, but it's a lot of work to create our own bots for this. Some companies put the resources into creating enough automation for their own needs. No one business-software-building organization has a reason to develop, refine, and publish a general solution for this kind of development-process automation. Atomist does.

When it becomes easy for any developer to script this coordination and the reactions just happen -- "Tell me when an issue I reported was closed" "Create a new issue for this commit and then mark it closed as soon as this branch is merged" -- then we can all find breakages earlier and we can all keep good records. This automates my work at a higher level than coding. This way whenever I feel annoyed by writing a status report, or when I forget to update the version in one place to match the version in another, my job is not to add an item to a checklist. My job is to create an Atomist handler script to make that happen with no attention from me.

My secret


I love shaving yaks. Shaving them deeply, tenderly, finding the hidden wisdom under their hair. I love adding a useful feature, and then asking "How could that be easier?" and then "How could making that easier be easier?" This is Atomist's level of meta: We are making software to make it easier for you to make your work easier, as you work to make software to make your customers' lives easier.

I think we're doing this in depths and ways other development tools don't approach. At this level of meta (software for building software for building software for doing work), there's a lot of leverage, a lot of potential. This level of meta is where orders-of-magnitude changes happen. Software changes the world. I want to be part of changing the software world again, so we can change the real world even faster.

With Atomist, I get to design and specify my own reality, the reality of my team's work. (Atomist does the operations bit.) Without spending tons of time on it! Well, I get to spend tons of time on it because I get to work for Atomist, because that's my thing. But you don't have to spend tons of time on it! You get to specify what you want to happen, in the simplest language we can devise.
We're looking for teams to work with us on alpha-testing, if you're interested now. (join our slack, or email me) Let's learn together the next level of productivity and focus in software development.