Video: Automating at a Higher Level with Atomist

Philly ETE has published my video on InfoQ!

 This is about development automation: why it’s a great idea, why we should do more of it, and how Atomist is enabling it. There is live coding, there is me getting excited, and there is an awesome sparkly dinosaur shirt. I love this shirt. I also love automation, can you tell?


The talk is from April 2017. Abstract:

As developers, we automate. We automate other work, and sometimes we automate our own. We save typing with templates or IDEs. We save searching out information with Slack integrations. Many companies have custom internal bots to tie together chat, version control, and build servers.

Most companies can’t and shouldn’t dedicate whole teams to streamlining development. And they shouldn’t have to. Atomist is building programmable developer automation, from typing code to tracking status. Keep your code up-to-date with automated pull requests and generators. Keep yourselves up-to-date with mini-dashboards in chat that correlate commits with builds, complete with action buttons to cut down your context switches.

This session will demonstrate the standard Atomist coordination and automation tools, plus how to program instant automation for your code and team.

Scala Maven rugs

“Add a pom to my toy Scala project so I can build it with Maven” sounds simple. It is, if you do it every day. I can look it up, yet again. And get it wrong, yet again. And consult an expert with “Why doesn’t it see my Scala sources! This works fine from the IDE.” And get the answer, and then forget it again.

This time I encoded the answer into a Rug. Now I can run this program to add the pom.xml to toy Scala projects. While I was at it, I encoded how to add Scalatest, so I can stop looking that up over and over.

These rugs aren’t published. To use them, clone the repository and run rug in local mode. That repository has collected three related Rugs now; it seems worthwhile to publish them as a group. Then nobody else would have to look this up again either! I might do this … after I create an Atomist executor to move Rugs from one repository to another. (I have a feeling I’m going to do a lot of this while working on Atomist. Automate all the things!!)

That good feeling that I get from encoding this knowledge so that I don’t have to look it up again is deceptive. Version numbers increment, practices evolve. This carefully encoded knowledge grows stale.

Publishing the Rugs is a responsibility of ownership, of keeping them up to date. Only then do they shine with enduring value to the community. Am I ready to accept that obligation? Not today, because I’m waiting until I can automate that too. This day will come.

For now, I’m happy with making my near-future-self more productive — when it’s a step toward making oodles of developers more productive someday.

Today’s Rug: maven executable jar

This is supposed to be a rug with a jar on it. An active jar.

I like being a polyglot developer, even though it’s painful sometimes. I use lots of languages, and in every one I have to look stuff up. That costs me time and concentration.

Yesterday I wanted to promote my locally-useful project from “I can run it in the IDE” to “I can run it at the command line.” It’s a Scala project built in maven, so I need an executable jar. I’ve looked this up and figured this out at least twice before. There’s a maven plugin you have to add, and then I have to remember how to run an executable jar, and put that in a script. All this feels like busywork.

What’s more satisfying than cut-and-pasting into my pom.xml and writing another script? Automating these! So I wrote a Rug editor. Rug editors are code that changes code. There’s a Pom type in Rug already, with a method for adding a build plugin, so I cut and paste the example from the internet into my Rug. Then I fill in the main class; that’s the only thing that changes from project to project so it’s a parameter to my editor. Then I make a script that calls the jar. (The script isn’t executable. I submitted an issue in Rug to add that function.) The editor prints out little instructions for me, too.

$ rug edit -lC ~/code/scala/org-dep-graph MakeExecutableJar main_class=com.jessitron.jessMakesAPicture.MakeAPicture
run `mvn package` to create an executable jar
Find a run script in your project’s bin directory. You’ll have to make it executable yourself, sorry.

→ Changes
├── pom.xml updated 2 kb
├── pom.xml updated 2 kb
├── bin/run created 570 bytes
└── .atomist.yml created 702 bytes

Successfully edited project org-dep-graph

It took a few iterations to get it working, probably half an hour more than doing the task manually. It feels better to do something permanently than to do it again. Encoded in this editor is knowledge:
* what is that maven plugin that makes an executable jar? [1]
* how do I add it to the pom? [2]
* what’s the maven command to build it? [3]
* how do I get it to name the jar something consistent? [4]
* how do I run an executable jar? [5]
* how do I find the jar in a relative directory from the script? [6]
* how do I get that right even when I call the script from a symlink? [7]

It’s like saving my work, except it’s saving the work instead of the results of the work. This is going to make my brain scale to more languages and build tools.

— — — — — — — — — — — — — — — — — — — — — —

below the fold: the annotated editor. source here, instructions here in case you want to use it -> or better, change it -> or even better, make your own.

@description “teaches a maven project how to make an executablejar” @tag “maven” editor MakeExecutableJar
@displayName “Main Class”
@description “Fully qualified Java classname”
@minLength 1 @maxLength 100
param main_class: @java_package
let pluginContents = “””<plugin><groupId>org.apache.maven.plugins</groupId>
<transformer implementation=”org.apache.maven.plugins.shade.resource.ManifestResourceTransformer”>
</transformer> </transformers> </configuration> </execution> </executions>
</plugin>“”” [2]
let runScript = “””#!/bin/bash
while [ -h “$SOURCE” ]; do
DIR=”$( cd -P “$( dirname “$SOURCE” )” && pwd )”
SOURCE=”$(readlink “$SOURCE”)”
[[ $SOURCE != /* ]] && SOURCE=”$DIR/$SOURCE”
done [7]
DIR=”$( cd -P “$( dirname “$SOURCE” )” && pwd )” [6]
java -jar $DIR/../target/executable.jar “$@” [5]
with Pom p
do addOrReplaceBuildPlugin “org.apache.maven.plugins” “maven-shade-plugin” pluginContents [1]
with File f when path = “pom.xml” begin
do replace “__I_AM_THE_MAIN__” main_class
do eval { print(“run `mvn package` to create an executable jar”)} [3]
with Project p begin
do eval { print(“Find a run script in your project’s bin directory. You’ll have to make it executable yourself, sorry”) }
do addFile “bin/run” runScript

Originally published on January 13, 2017.

Declarative style in Java

The other day at work, we had this problem:

Sort a sequence of rows, based on an input list of columns, where each column might be sorted ascending or descending, and each column might require a transformation first to get to something comparable.

The function interface looks like this:

Here is a concrete example where we sort food items first by their recommended meal (with a transformation that says “breakfast is first, then lunch, then dinner”),  and then by food name in reverse order:

What is the absolutely stupidest way to do this?

The stupidest way to do this is to write your own sort, that happens to order rows by looping through the instruction set. But we don’t do it that way. We know very well that sort is hard, and there are library algorithms to do it. Put the ordering logic in a custom Comparator and pass that into sort.

In Java, that looks like

Collections.sort(input, comparator);

This is a declarative style: sort this, to this specification. It separates the algorithm of sorting from the business logic of which row belongs before which other row. A professional Java developer would not consider rewriting sort; they wrap the logic in a Comparator and pass it in.
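
As a minimal sketch of that idea: one comparator per sort instruction, combined into a single Comparator and handed to the library sort. The FoodRow type, the by and combine helpers, and the sample data are all hypothetical names invented for illustration, not the code from work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

final class DeclarativeSort {
    // Hypothetical row type, for illustration only.
    static final class FoodRow {
        final String name;
        final String meal;
        FoodRow(String name, String meal) { this.name = name; this.meal = meal; }
    }

    private static final List<String> MEAL_ORDER = Arrays.asList("Breakfast", "Lunch", "Dinner");

    // The transformation: breakfast is first, then lunch, then dinner.
    static Integer mealRank(FoodRow row) { return MEAL_ORDER.indexOf(row.meal); }

    // One sort instruction: a key extractor plus a direction.
    static <K extends Comparable<K>> Comparator<FoodRow> by(Function<FoodRow, K> key, boolean ascending) {
        Comparator<FoodRow> comparator = Comparator.comparing(key);
        return ascending ? comparator : comparator.reversed();
    }

    // Earlier instructions win; later ones only break ties.
    static Comparator<FoodRow> combine(List<Comparator<FoodRow>> columns) {
        Comparator<FoodRow> combined = (a, b) -> 0;
        for (Comparator<FoodRow> column : columns) {
            combined = combined.thenComparing(column);
        }
        return combined;
    }

    static List<String> sortedNames() {
        List<FoodRow> input = new ArrayList<>(Arrays.asList(
            new FoodRow("soup", "Lunch"), new FoodRow("bagel", "Breakfast"),
            new FoodRow("steak", "Dinner"), new FoodRow("eggs", "Breakfast"),
            new FoodRow("salad", "Lunch")));
        List<Comparator<FoodRow>> columns = new ArrayList<>();
        columns.add(by(DeclarativeSort::mealRank, true)); // by meal, ascending
        columns.add(by(row -> row.name, false));          // then by name, descending
        Collections.sort(input, combine(columns));        // the library does the sorting
        List<String> names = new ArrayList<>();
        for (FoodRow row : input) names.add(row.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortedNames());
    }
}
```

The only logic written by hand is which row belongs before which; the sorting algorithm stays in the library.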

Heck, even in the C standard libraries, qsort accepts a function pointer. Sort is hard enough that we don’t complain about passing in comparison logic, however awkward.

As opposed to filter, which we rewrite every time we need part of a list. If I wanted just the breakfast foods, I might write

List<FoodRow> breakfastFoods = new ArrayList<FoodRow>();
for (FoodRow item : allFoods) {
  if (item.getMeal().equals("Breakfast")) {
    breakfastFoods.add(item);
  }
}
return breakfastFoods;

which would be completely imperative, describing the filter algorithm and even allocation and mutation of a new object. Yet this is easy enough that we used to do it all the time. Fortunately Java 8 fixes this:

allFoods.stream()
  .filter(item -> item.getMeal().equals("Breakfast"));

Or in Java 6:

Iterables.filter(allFoods, BREAKFAST_PREDICATE);[1]

Idiomatic Java is moving further and further toward a declarative style, and that’s a beautiful thing. Filter may be tremendously simpler than sort, but it’s still repetition, and re-implementing it every time is still limiting. The filter method on Stream can perform the operation in parallel, it can weave it in with other stream operations and skip instantiating intermediate lists, it can be lazy and do less filtering. There are many possibilities when we let go of “how” something works and specify only what it must do.
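
To make the "lazy and do less filtering" point concrete, here is a small sketch that counts how often the predicate runs when only the first match is needed. The class and method names are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyFilter {
    // Counts how many elements the predicate examines before the first match is found.
    static int predicateCallsUntilFirstMatch(List<String> meals, String wanted) {
        AtomicInteger calls = new AtomicInteger();
        meals.stream()
            .filter(meal -> {
                calls.incrementAndGet();
                return meal.equals(wanted);
            })
            .findFirst(); // short-circuits: later elements are never filtered
        return calls.get();
    }

    public static void main(String[] args) {
        List<String> meals = Arrays.asList("Lunch", "Breakfast", "Dinner", "Breakfast");
        System.out.println(predicateCallsUntilFirstMatch(meals, "Breakfast"));
    }
}
```

A hand-written loop that builds the whole filtered list first would test all four elements; the sequential stream stops after the second.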

If your co-workers complain about the new ways of thinking, ask them whether they’d rewrite sort. It’s all a matter of degree.

[1] This imports the Guava library and defines a constant:

final Predicate<FoodRow> BREAKFAST_PREDICATE = new Predicate<FoodRow>() {
   public boolean apply(FoodRow item) {
     return item.getMeal().equals("Breakfast");
   }
};

Left to right, top to bottom

TL;DR – Clojure’s threading macro keeps code in a legible order, and it’s more extensible than methods.

When we create methods in classes, we like that we’re grouping operations with related data. It’s a useful organizational scheme. There’s another reason to like methods: they put the code in an order that’s easy to read. In the old days it might read top-to-bottom, with subject and then verb and then the objects of the verb:

With a fluent interface that supports immutability, methods still give us a pleasing left-to-right ordering:
Methods look great, but it’s hard to add new ones. Maybe I sometimes want to add functionality for returns, or print a gift receipt. With functions, there is no limit to this. The secret is: methods are the same thing as functions, except with an extra secret parameter called this.
For example, consider JavaScript. (full gist) A method there can be any old function, and it can use properties of this.

completeSale = function(num) {
  console.log("Sale " + num + ": selling "
    + this.items + " to " + this.customer);
}

Give that value to an object property, and poof, the property is a method:

var sale = {
  customer: "Fred",
  items: ["carrot","eggs"],
  complete: completeSale
};
sale.complete(99);
// Sale 99: selling carrot,eggs to Fred

Or, call the function directly, and the first argument plays the role of “this”:

completeSale.call(sale, 100)
// Sale 100: selling carrot,eggs to Fred
In Scala we can create methods or functions for any operation, and still organize them right along with the data. I can choose between a method in the class:

class Sale(…) {
   def complete(num: Int) {…}
}

or a function in the companion object:

object Sale {
   def complete(sale: Sale, num: Int) {…}
}

Here, the function in the companion object can even access private members of the class[1]. The latter style is more functional. I like writing functions instead of methods because (1) all input is explicit and (2) I can add more functions as needed, and only as needed, and without jumbling up the two styles. When I write functions about data, instead of attaching functions to data, I can import the functions I need and no more. Methods are always on a class, whether I like it or not.
There’s a serious disadvantage to the function-with-explicit-parameter choice, though. Instead of a nice left-to-right reading style, we get:

It’s all inside-out-looking! What happens first is in the middle, and the objects are separated from the verbs they serve. Blech! It sucks that function application reads inside-out, right-to-left. The code is hard to follow.

We want the output of addCustomer to go to addItems, and the output of addItems to go to complete. Can I do this in a readable order? I don’t want to stuff all my functions into the class as methods.
In Scala, I wind up with this:

Here it reads top-down, and the arguments aren’t spread out all over the place. But I still have to draw lines, mentally, between what goes where. And sometimes I screw it up.

Clojure has the ideal solution. It’s called the threading macro. It has a terrible name, because there’s no relation to threads, nothing asynchronous. Instead, it’s about cramming the output of one function into the first argument of the next. If addCustomer, addItems, and complete are all functions which take a sale as the first parameter, the threading macro says, “Start with this. Cram it into first argument of the function call, and take that result and cram it into the first argument of the next function call, and so on.” The result of the last operation comes out. (full gist)

\\ Sale 99 : selling [carrot eggs] to Fred

This has a clear top-down ordering to it. It’s subject, verb, object. It’s a great substitute for methods. It’s kinda like stitching the data in where it belongs, weaving the operations together. Maybe that’s why it’s called the threading macro. (I would have called it cramming instead.)
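
For readers who don't speak Clojure, the cramming can be imitated in Java. This is only a rough analogue, not the threading macro itself: Pipe is a hypothetical helper, and the Sale class and its functions are invented here to mirror the running example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class Threading {
    // Hypothetical immutable sale, mirroring the running example.
    static final class Sale {
        final String customer;
        final List<String> items;
        Sale(String customer, List<String> items) { this.customer = customer; this.items = items; }
    }

    // Plain functions that take the sale as their first parameter.
    static Sale addCustomer(Sale sale, String customer) { return new Sale(customer, sale.items); }
    static Sale addItems(Sale sale, List<String> items) { return new Sale(sale.customer, items); }
    static String complete(Sale sale, int num) {
        return "Sale " + num + ": selling " + sale.items + " to " + sale.customer;
    }

    // Holds a value and feeds it to each step, in reading order.
    static final class Pipe<T> {
        private final T value;
        Pipe(T value) { this.value = value; }
        <R> Pipe<R> then(Function<? super T, ? extends R> step) {
            return new Pipe<R>(step.apply(value));
        }
        T get() { return value; }
    }

    static String example() {
        List<String> noItems = Collections.emptyList();
        return new Pipe<Sale>(new Sale(null, noItems))
            .then(sale -> addCustomer(sale, "Fred"))
            .then(sale -> addItems(sale, Arrays.asList("carrot", "eggs")))
            .then(sale -> complete(sale, 99))
            .get();
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

The lambdas are the price Java pays for not having a macro: each step has to say where the previous result goes, which is exactly the bookkeeping the threading macro removes.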

Clojure’s prefix notation has a reputation for being harder to read, but this changes that. The threading macro pulls the subject out of the first function argument and puts it at the top, at the beginning of the sentence. I wish Scala had this!
In case you’re still interested, here’s a second example: list processing.

Methods in Scala look nice:

but they’re not extensible. If these were functions I’d have:

which is hideous. So I wind up with:
That is easy to mess up; I have to get the intermediate variables right.
In Haskell it’s function composition:
That reads backwards, right-to-left, but it does keep the objects with the verbs.

Notice that in Haskell the map, filter, reduce functions take the data as their last parameter.[2] This is also the case in Clojure, so we can use the last-parameter threading macro. It has the cramming effect of shoving the previous result into the last parameter:

Once again, Clojure gives us a top-down, subject-verb-object form. See? The Lisp is perfectly readable, once you know which paths to twist your brain down.

Update: As @ppog_penguin reminded me, F# has the best syntax of all. Its pipe operator acts a lot like the Unix pipe, and sends data into the last parameter.
F# is my favorite!
[1] technical detail: the companion object can’t see members that are private[this]
[2] technical detail: all functions in Haskell take one parameter; applying map to a predicate returns a function of one parameter that expects the list.

Abstractions over Threads in Java and Scala

TL;DR In Java, get a library that makes Futures work like Scala’s, and then never use ExecutorService directly.

In the beginning, there were Threads. And Java threads were nice and simple. That is, Java threads are simple like some assembly languages are simple: there’s only a few things you can do.

Since then, Java and then Scala have created higher-level abstractions. These are what you want to use. This post explains the differences between them, and what happens when exceptions are thrown.

Java’s Executor, introduced in Java 5, implements thread pooling for you. The only method on an Executor is execute(Runnable). That’s simple too! Give it something to do, and it eventually does it. If an exception trickles up, it goes to the thread’s UncaughtExceptionHandler, which typically prints the stack trace to System.err.
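
A small sketch of that path, assuming a hand-rolled ThreadFactory that installs a recording handler in place of the default one (the class and method names are invented for this example):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class UncaughtDemo {
    // Runs a failing task through execute() and returns the message the handler saw.
    static String messageSeenByHandler() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CountDownLatch handled = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable);
            // Replace the default handler (which prints the stack trace) with one that records.
            thread.setUncaughtExceptionHandler((t, e) -> {
                seen.set(e);
                handled.countDown();
            });
            return thread;
        });
        try {
            pool.execute(() -> { throw new IllegalStateException("boom"); });
            handled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        Throwable failure = seen.get();
        return failure == null ? null : failure.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(messageSeenByHandler());
    }
}
```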

All the implementations provided in Executors also implement ExecutorService, a more extensive interface. Pass the submit() method a Callable or a Runnable, and get back a java.util.concurrent.Future. Please note that Java’s Future is limited. You can’t ask it to do anything on completion or failure. You can pretty much only call get(), which blocks until your task is complete, then returns its result or throws its exception.[1]
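
Here is roughly what that looks like in practice: submit a Callable, then block on get(), which either hands back the result or wraps the task's exception in an ExecutionException. The describe helper is a made-up name for this sketch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SubmitAndGet {
    // Submits the task, blocks until it finishes, and reports what came back.
    static String describe(Callable<Integer> task) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> future = pool.submit(task);
            return "result " + future.get(); // blocks the calling thread
        } catch (ExecutionException e) {
            // The task's own exception is the cause.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(() -> 6 * 7));
        System.out.println(describe(() -> { throw new IllegalStateException("boom"); }));
    }
}
```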

If you submitted a task for its side effects, and you never call get() on the Java Future, then no one will ever know about any Exception it throws. It never makes it to the Thread’s UncaughtExceptionHandler, and it never gets output. To get an ExecutorService that never hides exceptions, extend ThreadPoolExecutor, override afterExecute and guarantee that get() is called. What a pain!
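
A sketch of that workaround, following the pattern suggested in the ThreadPoolExecutor documentation. LoudExecutor and its failures list are names invented here; a real version would log or alert instead of collecting.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class LoudExecutor extends ThreadPoolExecutor {
    final List<Throwable> failures = new CopyOnWriteArrayList<>();

    LoudExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void afterExecute(Runnable task, Throwable thrown) {
        super.afterExecute(task, thrown);
        // Submitted tasks arrive wrapped in a Future that swallowed the exception: ask for it.
        if (thrown == null && task instanceof Future<?> && ((Future<?>) task).isDone()) {
            try {
                ((Future<?>) task).get();
            } catch (CancellationException e) {
                thrown = e;
            } catch (ExecutionException e) {
                thrown = e.getCause();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        if (thrown != null) {
            failures.add(thrown);
            System.err.println("Task failed: " + thrown);
        }
    }

    // Submits one task for its side effects, never calls get(), and returns what was noticed.
    static List<Throwable> failuresAfter(Runnable task) {
        LoudExecutor pool = new LoudExecutor(1);
        pool.submit(task);
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.failures;
    }

    public static void main(String[] args) {
        System.out.println(failuresAfter(() -> { throw new IllegalStateException("boom"); }));
    }
}
```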

Now I’ll switch over to Scala-land, because it has something to tell us about Java Futures.

Scala’s ExecutionContext interface (trait) extends Executor, providing that execute() method. You’ll probably never use this directly, but you’ll need one to work with Scala Futures. There are two good ways to get it. First, use the default; it’s good. Second, if you want your own thread pool, the factory ExecutionContext.fromExecutorService creates a thin wrapper that delegates to your carefully chosen ExecutorService.

To start an asynchronous task in Scala, call

val doStuff = Future { /* do stuff */ } (executionContext)

This will execute the stuff on that executionContext[2], and give you a reference that’s useful.

When you make a Java Future by submitting on an ExecutorService, you have to pass in the whole sequence of code that you want executed. All the error handling has to be there. When you want to do something after that asynchronous code completes, there’s nothing to do but block until it completes.

Scala Futures remove that restriction. You can start something going, then add error handling by calling onFailure.[3] You can extend the asynchronous work with onSuccess. You can even say, “after these three Futures complete, then do this other thing with all three results.” This lets you separate deciding what needs to happen in what order from defining each thing that needs to happen. Yay separation of concerns! I like how this style of programming lets me code the interesting bits first and then fill in the rest.

All these Future-extending and Future-combining services create asynchronous computations of their own, and want an ExecutionContext. This does not have to be the same one the Future is running on. Once a Future is constructed, it does not remember the ExecutionContext.

A task tacked on to another Future will automatically run when it can. Failures will be handled, successes will proceed. This means you aren’t required to ask a Scala Future for its result. It’s possible to do so (and I often do in test code), but discouraged. If you want to do something with the value, use onSuccess. You never have to block a thread!

We can work this way in Java too. In Java 8 there’s native support. Earlier, we can use alternative futures provided in libraries such as Guava. Use this to define asynchronous tasks in smaller, more flexible bits.
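
With Java 8's CompletableFuture, a sketch of the same style might look like this. The values and names are placeholders, and join() appears only at the very edge so the example can return something.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ComposedFutures {
    static String run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Integer> price = CompletableFuture.supplyAsync(() -> 40, pool);
            CompletableFuture<Integer> tax = CompletableFuture.supplyAsync(() -> 2, pool);
            CompletableFuture<Integer> broken = CompletableFuture.<Integer>supplyAsync(
                () -> { throw new IllegalStateException("boom"); }, pool);

            // "After both of these complete, do this other thing with both results."
            CompletableFuture<String> total =
                price.thenCombine(tax, Integer::sum).thenApply(sum -> "total " + sum);

            // Error handling is attached after the work has already started.
            CompletableFuture<String> recovered =
                broken.thenApply(n -> "value " + n).exceptionally(e -> "recovered");

            // Blocking here is for the demo only.
            return total.join() + ", " + recovered.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```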

This culminates a series of posts on choosing the right ExecutorService. See also Pool-Induced Deadlock, ForkJoinPool, and Scala’s global ExecutionContext.

For Scala developers:
[3] I said that Scala futures let you handle errors with onFailure. This isn’t true for what Scala considers Fatal errors; these remain uncaught. They propagate to the UncaughtExceptionHandler, which prints to stderr, and that’s it. The thread dies. Your onComplete, onFailure, onSuccess methods, they’re never called. Silent death. If you Await its result, the Await will time out. Very bad! In the Scala source as of this writing, this happens only for very serious errors: VirtualMachineError, ThreadDeath, InterruptedException, LinkageError, ControlThrowable. However, in Scala 2.10.x, NotImplementedError is “fatal”. When I left a method as ???, the thread disappeared and my program hung. That took forever to debug.

One alternative is to use scalaz.

The scalaz library provides its own Future. The scalaz.concurrent.Future wants an ExecutorService. (This means you can’t use the global ExecutionContext.) Some important differences:
* scalaz defaults the implicit ExecutorService parameter to one with a FixedThreadPool. Because you aren’t required to supply one at all, you don’t always realize you’re using that default.
* Because scalaz calls submit() on the ExecutorService, uncaught exceptions do not hit the UncaughtExceptionHandler and are never printed. Do not use scalaz’s Future directly: use Task instead, which wraps everything in a try {} catch {}.
* In the standard constructor of Task {…} (and Future { … }), the work is not submitted immediately. It is submitted on a call to run or attemptRun.
* Also if you use this standard constructor, then every time you run a Task, the work will be repeated. This is not true of Scala’s future; those will run exactly once.

Hopefully, once you choose a good abstraction, you won’t have to think about this stuff ever again.

[1] You can also cancel a Java Future, if you care about that.
[2] If it’s the global, it’ll sneakily fork a ForkJoinTask.
[3] is in the “For Scala developers” bit above
[4] The behavior of the UncaughtExceptionHandler can be configured on Threads created in a ThreadFactory that you supply to an ExecutorService of your own construction that you then wrap in an ExecutionContext. And good luck figuring out anything more useful than printing them.

ForkJoinPool: the Other ExecutorService

In Java, an ExecutorService manages a pool of threads that can run tasks. Most ExecutorServices treat all tasks the same. Somebody hands it something to do, the ExecutorService parcels it out to a thread, the thread runs it. Next!

A ForkJoinPool is an ExecutorService that recognizes explicit dependencies between tasks. It is designed for the kind of computation that wants to run in parallel, and then maybe more parallel, and then some of those can run in parallel too. Results of the parallel computations are then combined, so it’s like the threads want to split off and then regroup.

Maybe it’s a computation like, “What’s the shortest path of followers from me to @richhickey?” or “What is the total size of all files in all directories?” or “What’s the most relevant ad to display to this customer?” where we don’t know what all we’re going to have to execute until we’re in the middle of executing it.

On an ordinary ExecutorService, when we split a computation up, each task goes its separate way. Each one is allocated to a thread individually. This becomes a problem when the tasks are small, and the overhead of allocating them to threads takes longer than running them. It becomes a bigger problem when threads split off tasks and wait for all the results to come back to combine them: pretty soon so many threads are waiting that there are no more threads to do the work. This can reach deadlock.

ForkJoinPool embraces many small computations that spawn off and then come back together. It says, “When my thread wants to split its work into many small computations, it shall create them, and then start working on them. If another thread wants to come along and help, great.”

A computation in a ForkJoinPool is like a mother who told all her children to clean the house. While she’s waiting for them to finish their level on the video game she starts picking up. Eventually some kids get up and start helping. When Evelyn starts sweeping and isn’t done by the time Linda has finished the bathroom, then Linda picks up a broom and helps. Eventually the mother takes stock and says, “Hurray! The house is clean.”

That’s a completely unrealistic scenario in my household, but ForkJoinPools are more disciplined than my children. They support unpredictable parallel computation, preventing pool-induced deadlock, and minimize the work of switching back and forth between threads on the CPU.

What’s not to love? Well, a ForkJoinPool is harder to use than a regular old ExecutorService. It’s more complicated than calling “submit.” External threads submit jobs to a ForkJoinPool in an ordinary way, but within the pool, tasks are created differently. ForkJoinTask subclasses get constructed, forked off for execution, and then joined. It’s custom handling, and that requires planning ahead, and that means you have to guess that ForkJoinPool is the solution before you start coding. Or retrofit it later.
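
To give a feel for that custom handling, here is a minimal sketch of a ForkJoinTask subclass that sums an array by splitting it in half. SumTask and its threshold are arbitrary choices for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this size, just add directly
    private final long[] values;
    private final int from;
    private final int to;

    SumTask(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += values[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(values, from, mid);
        SumTask right = new SumTask(values, mid, to);
        left.fork();                     // offer the left half to any thread that wants to help
        long rightSum = right.compute(); // start on the right half ourselves
        return left.join() + rightSum;   // regroup
    }

    // External callers hand work to the pool in the ordinary way.
    static long sum(long[] values) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(values, 0, values.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] values = new long[10_000];
        for (int i = 0; i < values.length; i++) values[i] = i + 1;
        System.out.println(sum(values));
    }
}
```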

Scala does something clever to hide the difference between ForkJoinPools and regular ExecutorServices, so that its Futures work this way by default. Akka uses ForkJoinPools behind the scenes for its actor messaging. Clojure uses ForkJoinPools in its collection processing with Reducers. In Scala and Clojure, you can get these optimizations without extra headache. The abstractions, they keep getting deeper!

Doug Lea wrote ForkJoin for Java and Scala.

Choosing an ExecutorService


When you need an ExecutorService, the Executors class has several for you. Sometimes it doesn’t matter which you choose, and other times it matters a LOT. The above flow chart is approximate. If it’s simple parallel computation you’re doing, then a fixed pool with as many threads as CPUs works. If those computations start other computations, then you’ll want a ForkJoinPool (that’s another post). If the purpose of threading is to avoid blocking on I/O, that’s different. Maybe you don’t want to limit the number of threads that can wait on I/O, and then Executors.newCachedThreadPool is a good choice.

When you would like to limit the number of threads you start AND your tasks might start other tasks in the same thread pool, then you must worry about Pool Induced Deadlock. Then you need to think about what happens when one task needs to start another. It’s time to get into the nitty-gritty of a ThreadPoolExecutor. This is what the methods on Executors construct for you under the hood, and you can make your own for finer control. To choose them, you need to understand what the ThreadPoolExecutor does when a new task comes in.

Here’s a nice new ThreadPoolExecutor.

Some tasks come in, Runnables passed through execute or submit. At first the ThreadPoolExecutor happily starts a new thread per task. It does this up to the core pool size. Note that even if a thread is idle, it’ll still start a new thread per task until the core number of threads are running. These stay up until the pool shuts down, unless you configure it otherwise.[1]

If all the core threads are busy, then the ThreadPoolExecutor will begin storing Runnables in its queue. The BlockingQueue passed in at construction can have capacity 0, Integer.MAX_VALUE, or anywhere in between. Note that Runnables are stored here even if the maximum number of threads are not running. This can cause deadlock if the tasks in the core threads are waiting on tasks they submitted.

Only if the queue is full will the ThreadPoolExecutor start more than the core number of threads. It’ll start them up to the maximum pool size, only as long as the queue is full and more tasks are coming in.

Finally, if there’s no more room in the queue and no more threads allowed, submitting a task to this ThreadPoolExecutor will throw RejectedExecutionException. (You can configure it to drop work, or to tell the calling thread to do its own task instead.[2])
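
As a sketch of the "calling thread does its own task" option: a pool with one thread and no queue room, configured with the built-in CallerRunsPolicy. The class and method names are invented for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class CallerRuns {
    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // True when the overflow task ran on the submitting thread instead of being rejected.
    static boolean overflowRanOnCaller() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>(),           // holds nothing
            new ThreadPoolExecutor.CallerRunsPolicy()); // instead of throwing
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<Thread> ranOn = new AtomicReference<>();
        try {
            pool.execute(() -> awaitQuietly(release));             // occupies the only thread
            pool.execute(() -> ranOn.set(Thread.currentThread())); // no thread free, no queue room
            return ranOn.get() == Thread.currentThread();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(overflowRanOnCaller());
    }
}
```

This slows the submitter down instead of failing, which acts as a crude form of back-pressure.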

The pictured case is an unusual one, where core size, queue capacity, and max size are all nonzero and finite. A more common case is FixedThreadPool, with a fixed number of threads and effectively infinite queue. Threads will start up and stay up, and tasks will wait their turns. The other common case is CachedThreadPool, with an always-empty queue and effectively infinite threads. Here, the threads will time out when they’re not busy.
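
In ThreadPoolExecutor terms, those two common cases can be written out by hand. The sketch below uses the same shapes the Executors factory methods use, as far as I know them; StandardPools is just a name for this example.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class StandardPools {
    // Like Executors.newFixedThreadPool(threads): core == max, effectively infinite queue.
    static ThreadPoolExecutor fixed(int threads) {
        return new ThreadPoolExecutor(
            threads, threads, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>());
    }

    // Like Executors.newCachedThreadPool(): no core threads, effectively infinite maximum,
    // a queue that never holds anything, and idle threads that time out.
    static ThreadPoolExecutor cached() {
        return new ThreadPoolExecutor(
            0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>());
    }

    public static void main(String[] args) {
        System.out.println(fixed(4).getQueue().remainingCapacity());
        System.out.println(cached().getQueue().remainingCapacity());
    }
}
```

Anything "in between" is a matter of picking different numbers and a bounded queue.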

If you need something in between, you can construct it yourself. The fixed thread pool is good to put a limit on I/O or CPU context switches. The cached one avoids pool-induced deadlock. If you’re doing interesting recursive calculations, then look into ForkJoinPool.

Aside: All of these ExecutorServices will, by default, start Threads that keep your application alive until they’re done. If you don’t like that, build a ThreadFactory that sets them up as daemon threads, and pass it in when you create the ExecutorService.
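
A minimal sketch of such a ThreadFactory, wrapping the default factory and flipping the daemon flag (DaemonPools is a hypothetical helper name):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

final class DaemonPools {
    // Delegates to the default factory, then marks each new thread as a daemon.
    static ThreadFactory daemonFactory() {
        ThreadFactory defaults = Executors.defaultThreadFactory();
        return runnable -> {
            Thread thread = defaults.newThread(runnable);
            thread.setDaemon(true); // must be set before the thread starts
            return thread;
        };
    }

    static ExecutorService newDaemonFixedPool(int threads) {
        return Executors.newFixedThreadPool(threads, daemonFactory());
    }

    // Asks a pool thread whether it is a daemon.
    static boolean runsOnDaemonThread() {
        ExecutorService pool = newDaemonFixedPool(1);
        try {
            return pool.submit(() -> Thread.currentThread().isDaemon()).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runsOnDaemonThread());
    }
}
```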

Bonus material: Scala REPL code that I used to poke around in the REPL and check these out.
If you’re in Scala, a great way to create a thread pool is Akka’s ThreadPoolBuilder.

Example: The thermometer-looking drawings represent tasks either in process or queued inside the ExecutorService. Red and green represent tasks in process, and amber ones are sitting in the queue. As tasks are completed, the level drops. As tasks are submitted to the ExecutorService, the level goes up.

If I set up an ExecutorService like this:

new ThreadPoolExecutor(
  5, // core pool size
  8, // max pool size, for at most (8 - 5 = 3) red threads
  10, TimeUnit.SECONDS, // idle red threads live this long
  new ArrayBlockingQueue<Runnable>(7)); // queue with finite maximum size

Assuming no tasks complete yet: The first 5 tasks submitted will start up threads; the next 7 tasks will queue; the next 3 tasks coming in will cause more threads to be started. (Each new thread runs the task whose submission triggered it, so those late arrivals jump ahead of the 7 tasks already waiting in the queue.) Any further submission attempts throw RejectedExecutionException.
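
Those counts can be checked with a small experiment: fill the same pool with tasks that block on a latch, and record the thread and queue counts along the way. PoolFillDemo and the log format are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolFillDemo {
    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Submits 16 tasks that cannot finish yet and notes the pool's state at each boundary.
    static List<String> observe() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            5, 8, 10, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(7));
        CountDownLatch release = new CountDownLatch(1);
        List<String> log = new ArrayList<>();
        try {
            for (int submitted = 1; submitted <= 16; submitted++) {
                try {
                    pool.execute(() -> awaitQuietly(release));
                    if (submitted == 5 || submitted == 12 || submitted == 15) {
                        log.add(submitted + " submitted: " + pool.getPoolSize()
                            + " threads, " + pool.getQueue().size() + " queued");
                    }
                } catch (RejectedExecutionException e) {
                    log.add(submitted + " submitted: rejected");
                }
            }
        } finally {
            release.countDown(); // let every task finish
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : observe()) System.out.println(line);
    }
}
```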

[1] Core threads will be closed when idle only if you set that up: threadPoolExecutor.allowCoreThreadTimeOut(true)
[2] Do this by supplying a different RejectedExecutionHandler in the constructor.