Monday, February 3, 2014

Abstractions over Threads in Java and Scala

TL;DR In Java, get a library that makes Futures work like Scala's, and then never use ExecutorService directly.

In the beginning, there were Threads. And Java threads were nice and simple. That is, Java threads are simple like some assembly languages are simple: there's only a few things you can do.

Since then, Java and then Scala have created higher-level abstractions. These are what you want to use. This post explains the differences between them, and what happens when exceptions are thrown.

Java's Executor, introduced in Java 5, implements thread pooling for you. The only method on an Executor is execute(Runnable). That's simple too! Give it something to do, and it eventually does it. If an exception trickles up, it goes to the thread's UncaughtExceptionHandler, which typically prints the stack trace to System.err.

All the implementations provided in Executors also implement ExecutorService, a more extensive interface. Pass the submit() method a Callable or a Runnable, and get back a java.util.concurrent.Future. Please note that Java's Future is limited. You can't ask it to do anything on completion or failure. You can pretty much only call get(), which blocks until your task is complete, then returns its result or throws its exception.[1]

If you submitted a task for its side effects, and you never call get() on the Java Future, then no one will ever know about any Exception it throws. It never makes it to the Thread's UncaughtExceptionHandler, and it never gets output. To get an ExecutorService that never hides exceptions, extend ThreadPoolExecutor, override afterExecute and guarantee that get() is called. What a pain!

Now I'll switch over to Scala-land, because it has something to tell us about Java Futures.

Scala's ExecutionContext interface (trait) extends Executor, providing that execute() method. You'll probably never use this directly, but you'll need one to work with Scala Futures. There are two good ways to get it. First, use the default ExecutionContext.global; it's good. Second, if you want your own thread pool, the factory ExecutionContext.fromExecutorService creates a thin wrapper that delegates to your carefully chosen ExecutorService.

To start an asynchronous task in Scala, call

val doStuff = Future { /* do stuff */ } (executionContext)

This will execute the stuff on that executionContext[2], and give you a reference that's useful.

When you make a Java Future by submitting on an ExecutorService, you have to pass in the whole sequence of code that you want executed. All the error handling has to be there. When you want to do something after that asynchronous code completes, there's nothing to do but block until it completes.

Scala Futures remove that restriction. You can start something going, then add error handling by calling onFailure.[3] You can extend the asynchronous work with onSuccess. You can even say, "after these three Futures complete, then do this other thing with all three results." This lets you separate deciding what needs to happen in what order from defining each thing that needs to happen. Yay separation of concerns! I like how this style of programming lets me code the interesting bits first and then fill in the rest.

All these Future-extending and Future-combining services create asynchronous computations of their own, and want an ExecutionContext. This does not have to be the same one the Future is running on. Once a Future is constructed, it does not remember the ExecutionContext.

A task tacked on to another Future will automatically run when it can. Failures will be handled, successes will proceed. This means you aren't required to ask a Scala Future for its result. It's possible to do so (and I often do in test code), but discouraged. If you want to do something with the value, use onSuccess. You never have to block a thread!

We can work this way in Java too. In Java 8 there's native support. Earlier, we can use alternative futures provided in libraries such as Guava. Use this to define asynchronous tasks in smaller, more flexible bits.

This culminates a series of posts on choosing the right ExecutorService. See also Pool-Induced DeadlockForkJoinPool, and Scala's global ExecutionContext.

For Scala developers:
[3] I said that Scala futures let you handle errors with onFailure. This isn't true for what Scala considers Fatal errors; these remain uncaught. They propagate to the UncaughtExceptionHandler, which prints to stdout, and that's it. The thread dies. Your onComplete, onFailure, onSuccess methods, they're never called. Silent death. If you Await its result, the Await will timeout. Very bad! In the Scala source as of this writing, this happens only for very serious errors: VirtualMachineError, ThreadDeath, InterruptedException, LinkageError, ControlThrowable. However, in Scala 2.10.x, NotImplementedError is "fatal". When I left a method as ???, the thread disappeared and my program hung. That took forever to debug.

One alternative is to use scalaz.

The scalaz library provides its own Future. The scalaz.concurrent.Future wants an ExecutorService. (This means you can't use the global ExecutionContext.) Some important differences:
* scalaz defaults the implicit ExecutorService parameter to one with a FixedThreadPool. Because you aren't required to supply one at all, you don't always realize you're using that default.
* Because scalaz calls submit() on the ExecutorService, uncaught exceptions do not hit the UncaughtExceptionHandler and are never printed. Do not use scalaz's Future directly: use Task instead, which wraps everything in a try {} catch {}.
* In the standard constructor of Task {...} (and Future { ... }), the work is not submitted immediately. It is submitted on a call to run or attemptRun.
* Also if you use this standard constructor, then every time you run a Task, the work will be repeated. This is not true of Scala's future; those will run exactly once.

Hopefully, once you choose a good abstraction, you won't have to think about this stuff ever again.

----------------
[1] You can also cancel a Java Future, if you care about that.
[2] If it's the global, it'll sneakily fork a ForkJoinTask.
[3] is in the "For Scala developers" bit above
[4] The behavior of the UncaughtExceptionHandler can be configured on Threads created in a ThreadFactory that you supply to an ExecutorService of your own construction that you then wrap in an ExecutionContext. And good luck figuring out anything more useful than printing them.

1 comment:

  1. This blog on Java abstraction was very beneficial to me. Thanks a lot Jessica. If you want to know about training courses in Java, you can visit this link.
    http://www.fireboxtraining.com/java

    ReplyDelete