Thursday, March 20, 2014

Weakness and Vulnerability

Weakness and vulnerability are different. Separate the concerns: [1]

Vulnerability is an openness to being wounded.
Weakness is inability to live through wounds.

In D&D terms: vulnerability is a low armor class, weakness is low hit points. Armor class determines how hard it is for an enemy to hit you, and hit points determine how many hits you can take. So you have a choice: prevent hits, or endure more hits.

If you try to make your software perfect, so that it never experiences a failure, that's a high armor class. That's aiming for invulnerability.

Thing is, in D&D, no matter how high your armor class, if the enemy makes a perfect roll (a 20 on a d20, a twenty-sided die), that's a critical hit and it strikes you. Even if your software is bug-free, hardware goes down or misbehaves.

If you've spent all your energy on armor class and little on hit points, that single hit can kill you.

Embrace failure by letting go of ideal invulnerability, and think about recovery instead. I could implement signal handlers, and maintain them, and this is a huge pain and makes my code ugly. Or I could implement a separate cleanup mechanism for crashed processes. That's a separation of concerns, and it's more robust: signal handlers don't help when the app is out of memory, but a separate recovery mechanism does.

In the software I currently work on, I take the strategy of building safety nets at the application, process, subsystem, and module levels, as feasible.[3] Then while I try to get my code right, I don't convolute my code looking for hardware and network failures, bad data and every error I can conceive. There are always going to be errors I don't conceive. Fail gracefully, and pick up the pieces.
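For the Akka case mentioned in footnote [3], a module-level safety net is a supervisor that restarts a crashed child instead of trying to anticipate every failure inside it. A minimal sketch (the Worker and Module names are invented, not the code I work on):

import akka.actor._
import scala.concurrent.duration._

class Worker extends Actor {
  def receive = { case job => /* do the work; this may throw */ }
}

class Module extends Actor {
  // Safety net: when a worker crashes, restart it and move on.
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: Exception => SupervisorStrategy.Restart
    }

  val worker = context.actorOf(Props[Worker], "worker")

  def receive = { case job => worker forward job }
}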

-----
An expanded version of this post, adding the human element, is on True in Software, True in Life.

-----
[1] Someone tweeted a quote from some book on this, on the difference between weakness and vulnerability, a few weeks ago and it clicked with me. I can't find the tweet or the quote anymore. Anyone recognize this?
[3] The actor model (Akka in my case) helps with recovery. It implements "Have you restarted your computer?" at the small scale.

Wednesday, March 19, 2014

Testing in Akka: sneaky automatic restarts

Restarts are awesome when stuff fails and you want it to work. Akka does this by default for every actor, and that's great in production. In testing, we're looking for failure. We want to see it, not hide it. In testing, it's sneaky of Akka to restart our whole actor hierarchy when it barfs.

Change this sneaky behavior in actor system configuration by altering the supervisor strategy of the guardian actor. The guardian is the parent of any actor created with system.actorOf(...), so its supervision strategy determines what happens when your whole hierarchy goes down.

Then, in your test's assertions, make sure your actor hasn't been terminated. If the guardian has a StoppingSupervisorStrategy and your actor goes down, it'll stay down.
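Roughly, the two pieces look like this (a sketch with invented names, not the gist itself; the guardian-supervisor-strategy setting is the standard Akka config key):

import akka.actor._
import akka.testkit.TestProbe
import com.typesafe.config.ConfigFactory

class CrashingActor extends Actor {
  def receive = { case _ => throw new RuntimeException("argh") }
}

// Tell the guardian to stop failed actors instead of restarting them.
val config = ConfigFactory.parseString(
  "akka.actor.guardian-supervisor-strategy = \"akka.actor.StoppingSupervisorStrategy\"")

implicit val system = ActorSystem("test-system", config)
val probe = TestProbe()

val crasher = system.actorOf(Props[CrashingActor], "crasher")
probe.watch(crasher)
crasher ! "boom"
probe.expectTerminated(crasher)  // with the default restarting guardian, this would time out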

Code example in this gist.

Tuesday, March 4, 2014

Modularity in Scala: Isolation of dependencies

Today at work, I said, "I wish I could express the right level of encapsulation here." Oh, but this is Scala! Many things are possible!

We have a class, an akka Actor, whose job is to keep an eye on things. Let's pretend its job is to clean up every so often: sweep the corners, wash the dishes, and wipe the TV screen. At construction, it receives all the tools needed to do these things.

class CleanerUpper(broom: Broom,
                   rag: Dishcloth, 
                   wiper: MicrofiberTowel, 
                   config: CleanConfig) ... {
...
  def work(...) {
    broom.sweep(config, corners)
    rag.wipe(config, dishes) 
    wiper.clear(tv)
  }
}

Today, we added reporting to the sweeping functionality. This made the sweeping part complicated enough to break out into its own class. At construction of the Sweeper, we provide everything that remains constant (from config) and the tools it needs (the broom). When it's time to sweep, we pass in the parts that vary each time (the corners).[1]
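The Sweeper itself might look something like this (a sketch; the Corner type and the reporting hook are assumptions):

class Sweeper(config: CleanConfig, broom: Broom) {
  def sweep(corners: Seq[Corner]): Unit = {
    broom.sweep(config, corners)
    // ... report on the sweeping here ...
  }
}

CleanerUpper then delegates to it: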

class CleanerUpper(broom: Broom,
                   rag: Dishcloth, 
                   wiper: MicrofiberTowel, 
                   config: CleanConfig) ... {
  val sweeper = new Sweeper(config, broom)
...
  def work(...) {
    sweeper.sweep(corners)
    rag.wipe(config, dishes) 
    wiper.clear(tv)
  }
}

Looking at this, I don't like that broom is still available everywhere in the CleanerUpper. With the refactor, all broom-related functionality belongs in the Sweeper. The Broom constructor parameter serves only to construct the dependency. Yet, nothing stops me (or someone else) from adding a call directly to broom anywhere in CleanerUpper. Can I change this?

One option is to construct the Sweeper outside and pass it in, in place of the Broom. Then construction would look like

new CleanerUpper(new Sweeper(config, broom), rag, wiper, config)

I don't like this because no one outside of CleanerUpper should have to know about the submodules that CleanerUpper uses. I want to keep this internal refactor from having so much impact on callers.

More importantly, I want to express "A Broom is needed to initialize dependencies of CleanerUpper. After that it is not available."

The solution we picked separates construction of dependencies from the class's functionality definition. I made the class abstract, with an uninitialized Sweeper field. The Broom is gone.

abstract class CleanerUpper(
                   rag: Dishcloth, 
                   wiper: MicrofiberTowel, 
                   config: CleanConfig) ... {
  val sweeper: Sweeper
...
  def work(...) {
    sweeper.sweep(corners)
    rag.wipe(config, dishes) 
    wiper.clear(tv)
  }
}

Construction happens in the companion object. Its apply method accepts the same arguments as the original constructor -- the same objects a caller is required to provide. Here, a Sweeper is initialized.

object CleanerUpper {
  def apply(broom: Broom,
            rag: Dishcloth,
            wiper: MicrofiberTowel, 
            config: CleanConfig): CleanerUpper = 
    new CleanerUpper(rag, wiper, config) {
      val sweeper = new Sweeper(config, broom)
    }
}

The only change to construction is use of the companion object instead of explicitly new-ing one up. Next time I make a similar refactor, it'll require no changes to external construction.

val cleaner = CleanerUpper(broom, rag, wiper, config)

I like this solution because it makes the dependency on submodule Sweeper explicit in CleanerUpper. Also, construction of that dependency is explicit.

There are several other ways to accomplish encapsulation of the broom within the sweeper. Scala offers all kinds of ways to modularize and break apart the code -- that's one of the fascinating things about the language. Modularity and organization are two of the biggest challenges in programming, and Scala offers many paths for exploring these.

-------------
[1] This example is silly. It is not my favorite kind of example, but all the realistic ones I came up with were higher-cognitive-load.

Wednesday, February 19, 2014

When OO and FP meet: returning the same type

In the left corner, we have Functional Programming. FP says, "Classes shall be immutable!"

In the right corner, we have Object-Oriented programming. It says, "Classes shall be extendable!"

The battlefield: define a method on the abstract class such that, when you call it, you get the same concrete class back. In Scala.
Fight!

Here comes the sample problem --

We have some insurance policies, auto policies and home policies. On any policy, you can adjust by a discount and receive a policy of the same type. Here is the test:

case class Discount(name: String)

  def test() {
    def adjust[P <: Policy](d: Discount, p: P): P = p.adjustBy(d)
    val p = new AutoPolicy
    val d = Discount("good driver")
    val adjustedP: AutoPolicy = adjust(d, p)
    println("if that compiles, we're good")
  }

OO says, no problem!

abstract class Policy {
  protected def changeCost(d: Discount)

  def adjustBy(d: Discount): this.type = {
    changeCost(d)
    this
  }
}

class AutoPolicy extends Policy {
  protected def changeCost(d: Discount) { /* MUTATE */ }
}

FP punches OO in the head and says, "Mutation is not allowed! We must return a new version of the policy and leave the old one be!"[1] The easiest way is to move the adjust method into an ordinary function, with a type parameter:

object Policy {
  def adjust[P <: Policy](p: P, d: Discount): P = p match {
    case ap: AutoPolicy => new AutoPolicy
    ... all the other cases for all other policies ...
  }
}

But no no no, we'd have to change this code (and every method like it) every time we add a Policy subclass. This puts us on the wrong side of the Expression Problem.[2]

If we step back from this fight, we can find a better way. Where we declare adjustBy, we have access to two types: the superclass (Policy) and this.type, which is the special-snowflake type of that particular instance. The type we're trying to return is somewhere in between.

How can we specify this intermediate type? It seems obvious to us as humans: "It's the class that extends Policy!" But an instance of AutoPolicy has any number of types -- it could include lots of traits. Somewhere we need to specify "This is the type it makes sense to return," and then in Policy say "adjustBy returns the type that makes sense." Abstract types do this cleanly:

abstract class Policy {
  type Self <: Policy
  protected def changeCost(d: Discount): Self

  def adjustBy(d: Discount): Self =
    changeCost(d)
}

class AutoPolicy extends Policy {
  type Self = AutoPolicy
  protected def changeCost(d: Discount) = 
    { /* copy self */ new AutoPolicy }
}

I like this because it expresses cleanly "There will be a type, a subclass of this one, that methods can return."
There's one problem:

error: type mismatch;
 found   : p.Self
 required: P
           def adjust[P <: Policy](d: Discount, p:P):P = p.adjustBy(d)

The adjust method doesn't return P; it returns the inner type P#Self. You and I know that's the same as P, but the compiler doesn't. OO punches FP in the head!

Wheeeet! The Scala compiler steps in as the referee. Scala offers us a way to say to the compiler, "P#Self is the same as P." Check this version out:

def adjust[P <: Policy](d: Discount, p: P)
               (implicit ev: P#Self =:= P): P = p.adjustBy(d)

This says, "Look Scala, these two things are the same, see?" And Scala says, "Oh you're right, they are." The compiler comes up with the implicit value by itself.
The cool part is, if we define a new Policy poorly, we get a compile error:
class BadPolicy extends Policy {
  type Self = AutoPolicy
  protected def changeCost(d: Discount) = { new AutoPolicy }
}
adjust(d, new BadPolicy)
error: Cannot prove that FunctionalVersion.BadPolicy#Self =:= FunctionalVersion.BadPolicy.
           adjust(d, new BadPolicy)

Yeah, bad Policy, bad.

This method isn't quite ideal, but it's close. The positive is: the abstract type is expressive of the intent. The negative is: any function that wants to work polymorphically with Policy subtypes must require the implicit evidence. If you don't like this, there's an alternative using type parameters, called F-bounded polymorphism. It's not quite as ugly as that sounds.
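For comparison, here's a sketch of that F-bounded version (a sketch, not the version I settled on):

abstract class Policy[P <: Policy[P]] { self: P =>
  protected def changeCost(d: Discount): P

  def adjustBy(d: Discount): P = changeCost(d)
}

class AutoPolicy extends Policy[AutoPolicy] {
  protected def changeCost(d: Discount) = { /* copy self */ new AutoPolicy }
}

// No implicit evidence needed, but every signature carries the recursive type parameter:
def adjust[P <: Policy[P]](d: Discount, p: P): P = p.adjustBy(d)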

Scala is a language of many options. Something as tricky as combining OO and FP certainly demands it. See the footnotes for further discussion on this particular game.

The referee declares that FP can have its immutability, OO can have its extension. A few function declarations suffer, but only a little.

--------------
[1] FP prefers to simply return a Policy from adjustBy; all instances of an ADT have the same interface, so why not return the supertype? But we're not playing the Algebraic Data Type game. OO insists that AutoPolicy has additional methods (like penalizeForTicket) that we might call after adjustBy. The game is to combine immutability with extendible superclasses, and Scala is playing this game.
[2] The solution to the expression problem here -- if we want to be able to add both functions and new subclasses -- is typeclasses. I was totally gonna go there, until I found this solution. For the case where we don't plan to add functions, only subclasses, abstract types are easier.

More references:
F-bounded type polymorphism ("Give Up Now")
MyType problem
Abstract self types

Wednesday, February 5, 2014

Property-based testing of higher-order functions

Property-based testing and functional programming are friends, because they're both motivated by Reasoning About Code. Functional programming is about keeping your functions data-in, data-out so that you can reason about them. Property-based testing is about expressing the conclusions of that reasoning as properties, then showing that they're (probably) true by testing them with hundreds of different input values.

example:
Person: "By my powers of reasoning, I can tell that this code will always produce an output collection longer than the input collection."

Test: "For any generated collection listypoo, this function will return a collection of size > listypoo.size."

Testing Framework: "I will generate 100 different collections, throw them at that test, and try to make it fail."

Property-based testing gets tricky when I try to combine it with another aspect of functional programming: higher-order functions. If the function under test accepts a function as a parameter, how do I generate that? If it returns a function, how do I validate it's the right one?

In the trivial case, we pass in the identity function or a constant function, and base our expectations on that. But that violates the spirit of generative testing.

Tests operate on values. Values go in, values go out, and we can check those. Functions are tested by operating on values. So to test functions that operate on functions, we need to generate functions that operate on values, and generate values to test the functions that come out of the function under test. Then we'll need to state properties in terms of values going in and out of functions going in and out... ack! It's Higher-Order Property-Based Testing.
<self: insert drawing>

My sources on twitter tell me that QuickCheck for Erlang and Haskell have generators for functions. Looking at this sample of the Haskell one, I think this is what it does:

Each generated function from A => B includes a random factor that's constant to the function, and a generator for B that can create Bs based on a seed. When an A is passed in, it's combined with the function's random factor to prod the generator into creating a deterministically-random B.

So, we can create random functions A=>B if we have a generator of B, a random-factor-generator, and some mapping from (A, random-factor) to the seed for the generator of B. That doesn't sound so bad.
<self: this could use a drawing too>
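In Scala terms, the idea is roughly this (a toy sketch, not the actual QuickCheck or ScalaCheck machinery):

import scala.util.Random

// genB turns a seed into a B; each generated function gets its own random factor.
def genFunction[A, B](genB: Long => B): A => B = {
  val functionFactor = Random.nextLong()           // constant for this one generated function
  (a: A) => genB(functionFactor ^ a.hashCode())    // the same A always maps to the same B
}

// usage: a random Int => String that is deterministic per input
val f: Int => String =
  genFunction(seed => new Random(seed).alphanumeric.take(5).mkString)
assert(f(42) == f(42))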

There are two other functionalities in good generators. When a test fails, we need enough information to figure out why and fix the bug. One tool for this is shrinking: make the generated input simpler and simpler until the test doesn't fail anymore, and then report the simplest failing case. I haven't even started thinking about that in function-generation. (Does Haskell or Erlang do it?)
The second tool for fixing failed tests is simpler: print out the input that revealed the bug. That's also hard with generated functions. Since the generated functions described above are random mappings from input to output, printing the actual input-output mapping used in the failing property evaluation should be sufficient. This is implemented in Haskell, with no state, so it must be possible in Scala and Erlang. And it's in F#.

Like everything in property-based testing, generic input generation is the easy part (even when it isn't easy). Describing properties expected of output functions based on input functions - now that sounds hard. When I write a test that does it, that'll be worth another post.

Thanks to @reiddraper, @ericnormand, @silentbicycle, @natpryce, @deech, @kot_2010 and everyone who chimed in. Please continue.

Monday, February 3, 2014

Abstractions over Threads in Java and Scala

TL;DR In Java, get a library that makes Futures work like Scala's, and then never use ExecutorService directly.

In the beginning, there were Threads. And Java threads were nice and simple. That is, Java threads are simple like some assembly languages are simple: there are only a few things you can do.

Since then, Java and then Scala have created higher-level abstractions. These are what you want to use. This post explains the differences between them, and what happens when exceptions are thrown.

Java's Executor, introduced in Java 5, implements thread pooling for you. The only method on an Executor is execute(Runnable). That's simple too! Give it something to do, and it eventually does it. If an exception trickles up, it goes to the thread's UncaughtExceptionHandler, which typically prints the stack trace to System.err.

All the implementations provided in Executors also implement ExecutorService, a more extensive interface. Pass the submit() method a Callable or a Runnable, and get back a java.util.concurrent.Future. Please note that Java's Future is limited. You can't ask it to do anything on completion or failure. You can pretty much only call get(), which blocks until your task is complete, then returns its result or throws its exception.[1]

If you submitted a task for its side effects, and you never call get() on the Java Future, then no one will ever know about any Exception it throws. It never makes it to the Thread's UncaughtExceptionHandler, and it never gets output. To get an ExecutorService that never hides exceptions, extend ThreadPoolExecutor, override afterExecute and guarantee that get() is called. What a pain!
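Here's a sketch of that pain in Scala (the class name is invented; the trick is that submit() wraps each task in a Future, which afterExecute can check):

import java.util.concurrent._

class LoudThreadPoolExecutor(threads: Int) extends ThreadPoolExecutor(
    threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable]()) {

  override protected def afterExecute(r: Runnable, t: Throwable): Unit = {
    super.afterExecute(r, t)
    r match {
      case f: Future[_] if f.isDone =>
        try f.get()   // surfaces the task's exception, if any
        catch {
          case e: ExecutionException => e.getCause.printStackTrace()
          case _: CancellationException | _: InterruptedException => ()
        }
      case _ => if (t != null) t.printStackTrace()
    }
  }
}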

Now I'll switch over to Scala-land, because it has something to tell us about Java Futures.

Scala's ExecutionContext interface (trait) extends Executor, providing that execute() method. You'll probably never use this directly, but you'll need one to work with Scala Futures. There are two good ways to get it. First, use the default ExecutionContext.global; it's good. Second, if you want your own thread pool, the factory ExecutionContext.fromExecutorService creates a thin wrapper that delegates to your carefully chosen ExecutorService.
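Both routes, in brief:

import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

val default = ExecutionContext.global
val custom  = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))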

To start an asynchronous task in Scala, call

val doStuff = Future { /* do stuff */ } (executionContext)

This will execute the stuff on that executionContext[2], and give you a reference that's useful.

When you make a Java Future by submitting on an ExecutorService, you have to pass in the whole sequence of code that you want executed. All the error handling has to be there. When you want to do something after that asynchronous code completes, there's nothing to do but block until it completes.

Scala Futures remove that restriction. You can start something going, then add error handling by calling onFailure.[3] You can extend the asynchronous work with onSuccess. You can even say, "after these three Futures complete, then do this other thing with all three results." This lets you separate deciding what needs to happen in what order from defining each thing that needs to happen. Yay separation of concerns! I like how this style of programming lets me code the interesting bits first and then fill in the rest.
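A small sketch of that style (the pricing and taxes futures are invented; onSuccess and onFailure are the per-case variants of onComplete):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

val pricing = Future { 40 }
val taxes   = Future { 2 }

// runs only after both futures complete successfully
val total: Future[Int] = for {
  p <- pricing
  t <- taxes
} yield p + t

total.onComplete {
  case Success(sum) => println(s"total: $sum")
  case Failure(e)   => println(s"something went wrong: $e")
}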

All these Future-extending and Future-combining services create asynchronous computations of their own, and want an ExecutionContext. This does not have to be the same one the Future is running on. Once a Future is constructed, it does not remember the ExecutionContext.

A task tacked on to another Future will automatically run when it can. Failures will be handled, successes will proceed. This means you aren't required to ask a Scala Future for its result. It's possible to do so (and I often do in test code), but discouraged. If you want to do something with the value, use onSuccess. You never have to block a thread!

We can work this way in Java too. In Java 8 there's native support (CompletableFuture). Earlier, we can use alternative futures provided in libraries such as Guava. Use this to define asynchronous tasks in smaller, more flexible bits.

This culminates a series of posts on choosing the right ExecutorService. See also Pool-Induced Deadlock, ForkJoinPool, and Scala's global ExecutionContext.

For Scala developers:
[3] I said that Scala futures let you handle errors with onFailure. This isn't true for what Scala considers Fatal errors; these remain uncaught. They propagate to the UncaughtExceptionHandler, which prints to stderr, and that's it. The thread dies. Your onComplete, onFailure, onSuccess methods, they're never called. Silent death. If you Await its result, the Await will time out. Very bad! In the Scala source as of this writing, this happens only for very serious errors: VirtualMachineError, ThreadDeath, InterruptedException, LinkageError, ControlThrowable. However, in Scala 2.10.x, NotImplementedError is "fatal". When I left a method as ???, the thread disappeared and my program hung. That took forever to debug.

One alternative is to use scalaz.

The scalaz library provides its own Future. The scalaz.concurrent.Future wants an ExecutorService. (This means you can't use the global ExecutionContext.) Some important differences:
* scalaz defaults the implicit ExecutorService parameter to one with a FixedThreadPool. Because you aren't required to supply one at all, you don't always realize you're using that default.
* Because scalaz calls submit() on the ExecutorService, uncaught exceptions do not hit the UncaughtExceptionHandler and are never printed. Do not use scalaz's Future directly: use Task instead, which wraps everything in a try {} catch {}.
* In the standard constructor of Task {...} (and Future { ... }), the work is not submitted immediately. It is submitted on a call to run or attemptRun.
* Also, if you use this standard constructor, then every time you run a Task, the work is repeated, as the sketch below shows. This is not true of Scala's Futures; those run exactly once.
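A sketch of that difference (run and attemptRun are the scalaz 7 names):

import scalaz.concurrent.Task

val task = Task { println("working"); 40 + 2 }  // nothing runs yet
task.run         // prints "working", returns 42
task.attemptRun  // prints "working" again, returns \/-(42)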

Hopefully, once you choose a good abstraction, you won't have to think about this stuff ever again.

----------------
[1] You can also cancel a Java Future, if you care about that.
[2] If it's the global, it'll sneakily fork a ForkJoinTask.
[3] is in the "For Scala developers" bit above
[4] The behavior of the UncaughtExceptionHandler can be configured on Threads created in a ThreadFactory that you supply to an ExecutorService of your own construction that you then wrap in an ExecutionContext. And good luck figuring out anything more useful than printing them.

ForkJoinPool: the Other ExecutorService

In Java, an ExecutorService manages a pool of threads that can run tasks. Most ExecutorServices treat all tasks the same. Somebody hands it something to do, the ExecutorService parcels it out to a thread, the thread runs it. Next!

A ForkJoinPool is an ExecutorService that recognizes explicit dependencies between tasks. It is designed for the kind of computation that wants to run in parallel, and then maybe more parallel, and then some of those can run in parallel too. Results of the parallel computations are then combined, so it's like the threads want to split off and then regroup.


Maybe it's a computation like, "What's the shortest path of followers from me to @richhickey?" or "What is the total size of all files in all directories?" or "What's the most relevant ad to display to this customer?" where we don't know what all we're going to have to execute until we're in the middle of executing it.

On an ordinary ExecutorService, when we split a computation up, each task goes its separate way. Each one is allocated to a thread individually. This becomes a problem when the tasks are small, and the overhead of allocating them to threads takes longer than running them. It becomes a bigger problem when threads split off tasks and wait for all the results to come back to combine them: pretty soon so many threads are waiting that there are no more threads to do the work. This can reach deadlock.

ForkJoinPool embraces many small computations that spawn off and then come back together. It says, "When my thread wants to split its work into many small computations, it shall create them, and then start working on them. If another thread wants to come along and help, great."
A computation in a ForkJoinPool is like a mother who told all her children to clean the house. While she's waiting for them to finish their level on the video game, she starts picking up. Eventually some kids get up and start helping. When Evelyn starts sweeping and isn't done by the time Linda has finished the bathroom, Linda picks up a broom and helps. Eventually the mother takes stock and says, "Hurray! The house is clean."

That's a completely unrealistic scenario in my household, but ForkJoinPools are more disciplined than my children. They support unpredictable parallel computation, prevent pool-induced deadlock, and minimize the work of switching back and forth between threads on the CPU.

What's not to love? Well, a ForkJoinPool is harder to use than a regular old ExecutorService. It's more complicated than calling "submit." External threads submit jobs to a ForkJoinPool in an ordinary way, but within the pool, tasks are created differently. ForkJoinTask subclasses get constructed, forked off for execution, and then joined. It's custom handling, and that requires planning ahead, and that means you have to guess that ForkJoinPool is the solution before you start coding. Or retrofit it later.
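In Scala, that custom handling looks roughly like this (a toy sum-of-an-array task; the threshold is arbitrary):

import java.util.concurrent.{ForkJoinPool, RecursiveTask}

class SumTask(xs: Array[Int], from: Int, until: Int) extends RecursiveTask[Long] {
  override def compute(): Long =
    if (until - from <= 1000) {
      // small enough: just do the work
      var sum = 0L
      var i = from
      while (i < until) { sum += xs(i); i += 1 }
      sum
    } else {
      // split, fork one half, work on the other, then join
      val mid   = (from + until) / 2
      val left  = new SumTask(xs, from, mid)
      val right = new SumTask(xs, mid, until)
      left.fork()
      right.compute() + left.join()
    }
}

val pool  = new ForkJoinPool()
val total = pool.invoke(new SumTask(Array.fill(100000)(1), 0, 100000))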

Scala does something clever to hide the difference between ForkJoinPools and regular ExecutorServices, so that its Futures work this way by default. Akka uses ForkJoinPools behind the scenes for its actor messaging. Clojure uses ForkJoinPools in its collection processing with Reducers. In Scala and Clojure, you can get these optimizations without extra headache. The abstractions, they keep getting deeper!

-------------
Doug Lea wrote ForkJoin for Java and Scala. http://gee.cs.oswego.edu/dl/papers/fj.pdf