Tuesday, April 22, 2014

Left to right, top to bottom

TL;DR - Clojure's threading macro keeps code in a legible order, and it's more extensible than methods.

When we create methods in classes, we like that we're grouping operations with related data. It's a useful organizational scheme. There's another reason to like methods: they put the code in an order that's easy to read. In the old days it might read top-to-bottom, with subject and then verb and then the objects of the verb:

With a fluent interface that supports immutability, methods still give us a pleasing left-to-right ordering:

Methods look great, but it's hard to add new ones. Maybe I sometimes want to add functionality for returns, or print a gift receipt. With functions, there is no limit to this. The secret is: methods are the same thing as functions, except with an extra secret parameter called this.

For example, consider JavaScript. (full gist) A method there can be any old function, and it can use properties of this.

var completeSale = function(num) {
  console.log("Sale " + num + ": selling " + this.items + " to " + this.customer);
};

Give that value to an object property, and poof, the property is a method:

var sale = {
  customer: "Fred",
  items: ["carrot","eggs"],
  complete: completeSale
};
sale.complete(99);
// Sale 99: selling carrot,eggs to Fred

Or, call the function directly, and the first argument plays the role of "this":

completeSale.call(sale, 100)
// Sale 100: selling carrot,eggs to Fred

In Scala we can create methods or functions for any operation, and still organize them right along with the data. I can choose between a method in the class:

class Sale(...) {
   def complete(num: Int) {...}

or a function in the companion object:

object Sale {
   def complete(sale: Sale, num: Int) {...}

Here, the function in the companion object can even access private members of the class[1]. The latter style is more functional. I like writing functions instead of methods because (1) all input is explicit and (2) I can add more functions as needed, and only as needed, and without jumbling up the two styles. When I write functions about data, instead of attaching functions to data, I can import the functions I need and no more. Methods are always on a class, whether I like it or not.

There's a serious disadvantage to the function-with-explicit-parameter choice, though. Instead of a nice left-to-right reading style, we get:

It's all inside-out-looking! What happens first is in the middle, and the objects are separated from the verbs they serve. Blech! It sucks that function application reads inside-out, right-to-left. The code is hard to follow.

We want the output of addCustomer to go to addItems, and the output of addItems to go to complete. Can I do this in a readable order? I don't want to stuff all my functions into the class as methods.
In Scala, I wind up with this:

Here it reads top-down, and the arguments aren't spread out all over the place. But I still have to draw lines, mentally, between what goes where. And sometimes I screw it up.
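
To make the two shapes concrete, here is a minimal Scala sketch. The function names addCustomer, addItems, and complete come from the text; the fields on Sale, the function bodies, and the wrapper object are assumptions made up for illustration.

```scala
object InsideOutDemo {
  // Hypothetical Sale; its fields are assumptions for this sketch.
  case class Sale(customer: String = "", items: List[String] = Nil)

  // Functions about the data, each taking the sale as the first parameter.
  def addCustomer(sale: Sale, customer: String): Sale =
    sale.copy(customer = customer)
  def addItems(sale: Sale, items: List[String]): Sale =
    sale.copy(items = sale.items ++ items)
  def complete(sale: Sale, num: Int): String =
    s"Sale $num: selling ${sale.items.mkString(",")} to ${sale.customer}"

  // Inside-out: what happens first (addCustomer) sits in the middle.
  def nested: String =
    complete(addItems(addCustomer(Sale(), "Fred"), List("carrot", "eggs")), 99)

  // Top-down, but each intermediate name has to be wired up by hand.
  def stepwise: String = {
    val withCustomer = addCustomer(Sale(), "Fred")
    val withItems = addItems(withCustomer, List("carrot", "eggs"))
    complete(withItems, 99)
  }
}
```

Both forms produce the same string; the only difference is the order in which a reader meets the operations.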

Clojure has the ideal solution. It's called the threading macro. It has a terrible name, because there's no relation to threads, nothing asynchronous. Instead, it's about cramming the output of one function into the first argument of the next. If addCustomer, addItems, and complete are all functions which take a sale as the first parameter, the threading macro says, "Start with this. Cram it into the first argument of the function call, and take that result and cram it into the first argument of the next function call, and so on." The result of the last operation comes out. (full gist)
\\ Sale 99 : selling [carrot eggs] to Fred
This has a clear top-down ordering to it. It's subject, verb, object. It's a great substitute for methods. It's kinda like stitching the data in where it belongs, weaving the operations together. Maybe that's why it's called the threading macro. (I would have called it cramming instead.)
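
The cramming idea can be approximated in Scala with a small helper. This is only a sketch of the concept, not the Clojure macro: the |> operator and the Cram class are hypothetical names defined here, and Sale with its fields is an assumption as well.

```scala
object CrammingDemo {
  // Hypothetical Sale and functions, each taking the sale first.
  case class Sale(customer: String = "", items: List[String] = Nil)
  def addCustomer(sale: Sale, customer: String): Sale = sale.copy(customer = customer)
  def addItems(sale: Sale, items: List[String]): Sale = sale.copy(items = sale.items ++ items)
  def complete(sale: Sale, num: Int): String =
    s"Sale $num: selling ${sale.items.mkString(",")} to ${sale.customer}"

  // Hypothetical helper: feed the value on the left into the function on the right.
  implicit class Cram[A](value: A) {
    def |>[B](f: A => B): B = f(value)
  }

  // Subject first, then each verb with its remaining objects, top to bottom.
  // The underscore marks the first argument, where the previous result is crammed in.
  def threaded: String =
    Sale() |>
      (addCustomer(_, "Fred")) |>
      (addItems(_, List("carrot", "eggs"))) |>
      (complete(_, 99))
}
```

Unlike the macro, this version needs the underscore to say where the value goes, so it is a little noisier, but the reading order is the same.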

Clojure's prefix notation has a reputation for being harder to read, but this changes that. The threading macro pulls the subject out of the first function argument and puts it at the top, at the beginning of the sentence. I wish Scala had this!

In case you're still interested, here's a second example: list processing.

Methods in Scala look nice:

but they're not extensible. If these were functions I'd have:

which is hideous. So I wind up with:

That is easy to mess up; I have to get the intermediate variables right.

In Haskell it's function composition:

That reads backwards, right-to-left, but it does keep the objects with the verbs.

Notice that in Haskell the map, filter, reduce functions take the data as their last parameter.[2] This is also the case in Clojure, so we can use the second-parameter threading macro:

this one with the double greater-than signs has the cramming effect:

Once again, Clojure gives us a top-down, subject-verb-object form. See? The Lisp is perfectly readable, once you know which paths to twist your brain down.

[1] technical detail: the companion object can't see members that are private[this]
[2] technical detail: all functions in Haskell take one parameter; applying map to a function returns a function of one parameter that expects the list.

Thursday, March 20, 2014

Weakness and Vulnerability

Weakness and vulnerability are different. Separate the concerns: [1]

Vulnerability is an openness to being wounded.
Weakness is inability to live through wounds.

In D&D terms: vulnerability is a low armor class, weakness is low hit points. Armor class determines how hard it is for an enemy to hit you, and hit points determine how many hits you can take. So you have a choice: prevent hits, or endure more hits.

If you try to make your software perfect, so that it never experiences a failure, that's a high armor class. That's aiming for invulnerability.

Thing is, in D&D, no matter how high your armor class, if the enemy makes a perfect roll (a 20 on a d20, a twenty-sided die), that's a critical hit and it strikes you. Even if your software is bug-free, hardware goes down or misbehaves.

If you've spent all your energy on armor class and little on hit points, that single hit can kill you.

Embrace failure by letting go of ideal invulnerability, and think about recovery instead. I could implement signal handlers, and maintain them, and this is a huge pain and makes my code ugly. Or I could implement a separate cleanup mechanism for crashed processes. That's a separation of concerns, and it's more robust: signal handlers don't help when the app is out of memory, a separate recovery does.

In the software I currently work on, I take the strategy of building safety nets at the application, process, subsystem, and module levels, as feasible.[3] Then while I try to get my code right, I don't convolute my code looking for hardware and network failures, bad data and every error I can conceive. There are always going to be errors I don't conceive. Fail gracefully, and pick up the pieces.

An expanded version of this post, adding the human element, is on True in Software, True in Life.

[1] Someone tweeted a quote from some book on this, on the difference between weakness and vulnerability, a few weeks ago and it clicked with me. I can't find the tweet or the quote anymore. Anyone recognize this?
[3] The actor model (Akka in my case) helps with recovery. It implements "Have you restarted your computer?" at the small scale.

Wednesday, March 19, 2014

Testing in Akka: sneaky automatic restarts

Restarts are awesome when stuff fails and you want it to work. Akka does this by default for every actor, and that's great in production. In testing, we're looking for failure. We want to see it, not hide it. In testing, it's sneaky of Akka to restart our whole actor hierarchy when it barfs.

Change this sneaky behavior in actor system configuration by altering the supervisor strategy of the guardian actor. The guardian is the parent of any actor created with system.actorOf(...), so its supervision strategy determines what happens when your whole hierarchy goes down.

Then, in your test's assertions, make sure your actor hasn't been terminated. If the guardian has a StoppingSupervisorStrategy and your actor goes down, it'll stay down.
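
A minimal sketch of that configuration, assuming Akka and Typesafe Config are on the test classpath. The setting akka.actor.guardian-supervisor-strategy and the class akka.actor.StoppingSupervisorStrategy ship with Akka; the TestActorSystem object and its create method are hypothetical names for this example.

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object TestActorSystem {
  // Tell the guardian to stop failed children instead of restarting them,
  // then fall back to the normal configuration for everything else.
  private val config = ConfigFactory.parseString(
    """
      akka.actor.guardian-supervisor-strategy = "akka.actor.StoppingSupervisorStrategy"
    """
  ).withFallback(ConfigFactory.load())

  // Hypothetical helper: build an actor system for a test with that strategy.
  def create(name: String): ActorSystem = ActorSystem(name, config)
}
```

The same line could live in a test-only application.conf instead of being parsed from a string.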

Code example in this gist.

Tuesday, March 4, 2014

Modularity in Scala: Isolation of dependencies

Today at work, I said, "I wish I could express the right level of encapsulation here." Oh, but this is Scala! Many things are possible!

We have a class, an akka Actor, whose job is to keep an eye on things. Let's pretend its job is to clean up every so often: sweep the corners, wash the dishes, and wipe the TV screen. At construction, it receives all the tools needed to do these things.

class CleanerUpper(broom: Broom,
                   rag: Dishcloth, 
                   wiper: MicrofiberTowel, 
                   config: CleanConfig) ... {
  def work(...) {
    broom.sweep(config, corners)
    rag.wipe(config, dishes) 

Today, we added reporting to the sweeping functionality. This made the sweeping part complicated enough to break out into its own class. At construction of the Sweeper, we provide everything that remains constant (from config) and the tools it needs (the broom). When it's time to sweep, we pass in the parts that vary each time (the corners).[1]

class CleanerUpper(broom: Broom,
                   rag: Dishcloth,
                   wiper: MicrofiberTowel,
                   config: CleanConfig) ... {
  val sweeper = new Sweeper(config, broom)
  def work(...) {
    rag.wipe(config, dishes) 

Looking at this, I don't like that broom is still available everywhere in the CleanerUpper. With the refactor, all broom-related functionality belongs in the Sweeper. The Broom constructor parameter serves only to construct the dependency. Yet, nothing stops me (or someone else) from adding a call directly to broom anywhere in CleanerUpper. Can I change this?

One option is to construct the Sweeper outside and pass it in, in place of the Broom. Then construction would look like

new CleanerUpper(new Sweeper(config, broom), rag, wiper, config)

I don't like this because no one outside of CleanerUpper should have to know about the submodules that CleanerUpper uses. I want to keep this internal refactor from having so much impact on callers.

More importantly, I want to express "A Broom is needed to initialize dependencies of CleanerUpper. After that it is not available."

The solution we picked separates construction of dependencies from the class's functionality definition. I made the class abstract, with an uninitialized Sweeper field. The Broom is gone.

abstract class CleanerUpper(
                   rag: Dishcloth, 
                   wiper: MicrofiberTowel, 
                   config: CleanConfig) ... {
  val sweeper: Sweeper
  def work(...) {
    rag.wipe(config, dishes) 

Construction happens in the companion object. Its apply method accepts the same arguments as the original constructor -- the same objects a caller is required to provide. Here, a Sweeper is initialized.

object CleanerUpper {
  def apply(broom: Broom,
            rag: Dishcloth,
            wiper: MicrofiberTowel,
            config: CleanConfig): CleanerUpper =
    new CleanerUpper(rag, wiper, config) {
      val sweeper = new Sweeper(config, broom)
    }
}

The only change to construction is use of the companion object instead of explicitly new-ing one up. Next time I make a similar refactor, it'll require no changes to external construction.

val cleaner = CleanerUpper(broom, rag, wiper, config)

I like this solution because it makes the dependency on submodule Sweeper explicit in CleanerUpper. Also, construction of that dependency is explicit.
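
Since the snippets above are abbreviated, here is a compact version that compiles on its own. It is a sketch of the pattern, not the code from work: the bodies of Broom, Dishcloth, Sweeper, and work are invented, and CleanConfig's label field is an assumption. One small deviation from the snippet above: the Sweeper is built just before the anonymous subclass rather than inside it, which keeps the apply parameters clearly separate from the superclass's constructor parameters of the same name.

```scala
object ModularityDemo {
  // Stand-in collaborators; names follow the post, bodies are assumptions.
  class Broom { def sweep(corner: String): String = s"swept $corner" }
  class Dishcloth { def wipe(dish: String): String = s"wiped $dish" }
  class MicrofiberTowel
  case class CleanConfig(label: String)

  // Everything broom-related lives here.
  class Sweeper(config: CleanConfig, broom: Broom) {
    def sweep(corners: List[String]): List[String] =
      corners.map(c => s"${config.label}: ${broom.sweep(c)}")
  }

  // No Broom in sight: the class only knows it has a Sweeper.
  abstract class CleanerUpper(rag: Dishcloth, wiper: MicrofiberTowel, config: CleanConfig) {
    val sweeper: Sweeper
    def work(corners: List[String], dishes: List[String]): List[String] =
      sweeper.sweep(corners) ++ dishes.map(rag.wipe)
  }

  object CleanerUpper {
    // Callers pass the same arguments as before the refactor.
    def apply(broom: Broom, rag: Dishcloth, wiper: MicrofiberTowel,
              config: CleanConfig): CleanerUpper = {
      val builtSweeper = new Sweeper(config, broom)
      new CleanerUpper(rag, wiper, config) { val sweeper = builtSweeper }
    }
  }
}
```

After construction, nothing inside CleanerUpper can reach the broom except through the sweeper.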

There are several other ways to accomplish encapsulation of the broom within the sweeper. Scala offers all kinds of ways to modularize and break apart the code -- that's one of the fascinating things about the language. Modularity and organization are two of the biggest challenges in programming, and Scala offers many paths for exploring these.

[1] This example is silly. It is not my favorite kind of example, but all the realistic ones I came up with were higher-cognitive-load.

Wednesday, February 19, 2014

When OO and FP meet: returning the same type

In the left corner, we have Functional Programming. FP says, "Classes shall be immutable!"

In the right corner, we have Object-Oriented programming. It says, "Classes shall be extendable!"

The battlefield: define a method on the abstract class such that, when you call it, you get the same concrete class back. In Scala.

Here comes the sample problem --

We have some insurance policies, auto policies and home policies. On any policy, you can adjust by a discount and receive a policy of the same type. Here is the test:

case class Discount(name: String)

  def test() {
    def adjust[P <: Policy](d: Discount, p: P): P = p.adjustBy(d)
    val p = new AutoPolicy
    val d = Discount("good driver")
    val adjustedP: AutoPolicy = adjust(d, p)
    println("if that compiles, we're good")
  }

OO says, no problem!

abstract class Policy {
   protected def changeCost(d: Discount)

   def adjustBy(d: Discount) : this.type = {
       changeCost(d)
       return this
   }
}

class AutoPolicy extends Policy {
  protected def changeCost(d: Discount) { /* MUTATE */ }
}

FP punches OO in the head and says, "Mutation is not allowed! We must return a new version of the policy and leave the old one be!"[1] The easiest way is to move the adjust method into an ordinary function, with a type parameter:

object Policy {
   def adjust[P <: Policy](p: P, d: Discount): P = p match {
     case ap: AutoPolicy => new AutoPolicy
     ... all the other cases for all other policies ...

But no no no, we'd have to change this code (and every method like it) every time we add a Policy subclass. This puts us on the wrong side of the Expression Problem.[2]

If we step back from this fight, we can find a better way. Where we declare adjustBy, we have access to two types: the superclass (Policy) and this.type, which is the special-snowflake type of that particular instance. The type we're trying to return is somewhere in between:

How can we specify this intermediate type? It seems obvious to us as humans. "It's the class that extends Policy!" but an instance of AutoPolicy has any number of types -- it could include lots of traits. Somewhere we need to specify "This is the type it makes sense to return," and then in Policy say "adjustBy returns the type that makes sense." Abstract types do this cleanly:

abstract class Policy {
  type Self <: Policy
  protected def changeCost(d: Discount): Self

  def adjustBy(d: Discount) : Self = {
    changeCost(d)
  }
}

class AutoPolicy extends Policy {
  type Self = AutoPolicy
  protected def changeCost(d: Discount) =
    { /* copy self */ new AutoPolicy }
}

I like this because it expresses cleanly "There will be a type, a subclass of this one, that methods can return."
There's one problem:

error: type mismatch;
 found   : p.Self
 required: P
           def adjust[P <: Policy](d: Discount, p:P):P = p.adjustBy(d)

The adjust method doesn't return P; it returns the inner type P#Self. You and I know that's the same as P, but the compiler doesn't. OO punches FP in the head!

Wheeeet! The Scala compiler steps in as the referee. Scala offers us a way to say to the compiler, "P#Self is the same as P." Check this version out:

def adjust[P <: Policy](d: Discount, p: P)
               (implicit ev: P#Self =:= P): P = p.adjustBy(d)

This says, "Look Scala, these two things are the same, see?" And Scala says, "Oh you're right, they are." The compiler comes up with the implicit value by itself.

The cool part is, if we define a new Policy poorly, we get a compile error:

class BadPolicy extends Policy {
  type Self = AutoPolicy
  protected def changeCost(d: Discount) = { new AutoPolicy }
}

adjust(d, new BadPolicy)

error: Cannot prove that FunctionalVersion.BadPolicy#Self =:= FunctionalVersion.BadPolicy.
           adjust(d, new BadPolicy)

Yeah, bad Policy, bad.

This method isn't quite ideal, but it's close. The positive is: the abstract type is expressive of the intent. The negative is: any function that wants to work polymorphically with Policy subtypes must require the implicit evidence. If you don't like this, there's an alternative using type parameters, called F-bounded polymorphism. It's not quite as ugly as that sounds.
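
For comparison, a minimal sketch of the F-bounded alternative. Policy, AutoPolicy, Discount, adjustBy, changeCost, and adjust are the names used above; the discounts field and the wrapper object are assumptions added so the example has something to check.

```scala
object FBoundedDemo {
  case class Discount(name: String)

  // The subclass names itself as the type parameter.
  abstract class Policy[P <: Policy[P]] {
    protected def changeCost(d: Discount): P
    def adjustBy(d: Discount): P = changeCost(d)
  }

  // Hypothetical field `discounts` records what has been applied.
  class AutoPolicy(val discounts: List[Discount] = Nil) extends Policy[AutoPolicy] {
    // Immutable: return a new policy and leave this one alone.
    protected def changeCost(d: Discount): AutoPolicy = new AutoPolicy(d :: discounts)
  }

  // No implicit evidence needed, but the bound appears in every signature.
  def adjust[P <: Policy[P]](d: Discount, p: P): P = p.adjustBy(d)
}
```

The trade is visible in the signatures: the implicit parameter goes away, and in exchange every mention of Policy carries its type parameter.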

Scala is a language of many options. Something as tricky as combining OO and FP certainly demands it. See the footnotes for further discussion on this particular game.

The referee declares that FP can have its immutability, OO can have its extension. A few function declarations suffer, but only a little.

[1] FP prefers to simply return a Policy from adjustBy; all instances of an ADT have the same interface, so why not return the supertype? But we're not playing the Algebraic Data Type game. OO insists that AutoPolicy has additional methods (like penalizeForTicket) that we might call after adjustBy. The game is to combine immutability with extensible superclasses, and Scala is playing this game.
[2] The solution to the expression problem here -- if we want to be able to add both functions and new subclasses -- is typeclasses. I was totally gonna go there, until I found this solution. For the case where we don't plan to add functions, only subclasses, abstract types are easier.

More references:
F-bounded type polymorphism ("Give Up Now")
MyType problem
Abstract self types

Wednesday, February 5, 2014

Property-based testing of higher-order functions

Property-based testing and functional programming are friends, because they're both motivated by Reasoning About Code. Functional programming is about keeping your functions data-in, data-out so that you can reason about them. Property-based testing is about expressing the conclusions of that reasoning as properties, then showing that they're (probably) true by testing them with hundreds of different input values.

Person: "By my powers of reasoning, I can tell that this code will always produce an output collection longer than the input collection."

Test: "For any generated collection listypoo, this function will return a collection of size > listypoo.size."

Testing Framework: "I will generate 100 different collections, throw them at that test, and try to make it fail."
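
That conversation can be sketched without any testing library. This is a hand-rolled illustration using only scala.util.Random, not how a real framework is implemented: padWithZero is a hypothetical function under test and firstCounterexample is a made-up helper.

```scala
import scala.util.Random

object TinyPropertyCheck {
  // Hypothetical function under test: the output is always longer than the input.
  def padWithZero(xs: List[Int]): List[Int] = 0 :: xs

  // Generate `trials` random lists and return the first one that
  // breaks the property, or None if every list passed.
  def firstCounterexample(trials: Int, rng: Random)(
      property: List[Int] => Boolean): Option[List[Int]] =
    Iterator
      .fill(trials)(List.fill(rng.nextInt(20))(rng.nextInt(100)))
      .find(listypoo => !property(listypoo))
}
```

Calling firstCounterexample(100, new Random())(xs => padWithZero(xs).size > xs.size) plays the part of the testing framework: a result of None means that none of the 100 generated collections made the property fail.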

Property-based testing gets tricky when I try to combine it with another aspect of functional programming: higher-order functions. If the function under test accepts a function as a parameter, how do I generate that? If it returns a function, how do I validate it's the right one?

In the trivial case, we pass in the identity function or a constant function, and base our expectations on that. But that violates the spirit of generative testing.

Tests operate on values. Values go in, values go out and we can check those. Functions are tested by operating on values. So to test functions that operate on functions, we need to generate functions that operate on values, and generate values to test the functions that came out of the function under test. Then we'll need to state properties in terms of values going in and out of functions going in and out... ack! It's Higher-Order-Property-Based Testing.
<self: insert drawing>

My sources on twitter tell me that QuickCheck for Erlang and Haskell have generators for functions. Looking at this sample of the Haskell one, I think this is what it does:

Each generated function from A => B includes a random factor that's constant to the function, and a generator for B that can create Bs based on a seed. When an A is passed in, it's combined with the function's random factor to prod the generator into creating a deterministically-random B.

So, we can create random functions A=>B if we have a generator of B, a random-factor-generator, and some mapping from (A, random-factor) to the seed for the generator of B. That doesn't sound so bad.
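
Here is a rough sketch of that recipe, using only scala.util.Random. It reflects my reading of the idea, not QuickCheck's implementation: SeededGen, genInt, and genFunction are hypothetical names, and mixing the input's hashCode with the random factor is just one assumed way to map (A, random-factor) to a seed.

```scala
import scala.util.Random

object FunctionGenDemo {
  // A generator of B that builds a B deterministically from a seed.
  type SeededGen[B] = Long => B

  // Example generator: an Int between 0 and 999, fixed by the seed.
  val genInt: SeededGen[Int] = seed => new Random(seed).nextInt(1000)

  // Generate a random function A => B.
  def genFunction[A, B](genB: SeededGen[B], rng: Random): A => B = {
    val factor = rng.nextLong() // random factor, constant for this function
    a => genB(factor ^ a.hashCode.toLong) // (A, factor) becomes the seed for B
  }
}
```

Each generated function is random as a whole but deterministic per input: asking it about the same A twice yields the same B, which is what makes it usable inside a property.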
<self: this could use a drawing too>

There are two other functionalities in good generators. When a test fails, we need enough information to figure out why and fix the bug. One tool for this is shrinking: make the generated input simpler and simpler until the test doesn't fail anymore, and then report the simplest failing case. I haven't even started thinking about that in function-generation. (Does Haskell or Erlang do it?)
The second tool for fixing failed tests is simpler: print out the input that revealed the bug. That's also hard with generated functions. Since the generated functions described above are random mappings from input to output, printing the actual input-output mapping used in the failing property evaluation should be sufficient. This is implemented in Haskell, with no state, so it must be possible in Scala and Erlang. And it's in F#.

Like everything in property-based testing, generic input generation is the easy part (even when it isn't easy). Describing properties expected of output functions based on input functions - now that sounds hard. When I write a test that does it, that'll be worth another post.

Thanks to @reiddraper, @ericnormand, @silentbicycle, @natpryce, @deech, @kot_2010 and everyone who chimed in. Please continue.

Monday, February 3, 2014

Abstractions over Threads in Java and Scala

TL;DR In Java, get a library that makes Futures work like Scala's, and then never use ExecutorService directly.

In the beginning, there were Threads. And Java threads were nice and simple. That is, Java threads are simple like some assembly languages are simple: there's only a few things you can do.

Since then, Java and then Scala have created higher-level abstractions. These are what you want to use. This post explains the differences between them, and what happens when exceptions are thrown.

Java's Executor, introduced in Java 5, implements thread pooling for you. The only method on an Executor is execute(Runnable). That's simple too! Give it something to do, and it eventually does it. If an exception trickles up, it goes to the thread's UncaughtExceptionHandler, which typically prints the stack trace to System.err.

All the implementations provided in Executors also implement ExecutorService, a more extensive interface. Pass the submit() method a Callable or a Runnable, and get back a java.util.concurrent.Future. Please note that Java's Future is limited. You can't ask it to do anything on completion or failure. You can pretty much only call get(), which blocks until your task is complete, then returns its result or throws its exception.[1]

If you submitted a task for its side effects, and you never call get() on the Java Future, then no one will ever know about any Exception it throws. It never makes it to the Thread's UncaughtExceptionHandler, and it never gets output. To get an ExecutorService that never hides exceptions, extend ThreadPoolExecutor, override afterExecute and guarantee that get() is called. What a pain!
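
A minimal sketch of that workaround, written in Scala against the java.util.concurrent classes. The approach of calling get() inside afterExecute follows the pattern described above; LoudExecutor, the failures queue, and demo are hypothetical names, and collecting failures in a queue stands in for whatever logging you would really do.

```scala
import java.util.concurrent.{CancellationException, ConcurrentLinkedQueue,
  ExecutionException, Future, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

object LoudExecutorDemo {
  class LoudExecutor extends ThreadPoolExecutor(
      1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable]()) {

    // Assumption: record failures here; real code would log them.
    val failures = new ConcurrentLinkedQueue[Throwable]()

    override protected def afterExecute(r: Runnable, t: Throwable): Unit = {
      super.afterExecute(r, t)
      val failure: Option[Throwable] =
        if (t != null) Some(t) // task came in through execute()
        else r match {
          // task came in through submit(): the exception hides in the Future
          case f: Future[_] if f.isDone =>
            try { f.get(); None }
            catch {
              case e: ExecutionException    => Some(e.getCause)
              case _: CancellationException => None
              case _: InterruptedException  => Thread.currentThread().interrupt(); None
            }
          case _ => None
        }
      failure.foreach(e => failures.add(e))
    }
  }

  // Submit a failing task for its side effects and never call get() on it.
  def demo(): Option[String] = {
    val pool = new LoudExecutor
    pool.submit(new Runnable {
      def run(): Unit = throw new IllegalStateException("boom")
    })
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    Option(pool.failures.peek()).map(_.getMessage)
  }
}
```

With a plain executor the exception in demo would vanish; here it ends up in failures even though the caller threw the Future away.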

Now I'll switch over to Scala-land, because it has something to tell us about Java Futures.

Scala's ExecutionContext interface (trait) extends Executor, providing that execute() method. You'll probably never use this directly, but you'll need one to work with Scala Futures. There are two good ways to get it. First, use the default ExecutionContext.global; it's good. Second, if you want your own thread pool, the factory ExecutionContext.fromExecutorService creates a thin wrapper that delegates to your carefully chosen ExecutorService.

To start an asynchronous task in Scala, call

val doStuff = Future { /* do stuff */ } (executionContext)

This will execute the stuff on that executionContext[2], and give you a reference that's useful.

When you make a Java Future by submitting on an ExecutorService, you have to pass in the whole sequence of code that you want executed. All the error handling has to be there. When you want to do something after that asynchronous code completes, there's nothing to do but block until it completes.

Scala Futures remove that restriction. You can start something going, then add error handling by calling onFailure.[3] You can extend the asynchronous work with onSuccess. You can even say, "after these three Futures complete, then do this other thing with all three results." This lets you separate deciding what needs to happen in what order from defining each thing that needs to happen. Yay separation of concerns! I like how this style of programming lets me code the interesting bits first and then fill in the rest.

All these Future-extending and Future-combining services create asynchronous computations of their own, and want an ExecutionContext. This does not have to be the same one the Future is running on. Once a Future is constructed, it does not remember the ExecutionContext.

A task tacked on to another Future will automatically run when it can. Failures will be handled, successes will proceed. This means you aren't required to ask a Scala Future for its result. It's possible to do so (and I often do in test code), but discouraged. If you want to do something with the value, use onSuccess. You never have to block a thread!
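
A small sketch of this style. It uses onComplete, recover, and a for-comprehension rather than onSuccess and onFailure, because those two were deprecated in Scala 2.12 and removed in 2.13; the values and the demo method are made up for illustration. The Await at the end is there only to show the results, the way test code would.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object FutureCompositionDemo {
  def demo(): (Int, Int) = {
    implicit val ec: ExecutionContext = ExecutionContext.global

    // Start three pieces of work; each begins running right away.
    val a = Future { 10 }
    val b = Future { 20 }
    val c = Future { 30 }

    // "After these three complete, do this other thing with all three results."
    val total: Future[Int] = for { x <- a; y <- b; z <- c } yield x + y + z

    // Tack on a reaction without blocking any thread.
    total.onComplete {
      case Success(sum) => println(s"all three finished: $sum")
      case Failure(e)   => println(s"something failed: $e")
    }

    // Error handling added after the work has already started.
    val failing: Future[Int] = Future { throw new IllegalStateException("boom") }
    val recovered: Future[Int] = failing.recover { case _: IllegalStateException => -1 }

    // Blocking here only to show the outcome.
    (Await.result(total, 5.seconds), Await.result(recovered, 5.seconds))
  }
}
```

The ordering logic (the for-comprehension) and the individual tasks are defined separately, which is the separation of concerns described above.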

We can work this way in Java too. In Java 8 there's native support. Earlier, we can use alternative futures provided in libraries such as Guava. Use this to define asynchronous tasks in smaller, more flexible bits.

This culminates a series of posts on choosing the right ExecutorService. See also Pool-Induced Deadlock, ForkJoinPool, and Scala's global ExecutionContext.

For Scala developers:
[3] I said that Scala futures let you handle errors with onFailure. This isn't true for what Scala considers Fatal errors; these remain uncaught. They propagate to the UncaughtExceptionHandler, which by default prints the stack trace to System.err, and that's it. The thread dies. Your onComplete, onFailure, onSuccess methods, they're never called. Silent death. If you Await its result, the Await will time out. Very bad! In the Scala source as of this writing, this happens only for very serious errors: VirtualMachineError, ThreadDeath, InterruptedException, LinkageError, ControlThrowable. However, in Scala 2.10.x, NotImplementedError is "fatal". When I left a method as ???, the thread disappeared and my program hung. That took forever to debug.

One alternative is to use scalaz.

The scalaz library provides its own Future. The scalaz.concurrent.Future wants an ExecutorService. (This means you can't use the global ExecutionContext.) Some important differences:
* scalaz defaults the implicit ExecutorService parameter to one with a FixedThreadPool. Because you aren't required to supply one at all, you don't always realize you're using that default.
* Because scalaz calls submit() on the ExecutorService, uncaught exceptions do not hit the UncaughtExceptionHandler and are never printed. Do not use scalaz's Future directly: use Task instead, which wraps everything in a try {} catch {}.
* In the standard constructor of Task {...} (and Future { ... }), the work is not submitted immediately. It is submitted on a call to run or attemptRun.
* Also if you use this standard constructor, then every time you run a Task, the work will be repeated. This is not true of Scala's future; those will run exactly once.

Hopefully, once you choose a good abstraction, you won't have to think about this stuff ever again.

[1] You can also cancel a Java Future, if you care about that.
[2] If it's the global, it'll sneakily fork a ForkJoinTask.
[3] is in the "For Scala developers" bit above
[4] The behavior of the UncaughtExceptionHandler can be configured on Threads created in a ThreadFactory that you supply to an ExecutorService of your own construction that you then wrap in an ExecutionContext. And good luck figuring out anything more useful than printing them.