Thursday, March 22, 2012

Bring the data to the code, or the code to the data?

Object-oriented code was conceived as message-passing between objects. Service-oriented architecture emphasizes delegation to another system. The entire web is a whole bunch of requests flying around. There is one clear way to be efficient about this: stop waiting for results.

When we're writing imperative code, we want to write the operations in the order they should happen. This is straightforward and makes sense to our brains. Pseudocode:
let filename = // calculate filename
let data = new File(filename).readAllLines();
// filter the data
// summarize the data
// reformat the output into newData
let status = new File("output").writeAllLines(newData);
println("done!");
sendEmail("done! status = " + status);
println("email sent");
This describes the order in which operations need to happen. The problem is, it is not efficient. We're holding up a thread waiting for I/O. Take the part where we read lines from the file, for instance -- we can't proceed until we get that result back, right? The rest of our code needs that data.

There is an alternative.

Instead of bringing the data to our code, we can ship our code to the data.
With functions-as-values, we can send our code along with the request for the data. This frees up our thread to continue processing, and then our code can execute when the data is ready. When passed as a parameter, the code to execute after completion is known as a callback or a continuation.

Instead of waiting for the data to come back from the file read, we can pass the code that needs to operate on the data. That way whatever thread winds up with the data can execute the code: code and data are brought together.
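
To make that concrete, here is a minimal sketch in Python, using the standard library's thread pool as a stand-in for whatever runtime does the I/O. The names read_then, on_data and demo are made up for illustration; only ThreadPoolExecutor and Future.add_done_callback come from the library. The read runs on a pool thread, and the function we pass along runs once the lines are available.

```python
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_all_lines(path):
    """Blocking read. It runs on a pool thread, not on the caller's thread."""
    return Path(path).read_text().splitlines()


def read_then(pool, path, callback):
    """Hypothetical helper: start the read and ship `callback` along with it."""
    future = pool.submit(read_all_lines, path)
    # add_done_callback passes the finished Future; unwrap it for our code.
    future.add_done_callback(lambda f: callback(f.result()))


def demo():
    seen = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "input.txt"
        path.write_text("alpha\nbeta\ngamma\n")
        with ThreadPoolExecutor(max_workers=1) as pool:
            def on_data(lines):
                # This is "the code shipped to the data".
                seen["lines"] = lines
                seen["callback_thread"] = threading.current_thread().name

            read_then(pool, path, on_data)
            # The caller is not blocked on the read; it carries on here.
            seen["caller_thread"] = threading.current_thread().name
        # Leaving the pool's `with` block waits for outstanding work.
    return seen
```

One caveat about this particular library: if the read has already finished by the time the callback is attached, Python runs the callback immediately on the calling thread. Either way, the code meets the data on whichever thread has it.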

The pseudocode example has three asynchronous operations. In each case we can change the rest of the code in the block into a callback.
let filename = // calculate filename
new File(filename).readAllLines().andThen( { data ->
   // filter the data
   // summarize the data
   // reformat the data into newData
   new File("output").writeAllLines(newData).andThen( { status ->
       println("done!")
       sendEmail("done! status = " + status).andThen( {
          println("email sent")
       })
   })
})
When this executes, the filename is calculated, the read is triggered and then our program goes about its business doing whatever's next. Everything needed to process the data is bottled up in that function we passed, that continuation. We're passing the code to where the data is, instead of freezing the code in place until the data is available.
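
For anyone who wants to run something with the same shape, here is a rough Python sketch of that nested chain. It rests on several assumptions: and_then is a hypothetical helper built on Future.add_done_callback, send_email only appends to a list instead of sending mail, and the filter/summarize/reformat steps are invented (keep the positive numbers, sum them, format one line). The threading.Event exists only so the demo knows when the last callback has run.

```python
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_pipeline(workdir):
    log, outbox = [], []
    finished = threading.Event()
    in_path = Path(workdir) / "input.txt"
    out_path = Path(workdir) / "output.txt"
    in_path.write_text("3\n-1\n4\n")

    def read_all_lines(path):
        return path.read_text().splitlines()

    def write_all_lines(path, lines):
        path.write_text("\n".join(lines) + "\n")
        return "ok"

    def send_email(message):
        outbox.append(message)  # stand-in: records the message, sends nothing

    def and_then(future, callback):
        # Hypothetical helper mirroring the pseudocode's andThen.
        future.add_done_callback(lambda f: callback(f.result()))

    with ThreadPoolExecutor(max_workers=2) as pool:
        def on_data(data):
            kept = [int(x) for x in data if int(x) > 0]  # filter the data
            total = sum(kept)                            # summarize the data
            new_data = [f"total={total}"]                # reformat the data

            def on_written(status):
                log.append("done!")

                def on_sent(_result):
                    log.append("email sent")
                    finished.set()

                and_then(pool.submit(send_email, "done! status = " + status), on_sent)

            and_then(pool.submit(write_all_lines, out_path, new_data), on_written)

        and_then(pool.submit(read_all_lines, in_path), on_data)
        # The calling thread is free here; the demo just waits for the chain.
        finished.wait(timeout=5)
    return log, outbox, out_path.read_text()
```

Note that an exception inside one of these callbacks is logged by the library rather than raised to the caller, so in this sketch a failure would show up as the wait timing out. Real code would want an error path alongside the success path.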

The idea of putting functions into values and passing them to the data, instead of bringing data back to the code, facilitates the message-passing that OO was based on. It facilitates a faster service-oriented architecture. It can make a faster web. JavaScript is all over this technique; AJAX and Node.js use this principle.

Continuation-passing style is a lovely combination of imperative style -- everything happens in the order specified -- with the functional concept of code-as-data. It frees the browser or the runtime to optimize and keep open threads busy.

If your reaction is, "yeah, but it's fugly!" then look for my next post.

3 comments:

  1. Interestingly enough, Item #4 from "Effective Enterprise Java" says almost exactly the same thing: "Keep data and processors close together", citing the cost of round trips. In that context, it was talking about the cost of shuffling data back and forth across remote boundaries, and suggesting that one way to do this is to consider stored procs on the database tier, but the concept is the same: the further away your code is from your data (or vice versa), the more pain you have to go through to get them back together again. Just keep 'em close to one another, and avoid the whole mess in the first place.

    Now, ironically, the thought of this being applied to the async context hadn't really occurred to me, much as I'd love to claim that I was ahead of my time. I wasn't. *sigh* Oh, well. Proves I'm human. :-)

  2. I haven't yet found a mechanism that will send my code over the wire, but at least we can move it off to whatever thread has the data. That's some sort of progress.
