Thursday, March 22, 2012

Continuation Style Without the Fugly

The previous post discussed advantages of using a continuation style to send our code to the data instead of making our code wait for the data to come back to it.

This pseudocode example has three I/O operations, each one wrapping the operations that should follow it inside a callback function parameter.
let filename = // calculate filename
new File(filename).readAllLines().andThen( { data -> 
   // filter the data
   // summarize the data
   // reformat the output
   new File("output").writeAllLines(newData).andThen( { status ->
       println("done!")
       sendEmail("done! status = " + status).andThen( {
          println("email sent")
       })
   })
})

There's an obvious negative here: it's ugly. All this stuff happens in a clear sequence, the same sequence as the imperative pseudocode in the previous post, but it's hard to see that through all those indentations.

It would be nice if we could describe this without all those curly braces and indentation.
F# async workflows are an example of that goal achieved. The following is pseudocode stuck in the middle of an F# async workflow block.
async {
    let filename = // calculate filename
    let! data = new File(filename).readAllLines();
    // filter the data
    // summarize the data
    // reformat the output
    let! status = new File("output").writeAllLines(newData);
    println("done!");
    do! sendEmail("done! status = " + status);
    println("email sent");
} |> Async.Start

This is precisely what we'd like the code to look like, with a few exceptions. The "async" at the beginning and the piping of the block to Async.Start are the F# overhead to trigger the magic asynchronicity.
The let! and do! keywords are where the magic happens: they tell F# to wrap the rest of the block into a super-sneaky callback, which gets passed to an asynchronous operation that evaluates the let! or do! expression on another thread. When the file read is complete, the rest of the code proceeds, on whatever thread it lands in.

The second example executes exactly like the first one. But it reads sooo much more smoothly!

The tricky bit is that let! and do! sneakily break up the block of code. Anything after the let! may happen on a different thread, and then the next let! might switch execution to yet another thread. As programmers we don't have to worry about that, but it's mind-bending when the code looks so sequential.

I hope we'll see more languages and frameworks that can operate like this: pass the code to the data, but do so in a manner that keeps our code readable and organized.
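For the Java-minded: here's a minimal sketch of the same pipeline using CompletableFuture, which chains continuations without the nesting. The helper names and the pass-through processing step are my own invention, and error handling is crudely wrapped -- treat it as an illustration under those assumptions, not a library recommendation.

import java.nio.file.*;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class Pipeline {
    // Run the blocking read on a pool thread; the caller gets a future immediately.
    static CompletableFuture<List<String>> readLinesAsync(Path file) {
        return CompletableFuture.supplyAsync(() -> {
            try { return Files.readAllLines(file); }
            catch (Exception e) { throw new RuntimeException(e); }
        });
    }

    static CompletableFuture<Path> writeLinesAsync(Path file, List<String> lines) {
        return CompletableFuture.supplyAsync(() -> {
            try { return Files.write(file, lines); }
            catch (Exception e) { throw new RuntimeException(e); }
        });
    }

    public static void main(String[] args) {
        Path input = Paths.get("input.txt");      // stand-in for the calculated filename
        readLinesAsync(input)
            .thenApply(data -> data)              // filter, summarize, reformat here
            .thenCompose(newData -> writeLinesAsync(Paths.get("output"), newData))
            .thenAccept(status -> System.out.println("done! status = " + status))
            .join();                              // block only so main doesn't exit early
    }
}

Each step runs when the one before it completes, yet the code reads top to bottom, just like the F# workflow.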

Bring the data to the code, or the code to the data?

Object-oriented code was conceived as message-passing between objects. Service-oriented architecture emphasizes delegation to another system. The entire web is a whole bunch of requests flying around. There is one clear way to be efficient about this: stop waiting for results.

When we're writing imperative code, we want to write the operations in the order they should happen. This is straightforward and makes sense to our brains. Pseudocode:
let filename = // calculate filename
let data = new File(filename).readAllLines();
// filter the data
// summarize the data
// reformat the output
let status = new File("output").writeAllLines(newData);
println("done!");
sendEmail("done! status = " + status);
println("email sent");
This describes the order in which operations need to happen. The problem is, it's not efficient: we're holding up a thread waiting for I/O. Take the part where we read lines from the file, for instance -- we can't proceed until we get that result back, right? The rest of our code needs that data.

There is an alternative.

Instead of bringing the data to our code, we can ship our code to the data.
With functions-as-values, we can send our code along with the request for the data. This frees up our thread to continue processing, and then our code can execute when the data is ready. When passed as a parameter, the code to execute after completion is known as a callback or a continuation.

Instead of waiting for the data to come back from the file read, we can pass the code that needs to operate on the data. That way whatever thread winds up with the data can execute the code: code and data are brought together.

The pseudocode example has three asynchronous operations. In each case we can change the rest of the code in the block into a callback.
let filename = // calculate filename
new File(filename).readAllLines().andThen( { data -> 
   // filter the data
   // summarize the data
   // reformat the output
   new File("output").writeAllLines(newData).andThen( { status ->
       println("done!")
       sendEmail("done! status = " + status).andThen( {
          println("email sent")
       })
   })
})
When this executes, the filename is calculated, the read is triggered and then our program goes about its business doing whatever's next. Everything needed to process the data is bottled up in that function we passed, that continuation. We're passing the code to where the data is, instead of freezing the code in place until the data is available.
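To make the mechanics concrete, here's a minimal Java sketch of what an andThen-style read might look like underneath -- nothing more than a function value handed to whatever thread completes the I/O. AsyncFile and its callback shape are invented for illustration, not a real library.

import java.nio.file.*;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class AsyncFile {
    private static final ExecutorService pool = Executors.newCachedThreadPool();
    private final Path path;

    AsyncFile(String name) { this.path = Paths.get(name); }

    // The continuation travels with the request; whichever pool thread
    // finishes the read is the thread that runs it.
    void readAllLines(Consumer<List<String>> andThen) {
        pool.submit(() -> {
            try { andThen.accept(Files.readAllLines(path)); }
            catch (Exception e) { e.printStackTrace(); }
        });
    }
}

// Usage: the caller's thread is free the moment submit() returns.
// new AsyncFile("input.txt").readAllLines(data -> { /* filter, summarize... */ });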

The idea of putting functions into values and passing them to the data, instead of bringing data back to the code, facilitates the message-passing that OO was based on. It facilitates a faster service-oriented architecture. It can make a faster web. JavaScript is all over this technique; AJAX and Node.js use this principle.

Continuation style is a lovely combination of imperative style -- everything happens in the order specified -- with the functional concept of code-as-data. It frees the browser or the runtime to optimize and keep open threads busy.

If your reaction is, "yeah, but it's fugly!" then look for my next post.

Wednesday, March 21, 2012

The confluence of FP and OO

Michael Feathers has a great blog post up today about how object-oriented and functional programming styles are suited to different portions of an application.

Specifically, the object-oriented style fits the high-level application components. The original intention of OO was for objects to pass messages to each other, which is suited to a SOA style and asynchronous message-passing. At this level, components should tell each other what to do, rather than asking for information (as return values). OO and top-down design work well together.

The functional style is better suited to the low-level components. Within objects, a functional style is cleaner and simpler for computations. A functional style is friendlier to an iterative, test-driven approach.

There is one more piece where functional precepts can help us become even more object-oriented, in the message-passing sense. We can make more messages asynchronous when we can pass callbacks. These functions-as-data let us specify what happens next, removing the need to wait for a return value. This looks different from an imperative style, but it is embraced by JavaScript and Node.js, and it's becoming more common by the day. It's suited to asynchronicity, which enables more OO-style message passing.

We can embrace OO and FP in their areas of strength. F# and Scala seem very important in this light. They take away the pain of crossing the seam between OO and FP. We can even embrace the callback (or continuation) style without making our code unreadable: in F#, computation expressions can break imperative-looking code into pieces that execute as each message-pass is completed.

The hybrid languages give us opportunity to optimize our code at both high and low levels, to break our application into large object-oriented chunks and implement each solution functionally. When both paradigms have full language support, we can employ the strengths of each and interoperate at will.

Sunday, March 18, 2012

One level deeper

It is often said that the developer should understand one level deeper than she's working. If she's writing Java, she should know how the JVM works. If he's using a container, he should know conceptually what's going on inside the container.

This statement holds for more than just runtimes and frameworks: it applies to all the abstractions and innovations we're building on. If we understand why our language provides certain features, then we know when to use those features.

For instance, if we know the purposes of static typing, then we can know when to use it to the varying degrees available.
These purposes include:
1) Guarantee certain errors will not happen.
2) Document what each value means.
3) Identify places where a change impacts the code.
4) Prevent other coders from accidentally misusing our abstractions.

Based on the importance of the above purposes to our project -- how bad is a runtime error? how many other developers need to use this? how confusing is it? -- we might choose to use different levels of static typing.

In Java, for instance, we might shove information into a List of Strings all day long. Or, we might create custom datatypes to express an ordered collection of property names vs a nonempty collection of error messages. The more specifically we express our types, the more checking the compiler can do.
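A quick sketch of the difference (class names invented for illustration):

import java.util.List;

// Both are "a List of Strings" underneath, but now the compiler
// stops us from passing error messages where property names belong.
final class PropertyNames {
    final List<String> values;
    PropertyNames(List<String> values) { this.values = values; }
}

final class ErrorMessages {
    final List<String> values;
    ErrorMessages(List<String> values) {
        if (values.isEmpty()) throw new IllegalArgumentException("must be nonempty");
        this.values = values;
    }
}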

Yesterday I was bit in the butt when I instantiated an (unmodifiable) empty map and later tried to add to it. Why is there only one interface for Map? Why do I get UnsupportedOperationException instead of a red underline in my IDE saying "method 'put' does not exist on interface ReadableMap" ? When I have a read-only Map, I would like its type to express that.
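Here's that trap in miniature -- this compiles without a whisper of complaint:

import java.util.Collections;
import java.util.Map;

public class ReadOnlyTrap {
    public static void main(String[] args) {
        Map<String, String> settings = Collections.emptyMap();
        // Type-checks fine; dies at runtime with UnsupportedOperationException.
        settings.put("color", "red");
    }
}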

"Obtaining maximum benefit from the type system generally involves some attention on the part of the programmer, as well as a willingness to make good use of the facilities provided by the language." -- Pierce, Types and Programming Languages

The designers of Collections.emptyMap() did not put this level of attention into the type system.

Yet in some cases this degree of type specificity is overkill. Pierce again:
"The tension between conservativity and expressiveness is a fundamental fact of life in the design of type systems."
This means that static type checking sometimes prevents you from doing things that are perfectly valid. Maybe it's just fine to pass a constant string as a FirstName, but if your method expects a FirstName instead of a String, Java's type system demands extra code.
Yesterday I took advantage of erasure to return a raw type when I didn't need the data governed by the type parameter. Dirty, I know - but the statically checked type was overly restrictive in that case.

Understanding the purposes of static type checking can help us decide how much effort to spend on being as expressive as possible.

This is one example of getting past how to use the type system, into the why of its existence, so that we know when to use it. Understand one level of abstraction below where we work.

Wednesday, March 14, 2012

Git: retroactive branching

Say you're working along on a branch (let's say master), and suddenly you realize that this task is harder than you thought. You wish you had started a branch for this feature, back before you began it. Now you need to work on a bug instead, so you've got to put these changes away -- if only you had branched!

This is extremely easy to fix. First, I'll tell you what to type. Then, I'll tell you how it works, so that you can adjust the solutions to your own needs.

What to Type:

1) Commit your changes. It's ok if they're not finished; you'll come back to these later.
2) git branch feature_branch_name
3) git reset --hard origin/master

Bam, your changes are saved off on feature_branch_name, your current working directory matches what was in the origin repo last time you fetched (or pulled) from it, and you are ready to start work on the bug.

The details

1) After your changes are committed, you're still on the master branch. git status looks like:
# On branch master
# Your branch is ahead of 'origin/master' by 2 commits.
#
nothing to commit (working directory clean)
and in gitx (if you're on a Mac, please download gitx), you'll see that your master branch is ahead of origin/master.

2) git branch does exactly one thing: it creates a new label. It goes right where you are, on your most recent commit.
This operation does not check out the new branch; master is still your current branch. Git status has not changed, and in gitx the new label appears on your latest commit.

3) Now the real magic. "git reset --hard origin/master" does three things. Primarily, it moves the "master" label (your current branch) to wherever "origin/master" is. That's what reset does: it moves labels around.
The output is:
HEAD is now at 0562dac Some commit that's already been pushed to origin
This tells you the second thing it did: it moved the HEAD label, which marks where you're currently working. You're still working on master, which now points to where you were before your feature-related changes.
Git status looks the same. In gitx, that feature_branch_name branch is sticking out up there, saving your work. But you're back at the "master" label. This is the third thing that your git reset command did, thanks to "--hard": it replaced everything in your working directory with what the repository looks like in origin's master branch.
"--hard" tells git to rejigger all your files.

Why is "--hard" not dangerous? well, it can be. But we're safe because
1) we committed changes
2) we gave that commit a label.
Git will always and forever remember the exact state of your code at any commit that has a label, or is in the history of any commit with a label. If a commit is reachable in the history from any reference (branch, HEAD, tag), then it is safe.

The critical point to this operation -- the reason this is really easy for me now, but was super-hard before I understood the commit graph -- is that we don't have to move any code. We don't have to move any commits to a branch: instead we move the labels around. Branches are simply pointers to a commit. We can move them around all day. In this example, we added a new pointer for the branch, moved the "master" pointer back to where it was before our changes, and replaced our working directory files with the older stuff.
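If you want to see for yourself that a branch is only a pointer, peek under the hood (assuming git hasn't yet packed the ref into .git/packed-refs):

cat .git/refs/heads/feature_branch_name
# prints the SHA of your work-in-progress commit -- the branch is just that one line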

This contrasts sharply with the overhead of branching in Subversion. Yay Git!

Monday, March 12, 2012

VCS as Biographer

Git gives us all this amazing power to control what we commit and when. We can even go back and modify commits to rewrite history. This power is dangerous. What's the point of having it?

The reason is that our version control history tells a story of our project. It is up to us what that story says. Git gives us control over what that story looks like.

1) It gives us privacy. Distributed version control means we have a full-fledged repository right on our hard drive. We can use the VCS to save our work without sharing it with the world. By comparison, Subversion is like a big open locker room. If you put your underwear on inside out, then take it off to flip it around, everybody knows it.

2) It gives us a little preview of what we're committing. The staging area means we can commit some of our changes now and the rest separately, maintaining separation of concerns between commits (see the example after this list). Programmers love separation of concerns, right?

3) It lets us adjust our commit history as much as we want before publishing it. This is like editing our video in Camtasia before uploading it to YouTube.

4) It lets us save our game. Create branches all day; git is the honey badger in a fire ant nest. We can go back to any savepoint (any commit, really, but branches are like saved games with meaningful filenames).
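For point 2, the staging area in action might look like this -- stage just the hunks for one concern, commit, then commit the rest separately:

git add --patch        # interactively pick the hunks that belong together
git commit -m "first concern"
git add .              # stage what's left
git commit -m "second concern"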

Now, what you want your story to look like is another question. Do you want every tiny step and backstep recorded in the main repo? Do you want each major feature to show up as one commit on the master branch? Do you want to see merges as separate commits, or do you prefer the rebase strategy, wherein we pretend our feature development started from wherever the repo stands when we finish it? This is all a matter of aesthetics. The point is that with git, you have the option to consciously choose what you want your version control history to say about your project.

A prospective new team member came in the other day, and he wanted to look at our commit graph. The commit graph speaks about how development occurs, how team members interact. Think about what you want yours to say about you.

Tuesday, March 6, 2012

Static Methods Are Your Friend

There's an age-old debate about whether static methods are the devil. They make unit testing so hard!

Utility methods are easier to use when they're static. Take for instance a simple method to pull the value of a cookie off an HttpServletRequest. Since this is Java, we can't put the method on the HttpServletRequest class. We need to make a utility method instead, say in a class called CookieUtil.

public String getCookieValue(String cookieName, HttpServletRequest request);

Why inject or instantiate a CookieUtil just to call this method on it? If the method is static, we can call CookieUtil.getCookieValue(...) quickly and easily, without adding constructor arguments and fields that clutter up our classes.
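Here's a sketch of the static version, with a static import at the call site. The body shown is one plausible implementation (getCookies() really does return null when the request carries no cookies), not the only way to write it:

package com.example;

import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;

public final class CookieUtil {
    private CookieUtil() {}  // no instances needed, ever

    public static String getCookieValue(String cookieName, HttpServletRequest request) {
        Cookie[] cookies = request.getCookies();  // null when the request has no cookies
        if (cookies == null) return null;
        for (Cookie cookie : cookies) {
            if (cookieName.equals(cookie.getName())) {
                return cookie.getValue();
            }
        }
        return null;
    }
}

// Call site, after: import static com.example.CookieUtil.getCookieValue;
// String sessionId = getCookieValue("JSESSIONID", request);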

The argument goes: you can't mock the method if it's static! (or, if you do, it's a big pain in the butt.) Our unit tests must test onnnlyyy this class!

First, let's consider only static utility methods which meet these two criteria:
* Referential transparency: the output depends entirely on the input. Same input -> same output, every time
* No side effects: there is no I/O in the method, and the objects passed in are not modified.

These static utility methods are highly testable. I think we should test the heck out of any static utility method, and then not worry about mocking it.

At some point we draw the line. Do we mock out the collections libraries? No! At some point you say, "This other code is tested separately. I am relying on it to work."

There's a compelling reason for this: your code is better tested when the utility methods are not mocked. When you mock a dependency such as CookieUtil, you're saying, "I'm testing that my class interacts with CookieUtil this certain way." What if that's not the way CookieUtil works? What if your mock of CookieUtil returns empty string when a cookie doesn't exist, but the real one returns null? Bam! NPE!

Environments (such as my previous place of employment) that emphasize purity of unit tests are vulnerable to these integration problems. Most bugs don't happen within a class - they happen at the seams where code written by various people intersects. Test this intersection!

Declining to mock utility methods is a step toward full testing. Test the low-level utility methods first. Then build on those, and test your classes including the integration with the real utility methods. This results in easier tests, less cluttered code, and better tested software. What's not to like?

Saturday, March 3, 2012

Abstractions in methodologies

We love to make abstractions. Human beings understand things in terms of abstractions, especially in terms of metaphors and similes. When we encounter something new, we find ways that it is like something similar. Eric Evans makes some great points about this in Software Is (Not) Like That, an article on DDD from way back in 2004.

Evans points out how ridiculous it is to pretend software is like manufacturing:
Before there were factories, crafts-people created useful objects, which were relatively costly and of variable quality. If there had existed the ability to effortlessly replicate objects created by the world's best crafts-people, would we ever have had factories? We don't need software factories.

and the destructive consequences of pretending building software is like architecture:
Smart, experienced people become incompetent through loss of hands-on practice....
The natural emergence of dynamic, creative design work squelched by remote idealists.
A programmer underclass established.

These illustrate the negative consequences of applying halfway-decent metaphors too broadly, and of applying bad metaphors at all.

The same problems apply to code. Take Object-Oriented Programming. (Oh no, here I go again dissing OO. I promise, I'm only dissing the initial hype.) When OO was new, it was going to solve all our problems by letting us think of software in terms of concrete metaphors. Everything is a noun. We like nouns! We can hold nouns in our hand, turn them over and look at them, even pitch them at your head if you try to make us let go of our favorite metaphors.

In some domains these metaphors are useful, but taking them too far is destructive. As Eric points out toward the end of his article, metaphors can be useful for conveying information when you keep them small. We need to be very quick to discard a metaphor. Let it make its small point and then drop it.
Personally, I enjoy carrying a metaphor as far as I can and then much farther, until it becomes ridiculous. Once it is clear that the metaphor is ridiculous, there is less temptation to hold on to it.

There's another point here. Metaphors are a form of abstraction, and abstraction is necessary for broad understanding. However, metaphors are super appealing because they're concrete. Like those darn nouns in OO development, where in our training courses we modeled people, pens, chairs, things we can hold in our hands and throw. The best abstractions, the ones that we are less tempted to cling to until they become ridiculous, are more abstract abstractions. Design patterns do this. For-programmer DSLs often do this. As developers, we're smart. We can conceive abstractions that aren't named after something we've seen before.

Evans's article mentions that in real modern factories, productivity comes from small teams on the manufacturing floor given some independence and self-determination. In an architected software system, programming-team innovation is quashed in the name of uniformity. There's a theme here: smallness. The best abstractions aren't good for everything. They're great for one specific thing, and then they stop. The best abstractions usually aren't concrete, because software is not concrete.

Metaphors may help in our understanding of programming methodologies and of the software itself, but they should be quickly discarded. They're each a stone on the path to understanding, where we can talk about the problem domain in its own terms.

Thursday, March 1, 2012

Strong Typing in Java: a religious argument

Strongly-typed, type-inferred languages like F#, Scala, and Haskell make Java feel like its static typing system is half-assed. This is not Java's fault entirely; it's the way we use it.

Say we have a domain class like

public class User {

    public User(String firstName, String lastName, String email, long balanceInDollars) {}

    ...
}

I'm in agreement with Stephan that using String and primitive types here is wimping out. That is weak typing. Weak! Chuck Norris does not approve!

firstName and lastName and email are conceptually different. If each has its own type, then we get real compile-time type checking.

public class User {

    public User(FirstName firstName, LastName lastName, EmailAddress email, Money balance) {}

    ...
}


This gives us type safety. As we pass a FirstName or an EmailAddress from class to class, we know what we're getting. If we mix up the order of the arguments, we hear about it from the compiler.

"But they're just strings!" you say. "Don't make a bunch of cruft - just call them what they are!"

NO! I say. They are not Strings. They are stored as strings. That, my friend, is an implementation detail. FirstName and EmailAddress represent distinct concepts, and they should have distinct types. The first part of wisdom is calling things by their right names.

There are other OO-style benefits from this, such as putting the validation for each type in its type class and changing the internal representation of the type without affecting its interface. Those may be significant in some situations, but in my argument they're icing. The benefit of strong typing is compile-time checking, and that's reason enough to call a FirstName a FirstName and not a vague String.
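For instance, a sketch of validation living with the type (the validation rule here is deliberately naive):

public final class EmailAddress {
    public final String stringValue;

    public EmailAddress(final String value) {
        // Every EmailAddress in the system has passed this check exactly once.
        if (value == null || !value.contains("@")) {
            throw new IllegalArgumentException("not an email address: " + value);
        }
        this.stringValue = value;
    }
}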

Now, let's address this "bunch of cruft" argument. No question, Java does not make this kind of type-wrapping pretty. In Haskell, it takes a one-line type alias. By OO principles, we ought to be able to inherit from String to get its behavior. But noooo, this is Java, and String is final, so we wind up with

public class FirstName {
    public final String stringValue;

    public FirstName(final String value) {
        this.stringValue = value;
    }

    public String toString() {...}
    public boolean equals(Object other) {...}
    public int hashCode() {...}
}

(notice that I used a *gasp* public field. That's another religious argument for another post. I include it here just to stir people up even further.)

Then every time we want a user:

User u = new User(new FirstName("Joe"), new LastName("Frank"), new EmailAddress("..."), new Money(30));


We can get a little better by providing static methods and statically importing them:

(in FirstName)

public static FirstName FirstName(final String value) {
    return new FirstName(value);
}


User u = new User(FirstName("Joe"), LastName("Frank"), EmailAddress("..."), Money(30));

That's a little better. But now we get into the part where user is an immutable value type, and we want to update one field.

User updated = new User(FirstName("Joseph"), old.getLastName(), old.getEmailAddress(), old.getBalance());

Ugh! And every time we add a new field to User, even though we simply want to copy it, we have to modify this code.
Copy constructors don't help when the type is immutable.

Let's talk about F# for a minute. F# has records:

type User = { firstName : FirstName; lastName : LastName ; email : EmailAddress; balance : Money }

and then the cool bit is, when you want another record that's almost the same:

let updated = { oldUser with firstName = FirstName("Joseph") }

I want to do this with my Java user. I want to say

User updated = oldUser.with(FirstName("Joseph"));

which is cool and all; with strong typing we can overload the "with" method all day long. We can chain them. We can add implementations of "with" for common combinations of fields to reduce instantiations.

(in User)

public User with(FirstName firstName) {
    if (this.firstName.equals(firstName)) {
        return this; // avoid instantiating an identical value object
    }
    return new User(firstName, this.lastName, this.email, this.balance);
}

Now you have a whole ton of "with" methods that can be chained. If you add a new field to User, you need to change all of them, but they're all in the same place so that's just fine. What changes together, stays together.

Now, if you don't like an instantiation per changed field, or if you don't like all those "with" methods cluttering up your user class, here's another idea:

User updatedUser = new UserBuilder(oldUser).with(FirstName("Joseph")).with(LastName("Frankenfurter")).build();

where the UserBuilder keeps the oldUser and uses its values for any fields that aren't provided by the caller to build the new user. That's one instantiation, only one method that instantiates a user, and it's encapsulated into one builder class.
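A sketch of that builder, assuming User grows getters for its fields; only two of the with overloads are shown:

public class UserBuilder {
    private FirstName firstName;
    private LastName lastName;
    private EmailAddress email;
    private Money balance;

    public UserBuilder(User old) {
        // Start from the old user's values; each with(...) overrides one.
        this.firstName = old.getFirstName();
        this.lastName = old.getLastName();
        this.email = old.getEmailAddress();
        this.balance = old.getBalance();
    }

    public UserBuilder with(FirstName firstName) { this.firstName = firstName; return this; }
    public UserBuilder with(LastName lastName) { this.lastName = lastName; return this; }

    public User build() {
        return new User(firstName, lastName, email, balance);  // the single instantiation
    }
}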

Some people may argue that immutable types and strong typing in Java is going against the way the language is intended to be used, and therefore does nothing but make our lives more difficult. "That's why God gave us POJOs," they say. Java is a powerful language, and it is capable of supporting more idioms than the ones the language designers envisioned. We can grow as programmers within our language of choice. Java supports strong typing.

Strong typing gives us compiler errors on what would otherwise be caught only during testing. It creates some extra code, sure, but it's localized. It can make working with immutable types a little cleaner.

I say, never use "String"!