Saturday, September 22, 2012

Functors: What the funk for?

For all the programmers who don't deeply grok the lambda calculus terminology --

Say you are about to call a method on a container, and that container can give you something back of type Tweet. What you really want isn't a Tweet, but some part of it, say Tweet.getId(). What if, instead of getting the Tweet back from the container and then calling getId() on it, you prefer to tell the container to get the id for you, and return only what you wanted? Then you need a functor.

So you make a quick function literal (tweet => tweet.getId()) that pulls the id out of a Tweet, and pass that in to the functor. The output is the same kind of container, only it holds IDs instead of Tweets. A functor isn't a special type; it's a function with a particular purpose.

Why would you want to do this? I used to wonder that too. Then...

I have an Iterable<Tweet> tweets, and I want to get the ID of the first element. In this example, I'm using Java with the Guava library. The function to call is Iterables.getFirst, which requires a default value in case the iterable is empty. As it happens, I have a default value for the ID, but no Tweet that contains this default ID. Short of constructing a stupid dummy Tweet that holds my default ID, I'm stuck with:

Tweet firstTweet = Iterables.getFirst(tweets, null);
return firstTweet == null ? defaultId : firstTweet.getId();

I have to create an identifier and then check stuff on it. This is not declarative enough or simple enough for my taste. I'd rather do this (using Java 8 lambda syntax for brevity):

return Iterables.getFirst(tweets, t -> t.getId(), defaultId);

In this hypothetical example, I'm telling that getFirst function to run my function on the tweet before returning. If tweets is empty, then the function can return defaultId. So I'm supplying a default that is meaningful to me, the function is going to return the same type whether there's a tweet or not, and everyone is happy.
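To make the wish concrete, here is a minimal sketch of what such a three-argument getFirst could look like if I wrote it myself. Nothing here is Guava: the FirstWithMapping class, its getFirst method, and the tiny Tweet class are all hypothetical stand-ins, written against the plain JDK.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class FirstWithMapping {

    // Minimal stand-in for the post's Tweet type (assumed shape: it only has an id).
    static final class Tweet {
        private final long id;
        Tweet(long id) { this.id = id; }
        long getId() { return id; }
    }

    // Hypothetical three-argument getFirst; this method is NOT part of Guava.
    // It applies fn to the first element, or returns defaultValue if there is none.
    static <T, R> R getFirst(Iterable<? extends T> items,
                             Function<? super T, ? extends R> fn,
                             R defaultValue) {
        Iterator<? extends T> it = items.iterator();
        return it.hasNext() ? fn.apply(it.next()) : defaultValue;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = Arrays.asList(new Tweet(42L), new Tweet(43L));
        long defaultId = -1L;

        // Non-empty: the function runs on the first tweet.
        System.out.println(getFirst(tweets, t -> t.getId(), defaultId)); // prints 42

        // Empty: the default comes back, and the result type is the same.
        System.out.println(
            getFirst(Collections.<Tweet>emptyList(), t -> t.getId(), defaultId)); // prints -1
    }
}
```

Either way the caller gets an ID back and never holds a Tweet.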

But that method doesn't exist. And it would be a pain to add an optional function argument to the interface of all the functions in Iterables. Thinking in terms of functors, I can solve this problem with functions that do exist in Iterables:

Iterables.getFirst(Iterables.transform(tweets, t -> t.getId()), defaultId)

Here, transform is a functor: it applies the supplied function to the elements inside the iterable, returning an iterable of the function-output type. This means a tweet gets turned into a tweetId before getFirst sees it.

To a long-time Java dev like me, this looks inefficient. Do work on every tweet in tweets just to get the first one out differently? Ahhh, but Guava iterables are lazy! That function is not applied until somebody calls next() on an iterator of the iterable returned by transform(tweets, t -> t.getId()). Iterables.getFirst calls next() at most once, so only the first tweet is transformed. Therefore, I have exactly what I wanted: turn that first tweet (if there is one) into an id before giving it back to me. The type of defaultId matches the element type of transform(tweets, t -> t.getId()).
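The laziness claim is easy to check with a counter. The sketch below does not use Guava; transform and getFirst here are hand-rolled stand-ins that I am assuming behave like the Guava methods in the one respect that matters (the function runs only when next() is called). LazyTransformDemo, callsNeededForFirst, and the small Tweet class are hypothetical names.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class LazyTransformDemo {

    // Minimal stand-in for the post's Tweet type (assumed shape).
    static final class Tweet {
        private final long id;
        Tweet(long id) { this.id = id; }
        long getId() { return id; }
    }

    // Hand-rolled lazy transform: a sketch, not Guava's implementation.
    static <A, B> Iterable<B> transform(Iterable<A> source,
                                        Function<? super A, ? extends B> fn) {
        return () -> new Iterator<B>() {
            private final Iterator<A> inner = source.iterator();
            @Override public boolean hasNext() { return inner.hasNext(); }
            // fn runs here, once per element that is actually requested.
            @Override public B next() { return fn.apply(inner.next()); }
        };
    }

    // Stand-in for a two-argument getFirst: first element, or the default if empty.
    static <T> T getFirst(Iterable<? extends T> items, T defaultValue) {
        Iterator<? extends T> it = items.iterator();
        return it.hasNext() ? it.next() : defaultValue;
    }

    // Returns how many times the mapping function ran while fetching the first id.
    static int callsNeededForFirst(List<Tweet> tweets) {
        AtomicInteger calls = new AtomicInteger();
        Iterable<Long> ids = transform(tweets, t -> {
            calls.incrementAndGet(); // count each real invocation
            return t.getId();
        });
        getFirst(ids, -1L);
        return calls.get();
    }

    public static void main(String[] args) {
        List<Tweet> tweets = Arrays.asList(new Tweet(42L), new Tweet(43L), new Tweet(44L));
        System.out.println(callsNeededForFirst(tweets)); // prints 1
    }
}
```

Three tweets go in, and the function runs once.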

The lesson of functors is: you don't have to take something out of a container in order to operate on it.

In OO design, there's a principle of "Tell, don't ask." Classes are supposed to tell each other what to do, rather than asking for internal data. This is an example of that -- the Iterable<Tweet> has tweets, and I want the ID of the first one. Using the functor, I tell it "give me the result of this function applied to your tweet." This is better encapsulation than pulling out the whole tweet and operating on it.

In this example, a functor is a function that does a transformation on data inside a context. The context is an Iterable and its contents are Tweets. The transformation is a function from Tweet -> ID. This way, an Iterable<Tweet> can give me back an ID, exactly what I wanted in the first place, without me ever having to see its Tweet.
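For comparison, the same "transform inside the context, then ask for the first" shape can be written with JDK streams instead of Guava. This is only a sketch under assumptions the post does not make: it takes a List<Tweet> rather than an arbitrary Iterable, and StreamFirstId, firstIdOrDefault, and the small Tweet class are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class StreamFirstId {

    // Minimal stand-in for the post's Tweet type (assumed shape).
    static final class Tweet {
        private final long id;
        Tweet(long id) { this.id = id; }
        long getId() { return id; }
    }

    // Map Tweet -> id inside the stream, take the first result, fall back to the default.
    static long firstIdOrDefault(List<Tweet> tweets, long defaultId) {
        return tweets.stream()
                .map(Tweet::getId)   // Stream<Tweet> becomes Stream<Long>
                .findFirst()         // Optional<Long>, empty if there were no tweets
                .orElse(defaultId);
    }

    public static void main(String[] args) {
        List<Tweet> tweets = Arrays.asList(new Tweet(42L), new Tweet(43L));
        System.out.println(firstIdOrDefault(tweets, -1L));                          // prints 42
        System.out.println(firstIdOrDefault(Collections.<Tweet>emptyList(), -1L)); // prints -1
    }
}
```

As before, the default is an ID rather than a dummy Tweet, and the caller never touches a Tweet directly.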

Look! A functional trick can make my Java more OO than ever.

Caveat: there are many definitions for Functor out there, and different types of functors. This is one.


  1. I never knew about the Iterables.getFirst(...) method. That is actually pretty useful; at the least, it is much more readable than iterable.iterator().next(). My most prominent functor use case has been with the Collections2.transform(...) method in Guava: given a list of objects, give me all the IDs of those objects.

  2. Nit:

    > Here, transform is a functor:

    I think here the Iterables generic class is the functor and transform is the corresponding fmap. A functor (assuming the usual correspondence between the type system and category theory) maps types to types, not values to values.