Saturday, November 23, 2013

Trains within trains

In Java, there are primitive types and there are objects. Often we want to work with everything as an object, so those primitives get boxed up into object wrappers.
Ruby and Scala say, "That's silly. Let everything be an object to begin with." That keeps parameter-passing semantics and comparison and printing operations consistent.

Yesterday while writing code, I thought I was working with a sequence, so I wrote

input map { transformation } map { transformation2 }

Then the compiler complained, because my input was a simple value. Value? pshaw. I've clearly gotten used to working with Functors. As that last post says, functors are like trains that hold a value. You can attach transformations like extra train cars, and the result is still one train.
 
So I wrapped my value in an Option and went about my business.

Some(input) map { transformation } map { transformation2 }

Option is a functor, a context that can hold a value and incorporate transformations. However, Option is more than that. As a context, it has a special power: it might hold no value at all.

The map method on Option doesn't let me use that special power. Map works within the context in that transforming an empty Option has no effect, but map won't let me switch from a full Option to an empty Option. If I have a transformation that fails, I'd want the Option to turn out empty.

This is the flatmap method: it might map it, or it might flat it.... OK, that doesn't make any sense. That's the name of the method, anyway.[0]

As a train, flatmap says, "Instead of telling you what the next car is, I'll wait for you to give me the value and then I'll give you a whole new train." 

The net result is still exactly one train. Unlike mapflatmap lets us use the special power of the context.

List (or Seq) is another functor with a special power: it holds 0 to many values. map transforms values, but flatmap takes advantage of the special power, so the result can contain a different number of values. Maybe twice as many, maybe none.

One more example: Future holds a value may not exist yet. Future.flatmap lets you produce another value that may not exist yet, and the result is still one Future

Each context has a special power, and flatmap lets you exercise that power a second time, always returning one context. The context doesn't get deeper. You don't get an Option of an Option back from flatmap; you still have one layer of Option around a value.

The Option train probably evaluates your trains-formation immediately, and gives you back a Some or None right away. A lazy List or a Future might store the instruction for later, or execute it right away -- the important thing is you can't tell the difference.

You know where this is going, right?


A monad is a context with a special power, and a method flatmap [1] that lets you use its special power again and again.

There's one more consistent feature of all monads: they all provide a constructor that lets you put a value in the context. For Option, that's Some(x). There's List(x), and Future.now(x).[2] It says, "if you have a value, you can put it on a train."

What they DON'T have in common is a way to pull values OUT of the train. This is different for every monad, if it exists at all. We have Option.get, and List can be accessed by index. Future doesn't really want you to ever get the thing out; if you insist, Await can dig it out for you.

Since you have to do something different to get the value out every time, what's the point of naming them "monad" at all? It turns out there are things we can do to "anything inside a monad." One is to build up a computation without (necessarily) running it right away. Right now I'm looking at Process in scalaz-stream, which does this. Yet, many things you now do to a List could be done to a monad in general. You can always map or flatmap, without knowing what special powers you're working with.
Ever write a function to operate on a sequence of its arguments rather than just one, so that you can call it either way? Take a function that saves items to a database. It can save one or many; maybe you even made it accept a variable argument list. What if, instead, you wrote that function to operate on any monad containing items? Then different callers could pass multiple items in a List, a possibly-nonexistent item in an Option, or an item that hasn't been determined yet in a Future. The Option would execute the database-inserting action immediately, while the Future would save it for later. Your function returns the result of the insertion in that same context. Now that's accommodating![3]
If the caller really does want to pass in exactly one item, she could wrap it in the Boring monad that has no special powers at all.[4] Why, with that, I could work in monadic style all the time! map and flatmap all the things!

It sounds crazy now, but hey, we used to manually box up our primitives in Java, too.
--------------------------------
[0] I can think of a few ways to interpret "flatmap" as a reasonable name for the method, but it'll be more fun to let people comment.

[1] Haskellers call this flatmap method "bind." You can combine one within another without getting any deeper, and if you can combine two you can combine as many as you like: therefore, monads are composable.

[2] Haskellers call the single-value constructor "unit."

[3] technically you could accept any functor, I think.

[4] I totally made that up.

Sunday, November 17, 2013

From lists to trains to functors

The first useful bit of functional programming is the map method on List. It gives us a new List, which contains the results of the mapped function applied to each item in the list. We might think of it like this, mapping f over a List of a single item of type A:
That's the equivalent of, in Java:

  public static <A, B> List<B> map(List<A> oldList
                                   Function<A,B> f) {
    List<B> newList = new LinkedList<B>();
    for(A item: oldList) {
      newList.add(f.apply(item));
    }
    return newList;
  }

Except... that's a naive picture of what map does. That's one possible implementation which works for Lists. There are others. Consider a lazy list. The lazy list does not evaluate a mapped function until its result is requested. So what we get is more like this:

From the outside, the two Lists of type B are identical. When you get the values out of the list, they're the same.

The transformation is encapsulated inside the lazy list. It's guaranteed to be used before we can see the value from inside, but it doesn't have to be used immediately.
It's kind of like, the List holds its value in a train car, and the train car always has a hook to which another train car can attach.
Map hooks the function up like another train car.

When you pull the value out of the train, it passes through all the cars.
The key is: no matter how many cars you hook up, you still have exactly one train. And there's always a hook at the end that can attach more cars.

This means that a function passed to map doesn't have to be applied immediately. The list implementation can hook it up and save it for later. This kind of temporal distortion, of "you told me what to do, and I'll do it when I'm good and ready" is part of the power and mystery of functional programming. It gets more magical in places other than list. Consider Future.

Future represents a value that we might not have yet. Maybe it's being fetched by a process on another thread. It's a container like List, except that it might not have that original value yet. Future has a map method just like list, and we can hook up functions like train cars in the same way.
Perhaps your Future is waiting for something to retrieve an Order from the database, and what you really care about is the total amount of the order. You can hook up a train car for that.

Here's the Future waiting for an Order:
Map that across a function to attach a train car. (in Scala)
val extractAmount: Order => Amount = (o: Order) => o.totalAmount
val futureAmount = Future.map(futureOrder, extractAmount)
Now you have a Future waiting for an Amount. After the database-accessing thread finishes its work, the Future will have a value for you, and when you get it out, it passes through all the train cars (all the mapped functions).
This is that time-warping thing again. When we chose to map the Future across the amount-extracting function, that was before there was an Order to extract the amount from. We can specify what to do before we can possibly do it.

If this makes sense, if you can see the commonality and usefulness of hooking up new functions that will transform a value before we can see it, then it's time to give a name to it: Functor. A functor is any type with a map method that works like this. Whatever sort of value it has for you, or may have later, you can hook up the train cars to transform that value. And what you get back from map is always another of the same kind of Functor, now holding a different type of value. Add all the cars you like, it's still a train.
Functor. Map over it. Because it always has a hook, ready to attach another train car.
Choo! Choo!


Wednesday, November 13, 2013

git: I want my stash back

TL;DR - it's a good idea to throw experimental changes on a branch instead of stashing them.

Today I stashed some changes, then popped them out, then decided they were a failure and wiped them out, then (later) wanted them back.

git stash makes a commit, and commits are not deleted for thirty days, so there had to be a way to get them back.

git fsck finds lost references. But only ones that are good and thoroughly lost, not stashed or listed in a reflog. To find all the references that are not in the history of some named commit, try

git fsck --unreachable > temp_file

Stick the output into a temp file because this takes a long time.

Then to find the commit I was looking for:

cat temp_file | grep commit | cut -d ' ' -f 3 | xargs git show --name-only

where 
cat temp_file reads the temp file
grep commit chooses only the commits out of all the unreachable objects found by git fsck
cut -d ' ' -f 3 prints only the third field, using space as a delimiter. This gives me the commit hash
xargs puts the input lines on the end of the argument list
git show prints information about any object it git; for commits it shows the git log output and the diff
--name-only narrows the diff output to only the changed filenames

That gave me a whole slew of commit descriptions, opened in less the same way git log works. I searched for the filename of interest

/Stuff
(forward slash is the less command for search; push n to go to the next one)
and then moved up (arrow key) to see the description of that lost stash-entry commit, along with its commit hash.

commit 2b16061dba6ba8e529e56f53544c84ba432ed7be
Author: Jessitron <jessitron @gmail.com>
Date:   Wed Nov 13 11:06:19 2013 -0600

    index on develop: 776a776 Debug logging for claiming

src/main/scala/poo/Stuff.scala
src/test/scala/poo/Stuff_UT.scala

Finally, I can check that out and create a temporary branch, which is what I should have done in the first place.

git checkout commit_hash
git branch crazy-changes

Now these experimental changes can hang out, out of my way and easily reachable, until I'm good and done with them and delete the branch.

git branch -d crazy-changes

Question: if I can only find this with fsck --unreachable, that implies that this commit is still recorded somewhere. Anyone know where that is?