Saturday, September 28, 2013

Data Exposure and Encapsulation

TL;DR - Scala lets you expose internal data, then later change underpinnings while maintaining the same interface.

The functional Way is an open Way. Internal data is available to anyone who wishes to extract it.
"But what about encapsulation??" - horrified OO programmer
"We do not fear your code, for you cannot screw up our immutable data." - calm FP programmer
"But what about change? If you expose internal structure, then you can never change internal structure. You'll break all my code!" - OO
"The surface is resilient. The essence is fluid. We can keep client code happy while expanding underneath." - FP
One of the ways that Scala achieves "scalability" is: make the code easy to write now, and still flexible to change later. An example is simple data structures: the case class. By default, anyone can pattern match to extract all the values from a case class. For instance:

case class PersonName(first: String, last: String)
val me = PersonName("Jessica", "Kerr")

val greeting = me match {
  case PersonName(first, last) => s"Hello, $first $last"

This is handy - it makes building and using simple datatypes painless. Fields are public, which is not frightening because they're final, safe from alteration.

The danger of this easy syntax is rigidity: if I add a middle-name field to PersonName, then all the code that pattern matched on it is broken. Construction, too.

case class PersonName(first: String, 
                      last: String, 
                      middle: Option[String])
val me = PersonName("Jessica", "Kerr") // broken!

me match {
  case PersonName(first, last) => s"Hello, $first $last" // broken!

We want to change the PersonName without breaking backwards compatibility. In Scala, all the handiness of the case class (construction, extractors) is syntax sugar. That means we can pull it apart and do whatever we want with it behind the scenes.

In Scala, the constructor that works without "new" in front of it is really function application on the PersonName companion object. [1] And function application on anything is really a call to its apply method. Case class syntax sugar provides an apply method that accepts all the fields, like:

object PersonName {
     def apply(first: String, 
               middle: Option[String]) : PersonName =
            new PersonName(first, last, middle)

PersonName("first", "last", Some("middle")) expands to PersonName.apply("first", "last", Some("middle")) .
Since that's all the case class constructor really is, we can re-enable the old syntax by adding a bonus apply method to the PersonName object:

object PersonName {
    def apply(first: String, last: String): PersonName = 
       new PersonName(first,last, None)
val me = PersonName("Jessica", "Kerr") // fixed!

There's another secret method name behind the pattern match syntax. When Scala sees:

(p : PersonName) match {
   case PersonName(a : String,b : String) =>

Scala looks for a method called unapply on the PersonName object; unapply should accept a PersonName and return an optional tuple of two strings.

object PersonName {

  def unapply(pn: PersonName): Option[(String, String)] = 
       Some((pn.first, pn.last))

It's like the case statement is switching around the return value and the arguments of the unapply method. If calling unapply with the matched-against value (p) returns Some(tuple), then the match succeeds and the identifiers (a and b) get the values in the tuple.

Retaining support for the pattern match is harder than the constructor. We can't add another unapply extractor to the PersonName object; it conflicts with the generated one that comes with the case class.

Here's a solution that abstracts PersonName into a trait, then backs it with a different case class of our choice. Now the constructors and extractors are ours to control and manipulate.

trait PersonName {
    val first:String
    val last:String

case class FullPersonName(first: String, 
              last: String,
              middle: Option[String]) extends PersonName

object PersonName {
  def apply(first: String, last: String): PersonName = 
          FullPersonName(first,last, None)
  def unapply(pn: PersonName): Option[(String, String)] = 
          Some((pn.first, pn.last))

This compatibility could also be met by changing PersonName from a case class into a regular class, and supplying apply and unapply methods as we like. What matters is not how we do it, but that we can.

In this way Scala makes the obvious thing easy, and custom behavior possible. Backwards compatibility enables reuse. In OO we encapsulate everything up front to make later change possible, with indirection and lots of boilerplate. In Scala we don't have to complicate our code to accommodate possible imagined future change. Stick with YAGNI now, without paying for it later. Reduced suffering for everyone! The world is a better place with FP.

[1] We could fix the broken constructor by supplying a default value for middle name.

case class PersonName(first: String, 
                      last: String,
                      middle: Option[String] = None)

That's simpler but less flexible.

Monday, September 16, 2013

Evolution of a developer

At first I wrote flexible code. I figured that because I couldn't predict the future, I better leave all the options open. The code was modularized, well-factored, with a lot of bells and whistles, and more than enough documentation. I was proud of it, until I learned that:
  • Things changed so rapidly that even my solidest assumptions became false. 
  • The small flexibilities in the code were too small to be really helpful, yet too many to be easily refactored.
  • In a world of 140 characters, nobody read my heavy documentation. Literate programming didn't work, if ever.
  • Longer code, no matter good or bad, is harder to maintain.
So, in another project, I switched to writing extremely tight code. I used a lot of functional abstractions and removed every single bit of redundancy. And documentation is, of course, left as an exercise for the reader. I figured that good code was deleted code, and short code could be easily refactored. I was proud of it, until I learned:
  • 80% of the efforts had been spent on reducing 20% of the redundancy. Like speeding, it didn't save much. 
  • Highly-factored, redundancy-free code can be difficult to document, understand and debug. 
  • The best way to compress some code is using gzip, not your hand.
Now I think, just like many things in life, a balanced position is the best. In other words: write simple code. I deeply believe that you can achieve simplicity without compromising on other aspects. I believe that applying the simplest solution which consumes minimal energy is how the whole universe works. When you get one simple answer, it may not be the right answer. However, once you find the right answer, it is usually also very simple. 
I think we should just make it easy. If you feel difficulty when writing a program, if you find yourself scratching your head, either because of intricate structural design or abstract code factorization, you are probably doing it wrong. After all, you should be able to enjoy programming, not suffer through it. There is more than one way to do it - any approach can be a good start, as long as you feel easy and comfortable about it, and you do not forget to keep improving it. If it's too fat, cut some; if it's too thin, grow some. Eventually you'll reach a point where you find ultimate simplicity. Call it fixed/optimal/equilibrium point as you like, it's all the same thing. It's just how the universe works.

--- Ming Lei, on the St. Louis Lambda Lounge list

Friday, September 13, 2013

A realistic use case for applicative functors

But why would you do that??

Say you have a collection of document IDs and you are going to retrieve
the documents from the database. Not right away, but someday. You are
writing a function that defines the operation of retrieving a sequence
of documents.

So there's such a thing as an DbTask, which says "If you give me a
database, I can do something with it." In our case we wind up with an
DbTask[Seq[Document]], which says, "If you give me a database, I can
give you this collection of documents you want."

If you're wondering why we would abstract the decision of "what to do
with a database" from actually doing stuff with a database, check out
Runar's presentation about purely functional I/O.

Now you want a DbTask that retrieves a whole sequence of documents, based on a sequence of IDs. Can we build this without writing anything new in the DbTask classes?

    def retrieveAll: Seq[DocId] => DbTask[Seq[Document]] = ???

How do we implement it? Say we have a primitive operation for retrieving
one document.

    def retrieveOne: DocId => DbTask[Document] = // already implemented by someone else

If we can retrieve one, we can retrieve them all! Let's do that:

    (docIds : Seq[DocId]) => docIds map retrieveOne

That doesn't give us quite what we want. We turned each DocId into a
retrieval DbTask[Document], so we have a sequence of single

    Seq[DocId] => Seq[DbTask[Document]]

How do we turn that inside out, to get to a DbTask[Seq[Document]]?

To construct a sequence inside an DbTask out of a bunch of
DbTasks, we'll use these pieces:
  • each DbTask[Document]
  • an empty sequence, Seq()
  • a way to add a document into a sequence, which is the +: (prepend) method on Seq.
  • a way to put each of these last two into a DbTask. It's sort of a factory method.[1]
        DbTask(x: X) => DbTask[X]
  • a way to plug things together within DbTasks. Call this apply. It's a method on DbTask that takes a wrapped function (microwave in a tortilla) and spits out a wrapped result (hot food in a tortilla).
    DbTask[X] { ...
       def apply(f: DbTask[X => Y]) : DbTask[Y]

We have the first three easily enough. The last two together form the
qualifying characteristics for an applicative functor. Applicative
functors let us plug the things inside them together, like building a
ship in a bottle.

Let's use these five pieces to turn our DbTask[Seq[_]] into a
Seq[DbTask[_]]. Fold up the sequence of DbTasks, starting
with a DbTask containing an empty sequence, and then prepending each
item into the sequence, all within the DbTask.

  def turnInsideOut[X]: Seq[DbTask[X]] => DbTask[Seq[X]] = 
  { seqOfOps =>
      val wrappedEmptyDbTask[Seq[X]] = DbTask of Seq.empty[X]
      def wrappedPrependDbTask of((b: Seq[X])=>(a:X)=> a +: b)

      seqOfOps.foldRight(wrappedEmpty){(e, soFar) =>
           e apply (soFar apply wrappedPrepend)}

Note that prepend needs two arguments, and they're applied one at
a time. All of the operations take place within the bottle.

    def retrieveAll = { ids =>
       turnInsideOut( ids map DbTask.retrieveOne)

The code for this example is on github. I'm not sure it's good, so I put it on the internet and you all can correct me.

The point is: applicative functors let you build up what you
want to do, within a context. Think of the context as a bottle, or a
tortilla, whatever you like. In this example, the purpose of that
context is to hold operations on a database for later execution.

The example assembles singles into a sequence. The same ideas can
assemble them into tuples, arrays, etc. Libraries like scalaz abstract
the turnInsideOut functionality so you don't have to write it yourself.

Applicative functors are one small functional recipe for building things that combine in many ways. Look for more of these on this blog in the future.

[1] FP people call this X => O[X] "pure" or "return".

Wednesday, September 11, 2013

Converting from svn to git: salvaging local branches

If some team members use git-svn locally, they might have a local working branch. When the team moves to a central git repository, that work needs to come with them.

In the old git-svn repo:

1) Find the revision where the local branch of interest branched off master:

git svn find-rev `git merge-base $branch master`

Here, master corresponds to subversion's trunk and $branch to the local branch we want to save.

merge-base says "Tell me the last commit these two branches have in common." It gives back the label (hash) of the commit. To learn more about it, run git show <hash>

svn find-rev says "Tell me the svn revision number for this git commit." You'll get back a number, such as 3456.

2) Create patch files for every commit on the branch:

git format-patch master..$branch

format-patch will produce a bunch of .patch files in your local directory, one for each commit between master and the tip of your branch. They start with numbers, in the order the commits were made, so that they can be applied in that order later.

In the new git repository:

After cloning the new git repository (goodbye svn!)[1]

1) Find the commit that corresponds to the one where the branch started.

git log :/@3456

where 3456 is the revision number, @ is a symbol that happens to be before the revision in the commit log, and git log :/ means "show me the log starting from a commit with this text in its message."

You might have to search for your revision number to find the right one, in case that number happens to appear in other commit messages.

Copy the hash of the commit you find.

2) Check out that commit

git checkout <hash of commit found in previous step>

Now you're in detached head state!

3) Create the branch here, and check it out

git checkout -b $branch

Whew, head is attached again.

4) Apply the patches

git am ../path-to-old-git-svn-repo/*.patch

Now your branch exists on the new repository, as if you'd always been working from there.

[1] If you're not moving the branch from one git-svn repo to another, then find the destination commit like this:

git svn find-rev r3456

where 3456 is the revision number where the branch starts.