Data Exposure and Encapsulation

TL;DR – Scala lets you expose internal data, then later change underpinnings while maintaining the same interface.

The functional Way is an open Way. Internal data is available to anyone who wishes to extract it.

“But what about encapsulation??” – horrified OO programmer
“We do not fear your code, for you cannot screw up our immutable data.” – calm FP programmer
“But what about change? If you expose internal structure, then you can never change internal structure. You’ll break all my code!” – OO
“The surface is resilient. The essence is fluid. We can keep client code happy while expanding underneath.” – FP

One of the ways that Scala achieves “scalability” is: make the code easy to write now, and still flexible to change later. An example is simple data structures: the case class. By default, anyone can pattern match to extract all the values from a case class. For instance:

case class PersonName(first: String, last: String)
val me = PersonName(“Jessica”, “Kerr”)

val greeting = me match {
  case PersonName(first, last) => s”Hello, $first $last”

This is handy – it makes building and using simple datatypes painless. Fields are public, which is not frightening because they’re final, safe from alteration.

The danger of this easy syntax is rigidity: if I add a middle-name field to PersonName, then all the code that pattern matched on it is broken. Construction, too.

case class PersonName(first: String, 
                      last: String, 
                      middle: Option[String])
val me = PersonName(“Jessica”, “Kerr”) // broken!

me match {
  case PersonName(first, last) => s”Hello, $first $last” // broken!

We want to change the PersonName without breaking backwards compatibility. In Scala, all the handiness of the case class (construction, extractors) is syntax sugar. That means we can pull it apart and do whatever we want with it behind the scenes.

In Scala, the constructor that works without “new” in front of it is really function application on the PersonName companion object. [1] And function application on anything is really a call to its apply method. Case class syntax sugar provides an apply method that accepts all the fields, like:

object PersonName {
     def apply(first: String, 
               middle: Option[String]) : PersonName =
            new PersonName(first, last, middle)

PersonName(“first”, “last”, Some(“middle”)) expands to PersonName.apply(“first”, “last”, Some(“middle”)) .
Since that’s all the case class constructor really is, we can re-enable the old syntax by adding a bonus apply method to the PersonName object:

object PersonName {
    def apply(first: String, last: String): PersonName = 
       new PersonName(first,last, None)
val me = PersonName(“Jessica”, “Kerr”) // fixed!

There’s another secret method name behind the pattern match syntax. When Scala sees:

(p : PersonName) match {
   case PersonName(a : String,b : String) =>

Scala looks for a method called unapply on the PersonName object; unapply should accept a PersonName and return an optional tuple of two strings.

object PersonName {

  def unapply(pn: PersonName): Option[(String, String)] = 
       Some((pn.first, pn.last))

It’s like the case statement is switching around the return value and the arguments of the unapply method. If calling unapply with the matched-against value (p) returns Some(tuple), then the match succeeds and the identifiers (a and b) get the values in the tuple.

Retaining support for the pattern match is harder than the constructor. We can’t add another unapply extractor to the PersonName object; it conflicts with the generated one that comes with the case class.

Here’s a solution that abstracts PersonName into a trait, then backs it with a different case class of our choice. Now the constructors and extractors are ours to control and manipulate.

trait PersonName {
    val first:String
    val last:String

case class FullPersonName(first: String, 
              last: String,
              middle: Option[String]) extends PersonName

object PersonName {
  def apply(first: String, last: String): PersonName = 
          FullPersonName(first,last, None)
  def unapply(pn: PersonName): Option[(String, String)] = 
          Some((pn.first, pn.last))

This compatibility could also be met by changing PersonName from a case class into a regular class, and supplying apply and unapply methods as we like. What matters is not how we do it, but that we can.

In this way Scala makes the obvious thing easy, and custom behavior possible. Backwards compatibility enables reuse. In OO we encapsulate everything up front to make later change possible, with indirection and lots of boilerplate. In Scala we don’t have to complicate our code to accommodate possible imagined future change. Stick with YAGNI now, without paying for it later. Reduced suffering for everyone! The world is a better place with FP.

[1] We could fix the broken constructor by supplying a default value for middle name.

case class PersonName(first: String, 
                      last: String,
                      middle: Option[String] = None)

That’s simpler but less flexible.

Evolution of a developer


At first I wrote flexible code. I figured that because I couldn’t predict the future, I better leave all the options open. The code was modularized, well-factored, with a lot of bells and whistles, and more than enough documentation. I was proud of it, until I learned that:

  • Things changed so rapidly that even my solidest assumptions became false. 
  • The small flexibilities in the code were too small to be really helpful, yet too many to be easily refactored.
  • In a world of 140 characters, nobody read my heavy documentation. Literate programming didn’t work, if ever.
  • Longer code, no matter good or bad, is harder to maintain.

So, in another project, I switched to writing extremely tight code. I used a lot of functional abstractions and removed every single bit of redundancy. And documentation is, of course, left as an exercise for the reader. I figured that good code was deleted code, and short code could be easily refactored. I was proud of it, until I learned:

  • 80% of the efforts had been spent on reducing 20% of the redundancy. Like speeding, it didn’t save much. 
  • Highly-factored, redundancy-free code can be difficult to document, understand and debug. 
  • The best way to compress some code is using gzip, not your hand.

Now I think, just like many things in life, a balanced position is the best. In other words: write simple code. I deeply believe that you can achieve simplicity without compromising on other aspects. I believe that applying the simplest solution which consumes minimal energy is how the whole universe works. When you get one simple answer, it may not be the right answer. However, once you find the right answer, it is usually also very simple. 

I think we should just make it easy. If you feel difficulty when writing a program, if you find yourself scratching your head, either because of intricate structural design or abstract code factorization, you are probably doing it wrong. After all, you should be able to enjoy programming, not suffer through it. There is more than one way to do it – any approach can be a good start, as long as you feel easy and comfortable about it, and you do not forget to keep improving it. If it’s too fat, cut some; if it’s too thin, grow some. Eventually you’ll reach a point where you find ultimate simplicity. Call it fixed/optimal/equilibrium point as you like, it’s all the same thing. It’s just how the universe works.

— Ming Lei, on the St. Louis Lambda Lounge list

Tools vs Self-restraint, and a Dystopian vision of FP

Back in the Smalltalk days when OO was birthed, the idea was to have a bunch of objects that pass messages to each other. The focus was supposed to be on the messages.

But people like objects. They are easy to draw on a white board. And they come with cool tools like inheritance and polymorphism. Pretty soon these tools became the focus. People learned how to use them, and used them willy-nilly. All that worry about messages was left in the dust.

It’s almost like wanting to be a surgeon, so I learn how to use all the tools that a surgeon uses. Get certified in scalpels and retractors and ultrasound tissue disruptors. And bam, I’m a surgeon! Hire me, I can operate on anybody!

Functional programming is still in its youth. Which tools of the trade will become synonymous with functional, as its principles are left in the dust? The point of FP is making each piece of code interchangeable with its return value. Expressions that can be evaluated in whatever order, or never if they aren’t needed. Whatcha wanna bet people learn how to pass functions around, and start considering themselves functional programmers?

Objects that pass themselves and give access to all their innards are the opposite of the original intent of OO. When FP goes mainstream we’ll have side-effecting, environment-dependent functions passed around willy-nilly. “What do you mean I can’t pass this code over the network?” they’ll say, and RMI will be enhanced to Remote Method Passing, and committees will publish protocols.

OO was originally method-oriented. FP is more accurately expression-oriented. People get hung up on what they can do with the tools these paradigms offer. The essence of each is not what we can do, but what we don’t do. Objects don’t muck around with each others’ state, so we can be sure the code that sets, changes, and uses this data is right here with it. Functions don’t use or impact the outside world, so we can be sure this input entirely determines this output.

Reuse isn’t about typing. Characters are free. It’s about saving headspace: throw this task over a wall to some code I trust. Then I don’t have to worry about design, naming, or testing. But wait, can you trust that superclass? only if you’re diligent about enforcing LSP and various other principles we aim for as we try to use OO tools without stabbing ourselves with the scalpel. Functional programming will become just as painful if we let go of principles and hold tight only to the tools.

As a surgeon, I can excise your kidney, snip off those tonsils, or cut that baby right out of your uterus without stretching your vagina. But should I? The essence of medicine is guessing the right procedure for each patient. Most of the time, the right surgery is no surgery.

Therefore: Abuse of inheritance is like cutting out your kidney to cure the flu. And returning any old function from another function is using radioactive iodine to cure one cancer and cause another.

Maybe we should all take an aspirin and go to bed.


The sage responds:

“The adoption of new paradigms leads to the formations of patterns of
their misuse. And their replacement by other paradigms. Why should
this discourage you? It is the way we learn.

(Shrugs shoulders) Purity is a matter of perspective. Software is for
helping people.”

A realistic use case for applicative functors

But why would you do that??

Say you have a collection of document IDs and you are going to retrieve
the documents from the database. Not right away, but someday. You are
writing a function that defines the operation of retrieving a sequence
of documents.

So there’s such a thing as an DbTask, which says “If you give me a
database, I can do something with it.” In our case we wind up with an
DbTask[Seq[Document]], which says, “If you give me a database, I can
give you this collection of documents you want.”

If you’re wondering why we would abstract the decision of “what to do
with a database” from actually doing stuff with a database, check out
Runar’s presentation about purely functional I/O.

Now you want a DbTask that retrieves a whole sequence of documents, based on a sequence of IDs. Can we build this without writing anything new in the DbTask classes?

    def retrieveAll: Seq[DocId] => DbTask[Seq[Document]] = ???

How do we implement it? Say we have a primitive operation for retrieving
one document.

    def retrieveOne: DocId => DbTask[Document] = // already implemented by someone else

If we can retrieve one, we can retrieve them all! Let’s do that:

    (docIds : Seq[DocId]) => docIds map retrieveOne

That doesn’t give us quite what we want. We turned each DocId into a
retrieval DbTask[Document], so we have a sequence of single

    Seq[DocId] => Seq[DbTask[Document]]

How do we turn that inside out, to get to a DbTask[Seq[Document]]?

To construct a sequence inside an DbTask out of a bunch of
DbTasks, we’ll use these pieces:

  • each DbTask[Document]
  • an empty sequence, Seq()
  • a way to add a document into a sequence, which is the +: (prepend) method on Seq.
  • a way to put each of these last two into a DbTask. It’s sort of a factory method.[1]

        DbTask(x: X) => DbTask[X]

  • a way to plug things together within DbTasks. Call this apply. It’s a method on DbTask that takes a wrapped function (microwave in a tortilla) and spits out a wrapped result (hot food in a tortilla).

    DbTask[X] { …
       def apply(f: DbTask[X => Y]) : DbTask[Y]

We have the first three easily enough. The last two together form the
qualifying characteristics for an applicative functor. Applicative
functors let us plug the things inside them together, like building a
ship in a bottle.

Let’s use these five pieces to turn our DbTask[Seq[_]] into a
Seq[DbTask[_]]. Fold up the sequence of DbTasks, starting
with a DbTask containing an empty sequence, and then prepending each
item into the sequence, all within the DbTask.

  def turnInsideOut[X]: Seq[DbTask[X]] => DbTask[Seq[X]] = 
  { seqOfOps =>
      val wrappedEmptyDbTask[Seq[X]] = DbTask of Seq.empty[X]
      def wrappedPrependDbTask of((b: Seq[X])=>(a:X)=> a +: b)

      seqOfOps.foldRight(wrappedEmpty){(e, soFar) =>
           e apply (soFar apply wrappedPrepend)}

Note that prepend needs two arguments, and they’re applied one at
a time. All of the operations take place within the bottle.

    def retrieveAll = { ids =>
       turnInsideOut( ids map DbTask.retrieveOne)

The code for this example is on github. I’m not sure it’s good, so I put it on the internet and you all can correct me.

The point is: applicative functors let you build up what you
want to do, within a context. Think of the context as a bottle, or a
tortilla, whatever you like. In this example, the purpose of that
context is to hold operations on a database for later execution.

The example assembles singles into a sequence. The same ideas can
assemble them into tuples, arrays, etc. Libraries like scalaz abstract
the turnInsideOut functionality so you don’t have to write it yourself.

Applicative functors are one small functional recipe for building things that combine in many ways. Look for more of these on this blog in the future.

[1] FP people call this X => O[X] “pure” or “return”.


Converting from svn to git: salvaging local branches

If some team members use git-svn locally, they might have a local working branch. When the team moves to a central git repository, that work needs to come with them.

In the old git-svn repo:

1) Find the revision where the local branch of interest branched off master:

git svn find-rev `git merge-base $branch master`

Here, master corresponds to subversion’s trunk and $branch to the local branch we want to save.

merge-base says “Tell me the last commit these two branches have in common.” It gives back the label (hash) of the commit. To learn more about it, run git show

svn find-rev says “Tell me the svn revision number for this git commit.” You’ll get back a number, such as 3456.

2) Create patch files for every commit on the branch:

git format-patch master..$branch

format-patch will produce a bunch of .patch files in your local directory, one for each commit between master and the tip of your branch. They start with numbers, in the order the commits were made, so that they can be applied in that order later.

In the new git repository:

After cloning the new git repository (goodbye svn!)[1]

1) Find the commit that corresponds to the one where the branch started.

git log :/@3456

where 3456 is the revision number, @ is a symbol that happens to be before the revision in the commit log, and git log :/ means “show me the log starting from a commit with this text in its message.”

You might have to search for your revision number to find the right one, in case that number happens to appear in other commit messages.

Copy the hash of the commit you find.

2) Check out that commit

git checkout

Now you’re in detached head state!

3) Create the branch here, and check it out

git checkout -b $branch

Whew, head is attached again.

4) Apply the patches

git am ../path-to-old-git-svn-repo/*.patch

Now your branch exists on the new repository, as if you’d always been working from there.

[1] If you’re not moving the branch from one git-svn repo to another, then find the destination commit like this:

git svn find-rev r3456

where 3456 is the revision number where the branch starts.