Sunday, January 13, 2013

From imperative to data flow to functional style

Functional style makes code more maintainable. How? One way is to turn processing into a flow of data through discrete steps. Another is to separate the flow from its context, which makes the context swappable. This post illustrates both.

We'll go from an imperative style to an increasingly functional one. Scala is a good language for this illustration, since it supports both. If you want to run this, grab the code from GitHub and :load it in the Scala REPL.

Example: Given a filename, we extract the message from its first line:
The secret message is '...'
Watch out for files that don't exist, are empty, or don't follow the format. In any of these cases, return None. If everything looks good, return a SecretMessage (wrapped in Some) containing the quoted text from the file.
case class SecretMessage(meaning: String) 
The type of the function is:
String => Option[SecretMessage]
Note: In good old-fashioned style, this function would return null when it can't provide a result. I may be starting out imperatively, but not that dirty. Option is better: either None represents no result, or Some contains a value. Either way, we get a valid object reference and no NullPointerExceptions.

Note: Since the SecretMessage is read in and written out as a string, we could represent it as a plain String. This isn't specific enough for my taste. I like my types to tell me what the thing is, not just how it is read in or written out.


Imperative

The imperative implementation is straightforward IF you like to play compiler in your head. Check whether the file exists. If it does, open it and check whether it's empty. If it isn't, read the first line, then use a regex to check its format and extract the secret message. Nothing weird there, but you have to step through every line to know what the method is doing.

def imperativeStyle(filename : String)
  : Option[SecretMessage]
= {
  val file = new java.io.File(filename)
  if (!file.exists)
    None                           // no such file
  else {
    val source = io.Source.fromFile(file)
    if (!source.hasNext) {
      source.close()
      None                         // empty file
    }
    else {
      val firstLine = source.getLines().next()
      source.close()
      val ExpectedFormat = "The secret message is '(.*)'".r
      firstLine match {
        case ExpectedFormat(secretMessage) =>
          Some(SecretMessage(secretMessage))
        case _ => None             // first line has the wrong format
} } } }


Data Flow

Think about what this function is trying to do. Take a filename, get a file (that exists), get its first line (if it isn't empty), extract a message (if it's in a certain format). These are three transformations, and each of them can fail. These become three methods:
def openFile(n: String) : Option[java.io.File]
def readFirstLine (f : java.io.File) : Option[String]
def parseLine(line : String) : Option[SecretMessage]
If any of these return None, then our function will return None. A Some return means processing continues.
This creates two pipelines: one where processing continues, and one where the None value is simply passed along, with only its type parameter changing to match the function's return type. This is like the gutter in a bowling alley.

Now to implement this. I've put the three transformation functions in a module called transformers; the interesting part is the shape of the surrounding function. We'll walk through four refactors that make it more and more readable.
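The transformers module itself isn't shown in the post. Here is a minimal sketch of what it might contain, assuming plain java.io.File and scala.io.Source. Only the three signatures come from the text; the bodies are hypothetical.

```scala
import java.io.File
import scala.io.Source

case class SecretMessage(meaning: String)

// Hypothetical bodies for the three transformers; only the signatures come from the post.
object transformers {
  // Some(file) only if it exists on disk.
  def openFile(n: String): Option[File] = {
    val file = new File(n)
    if (file.exists) Some(file) else None
  }

  // Some(first line), or None for an empty file. The source is always closed.
  def readFirstLine(f: File): Option[String] = {
    val source = Source.fromFile(f)
    try {
      val lines = source.getLines()
      if (lines.hasNext) Some(lines.next()) else None
    } finally source.close()
  }

  private val ExpectedFormat = "The secret message is '(.*)'".r

  // Some(message) only if the whole line matches the expected format.
  def parseLine(line: String): Option[SecretMessage] = line match {
    case ExpectedFormat(meaning) => Some(SecretMessage(meaning))
    case _                       => None
  }
}
```

Each function does one thing and can be tested on its own, which is the payoff the next refactors build on.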

Minimally, we can change the if statements to call the transforming functions.

def useTransforms(filename: String)
  : Option[SecretMessage]
= {
  import transformers._
  val fileOption = openFile(filename)
  if (fileOption.isEmpty)
    None
  else {
    val lineOption = readFirstLine(fileOption.get)
    if (lineOption.isEmpty)
      None
    else {
      parseLine(lineOption.get)
} } }

This is a little cleaner than the original. The transforming function names call out what each is accomplishing. The separation of these pieces of functionality leads to more lines of code, but now each piece can be tested individually. That's always good!


Moar functional

Checking isEmpty on options is ugly. The more idiomatic way to branch on an Option is pattern matching.

def patternMatching(filename:String)
  : Option[SecretMessage]
= {
  import transformers._
  openFile(filename) match {
    case None => None
    case Some(file) =>
      readFirstLine(file) match {
        case None => None
        case Some(line) =>
          parseLine(line)
} } }

This is shorter, but it still has a repetitive pattern. With each transformation step, we're manually coding that gutter flow.

The next trick is to recognize that Option is a collection. It's a special collection, containing either one or zero items. Then notice that each transformation operates on the single item in the collection returned by the previous step. What method applies a function to items in a collection? map! Except that's not quite right, because map transforms a collection element into another element, and each of our transformers returns another collection of zero or one items. What method applies a function that turns each item into a collection, then flattens the results? flatMap!
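To see the difference concretely, here is a small sketch using a hypothetical transformer, firstWord, that stands in for the post's functions. With map, the Option returned by the function ends up nested inside the outer Option. With flatMap, the nesting is flattened away.

```scala
object MapVsFlatMap {
  // Hypothetical transformer that, like the post's, may produce nothing.
  def firstWord(s: String): Option[String] =
    s.split(" ").headOption.filter(_.nonEmpty)

  val start: Option[String] = Some("hello world")

  // map wraps each result in another Option, so the types nest.
  val nested: Option[Option[String]] = start.map(firstWord)

  // flatMap applies the function and flattens the nesting away.
  val flat: Option[String] = start.flatMap(firstWord)
}
```

Here nested is Some(Some("hello")) and flat is Some("hello"). Only the flatMap version has the shape the next step in the chain can consume.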

def chainOfMaps(filename:String)
  : Option[SecretMessage]
= {
  import transformers._
  openFile(filename).
    flatMap(readFirstLine).
    flatMap(parseLine)
}

Now this is about as short as it gets. The gutter pipeline is encoded in flatMap itself, because calling flatMap on None returns None without ever calling the function. How can it get cleaner than this?
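One way to watch the gutter at work is to record which steps run. In the sketch below, the step functions parse and half are hypothetical, and the mutable call log exists only for demonstration. Once a step returns None, every later flatMap skips its function.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.Try

object Gutter {
  // Demonstration only: records which steps actually ran.
  val calls = ListBuffer.empty[String]

  def parse(s: String): Option[Int] = {
    calls += "parse"
    Try(s.toInt).toOption
  }

  def half(n: Int): Option[Int] = {
    calls += "half"
    if (n % 2 == 0) Some(n / 2) else None
  }

  def run(input: String): Option[Int] = {
    calls.clear()
    Some(input).flatMap(parse).flatMap(half)
  }
}
```

Gutter.run("8") runs both steps and gives Some(4). Gutter.run("x") stops after parse: half is never called, and the result is None.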

There's another way to process collections in Scala: the for comprehension. It lets us line things up in a more readable fashion, and the compiler translates it into calls to flatMap and map. Here, we give names to intermediate values in the pipeline.

def forComprehension(filename:String) : Option[SecretMessage] = {
  import transformers._
  for( file <- openFile(filename);
       line <- readFirstLine(file);
       secretMessage <- parseLine(line))
  yield { secretMessage }
}

In IntelliJ, I can hover over these variable names to see their types. In my experience, this style is drastically easier to get right, to debug, and to follow when reading. These benefits are worth a few extra lines and curly braces.
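The for comprehension and the flatMap chain really are the same program. Roughly, the Scala language specification turns every generator except the last into a flatMap, and the last one, together with yield, into a map. The sketch below checks this with a hypothetical half function instead of the file transformers.

```scala
object Desugar {
  // Hypothetical step that fails on odd numbers.
  def half(n: Int): Option[Int] = if (n % 2 == 0) Some(n / 2) else None

  // Written as a for comprehension...
  def sugared(n: Int): Option[Int] =
    for (a <- half(n);
         b <- half(a);
         c <- half(b))
    yield c

  // ...and roughly what the compiler produces: flatMap for each generator
  // except the last, which becomes map.
  def desugared(n: Int): Option[Int] =
    half(n).flatMap(a => half(a).flatMap(b => half(b).map(c => c)))
}
```

Both versions give Some(1) for 8, and both give None for 12, because 12 halves to 6, then 3, and 3 is odd.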


What have we accomplished?

We turned a step-by-step imperative function into a very concise concatenation of three transformations. We removed all manual handling of the gutter pipeline. We broke out the transformations into testable bits.


Now for the cool part.

Here, we're using the Option class as a context for our data. It is a context that says, "I might have a result, and I might not. If I don't, roll the ball down the gutter and ignore future transformations."

Requirements change! The caller of our function wants to know why no secret message was found. Instead of None, return a failure type with a message. Implement this by changing the context the work is done in. The code for this is in another file, if you want to run it yourself.

The new context, Result, is either a Failure or a Success. It is generic, but at the end of our function a Success will contain a SecretMessage, while a Failure contains only an error message.[1]

Change each transformer to return Result instead of Option. Each one includes a descriptive message when it fails.

One more step is needed to make Result operate as a context the same way Option does. Scala's for comprehension will work with any type that implements map and flatMap. Add the declarations to the common Result trait, and then the implementations to each concrete class.

sealed trait Result[A] {
  def flatMap[B](f : A => Result[B]) : Result[B]
  def map[B](f : A => B) : Result[B]
}
case class Success[A](value : A) extends Result[A] {
  // apply the next step to the value we hold
  def flatMap[B](f : A => Result[B]) : Result[B] = f(value)
  def map[B](f : A => B) : Result[B] = Success(f(value))
}

case class Failure[A](message: String) extends Result[A] {
  // skip the step; pass the message along under the new type parameter
  def flatMap[B](f : A => Result[B]) : Result[B] = Failure(message)
  def map[B](f : A => B) : Result[B] = Failure(message)
}

Success applies the passed-in functions to the value inside it. Failure always returns another Failure, with the same message but a different type parameter. This is the gutter where errors flow.
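Here is a small sketch of that error gutter in action. It repeats the Result definitions so it runs on its own. The steps parse and positive are hypothetical. The first Failure supplies the message, and every later step passes that message along unchanged.

```scala
import scala.util.Try

object ResultGutter {
  sealed trait Result[A] {
    def flatMap[B](f: A => Result[B]): Result[B]
    def map[B](f: A => B): Result[B]
  }
  case class Success[A](value: A) extends Result[A] {
    def flatMap[B](f: A => Result[B]): Result[B] = f(value)
    def map[B](f: A => B): Result[B] = Success(f(value))
  }
  case class Failure[A](message: String) extends Result[A] {
    def flatMap[B](f: A => Result[B]): Result[B] = Failure(message)
    def map[B](f: A => B): Result[B] = Failure(message)
  }

  // Hypothetical steps: parse a number, then require it to be positive.
  def parse(s: String): Result[Int] =
    Try(s.toInt).toOption match {
      case Some(n) => Success(n)
      case None    => Failure(s"'$s' is not a number")
    }

  def positive(n: Int): Result[Int] =
    if (n > 0) Success(n) else Failure(s"$n is not positive")

  def pipeline(s: String): Result[String] =
    parse(s).flatMap(positive).map(n => s"got $n")
}
```

pipeline("42") gives Success("got 42"). pipeline("abc") gives Failure("'abc' is not a number"), and positive and the final map never run. The Failure's type parameter changes from Int to String along the way, but its message does not.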

Here's the interesting bit: when it comes to changing our function, we almost don't. The return type changes. The pipeline is identical!

def forComprehension(filename:String) : Result[SecretMessage] = {
  import transformers._
  for( file <- openFile(filename);
       line <- readFirstLine(file);
       result <- parseLine(line))
  yield { result }
}
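The post keeps the Result-based transformers in a separate file. Below is a minimal, self-contained sketch of what they might look like. It repeats Result so it can run alone. The transformer bodies and their error messages are hypothetical; the pipeline is the one above.

```scala
import java.io.File
import scala.io.Source

object ResultStyle {
  case class SecretMessage(meaning: String)

  sealed trait Result[A] {
    def flatMap[B](f: A => Result[B]): Result[B]
    def map[B](f: A => B): Result[B]
  }
  case class Success[A](value: A) extends Result[A] {
    def flatMap[B](f: A => Result[B]): Result[B] = f(value)
    def map[B](f: A => B): Result[B] = Success(f(value))
  }
  case class Failure[A](message: String) extends Result[A] {
    def flatMap[B](f: A => Result[B]): Result[B] = Failure(message)
    def map[B](f: A => B): Result[B] = Failure(message)
  }

  // Hypothetical Result-returning transformers; each failure explains itself.
  object transformers {
    def openFile(n: String): Result[File] = {
      val file = new File(n)
      if (file.exists) Success(file) else Failure(s"$n does not exist")
    }

    def readFirstLine(f: File): Result[String] = {
      val source = Source.fromFile(f)
      try {
        val lines = source.getLines()
        if (lines.hasNext) Success(lines.next()) else Failure(s"${f.getName} is empty")
      } finally source.close()
    }

    private val ExpectedFormat = "The secret message is '(.*)'".r

    def parseLine(line: String): Result[SecretMessage] = line match {
      case ExpectedFormat(meaning) => Success(SecretMessage(meaning))
      case _                       => Failure(s"unexpected format: $line")
    }
  }

  // The same pipeline as above; only the context type differs from the Option version.
  def forComprehension(filename: String): Result[SecretMessage] = {
    import transformers._
    for (file <- openFile(filename);
         line <- readFirstLine(file);
         result <- parseLine(line))
    yield { result }
  }
}
```

A caller now gets a reason for every failure: a missing file, an empty file, or a first line in the wrong format.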

This is a key concept of functional programming: separating out context. Sometimes that context contains state, other times isolated side effects (I/O), other times order of execution (synchronous, asynchronous, parallel). We can abstract things that used to be inherent. An imperative style cements mutable state and step-by-step order of execution; changing this is a huge amount of work and easy to get wrong. Functional style, once you get used to it, gives us more degrees of freedom.

Next time you write a method longer than a few lines, look for a path of data transformation. It might lead you someplace elegant.


[1] Sure, this sounds like an Either. If only Either implemented flatMap, it would completely work for this.
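Footnote [1] reflects Scala at the time of writing. Since Scala 2.12, Either is right-biased and provides map and flatMap, so it can serve as the error context directly. Left carries the message and Right carries the value. The sketch below assumes Scala 2.12 or later. To stay file-free, it works on a string of file contents, and its helper names are hypothetical.

```scala
object EitherStyle {
  case class SecretMessage(meaning: String)

  private val ExpectedFormat = "The secret message is '(.*)'".r

  // Hypothetical string-based steps; Left is an error message, Right a value.
  def firstLine(contents: String): Either[String, String] =
    contents.split("\n").headOption.filter(_.nonEmpty).toRight("file is empty")

  def parseLine(line: String): Either[String, SecretMessage] = line match {
    case ExpectedFormat(meaning) => Right(SecretMessage(meaning))
    case _                       => Left(s"unexpected format: $line")
  }

  // Works because Either has flatMap and map in Scala 2.12+.
  def secret(contents: String): Either[String, SecretMessage] =
    for (line <- firstLine(contents);
         message <- parseLine(line))
    yield message
}
```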

Thursday, January 10, 2013

Geeks, Freaks, Nerds, & Programmers

This post is about diversity. But not about gender -- gender is a symptom of the larger problem, and I am a part of that problem.

First, something about brains: In our brains, everything that goes in, what we see and hear and read, forms a pattern in our neural connections. Later, when we see something similar, those same neurons are activated and we recognize the situation. It is like Content Addressable Storage, except that it doesn't need an exact match; close is good enough. Storage and retrieval with pattern matching thrown in for free -- it's memory and processing at the same time![1]

As a consequence of this, two brains with different patterns produce different output for the same input. Patterns come from experiences, so no two brains are the same. Two people presented with the same problem will ask different questions, try different solutions, come up with different names for concepts. [2] Experience changes not just what we think, but how we think. The more different the experiences, the more variety in ideas.

Consider geeks like me. The typical geek grew up watching Star Wars, Star Trek, and Monty Python. We played D&D. We read books about our future in space. We were unpopular in high school, but now we have our own subculture where we fit in. Lots of us now work as programmers.

Bonus for us: at work we're all a bunch of geeks, so we have all this shared experience. This lets us communicate efficiently, because we can speak in XKCD references and Lord of the Rings analogies. But what is that costing us?

In design meetings or pair programming, we look at a problem and separately we all come up with the same idea. We say, "Yeah! We must be right, because we all agree!" But are we missing out?

The more shared experiences, the more similar patterns, the more uniform our ideas. We don't even know what might occur to someone with completely different references. Someone who loves history and gardening, or someone raised in a different religion. Likely there are a few people like that on our team, but they don't speak up much. They're in the minority, always voted down. They get quiet after a while.

This camaraderie we enjoy, this snuggly monoculture, it's costing us ideas.

More about brains. When presented with a hard question, such as "Will this candidate be a good programmer on our team?" our brain does some magic. Without consulting us, it substitutes an easier question: "Does this candidate resemble other good programmers I know?" Then, also without asking us, it answers this question with easily obtained data: "Does he look like the other programmers I know? Does she laugh at the same jokes? Drink the same beer? Does he have a beard?"
This is how our gut makes decisions. [3]

Some managers call this "hiring for cultural fit." Is this the culture we want?

Teams are built on shared values. It's easy to get lazy and base our shared values on the Force, "that's what she said," and good beer. This takes away from values like solid code, abstract thinking, or responsive UIs. Team members who don't fit in can pull the team's focus toward the stuff that matters.

Bruce Watson, manager of Atomic Object's office in Detroit, has seen this many times.
"Change the gender mix and behavior immediately changes to a greater focus on work, different perspectives are introduced, and side trips are more social than derogatory.
This effect is not exclusive to women joining all male teams. I have seen behavior change when men join an all women's group or when races, beliefs, and orientations mix."
Monoculture limits ideas and distracts from values that matter. Let's take the "cult" out of culture.

How? Apply a design principle: separation of concerns. Separate being a geek from being a programmer.

Geek is a subculture, but programming is a profession. Right now the terms are used interchangeably in many circles. Geeks are mostly white and male, and this is fine. Programmers are mostly white and male, and this is a detriment to our industry. We're missing out on other brain patterns, and we're not optimizing for teamwork.

The problem is: programmer == geek. Nerd, to those outside our circle. Gender is a symptom, and racial uniformity is a symptom. The US is 50% male, and developers are 90% male. The US has 30% black and Hispanic people, and development has ... hardly any. Gender and race we can measure, so they are the metrics for general diversity. Count the women, count the skin colors, count the ideas your team has access to. The real goal is diversity of minds.

A child growing up as a girl, or a black person or Latino, why would they picture themselves as a programmer? They're not a geek.

I'm a woman, but I make fun of team members who don't read the right webcomics, and I wear this "geekette" shirt to present at user groups and conferences, and I make penis jokes. I'm part of the problem.

What can we do about it?
Three suggestions.
  1. Seek out the team members you're least comfortable with. Ask them design questions. Draw them out if necessary, because they may not be used to it.
  2. None of us, even those who identify as geeks, are defined by this label. Share the parts of us that don't fit the stereotype. At work, at user groups and conferences, talk about topics that are unique or important to you. Talk about books the others probably haven't read. Politics, even. Not your kids -- that's too safe. Tell the stories that changed your outlook on life. Surprise each other.
  3. One way to get new patterns for our team is to get new patterns in our own brains. Go do something you can't picture yourself doing. Dance, climb, speak an obscure language. Run for office. Join a band. Be the diversity we seek. Listen to a kind of music you've never liked. Go past where it feels good, and you might find where it feels wonderful.
We are more than geeks. Our industry is about more than geekdom. Programming is about creating, thinking, abstracting. Let's make programming for everybody.



"As a leader, I know that the more mixed the group, the easier it is to get a stronger focus on teamwork. The stronger the teamwork, the better the software, and the better the software the greater the sense of contribution among all team members." -- Bruce Watson



[1] For more info, look up Sparse distributed memory.
[2] "Different men confronting the same range of phenomena... describe and interpret them in different ways." "The particular conclusions he does arrive at are probably determined by his prior experience in other fields, by the accidents of his investigation, and by his own individual makeup.... An apparently arbitrary element, compounded of personal and historical accident, is always a formative ingredient of the beliefs espoused by a given scientific community at a given time." -- Thomas Kuhn, The Structure of Scientific Revolutions
[3] Substitution heuristic and representativeness heuristic. Daniel Kahneman, Thinking, Fast and Slow