Friday, September 21, 2012

When is code data, and when is it code?

The first tenet of functional programming: code is data. Store it in variables, pass it as parameters, return it from methods.

This is challenging in the JVM, where all code lives inside methods on objects. Therefore, Scala passes around code in a function object. Code lives in the apply() method.

When is a function object created? It is not always clear when we're reading Scala code. What is assigned to an ordinary method on a class? What will be executed immediately? What gets wrapped in a function object for storing and passing around?
This post describes how to get from executable code to a method; how to get from executable code or a method to a function literal; and how to get from a function literal or method back to executable code. Along the way I'll mention three exceptions to the rule "a method is executed every time it is referenced," show how to pass around a catch clause, and interrogate an innocent citizen.

Since all code in Scala is in a class (even scripts get wrapped), let's start there.

Code in a class: now or later?

Expressions inside a class body are executed on instantiation as part of the default constructor. The two exceptions are methods (declared with def) and function literals. This can surprise the unwary coder.

If a block or line of code comes after val x =, then it's getting executed once, right now. If it comes after def x =, then it gets executed later, whenever the method is called.

Say we want to create a wiretap on a citizen. The citizen is mutable. The following class will initialize the val phone only once; if Sally's home phone number is updated, the wiretap will never see it. However, name is defined as a method; if Sally's name changes, her wiretap's name property will reflect this.

class Wiretap(target: Citizen) {

    val phone = { target.homePhone }

    def name = target.name
}

Scala's Uniform Access Principle means that users of Wiretap don't have to care whether name and phone are methods or fields; to them, both are just properties.[1] But the writers of Wiretap had better be careful!
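Here is a small sketch that makes the difference observable. The Citizen class and the phone numbers are made up for illustration; the post never shows them.

```scala
object WiretapDemo {
  // Hypothetical mutable citizen; not shown in the original post.
  class Citizen(var name: String, var homePhone: String)

  class Wiretap(target: Citizen) {
    val phone = { target.homePhone } // evaluated once, during construction
    def name = target.name           // evaluated again on every access
  }

  def main(args: Array[String]): Unit = {
    val sally = new Citizen("Sally", "555-0100")
    val tap = new Wiretap(sally)

    sally.homePhone = "555-0199"
    sally.name = "Sally Smith"

    println(tap.phone) // still the number captured at construction
    println(tap.name)  // reflects the current name
  }
}
```

The caller writes tap.phone and tap.name the same way, so nothing at the call site reveals which one is stale.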

Function Literals

You can always defer execution of code by wrapping it in a function literal. There's a straightforward way to do this, and there are sneaky ways.

Who doesn't love rockets?

rocket operator

Create functions explicitly with a rocket. You can put these into a val if you like, or pass them to methods like List.map which expect a function type.

val isLegal = (w: Wiretap) => w.toString.length > 1

(We set the bar pretty low for a wiretap to be considered legal.)

Using the ubiquitous underscore, we can make a function literal without a rocket:

val isLegal = (_:Wiretap).toString.length > 1


What function is expected as a parameter


Inside a parameter list where a function of specific parameters is expected, we can skip the parameter-type declaration. Scala will do everything it can to turn what you put in that parameter into the expected function type.

val legalWiretaps = listOfWiretaps.filter( _.toString.length > 1 )

Scala will convert a method into a function if the method is referenced where a function of the same type is expected. This is one of three exceptions to "methods are invoked every time they are mentioned."

object Wiretap {
   def isLegal(w : Wiretap) = w.toString.length > 1
}

listOfWiretaps.filter(Wiretap.isLegal)

Because filter expects a function of type Wiretap => Boolean, Scala recognizes that the method matches this signature and wraps the method in a function object for you.
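A self-contained sketch of the same conversion, assuming a hypothetical Wiretap case class with a label field (the real Wiretap takes a Citizen). It also shows that an explicit type annotation on a val sets up the same expectation as a parameter list does.

```scala
object ExpectedTypeDemo {
  // Hypothetical stand-in for the post's Wiretap; the label field is assumed.
  case class Wiretap(label: String)

  def isLegal(w: Wiretap): Boolean = w.label.length > 1

  val listOfWiretaps = List(Wiretap("sally"), Wiretap("x"))

  // filter expects Wiretap => Boolean, so the method is wrapped automatically.
  val legalWiretaps = listOfWiretaps.filter(isLegal)

  // A declared function type creates the same expectation.
  val asFunction: Wiretap => Boolean = isLegal
}
```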

When you want a method treated as a function literal


You can tell Scala to treat the method as a function type -- code-as-data rather than code to be executed -- if you follow the method name with an underscore.[2] The underscore triggers the Scala compiler to wrap the method instead of invoking it.

scala> val targetNameFunction = sally.name _ 
                    // the near-invisible underscore is critical
targetNameFunction: () => String = <function0>


targetNameFunction is now a function that returns a String. Every time it is applied, sally.name will be called again.
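To see that the underscore defers the call rather than making it, here is a sketch with a hypothetical Citizen that counts how often name is read. It assumes the Scala 2 syntax used throughout this post.

```scala
object UnderscoreDemo {
  // Hypothetical citizen that counts reads of name; not from the original post.
  class Citizen(private var current: String) {
    var reads = 0
    def name: String = { reads += 1; current }
  }

  val sally = new Citizen("Sally")

  // Trailing underscore: wrap the method instead of invoking it (Scala 2 syntax).
  val targetNameFunction = sally.name _
}
```

Creating targetNameFunction leaves reads at zero; each application bumps it by one.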

This is the second exception to "every time a method is referenced, it is invoked." The last one, sneakiest of all, is described in the next section.

Where the method you call sequesters your expression


In Java, when you call a method, all the parameter expressions are evaluated before they are passed in. This is eager evaluation. In Scala, the writer of a method can specify "when they pass me something, don't bother figuring out what they passed me - just give me that code so I can evaluate it when I please." This is called (for reasons I don't understand) a by-name parameter. It's cool because it lets Scala developers define our own control structures like loops and conditionals. It's uncool because the caller of the method may not realize that she just passed in a function literal; it looks the same to her. That expression might be evaluated once, or many times, or not at all.
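As a rough illustration of the home-made control structure idea, here is a sketch of a while-style loop. The name repeatWhile is invented for this example; both the condition and the body are by-name, so each one is evaluated again every time the loop mentions it.

```scala
object ControlDemo {
  // Hypothetical control structure: condition and body are both by-name,
  // so neither is evaluated until repeatWhile refers to it.
  def repeatWhile(condition: => Boolean)(body: => Unit): Unit =
    if (condition) {
      body
      repeatWhile(condition)(body)
    }

  def countTo(limit: Int): Int = {
    var i = 0
    repeatWhile(i < limit) {
      i += 1
    }
    i
  }
}
```

At the call site, repeatWhile(i < limit) { ... } reads like built-in syntax, which is exactly why the caller may not notice that her expressions were deferred.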

Here's an evil method:

   def evil(sneakyMethodThing : => String) = {
      println("Time to do some bad stuff")
      sneakyMethodThing == "I didn't do it" || sneakyMethodThing == "You can't prove anything"
   }

When I call this with an expression that interrogates Sally...

evil( officer.interrogate(sally) )

... then in this one line, Sally might get interrogated twice!

The code officer.interrogate(sally) looks to the caller of evil like an expression that will be evaluated immediately. But really it gets wrapped up inside an object and passed in to the evil method. Yet inside the evil method, sneakyMethodThing doesn't behave like a function object; you don't have to use () to apply it. It behaves more like a method: every time sneakyMethodThing is referenced, it is invoked.

evil: Time to do some bad stuff.
officer: "Why did you do it?"
sally: "I don't know what you're talking about."
[evil observes that this does not equal "I didn't do it" and continues evaluating]
officer: "Why! Why!"
sally: "I told you, I don't know!" [sobs]
[evil returns false]

Watch out for these by-name parameters, and be careful about passing in expressions (like this one) with potentially damaging side-effects.
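One way to count the damage, and one possible defense for the method's author, sketched with a hypothetical interrogate() standing in for officer.interrogate(sally). Copying the by-name parameter into a local val evaluates it exactly once.

```scala
object ByNameDemo {
  var interrogations = 0

  // Hypothetical stand-in for officer.interrogate(sally), with a side effect.
  def interrogate(): String = {
    interrogations += 1
    "I don't know what you're talking about."
  }

  // Same shape as evil above: the parameter is mentioned twice.
  def evil(sneakyMethodThing: => String): Boolean =
    sneakyMethodThing == "I didn't do it" || sneakyMethodThing == "You can't prove anything"

  // Copying the parameter into a local val evaluates it exactly once.
  def lessEvil(sneakyMethodThing: => String): Boolean = {
    val answer = sneakyMethodThing
    answer == "I didn't do it" || answer == "You can't prove anything"
  }
}
```

Calling evil(interrogate()) bumps the counter by two; lessEvil(interrogate()) bumps it by one.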

Why should a function cover every case?


This is a strange animal. A partial function literal is one or more case statements in a block - like a chunk of a match expression. Catch expressions use these.

try {
  // some stuff
} catch {
   case ex : IllegalArgumentException => "I'm a dork"
   case ex : Throwable => "The world is over"
}

The part after the catch keyword is a partial function. Check this out - you can put that piece in a value:

val standardExceptionHandling : PartialFunction[Throwable, Any] = {
   case ex : IllegalArgumentException => "I'm a dork"
   case ex : Throwable => "The world is over"
}

... and then use it:
try {
// some stuff
} catch standardExceptionHandling

That's cool!

The other place partial functions are useful is when you want to use a pattern to extract data from the parameter to a function.

someMapOfThings.foreach {
    case (key, value) => /* do something with key and value */ }

Be careful here to use a pattern that will match every input. Otherwise, a MatchError can bite you at runtime.

The PartialFunction is a function object with a bonus method: isDefinedAt returns a boolean indicating whether the function has an answer for the supplied input. These are interesting in other ways, but this post is about syntax. It's time to take this the other way and turn functions and methods back into executable code.
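A quick sketch of isDefinedAt, using a handler that deliberately covers only one case. The name dorkHandler is made up for this example.

```scala
object PartialDemo {
  // Defined only for IllegalArgumentException; other throwables are not covered.
  val dorkHandler: PartialFunction[Throwable, String] = {
    case ex: IllegalArgumentException => "I'm a dork"
  }

  // Ask before applying: true for a covered input, false otherwise.
  val covered = dorkHandler.isDefinedAt(new IllegalArgumentException("oops"))
  val notCovered = dorkHandler.isDefinedAt(new RuntimeException("boom"))
}
```

Applying dorkHandler to an input where isDefinedAt is false fails with a MatchError, which is the runtime bite mentioned above.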

How to make it go


To wrap it up, go the other way: call methods and functions to execute the code inside of them.

Methods and by-name parameters are invoked every time they are referenced. There are three exceptions for methods: when the method appears in a parameter list where a corresponding function object is expected; when it is followed by an underscore; and when the call is passed as the argument to a by-name parameter.

Functions are applied in two ways:
  1. Follow the identifier with a parameter list. If the function accepts no arguments, then the parameter list is an empty set of parens. Unlike methods, you can't leave off the parentheses.
  2. Call the apply method directly.

val k = (_:String).length
k("hi")        // 1. parameter list
k apply "hi"   // 2. apply method called directly

There's a peculiar consequence to the first syntax here. Anything with an apply method can be applied like a function - including stuff like lists and strings and arrays.[3] If you follow a string with a parameter list, you'll get a result.

scala> "Sally"(0)
res10: Char = S

Perhaps even more startling, if you pass a string where a function is expected, it'll accept it.

scala> Range(0,5).map("Sally")
res11: scala.collection.immutable.IndexedSeq[Char] = Vector(S, a, l, l, y)

But that string is a partial function, so it'll blow up if you apply it to something out of range.
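A sketch of both halves of that claim, assuming the standard implicit conversion from String to a sequence of characters kicks in when a PartialFunction[Int, Char] is expected. The name charAt is just a label chosen for this example.

```scala
import scala.util.Try

object StringFunctionDemo {
  // With an expected function type, the string is converted implicitly.
  val charAt: PartialFunction[Int, Char] = "Sally"

  val first = charAt(0)                // index within range
  val defined = charAt.isDefinedAt(7)  // ask before applying
  val outOfRange = Try(charAt(7))      // applying anyway throws; Try captures it
}
```

Checking isDefinedAt first, or wrapping the application in Try, keeps the out-of-range case from blowing up the caller.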

Whew


Scala juggles the code around among methods, objects, and expressions to achieve its mix of imperative and functional support. Conceptually, code can be passed as data, even on the JVM. Those concepts aren't always crystal-clear to the reader, though, so be careful.


[1] For added confusion, in a class, both methods (def) and fields (val) are implemented as methods. The val becomes a getter, essentially, that returns an invisible constant field.

[2] Scala's error messages refer to this as "partially applied function," but I dispute the use of this term. For instance, try to assign a method with parameters to a val:

scala> val e = Math.pow
<console>:8: error: missing arguments for method pow in class MathCommon;
follow this method with `_' if you want to treat it as a partially applied function

Again, saying "Math.pow _" will wrap the method in a function type and store that in the val. I think this is not partial application; @bvenners says it's eta-expansion. @dickwall says it is an edge-case of partial application, supplying 0 arguments.

[3] Or there's an implicit conversion that turns them into something that implements PartialFunction. Always tricky in Scala.
