Sunday, March 31, 2013

Passing functions in Ruby: harder than it looks

People say, "Oh, Ruby has functional programming! We pass blocks around all the time!"

I'm sorry to inform you: Ruby blocks are not first-class functions.

Functional programming is so called because we find it useful to pass functions as values. When we do that, we expect that the function can be called and:
  • it'll get its own level on the stack
  • when it returns, that level of the stack will go away
  • this stack is the same one used by the caller.
When a function returns, whoever calls it gets the return value and goes on its merry way. Sounds easy, right?

Yeah, not in Ruby. See, the common case in Ruby is to pass code in as a block, something like this:
format_all_the_things { |a| "format #{a} into a string" }
Then inside the higher-order function, the block of code is called using yield:
def format_all_the_things
   one = yield "thing 1"
   two = yield "thing 2"
   [one, two].join " & "
end

Here, the block of code is an invisible parameter. It's a little like a first-class function, but not really. For one, blocks don't get their own level on the stack. For another, they don't execute in the same stack as the caller. Best I can tell, this "yield" keyword fires up a coroutine, which gets its own stack. Two violations of my expectations of a first-class function.
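To make the "invisible parameter" concrete, here is a small runnable sketch of the example above. The block_given? guard is my addition for illustration and is not part of the original method.

```ruby
# The block never appears in the parameter list; yield hands control to it.
def format_all_the_things
  # block_given? is an added guard (an assumption, not in the original code).
  return "no block given" unless block_given?

  one = yield "thing 1"
  two = yield "thing 2"
  [one, two].join(" & ")
end

with_block = format_all_the_things { |a| "format #{a} into a string" }
without_block = format_all_the_things

puts with_block     # format thing 1 into a string & format thing 2 into a string
puts without_block  # no block given
```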

Danger: return

Use the return keyword in a block, and you invite all kinds of trouble. See, return means it's time to knock a level off the stack. Since the block didn't get its own level in the stack, it has to return from the caller. But in this case, since it was started in a coroutine, there's nothing there to go back to -- LocalJumpError.
 > format_all_the_things { |a| return "yay #{a}" }
LocalJumpError: unexpected return
So, return is your enemy in blocks. Also in Procs.
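One way to reproduce the Proc case outside of irb is to call a Proc after the method that created it has already returned. This is only a sketch; make_formatter is a hypothetical helper I made up for the demonstration.

```ruby
# make_formatter is hypothetical: it builds a Proc containing `return`
# and hands it back to the caller.
def make_formatter
  Proc.new { |a| return "yay #{a}" }
end

formatter = make_formatter # make_formatter has already finished by now

outcome =
  begin
    # The Proc's `return` wants to return from make_formatter,
    # but that call is long gone, so Ruby raises.
    formatter.call("thing 1")
  rescue LocalJumpError => e
    "LocalJumpError: #{e.message}"
  end

puts outcome
```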
This is even stranger:
Wrap the whole thing in a lambda and the block doesn't raise LocalJumpError; it happily returns from the lambda when it hits the first return keyword inside the block. There's some magic going on here. This works for blocks, but not Procs. Can anyone explain this to me?

> lam = -> { format_all_the_things { |a| return "yay #{a}" } }
> lam.call
 => "yay thing 1"
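As far as I can tell, the same thing happens when the block sits inside an ordinary method: return leaves whatever method or lambda lexically encloses the block. A sketch, where first_formatted is a hypothetical name of mine:

```ruby
def format_all_the_things
  one = yield "thing 1"
  two = yield "thing 2"
  [one, two].join(" & ")
end

# Hypothetical wrapper: the block's `return` exits first_formatted itself,
# so the last line is never evaluated.
def first_formatted
  format_all_the_things { |a| return "yay #{a}" }
  "never reached"
end

# The same block wrapped in a lambda: `return` exits the lambda.
lam = -> { format_all_the_things { |a| return "yay #{a}" } }

method_result = first_formatted
lambda_result = lam.call

puts method_result  # yay thing 1
puts lambda_result  # yay thing 1
```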
Ruby has a third option: lambdas. These behave much more like the functions that I know and love.
lam = lambda { |a| return "yay #{a}" }
Lambdas do (seem to) get their own level on the stack, and when they return they return from themselves, not their caller.
Unfortunately this still doesn't work with yield.
> format_all_the_things &lam
LocalJumpError: unexpected return
Dangit! I thought I could trust lambdas, but I was wrong.
Turns out even lambdas behave sanely with return only if accessed using .call() instead of yield. This is the friendly way:
def format_all_the_things(&formatter)
   one = formatter.call("thing 1")
   two = formatter.call("thing 2")
   [one, two].join " & "
end

format_all_the_things &lam
 => "yay thing 1 & yay thing 2" 
This is the right way to pass and use functions as values in Ruby: lambdas and .call().
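A variation on the same idea, sketched under my own assumptions: the lambda is passed as an ordinary positional argument rather than with &, and it uses an early return. The redact name and the include?("1") check are made up for illustration.

```ruby
# Here the formatter is a plain argument, not a block (my variation).
def format_all_the_things(formatter)
  one = formatter.call("thing 1")
  two = formatter.call("thing 2")
  [one, two].join(" & ")
end

# `return` inside a lambda returns from the lambda only.
redact = lambda do |a|
  return "REDACTED" if a.include?("1")
  "yay #{a}"
end

result = format_all_the_things(redact)
puts result  # REDACTED & yay thing 2
```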

Next

"Just don't use return!" you may say. Ruby devs are wonderfully disciplined at avoiding the language's less respectable features.

But sometimes the code is clearer if you can exit the function as soon as you know what the return value is. Especially with one of those handy post-statement ifs:
return "REDACTED" if (a.contains_sensitive_information)
Oddly, there is a control statement that appears to do exactly what I want return to do, and that's next. No longer a loop-control statement, it appears in Ruby 2.0 to be everything we could wish return to be.
> format_all_the_things { |a| next "REDACTED" if (a.include?("1")); "yay #{a}" }
 => "REDACTED & yay thing 2" 
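The same trick works with the built-in iterators. A minimal sketch using map, with a made-up list of things:

```ruby
things = ["thing 1", "thing 2", "thing 3"]

# `next value` ends this invocation of the block and makes `value`
# the block's result; map carries on with the following element.
formatted = things.map do |a|
  next "REDACTED" if a.include?("1")
  "yay #{a}"
end

p formatted  # ["REDACTED", "yay thing 2", "yay thing 3"]
```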

More danger: Break

There are some circumstances where you can use break within a block to end Ruby's internal iteration early.
[4,5,-1,7].each { |n| break :invalid if n < 0 }
This works only with blocks, not with Procs (LocalJumpError) or lambdas (ignored). I think it's evil. Code that exercises flow control on its caller is not a function value.
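For comparison, here is a sketch of the break version next to one that leaves flow control with the caller. The method names are hypothetical. The last part shows what I understand break to do in a lambda that is called directly: it acts like return.

```ruby
# break ends the each call early and becomes its return value.
def validate(numbers)
  outcome = numbers.each { |n| break :invalid if n < 0 }
  outcome == :invalid ? :invalid : :ok
end

# Same answer without the block steering its caller.
def validate_without_break(numbers)
  numbers.any? { |n| n < 0 } ? :invalid : :ok
end

# In a lambda called with .call, break just leaves the lambda.
strict = lambda { |n| break :invalid if n < 0; n }

p validate([4, 5, -1, 7])                # :invalid
p validate([4, 5, 7])                    # :ok
p validate_without_break([4, 5, -1, 7])  # :invalid
p strict.call(-1)                        # :invalid
p strict.call(3)                         # 3
```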

Conclusion

If you're going to use a functional programming style in Ruby, use lambdas and invoke them with .call().

------
This is all in Ruby 2.0.

Here are my experiments in blocks, Procs, and lambdas with return, break, and next.

5 comments:

  1. On the topic of early return, that's really an imperative style, not functional. I'll freely admit passing functions around in Ruby can be less than ideal, but it's kinda not fair to ask for functional, and then try to embed an imperative instruction in the middle of it. In Scheme and Lisp, if you wanted an early return you would be forced to resort to call/cc or unwind-protect. I believe catch/throw still works in Ruby if you need that, see https://github.com/rubyspec/rubyspec/blob/master/language/throw_spec.rb

    The other option for passing a method is to make use of Symbol#to_proc as given in examples here: http://pragdave.pragprog.com/pragdave/2005/11/symbolto_proc.html. Not so useful if it's necessary to capture the scope in the closure, but it still has its uses. It does allow nesting a return in the method though, as a new scope is allocated for the method as normal, and then wrapped in a proc for the yield. Since a method is more apt to be imperative style, this works nicely.
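A rough sketch of the options mentioned in this comment. The redact method and the sample data are made up, and passing a named method with method(:redact) is a related technique added here for illustration, not something the comment spells out.

```ruby
names = ["thing 1", "thing 2"]

# Symbol#to_proc: &:upcase turns the symbol into a block calling upcase.
upcased = names.map(&:upcase)

# A regular method can use an early return; Object#method wraps it
# so that & can pass it where a block is expected.
def redact(a)
  return "REDACTED" if a.include?("1")
  "yay #{a}"
end
formatted = names.map(&method(:redact))

# catch/throw as an explicit early exit out of nested blocks.
first_negative = catch(:found) do
  [4, 5, -1, 7].each { |n| throw :found, n if n < 0 }
  nil # reached only when nothing was thrown
end

p upcased         # ["THING 1", "THING 2"]
p formatted       # ["REDACTED", "yay thing 2"]
p first_negative  # -1
```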

    Replies
    1. Oh, I love the syntax with the symbol. Almost as concise as Scala.

      You're right that early return isn't expression-oriented like good functional code. It bugs me that these keywords are available and yet not predictable.

  2. I absolutely love posts like this! Keep it up.

  3. Sorry for the thread necro, but I think I have some answers to your questions about why the return statement behaves strangely in procs and blocks.

    I always refer to this post when working with Ruby procs, blocks, and lambdas: http://eli.thegreenplace.net/2006/04/18/understanding-ruby-blocks-procs-and-methods/

    My understanding is that blocks have funny non-local return semantics because they're not really supposed to be considered as their own entity. In other words, when I write a function X() that maps over a list using a block, the code in that block is considered to be part of X(). So if the code in the block is part of X() and contains a return statement then it should return from X(), just like any other return statement inside X() would. The same reasoning applies to blocks used inside of lambdas.

    I believe that procs, like blocks, have non-local semantics because:

    def fun(list)
      list.each do |x|
        yield x
      end
    end

    is equivalent to:

    def fun(list, &block)
      list.each do |x|
        block.call(x)
      end
    end

    Except that the block has actually been converted to a proc in the second example. Since blocks support non-local returns, procs must support them as well.

    When you use the & operator (called to_proc) on a lambda, it becomes a proc and inherits the non-local return semantics. Hence making it behave strangely in your example with the yield.
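Proc#lambda? gives a quick way to see what a particular Ruby version does with a lambda passed through &. In the Proc documentation I am aware of, the lambda keeps its lambda-ness, which differs from the description above, so it is worth running on the version at hand. block_is_lambda? is a hypothetical helper written for this check.

```ruby
# Hypothetical helper: captures whatever was passed as the block
# and reports whether it is a lambda.
def block_is_lambda?(&block)
  block.lambda?
end

lam = lambda { |a| "yay #{a}" }

from_lambda = block_is_lambda?(&lam)               # a lambda passed via &
from_block = block_is_lambda? { |a| "yay #{a}" }   # an ordinary literal block

p from_lambda
p from_block
```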

    Your advice to only use lambda with call() for confusion-free functional programming is right on the money :)
