Tuesday, July 30, 2013

Abstractions in adulthood

In Surfaces & Essences[1], Hofstadter talks a lot about abstraction. Adults abstract more than children. For evidence, take the Tower of Hanoi game: move the stack to another peg, moving one disk at a time and never putting a bigger disk atop a smaller disk. It takes 7 moves for 3 disks.
When 8-yr-old children solve this puzzle, they often take many more moves, because they impose on themselves another restriction: move the disks one peg at a time. They never move the disk directly from the left peg to the right peg.

When thinking about journeys, adults abstract them to a state change. I was in St. Louis, now I'm in Kansas City. Kids don't separate that from the trip. Did we stop at McDonald's in Kingdom City or Cracker Barrel in Columbia? To a kid, there is no going to Kansas City without these details.



Data takes journeys through our code. We can concern ourselves with the details of getting from here to there -- instantiate the new structure, populate it in a for-loop -- or we can declare where we want to end up and let the internals handle the rest -- calling filter or map on a sequence.

Some people find the details of the journey to be the interesting part of programming. Run a tiny bit faster, or in half the memory, or memoize certain bits. I'm glad these people exist, so they can write libraries and I can use the libraries and ignore all this most of the time.

Stating the end result, and not the means of getting there, is what Bret Victor calls "goal-based programming." [2] I call it declarative style. This is a tenant of functional programming.

But there's a catch! Sometimes the kids are right, and the journey is important.
Take that trip from STL to KC. If I abstract the trip to a simple location change, me.inKC(), am I missing something important?
Maybe. If I took the bus or the train, then I can take the same trip over and over, and it doesn't make a difference to anybody. No matter how many times I take the bus to KC, I'm in KC. The process is idempotent: calling it once is the same as calling it over and over. But if I came by car, and got a speeding ticket, there's a side effect! Do that four times and I'm going to jail in Boonville.
Or what if I implemented the journey as "drive 200 miles west"? Then the outcome would depend on my current location. me.inKC().inKC() would return me in Hays, KS. In this case, the journey is not referentially transparent. The output of the function depends on something other than its input, so the result isn't the same every time it's called. (yes, I'm stretching the metaphor in questionable ways. Be amused.)

Therefore, the journey -- the "how" -- can be implemented in a way that it matters, or in a way that it doesn't. If an algorithm writes to the log or the database, such that how many times it's called affects the world, then we have to care about it. If the code reads from the database or other global state, then it won't do the same thing every time and we have to care about it.

We choose how to implement our journeys. We can keep them well-behaved: we can take public transit and specify the destination. That way we can leave the details behind the scenes; my travel agent can book a bus or a train, whatever's cheaper that day, because all that matters is at the end of the journey I'm in KC. This leaves both me (the writer of code) and future-me (the reader of code) with more mental space to spend on why I'm in KC, and what to do once I'm there.

As @runarorama once said: "Write it functionally. Like an adult."

----
[1] Surfaces & Essences, Analogy as the Fuel and Fire of Thinking. I don't recommend the book; it's incredibly slow. Follow me on twitter instead.
[2] Bret Victor, The Future of Programming.

Tuesday, July 23, 2013

Aqueductron - toying with dataflow in Ruby


I love playing with Ruby because it lets me express concepts clearly[1]. In my aqueductron gem, two concepts are expressed. It's about processing data, and about code modifying code, all without modifying anything[2].

The metaphor of Aqueductron is an aqueduct. Data is the water, taking the form of droplets flowing through the ducts. Each piece of duct might be a filter, or a map. At the end, there's a collector of some kind.

The interesting parts of dataflow are where the metaphor breaks down. For instance, take a duct with a split in it. With real water, each drop will take exactly one path. With data, each drop can go down all paths, or none.


In the real world, an aqueduct is constructed before the water ever flows, and the aqueduct stays the same forever. In aqueductron, a duct piece can change with every drop. The split can add paths as it encounters new information.

  

When the delta between drops is more interesting than the drops themselves, pieces can alter themselves based on what comes through. For added challenge and sanity, aqueductron does this without mutating state.

Since the ducts can change as data flows, it is useful to see what the aqueduct looks like in between drops. The Ruby REPL is handy, and aqueductron is equipped with ASCII art.


Create new paths

For the Lambda Lounge code shootout recently, we implemented some problems from Rosalind.info. Here's the simplest one, generalized from "count nucleotides" to "count all the characters that you see." My code is explained in detail here.

Construct a pipe that expands a string into its characters, then creates a path for each unique letter. It is empty at first.

2.0.0> a = Duct.new.expand(->(s) {s.each_char}).
                    partition(->(a) {a.upcase}
                              ->(a) {Duct.new.count})
 => Duct:
--- / 
 ~ <  +?
--- \  

Send it one string, and the duct creates new paths.

2.0.0> b = a.drip("ACtT")
 => Duct:
      ---\
        # (1)
      ---/
--- / ---\
 ~ <   C  # (1)
--- \ ---/
      ---\
       T  # (2)
      ---/
      +? 

Modify existing paths

Another simple Rosalind problem describes rabbit populations as a modified fibonacci sequence. There's a multiplier (k) applied to the penultimate number as it's added to the last one to generate the next Fibonacci number. In aqueductron, the duct can learn from the data coming through, changing as the data flows in, each one generation's rabbit population. When it's time to make predictions, the pipe uses what it learned. In this case, the pieces are decorated with a description of the function inside them. Code details are here.

Build an empty pipe:

2.0.0> rabbits = Duct.new.custom(empty_fib_function).last
 => Duct:
---\
 ~  last
---/ 

Then drip information about the generations through, the duct learns.

2.0.0> rabbits.drip(2)
 => Duct:
-----\
 2..  last (2)
-----/ 
2.0.0> rabbits.drip(2).drip(5)
 => Duct:
-------\
 2,5..  last (5)
-------/ 
2.0.0> rabbits.drip(2).drip(5).drip(9)
 => Duct:
------------------------\
 ..5,9.. starting k~2.0  last (9)
------------------------/ 

When asked to predict future generations, the duct uses what it has learned.

2.0.0> rabbits.drip(2).drip(5).drip(9).drip(:unknown)
 => Duct:
----------------\
 ..9,19.. k=2.0  last (19)
----------------/ 

Learning dataflow

The part where the flow changes with the data fascinates me. That it changes itself without mutating state fascinates me even more. These are the concepts explored in aqueductron. Look for more aquedutron on this blog, past and future.

--------------
[1] where my target audience is devs (like me) who are more comfortable with objects than Lisps.
[2] it's Ruby, so forcing immutability is a lot of work. Since I'm going for clarity, aqueductron is immutable by choice, not by compiler restriction.