Tuesday, September 11, 2012

Iterators in Scala: glory and danger

In the beginning, there were Iterables. And in the old world of Java, they were good. We could perform the advanced for loop on them.

for (Thing item : listOfThings) {
   // do stuff to each item

And we could perform it again and again.

for (Thing item : listOfThings) {
   // do more stuff to each item

but what is this Iterable? What requirement must be met by the protagonist in the advanced for loop?

interface Iterable[T] {
    Iterator[T] iterator()

What is this inner concept, this offspring of the Iterable? Is this Iterator of the same species, and can we impose upon it in the same manner?

In Java, clearly no.

for (Thing item: listOfThings.iterator()) { 
             // compile error: foreach not applicable

Yet, an Iterator can be useful to us. It can provide the contents of a file - scala.io.Source in Scala  - or draw from any data source local or remote.

In Scala, more possibilities are open to us. We can use the Iterator in a for-expression. We can map over it, filter it, and many other operations that apply equally to Iterable.

val lineIterator = Source.fromFile("build.sbt").getLines

for( line <- lineIterator) yield {
   // something interesting about each line

Yet! Beware! Scala tempts us to use Iterators interchangeably with Iterables. There is danger here. We are spoiled by traversing Iterables such as lists.

val usefulThings = listOfThings.filter(isUseful(_)) // great

val blueThings = listOfThings.filter(isBlue(_)) // great

Most of the collections we use in Scala are immutable and stateless. We can traverse them all day; they continue to supply iterator after iterator upon request. Lists and Sequences implement the trait Traversable.
With iterators, beware! Once accessed, they are forever altered. Iterators are inherently stateful. Every call to next() yields a different result. They implement the trait TraversableOnce.

val lineIterator = Source.fromFile("build.sbt").getLines

val usefulLines = lineIterator.filter(isUseful(_)) // danger!

We now hold two references to the lineIterator. One is contained inside usefulLines, itself an iterator that holds a reference to lineIterator. Any access to usefulLines - even calling toString, which prints whether it is empty! - will change the state of lineIterator. Once you apply any operation to an Iterator, its internal state is unpredictable. Throw away its reference. The Iterator is TraversableOnce.

The immutability of Lists and Maps and Vectors can spoil us. We must remember to track a val which identifies a mutable object such as an Iterator.
With such danger lurking, why should we use an Iterator in Scala?

Sometimes state is needed. Since Iterators are inherently stateful, they are a reasonable place to encapsulate state.
One day, not so long ago, I wished to access the Twitter search API. The tweets were to be retrieved repeatedly, each time excluding tweets already retrieved. The "since" parameter to the Twitter API accomplishes this passed the ID of the last tweet already retrieved. This ID is state. This state can be encapsulated in an iterator.

class TwitterIterator extends Iterator[Tweet] {
      var lastTweetReceived = "0" // mutable state
      def hasNext = true // the twitter stream has no end

      def next() = {
        val tweet = retrieveNextTweet(lastTweetReceived)
        lastTweetReceived = tweet.id

Two other problems which barred my path in the last fortnight yielded to this same solution: encapsulate the state in a custom iterator. Thus, Iterators are an idiom of usefulness. And of caution! In Scala, Iterators appear to be interchangeable with Iterables, but they most decidedly are not. Once an iterator is given to any function or expression, it shall be dead to you. Access it not except through the result of the single expression in which it appears. Store it not in a val, lest you or your compatriots attempt to access it and render it forever into an unknowable state.

1 comment:

  1. A very nicely explained article and, thanks to the language of foreboding and incantation, one that should stick in my frontal cortex.