Monday, October 6, 2014

A monadically built generator

Today, I wanted to write a post about code that sorts a vector of maps. But I can't write that without a test, now can I? And not just any test -- a property-based test! I want to be sure my function works all the time, for all valid input. Also, I don't want to come up with representative examples - that's too much work.[1]

The function under test is a custom-sort function, which accepts a bunch of rows (represented as a sequence of hashmaps) and a sequence of instructions: "sort by the value of A, descending; then the value of B, ascending."

To test with all valid input, I must write code to generate all valid input. I need a vector of maps. The maps should have all the same keys. Some of those keys will be sort instructions. The values in the map can be anything Comparable: strings and ints for instance. Each instructions also includes a direction, ascending or descending. That's a lot to put together.

For property-based (or "generative") tests in Clojure, I'll use test.check. To test a property, I must write a generator that produces input. How do I even start to create a generator this complicated?

Bit by bit! Start with the keys for the maps. Test.check has a generator for them:

(require '[clojure.test.check.generators :as gen])
gen/keyword ;; any valid clojure keyword.

The zeroth secret: I dug around in the source to find useful generators. If it seems like I'm pulling these out of my butt, well, this is what I ate.

Next I need multiple keywords, so add in gen/vector. It's a function that takes a generator as an argument, and uses that repeatedly to create each element, producing a vector.

(gen/vector gen/keyword) ;; between 0 and some keywords

The first secret: generator composition. Put two together, get a better one out.

Since I want a set of keys, not a vector, it's time for gen/fmap ("functor map," as opposed to hashmap). That takes a function to run on each produced value before giving it to me, and its source generator.

(gen/fmap set (gen/vector gen/keyword)) ;; set of 0 or more keywords

It wouldn't do for that set to be empty; my function requires at least 1 instruction, which means at least one keyword. gen/such-that narrows the possible output of the generator. It takes a predicate and a source generator:

(gen/such-that seq (gen/fmap set (gen/vector gen/keyword)))

If you're not a seasoned Clojure dev: seq is idiomatic for "not empty." Historical reasons.

This is enough to give me a set of keys, but it's confusing, so I'm going to pull some of it out into a named function.

(defn non-empty-set [elem-g
  (gen/such-that seq (gen/fmap set (gen/vector elem-g))))

Here's the generator so far:
(def maps-and-sort-instructions
  (let [set-of-keys  (non-empty-set gen/keyword)]
     set-of-keys)

See what it gives me:
=> (gen/sample maps-and-sort-instructions
   ;; sample makes the generator produce ten values
(#{:Os} #{:? :f_Q_:_kpY:+:518} #{:? :-kZ:9_:_?Ok:JS?F} ....)

Ew. Nasty keywords I never would have come up with. But hey, they're sets and they're not empty.

To get maps, I need gen/hash-map. It wants keys, plus generators that produce values; from these it produces maps with a consistent structure, just like I want. It looks like:

(gen/hash-map :one-key gen-of-value :two-key gen-of-this-other-value ...)

The value for each key could be anything Comparable really; I'll settle for strings or ints. Later I can add more to this list. There's gen/string and gen/int for those; I can choose among them with gen/elements.

(gen/elements [gen/string gen/int]) ;; one of the values in the input vector

I have now created a generator of generators. gen/elements is good for selecting randomly among a known sequence of values. I need a quantity of these value generators, the same quantity as I have keys.

(gen/vector (gen/elements [gen/string gen/int]) (count #??#)) 
  ;; gen/vector accepts an optional length

Well, crap. Now I have a dependency on what I already generated. Test.check alone doesn't make this easy - you can do it, with some ugly use of gen/bind. Monads to the rescue! With a little plumbing, I can bring in algo.monad, and make the value produced from each generator available to the ones declared after it.

The second secret: monads let generators depend on each others' output.

(require '[clojure.algo.monads :as m])
(m/defmonad gen-m
    [m-bind gen/bind
     m-result gen/return])

(def maps-and-sort-instructions
 (m/domonad gen-m
   [set-of-keys (non-empty-set gen/keyword)
    set-of-value-gens (gen/vector  
                       (gen/elements [gen/string gen/int]) 
                       (count set-of-keys))]
    [set-of-keys, set-of-value-gens])

I don't recommend sampling this; generators don't have nice toStrings. It's time to put those keys and value-generators together, and pass them to gen/hash-map:

(apply gen/hash-map (mapcat vector set-of-keys set-of-value-generators))
  ;; intersperse keys and value-gens, then pass them to gen/hash-map

That's a generator of maps. We need 0 or more maps, so here comes gen/vector again:

(def maps-and-sort-instructions
 (m/domonad gen-m
  [set-of-keys (non-empty-set gen/keyword)
   set-of-value-gens (gen/vector  
                      (gen/elements [gen/string gen/int]) 
                      (count set-of-keys))
   some-maps (gen/vector 
              (apply gen/hash-map 
               (mapcat vector set-of-keys 
                              set-of-value-gens)))]
  some-maps))

This is worth sampling a few times:
=> (gen/sample maps-and-sort-instructions 3) ;; produce 3 values
([] [] [{:!6!:t4 "à$", :*B 2, :K0:R*Hw:g:4!? ""}])

It randomly produced two empty vectors first, which is fine. It's valid to sort 0 maps. If I run that sample more, I'll see vectors with more maps in them.
Halfway there! Now for the instructions. Start with a subset of the map keys - there's no subset generator, but I can build one using the non-empty-set defined earlier. I want a non-empty-set of elements from my set-of-keys.

(non-empty-set (gen/elements set-of-keys)) 
  ;; some-keys: 1 or more keys. 

To pair these instruction keys with directions, I'll generate the right number of directions. Generating a direction means choosing between :ascending or :descending. This is a smaller generator that I can define outside:

(def mygen-direction-of-sort 
      (gen/elements [:ascending :descending])) 

and then to get a specific-length vector of these:

(gen/vector mygen-direction-of-sort (count some-keys)) 
   ;; some-directions

I'll put the instruction keys with the directions together after the generation is all complete, and assemble the output:

(def maps-and-sort-instructions
 (m/domonad gen-m
  [set-of-keys (non-empty-set gen/keyword)
   set-of-value-gens (gen/vector  
                      (gen/elements [gen/string gen/int]) 
                      (count set-of-keys))
   some-maps (gen/vector 
              (apply gen/hash-map 
               (mapcat vector set-of-keys 
                              set-of-value-gens)))
   some-keys (non-empty-set (gen/elements set-of-keys)) 
   some-directions (gen/vector mygen-direction-of-sort 
                               (count some-keys))]
        
   (let [instructions (map vector some-keys some-directions)] 
                           ;; pair keys with directions
    [some-maps instructions]))) ;; return maps and instructions

There it is, one giant generator, built of at least 11 small ones. That's a lot of Clojure code... way too much to trust without a test. I need a property for my generator!

What is important about the output of this generator? Every instruction is a pair, every direction is either :ascending or :descending, and every key in the sort instructions is present in every map. I could also specify that the values for each key are all Comparable with each other, but I haven't yet. This is close enough:

(def sort-instructions-are-compatible-with-maps
  (prop/for-all
    [[rows instructions] maps-and-sort-instructions]
    (every? identity (for [[k direction] instructions
                          ;; break instructions into parts
              (and (#{:ascending :descending} direction
                    ;; Clojure looks so weird
                   (every? k rows)))))) 
                          ;; will be false if the key is absent

(require '[clojure.test.check :as tc])
(tc/quick-check 50 sort-instructions-are-compatible-with-maps)
;; {:result true, :num-tests 50, :seed 1412659276160}

Hurray, my property is true. My generator works. Now I can write a test... then maybe the code... then someday the post that I wanted to write tonight.

You might roll your eyes at me for going to these lengths to test code that's only going to be used in a blog post. But I want code that works, not just two or three times but all the time. (Write enough concurrent code, and you notice the a difference between "I saw it work" and "it works.") Since I'm working in Clojure, I can't lean on the compiler to test the skeleton of my program. It's all on me. And "I saw it work once in the REPL" isn't satisfying.

Blake Meike points out on Twitter, "Nearly the entire Internet revolution... is based on works-so-far code." So true! It's that way at work. Maybe my free-time coding is the only coding I get to do right. Maybe that's why open-source software has the potential to be more correct than commercial software. Maybe it's the late-night principles of a few hungry-for-correctness programmers that move technology forward.

Nah.

But it does feel good to write a solid property-based test.

-------------
[1] Coming up with examples is "work," as opposed to "programming."

Code for this post: https://github.com/jessitron/sortificate/blob/generator-post/test/sortificate/core_test.clj

1 comment: