Monday, February 20, 2012

F# Snippet: sequence partitioning

This F# Snippet illustrates a way to divide a sequence into a sequence of shorter sequences of a fixed length. The snippet uses list comprehensions.

I don't like it, because it involves flipping through the original sequence over and over and over.

With recursion, we can do all the construction in one pass through the original list. (Yes, I have narrowed the scope from Seq to List. The snippet requires a finite-length sequence so this is not a loss.)

here's a simple implementation using a tail-recursive method to divide the list into groups of a fixed size.
// tail recursion
let rec build groupSize items outputSoFar currentGroup currentCount =
match items with
| [] -> outputSoFar @ [currentGroup]
| head :: tail when currentCount = groupSize -> build groupSize tail (outputSoFar@[currentGroup]) [head] 1
| head :: tail -> build groupSize tail outputSoFar (currentGroup@[head]) (currentCount + 1)

let buildGroups size items = build size items [] [] 0

buildGroups 2 [1..11] ;;

val it : int list list = [[1; 2]; [3; 4]; [5; 6]; [7; 8]; [9; 10]; [11]]

Of course, this is really a fold operation. Using List.fold ensures we really are tail-recursive.

// same thing, using List.fold
let buildWithFoldFunction groupsize (outputSoFar, currentGroup,currentCount) item =
if (currentCount = groupsize) then
(outputSoFar@[currentGroup]), [item], 1
else
outputSoFar, (currentGroup@[item]), (currentCount + 1)

let buildWithFold groupSize items =
let (output,lastGroup,_) = List.fold (buildWithFoldFunction groupSize) ([],[],0) items
output@[lastGroup]

buildWithFold 2 [1..11]


... but this isn't really any better than the list-comprehension Snippet, because all those append operators (the @ signs) are causing copies of the original list to be made over and over.

Really, we need to start from the back of the input list. That way we can build the output list-of-lists from the back, using the cons operator, which is efficient. We can use foldBack:

// Now, using cons and foldBack
let buildWithFoldBack groupSize items =
let buildWithFoldBackFunction groupsize item (outputSoFar, currentGroup,currentCount) =
if (currentCount = groupsize) then
(currentGroup :: outputSoFar), [item], 1
else
outputSoFar, (item :: currentGroup), (currentCount + 1)
let lengthOfLastGroup = List.length items % groupSize
let imaginaryElementsInLastGroup = if lengthOfLastGroup = 0 then 0 else groupSize - lengthOfLastGroup
let (output,_,_) = List.foldBack (buildWithFoldBackFunction groupSize) items ([],[],imaginaryElementsInLastGroup) ;
output

let thingWithFoldBack = buildWithFoldBack 3 [1..11];;

val thingWithFoldBack : int list list = [[4; 5; 6]; [7; 8; 9]; [10; 11]]

The interesting bit there is how long that last list should be (the one with the leftovers). If the input list does not divide evenly by the group size, then we change the initial value of that accumulator piece -- we pretend that there are already a few items in the last list, so the function that builds output will make that one short.

The fact that I feel clever for doing it that way is a warning sign: clever code is hard-to-read code. However, to someone used to working in a functional style, this would probably make perfect sense.

*For math people: I would prefer to do a second mod operation to determine the imaginaryElementsInLastGroup, but it seemed even more obfuscated: let imaginaryElementsInLastGroup = (groupSize - (items.Length % groupSize)) % groupSize

That's kinda fun. It's a little illustration of how functional programming can be efficient and short if we think differently.

No comments:

Post a Comment