Monday, August 3, 2015

Data-in, Data-out

In functional programming, we try to keep our functions data-in, data-out: they take some data as parameters, return some data as output, and that's it. Nothing else. No dialog boxes pop up, no environment variables are read, no database rows are written, no files are accessed. No global state is read or written. The output of the function is entirely determined by the values of its input. The function is isolated from the world around it.
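To make the contrast concrete, here is a minimal sketch in Python. The function names and the `SALUTATION` environment variable are made up for illustration; the point is only that the first function has a hidden input and the second does not.

```python
import os


# Not data-in, data-out: the result depends on an environment variable,
# a hidden input that the signature never mentions.
def greeting_from_env(name: str) -> str:
    salutation = os.environ.get("SALUTATION", "Hello")  # hypothetical variable
    return f"{salutation}, {name}!"


# Data-in, data-out: everything the function needs arrives as a parameter,
# and the only thing it does is return a value.
def greeting(salutation: str, name: str) -> str:
    return f"{salutation}, {name}!"
```

Calling `greeting("Hello", "Ada")` gives the same answer on every machine, while `greeting_from_env("Ada")` can change whenever the environment does.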

A data-in, data-out function is highly testable, without complicated mocking. The test provides input, looks at the output, and that's all that it needs for a complete test.[1]
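As a rough illustration of what such a test looks like, here is a sketch with a made-up pricing function (`total_cents` and its parameters are assumptions, not from any real codebase). The test needs no mocks, no fixtures, and no setup beyond the arguments themselves.

```python
def total_cents(prices_cents: list[int], tax_percent: int) -> int:
    """Sum the prices and add tax, working in whole cents (rounding down)."""
    subtotal = sum(prices_cents)
    return subtotal * (100 + tax_percent) // 100


def test_total_cents() -> None:
    # Provide input, look at the output; that is the whole test.
    assert total_cents([1000, 500], 10) == 1650
    assert total_cents([], 10) == 0


test_total_cents()
```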

A data-in, data-out function is pretty well documented by its declaration: its input types specify everything the function needs to work, and its output type specifies the entire result of calling it. Give the function a good name that describes its purpose, and you're probably good for docs.

It's faster to comprehend a data-in, data-out function because you know a lot of things it won't do. It won't go rooting around in a database. It won't interrupt the user's flow. It won't need any other program to be running on your computer. It won't write to a file[2]. All these are things I don't have to think about when calling a data-in, data-out function. That leaves more of my brain for what I care about.

If all of our code were data-in, data-out, then our programs would be useless. They wouldn't do anything observable. However, if 85% of our code is data-in, data-out, with some input-gathering and some output-writing and a bit of UI-updating -- then our program can be super useful, and most of it still maximally comprehensible. Restricting our code in this way when we're writing it provides more clarity when we're reading it and freedom when we're refactoring it.

Think about data-in, data-out while you're coding; make any dependencies on the environment and effects on the outside world explicit; and write most of your functions as transformations of data. This gets you many of the benefits of functional programming, no matter what language you write your code in.
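One way this split might look in practice is sketched below in Python. The names `summarize` and `main` are invented for the example: `summarize` is the data transformation, and `main` is the thin edge that does the reading and writing, with its streams passed in explicitly.

```python
import io
from typing import Iterable, TextIO


def summarize(lines: Iterable[str]) -> str:
    """Data-in, data-out: count the non-blank lines and the words in them."""
    non_blank = [line for line in lines if line.strip()]
    word_count = sum(len(line.split()) for line in non_blank)
    return f"{len(non_blank)} lines, {word_count} words"


def main(source: TextIO, sink: TextIO) -> None:
    """The effectful edge: gather input, call the transformation, write output."""
    lines = source.read().splitlines()
    sink.write(summarize(lines) + "\n")


# Demonstration with in-memory streams standing in for real files.
output = io.StringIO()
main(io.StringIO("first line\n\nsecond line here\n"), output)
print(output.getvalue(), end="")  # prints "2 lines, 5 words"
```

In a real program `main` could be called with `sys.stdin` and `sys.stdout`; the part worth testing and reasoning about lives in `summarize`, which never touches either.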

[1] Because the output is fixed for a given input, it would be legit to substitute the return value for the function-call-with-that-input at any point. Like, one could cache the return values if that helped with performance, because it's impossible for them to be different next time, and it's impossible to notice that the function wasn't called because calling it has no externally-observable effect. Historically, this property is called referential transparency.
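The caching idea in footnote [1] can be sketched with Python's standard `functools.lru_cache`. The function `slow_square` is a made-up stand-in for any expensive data-in, data-out computation; caching it is safe only because its result depends on nothing but its argument.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def slow_square(n: int) -> int:
    """Data-in, data-out: the result depends only on n."""
    return sum(n for _ in range(n))  # deliberately roundabout n * n


first = slow_square(1000)
second = slow_square(1000)  # served from the cache; the body does not run again
info = slow_square.cache_info()
print(first == second, info.hits, info.misses)  # prints "True 1 1"
```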

[2] We often make an exception for logging, especially logging that gets turned off in production.


  1. Hi Jessica, I was present at your talk at Ordina last July, where this was one of your items. It's pretty cool how you describe maximizing the cohesion of a function while still keeping it simple :).

    By the way, I talked to you about exception handling in Java during the break of your talk and created a post about it on my blog: I hope you like it :).

    I hope to visit more of your talks when I'm in the neighbourhood.

    1. This is great! Thank you for posting it! And thank you for the good discussion.

  2. In the context of OOP the Law of Demeter may be useful.

  3. I like your diagrams. I gave a talk at my college (I'm an undergrad student) about this very topic. I think it might be one of the best ways, if not the best way, to introduce functional programming.