For every piece of code, there is another piece of code that answers the question, "How do I know that code works?" Sometimes that's more work than the code itself -- but there is hope.
The other day, I made a program to copy some code from one project to another - two file copies, with one small change to the namespace declaration at the top of each file. Sounds trivial, right?
I know better: there are going to be a lot of subtleties. And this isn't throwaway code. I need good, repeatable tests.
Where do I start? Hmm, I'll need a destination directory with the expected structure, an empty source directory, files with the namespace at the top... oh, and cleanup code. All of these are harder than I expected, and the one test I did manage to write is specific to my filesystem. Writing code to verify code is so much harder than just writing the code!
Testing side-effecting code is hard. This is well established. It's also convoluted, complex, and generally brittle.
The test process looks like this:
Before the test, create the input AND go to the filesystem, prepare the input and the spot where output is expected.
After the test, check the output AND go to the filesystem, read the files from there and check their contents.
Everything is intertwined: the prep, the implementation of the code under test, and the checks at the end. It's specific to my filesystem. And it's slow. No way can I run more than a few of these each build.
The usual solution to this is to mock the filesystem. Use a ports-and-adapters approach. In OO you might use dependency injection; in FP you'd pass functions in for "how to read" and "how to write." This isolates our code from the real filesystem. Tests are faster and less tightly coupled to the environment. The test process looks like this:
Before the test, create the input AND prepare the mock read results and initialize the mock for write captures.
After the test, check the output AND interrogate the mock for write captures.
It's an improvement, but we can do better. The test is still convoluted. Elaborate mocking frameworks might make it cleaner, but conceptually, all those ties are still there, with the stateful how-to-write that we pass in and then ask later, "What were your experiences during this test?"
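To make the "pass functions in" variant concrete, here is a minimal sketch in Python (the original program is Clojure; this is only an illustration). The function name `copy_with_namespace`, its parameters, and the assumption that the namespace declaration sits on the first line as `(ns name)` are all hypothetical. Notice how the test has to set up a stateful fake writer and then interrogate it afterward.

```python
from typing import Callable, Dict


def copy_with_namespace(read: Callable[[str], str],
                        write: Callable[[str, str], None],
                        source_path: str,
                        dest_path: str,
                        old_ns: str,
                        new_ns: str) -> None:
    """Copy one file, renaming the namespace on its first line (assumed layout)."""
    text = read(source_path)
    first, sep, rest = text.partition("\n")
    write(dest_path, first.replace(old_ns, new_ns, 1) + sep + rest)


# Test doubles: one dict answers reads, another captures writes.
fake_files: Dict[str, str] = {"src/util.clj": "(ns old.util)\n(defn f [] 1)\n"}
captured: Dict[str, str] = {}


def fake_read(path: str) -> str:
    return fake_files[path]


def fake_write(path: str, text: str) -> None:
    captured[path] = text


copy_with_namespace(fake_read, fake_write,
                    "src/util.clj", "dest/util.clj",
                    "old.util", "new.util")
```

After the call, the test asks `captured` what happened. That after-the-fact interrogation is the remaining tie between the test and the implementation.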
If I move the side effects out of the code under test -- gather all input beforehand, perform all writes afterward -- then the decisionmaking part of my program becomes easier and clearer to test. It can look like this (code):
The input includes everything my decisions need to know from the filesystem: the destination directory and list of all files in it; the source directory and list plus contents of all files in it.
The output includes a list of instructions, for the side effects the code would like to perform. This is super easy to check at the end of a test.
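As a rough illustration of that shape, here is a Python sketch (not the Clojure from the repository). The `Snapshot` and `Write` types, the `decide` function, and the first-line `(ns name)` layout are assumptions made for the example. The input is plain data describing the filesystem; the output is a list of instructions that the test compares directly.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Everything the decisions need to know, gathered up front."""
    dest_dir: str
    dest_files: Tuple[str, ...]
    source_dir: str
    source_files: Dict[str, str]  # filename -> contents


@dataclass(frozen=True)
class Write:
    """An instruction: the write the program would like to perform."""
    path: str
    contents: str


def decide(snapshot: Snapshot, old_ns: str, new_ns: str) -> List[Write]:
    """Pure function: no filesystem access, just data in and data out."""
    instructions = []
    for name in sorted(snapshot.source_files):
        text = snapshot.source_files[name]
        first, sep, rest = text.partition("\n")
        renamed = first.replace(old_ns, new_ns, 1) + sep + rest
        instructions.append(Write(f"{snapshot.dest_dir}/{name}", renamed))
    return instructions


example = Snapshot(
    dest_dir="dest",
    dest_files=(),
    source_dir="src",
    source_files={"b.clj": "(ns old.b)\n", "a.clj": "(ns old.a)\n(def x 1)\n"},
)
result = decide(example, "old", "new")
```

Checking `result` is a plain equality comparison against the expected list of `Write` values, with no mock to interrogate.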
The real main method looks different in this design. It has to gather all the input up front, then call the key program logic, then carry out the instructions. In order to keep all the decisionmaking, parsing, etc. in the "code under test" block, I keep the interface to that function as close as possible to that of the built-in filesystem-interaction commands. It isn't the cleanest interface, but I want all the parts outside "code-under-test" to be trivial.
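A sketch of that three-step main, again in Python rather than the original Clojure, and assuming hypothetical names (`gather`, `decide`, `carry_out`) and a tuple-shaped instruction. The point is that `gather` and `carry_out` are thin wrappers over `pathlib` calls, while everything interesting lives in `decide`.

```python
from pathlib import Path
from typing import Dict, List, Tuple

Instruction = Tuple[str, str, str]  # (kind, path, contents); only "write" here


def gather(source_dir: str, dest_dir: str) -> Dict:
    """All the 'what does the world look like?' side effects, up front."""
    src, dst = Path(source_dir), Path(dest_dir)
    return {
        "source_dir": source_dir,
        "source_files": {p.name: p.read_text()
                         for p in sorted(src.iterdir()) if p.is_file()},
        "dest_dir": dest_dir,
        "dest_files": sorted(p.name for p in dst.iterdir()),
    }


def decide(world: Dict, old_ns: str, new_ns: str) -> List[Instruction]:
    """The code under test: pure decisions over the gathered data."""
    instructions = []
    for name, text in sorted(world["source_files"].items()):
        first, sep, rest = text.partition("\n")
        renamed = first.replace(old_ns, new_ns, 1) + sep + rest
        instructions.append(("write", f"{world['dest_dir']}/{name}", renamed))
    return instructions


def carry_out(instructions: List[Instruction]) -> None:
    """All the 'change the world!' side effects, at the end."""
    for kind, path, contents in instructions:
        if kind == "write":
            Path(path).write_text(contents)


def main(source_dir: str, dest_dir: str, old_ns: str, new_ns: str) -> None:
    world = gather(source_dir, dest_dir)
    carry_out(decide(world, old_ns, new_ns))
```

Only `main` touches a real filesystem, so one slow end-to-end test of `main` is enough, and every other test can call `decide` with a hand-built dictionary.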
With this, I answer "How do I know this code works?" in two components. For the real-filesystem interactions, the documentation plus some playing around in the REPL tell me how they work. For the decisioning part of the program, my tests tell me it works. Manual tests for the hard-to-test bits, lots of tests for the hard-to-get-right bits. Reasoning glues them together.
Of course, I'm keeping my one umbrella test that interacts with the real filesystem. The decisioning part of the program is covered by poncho tests. With an interface like this, I can write property-based tests for my program, asserting things like "I never try to write a file in a directory that doesn't exist" and "the output filename always matches the input filename."
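Those two properties could be checked along these lines. This is a hand-rolled sketch in Python using only the standard library's `random` module; a real property-based library (test.check in Clojure, Hypothesis in Python) would add smarter generation and shrinking. The `decide` function, the dictionary-shaped input, and the instruction tuples are hypothetical stand-ins for the real decisioning module.

```python
import posixpath
import random
import string


def decide(world, old_ns, new_ns):
    """Hypothetical pure decision function: data in, instructions out."""
    instructions = []
    for name, text in sorted(world["source_files"].items()):
        first, sep, rest = text.partition("\n")
        renamed = first.replace(old_ns, new_ns, 1) + sep + rest
        instructions.append(("write", f"{world['dest_dir']}/{name}", renamed))
    return instructions


def random_world(rng):
    """Generate an input with zero to five randomly named source files."""
    stems = {"".join(rng.choices(string.ascii_lowercase, k=rng.randint(1, 8)))
             for _ in range(rng.randint(0, 5))}
    return {
        "source_dir": "/src",
        "dest_dir": "/dst",
        "dest_files": [],
        "source_files": {f"{s}.clj": f"(ns old.{s})\n" for s in stems},
    }


def check_properties(trials=200, seed=0):
    rng = random.Random(seed)  # seeded so failures are reproducible
    for _ in range(trials):
        world = random_world(rng)
        instructions = decide(world, "old", "new")
        # Property 1: every write targets the known destination directory.
        assert all(posixpath.dirname(path) == world["dest_dir"]
                   for _, path, _ in instructions)
        # Property 2: output filenames match input filenames exactly.
        written = sorted(posixpath.basename(path) for _, path, _ in instructions)
        assert written == sorted(world["source_files"])
    return trials
```

Because `decide` never touches the disk, running hundreds of generated cases per build is cheap.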
As a major bonus, error handling becomes more modular. If, on trying to copy the second file, it isn't found or isn't valid, the second write instruction is replaced with an "error" instruction. Before any instructions are carried out, the program checks for "error" anywhere in the list (code). If found, stop before carrying out any real action. This way, validations aren't separated in code from the operations they apply to, and yet all validations happen before operations are carried out. Real stuff happens only when all instructions are possible (as far as the program can tell). It's close to atomic.
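Here is one way that check-before-acting step might look, sketched in Python with hypothetical names and instruction shapes (the original is Clojure and may differ). Validation happens inside `decide`, next to the operation it guards, yet `carry_out` refuses to perform any write if any instruction in the list is an error.

```python
from typing import Callable, Dict, List, Tuple

Instruction = Tuple[str, str, str]  # ("write", path, contents) or ("error", name, reason)


def decide(world: Dict, wanted: List[str], old_ns: str, new_ns: str) -> List[Instruction]:
    instructions = []
    for name in wanted:
        text = world["source_files"].get(name)
        if text is None:
            # The write instruction is replaced by an error instruction.
            instructions.append(("error", name, "source file not found"))
        elif not text.startswith(f"(ns {old_ns}"):
            instructions.append(("error", name, "namespace declaration not found"))
        else:
            first, sep, rest = text.partition("\n")
            renamed = first.replace(old_ns, new_ns, 1) + sep + rest
            instructions.append(("write", f"{world['dest_dir']}/{name}", renamed))
    return instructions


def carry_out(instructions: List[Instruction],
              write: Callable[[str, str], None]) -> List[Instruction]:
    """Perform the writes only if no instruction is an error; return the errors."""
    errors = [i for i in instructions if i[0] == "error"]
    if errors:
        return errors  # stop before any real action
    for _, path, contents in instructions:
        write(path, contents)
    return []


world = {"dest_dir": "dest", "source_files": {"a.clj": "(ns old.a)\n"}}
written: Dict[str, str] = {}
plan = decide(world, ["a.clj", "b.clj"], "old", "new")
errors = carry_out(plan, written.__setitem__)
```

In this run the second file is missing, so `errors` holds one entry and `written` stays empty: the valid first write is not performed either.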
There are limitations to this straightforward approach to isolating decisions from side-effects. It works for this program because it can gather all the input, produce all the output, and hold all of it in memory at the same time. For a more general approach to this same goal, see Functional Programming in Scala.
Moving all the "what does the world around me look like?" side effects to the beginning of the program, and all the "change the world around me!" side effects to the end of the program, we achieve maximum testability of program logic. And minimum convolution. And separation of concerns: one module makes the decisions, another one carries them out. Consider this possibility the next time you find yourself in testing pain.
The code that inspired this approach is in my microlib repository.
Umbrella test (integration)
Poncho tests (around the decisioning module) (I only wrote a few. It's still a play project right now.)
Code under test (decisioning module)
Instruction carrying-out part
Diagrams made with Monodraw. Wanted to paste them in as ASCII instead of screenshots, but that'd be crap on mobile.
 This is Clojure, so I put the "contents of each file" in a delay. Files whose contents are not needed are never opened.
 I haven't written property tests, because time.