Thursday, March 20, 2014

Weakness and Vulnerability

Weakness and vulnerability are different. Separate the concerns: [1]

Vulnerability is an openness to being wounded.
Weakness is an inability to live through wounds.

In D&D terms: vulnerability is a low armor class, weakness is low hit points. Armor class determines how hard it is for an enemy to hit you, and hit points determine how many hits you can take. So you have a choice: prevent hits, or endure more hits.

If you try to make your software perfect, so that it never experiences a failure, that's a high armor class. That's aiming for invulnerability.

Thing is, in D&D, no matter how high your armor class, if the enemy makes a perfect roll (a 20 on a d20, a twenty-sided die), that's a critical hit and it strikes you. Even if your software is bug-free, hardware goes down or misbehaves.

If you've spent all your energy on armor class and little on hit points, that single hit can kill you.

Embrace failure by letting go of ideal invulnerability, and think about recovery instead. I could implement and maintain signal handlers, which is a huge pain and makes my code ugly. Or I could implement a separate cleanup mechanism for crashed processes. That's a separation of concerns, and it's more robust: signal handlers don't help when the app is out of memory, but a separate recovery mechanism does.
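To make the "separate cleanup" idea concrete, here is a minimal sketch in Python. It is not the mechanism from my own system. The marker-file convention, the `.partial` file name and all three function names are hypothetical, and it assumes recovery runs at a moment when no job is legitimately in flight (at startup, say).

```python
from pathlib import Path

MARKER_SUFFIX = ".inprogress"  # hypothetical naming convention for this sketch


def begin_job(workdir: Path, job_id: str) -> Path:
    """Worker side: leave a marker before doing any work."""
    marker = workdir / f"{job_id}{MARKER_SUFFIX}"
    marker.write_text("started")
    return marker


def finish_job(marker: Path) -> None:
    """Worker side: remove the marker once the job completed normally."""
    marker.unlink()


def recover_crashed_jobs(workdir: Path) -> list[str]:
    """Recovery side: any marker still present belongs to a job that died.

    Assumes no job is running while this executes (e.g. it runs at startup).
    """
    recovered = []
    for marker in sorted(workdir.glob(f"*{MARKER_SUFFIX}")):
        job_id = marker.name[: -len(MARKER_SUFFIX)]
        # Throw away whatever half-written output the dead job left behind.
        (workdir / f"{job_id}.partial").unlink(missing_ok=True)
        marker.unlink()
        recovered.append(job_id)
    return recovered
```

The worker contains no crash-handling code at all. If it dies for any reason, out of memory included, the marker simply stays behind and the recovery step finds it later.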

In the software I currently work on, I take the strategy of building safety nets at the application, process, subsystem, and module levels, as feasible.[3] Then, while I try to get my code right, I don't contort it checking for hardware and network failures, bad data, and every other error I can conceive of. There are always going to be errors I don't conceive of. Fail gracefully, and pick up the pieces.
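As a rough illustration of layered safety nets, here is a small Python sketch with two of the levels. Every name in it is made up for the example. The point is that the innermost function stays plain, and each outer layer catches whatever the layer below let through.

```python
import logging

log = logging.getLogger("safety_net")


def parse_record(raw: str) -> int:
    # The "real" code: no defensive clutter, it just does its job.
    return int(raw)


def process_batch(batch) -> list[int]:
    """Module-level net: one bad record is skipped, not fatal to the batch."""
    results = []
    for raw in batch:
        try:
            results.append(parse_record(raw))
        except Exception:
            log.exception("skipping bad record %r", raw)
    return results


def run_subsystem(batches) -> list[int]:
    """Subsystem-level net: a batch that fails outright is dropped, the rest proceed."""
    totals = []
    for batch in batches:
        try:
            totals.append(sum(process_batch(batch)))
        except Exception:
            log.exception("dropping failed batch")
    return totals
```

With this in place, `run_subsystem([["1", "2", "x"], None, ["4"]])` logs two errors and still returns totals for the two usable batches. Neither net knows what specifically went wrong, which is the point: they also catch the errors nobody conceived of.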

-----
An expanded version of this post, adding the human element, is on True in Software, True in Life.

-----
[1] A few weeks ago someone tweeted a quote from a book on the difference between weakness and vulnerability, and it clicked with me. I can't find the tweet or the quote anymore. Anyone recognize this?
[3] The actor model (Akka in my case) helps with recovery. It implements "Have you restarted your computer?" at the small scale.
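The restart idea in [3] can be sketched without Akka. The following Python is only an illustration of "turn it off and on again" at the small scale. It is not Akka's API, and `Supervisor`, `Counter` and `max_restarts` are hypothetical names. In this sketch the message that caused the failure is simply dropped.

```python
class Counter:
    """A tiny stateful worker; its state may be suspect after a failure."""

    def __init__(self):
        self.total = 0

    def handle(self, message: str) -> int:
        self.total += int(message)  # raises ValueError on bad input
        return self.total


class Supervisor:
    """Replaces a failed worker with a fresh instance instead of repairing it."""

    def __init__(self, make_worker, max_restarts: int = 3):
        self._make_worker = make_worker
        self._worker = make_worker()
        self._max_restarts = max_restarts
        self.restarts = 0

    def send(self, message):
        try:
            return self._worker.handle(message)
        except Exception:
            if self.restarts >= self._max_restarts:
                raise  # give up and let a higher-level net deal with it
            self.restarts += 1
            self._worker = self._make_worker()  # fresh state, old one discarded
            return None  # the failing message is dropped in this sketch
```

A bad message costs the worker its accumulated state but never takes down the caller: after the restart, the supervisor keeps accepting messages with a clean `Counter`.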

5 comments:

  1. Regarding failure and recovery, what do you think of Nassim Taleb's categorization of fragility vs. resilience vs. anti-fragility? We definitely want at least resilience in software. Is there an analogue of anti-fragility?

    1. I haven't read the book. (Just downloaded the audio version in response to this.)
      There's anti-fragility in us as software writers, if we can let all our lessons teach us better ways of writing software, and don't let them make us fearful of innovating.

  2. Designing software for ease of maintenance is equivalent to improving your saving throw.

  3. This used to be popularly known as the three r's.

    Reliability - The capacity of a system to maintain its own internal state, absent outside influence.
    Robustness - The capacity of a system to resist or modulate changes coming from outside influence. Guard against
    Resilience - The capacity of a system to return to its equilibrium state once disturbed.
