Rules are not easy

Sometimes in software design we get this idea, “We’ll make this a rule engine. Then the business can write the rules, and they’ll be able to change them without changing the code. That’ll make it more flexible.”

🤣

The rules are code; they change the behavior of the system. Rules interact in ways that are hard to anticipate. It’s harder to write rules than to write code.

It seems like we make business decisions in terms of rules, because we talk about them that way.

People make uncomplicated decisions by rule. We make complicated decisions by aesthetic (from expertise), and these are difficult or impossible to express in rules.

Real-life rules often contradict each other. A human with a feeling for the situation can prioritize among them.
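To make the contradiction concrete, here is a minimal sketch in Python. The two rules, the `Order` type, and the first-match-wins strategy are all invented for illustration; no particular rule engine works exactly this way. Each rule looks reasonable alone, but one order matches both, and the result depends only on the order in which the rules were written down.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Order:
    total: float
    is_new_customer: bool


@dataclass
class Rule:
    name: str
    applies: Callable[[Order], bool]
    discount: float  # fraction of the order total


# Two hypothetical business rules; each one seems sensible in isolation.
RULES = [
    Rule("welcome offer", lambda o: o.is_new_customer, 0.20),
    Rule("cap discounts on small orders", lambda o: o.total < 50, 0.05),
]


def first_match(order: Order, rules: List[Rule]) -> Optional[Rule]:
    """Resolve conflicts by position: the first rule that applies wins."""
    for rule in rules:
        if rule.applies(order):
            return rule
    return None


# A new customer placing a small order matches both rules.
order = Order(total=30.0, is_new_customer=True)
winner_as_written = first_match(order, RULES)
winner_reversed = first_match(order, list(reversed(RULES)))

print(winner_as_written.name, winner_as_written.discount)
print(winner_reversed.name, winner_reversed.discount)
```

Reordering the list changes the discount from 20% to 5%. A person would ask which outcome the business actually wants; the engine just picks one.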

For instance, “How do you position a picture in a column of text?” Back in the day, people laid out newspaper pages by hand, positioning the pictures using some rules and also their eyes. How does a browser do it? Careful people have written nine precise rules for positioning floating elements in the CSS specification. Excerpt:

4. A floating box’s outer top may not be higher than the top of its containing block. When the float occurs between two collapsing margins, the float is positioned as if it had an otherwise empty anonymous block parent taking part in the flow. The position of such a parent is defined by the rules in the section on margin collapsing.

(You don’t need to actually read this.)

If you think “Rules are declarative, they’re easier to reason about than imperative code” then go format a complicated web site with CSS. Make changes in the hundreds of lines of CSS, and see if you can predict the results. Now see if you can predict the results of changing someone else’s CSS.

Writing rules is hard. Designing a syntax and semantics that let people write rules to cover all the cases in the world, even harder. Do you really want to embark on that? Is it really more effective than changing some code when the business wants change?

As humans, we make aesthetic judgements for complicated decisions. This is one of our superpowers. Putting those judgements into rules is never easy; don’t pretend it is. And no, you don’t need to implement a rule engine.

Thanks to @nokusu for teaching me about floats and margins and other layout fun.

Correctness

How important is correctness?

This is a raging debate in our industry today. I think the answer depends strongly on the kind of problem a developer is trying to solve: is the problem contracting or expanding? A contracting problem is well-defined, or has the potential to be well-defined with enough rigorous thought. An expanding problem cannot; as soon as you’ve defined “correct,” you’re wrong, because the context has changed.

A contracting problem: the more you think about it, the clearer it becomes. This includes anything you can define with math or a stable specification: image conversion, or compression (making files smaller for storage). There are others, ones we’ve solved so many times or used so many ways that they stabilize: web servers, grep. The problem space is inherently specified, or it has become well-defined over time.

Correctness is possible here, because there is such a thing as “correct.” These programs are useful to many people, so correctness is worth the effort. Using such a program or library is freeing: it scales up the capacity of the industry as a whole, because it becomes something we don’t have to think about.

An expanding problem: the more you think about it, the more ways it can go. This includes pretty much all business software; we want our businesses to grow, so we want our software to do more and different things with time. It includes almost all software that interacts directly with humans. People change, culture changes, expectations get higher. I want my software to drive change in people, so it will need to change with us.
There is no complete specification here. No amount of thought and care can get this software perfect. It needs to be good enough, it needs to be safe enough, and it needs to be amenable to change. It needs to give us the chance to learn what the next definition of “good” might be.

Safety
I propose we change our aim for correctness to an aim for safety. Safety means nothing terrible happens (for your business’s definition of terrible). Correctness is an extreme form of safety. Performance is a component of safety. Security is part of safety.

Tests don’t provide correctness, yet they do provide safety. They tell us that certain things aren’t broken yet. Process boundaries provide safety. Error handling, monitoring, everything we do to compensate for the inherent uncertainty of running software in production, all of these help enforce safety constraints.
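As a rough illustration of a test that provides safety rather than correctness, consider the Python sketch below. The function `apply_discount` and its rounding behavior are hypothetical. The check never says what the "right" price is; it only asserts that nothing terrible happens: the price never goes negative and never goes up.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical pricing function; its exact rounding may change with the business."""
    discounted = price_cents * (100 - percent) // 100
    return max(discounted, 0)


def check_safety(price_cents: int, percent: int) -> bool:
    """Safety constraint: the result stays between zero and the original price."""
    result = apply_discount(price_cents, percent)
    return 0 <= result <= price_cents


# Sweep a handful of prices across every whole-number discount from 0 to 100.
violations = [
    (price, percent)
    for price in (0, 1, 99, 1000, 123456)
    for percent in range(0, 101)
    if not check_safety(price, percent)
]

print(violations)
```

If the business later changes how discounts round, this check keeps passing as long as the change is safe. A test that pinned exact prices would need rewriting every time the definition of "correct" moved.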

In an expanding software system, business matters (like profit) determine what is “good enough.” Risk tolerance goes into what is “safe enough.” Optimizing for the future means optimizing our ability to change.

In a contracting problem space, we can progress through degrees of safety toward correctness and optimal performance. Break out the formal specification, write great documentation.

Any piece of our expanding system that we can break out into a contracting problem space is a win. We can solve it with rigor, even make it eligible for reuse.

For the rest of it: embrace uncertainty, keep the important parts working, and make the code readable so we can change it. In an expanding system, where tests are limited and limiting and documentation becomes more wrong every day, the code is the specification. Aim for change.

Areas of responsibility

“And the Delivery team is in charge of Puppet….” said our new manager.

“Wait, we’re in charge of WHAT?” – me

“Well I thought that it fits in with your other responsibilities.”

“That’s true. But we’re not working on it, we’re working on these other things. You can put whatever you want in our yellow circle, but that’s it.”

“The yellow circle?”

See, I model our team’s areas of responsibility as three circles. The yellow circle is everything we’re responsible for: all the legacy systems and infrastructure have to belong to some team, and these are carved out for us. Some we know little about.
Inside the yellow circle is an orange circle: the systems we plan to improve. These appear on our backlog in JIRA epics. We talk about them sometimes.
Inside the orange circle, a red circle: active work. These systems are currently under development by our team. We talk about them every day, we add features and tests, we garden them.

That yellow circle holds a lot of risks: when something there breaks we’ll stop our active work and learn until we can stand it back up. Management may add items here, as they recognize the schedule risk. We sometimes spend bits of time researching these, to reduce our fear of pager duty.

The orange circle holds a lot of hope, our ambitions for the year and the quarter. We choose these in negotiation with management.

The red circle is ours. We decide what to work on each day, based on plans, problems, and pain. Pushing anything directly into active work is super rude and disruptive.

“OK, it’s in the yellow circle, cool. I’ll work on hiring more people, so we can expand the orange and red circles too.”

Stacking responsibilities

TL;DR: Support decisions with automation and information; give people breadth of responsibility; let them learn from the results of their choices.

When I started writing software in 1999, the software development cycle was divided into stages, ruled over by project management.

Business people decided what to build to support the customers. Developers coded it. Testers tested it. System Administrators deployed and monitored it. Eventually the customer got it, and then, did anyone check whether the features did any good?

These days Agile has shortened the cycle, and put business, development, and QA in the same room. Meanwhile, with all the tools and libraries and higher-level languages, feature development is a lot quicker, so development broadens into automating the verification step and the deployment. Shorter cycles mean we ask the customer for feedback regularly.

Now developers are implementing, verifying, deploying, monitoring. The number of tools and environments we use for all these tasks becomes staggering. Prioritization becomes a problem too: when the only externally visible deliverable is features, who will improve tests, deployment, and monitoring? We automate the customer’s work; when do we automate ours?

The next trend in development process helps with these: it divides responsibilities without splitting goals. Business works with customers, developers automate for business, and a slice of developers automate our work. Netflix calls this team Engineering Tools; at Outpace we call it Platform. Instead of handoffs, we have frequent delivery of complete products from each team.

Meanwhile, when developers own the features past production, another task emerges: evaluation of results. Automate that too! What is success for a feature? It isn’t deployment: it’s whether our customers find value in it. Gleaning that means building affordances into the feature implementation, making information easy to see, and then checking it out. We’re responsible for a feature until its retirement. Combine authority with information, and people rise to the occasion.[1]
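One way to picture an affordance built into a feature is sketched below in Python. Everything here is an assumption made for illustration: the `FeatureMetrics` class is an in-memory stand-in for whatever metrics system a team really uses, and `suggest_reorder` is an invented feature. The point is only that the feature records whether customers take it up, so the team that owns it can look.

```python
from collections import Counter
from typing import Optional


class FeatureMetrics:
    """In-memory stand-in for a real metrics system (hypothetical)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, feature: str, outcome: str) -> None:
        self.counts[(feature, outcome)] += 1

    def acceptance_rate(self, feature: str) -> Optional[float]:
        """Fraction of offers the customer accepted, or None if never offered."""
        offered = self.counts[(feature, "offered")]
        if offered == 0:
            return None
        return self.counts[(feature, "accepted")] / offered


metrics = FeatureMetrics()


def suggest_reorder(customer_accepts: bool) -> None:
    """Invented feature: offer a reorder and record what the customer did."""
    metrics.record("suggest_reorder", "offered")
    if customer_accepts:
        metrics.record("suggest_reorder", "accepted")


# Simulate four customers; two accept the suggestion.
for accepted in (True, False, False, True):
    suggest_reorder(accepted)

print(metrics.acceptance_rate("suggest_reorder"))
```

Deployment alone would tell us the suggestion shipped. The recorded acceptance rate is what tells us whether anyone wanted it.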

Learning happens when one person sees the full cycle and its effects, and that person influences the next cycle. Experiments happen, our capabilities grow.

In this division of responsibilities, no one delegates decisions. Everyone shares a goal, and supports the rest of the organization in reaching that goal. The platform team doesn’t do deployments. It creates tools that abstract away the dirty details, supplying all information needed for developers to make decisions and evaluate the results. At Outpace, the Platform team is composed of people from the other development teams, so they share background and know each other’s pain. The difference is: the platform team has a mandate to go meta, to improve developer productivity. Someone is automating the automation, and not every developer has to be an expert in every layer.

The old way was like a framework: project managers take the requirements from the business, then the code from the developers, and pass them into the next step in the process. The new way is like libraries: the platform team provides what the developers need, the developers provide what the business needs, and the business provides what the customer needs. Details are abstracted away, and decisions are not.

When a developer’s responsibilities end with code that conforms to a document, it’s super hard to get incentives aligned to the larger needs. Once everyone is responsible for the whole cycle, we don’t need to align incentives. Goals align, and that’s more powerful. Remove misaligned incentives, give us a shared goal to work for, and people achieve. Give us some slack to experiment and improve, and we’ll also innovate.

————————————–
[1] via David Marquet