Tuesday, November 20, 2012

The invisible hand that isn't there


Amber nails it, but not in the way you might think.

Where are these opportunities? You don't see the opportunities no one offers you. You don't see the suggestions, the requests for collaboration, the invitations to the user group that never happened.

Where are these obstacles? Also invisible. They're a lack of inclusion, the absence of even a single role model. They're never being asked for your opinion on technical decisions. They're an absence of sponsorship -- of people who say in management meetings, "Jason would make a great architect." Jason doesn't even know someone's speaking up for him, so how could Rokshana know she's missing this?

You can't see what isn't there. You can't fight for what you can't see.

In the post that triggered Amber's tweet, Tom describes the subtle, behind-the-scenes influences that make a career. Success is built on a hundred fortuitous circumstances. Lack of success is a thousand paper cuts. And since it's always been this way, why would a sliced-up person even notice? It feels normal. Like a child with poor vision, they don't even know anything is wrong until someone takes a broader perspective, makes a comparison.

How can we make this comparison? In the aggregate. You don't have to look far to see that in the aggregate, women and minorities are strangely missing, and the women who are here are leaving.

This isn't overt discrimination, and it isn't intentional; it's simply how our pattern-forming brains work. When people like me think about technologists, the image that comes to mind has pale skin and grows a moustache in November. People aren't trying to exclude anyone; we're just human. If you disagree, read some numbers.

Tom gets it wrong when he says there are "certain opportunities I get that women have to fight much harder for." There's no fighting; you can't fight for opportunities you don't see. Instead, there is waiting. They wait forever, or until they're tired of feeling out of place -- until some other career offers them the encouragement they don't realize they're missing.
It's vague because no one can see what isn't there - until we back up and observe the indirect effects.

The invisible hand isn't pushing some people down - it's pulling others up. Let's work on pulling everyone up.

So how must we be vigilant? I'll tell you.
  1. Create explicit opportunities to make up for the implicit ones minorities aren't getting. Invite women to speak, create minority-specific scholarships, make extra effort to reach out to underrepresented people.
  2. Make a conscious effort to include everyone on the team in decisions. Don't always go with your gut for whom to invite to the table.
  3. Don't interrupt a woman in a meeting. (I catch myself doing this, now that I know it's a problem.) Listen, and ask questions.
  4. If you are a woman, be the first woman in the room. We are the start of making others feel like they belong.

The good news: once there are several women on every team, in every conference session, in every user group, then the bias will naturally shift. Same for other minorities. The bad news: it takes a generation. So be patient, and be encouraging. Encourage people to enter the field, to stay in it, and especially to lead. Sometimes all it takes is a suggestion.

Sunday, November 18, 2012

Causality: tougher than it looks, but we can take it on

We like to take a hunk of data, graph one factor against another, demonstrate correlation, and infer causality. This naive form of analysis is appealing in its simplicity, but it doesn't cut it in the real world. With Big Data, we can identify correlation out the wazoo, but it's time to get way more sophisticated in our causality analysis.

With data as big as we can get it today, the scientific method doesn't work anymore. (Don't take my word for it. Listen to Sandy Pentland.)

A correlation between two factors is judged statistically significant if there is less than a 5%, or 1%, or 0.5% chance that the results would come out this way by chance. Even at the strictest level, that means one in 200 false hypotheses will show up as true out of pure randomness. With tremendous data, we can test effectively infinite hypotheses. Plenty of them will look significant when they are not. As Sandy puts it, you can learn that people who drive Fords on Thursdays are more likely to get the flu. The correlation exists, but it's bullshit.
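
To see that arithmetic in action, here is a minimal sketch in Scala. The scenario and all the numbers are mine, invented for illustration: test ten thousand pure-noise hypotheses against a random outcome and count how many clear the strictest bar anyway.

    import scala.util.Random

    object SpuriousSignificance extends App {
      val rng      = new Random(42)
      val people   = 1000   // observations per hypothesis
      val nFactors = 10000  // hypotheses: "drives a Ford on Thursdays", ...

      // did each person get the flu? pure coin flips, coded +1/-1
      val flu = Vector.fill(people)(if (rng.nextBoolean()) 1.0 else -1.0)

      // z-statistic for the correlation of one random factor with the outcome;
      // under the null hypothesis, r * sqrt(n) is approximately standard normal
      def zStat(): Double = {
        val factor = Vector.fill(people)(if (rng.nextBoolean()) 1.0 else -1.0)
        val r = factor.zip(flu).map { case (f, o) => f * o }.sum / people
        r * math.sqrt(people.toDouble)
      }

      // 2.81 is the two-sided critical value for p < 0.005, the strictest level above
      val falsePositives = (1 to nFactors).count(_ => math.abs(zStat()) > 2.81)

      println(falsePositives + " of " + nFactors + " pure-noise factors look significant")
      // expect around 50: one in two hundred, exactly as the arithmetic says
    }
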
With big data, it's time to bring the word "significant" back to its regular-people meaning. We have to look for causality. We have to look for the micropatterns that lead to better health, smoother traffic, lower energy use. No more "this happened and this happened to the same people, so they must be related!" Causality is the difference between truth and the mere publishability of an academic paper.

How can we find that causality? It is complex: many influences together trigger each event, and each of those influences is in turn triggered by many more, including each other. How are we to analyze this?
A painfully simplified example: Jay's new web site

Manufacturing has a tool that could be useful. Quality Function Deployment, and in particular the House of Quality tool, addresses the chains and webs of causality. As Chad Fowler explained yesterday at 1DevDayDetroit, the House of Quality starts with desired product characteristics. It identifies the relative importance of each characteristic; a list of measurable factors that influence the characteristics; and which factors influence which characteristics, how much, and in what direction. Magic multiplication formulas then calculate which factors are the most important to the final product.
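
To make the multiplication concrete, here is a minimal sketch in Scala. The characteristics, factors, weights, and strengths for Jay's web site are invented for illustration; real QFD work happens in a worksheet, not a program.

    object HouseOfQuality extends App {
      // desired product characteristics, with relative importance
      val importance = Map(
        "easy to read" -> 9.0,
        "fast to load" -> 5.0,
        "pretty"       -> 3.0)

      // how strongly each measurable factor influences each characteristic
      // (0 none, 1 weak, 3 medium, 9 strong; a negative number could record
      // influence in the wrong direction)
      val influence = Map(
        "font size"      -> Map("easy to read" -> 9.0, "fast to load" -> 0.0, "pretty" -> 3.0),
        "image count"    -> Map("easy to read" -> 1.0, "fast to load" -> 9.0, "pretty" -> 9.0),
        "words per page" -> Map("easy to read" -> 3.0, "fast to load" -> 1.0, "pretty" -> 1.0))

      // the magic multiplication: a factor's priority is the sum, over all
      // characteristics, of importance times strength of influence
      val priority = influence.map { case (factor, strengths) =>
        factor -> strengths.map { case (c, strength) => importance(c) * strength }.sum
      }

      priority.toList.sortBy { case (_, p) => -p }.foreach { case (factor, p) =>
        println(factor + ": " + p)
      }
      // font size: 90.0, image count: 81.0, words per page: 35.0
      // these priorities become the importance weights in the next, more detailed house
    }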


But don't stop there. Take the factors and turn them into the target characteristics in the next House of Quality. Find factors that influence this new, more detailed set of characteristics. Repeat the determination of what factors influence what characteristics and how much.
The factors from Iteration 1 become the goals in Iteration 2.

Iterate until you get down to factors specific enough that they can be controlled in a production facility. Actionable, measurable steps are then apparent, along with a priority for each based on how much they influence the highest-level product characteristics. Meanwhile, you have created a little network of causalities.

This kind of causality analysis is a lot of work. Creating this sad little example made my brain hurt. This analysis is no simple graph of heart attacks vs strawberry consumption across populations. On the upside, Big Data drastically expands our selection of measurable factors. If we can identify causality at a level this detailed, we can get a deeper level of information. We can get closer to truth.

Friday, November 16, 2012

Many weapons against one Big Ball of Mud


This morning my daughters fought over a kitchen stool. A box of fruit snacks from the nearby shelf hit the floor and spilled. "I didn't do it, she did!" "It wasn't me, it was you!" 
"It fell because of both of you together," I told them. "Causality is rarely singular."



In our biological world, events don't have just one cause. Like my daughters, we wish it were that simple. We want to say "A → B," but in a complex system, each event has many causes. It's more like "A, C, D, E → B," and each of those causes has many causes of its own, including each other.

We want causality to be straightforward, but it is not.

Fortunately, we're programmers. The computer is that rare, delightful world where causality works. In the computer, we can say "It did that because I told it to." Rare is the occurrence that can't be traced to some pieces of code or wiring.

In software we have the opportunity to keep causality simple. But we're humans, and as much as we wish we worked that way, we don't. Our neurons are interconnected, and our software grows that way too.

How do we combat this big ball of mud?

Programming principles and paradigms aim to stop this encroachment of biological-style networks of dependencies. We want to break these interconnected causalities and make our programs less complicated.

A message-oriented architecture delineates boundaries, much like the original intent of SOA and OOP. We use functional principles like "no side effects" to restrict the influence of parts on other parts. Even the prime directive, the Single Responsibility Principle, helps keep causality direct.
Yet these principles only help when we follow the spirit of simplicity — or when we're forced to. A human in a hurry will grab the duct tape and glom all kinds of pieces together. A human not in a hurry, who is aware of commonalities in the internals of two components, will take advantage of these synergies and feel clever for it. Meanwhile they add to a dark web of dependencies that haunts the team later.

The more human-proof we can make these barriers, the better they work. OOP was invented for this — objects were supposed to communicate only by messages, hiding their internals. But it is easy to pass the whole world and yourself in a message to another object. OOP at its best enables a less-coupled design, but the language does not enforce it.
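
A contrived sketch of the difference, with names I made up:

    // the whole world, available for the taking
    class World(val config: Map[String, String], val db: Seq[String])

    class Reporter {
      // tempting: the "message" is the entire world, so Reporter is free to
      // depend on anything inside it -- and silently does
      def reportFromWorld(world: World): String =
        world.db.filter(_.startsWith(world.config("prefix"))).mkString("\n")

      // closer to the original intent: the message carries only the data the job needs
      def report(rows: Seq[String]): String = rows.mkString("\n")
    }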

Then there's SOA. Process barriers between applications can enforce a stronger separation — but only if the message format is a simple one. RPC or Java serialization doesn't help! A giant ESB shuffling messages among Java apps doesn't simplify; it adds itself as a dependency to each one without breaking their dependencies on each other.

Implementing every service provider and consumer with the same technology can destroy the benefit of enforced isolation. Instead, let the teams developing each part choose their own languages and tools, let them communicate in JSON or straightforward XML, and the components will be less coupled. This is message-oriented, and it's hard to screw it up with duct tape and cleverness.

Components that pass simple messages are independently testable. Problems can be isolated to the system at fault by checking the messages.

Within a single process, prevent coupling by restricting side effects and environmental access. When a function is "data in, data out" then its interface is clearly stated and testable. Haskell, with strong typing and isolated I/O, enforces "data in, data out." In hybrid languages like F# or Scala, it is up to us to follow these rules and keep our application simple.
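
Here is what that discipline looks like in Scala, with a tiny made-up example; Haskell's type system would reject the first version, Scala's will not.

    object Checkout {
      var taxRate = 0.08 // hidden, mutable environment

      // NOT data in, data out: it reads hidden state and writes to the console,
      // so its real interface is wider than its signature admits
      def totalSideEffecting(prices: Seq[Double]): Double = {
        val total = prices.sum * (1 + taxRate)
        println("charging " + total) // side effect
        total
      }

      // data in, data out: everything it depends on arrives as a parameter,
      // so the signature is the whole truth and a test needs no setup
      def total(prices: Seq[Double], taxRate: Double): Double =
        prices.sum * (1 + taxRate)
    }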

Functional style -- or more precisely, "expression-oriented" style -- helps at the very small level. OO works well for library or module interfaces when few objects are exposed for use. Messaging helps between application components. At each level, keep your functions, objects, and components as small and simple as possible, and keep them composable. Complex software should be assembled, not woven.

Our natural human tendency is to increase complexity, but software is one world where we don't have to. In the computer, we are God. But while we have the opportunity to know everything about our creation, we're still humans with limited working memory. To hold more in our heads, we have to chunk it. We have to compose small bits into larger parts. Principles of expression- and message-oriented programming enforce the lines between those parts, letting us scale up and scale down as needed to hold the interesting parts of the world in our head.

Work to overcome our biological origins. Build a system that defies the limitations of our working memory. Enforce decoupling. Relish diversity. Keep it simple. 

Sunday, November 4, 2012

Scala: when you wish that Seq was a List

As a long-time Java dev, I find it tempting to declare function parameters as Lists. That's appropriate when we specifically need List's structure, but usually we only care that we can iterate over the elements in order. Our functions are more general if they accept any Seq. [1]

However, now and then I really miss the cons operator and deconstructor pattern (::) that is defined on List. An excerpt:

    case Nil => Nil
    case head :: tail => something(head) :: recurse(tail)

If what I'm matching is a sequence instead of a list, I can't use :: in either context. Here is the equivalent for a sequence:

    case Seq() => Seq()
    case Seq(head, tail @ _*) => something(head) +: recurse(tail)
  • cons :: is replaced by prepend +: 
  • The sequence deconstruction pattern _* matches zero or more elements after the first one [2]
  • Meanwhile, the @ operator binds an identifier to a portion of the pattern, capturing the rest of the sequence in tail.
  • The empty sequence pattern Seq() takes the place of Nil.
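
Putting the pieces together, here is a complete (if contrived) method built from these patterns; doubleAll is my name, not anything standard:

    def doubleAll(xs: Seq[Int]): Seq[Int] = xs match {
      case Seq() => Seq()
      case Seq(head, tail @ _*) => (head * 2) +: doubleAll(tail)
    }

    doubleAll(Vector(1, 2, 3)) // a Seq of 2, 4, 6
    doubleAll(List(1, 2, 3))   // works on List too
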
Lists are prettier, but sequences are more general. Consider both when defining your methods.

-----------------
[1] Technically we could be declaring the argument as GenIterable, but we're not that hard-core. 
[2] google unapplySeq to learn more about _*