Tuesday, January 20, 2015

Application vs. Reference Data

At my very first programming job, I learned a distinction between two kinds of data: Reference and Application.

Application data is produced by the application, consumed and modified by the software. It is customers, orders, events, inventory. It changes and grows all the time. Typically a new deployment of the app starts with no application data, and it grows over time.

Reference data is closer to configuration. Item definitions, tax rates, categories, drop-down list options. This is read by the application, but changed only by administrative interfaces. It's safe to cache reference data; perhaps it updates daily, or hourly, at the most. Often the application can't even run without reference data, so populating it is part of deployment.


Back at Amdocs we separated these into different database schemas, so that the software had write access to application data and read-only access to reference data. Application data had foreign key relationships to reference data; inventory items referenced item definitions, customers referenced customer categories. Reference data could not refer to application data. This follows the Stable Dependencies Principle: frequently-changing data depends on rarely-changing data, never the other way around.

These days I don't go to the same lengths to enforce the distinction. It may all go in the same database, there may be no foreign keys or set schemas, but in my head the classification remains. Which data is essential for application startup? Reference. Which data grows and changes frequently? Application. Thinking about this helps me avoid circular dependencies, and keep a clear separation between administration and operation.
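A minimal sketch of the distinction in Clojure (all names and values invented for illustration):

(ns shop.data)

;; Reference data: required at startup, safe to cache, written only
;; through administrative interfaces.
(def reference-data
  {:tax-rates  {:CO 0.029, :TX 0.0625}
   :categories #{:retail :wholesale}})

;; Application data: starts empty, grows and changes constantly.
;; It may point at reference data; reference data never points back.
(def orders (atom []))

(defn place-order! [order]
  ;; the foreign-key-ish relationship: an order's category must exist
  ;; in reference data
  {:pre [(contains? (:categories reference-data) (:category order))]}
  (swap! orders conj order))

(place-order! {:id 1, :category :retail, :total 39.99})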

My first job included some practices I shudder at now[1], but others stick with me. Consider the difference between Reference and Application the next time you design a storage scheme.


[1] Version control made of Perl on top of CVS, with file locking. Unit tests as custom drivers that we threw away. C headers full of #define. High-level design documents, low-level design documents, approvals, expensive features no one used. Debugging with println... oh wait, debugging with println is awesome again. 

Saturday, January 17, 2015

Fun with Optional Typing: narrowing errors

After moving from Scala to Clojure, I miss the types. Lately I've been playing with Prismatic Schema, a sort of optional typing mechanism for Clojure. It has some surprising benefits, sometimes even over Scala's typing. I plan some posts about the interesting ones, but first, a more ordinary use of types: locating errors.

Today I got an error in a test, and struggled to figure it out. It looked like this:[1]

expected: (= [expected-conversion] result)
  actual: (not (= [{:click {:who {:uuid "aeiou"}, :when #<DateTime 2014-12-31T23:00:00.000Z>}, :outcome {:who {:uuid "aeiou"}, :when #<DateTime 2015-01-01T00:00:00.000Z>, :what "bought 3 things"}}] ([{:click {:who {:uuid "aeiou"}, :when #<DateTime 2014-12-31T23:00:00.000Z>}, :outcome {:who {:uuid "aeiou"}, :when #<DateTime 2015-01-01T00:00:00.000Z>, :what "bought 3 things"}}])))

Hideous, right? It's super hard to see what's different between the expected and actual there. (The colors help, but the terminal doesn't give me those.)

It's hard to find the difference because the difference isn't content: it's type. I expected a vector of a map, and got a list of a vector of a map. Joy.

I went back and added a few schemas to my functions, and the error changed to

  actual: clojure.lang.ExceptionInfo: Output of calculate-conversions-since does not match schema: [(not (map? a-clojure.lang.PersistentVector))]

This says my function output was a vector of a vector instead of a map. (This is one of Schema's more readable error messages.)
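For illustration, the annotation looks roughly like this. These schemas are made up for this post, simpler than my real ones, and I'm using java.util.Date where the real code had Joda DateTimes:

(require '[schema.core :as s])

;; made-up schemas describing the shape of a conversion
(def Visitor {:uuid s/Str})
(def Conversion {:click   {:who Visitor, :when java.util.Date}
                 :outcome {:who Visitor, :when java.util.Date, :what s/Str}})

;; declare the output shape: a sequence of Conversion maps
(s/defn calculate-conversions-since :- [Conversion]
  [events]
  (vec events)) ; stand-in body

;; validation is off by default; turn it on around tests
(s/with-fn-validation
  (calculate-conversions-since
    [{:click   {:who {:uuid "aeiou"}, :when (java.util.Date.)}
      :outcome {:who {:uuid "aeiou"}, :when (java.util.Date.), :what "bought 3 things"}}]))

;; had the body returned [events] -- a vector of a vector -- validation
;; would throw, naming this function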

Turns out (concat (something that returns a vector)) doesn't do much; I needed to (apply concat to-the-vector).[2]
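The difference is quick to see at the REPL:

;; concat with one collection just gives back its elements in a lazy seq:
(concat [[1 2] [3 4]])        ;=> ([1 2] [3 4])

;; apply unrolls the outer vector, so concat sees each inner one:
(apply concat [[1 2] [3 4]])  ;=> (1 2 3 4)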

Clojure lets me keep the types in my head for as long as I want. Schema lets me write them down when they start to get out of hand, and uses them to narrow down where an error is. Even after I spotted the extra layer of sequence in my output, it could have been in a few places. Adding schemas pointed me directly to the function that wasn't doing what I expected.

The real point of types is that they clarify my thinking and document it at the same time. They are a skeleton for my program. I like Clojure+Schema because it lets me start with a flexible pile of clay, and add bones as they're needed.

-----
[1] It would be less ugly if humane-test-output were activated, but I'm having technical difficulties with that at the moment.
[2] Here's the commit with the schemas and the fix.

Monday, January 12, 2015

Readable, or reason-aboutable?

My coworker Tom finds Ruby unreadable.
What?? I'm thinking. Ruby can be quite expressive, even beautiful.
But Tom can't be sure what Ruby is going to do. Some imported code could be modifying methods on built-in classes. You can never be sure exactly what will happen when this Ruby code executes.

He's right about that. "Readable" isn't the word I'd use though: Ruby isn't "reason-aboutable." You can't be completely sure what it's going to do without running it. (No wonder Rubyists are such good testers.)

Tom agreed that Ruby could be good at expressing the intent of the programmer. This is a different goal from knowing exactly how it will execute.

Stricter languages are easier to reason about. In Java I can read the specification and make inferences about what will happen when I use the built-in libraries. In Java, I hate the idea of bytecode modification because it interferes with that reasoning.

With imperative code in Java or Python, where what you see is what you get, you can try to reason about it by playing compiler: step through what the computer is supposed to do at each instruction. This is easier when data is immutable, because then you can trace any value back to the one place it could possibly be set.

Beyond immutability, the best languages and libraries offer more shortcuts to reasoning. Shortcuts let you be sure about some things without playing compiler through every possible scenario. Strong typing helps with this: I can be sure about the structure of what the function returns, because the compiler enforces it for me.

Shortcuts are like this: I can tell 810 is divisible by 3, because its digits add up to a number divisible by 3. I don't have to do the division. This is not cheating, because it is not coincidence; someone has proven this property mathematically.
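That shortcut, as code (a toy sketch in Clojure):

;; sum the decimal digits, then test the (much smaller) sum
(defn digit-sum [n]
  (reduce + (map #(Character/digit % 10) (str n))))

(zero? (mod (digit-sum 810) 3)) ;=> true, and indeed 810 = 3 * 270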

Haskell is the most reason-aboutable language, because outside of the IO monad, you can be sure that the environment won't affect execution of the code, and vice versa. Mathematical types like monoids and monads help too, because they come with properties that have been proven mathematically. Better than promises in the documentation. More scalable than playing compiler all day.[1]

"Readability" means a lot of things to different people. For Tom, it's predictability: can he be sure what this code will do? For many, it's familiarity: can they tell at a blink what this code will do? For me, it's mostly intent: can I tell what we want this code to do?

Haskellytes find Haskell the most expressive language, because it speaks to them. Most people find it cryptic, with its terse symbols. Ruby is well-regarded for expressiveness, especially in rich DSLs like RSpec.

Is expressiveness (of the intent of the programmer) in conflict with reasoning (about program execution)?


[1] "How do you really feel, Jess?"

We want to keep our programs simple, and avoid unnecessary complexity. One definition of a complex system: the fastest way to find out what it will do is to run it. By that definition, Ruby invites complexity, compared to Haskell. Functional programmers aim for reason-aboutable code, using all the shortcuts (proven properties) to scale up our thinking, to fit more in our heads. Ruby programmers trust inferences made from example tests. This is easier on the brain, both to write and to read, for most people. It is not objectively simpler.

Friday, January 9, 2015

Spring cleaning of git branches

It's time to clean out some old branches from the team's git repository. In memory of them, I record useful tricks here.

First, Sharon's post talks about finding branches that are ripe for deletion, by detecting branches already merged. This post covers those, plus how to find out more about the others. It's concerned with removing unused branches from origin, not locally.

Here's a useful hint: start with
git fetch -p
to update your local repository with what's in origin, including noticing which branches have been deleted from origin.
Also, don't forget to
git checkout master
git merge --ff-only
so that you'll be on the master branch, up-to-date with origin (and won't accidentally create a merge commit if you have local changes).

Next, to find branches already merged to master:

git branch -a --merged

This lists branches, including remote branches (the ones on origin), but only ones already merged to the current branch. Note that the argument order is important; the reverse gives a silly error.  Here's a one-liner that lists them:
git branch -a --merged | grep -v -e HEAD -e master | grep origin | cut -d '/' -f 3- 
This says, find branches already merged; exclude any references to master and HEAD; include only ones from origin (hopefully); cut out the /remotes/origin/ prefix.

The listed branches are safe to delete. If you're brave, delete them permanently from origin by adding this to the previous command:
 | xargs git push --delete origin
This says, take all those words and put them at the end of this other command, which says "delete these references on the origin repository."

OK, those were the easy ones. What about all the branches that haven't been merged? Who created those things anyway, and how old are they?

git log --date=iso --pretty=format:"%an %ad %d" -1 --decorate

is a lovely command that lists the author, date in ISO format (which is good for sorting), and branches and tags of the last commit (on the current branch, by default).

Use it on all the branches on origin:
git branch -a | grep origin | grep -v HEAD | xargs -n 1 git log --date=iso --pretty=format:"%an %ad %d%n" -1 --decorate | grep -v master | sort
List remote branches; only the ones from origin; exclude the HEAD, we don't care and that line is formatted oddly; send each one through the handy description; exclude master; sort (by name then date, since that's the information at the beginning of the line).

This gives me a bunch of lines that look like:

Shashy 2014-08-15 11:07:37 -0400  (origin/faster-upsert)
Shashy 2014-10-23 22:11:40 -0400  (origin/fix_planners)
Shashy 2014-11-30 06:50:57 -0500  (origin/remote-upsert)
Tanya 2014-10-24 11:35:02 -0500  (origin/tanya_jess/newrelic)
Tanya 2014-11-13 10:04:48 -0600  (origin/kafka)
Yves Dorfsman 2014-04-24 14:43:04 -0600  (origin/data_service)
clinton 2014-07-31 16:26:37 -0600  (origin/warrifying)
clinton 2014-09-15 13:29:14 -0600  (origin/tomcat-treats)

Now I am equipped to email those people and ask them to please delete their stray branches, or give me permission to delete them.


Thursday, January 1, 2015

Systems Thinking about WIT

Systems (and by "systems" I mean "everything") can be modeled as stocks and flows.[1] Stocks are quantities that accumulate (or disappear) over time, like money in a bank account or code in your repository. They may not be physical, but they are observable in some sense. Flows are how the stock is filled, and how it empties. Flows have valves representing the variable rates of ingress and egress.
[Diagram: salary flows into the stock of money; spending flows out.]

Stocks don't change directly; they grow or shrink over time according to the rates of flow. If your salary grows while spending stays constant, money accumulates in your account. Flows are affected by changes to the rules of the system, and sometimes by the levels in each stock. The more money in your bank account, the faster you spend it, perhaps. When a flow affects a stock and that stock affects the flow, a feedback loop happens, and then things get interesting. The more money in your bank account, the more you invest; and the more you invest, the more money in your bank account... $$$$$!

[Diagram: salary and investment income flow into the account; spending flows out. An arrow indicates that the amount in the account affects the investment flow rate.]

This is a reinforcing feedback loop, marked with an R. It leads to accelerating growth, or accelerating decline.
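You can watch a reinforcing loop accelerate in a toy simulation (Clojure, numbers invented): the investment inflow depends on the stock, so every step adds more than the one before.

;; one time-step: the stock changes by (inflows - outflows);
;; the investment flow rate depends on the stock itself -- that
;; dependence is the reinforcing loop
(defn step [account]
  (let [salary     5000
        investment (* 0.05 account)  ; flow depends on stock: R loop
        spending   4000]
    (+ account salary investment (- spending))))

(take 5 (iterate step 10000.0))
;;=> (10000.0 11500.0 13075.0 14728.75 16465.1875)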

Enough about money, let's talk about Women[2] in IT. We can model this. The stock is all the women programmers working today. They come in through a pipeline, and leave by retiring.[3]
[Diagram: the pipeline inputs women into IT; they flow out through retirement.]

People also leave programming because there is some other, compelling thing for them to do. This includes raising a family, writing a novel, etc. I'll call this the "Something else" flow. It should be about the same rate as in other lines of work.
Then there are women who drain out of IT because they feel drained. They're tired of being the only woman on the team, tired of harassment or exclusion, tired of being passed over for promotion. Threatened until they flee their homes, in some cases. Exhausted from years of microaggressions, in more. I'll call it the "F___ This" outflow.
[Diagram: the pipeline inputs to the stock of Women in IT; the three outflows are Something Else, retirement, and F This.]


There are feedback loops in this system. The more women in IT, the more young women see themselves as possible programmers, the more enter the pipeline. Therefore, the flow of the input pipeline depends on the level of the stock. This is a reinforcing feedback loop: the more already in, the more come in. The fewer already in, the fewer enter.
[Diagram: same picture as before, but an arrow indicates that the quantity of women in IT affects the rate of flow through the pipeline. Marked with R.]


The quantity of women programmers also affects the experience for each of us. At any given conference or company, a few assholes are mean, and those microaggressions or harassments hit the women present. The fewer women, the more bullshit each of us gets. Also the less support we have when we complain. This is also a reinforcing feedback loop: the lower the number of women in IT, the more will say "F___ This" and leave.[4] If there are ever as many women programmers as men, that outflow might be the same as in other fields. Until then, the outflow gets bigger as the stock gets lower.
[Diagram: same picture, with an additional arrow indicating that the quantity of women in IT affects the rate of the F This outflow. Marked with R.]


One of my friends pointed at the pipeline and said, "It's not hopeless, because women are a renewable resource." He hopes we can overwhelm the outflows with a constant influx of entry-level developers. In Thinking in Systems[1], Meadows remarks that people have a tendency to focus on inflows when they want to change a system. Sure enough, there's a lot of focus these days, a lot of money and effort, toward growing the input pipeline. Teach young girls programming. Add new input pipelines: women-only bootcamps, or programming for moms. These are useful efforts. Yet as Meadows points out, the outflows have just as much effect on the stock.

Can we plug the holes in this bathtub? Everyone in IT affects the environment in IT. We can all help close the "F___ This" outflow. See Kat Hagan's list of ways we can each make this environment less sexist.  Maybe in a generation it'll get better?

This isn't new information. Let's add granularity to the system, and see what we learn. Divide the stock into two stocks: junior and senior women developers. The pipeline can only provide junior developers. Experience takes them to senior level, and finally retirement. Junior people are likely to find other careers, while senior people have been here long enough that we're probably staying, so I won't draw that as an outflow. The "F___ This" outflow definitely exists for both.
[Diagram: the pipeline flows into a stock of Junior Women. Outflows of Something Else and F This lead nowhere, while an outflow called Experience leads to a stock of Senior Women. Outflows from there are retirement and F This.]


Consider now the effects of the two stocks on the flows. The senior developers are most impactful on all of them - on the pipeline as teachers, on "F___ This" as influencers in the organization, and even on the "Something else" outflow as role models! The more women architects and conference speakers and book authors and bloggers a junior dev sees, the more she sees a future for herself in this career. Finally, the stock of senior women impacts the rate at which younger women move into senior roles, through mentorship and by shaping expectations in the organization and community.
[Diagram: same picture, with the addition of arrows indicating the effect of the quantity of Senior Women on the input pipeline, the Something Else outflow, and both F This outflows. Also, arrows indicate an effect of the quantity of Junior Women on the input pipeline and their own F This outflow.]


To sustainably change the stocks in a system, work with the feedback loops. In this case, the senior developer stock impacts flows the most. To get faster results that outlast the flows of money into women-only bootcamps, it's senior developers we need the most. And the senior-er the better. Zoom in on this box, and some suggestions emerge.


[Diagram: an Experience flow brings in Senior Women in IT; they flow out through retirement and through F This.]

Inflows 

1. Mentor women developers. Don't make her wait for another woman to do it; she may wait forever, or leave first. Overcome the awkward.
2. Consider minorities carefully for promotion. A woman is rarely the obvious choice for architect or tech lead, so use conscious thought to overcome cognitive bias.

Outflows

1. Encourage senior women to stay past minimum retirement age. Our society doesn't value older women as much as older men. Can we defy that in our community?
2. Don't tolerate people who are mean, especially to women who are visibly senior in the community. See Cate Huston's post: don't be a bystander to cruelty or sexism. And above all, when a woman tells you something happened, believe her.[5] For every one who speaks up, n silently walked away.

Magnify the impact

1. Look for women to speak at your conference. Ask early, pay their travel, do what it takes to help us help the community.
2. Retweet women, repost blogs, amplify. I want every woman in development to know that this is a career she can enjoy forever.

Stocks don't change instantaneously. Flows can. It takes time for culture to improve, and time for that improvement to show up in the numbers. With reinforcing feedback loops like this, waiting won't fix anything - until we change the flows. Then time will be on our side.

Finally, please don't take a handful of counterexamples as proof that the existing system is fine. Yes, we all respect Grace Hopper. Yes, there are a few of us with unusual personalities who aren't affected by the obstacles most encounter. Your bank account is in the black with just a few pennies, but that won't make you rich.


[1] Thinking in Systems, by Donella Meadows. On Amazon, or maybe as a PDF.
[2] Honestly women have it easier than a lot of other minorities, but I can only speak from any experience about women. Please steal these ideas if you can use them.
[3] There's a small flow into Women in IT from the stock of Men in IT, which is awesome and welcome and you are fantastic.
[4] Cate Huston talked about "how dismal the numbers were, and how the numbers were bad because the experience was bad, and how the numbers wouldn’t change unless the experience changed" in her post on Corporate Feminism.
[5] Anita Sarkeesian on "the most radical thing you can do to help: actually believe women when they talk about their experiences."

Saturday, December 27, 2014

Accidental vs Deliberate Context

In all decisions, we bring our context with us. Layers of context, from what we read about that morning to who our heroes were growing up. We don't realize how much context we assume in our communications, and in our code.

One time I taught someone how to make the Baby Vampire face. It involves poking out both corners of my lower lip, so they stick up like poky gums. Very silly. To my surprise, the person couldn't do it. They could only poke one side of the lower lip out at a time.


Turns out, few outside my family can make this face. My mom can do it, my sister can do it, my daughters can do it - so it came as a complete surprise to me when someone couldn't. There is a lip-flexibility that's part of my context, always has been, and I didn't even realize it.

Another time, I worked with a bunch of biologists. Molecular biology is harder than any business domain I've encountered. The biologists talked fluently amongst themselves about phylogenies and BLAST and PTAM and heterology and I'm making this up now. They shared all this context, and it startled them when developers were dumbfounded by the quantity of it.

Shared context is fantastic for communication. The biologists spoke amongst themselves at a higher level than with others. Unshared context, when I don't realize I'm drawing on a piece others don't share, is a disaster for communication. On the other hand, if I can draw on context that others don't have, and I can explain it, then I add a source of information and naming to the team.

In teams, it's tempting to form shared context around coincidental similarities. The shows we watched growing up, the movies we like, the beer we drink. The culture we all grew up in, the culture we are now immersed in. It gives us a feeling of belonging and connection, shared metaphors to communicate in. It's much easier than communicating with someone from a different culture. There, we have no idea how many assumptions we're making, how much unshared context there is.

Building a team around incidental shared context is cheating. It keeps all the worst of context: the assumptions we don't know we're making. It deprives us of the best of unshared context: the stock of models and ideas and values that one person alone can't hold.

Instead, build a deliberate shared context. Like the biologists have: a context around the business domain, the programming language we use, the coding styles and conventions that make the work flow, that make the code comprehensible. Team culture is important; we should understand each other's code through a shared context that's created deliberately.

Eschew incidental shared context by aiming for a diverse team. Consciously create a context that's conducive to the work.

Thursday, December 18, 2014

My First Leiningen Template

Every time I sit down to write a quick piece of code for a blog post, it starts with "lein new." This is amazing and wonderful: it's super fast to set up a clean project. Good practice, good play.[1]

But not fast enough! I usually start with a property-based test, so the first thing I do every time is add test.check to the classpath, and import generators and properties and defspec in the test file. And now that I've got the hang of declaring input and output types with Prismatic Schema, I want that everywhere too.

I can't bring myself to do this again - it's time to shave the yak and make my own leiningen template.

The instructions are good, but there are some quirks. Here's how to make your own personal template, bringing your own favorite libraries in every time.

It's less confusing if the template project directory is not exactly the template name, so start with:

  lein new template your-name --to-dir your-name-template
  cd your-name-template

Next, all the files in that directory are boring. Pretty them up if you want, but the meat is down in src/leiningen/new.

In src/leiningen/new/your-name.clj is the code that will create the project when your template is activated. This is where you'll calculate anything you need to include in your template, and render files into the right location. The template template gives you one that's pretty useless, so I dug into leiningen's code to steal and modify the default template's definition. Here's mine:

(ns leiningen.new.jessitron
  (:require [leiningen.new.templates :refer [renderer sanitize year ->files]]
            [leiningen.core.main :as main]))

;; render looks up template files in src/leiningen/new/jessitron/
;; and fills in the {{mustache}} variables
(def render (renderer "jessitron"))

(defn jessitron
  [name]
  (let [data {:name name
              :sanitized (sanitize name)
              :year (year)}]
    (main/info "Generating fresh project with test.check and schema.")
    (->files data
             ["src/{{sanitized}}/core.clj" (render "core.clj" data)]
             ["project.clj" (render "project.clj" data)]
             ["README.md" (render "README.md" data)]
             ["LICENSE" (render "LICENSE" data)]
             [".gitignore" (render "gitignore" data)]
             ["test/{{sanitized}}/core_test.clj" (render "test.clj" data)])))

As input, we get the name of the project that someone is creating with our template.
The data map contains information available to the templates: that's both the destination file names and the initial file contents. Put whatever you like in here.
Then, set the message that will appear when you use the template.
Finally, there's a vector of destinations, paired with renderings from source templates.

Next, find the template files in src/leiningen/new/your-name/. By default, there's only one. I stole the ones leiningen uses for the default template, from here. They didn't work for me immediately, though: they referenced some data, such as {{namespace}}, that wasn't in the data map. Dunno how that works in real life; I changed them to use {{name}} and other items provided in the data.
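For example, a cut-down project.clj template could look like this; {{name}} is filled in from the data map (the dependencies and versions here are just illustrative, adjust to taste):

(defproject {{name}} "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [prismatic/schema "0.3.3"]
                 [org.clojure/test.check "0.6.2"]])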

When it's time to test, there are two choices: use the template right from the root of your template directory, or install it locally. Start with the first:

lein new your-name shiny-new-project

This feels weird, calling lein new within a project, but it works. Now
cd shiny-new-project
lein test

and check for problems. Delete, change the template, try again.

Once it works, you'll want to use the template outside the template project. To get this to work, first edit project.clj, and remove -SNAPSHOT from the project version.[3] Then

lein install

Done! From now on I can lein new your-name shiny-new-project all day long.

And now that I have it, maybe I'll get back to the post I was trying to write when I refused to add test.check manually one last time.


[1] Please please will somebody make this for sbt? Starting a Scala project is a pain in the arse[2] compared to "lein new," which leans me toward Clojure over Scala for toy projects, and therefore real projects.

[2] and don't say use IntelliJ, it's even more painful there to start a new Scala project.

[3] At least for me, this was necessary. lein install didn't get it into my classpath until I declared it a real (non-snapshot) version.