Friday, March 20, 2015

Gaining new superpowers

When I first understood git, after dedicating some hours to watching a video and reading long articles, it was like I finally had power over time. I can find out who changed what, and when. I can move branches to point right where I want. I can rewrite history!

Understanding a tool well enough that using it is a joy, not a pain, is like gaining a new superpower. Like I'm Batman, and I just added something new to my toolbelt. I am ready to track down latent bug-villains with git bisect! Merge problems, I will defeat you with frequent commits and regular rebasing - you are no match for me now!

What if Spiderman posted his rope spinner design online, and you downloaded the plans for your 3D printer, and suddenly you could shoot magic sticky rope at any time? You'd find a lot more uses for rope. Not like now, when it's down in the basement and all awkward to use. Use it for everyday, not-flashy things like grabbing a pencil that's out of reach, or rolling up your laptop power cable, or reaching for your coffee - ok not that! spilled everywhere. Live and learn.

Git was like that for me. Now I solve problems I didn't know I had, like "which files in this repository haven't been touched since our team took over maintenance?" or "when was this derelict function last used?" or "who would know why this test matters?"

Every new tool that I master is a new superpower. On the Mac or Linux, command-line utilities like grep and cut and uniq give me power over file manipulation - they're like the swingy grabby rope-shooter-outers. For more power, Roopa engages Splunk, which is like the Batmobile of log parsing: flashy and fast, doesn't fit in small spaces. On Windows, PowerShell is at your fingertips, after you've put some time in at the dojo. Learn what it can do, and how to look it up - superpowers expand on demand!

Other days I'm Superman. When I grasp a new concept, or practice a new style of coding until the flow of it sinks in, then I can fly. Learning new mathy concepts, or how and when to use types, or loops versus recursion, or objects versus functions -- these aren't in my toolbelt. They flow from my brain to my fingertips. Like X-ray vision, I can see through this imperative task to the monad at its core.

Sometimes company policy says, "You may not download vim" or "you must use this coding style." It's like they handed me a piece of Kryptonite. 

For whatever problem I'm solving, I have choices. I can kick it down, punch it >POW!< and run away before it wakes up. Or, I can determine what superpower would best defeat it, acquire that superpower, and then WHAM! defeat it forever. Find its vulnerability, so that problems of its ilk will never trouble me again. Sometimes this means learning a tool or technique. Sometimes it means writing the tool. If I publish the tool and teach the technique, then everyone can gain the same superpower, for less work than it took me. Teamwork!

We have the ultimate superpower: gaining superpowers. The only hard part is, which ones to gain? and sometimes, how to explain this to mortals: no, I'm not going to kick this door down, I'm going to build a portal gun, and then we won't even need doors anymore.

Those hours spent learning git may have been the most productive of my life. Or maybe it was learning my first functional language. Or SQL. Or regular expressions. The combination of all of them makes my unique superhero fighting style. I can do a lot more than kick.

Tuesday, March 17, 2015

Estimates and Our Brain

Why is it so hard to estimate how long a piece of work will take?

When I estimate how long it will take to add a feature, I break it down into tasks. Maybe I'll need to create a table in the database, add a drop-down in the GUI, connect the two with a few changes to the service calls and service back-end. I picture myself adding a table to the database. That should take about a day, including testing and deployment. And so on for the other tasks.

Maybe it works out like this:

  Create Table     = 1 day
  Service back-end = 2 days
  New drop-down    = 2 days
+ Service call     = 1 day
-------------------------
  New feature      = 6 days

It almost never happens that way, does it? The estimate above is the happy path of feature development. Each component estimate is probably reasonable on its own. But if there's a 70% chance that each of the four tasks works as expected, then the chance of the whole feature being completed on time is (0.7^4) = 24%. Those aren't very good odds.

It's worse than that. Take the first task: create table. Maybe there's a 70% chance of no surprises when we get to the details of schema design. And a 70% chance the tests work, nothing bites us. And a 70% chance of no problems in deployment. Then there's only a 34% chance that Create Table will take a day. Break each of the others into three 70% pieces, and our chance of completing the feature on time is 1%. Yikes! No wonder we never get this right!

We can picture the happy path of development. It's much harder to incorporate failure paths - how can we? We can't expect the deployment to fail because some library upgrade was incompatible with the version of Ruby in production (or whatever). The chance of each failure path is very low, so our brains approximate it to zero. For one likely happy path, there are hundreds of low-probability failure paths. All those different failures add up -- and then multiply -- until our best predictions are useless. The most likely single scenario is still the happy path and 6 days, but millions of different possible scenarios each take longer.

It's kinda like distributed computing. 99% reliability doesn't cut it when we need twenty service calls to work for the web page to load - our page will fail about one attempt in five. The more steps in our task, the more technologies involved, the worse our best estimates get.
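
To see how fast that compounds, here's the arithmetic as a tiny Clojure sketch (nothing specific to this feature, just exponents):

(defn chance-all-go-well
  "Probability that every one of n independent steps goes as planned, each with probability p."
  [n p]
  (Math/pow p n))

(chance-all-go-well 4 0.7)   ;=> 0.2401   four 70% tasks
(chance-all-go-well 12 0.7)  ;=> ~0.014   twelve 70% sub-steps: about 1%
(chance-all-go-well 20 0.99) ;=> ~0.82    twenty 99%-reliable calls: fails about 1 in 5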

Now I don't feel bad for being wrong all the time.

What can we do about this?

1. Smooth out incidental complexity: some tasks crop up in every feature, so making them very likely to succeed helps every estimate. Continuous integration and continuous deployment spot problems early, so we can deal with them outside of any feature task. Move these ubiquitous subtasks closer to 99%.

2. Flush out essential complexity: the serious delays are usually here. When we write the schema, we notice tricky relationships with other tables. Or the data doesn't fit well in standard datatypes, or it is going to grow exponentially. The drop-down turns out to require multiple selection, but only sometimes. Sensitive data needs to be encrypted and stored in the token service -- any number of bats could fly out of this feature when we dig into it. To cope: look for these problems early. Make an initial estimate very broad, work on finding out which surprises lurk in this feature, then make a more accurate estimate.

Say, for instance, we once hit a feature a lot like this one that took 4 weeks, thanks to hidden essential complexity. Then my initial estimate is 1-4 weeks. ("What? That's too vague!" says the business.) The range establishes uncertainty. To reduce it, spend the first day designing the schema and getting the details of the user interface, and then re-estimate. Maybe the drop-down takes some detail work, but the rest looks okay: the new estimate is 8-12 days, allowing for we-don't-know-which minor snafus.

Our brains don't cope well with low-probability events. The scenario we can predict is the happy path, so that's what we estimate. Reality is almost never so predictable. Next time you make an estimate, try to think about the possible error states in the development path. When your head starts to hurt, share the pain by giving a nice, broad range.

Sunday, February 15, 2015

Microservices, Microbusinesses

To avoid duplication of effort, we can build software uniformly. Everything in one monolithic system or (if that gets unwieldy) in services that follow the same conventions, in the same language, developed with uniform processes, released at the same time, connecting to the same database.

That may avoid duplication, but it doesn't make great software. Excellence in architecture these days involves specialization -- not by task, but by purpose. Microservices divide a software system into fully independent pieces. Instead of limiting which pieces talk to which, they specify clear APIs and let both the interactions and the innards vary as needed.

A good agile team is like a microservice: centered around a common purpose. In each retro, the team looks for ways to optimize their workings, ways particular to that team's members, context, objectives.

When a service has a single purpose[1], it can focus on the problems important to that purpose. Authentication? Availability, consistency, and security matter most. Search? Speed is crucial; repeatability is not. Each microservice can use the database suited to its priorities, and change it out when growth exceeds capacity. The business logic at the core of each service is optimized for efficient and clear execution.

Independence has a cost. Each service is a citizen in the ecosystem. That means accepting requests that come in, with backwards compatibility. It means sending requests and output in the format other services require, not overloading services it depends on, and handling failure of any downstream service. Basically, everybody is a responsible adult.

That's a lot of overhead and glue code. Every service has to do translation from input to its internal format, and then to whatever output format someone else requires. Error handling, caching or throttling, failover and load balancing and monitoring, contract testing, maintaining multiple interface versions, database interaction details. Most of the code is glue, layers of glue protecting a small core of business logic. These strong boundaries allow healthy relationships with other services, including new interactions that weren't designed into the architecture from the beginning. For all this work, we get freedom on the inside. We get the opportunity to exceed expectations, rather than straining standardized languages and tools to meet requirements.
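
One sliver of that glue might look something like this (a Clojure sketch with invented field names, not any particular service's format):

;; Translate a partner's order format into the shape this service uses internally,
;; then into the shape a downstream service expects.
(defn external->internal [ext-order]
  {:order-id   (:orderId ext-order)
   :customer   (get-in ext-order [:customer :id])
   :line-items (mapv :sku (:items ext-order))})

(defn internal->downstream [order]
  {:id   (:order-id order)
   :skus (:line-items order)})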

Do the teams in your company have the opportunity to optimize in the same way?

I've worked multiple places where management decreed that all teams would track work in JIRA, with umpteen required fields, because they like how progress tracking rolls up. They can see high-level numbers or drill down into each team's work. This is great for legibility.[3] All the work fits into nice little boxes that add together. However, what suits the organizational hierarchy might not suit the work of an individual team.

Managers love to have a standard programming language, standard testing tools with reports that roll up, standard practices. This gives them the feeling that they can move people from team to team with less impact to performance. If that feeling is accurate, it's only because the level of performance is constrained everywhere.

Like software components, teams have their own problems to solve. Communication between individuals matters most, so optimize for that[2]. Given the freedom to vary practices and tools, a healthy agile team gets better and better. The outcome of a given day's work is not only a task completed: it includes ideas about how to do this better next time.

Tom Marsh says, "Make everyone in your organisation believe that they are working in a business unit about 10 people big." A team can learn from decisions and make better ones and spiral upward into exceptional performance (plus innovation), when internal consensus is enough to implement a change. Like a microbusiness.

Still, a team exists as a citizen inside a larger organization. There are interfaces to fulfill. Management really does need to know about progress. Outward collaboration is essential. We can do this the same way our code does: with glue. Glue made of people. One team member, taking the responsibility of translating cards on the wall into JIRA, can free the team to optimize communication while filling management's strongest needs.

Management defines an API. Encapsulate the inner workings of the team, and expose an interface that makes sense outside. By all means, provide a reference implementation: "Other teams use Target Process to track work." Have default recommendations: "We use Clojure here unless there's some better solution. SQL Server is our first choice database for these reasons." Give teams a strong process and technology stack to start from, and let them innovate from there.

On a healthy team, people are accomplishing something together, and that's motivating. When we feel agency to share and try out ideas, when the external organization is only encouraging, then a team can gel, excel and innovate. This cohesive team culture (plus pairing) brings new members up to speed faster than any familiarity with tooling.

As in microservices, it takes work to fulfill external obligations and allow internal optimization. This duplication is not waste. Sometimes a little overhead unleashes all the potential.


[1] The "micro" in microservices is a misnomer: the only size restriction on a microservice is Conway's Law: each is maintained by a single team, 10 or fewer people. A team may maintain one or more system components, and one team takes full responsibility for the functioning of the piece of software.

[2] Teams work best when each member connects with each other member (NYTimes)

[3] Seeing Like a State points out how legibility benefits the people in charge, usually at the detriment of the people on the ground. Sacrificing local utility for upward legibility is ... well, it's efficiency over innovation.

Tuesday, February 10, 2015

Fun with Optional Typing: cheap mocking


For unit tests, it's handy to mock out side-effecting functions so they don't slow down tests.[1] Clojure has an easy way to do this: use with-redefs to override function definitions, and then any code within the with-redefs block uses those definitions instead.

To verify the input of the side-effecting function, I can override it with something that throws an exception if the input is wrong.[2] A quick way to do that is to check the input against a schema.

That turns out to be kinda pretty. For instance, if I need to override this function fetch-orders, I can enforce that it receives exactly the starting-date I expect, and a second argument that is not specified precisely, but still meets a certain condition.

(with-redefs [fetch-orders (s/fn [s :- (s/eq starting-date)
                                  e :- AtLeastAnHourAgo]
                            [order])]
... )

Here, the s/fn macro creates a function that (when validation is activated[3]) checks its input against the schemas specified after the bird-face operator. The "equals" schema-creating function is built-in; the other I created myself with a descriptive name. The overriding function is declarative, no conditionals or explicit throwing or saving mutable state for later.

If I have a bug that switches the order of the inputs, this test fails. The exception that comes out isn't pretty.
expected: (= expected-result (apply function-under-test input))
  actual: clojure.lang.ExceptionInfo: Input to fn3181 does not match schema: [(named (not (= #<DateTime 2014-12-31T23:55:00.000Z> a-org.joda.time.DateTime)) s) nil]
Schema isn't there yet on pretty errors. But hey, my test reads cleanly, it was simple to write, and I didn't bring in a mocking framework.

See the full code (in the literate-test sort of style I'm experimenting with) on github.




[1] for the record, I much prefer writing code that's a pipeline, so that I only have to unit-test data-in, data-out functions. Then side-effecting functions are only tested in integration tests, not mocked at all. But this was someone else's code I was adding tests around.

[2] Another way to check the input is to have the override stash its arguments in an atom, then check what happened during the assertion portion of the test. Sometimes that is cleaner.
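
That version looks something like this (a sketch reusing the names from the example above):

(deftest fetch-orders-receives-the-dates-i-expect
  (let [calls (atom [])]
    (with-redefs [fetch-orders (fn [start end]
                                 (swap! calls conj [start end]) ; record the arguments
                                 [order])]
      (function-under-test input))
    ;; assertion portion: inspect what the override captured
    (is (= starting-date (ffirst @calls)))))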

[3] Don't forget to (use-fixtures :once schema.test/validate-schemas) 

Saturday, January 31, 2015

Cropping a bunch of pictures to the same dimensions

Ah, command line tools, they're so fast. And so easy to use on a Mac.

Given a bunch of image files with the same dimensions that you want to crop to a fixed portion of the image:

1) Install imagemagick
brew install imagemagick
2) Put all the images in a directory by themselves, and cd to that directory in the terminal

3) Check the size of one of them using an imagemagick command-line utility:

identify IMG_1400.jpg
IMG_1400.jpg JPEG 960x1280 960x1280+0+0 8-bit sRGB 434KB 0.000u 0:00.000

Oh look, that one has a width of 960 and a height of 1280.

4) Crop one of them, look at it, tweak the numbers, repeat until you get the dimensions right:
convert IMG_1400.jpg -crop 750x590+60+320 +repage test.jpg
Convert takes an input file, some processing instructions, and an output file. Here, I'm telling it to crop the image to this geometry (widthxheight+xoffset+yoffset); +repage then resets the virtual canvas so the output size matches what we just cropped.

The geometry works like this: move down by the y offset and to the right by the x offset. From this point, keep the portion below and to the right that is as wide as width and as tall as height.

5) Create an output directory.
mkdir output
6) Figure out how to list all your input files. Mine are all named IMG_xxxx.jpg so I can list them like this:
ls IMG_*.jpg
IMG_1375.jpg IMG_1380.jpg IMG_1385.jpg ...
7) Tell bash to process them all:[1]
for file in `ls IMG*.jpg`
do
echo $file
convert $file -crop 750x590+60+320 +repage output/$file
done
8) Find the results in your output directory, with the same names as the originals.

-----
[1] in one line:
for file in `ls IMG*.jpg`; do echo $file; convert $file -crop 750x590+60+320 +repage output/$file; done

Tuesday, January 20, 2015

Application vs. Reference Data

At my very first programming job, I learned a distinction between two kinds of data: Reference and Application.

Application data is produced by the application, consumed and modified by the software. It is customers, orders, events, inventory. It changes and grows all the time. Typically a new deployment of the app starts with no application data, and it grows over time.

Reference data is closer to configuration. Item definitions, tax rates, categories, drop-down list options. This is read by the application, but changed only by administrative interfaces. It's safe to cache reference data; perhaps it updates daily, or hourly, at the most. Often the application can't even run without reference data, so populating it is part of deployment.
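
A cache for it can be as simple as this sketch, where load-item-definitions is an assumed function that reads the reference data from the database:

(def item-definitions-cache (atom {:loaded-at 0 :data nil}))

(defn item-definitions []
  (let [{:keys [loaded-at data]} @item-definitions-cache
        one-day-ms (* 24 60 60 1000)]
    (if (and data (< (- (System/currentTimeMillis) loaded-at) one-day-ms))
      data                                  ; still fresh: serve from the cache
      (:data (reset! item-definitions-cache ; stale or empty: reload from the database
                     {:loaded-at (System/currentTimeMillis)
                      :data      (load-item-definitions)})))))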


Back at Amdocs we separated these into different database schemas, so that the software had write access to application data and read-only access to reference data. Application data had foreign key relationships to reference data; inventory items referenced item definitions, customers referenced customer categories. Reference data could not refer to application data. This follows the Stable Dependencies principle: frequently-changing data depended on rarely-changing data, never the other way around.

These days I don't go to the same lengths to enforce the distinction. It may all go in the same database, there may be no foreign keys or set schemas, but in my head the classification remains. Which data is essential for application startup? Reference. Which data grows and changes frequently? Application. Thinking about this helps me avoid circular dependencies, and keep a clear separation between administration and operation.

My first job included some practices I shudder at now[1], but others stick with me. Consider the difference between Reference and Application the next time you design a storage scheme.


[1] Version control made of perl on top of cvs, with file locking. Unit tests as custom drivers that we threw away. C headers full of #DEFINE. High-level design documents, low-level design documents, approvals, expensive features no one used. Debugging with println... oh wait, debugging with println is awesome again. 

Saturday, January 17, 2015

Fun with Optional Typing: narrowing errors

After moving from Scala to Clojure, I miss the types. Lately I've been playing with Prismatic Schema, a sort of optional typing mechanism for Clojure. It has some surprising benefits, even over Scala's typing sometimes. I plan some posts about the most interesting of those, but first, a more ordinary use of types: locating errors.

Today I got an error in a test, and struggled to figure it out. It looked like this:[1]

expected: (= [expected-conversion] result)
  actual: (not (= [{:click {:who {:uuid "aeiou"}, :when #<DateTime 2014-12-31T23:00:00.000Z>}, :outcome {:who {:uuid "aeiou"}, :when #<DateTime 2015-01-01T00:00:00.000Z>, :what "bought 3 things"}}] ([{:click {:who {:uuid "aeiou"}, :when #<DateTime 2014-12-31T23:00:00.000Z>}, :outcome {:who {:uuid "aeiou"}, :when #<DateTime 2015-01-01T00:00:00.000Z>, :what "bought 3 things"}}])))

Hideous, right? It's super hard to see what's different between the expected and actual there. (The colors help, but the terminal doesn't give me those.)

It's hard to find the difference because the difference isn't content: it's type. I expected a vector of a map, and got a list of a vector of a map. Joy.

I went back and added a few schemas to my functions, and the error changed to

  actual: clojure.lang.ExceptionInfo: Output of calculate-conversions-since does not match schema: [(not (map? a-clojure.lang.PersistentVector))]

This says my function's output was a vector containing a vector, where the schema expected a vector of maps. (This is one of Schema's more readable error messages.)
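
For reference, an output schema on a function looks something like this; the Conversion shape here is made up, not this project's real code:

(require '[schema.core :as s])

(def Conversion
  {:click s/Any, :outcome s/Any})

;; The :- [Conversion] after the name declares the output schema. If the body returned
;; a vector of vectors instead of a vector of maps, validation would throw the error above.
(s/defn calculate-conversions-since :- [Conversion]
  [events :- [s/Any]]
  (mapv (fn [[click outcome]] {:click click, :outcome outcome}) events))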

Turns out (concat (something that returns a vector)) doesn't do much; I needed to (apply concat to-the-vector).[2]
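
The difference is easy to see at the REPL:

(concat [[1 2] [3 4]])        ;=> ([1 2] [3 4])   concat of one collection: still nested
(apply concat [[1 2] [3 4]])  ;=> (1 2 3 4)       apply unrolls the vector into arguments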

Clojure lets me keep the types in my head for as long as I want. Schema lets me write them down when they start to get out of hand, and uses them to narrow down where an error is. Even after I spotted the extra layer of sequence in my output, it could have been in a few places. Adding schemas pointed me directly to the function that wasn't doing what I expected.

The real point of types is that they clarify my thinking and document it at the same time. They are a skeleton for my program. I like Clojure+Schema because it lets me start with a flexible pile of clay, and add bones as they're needed.

-----
[1] It would be less ugly if humane-test-output were activated, but I'm having technical difficulties with that at the moment.
[2] here's the commit with the schemas and the fix.