Saturday, January 31, 2015

Cropping a bunch of pictures to the same dimensions

Ah, command line tools, they're so fast. And so easy to use on a Mac.

Given a bunch of image files with the same dimensions that you want to crop to the same fixed region:

1) Install ImageMagick:
brew install imagemagick
2) Put all the images in a directory by themselves, and cd to that directory in the terminal.

3) Check the size of one of them using identify, one of ImageMagick's command-line utilities:

identify IMG_1400.jpg
IMG_1400.jpg JPEG 960x1280 960x1280+0+0 8-bit sRGB 434KB 0.000u 0:00.000

Oh look, that one has a width of 960 and a height of 1280.

4) Crop one of them, look at it, tweak the numbers, and repeat until you get the dimensions right:
convert IMG_1400.jpg -crop 750x590+60+320 +repage test.jpg
The convert command takes an input file, some processing instructions, and an output file. Here, I'm telling it to crop the image to this geometry (widthxheight+xoffset+yoffset), and then, with +repage, reset the image's virtual canvas so the output's size and offset match what we just cropped it to.

The geometry works like this: starting from the top-left corner, move right by the x offset and down by the y offset. From that point, keep the region extending width pixels to the right and height pixels down.

5) Create an output directory.
mkdir output
6) Figure out how to list all your input files. Mine are all named IMG_xxxx.jpg so I can list them like this:
ls IMG_*.jpg
IMG_1375.jpg IMG_1380.jpg IMG_1385.jpg...
7) Tell bash to process them all:[1]
for file in IMG_*.jpg
do
  echo "$file"
  convert "$file" -crop 750x590+60+320 +repage "output/$file"
done
8) Find the results in your output directory, with the same names as the originals.
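Before kicking off the batch, two quick checks can save a wasted run. This is a sketch, assuming a POSIX-ish shell and ImageMagick's `identify -format` option (where `%w` and `%h` print width and height); `same_dimensions` and `crop_fits` are hypothetical helper names, not ImageMagick commands. The first confirms the premise that all the images share one size; the second does the geometry arithmetic from step 4, checking that the crop's right edge (x offset + width) and bottom edge (y offset + height) stay inside the image.

```bash
# Hypothetical helpers for sanity-checking a batch crop (not part of ImageMagick).

# Reads "WIDTHxHEIGHT" lines on stdin; succeeds only if there is at least
# one line and every line is identical.
same_dimensions() {
  [ "$(( $(sort -u | wc -l) ))" -eq 1 ]
}

# crop_fits IMG_W IMG_H CROP_W CROP_H X_OFF Y_OFF
# The crop keeps columns X_OFF .. X_OFF+CROP_W-1 and rows Y_OFF .. Y_OFF+CROP_H-1,
# so it fits when the right and bottom edges stay within the image.
crop_fits() {
  local img_w=$1 img_h=$2 crop_w=$3 crop_h=$4 x=$5 y=$6
  [ $(( x + crop_w )) -le "$img_w" ] && [ $(( y + crop_h )) -le "$img_h" ]
}

# Example usage (identify requires ImageMagick):
#   identify -format '%wx%h\n' IMG_*.jpg | same_dimensions && echo "all the same size"
#   crop_fits 960 1280 750 590 60 320 && echo "750x590+60+320 fits"
```

For the example image, the right edge lands at 60 + 750 = 810 (within 960) and the bottom edge at 320 + 590 = 910 (within 1280).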

-----
[1] In one line:
for file in IMG_*.jpg; do echo "$file"; convert "$file" -crop 750x590+60+320 +repage "output/$file"; done

Tuesday, January 20, 2015

Application vs. Reference Data

At my very first programming job, I learned a distinction between two kinds of data: Reference and Application.

Application data is produced, consumed, and modified by the software. It is customers, orders, events, inventory. It changes and grows all the time. Typically a new deployment of the app starts with no application data, and it accumulates over time.

Reference data is closer to configuration. Item definitions, tax rates, categories, drop-down list options. This is read by the application, but changed only by administrative interfaces. It's safe to cache reference data; perhaps it updates daily, or hourly, at the most. Often the application can't even run without reference data, so populating it is part of deployment.


Back at Amdocs we separated these into different database schemas, so that the software had write access to application data and read-only access to reference data. Application data had foreign key relationships to reference data; inventory items referenced item definitions, customers referenced customer categories. Reference data could not refer to application data. This follows the Stable Dependencies principle: frequently-changing data depended on rarely-changing data, never the other way around.

These days I don't go to the same lengths to enforce the distinction. It may all go in the same database, there may be no foreign keys or set schemas, but in my head the classification remains. Which data is essential for application startup? Reference. Which data grows and changes frequently? Application. Thinking about this helps me avoid circular dependencies, and keep a clear separation between administration and operation.

My first job included some practices I shudder at now[1], but others stick with me. Consider the difference between Reference and Application the next time you design a storage scheme.


[1] Version control made of perl on top of cvs, with file locking. Unit tests as custom drivers that we threw away. C headers full of #define. High-level design documents, low-level design documents, approvals, expensive features no one used. Debugging with println... oh wait, debugging with println is awesome again. 

Saturday, January 17, 2015

Fun with Optional Typing: narrowing errors

After moving from Scala to Clojure, I miss the types. Lately I've been playing with Prismatic Schema, a sort of optional typing mechanism for Clojure. It has some surprising benefits, sometimes even over Scala's typing. I plan some posts about the more interesting ones, but first, a more ordinary use of types: locating errors.

Today I got an error in a test, and struggled to figure it out. It looked like this:[1]

expected: (= [expected-conversion] result)
  actual: (not (= [{:click {:who {:uuid "aeiou"}, :when #<DateTime 2014-12-31T23:00:00.000Z>}, :outcome {:who {:uuid "aeiou"}, :when #<DateTime 2015-01-01T00:00:00.000Z>, :what "bought 3 things"}}] ([{:click {:who {:uuid "aeiou"}, :when #<DateTime 2014-12-31T23:00:00.000Z>}, :outcome {:who {:uuid "aeiou"}, :when #<DateTime 2015-01-01T00:00:00.000Z>, :what "bought 3 things"}}])))

Hideous, right? It's super hard to see what's different between the expected and actual there. (The colors help, but the terminal doesn't give me those.)

It's hard to find the difference because the difference isn't content: it's type. I expected a vector of a map, and got a list of a vector of a map. Joy.

I went back and added a few schemas to my functions, and the error changed to

  actual: clojure.lang.ExceptionInfo: Output of calculate-conversions-since does not match schema: [(not (map? a-clojure.lang.PersistentVector))]

This says my function's output was a sequence containing a vector where a map should be. (This is one of Schema's more readable error messages.)

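Turns out (concat (something that returns a vector)) doesn't do much: with a single argument, concat just returns the same elements as a lazy seq, so my vector of vectors stayed nested. I needed (apply concat to-the-vector) to join the inner vectors.[2]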

Clojure lets me keep the types in my head for as long as I want. Schema lets me write them down when they start to get out of hand, and uses them to narrow down where an error is. Even after I spotted the extra layer of sequence in my output, it could have been in a few places. Adding schemas pointed me directly to the function that wasn't doing what I expected.

The real point of types is that they clarify my thinking and document it at the same time. They are a skeleton for my program. I like Clojure+Schema because it lets me start with a flexible pile of clay, and add bones as they're needed.

-----
[1] It would be less ugly if humane-test-output were activated, but I'm having technical difficulties with that at the moment.
[2] here's the commit with the schemas and the fix.

Monday, January 12, 2015

Readable, or reason-aboutable?

My coworker Tom finds Ruby unreadable.
What?? I'm thinking. Ruby can be quite expressive, even beautiful.
But Tom can't be sure what Ruby is going to do. Some imported code could be modifying methods on built-in classes. You can never be sure exactly what will happen when this Ruby code executes.

He's right about that. "Readable" isn't the word I'd use though: Ruby isn't "reason-aboutable." You can't be completely sure what it's going to do without running it. (No wonder Rubyists are such good testers.)

Tom agreed that Ruby could be good at expressing the intent of the programmer. This is a different goal from knowing exactly how it will execute.

Stricter languages are easier to reason about. In Java I can read the specification and make inferences about what will happen when I use the built-in libraries. In Java, I hate the idea of bytecode modification because it interferes with that reasoning.

With imperative code in Java or Python, where what you see is what you get, you can try to reason about the code by playing compiler: step through what the computer is supposed to do at each instruction. This is easier when data is immutable, because then you can trace back to the one place it could possibly be set.

Beyond immutability, the best languages and libraries offer more shortcuts to reasoning. Shortcuts let you be sure about some things without playing compiler through every possible scenario. Strong typing helps with this: I can be sure about the structure of what the function returns, because the compiler enforces it for me.

Shortcuts are like this: I can tell 810 is divisible by 3 because its digits add up to 9, which is divisible by 3. I don't have to do the division. This is not cheating, because it is not a coincidence; someone has proven this property mathematically.
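For the curious, here is a short worked derivation of why the digit-sum shortcut holds for 810 (a standard result, written out as an illustration): each power of 10 is one more than a multiple of 9, and so of 3.

```latex
\begin{aligned}
810 &= 8 \cdot 100 + 1 \cdot 10 + 0 \\
    &= 8 \cdot (99 + 1) + 1 \cdot (9 + 1) + 0 \\
    &= \underbrace{(8 \cdot 99 + 1 \cdot 9)}_{\text{a multiple of } 3} + (8 + 1 + 0)
\end{aligned}
\qquad\Rightarrow\qquad
810 \equiv 8 + 1 + 0 = 9 \equiv 0 \pmod{3}
```

The same argument works for any number of digits, which is exactly why the shortcut can be trusted without doing the division.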

Haskell is the most reason-aboutable language, because you can be sure that the environment won't affect execution of the code, and vice-versa, outside of the IO monad. Mathematical types like monoids and monads help too, because they come with properties that have been proven mathematically. Better than promises in the documentation. More scalable than playing compiler all day.[1]

"Readability" means a lot of things to different people. For Tom, it's predictability: can he be sure what this code will do? For many, it's familiarity: can they tell at a blink what this code will do? For me, it's mostly intent: can I tell what we want this code to do?

Haskellytes find Haskell the most expressive language, because it speaks to them. Most people find it cryptic, with its terse symbols. Ruby is well-regarded for expressiveness, especially in rich DSLs like RSpec.

Is expressiveness (of the intent of the programmer) in conflict with reasoning (about program execution)?


[1] "How do you really feel, Jess?"

We want to keep our programs simple, and avoid unnecessary complexity. The definition of a complex system is: the fastest way to find out what will happen is to run it. This means Ruby is inviting complexity, compared to Haskell. Functional programmers aim for reason-aboutable code, using all the shortcuts (proven properties) to scale up our thinking, to fit more in our head. Ruby programmers trust inferences made from example tests. This is easier on the brain, both to write and read, for most people. It is not objectively simpler.

Friday, January 9, 2015

Spring cleaning of git branches

It's time to clean out some old branches from the team's git repository. In memory of them, I record useful tricks here.

First, Sharon's post talks about finding branches that are ripe for deletion, by detecting branches already merged. This post covers those, plus how to find out more about the others. This post is concerned with removing unused branches from origin, not locally.

Here's a useful hint: start with
git fetch -p
to update your local repository with what's in origin, including noticing which branches have been deleted from origin.
Also, don't forget to
git checkout master
git merge --ff-only
so that you'll be on the master branch, up-to-date with origin (and won't accidentally create a merge commit if you have local changes).

Next, to find branches already merged to master:

git branch -a --merged

This lists branches, including remote branches (the ones on origin), but only ones already merged to the current branch. Note that the argument order is important; the reverse gives a silly error.  Here's a one-liner that lists them:
git branch -a --merged | grep -v -e HEAD -e master | grep origin | cut -d '/' -f 3- 
This says: find branches already merged; exclude any references to master and HEAD; include only ones from origin (hopefully); cut out the remotes/origin/ prefix.

The listed branches are safe to delete. If you're brave, delete them permanently from origin by adding this to the previous command:
 | xargs git push --delete origin
This says, take all those words and put them at the end of this other command, which says "delete these references on the origin repository."
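Before running the destructive version, it can help to preview exactly which commands xargs would build, by putting `echo` in front. The sketch below wraps the filter from the one-liner above in a hypothetical function, `merged_origin_branches`, that reads `git branch -a --merged` output on stdin. It swaps `grep origin` for an anchored `grep '^ *remotes/origin/'`, so a local branch whose name merely contains "origin" is not picked up. Check its output against your own repository before dropping the `echo`.

```bash
# Hypothetical helper: reads `git branch -a --merged` output on stdin and prints
# the names of merged branches on origin, without the "remotes/origin/" prefix.
merged_origin_branches() {
  grep '^ *remotes/origin/' |     # keep only remote-tracking refs from origin
    grep -v -e HEAD -e master |   # skip origin/HEAD and master itself
    cut -d '/' -f 3-              # drop "remotes/origin/", keep names containing slashes
}

# Dry run: print the push commands instead of running them.
#   git branch -a --merged | merged_origin_branches | xargs echo git push --delete origin
# When the list looks right, remove "echo" to actually delete the branches.
```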

OK, those were the easy ones. What about all the branches that haven't been merged? Who created those things anyway, and how old are they?

git log --date=iso --pretty=format:"%an %ad %d" -1 --decorate

is a lovely command that lists the author, date in ISO format (which is good for sorting), and branches and tags of the last commit (on the current branch, by default).

Use it on all the branches on origin:
git branch -a | grep origin | grep -v HEAD | xargs -n 1 git log --date=iso --pretty=format:"%an %ad %d%n" -1 --decorate | grep -v master | sort
List remote branches; keep only the ones from origin; exclude HEAD (we don't care, and that line is formatted oddly); send each one through the handy description; exclude master; sort (by author name, then date, since that's the information at the beginning of the line).

This gives me a bunch of lines that look like:

Shashy 2014-08-15 11:07:37 -0400  (origin/faster-upsert)
Shashy 2014-10-23 22:11:40 -0400  (origin/fix_planners)
Shashy 2014-11-30 06:50:57 -0500  (origin/remote-upsert)
Tanya 2014-10-24 11:35:02 -0500  (origin/tanya_jess/newrelic)
Tanya 2014-11-13 10:04:48 -0600  (origin/kafka)
Yves Dorfsman 2014-04-24 14:43:04 -0600  (origin/data_service)
clinton 2014-07-31 16:26:37 -0600  (origin/warrifying)
clinton 2014-09-15 13:29:14 -0600  (origin/tomcat-treats)

Now I am equipped to email those people and ask them to please delete their stray branches, or give me permission to delete them.
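If you'd rather see the stalest branches first, `git for-each-ref` can sort by commit date itself, without running `git log` once per branch. This is a swapped-in alternative, not the pipeline used above; `origin_branches_by_age` is a hypothetical name, and the `%(refname:lstrip=2)` field assumes a git version that supports the `lstrip` ref-name modifier. Each line shows the committer date of the branch tip (ISO format, so it sorts and reads well), the tip's author, and the branch name.

```bash
# Hypothetical helper: list branches on origin, oldest tip commit first.
# Run inside the repository, ideally right after `git fetch -p`.
origin_branches_by_age() {
  git for-each-ref --sort=committerdate \
    --format='%(committerdate:iso) %(authorname) %(refname:lstrip=2)' \
    refs/remotes/origin |
    grep -v -e ' origin/HEAD$' -e ' origin/master$'   # skip the symbolic HEAD and master
}
```

Putting the date first means the oldest branches, the likeliest candidates for deletion, come out at the top.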


Thursday, January 1, 2015

Systems Thinking about WIT

Systems (and by "systems" I mean "everything") can be modeled as stocks and flows.[1] Stocks are quantities that accumulate (or disappear) over time, like money in a bank account or code in your repository. They may not be physical, but they are observable in some sense. Flows are how the stock is filled, and how it empties. Flows have valves representing the variable rates of ingress and egress.
[Diagram: salary flows into the stock of money; spending flows out.]

Stocks don't change directly; they grow or shrink over time according to the rates of flow. If your salary grows while spending stays constant, money accumulates in your account. Flows are affected by changes to the rules of the system, and sometimes by the levels in each stock. The more money in your bank account, the faster you spend it, perhaps. When a flow affects a stock and that stock affects the flow, a feedback loop happens, and then things get interesting. If the more money in your bank account, the more you invest, and then the more money in your bank account... $$$$$!

[Diagram: salary and investment income flow into the account; spending flows out. An arrow indicates that the amount in the account affects the investment flow rate.]

This is a reinforcing feedback loop, marked with an R. It leads to accelerating growth, or accelerating decline.
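To see why a reinforcing loop accelerates, here is a toy simulation of the bank-account example in shell integer arithmetic. The function name and all the numbers are made up for illustration. Each step adds salary, subtracts spending, and adds investment income proportional to the current balance: that last term is the arrow from the stock back to its own inflow.

```bash
# Toy stock-and-flow model (hypothetical numbers, integer math only).
# simulate_account STEPS BALANCE SALARY SPENDING RATE_PCT
# Prints the balance (the stock) after each step.
simulate_account() {
  local steps=$1 balance=$2 salary=$3 spending=$4 rate_pct=$5 i=1 investment
  while [ "$i" -le "$steps" ]; do
    # Feedback: the investment inflow depends on the current level of the stock.
    investment=$(( balance * rate_pct / 100 ))
    balance=$(( balance + salary - spending + investment ))
    echo "$balance"
    i=$(( i + 1 ))
  done
}

# simulate_account 3 1000 100 100 10
# prints 1100, 1210, 1331 (one per line)
```

Salary and spending cancel out here, yet the balance still grows by 100, then 110, then 121: each step's increase is bigger than the last, because the stock feeds its own inflow.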

Enough about money, let's talk about Women[2] in IT. We can model this. The stock is all the women programmers working today. They come in through a pipeline, and leave by retiring.[3]
[Diagram: the pipeline inputs women into IT; they flow out through retirement.]

People also leave programming because there is some other, compelling thing for them to do. This includes raising a family, writing a novel, etc. I'll call this the "Something else" flow. It should be about the same rate as in other lines of work.
Then there are women who drain out of IT because they feel drained. They're tired of being the only woman on the team, tired of harassment or exclusion, tired of being passed over for promotion. Threatened until they flee their homes, in some cases. Exhausted from years of microaggressions, in more. I'll call it the "F___ This" outflow.
[Diagram: the pipeline flows into the stock of Women in IT; the three outflows are Something Else, Retirement, and F This.]


There are feedback loops in this system. The more women in IT, the more young women see themselves as possible programmers, the more enter the pipeline. Therefore, the flow of the input pipeline depends on the level of the stock. This is a reinforcing feedback loop: the more already in, the more come in. The fewer already in, the fewer enter.
[Diagram: same picture as before, but an arrow indicates that the quantity of women in IT affects the rate of flow through the pipeline. Marked with R.]


The quantity of women programmers also affects the experience for each of us. At any given conference or company, a few assholes are mean, and those microaggressions or harassments hit the women present. The fewer women, the more bullshit each of us gets. Also the less support we have when we complain. This is also a reinforcing feedback loop: the lower the number of women in IT, the more will say "F___ This" and leave.[4] If there are ever as many women programmers as men, that outflow might be the same as in other fields. Until then, the outflow gets bigger as the stock gets lower.
[Diagram: same picture, with an additional arrow indicating that the quantity of women in IT affects the rate of the F This outflow. Marked with R.]


One of my friends pointed at the pipeline and said, "It's not hopeless, because women are a renewable resource." He hopes we can overwhelm the outflows with a constant influx of entry-level developers. In Thinking in Systems[1], Meadows remarks that people have a tendency to focus on inflows when they want to change a system. Sure enough, there's a lot of focus these days, a lot of money and effort, toward growing the input pipeline. Teach young girls programming. Add new input pipelines: women-only bootcamps, or programming for moms. These are useful efforts. Yet as Meadows points out, the outflows have just as much effect on the stock.

Can we plug the holes in this bathtub? Everyone in IT affects the environment in IT. We can all help close the "F___ This" outflow. See Kat Hagan's list of ways we can each make this environment less sexist.  Maybe in a generation it'll get better?

This isn't new information. Let's add granularity to the system and see what we learn. Divide the stock into two stocks: junior and senior women developers. The pipeline can only provide junior developers. Experience takes them to senior level, and finally retirement. Junior people are likely to find other careers, while senior people have been here long enough that we're probably staying, so I won't draw that as an outflow for seniors. The "F___ This" outflow definitely exists for both.
[Diagram: the pipeline flows into a stock of Junior Women. Outflows of Something Else and F This lead nowhere, while an outflow called Experience leads to a stock of Senior Women, whose outflows are Retirement and F This.]


Consider now the effects of the two stocks on the flows. The senior developers have the most impact on all of them: on the pipeline as teachers, on "F___ This" as influencers in the organization, and even on the "Something else" outflow as role models! The more women architects and conference speakers and book authors and bloggers a junior dev sees, the more she sees a future for herself in this career. Finally, the stock of senior women impacts the rate at which younger women move into senior roles, through mentorship and by impacting expectations in the organization and community.
[Diagram: same picture, with arrows indicating the effect of the quantity of Senior Women on the input pipeline, the Something Else outflow, and both F This outflows. Arrows also indicate an effect of the quantity of Junior Women on the input pipeline and on their own F This outflow.]


To sustainably change the stocks in a system, work with the feedback loops. In this case, the senior developer stock impacts flows the most. To get faster results that outlast the flows of money into women-only bootcamps, it's senior developers we need the most. And the senior-er the better. Zoom in on this box, and some suggestions emerge.


[Diagram: an Experience flow brings in Senior Women in IT; they flow out through retirement and through F This.]

Inflows 

1. Mentor women developers. Don't make her wait for another woman to do it; she may wait forever, or leave first. Overcome the awkward.
2. Consider minorities carefully for promotion. A woman is rarely the obvious choice for architect or tech lead, so use conscious thought to overcome cognitive bias.

Outflows

1. Encourage senior women to stay past minimum retirement age. Our society doesn't value older women as much as older men. Can we defy that in our community?
2. Don't tolerate people who are mean, especially to women who are visibly senior in the community. See Cate Huston's post: don't be a bystander to cruelty or sexism. And above all, when a woman tells you something happened, believe her.[5] For every one who speaks up, n silently walked away.

Magnify the impact

1. Look for women to speak at your conference. Ask early, pay their travel, do what it takes to help us help the community.
2. Retweet women, repost blogs, amplify. I want every woman in development to know that this is a career she can enjoy forever.

Stocks don't change instantaneously. Flows can. It takes time for culture to improve, and time for that improvement to show up in the numbers. With reinforcing feedback loops like this, waiting won't fix anything - until we change the flows. Then time will be on our side.

Finally, please don't take a handful of counterexamples as proof that the existing system is fine. Yes, we all respect Grace Hopper. Yes, there are a few of us with unusual personalities who aren't affected by the obstacles most encounter. Your bank account is in the black with just a few pennies, but that won't make you rich.


[1] Thinking in Systems, by Donella Meadows.
[2] Honestly, women have it easier than a lot of other minorities, but I can only speak from experience about women. Please steal these ideas if you can use them.
[3] There's a small flow into Women in IT from the stock of Men in IT, which is awesome and welcome and you are fantastic.
[4] Cate Huston talked about "how dismal the numbers were, and how the numbers were bad because the experience was bad, and how the numbers wouldn’t change unless the experience changed" in her post on Corporate Feminism.
[5] Anita Sarkeesian on "the most radical thing you can do to help: actually believe women when they talk about their experiences."