Wednesday, January 29, 2014

Choosing an ExecutorService

TL;DR:

When you need an ExecutorService, the Executors class has several for you. Sometimes it doesn't matter which you choose, and other times it matters a LOT. The above flow chart is approximate. If it's simple parallel computation you're doing, then a fixed pool with as many threads as CPUs works. If those computations start other computations, then you'll want a ForkJoinPool (that's another post). If the purpose of threading is to avoid blocking on I/O, that's different. Maybe you don't want to limit the number of threads that can wait on I/O, and then Executors.newCachedThreadPool is a good choice.

When you would like to limit the number of threads you start AND your tasks might start other tasks in the same thread pool, then you must worry about Pool Induced Deadlock. Then you need to think about what happens when one task needs to start another. It's time to get into the nitty-gritty of a ThreadPoolExecutor. This is what the methods on Executors construct for you under the hood, and you can make your own for finer control. To choose them, you need to understand what the ThreadPoolExecutor does when a new task comes in.

Here's a nice new ThreadPoolExecutor.

Some tasks come in, Runnables passed through execute or submit. At first the ThreadPoolExecutor happily starts a new thread per task. It does this up to the core pool size. Note that even if an existing thread is idle, it'll still start a new thread per task until the core number of threads are running. These threads stay up until the pool shuts down, unless you configure it otherwise.[1]
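
Here's a minimal Java sketch of that startup behaviour. The class and method names (CoreThreadStartup, poolSizeAfterSequentialTasks) are made up for illustration. It runs short tasks strictly one after another, so every earlier thread has nothing to do when the next task arrives, and then asks the pool how many threads it holds.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of any library.
class CoreThreadStartup {
    // Runs `tasks` short tasks one at a time and returns the pool size afterwards.
    static int poolSizeAfterSequentialTasks(int coreSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreSize, coreSize, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        try {
            for (int i = 0; i < tasks; i++) {
                // get() waits for the task to finish before the next one is submitted
                pool.submit(() -> { }).get();
            }
            return pool.getPoolSize();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(poolSizeAfterSequentialTasks(3, 2)); // 2: a second thread started anyway
        System.out.println(poolSizeAfterSequentialTasks(3, 5)); // 3: capped at the core size
    }
}
```

Two tasks give two threads even though the first thread was free for the second task; after that the count stops at the core size.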


If all the core threads are busy, then the ThreadPoolExecutor will begin storing Runnables in its queue. The BlockingQueue passed in at construction can have capacity 0, Integer.MAX_VALUE, or anywhere in between. Note that Runnables are stored here even if the maximum number of threads are not running. This can cause deadlock if the tasks in the core threads are waiting on tasks they submitted.
Only if the queue is full will the ThreadPoolExecutor start more than the core number of threads. It'll start them up to the maximum pool size, only as long as the queue is full and more tasks are coming in.

Finally, if there's no more room in the queue and no more threads allowed, submitting a task to this ThreadPoolExecutor will throw RejectedExecutionException. (You can configure it to drop work, or to tell the calling thread to do its own task instead.[2])
The pictured case is an unusual one, where core size, queue capacity, and max size are all nonzero and finite. A more common case is FixedThreadPool, with a fixed number of threads and effectively infinite queue. Threads will start up and stay up, and tasks will wait their turns. The other common case is CachedThreadPool, with an always-empty queue and effectively infinite threads. Here, the threads will time out when they're not busy.
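
To make the two common cases concrete, here is a sketch of each one built by hand. PoolShapes is a made-up name. The parameters are what I understand Executors.newFixedThreadPool and Executors.newCachedThreadPool to pass along (the cached pool's 60-second idle timeout is the documented one), but treat them as an approximation and check your JDK's source if the details matter.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class showing the two shapes side by side.
class PoolShapes {
    // Fixed pool shape: core == max, effectively infinite queue.
    static ThreadPoolExecutor fixed(int n) {
        return new ThreadPoolExecutor(
                n, n,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Cached pool shape: no core threads, a queue that never holds anything,
    // effectively infinite threads that time out after a minute of idleness.
    static ThreadPoolExecutor cached() {
        return new ThreadPoolExecutor(
                0, Integer.MAX_VALUE,
                60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
    }
}
```

Anything "in between" is the same constructor with a bounded queue and a max size larger than the core size.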

If you need something in between, you can construct it yourself. The fixed thread pool is good to put a limit on I/O or CPU context switches. The cached one avoids pool-induced deadlock. If you're doing interesting recursive calculations, then look into ForkJoinPool.

Aside: All of these ExecutorServices will, by default, start Threads that keep your application alive until they're done. If you don't like that, build a ThreadFactory that sets them up as daemon threads, and pass it in when you create the ExecutorService.
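
One way to build such a ThreadFactory, as a sketch: wrap the default factory and flip each thread to daemon before it starts. DaemonPools and its methods are names I made up for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical helper class, not part of the JDK.
class DaemonPools {
    // Delegates to the default factory, then marks the new thread as a daemon.
    static ThreadFactory daemonFactory() {
        ThreadFactory defaults = Executors.defaultThreadFactory();
        return runnable -> {
            Thread t = defaults.newThread(runnable);
            t.setDaemon(true); // must happen before the thread is started
            return t;
        };
    }

    // A fixed pool whose threads won't keep the JVM alive.
    static ExecutorService newDaemonFixedPool(int threads) {
        return Executors.newFixedThreadPool(threads, daemonFactory());
    }
}
```

The trade-off: daemon threads are abandoned when the JVM exits, so don't use this for work that has to finish.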

Bonus material: Scala REPL code that I used to poke around in the REPL and check these out.
If you're in Scala, a great way to create a thread pool is Akka's ThreadPoolBuilder.

Example: The thermometer-looking drawings represent tasks either in process or queued inside the ExecutorService. Red and green represent tasks in process, and amber ones are sitting in the queue. As tasks are completed, the level drops. As tasks are submitted to the ExecutorService, the level goes up.
If I set up an ExecutorService like this:

new ThreadPoolExecutor(
  5, // core pool size
  8, // max pool size, for at most (8 - 5 = 3) red threads
  10, TimeUnit.SECONDS, // idle red threads live this long
  new ArrayBlockingQueue<Runnable>(7)); // queue with finite maximum size

Assuming no tasks complete yet: The first 5 tasks submitted will start up threads; the next 7 tasks will queue; the next 3 tasks coming in will cause more threads to be started. (Each new thread runs the task that triggered it, so those last 3 submissions jump ahead of the 7 tasks still waiting in the queue.) Any more submission attempts throw RejectedExecutionException.
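
You can check those numbers with a sketch like this one. ThermometerDemo and fill are made-up names. Every task blocks on a latch so that nothing completes while we count, which is the "no tasks complete yet" assumption.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class for the 5 / 8 / 7 configuration.
class ThermometerDemo {
    // Submits blocking tasks and returns {threads started, tasks queued, submissions rejected}.
    static int[] fill(int submissions) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5, 8, 10, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(7));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < submissions; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // hold the thread until the demo is over
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;
                }
            }
            return new int[] { pool.getPoolSize(), pool.getQueue().size(), rejected };
        } finally {
            release.countDown(); // let every task finish
            pool.shutdown();
        }
    }
}
```

fill(5) reports 5 threads and an empty queue, fill(12) reports 5 threads and 7 queued, fill(15) reports 8 threads and 7 queued, and fill(16) adds one rejection.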

------
[1] Core threads will be closed when idle only if you set that up: threadPoolExecutor.allowCoreThreadTimeOut(true)
[2] Do this by supplying a different RejectedExecutionHandler in the constructor.
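
A sketch of both footnotes together, with made-up names (SaturationPolicies, callerRunsPool, overflowRunsOnCaller). The pool has one thread and a zero-capacity queue, so the second task has nowhere to go; the CallerRunsPolicy handler then runs it on the submitting thread instead of throwing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of the JDK.
class SaturationPolicies {
    static ThreadPoolExecutor callerRunsPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 10, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy()); // footnote [2]
        pool.allowCoreThreadTimeOut(true); // footnote [1]; requires a nonzero keep-alive time
        return pool;
    }

    // Returns true if the overflow task ran on the thread that submitted it.
    static boolean overflowRunsOnCaller() {
        ThreadPoolExecutor pool = callerRunsPool();
        CountDownLatch release = new CountDownLatch(1);
        Thread[] ranOn = new Thread[1];
        try {
            // Occupies the only thread until we release it.
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // No free thread and no queue space, so the handler runs this right here.
            pool.execute(() -> ranOn[0] = Thread.currentThread());
            return ranOn[0] == Thread.currentThread();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

Swap in ThreadPoolExecutor.DiscardPolicy if you would rather drop the work silently.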

Fun with Pool-Induced Deadlock


Did you know that a thread can achieve deadlock with itself? It can happen in any thread pool of constrained size. Watch out for... Pool-Induced Deadlock! [1]

This is easiest in a pool of size 1. Run a task in the pool, and from there, run a task in the same pool. While the outer task waits for the inner to complete, the inner waits for that thread to become available.

This is easy to show in scalaz.

import scalaz.concurrent.Task
import java.util.concurrent.Executors

val es1 = Executors.newFixedThreadPool(1)
def sayHiInTheFuture = Task { println("Hi!") }(es1)

val poolInducedDeadlock = Task { sayHiInTheFuture.run } (es1)
poolInducedDeadlock.run

... and your REPL session is dead. You'll have to Ctrl-C it. It never even says "Hi!" to you.
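
The same trap in plain Java, as a sketch with made-up names (SelfDeadlock, nestedOnSingleThread). I've put a timeout on the inner wait so that the example terminates; without it, this hangs just like the scalaz version.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class, not part of any library.
class SelfDeadlock {
    // Returns what the outer task saw while waiting on an inner task in the same 1-thread pool.
    static String nestedOnSingleThread(long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Callable<String> inner = () -> "Hi!";
        Callable<String> outer = () -> {
            Future<String> hi = pool.submit(inner); // queued behind the task that is running now
            try {
                // The only thread is busy running this outer task, so inner cannot start.
                return hi.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                return "timed out";
            }
        };
        try {
            return pool.submit(outer).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

nestedOnSingleThread(200) comes back with "timed out": the inner task only gets its turn after the outer task has given up on it.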

Why would you run a Task within a Task? We should use flatMap instead, to extend the calculation. That is trivial in this example, but the inner Task-run might be buried inside functions or library calls, and not obvious or extractable.

Why would you ever run those Tasks on the same single-thread pool? Perhaps you don't pass an ExecutorService into Task at all; it has a default. The default has as many threads as your computer has CPUs. Run on a machine with only one CPU, and herk. If your laptop has eight CPUs and the build server has one, you'll be surprised when your build hangs.
Or if your build server has four CPUs, then the test that runs four of these Tasks at the same time will freeze up. And it won't only freeze its own work: anything that wants to run on that default pool hangs forever.
This shows how side effects are dangerous: that call to Runtime.getRuntime.availableProcessors in the default executor service seems reasonable, but it's reaching into the outside world. Switching machines then changes the behaviour of your code.

Native Scala futures have two ways of preventing Pool-Induced Deadlock: timeouts and a magic execution context. Here is a similar example using Scala futures, this time submitting as many nested futures as we have processors:

import scala.concurrent._
import duration._
import java.util.concurrent.Executors

val n = Runtime.getRuntime.availableProcessors
val ecn = ExecutionContext.fromExecutorService(
             Executors.newFixedThreadPool(n))
// val ecn = ExecutionContext.global
def sayHiInTheFuture = future { println("Hi!") }(ecn)

val futures = Range(0,n).map { _ =>
  future { 
    try { 
      Await.ready(sayHiInTheFuture, 5 seconds) 
    } catch {
      case t: TimeoutException => println("Poo!")
    }
  } (ecn)
}

You'll see Poo before it ever says Hi, since Await.ready times out. Each outer future submits a new future to its own thread pool, then blocks until it returns. Meanwhile, sayHiInTheFuture is waiting for its turn on a pool whose n threads are all blocked in Await. Good thing there's a timeout! The timeout solves the deadlock, and it's required by the Await call. Scala believes that waiting for futures is evil.

There's another reason you're less likely to see pool-induced deadlock in Scala futures: the suggested ExecutionContext is scala.concurrent.ExecutionContext.Implicits.global, which is magic. Try the above example with the global context instead, and it works fine: n Hi's and no Poo at all.

I can get Pool-Induced Deadlock on the global context, but only if I don't use Await.[2] This can be achieved; we hit it at work using scalaz.

How can we avoid this danger? In Scala, we can always use Await and the global context, if we're not blocking on I/O or anything else slow. In Java or Scala, we can choose cached thread pools without fixed limits. Any library or deep function that blocks on a Task or future should use an ExecutorService of its own. See the next post for suggestions.
Threading is hard, at least on the JVM.
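
Here is a Java sketch of two of those escapes, using made-up names (AvoidingDeadlock, nested, onCachedPool, onSeparatePools). The outer task still blocks on the inner one, but the inner one always has somewhere to run: either the pool can grow, or the inner work has a pool of its own.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of any library.
class AvoidingDeadlock {
    // Outer task runs on outerPool and blocks on an inner task submitted to innerPool.
    static String nested(ExecutorService outerPool, ExecutorService innerPool) {
        Callable<String> inner = () -> "Hi!";
        Callable<String> outer = () -> innerPool.submit(inner).get(5, TimeUnit.SECONDS);
        try {
            return outerPool.submit(outer).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same pool for both, but unbounded: a fresh thread picks up the inner task.
    static String onCachedPool() {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            return nested(pool, pool);
        } finally {
            pool.shutdown();
        }
    }

    // Bounded pools, but the blocking code gets an ExecutorService of its own.
    static String onSeparatePools() {
        ExecutorService outer = Executors.newFixedThreadPool(1);
        ExecutorService inner = Executors.newFixedThreadPool(1);
        try {
            return nested(outer, inner);
        } finally {
            outer.shutdown();
            inner.shutdown();
        }
    }
}
```

Both variants say "Hi!", even though the second one uses single-thread pools.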

------------
[1] described in Programming Concurrency on the JVM, by @venkat_s

[2] Here's a simple example. scalaz uses a CountDownLatch internally on its Futures.

import scala.concurrent._

import java.util.concurrent.CountDownLatch

val n = Runtime.getRuntime.availableProcessors
val ecng = ExecutionContext.global

val futures = Range(0,n).map { _ =>
  future { 
      val c = new CountDownLatch(1)
      future { c.countDown }(ecng)
      c.await  
      println("Something!")
  } (ecng)
}

Saturday, January 25, 2014

Star Trek and Computer Science

I was a geek as a teenager. In choir, I got in trouble for reading my Star Trek book during idle time, instead of whispering with the other girls. I quit choir, and I joined the local Star Trek Club, along with my one geeky friend. Then in high school, my one geeky friend left for private school, and I was alone in my awkwardness. At lunch, I sat at "the retard table" because it was the one table that rejected no one.

I never thought about programming as a career, but I did some. On the little Macintosh at home, on whatever the equivalent of Access was. On my TI-85. On the Apple II-e at school, where I learned to make LOGO beep songs, and then learned about infinite loops. No one else I knew cared.

Those teenage years were the hardest. It's a good thing I had Grandmarti.[1]

Everyone wants to feel welcome. People go where we anticipate a feeling of belonging. Yet when we get a feeling of belonging, really get it, it isn't in a large group. It's one on one. For my young self, this was Grandmarti. She was the person always happy to see me, who wanted to hear what I was thinking about. That person I connect to as a fellow being, with feelings and interests as varied as my own. Real connection happens between individuals. We search for this by heuristic, looking for a place where we fit in better than others, a group of people like us.

I didn't fit in with the other girls at school, never had a group of girlfriends. Then in college, I went to engineering school, where it was easy to become one of the guys. I like being the center of attention in subtle ways, so it never bothered me to be the only woman in a group of up to ten people. By now, that gender ratio feels like the normal state of things.

My situation is an exception. In general, when a person walks into a group, and there's an obvious physical characteristic that's different between the person and most of the other people, it puts the person on edge. Perhaps it's a man joining a table of women, or a black person in a room of a hundred white people, or you walking alone into a dinner party of couples. There's a heightened sensitivity to subtle social signs. A person's subconscious searches for clues: "Is this difference relevant? Do I belong here?"[2] If the women are talking about pregnancy and the couples are holding hands, then the odd person out feels excluded. Without aiming to isolate anyone, the majority have signaled that the difference is relevant. The feeling of belonging is fragile when there's an unmistakable physical difference.

My hometown was such a desert of geek culture that I didn't know Monty Python existed until the summer after 10th grade. That was Missouri Scholars Academy, a 3-week summer program for the smartest kids in the state. It was the first time I found peers who considered intelligence a virtue. Finally I felt accepted, welcomed, even attractive. When it ended, my new world caved, and I cried for days.

That feeling of exclusion and then suddenly belonging, many of us geeks found our tribe with each other. And somehow, many of us also became programmers. It is among programmers I've met most of my closest friends. People who taught me about philosophy, joy, sexuality, science, and learning. Individuals with whom I find the deep sense of belonging I always wished for.

Halfway through my physics degree, my aunt and her friend found an internship for me at Federal Express. It was computer programming, and I figured I could do it. Sure enough. At that job I learned that I like 9-5 work, and that programming is in demand, fun, and easy for me. I added a minor in Computer Science.

It wasn't a career I considered. People never consider every possibility - we'd go mad! Humans always make unconscious decisions about where to direct their attention, based on visual cues and verbal suggestions, subtle and explicit. I was lucky that people pointed me to this option.

So many geeks code that there's a stereotype in our culture: programmers like Star Trek and Star Wars. They stay up late and drink a lot of Coke or Mountain Dew and play video games. Heck, we have conferences themed around the latest Sci Fi movies, bacon, and beer. Those of us rejected for geeky predilections in our school days now rejoice: geek-culture icons, once a marker of exclusion, now hallmark belonging in this new tribe we have created. And a powerful tribe it is! As programmers, we can make good money, express ourselves online, even change the world.

Together, we are building something even bigger than it seems. This profession has a unique combination: very intelligent people, many with backgrounds that leave us with little attachment to existing social bounds, with a mandate for creativity and leeway to set our own work conditions. We've given rise to whole new ways of working together. Collaboration, knowledge sharing, building on each others' work. Academic professions have conferences for "look at what I did," while programmers have conferences for "look what you can do with this idea or technology." Ideas grow when shared. We are changing the world.

It's such a powerful tribe that some of us look around and say, wait. Why is this important group so monocultural? Why are we 95% white[3] in a country that's 72%? Why are we 80% male? Each of these discrepancies is a spiral, exacerbating itself. It doesn't have to continue this way. The gender discrepancy, at least, may be mediated by social signals we can control.

It happens that Star Trek fans are mostly male. This is probably for historical reasons; the first series portrayed women mostly as territory for Kirk to conquer, while the subsequent series are progressive. It happens that video gamers are mostly male for much more logical and horrible reasons. For whatever combination of causes, Star Trek fandom and video games and staying up late drinking soda are perceived as masculine. Signs of these are read by most women as social cues of not-belonging.[4]

Personally, I've always liked Star Trek, and I enjoy League of Legends. (My favorite character is Jinx, because she enjoys havoc and lacks giant boobs.) Many other women appreciate science fiction and video games. So why should we care that some women are turned off by them?

It's a matter of numbers. If we want programming to be more than 20% women, then we need more than 40% of women to feel welcome. We need most women to feel welcome.

The data say that most women feel less welcome when the programmer stereotype is evoked -- even when it's evoked by a woman in computer science. These people go seek a feeling of belonging in another profession.

Our badges of newly-found belonging have become badges of rejection again. Now they're working in reverse, rejecting everyone but geeks.

As a community, we programmers share some ideals. We strive to write better software. While we use various tools, we agree on goals like readable, reliable code. We want to solve useful problems, and solve them well enough that we don't have to solve them each a million times. We explore new methods of collaboration, and reflect together on how to get better. For examples and allusions, it's convenient to draw on our shared background of Star Wars, Civilization, and Super Mario.

Maybe we shouldn't. If the geek-culture references reinforce a stereotype that drives potential programmers away before they even get started, maybe I should put away my Picard slide and stop referencing the Prime Directive like everybody knows what that means.

Hide away our geek identities?[5] What does this mean? Letting them reject us and push us back into isolation again?

When we form this tribe, come together and enjoy each others' intelligence and humor, what is important? We share cultural icons, and we share cultural values. Some of the icons signal exclusion to people who might otherwise contribute. In school people rejected us for being geeks. Now we have the power, we have a choice. We can reject those who once rejected us. Or we can reject rejection.

There's nothing wrong with Star Trek. There's only perception, perception that Star Trek is masculine and that programmers are Trekkers. It's enough to turn a smart, mathematically-talented cheerleader toward another career path. It's a perception that hurts us.

I'm not ashamed of geekiness. I'll happily share my geeky hobbies with anyone who wants to chat. I'll also share my interest in cognitive science, home birth, complexity theory, polyamory. It's at the individual level that we achieve connection, where the enduring sense of belonging comes. Sharing my full self with another person gives me that chance.

More than 10 years into a perfectly ordinary and happy programming career, someone suggested I get into speaking. I'd never considered it - maybe because I'd never seen a woman speak about programming. It turns out I'm a whiz at that, too. Still, I don't consider myself ambitious: I value candidness and personal connection over prestige. How can I, as a geek, be candidly me and not drive anyone away?

My time on stage is limited. There's plenty I could share that might distance people from me, but what matters for the profession is only what coincides with the stereotype. I can skip the dungeon crawl example and code about my sock collection instead. Boring? Maybe. But it doesn't divide the audience into geeks-who-belong and those-less-worthy. This avoids reinforcing the problem, but doesn't solve it.

How can anyone eliminate a stereotype?

Can't: you can't fight against anything, can't aim for a "not." We can only aim for something else. Currently when the majority of our culture thinks about "programmer," they hit upon the stereotypical geek, which also happens to be white and male. The alternative stereotypes we've seen so far are even worse: the neckbeard, the brogrammer.

Can: we can choose a better stereotype. What cues might correspond to programming potential? A likely programmer is smart, analytical, curious. They question traditions and violate irrelevant social norms. They share information: honesty, even bluntness. What visual cues associate with these qualities? Perhaps a preferable programmer stereotype has purple hair, a book, and colorful socks.

It's ridiculous that we should put away our Star Trek slides to make women feel more welcome in our profession. "Ridiculous" doesn't mean "false." In today's culture, geekiness is by white men, and programming is for geeks. If we care about welcoming a fair number of women and not-white people, we don't ask all of them to change. We tweak the system and the environment. We accommodate the perceptions they have, until those perceptions change as we get to know each other as individuals.

I am lucky. Lucky to be an outlier in many ways: geeky enough to be one of the guys, smart enough to make up for gender bias, confident enough to laugh at people who doubt me. Lucky to have received outside suggestions about programming and then speaking. I'm not alone, but I am an exception. Welcoming a handful of women is not welcoming women.

I'm going to be as geeky as I darn well please. I'm also going to be as programmery as I like. I won't make the assumption that the two are interdependent.[6]

This feeling of belonging in a group, it doesn't get stronger with exclusivity. It gets stronger by extending that belonging to others. Grandmarti always welcomed people into her home, people from different classes and races and backgrounds. She learned from every one of them, and our family was richer for it. Programming is awesome, our community is awesome, as we learn together about thinking and collaboration and our own humanity. Let's not keep it a path for geeks: make a path for people.[7]

---------------------------------

[1] my grandmother
[2] it's called stereotype threat
[3] I made this number up. Anyone know the real number?
[4] full article
[5] identifying with any interest is overrated
[6] Separation of Concerns
[7] path is not covariant in this sentence





Sunday, January 12, 2014

Removing files from git

TL;DR - "git rm --cached <file>" means "Yo git, as far as you know, this file is gone"
In a git repo, there are three places for files to be:
1) In your working directory
2) In the staging area
3) In the most recent commit.

Getting rid of a file means moving it out of all 3 places.
1) by deleting it
   rm <file>
2) by removing it from the staging area:
   git rm <filename>
3) by committing that removal.
   git commit -m "die stupid file die"

When you want the file to remain on your filesystem but NOT in the repo, then tell git to ignore it. But that isn't enough! You also have to get it out of the repo.

1) tell git to ignore it: add the file or directory name to .gitignore[1]
2) get it out of the staging area BUT NOT the working directory:
  git rm --cached <filename>
3) If the file has been committed before, commit that removal, along with your .gitignore changes:
  git add .gitignore
  git commit -m "hide stupid file hide"

Git etiquette: package the .gitignore updates along with the removal of the newly-ignored files in one commit.
Warning: if you ever check out a commit that doesn't have that file in .gitignore, whatever's in the commit will overwrite your current one. No warnings. I hope this was some sort of build output that you can regenerate.
Sometimes the file has never been committed, but it was accidentally added to the staging area, and now you want git to leave you alone and ignore that file already!

Delete the file and remove it from the staging area in one easy step:
1) git rm -f <filename>

Or keep it, and tell git to leave it the freak alone:
1) tell git to ignore it: add the file or directory name to .gitignore
2) get it out of the staging area BUT NOT the working directory:
  git rm --cached <filename>

Terminology: the "staging area" is also called the "index" and the "cache," for historical reasons.
If this seems complicated... yeah, I agree. If you know that "git rm --cached <file>" means "Yo git, take this file out of you," that'll get you through most of the frustration.

---------------
[1] For more ways to ignore files, and when to use each: http://jessitron.github.io/git-happens/ignore.html



Friday, January 3, 2014

Testing akka: turn on debug

Testing is easier when you can see what messages are received by your actors. To do this, add logging to your real code, and then change your configuration. This post shows how to hard-code some debugging configuration for testing purposes.

Step 1: put logging into your receive method, by wrapping your existing receive definition in LoggingReceive:

import akka.event.LoggingReceive
def receive = LoggingReceive({ case _ => "your code here" })

Step 2: In actor system creation, put some hard-coded configuration in front of the default configuration.

import akka.actor.ActorSystem
import com.typesafe.config.{Config, ConfigFactory}

val config: Config = ConfigFactory.parseString("""akka {
         loglevel = "DEBUG"
         actor {
           debug {
             receive = on
             lifecycle = off
           }
         }
       }""").withFallback(ConfigFactory.load())
val system = ActorSystem("NameOfSystem", config)

Now when you run you'll see messages like:

[DEBUG] [01/03/2014 16:41:57.227] [NameOfSystem-akka.actor.default-dispatcher-4] [akka://NameOfSystem/user/$c] received handled message Something

Step 3: This is a good time to add names for your actors at creation, so you'll see them in the log.

import akka.actor.Props

val actor = system.actorOf(Props(new MyActor()), "happyActor")

[DEBUG] [01/03/2014 16:41:57.227] [NameOfSystem-akka.actor.default-dispatcher-4] [akka://NameOfSystem/user/happyActor] received handled message Something

Now at least you have some indication of messages flying around during your tests. Do you have any other pointers for debugging actor systems?