REdeploy (for the first time)

The inaugural REdeployConf wrapped up yesterday (as I write this). I’m already feeling withdrawal from intense learning and conversations. I’ll attempt to summarize them in this post.

The RE in REdeploy doesn’t mean “again” (lo, it is the first of its kind). RE stands for Resilience Engineering. It is a newish field, focused on sociotechnical systems that continue to function in shifting, surprising, always-failing-somewhere conditions (aka, reality).

John Allspaw opened the conference with: resilience is in the humans. Your software might be robust, but in the end, it does what it was told. Only humans respond in new ways to new situations. People can be prepared to be unprepared.

John Allspaw is so excited this conference exists

Resilience is the antidote to complexity. Except not a full antidote: the complexity is still there. It just doesn’t kill us. Complexity is not avoidable, because success begets complexity. A successful system has impact, and impact means interdependence, and interdependence means complexity.

What is resilience? Laura Maguire enumerated some definitions. Rebound, robustness, and graceful extensibility are partial definitions that build into the real one: Resilience is sustained adaptive capacity. It’s the ability to find new abilities, to change in response to changing conditions to maintain functioning. Resilient systems are not the same moment to moment, but they keep fulfilling their purpose (even as their purpose morphs).

four definitions of resilience, illustrated

Resilience Engineering is not a computer science discipline. It’s broader than that. Industries like nuclear power and air traffic control have deeper roots in the study of coping with failure. This isn’t your old-school Root Cause Analysis that asked “why did this fail?” This is systems thinking, asking “how does this succeed?” How do systems constantly subject to new failures keep running anyway? (hint: people.)

Avery Regier pointed out that root cause analysis can prevent a specific failure from recurring. But we find new failures all the time. Some new service is going to run out of space. Some new query is going to be slow. Some new customer is going to call a new API a whole lot more than we expected. Prevention is never going to cut it, so don’t spend all your resources there. Grow your powers of recovery, and you mitigate whole classes of failures.

Resilience Engineering recognizes that our systems include software and humans, so half the talks were about code and half about people. Matty Stratton extended trauma therapy to organizations, and Lee Kussmann gave strategies for personal resilience to stress (notes for both). On the code side, Cici Deng spoke about making safer changes at AWS Lambda: like most things in this science, improvement isn’t having the right answers — it’s asking better questions (notes). Aaron Blohowiak talked about speeding recovery and isolating failure domains at Netflix. Then Hannah Foxwell on HumanOps: there is no failover for You. People are more difficult to work with than software, so start there. (notes for both)

Mary Thengvall and J Paul Reed organized this conference to beget conversations, to seed a community in this space. Communities already exist around the SNAFUcatchers and the Lund program. This new one is an open, informal camerata of people who care about resilience in human+computer systems within the software industry.

Mary and Paul lead the conversation

They succeeded! The conference was a conversation: speakers referred back to prior talks. Mary and Paul emceed with commentary before and after every talk, weaving them together, sharing their reactions and enthusiasm. At the end of each day, the speakers turned into a panel for Q&A. The questions drew from and among all the talks.

Liz asked, how can we move an organization toward resilience from the bottom? Matt and Cici went back and forth over “use data” and “data won’t convince some people.” Any solution must be opt-in, and then you need to collect stories. Stories move people. When every system is different, stories are what we have. We can’t do controlled experiments. What we can do is: dig into those stories to find the causes of success. This is what researchers like Laura Maguire do.

In one of the last questions, someone asked, “Where is accountability in all this?” Cici said, we have tons of talk about accountability in our culture already. I agree; every movement is relative to the culture it is moving. Other answers suggested: Accountability is assumed, not assigned. Personal theory: maybe accountability at the individual-human level is too narrow for the larger networks that we require to work with systems of complexity larger than a personbyte. MAYBE teams need to be accountable for working safely and effectively, and people need to be accountable to their teams.

Aaron had a lovely rant during Q&A about the “sufficiently smart engineer.” This is the hypothetical engineer who would not make such mistakes. Who would understand the existing system thoroughly. This person is a myth. Our software is too complex for one person to hold in their head. You can’t hire a sufficiently smart engineer, and don’t feel bad that you aren’t one, because it’s not a thing. Instead, we need to build systems that support our own cognitive work.

Resilience Engineering is a new science. Its research does not take place in a lab, but in the field. “We refuse to simplify.” Laura Maguire closed with a description of next steps in research. In our own jobs, we can do resilience engineering by looking for who and what makes us more safe (learn from success), by keeping the messy details instead of seeking a clean story, and by maximizing for learning in our symmathesy-teams (including software, tools, and people). For instance, when you find a “root cause” of a failure, look for other situations when that trigger occurred and failure didn’t.

RE researchers study DevOps in real situations

Other fun stuff:

We witnessed the first open-source releases from Deere and Co.

Heidi Waterhouse got rate-limited on Twitter from quoting the talks.

Paul Carleton told a story of Stripe’s journey from “We should restart old EC2 instances” to “Oh look, we’re chaos engineers now.” Matt Broberg told a scary story about stopping forward motion, about technical debt and social debt at Sensu, and the perils of IRC. (notes for Matt, Paul, and Laura)

Atomist sponsored — I hope we can sponsor every edition of this conference! We work on tools to help developers integrate the social and technical parts of our systems, so it’s relevant. This was our first lanyard sponsorship and they were beautiful, in my very biased opinion.

Yesterday (as I write this) we recorded a >Code episode (#95) with Heidi Waterhouse, and she and I brought up topics from REdeploy about a dozen times. Me: “This conference is going to keep coming up, over and over, for the rest of my life.”

Thank you, Mary and Paul and Jeremy and everyone.

Systems and context at THAT Conference

It’s all that

THAT Conference is not THOSE conferences. It’s about the developer as more than a single unit: this year, in multiple ways.

I talked about our team as a system — more than a system, a symmathesy. Cory House said that if you want to change your life, change your systems. As humans, our greatest power lies in changing ourselves by changing our environment. It’s more effective than willpower.

Cory and his family on the grassy stage

Many developers brought family with them; THAT Conference includes sessions for kids and partners. It takes place in the Kalahari resort in Wisconsin Dells. My kids spent most of their time in the water parks. Socialization at this conference was different: even though fewer than half of attendees brought family with them, their presence changed the atmosphere. There’s a reminder that we are more than individuals, and the world will go on after we are gone.

my friend Emma, daughter Linda, and me at the outdoor water park

Technical sessions broaden perspective. Joe Morgan put JavaScript coding styles in perspective: yes, they’re evolving and we have more options to craft readable code. But what is “readable” depends on the people and culture of your team. There are no absolutes, and it is not all about me.

Brandon Minnick told us the nitty-gritty of async/await in C#, and how to do things right. I learned that by default, all the code in an async function (except calls with await) runs on the same thread. This is not the case in Node, which messes with the thread-local variables we use for logging. But in C# it’s easy to lose exceptions entirely; the generated code swallows them. This makes me appreciate Node’s UnhandledPromiseRejectionWarning.

Ryan Niemeyer gave us 35 tips and tricks for VSCode. I love this IDE because it is useful right away, and sweetly customizable when you’re ready. Since this session, I’ve got FiraCode set up, added some custom snippets for common imports, enabled GitLens for subtle in-line attributions, and changed several lines in a file simultaneously using multicursor. And now I can “add suggested import” without clicking the little light bulb: it’s cmd-. to bring up the “code actions” menu for arrow keys. Configuring my IDE is a tiny example of setting up my system to direct me toward better work.

Then there was the part where my kids and I goaded each other into the scary water slides. They start vertical. They count down “3, 2, 1, Launch,” then the floor drops out from under you and you fall into the whooshy tube of water. I am proud to have lived through this.

From personalizing your IDE, to knowing your programming language, to agreeing with your team on a shared style, our environment has a big effect on us.

McLuhan’s Law says: We shape our tools, and our tools shape us. This is nowhere more effective than in programming, where our tools are programs and therefore malleable.

But our tools aren’t everything: we also shape our environment in whom we hang around, whom we listen to. Conferences are a great tool for broadening this. THAT Conference is an unusually wholesome general-programming conference, and I’m very happy to have spoken there. My daughters are also ready to go back (but not to do that scary water slide again).

Functional principles come together in GraphQL at React Rally

Sometimes multiple pieces of the industry converge on an idea from
different directions. This is a sign the idea is important.

Yesterday I spoke at React Rally (video coming later) about the
confluence of React and Flux in the front end with functional
programming in the back end. Both embody principles of composition, declarative
style, isolation, and unidirectional flow of data.

In particular, multiple separate solutions focused on:

  •   components declare what data they need
  •   these small queries compose into one large query, one GET request
  •   the backend gathers everything and responds with data in the requested format

This process is followed by Falcor from Netflix (talk by Brian Hunt) and GraphQL from Facebook (talks by Lee Byron and Nick Schrock, videos later). Falcor adds caching on the client, with cache
invalidation defined by the server (smart; since the server owns
the data, it should own the caching policy). GraphQL adds an IDE for
queries, called GraphiQL (sounds like “graphical”), released as
open source for the occasion! The GraphQL server provides introspection
into the types supported by its query language. GraphiQL uses this to let the developer
work with live, dynamically fetched queries. This lets us explore the available
data. It kicks butt.

Here’s an example of GraphQL in action. One React component in a GitHub client might specify that it needs
certain information about each event (syntax is approximate):

  {
    event {
      type,
      datetime,
      actor {
        name
      }
    }
  }

and another component might ask for different information:

  {
    event {
      actor {
        image_url
      }
    }
  }

The parent component assembles these and adds context, including
selection criteria:

  {
    repository(owner: "org", name: "gameotron") {
      event(first: 30) {
        type,
        datetime,
        actor {
          name,
          image_url
        }
      }
    }
  }
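The composition step can be sketched in a few lines of Python. This is only an illustration, not how GraphQL or Falcor implement it: I model each component's needs as a nested dict (my own made-up representation), `merge_selections` unions them, and `render` prints something like the approximate syntax above. The component names are hypothetical.

```python
def merge_selections(*selections):
    """Union nested field selections; a leaf field maps to None."""
    merged = {}
    for selection in selections:
        for field, children in selection.items():
            if children is None:
                # Leaf field: record it unless a nested selection already exists.
                merged.setdefault(field, None)
            else:
                existing = merged.get(field) or {}
                merged[field] = merge_selections(existing, children)
    return merged


def render(selection, indent=0):
    """Render a selection as GraphQL-like text (approximate syntax)."""
    pad = "  " * indent
    lines = []
    for field, children in selection.items():
        if children is None:
            lines.append(f"{pad}{field}")
        else:
            lines.append(f"{pad}{field} {{")
            lines.append(render(children, indent + 1))
            lines.append(f"{pad}}}")
    return "\n".join(lines)


# What two hypothetical components each declare they need.
event_summary = {"event": {"type": None, "datetime": None, "actor": {"name": None}}}
avatar = {"event": {"actor": {"image_url": None}}}

combined = merge_selections(event_summary, avatar)
print(render(combined))
```

Neither component knows about the other; the parent only has to merge what they declared and add its own selection criteria.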

Behind the scenes, the server might make one call to retrieve the repository,
another to retrieve the events, and another to retrieve each actor’s
data. Both GraphQL and Falcor see the query server as an abstraction
layer over existing code. GraphQL can stand in front of a REST
interface, for instance. Each piece of data can be
fetched with a separate call to a separate microservice, executed in
parallel and assembled into the structure the client wants. One GraphQL
server can support many versions of many applications, since the
structure of returned data is controlled by the client.
The GraphQL server assembles all the
results into a response that parallels the structure of the client’s
query:

  {
    "repository" : {
      "events" : [{
        "type" : "PushEvent",
        "datetime" : "2015-08-25Z23:24:15",
        "actor" : {
          "name" : "jessitron",
          "image_url" : "https://some_cute_pic"
        }
      }
      …]
    }
  }
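Here is a rough sketch of the server side of that exchange, assuming nothing about the real GraphQL server. The three `fetch_*` functions are stand-ins I invented for calls to separate back-end services; the point is only that the per-event lookups can run concurrently and the result is assembled in the shape the client asked for.

```python
import asyncio


# Hypothetical back-end calls; in a real system each might hit a separate service.
async def fetch_repository(owner, name):
    return {"id": f"{owner}/{name}"}


async def fetch_events(repo_id, first):
    events = [
        {"type": "PushEvent", "datetime": "2015-08-25Z23:24:15", "actor_id": "jessitron"},
    ]
    return events[:first]


async def fetch_actor(actor_id):
    return {"name": actor_id, "image_url": "https://some_cute_pic"}


async def resolve_event(event, fields):
    """Return only the requested fields, fetching the actor if asked for."""
    result = {f: event[f] for f in fields if f != "actor"}
    if "actor" in fields:
        actor = await fetch_actor(event["actor_id"])
        result["actor"] = {f: actor[f] for f in fields["actor"]}
    return result


async def resolve(owner, name, first, event_fields):
    repo = await fetch_repository(owner, name)
    events = await fetch_events(repo["id"], first)
    # Fan out: one coroutine per event, run concurrently, gathered in order.
    resolved = await asyncio.gather(*(resolve_event(e, event_fields) for e in events))
    return {"repository": {"events": list(resolved)}}


event_fields = {"type": None, "datetime": None, "actor": {"name": None, "image_url": None}}
response = asyncio.run(resolve("org", "gameotron", 30, event_fields))
print(response)
```

The client never sees how many calls happened behind the query server; it only sees a response shaped like its query.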

It’s like this:

The query is built as a composition of the queries from all the components. It goes to the server. The query server spreads out into as many other calls as needed to retrieve exactly the data requested.

The query is composed like a fan-in of all the components’
desires. On the server this fans out to as many back-end calls as
needed. The response is isomorphic to the query. The client then spreads
the response back out to the components. This architecture supports
composition in the client and modularity on the server.

The server takes responses from whatever other services it had to call, assembles that into the data structure specified in the query, and returns that to the client. The client disseminates the data through the component tree.

This happens to minimize network traffic between the client and server.
That’s nice, but what excites me are these lovely declarative queries that
compose, the data flowing from the parent component into all the
children, and the isolation of data requests to one place. The exchange
of data is clear. I also love the query server as an abstraction over
existing services; store the data bits in the way that’s most convenient
for each part. Assembly sold separately.

Seeing similar architecture in Falcor and GraphQL, as well as in
ClojureScript and Om[1] earlier in the year, demonstrates that this is
important in a general case. And it’s totally compatible with
microservices! After React Rally, I’m excited about where front ends are
going.
[1] David Nolen spoke about this process in ClojureScript at Craft Conf
earlier this year.

Post-agile: microservices and heads-up development

Notes from Craft Conference 2015, Budapest.

Craft conference was all about microservices this year.[1] Yet, it was about so much more at the same time — even when it was talking about microservices.

lobby of the venue. Very cool, and always packed

Dan and I went on about microservices in our opening keynote,[2] about how it’s not about size, it’s about each service being a responsible adult and taking care of its own data and dependencies. And being about one bounded context, so that it has fewer conflicting cross-cutting concerns (security, durability, resilience, availability, etc) to deal with at any one time.

But it was Mary Poppendieck, in her Friday morning keynote,[3] who showed us why microservices aren’t going away, not any more than the internet is going away. This is how systems grow: through federation and wide participation. (I wish “federated system” wasn’t taken by some 1990s architecture; I like it better than “microservices.”) Our job is no longer to control everything all the computers do, to make it perfectly predictable.[a]

Instead, we need to adapt to the sociotechnical system around us and our code. No one person can understand all the consequences of their decision, according to Michael Nygard.[4] We can’t SMASH our will upon a complex system, Mary says, but we can poke-poke-poke it; see how it responds; and adjust it to our purposes.

What fun is this?? I went into programming because physics became unsatisfying once I hit quantum mechanics, and I couldn’t know everything all at once anymore. Now I’m fascinated by systems; to work with a system is to be part of something bigger than me, bigger than my own mental model. This is going to be a tough transition for many programmers, though. We spent our training time learning to control computers, and now we are exhorted to give up control, to experiment instead.

And worse: as developers must adapt, so must our businesses. In the closing keynote,[5] Marty Cagan made it very clear that our current model is broken. When most ideas come from executives, implemented according to the roadmap, it doesn’t matter how efficient our agile teams are: we’re wasting our time. Most ideas fail to make money. And the ones that do make money usually take far longer than expected. He ridicules the business case: “How much revenue will it generate? How much will it cost?” We don’t know, and we don’t know! Instead of measuring the impact of an idea after months of development, product teams need to measure in hours or days. And instead of a few ideas from upper management, we need to try out many ideas from the most fruitful source: developers. Because we’re most in the position to see what has just become possible.

Exterior of the venue! (after the tent is down.)

I’d say “developers are a great source of innovation,” except Alf Rehn reports that the word has been drained of meaning.[6] Marty Cagan corroborates that by using “ideas” throughout his keynote instead of “innovation.” So where do these ideas come from? Diversity, says Pieter Hintjens,[7] let people try lots of things. Discovery, says Mike Nygard, let them see what other teams are doing.

Ideas come from having our heads up, not buried only in the code. They come from the first objective of software architecture: understanding the business problem. They come from handing teams an objective, NOT a roadmap. Marty Cagan made that point very clear. Adrian Trenaman concurred,[8] describing how Gilt teams went from a single IT to a team per line of business to a team per initiative. It is about results, measured outcomes.

All these measurements, of results, of expectations, of production service activity, come down to my favorite question – “How do we know what we know?”[b] Property-based (aka generative) testing is experiencing a resurgence (maybe its first major surgence) lately, as black-box testing around service-level components. In my solo talk,[10] I proposed a possible design for lowering the risk around interacting components. Mary had some other ideas in her talk too, which I will check out. Considering properties of a service can help us find the seams that align simplicity with options.
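For anyone who hasn't seen property-based testing, here is a minimal hand-rolled sketch using only Python's standard library. It is not the design from my talk. Real projects would more likely reach for a generative testing library; the `normalize_tags` component and its properties are made up for illustration. The idea: generate many random inputs and check statements that must hold for all of them, treating the component as a black box.

```python
import random
import string


def normalize_tags(tags):
    """Hypothetical component under test: lowercase, de-duplicate, sort."""
    return sorted({t.strip().lower() for t in tags if t.strip()})


def random_tags(rng):
    """Generate a random list of short tag strings, including blanks and mixed case."""
    alphabet = string.ascii_letters + " "
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 5)))
        for _ in range(rng.randint(0, 10))
    ]


def check_properties(runs=500, seed=0):
    """Check each property against many generated inputs; return the run count."""
    rng = random.Random(seed)
    for _ in range(runs):
        tags = random_tags(rng)
        out = normalize_tags(tags)
        # Property 1: normalizing twice changes nothing (idempotence).
        assert normalize_tags(out) == out, tags
        # Property 2: output is sorted and has no duplicates.
        assert out == sorted(set(out)), tags
        # Property 3: every output tag is lowercase with no surrounding spaces.
        assert all(t == t.strip().lower() for t in out), tags
    return runs


print(check_properties(), "random cases passed")
```

When a property fails, the assertion message carries the generated input that broke it, which is the interesting part.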

Mike Nygard remarked that the most successful microservices implementations he’s seen started as a monolith, where refactoring identified those seams. There’s nothing wrong with a monolith when that supports the business objectives; Randy Shoup said that microservices solve scaling problems, not business problems.[9] Mike and Adrian both pointed out that a target architecture is not a revolution, but an evolving direction. Architecture is like a city: as we build microservices in the new, hip part of town, those legacy tenements are still useful. The architecture is done only when the company goes out of business. Instead of working to a central plan, we want to develop situational awareness (“knowing what’s happening in time to do something about it”[3]), and choose to work on what’s most important right now.

It isn’t enough to be good at coding anymore. The new “full-stack” is from network to customer. Marty: if your developers are only coding, you’re not getting half their value. I want to do heads-up development. “Software Craftsmanship is less about internal efficiency, and more about engaging with the world around us,” says Alf Rehn. “Creators need an immediate connection to what they are creating,” quotes Mary Poppendieck.

As fun as it is to pop the next story off the roadmap and sit down and code it, we can have more impact. We can look up, as developers, as organizations. We can look at results, not requirements. We can learn from consequences, as well as conferences.

This transition won’t be easy. It’s the next step after agile. Microservices are a symptom of this kind of focus, the way good retrospectives are associated with constant improvement. Sure, it’s all about microservices – in that microservices are about reducing friction and lowering risk. The faster we can learn, the farther we can get.

I’ll add the links as Gergely posts the videos.

[1] Maciej was starting to get bored
[2] my keynote with Dan, “Complexity is Outside the Code”
[3] Mary Poppendieck’s keynote, “The New New Software Development Game”
[a] Viktor Klang: “Writing software that is completely deterministic is nonsense because no machine is completely deterministic,” much less the network.
[4] Mike Nygard’s talk, “Architecture Without an End State”
[5] Marty’s keynote
[6] Alf Rehn (ah!  what a beautiful speaker! such rhythm!) keynote. Maybe he didn’t allow recording?
[7] Pieter’s talk
[8] Adrian’s talk, “Scaling Micro-services at Gilt”
[b] OK my real favorite question is “What is your favorite color?” but this is a deep second.
[9] Randy’s talk, “From the Monolith to Microservices”
[10] my talk, “Contracts in Clojure: a compromise between types and tests”

Philly ETE, Design in the small

Every conference has little themes for me. Here’s one from Philly ETE this week.

The opening keynote from Tom Igoe, founder of Arduino, displayed examples of unique and original projects. Many of them were built for one person – a device to tell her mother whether she’s taken this pill already today, or to help his father play guitar again after a stroke. Others were created for one place: this staircase becomes a piano. Arduino enables design in the small: design completely customized for a context and an audience. A product that meets these local needs, that works; no other requirements.

Diana Larsen does something similar for agile retrospectives. She pointed out that repeating the same retrospective format shows diminishing returns. Before a 90-minute retro, Diana spends 1-3 hours preparing: gathering data, choosing a theme and an activity. She has high-scale models of how people learn, yet each learning opportunity is custom-designed for the context and people. This has great returns: when a team gains the skill of thoughtful retrospectives, all team meetings are transformed.

Individualization appeared in smaller ways on the second day: Erik mentioned the compiler flags available in the Typelevel Scala compiler, allowing people to choose their language features. Aaron Bedra customized security features to the attack at hand: a good defense is intentional, as thoughtfully built as product features. Every one different.

Finally, Rebecca Wirfs-Brock debunked the agile aversion to architecture. Architecture tasks can be scaled and timeboxed based on the size and cruciality of the project. Architecture deliverables meet current needs, work for the team, and nothing else. It’s almost like designing each design process to suit the context.

This is the magic that agile development affords us: suiting each process, each technology, each step to the local context. That helps us do our best work. The tricky bit is keeping that context in mind at all levels. From my talk: the best developers are reasoning at all scales, from the code details of this function to the program to the software system to the business of why are we doing this at all. This helps us do the right work. Maintaining the connection from what we’re doing to why we’re doing it, we can make better decisions. And as Tom said in the closing sentence of the opening keynote, we can remember: “The things we make are less important than the relationships they enable.”

This was one of the themes I took from PhillyETE. Other takeaways were tweeted. There were many great talks, and InfoQ filmed them, so I’ll link them in this post later. This conference is thoughtfully put together. Recommend.

GOTO Amsterdam: Respect the past, renew the present

GOTO Amsterdam started with a retrospective on Java, and ended with the admonition that even Waterfall was an advancement in its time. The conference encouraged building on the past, renewing and adding for growth.

As our bodies renew themselves, replacing cells so we’re never quite the same organism; so our software wants renewal and gradual improvement, small frequent changes, all the way out to deployment. Chad Fowler reports that the average uptime of a node at Wunderkind is in hours, and he’d like to make it shorter. Code modules are small, data is small, and servers are small. Components fail and the whole continues. Don’t optimize the time between failures: optimize the time to recovery. And monitor! Testing helps us develop with confidence, but monitoring lets us deploy with confidence.
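A bit of arithmetic shows why recovery time deserves the attention. This uses the standard steady-state availability formula, uptime fraction = MTBF / (MTBF + MTTR); the numbers are invented for illustration and are not from Chad's talk.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the component is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


baseline = availability(mtbf_hours=100, mttr_hours=1.0)
fewer_failures = availability(mtbf_hours=200, mttr_hours=1.0)   # fail half as often
faster_recovery = availability(mtbf_hours=100, mttr_hours=0.5)  # recover twice as fast

for label, value in [
    ("baseline", baseline),
    ("fewer failures", fewer_failures),
    ("faster recovery", faster_recovery),
]:
    print(f"{label:16s} {value:.4%}")
```

Halving the recovery time buys the same availability as doubling the time between failures. With small, disposable nodes, recovery time is usually the number we have more control over.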

At Etsy as well, changes are small and deployments frequent. Bits of features are deployed to production before they’re ever activated. Then they’re activated gradually, a separate process from deployment, and carefully monitored. Etsy has one giant monolithic PHP app, and yet they’ve achieved 50/day deployments with great uptime. Monitoring removes fear: “It’s not about how often you deploy your app. It’s do you feel comfortable deploying from trunk, right now?”
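Separating activation from deployment can be as simple as a percentage check in front of the new code. The sketch below is my own illustration, not Etsy's implementation; the flag name and user ids are hypothetical. Hashing the flag and user together gives each user a stable bucket, so raising the percentage only ever adds users.

```python
import hashlib


def bucket(flag, user_id):
    """Map (flag, user) to a stable bucket in 0..99 using a hash."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id, rollout_percent):
    """The code is already deployed; this decides whether it is active for a user."""
    return bucket(flag, user_id) < rollout_percent


users = [f"user-{n}" for n in range(1000)]
for percent in (0, 10, 50, 100):
    active = sum(is_enabled("new_checkout", u, percent) for u in users)
    print(f"{percent:3d}% rollout -> {active} of {len(users)} users see the feature")
```

Turning the feature off again is a configuration change, not a deployment.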

That doesn’t happen all at once. It builds up through careful tooling and through careful consideration of each production outage, in post-mortems where everyone listens and root causes are investigated to levels impossible in a blame-assigning culture.  Linda Rising said, “The real problem in our organizations is nobody wants to talk about how we might be doing things wrong.” At Etsy, they talk about failure.

Even as we’re deploying small changes and gradually improving our code, there’s another way our code can renew and grow: the platform underneath.  Georges Saab told us part of the magic of Java, the way the JIT compiler team works with the people creating the latest hardware. By the time that hardware is released, the JVM is optimized for the latest improvements. Even beyond the platform, Java developers moved the industry away from build-it-yourself toward finding an open-source solution, building on the coding and testing and design efforts of others. And if we update those libraries, they’re renewing as well. We are not doing this alone.

And now in Java 8, there are even more opportunities for library-level optimization, as Stream processing raises the level of abstraction, letting us declare our intentions with a lambda expression instead of specifying the steps. Tell the language what you want it to do, not how, and it can optimize. Dan North used this technique back when he invented DevOps (I’m mostly kidding): look at the outcome you want, and ask how to get there. The steps you’ve used before are clues, not the plan.

Yet be careful with higher levels of abstraction: Horia Dragomir reminded us this can also hurt performance. This happens when the same code compiles for Android and iPhone. There’s a Japanese concept called bokeh (pronounced like bouquet) of blurring parts of an image to bring others into focus. Abstraction can do that for us, if we’re careful as the photographer.

In the closing keynote, Linda Rising reminded us, to our chagrin: people don’t make decisions based on data. We make decisions based on stories. What stories are we telling ourselves and each other? Do our processes really work? There aren’t empirical studies about our precise situation. The best we can do is to keep trying new tweaks and different methods, and find out what works for us. Like a baby putting everything in their mouth.

We can acquire more data, and choose to use this for better decisions. At Etsy every feature implementation comes with monitoring: How will you know it’s working? How will you know if it breaks? Each feature has a dashboard. And then in the post-mortems, a person can learn “how immensely hard it is to fight biases.” If we discard blame, we can reveal our mistakes, and build on each others’ experiences.

Overcome fear: experience the worst-case scenario. Keep changing our code, and ourselves: “As a person, if you can’t change, you might as well be dead.” It’s OK to be wrong, when you don’t keep being wrong.

As Horia said, “You’re there, you’re on the shoulders of giants. You need to do your own thing now. Add your own twist.”

This post is based on talks by Linda Rising, Chad Fowler, Georges Saab and Paul Sandos, Horia Dragomir, Daniel Schauenberg; conversations with Silvana Wasitova and Kevlin Henney, all at GOTO Amsterdam 2014. Some of these may be online eventually.

Limitations of Abstraction, and the Code+Coder symbiosis

Notes from #qconnewyork

I went into programming because I loved the predictability of it. Unlike physics, programs were deterministic at every scale. That’s not true anymore – and it doesn’t mean programming isn’t fun. This came out in some themes of QCon New York 2014.

In the evening keynote, Peter Wang told us we’ve been sitting pretty on a stable machine architecture for a long time, and that party is over. The days of running only on x86 architecture are done. We can keep setting up our VMs and pretending, or we can pay attention to the myriad devices cropping up faster than people can build strong abstractions on top of them. The Stable Dependencies Principle is crumbling under us.

Really we haven’t had a good, stable architecture to build on since applications moved to the web, as Gilad Bracha reminded us in the opening keynote. JavaScript has limitations, but even more, the different browsers keep programmers walking on eggshells trying not to break any of them. The responsibility of a developer is no longer just their programming language. They need to know where their code is running and all the relevant quirks of the platform. “It isn’t turtles all the way down anymore. We are the bottom turtle, or else the turtle under you eats your lunch.” @pwang

As a developer’s scope deepens, so also is it widening. Dianne Marsh’s keynote and Adrian Cockcroft’s session about how services are implemented at Netflix emphasized developer responsibility through the whole lifecycle of the code. A developer’s job ends when the code is retired from production. Dianne’s mantra of “Know your service” puts the power to find a problem in the same hands that can fix it. Individual developers implement microservices, deploy them gradually to production, and monitor them. Developers understand the business context of their work, and what it means to be successful.

It’d be wonderful to have all the tech and business knowledge in one head. What stops us is: technical indigestion. Toooo much information! The Netflix solution to this is: great tooling. When a developer needs to deploy, it’s their job to know what the possible problems are. It is the tool’s job to know how to talk to AWS, how to find out what the status is of running deployments, how to reroute between old-version and new-version deployments. The tool gives all the pertinent information to the person deploying, and the person makes the decisions. Enhanced cognition, just like Engelbart always wanted (from @pwang’s keynote).
“When you have automation plus people, that’s when you get something useful.” – Jez
“Free the People. Optimize the Tools.” – Dianne Marsh

Those gradual rollouts, they’re one of the new possibilities now that machines aren’t physical resources in data centers. We can deploy with less risk, because rollback becomes simply a routing adjustment. Lowering the impact of failure lets us take more risks, make more changes, and improve faster without impacting availability. To learn and innovate, do not prevent failure! Instead, detect it and stay on your

This changed deployment process is an example of something Adrian Cockcroft emphasizes: question assumptions. What is free that used to be expensive? What can we do with that, that we couldn’t before? One of those is immutable code, where every version of a service stays available until someone makes the decision to take it down. And since you’re on pager duty for all your deployed code, there’s incentive to take it down.
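To make "rollback is a routing adjustment" concrete, here is a minimal sketch in Python. It is not Netflix's tooling: the `VersionRouter` class and its method names are hypothetical, and it assumes every deployed version stays up until someone retires it, so changing traffic weights is the only thing a rollout or rollback has to do.

```python
import random


class VersionRouter:
    """Hypothetical router over immutable, simultaneously deployed versions."""

    def __init__(self, stable):
        self.stable = stable
        self.weights = {stable: 1.0}

    def canary(self, version, fraction):
        """Send `fraction` of traffic to `version`, the rest to stable."""
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be between 0 and 1")
        self.weights = {self.stable: 1.0 - fraction, version: fraction}

    def promote(self, version):
        """The new version has proven itself: it becomes the stable one."""
        self.stable = version
        self.weights = {version: 1.0}

    def rollback(self):
        """Nothing is redeployed; traffic simply returns to stable."""
        self.weights = {self.stable: 1.0}

    def pick(self, rng=random):
        """Choose a version for one request according to the weights."""
        versions = list(self.weights)
        return rng.choices(versions, weights=[self.weights[v] for v in versions])[0]


router = VersionRouter("v1")
router.canary("v2", 0.25)   # a quarter of requests try the new code
router.rollback()           # monitoring looked bad: route everything to v1
```

The person deploying still decides when to call `canary`, `promote`, or `rollback`; the sketch only shows why the decision is cheap to reverse.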

When developers are responsible for the code past writing it, through testing and deploy and production, this completes a feedback loop. Code quality goes up, because the consequences of bugs fall directly on the person who can prevent them. This is a learning opportunity for the developer. It’s also a learning opportunity for the code! Code doesn’t learn and grow on its own, but widen the lines. Group the program in with the programmer into one learning organism, a code+coder symbiote. Then the code in production, as its effects are revealed by monitoring, can teach the programmer how to make it better in the next deployment.

Connection between code and people was the subject of Michael Feathers’ talk. Everyone knows Conway’s Law: architecture mirrors the org chart. Or as he phrases it, communication costs drive structure in software. Why not turn it to our advantage? He proposed structuring the organization around the needs of the code. Balance maintaining an in-depth knowledge base of each application against getting new eyes on it. Boundaries in the code will always follow the communication boundaries of social structure, so divide teams where the code needs to divide, by organization and by room. Eric Evans also suggested using Conway’s Law to maintain boundaries in the code. Both of these talks also emphasized the value of legacy code, and also the need for renewal: as the people turn over, so must the code. Otherwise that code+coder symbiosis breaks down.

Eric Evans emphasized: When you have a legacy app that’s a Big Ball of Mud, and you want to work on it, the key is to establish boundaries. Use social structure to do this, create an Anti-Corruption Layer to intermediate between the legacy code and the new code, and consider using a whole new programming language. This discourages accidental dependencies, and (as a bonus) helps attract good programmers.
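An Anti-Corruption Layer is easier to picture with a small example. The sketch below is mine, not from the talk: `legacy_lookup`, its record format, and the `Customer` model are all invented for illustration. The point is that the legacy field names and encodings stop at the translator, and new code only ever sees its own model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """The new bounded context's own model (hypothetical)."""
    customer_id: int
    full_name: str
    active: bool


def legacy_lookup(cust_no):
    """Stand-in for the Big Ball of Mud; the record format is made up."""
    return {"CUST_NO": cust_no, "NM_FIRST": "ADA ", "NM_LAST": "LOVELACE",
            "STAT_CD": "A"}


class CustomerAntiCorruptionLayer:
    """Translates between the legacy record format and the new model."""

    def __init__(self, lookup=legacy_lookup):
        self._lookup = lookup

    def find(self, customer_id):
        # The legacy side wants zero-padded string keys; new code never sees that.
        record = self._lookup(str(customer_id).zfill(8))
        first = record["NM_FIRST"].strip().title()
        last = record["NM_LAST"].strip().title()
        return Customer(
            customer_id=int(record["CUST_NO"]),
            full_name=f"{first} {last}",
            active=record["STAT_CD"] == "A",
        )


customer = CustomerAntiCorruptionLayer().find(42)
```

If the legacy format changes, only the translator changes. That is the accidental dependency the boundary is there to prevent.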

Complexity is inevitable in software; bounded contexts are part of the constant battle to keep it from eating us. “We can’t eliminate complexity any more than a physicist can eliminate gravity.” (@pwang)

In code and with people, successful relationships are all about establishing boundaries. At QCon it was a given that people are writing applications as groups of services, and probably running them in the cloud. A service forms a bounded context; each service has its internal model, as each person has a mental model of the world. Communications between services also have their own models. Groups of services may have a shared interstitial context, as people in the same culture have established protocols. (analogy mine) No one model covers all of communications in the system. This was the larger theme of Eric Evans’ presentation: no one model, or mandate, or principle applies everywhere. The first question of any architecture direction is “When does this apply?”

As programmers and teams are going off in their own bounded contexts doing their own deployments, Jez Humble emphasized the need to come together (or at least bring the code together) daily. You can have a separate repo for every service, like at Netflix, or one humongoid Perforce repository for everything, like at Google. You can work on feature branches or straight on master. The important part is: everyone commits to trunk at the end of the day. This forces breaking features into small pieces; they may hide behind feature flags or unused APIs, but they’re in trunk. And of course that feeds into the continuous deployment pipeline. Prioritize keeping that trunk deployable over doing new work. And when the app is always deployable, a funny thing happens: marketing and developers start to collaborate. There’s no feature freeze, no negotiating of what’s going to be in the next release. As developers take responsibility for the post-coding lifecycle, they gain insight into the pre-coding part too. More learning can happen!
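Hiding unfinished work behind a feature flag can be as small as the sketch below. Everything in it is an assumption for illustration (the `FLAGS` dict, the flag name, and the checkout rules are invented); real systems usually read flags from configuration rather than a module-level dict.

```python
# Hypothetical flag store: the new code is in trunk and deployed, but switched off.
FLAGS = {"new_checkout": False}


def is_enabled(flag, flags=FLAGS):
    """Unknown flags count as off, so a half-merged feature stays hidden."""
    return flags.get(flag, False)


def old_checkout(cart_cents):
    """Current behavior: item total plus flat shipping."""
    return sum(cart_cents) + 500


def new_checkout(cart_cents):
    """Work in progress: waive shipping on larger orders."""
    total = sum(cart_cents)
    return total if total >= 5000 else total + 500


def checkout(cart_cents, flags=FLAGS):
    if is_enabled("new_checkout", flags):
        return new_checkout(cart_cents)
    return old_checkout(cart_cents)


everyone = checkout([3000, 3000])                            # old path
testers = checkout([3000, 3000], {"new_checkout": True})     # new path
```

Both paths live in trunk and ship together, so trunk stays deployable while the new behavior is still being finished.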

As developers start to follow the code more closely, organizational structure can’t hold to a controlled hierarchy. Handoffs are the enemy of innovation, according to Adrian. The result of many independent services is an architecture diagram that can only be observed from production monitoring, and it looks like the Death Star:

I wonder how long before HR diagrams catch up and look like this too?

Dianne and Jez both used “Highly aligned, loosely coupled” to describe code and organization. Leadership provides direction, and the workers figure out how to reach the target by continually trying things out. Managers enable this experimentation. If the same problem is solved in multiple ways, that’s a win: bring the results together and learn from both. No one solution applies in all contexts.

Overall, QCon New York emphasized: question what you’re used to. Question what’s under you, and question what people say you can’t do. Face up to realities of distributed computing: consistency doesn’t exist, and failure is ever present. We want to keep rolling through failure, not prevent it. We can do this by building tools that support careful decision making. If we each support our code, our code will support our work, and we can all improve.

This post draws from talks by Peter Wang, Dianne Marsh, Adrian Cockcroft, Eric Evans, Michael Feathers, Jez Humble, Ines Sombra, Richard Minerich, and Charles Humble. It also draws from my head.
Most of the talks will be available on InfoQ eventually.