Friday, October 2, 2015

ElixirConf keynote: Elixir for the world

Video was recorded by Confreaks. I'll post it here when it's available.

Slides with notes

Big PDF with notes (15M)

Slides only (on speakerdeck)


Links from the talk:

Camille Fournier on distributed systems: video
Caitie McCaffrey on stateful services: video 
Denise Jacobs, creativity: video
Marty Cagan on what's better than agile: video
How to Measure Anything: book
Brené Brown on vulnerability, Grounded Theory: video book
Property testing: video
Property testing: QuickCheck CI 
Elm: Richard Feldman's talk: video
My talk about React and Elm: video
Structure of Scientific Revolutions: about the book

Tuesday, September 29, 2015

This was not OK (regrets)

I have one major regret from StrangeLoop. I want to apologize and find a kinder way to be.

At dinner the last night, I said something mean about Tony Morris. I've never met Tony Morris. I have a feeling of certainty that he was mean to people in the Scala community, and this contributed to a splintering of the community. I am not in favor of communities including or elevating people who are mean to anyone else. See Pieter Hintjens on this.

And I was wrong to speak scornfully of him. I can see this because I saw the hurt in Philip Wadler when I said it. Later he mentioned collaborating with Tony Morris on some important work.

People aren't all bad or all good. Some of us are horrible and fantastic. Mean in some situations and great contributors elsewhere. The good and the bad, they don't cancel each other. Both exist. We are not a sum; there is not a one-dimensional number line between "good" and "bad." 

If he caused a splintering in a community, then probably that community is better off without his direct participation. And if he collaborated on great work with someone else, or did great work alone, I am grateful for it. 

If I denigrate him or that work, by implication or indirectly, then I am causing splintering in the community. I ruined a chance to exchange ideas with Philip Wadler, who was extremely kind to me (he didn't even show his hurt feelings) and who invited me to that dinner. Who is bringing excellent ideas, in code and in talks, to the whole programming community. Who is wide open to new ideas and experiments.

I don't yet know the best way to talk about these community problems, which are very important to discuss. Now I know that it is not by denigrating or scorning anyone. I need, I feel it in my soul, to celebrate everyone in the times when they shine, to cherish the contributions they do make, even when they don't shine in every situation. We can celebrate this without perfect inclusivity (there is no such thing). I deserved to be excluded from that dinner; it would have made everyone else more comfortable, a net win for the community. And separately, I do help in other ways. Can't belong everywhere.

I am sorry. I see that my words caused pain and it's my fault. In the future I will endeavor to not speak scornfully of anyone. To criticize actions and people-in-roles only when working on improving the system, not a whole person ever. To deal with my own experience of being bullied instead of lashing out whenever my brain makes an association with it. 

I want to be a source of healing and encouragement to a community, not further splinter it. Thank you for your patience. 

Tuesday, August 25, 2015

Functional principles come together in GraphQL at React Rally

Sometimes multiple pieces of the industry converge on an idea from
different directions. This is a sign the idea is important.

Yesterday I spoke at React Rally (video coming later) about the
confluence of React and Flux in the front end with functional
programming in the back end. Both embody principles of composition, declarative
style, isolation, and unidirectional flow of data.

In particular, multiple separate solutions focused on:

  •   components declare what data they need
  •   these small queries compose into one large query, one GET request
  •   the backend gathers everything and responds with data in the requested format

This process is followed by Falcor from Netflix (talk by Brian Hunt) and GraphQL from Facebook (talks by Lee Byron and Nick Schrock, videos later). Falcor adds caching on the client, with cache
invalidation defined by the server (smart; since the server owns
the data, it should own the caching policy). GraphQL adds an IDE for
queries, called GraphiQL (sounds like "graphical"), released as
open source for the occasion! The GraphQL server provides introspection
into the types supported by its query language. GraphiQL uses this to let the developer
work with live, dynamically fetched queries. This lets us explore the available
data. It kicks butt.

Here's an example of GraphQL in action. One React component in a GitHub client might specify that it needs
certain information about each event (syntax is approximate):
{
  event {
    type,
    datetime,
    actor {
      name
    }
  }
}
and another component might ask for different information:
{
  event {
    actor {
      image_url
    }
  }
}

The parent component assembles these and adds context, including
selection criteria:
{
  repository(owner: "org", name: "gameotron") {
    event(first: 30) {
      type,
      datetime,
      actor {
        name,
        image_url
      }
    }
  }
}
Behind the scenes, the server might make one call to retrieve the repository,
another to retrieve the events, and another to retrieve each actor's
data. Both GraphQL and Falcor see the query server as an abstraction
layer over existing code. GraphQL can stand in front of a REST
interface, for instance. Each piece of data can be
fetched with a separate call to a separate microservice, executed in
parallel and assembled into the structure the client wants. One GraphQL
server can support many versions of many applications, since the
structure of returned data is controlled by the client.
The GraphQL server assembles all the
results into a response that parallels the structure of the client's
query:

{
  "repository" : {
    "events" : [{
      "type" : "PushEvent",
      "datetime" : "2015-08-25Z23:24:15",
      "actor" : {
        "name" : "jessitron",
        "image_url" : "https://some_cute_pic"
      }
    }
    ...]
  }
}
The query is composed like a fan-in of all the components'
desires. On the server this fans out to as many back-end calls as
needed. The response is isomorphic to the query. The client then spreads
the response back out to the components. This architecture supports
composition in the client and modularity on the server.
This happens to minimize network traffic between the client and server.
That's nice, but what excites me are these lovely declarative queries that
compose, the data flowing from the parent component into all the
children, and the isolation of data requests to one place. The exchange
of data is clear. I also love the query server as an abstraction over
existing services; store the data bits in the way that's most convenient
for each part. Assembly sold separately.

Seeing similar architecture in Falcor and GraphQL, as well as in
ClojureScript and Om[1] earlier in the year, demonstrates that this is
important in a general case. And it's totally compatible with
microservices! After React Rally, I'm excited about where front ends are going.

[1] David Nolen spoke about this process in ClojureScript at Craft Conf
earlier this year. [LINK]

Sunday, August 16, 2015

An Opening Example of Elm: building HTML by parsing parameters

I never enjoyed front-end development, until I found Elm. JavaScript with its `undefined`, its untyped functions, its widely scoped mutable variables. It's like Play-Doh, it's so malleable. And when I try to make a sculpture, the arms fall off. It takes a lot of skill to make Play-Doh look good.

Then Richard talked me into trying Elm. Elm is more like Lego Technics. Fifteen years ago, I bought and built a Lego Technics space shuttle, and twelve years ago I gave up on getting that thing apart. It's still in my attic. Getting those pieces to fit together takes some work, but once you get there, they're solid. You'll never get "method not found on `undefined`" from your Elm code.

Elm is a front-end, typed functional language; it compiles to JavaScript for use in the browser. It's a young language (as of 2015), full of opportunity and surprises. My biggest surprise so far: I do like front-end programming!

To guarantee that you never get `undefined` and never call a method that doesn't exist, all Elm functions are Data in, Data out. All data is immutable. All calls to the outside world are isolated. Want to hit the server? Want to call a JavaScript library? That happens through a port. Ports are declared in the program's main module, so they can never hide deep in the bowels of components. Logic is in one place (Elm), interactions in another.
One section (Elm) has the business logic and is data-in, data-out. It has little ports to another section (JavaScript) that can read input, write files, and draw the UI. That section blurs into the whole world, including the user.

This post describes a static Elm program with one tiny port to the outside world. It illustrates the structure of a static page in Elm. Code is here, and you can see the page in action here. The program parses the parameters in the URL's query string and displays them in an HTML table.[1]

All web pages start with the HTML source:
  <title>URL Parameters in Elm</title>
  <script src="elm.js" type="text/javascript"></script>
  <link href="" rel="stylesheet"></link>
  <script type="text/javascript">
    var app = Elm.fullscreen(Elm.UrlParams,
                             { windowLocationSearch: window.location.search });
  </script>

This brings in my compiled Elm program and some CSS. Then it calls Elm's function to start the app, giving it the name of my module which contains main, and extra parameters, using JavaScript's access to the URL search string.

Elm looks for the main function in my module. The output of this function can be a few different types, and this program uses the simplest one: Html. This type is Elm's representation of HTML output, its virtual DOM.

module UrlParams where

import ParameterTable exposing (view, init)
import Html exposing (Html)

main : Html
main = view (init windowLocationSearch)

port windowLocationSearch : String

The extra parameters passed from JavaScript arrive in the windowLocationSearch port. This is the simplest kind of port: input received once at startup. Its type is simply String. This program uses one custom Elm component, ParameterTable. The main function uses the component's view function to render, and passes it a model constructed by the component's init function.

Somewhere inside the JavaScript call to Elm.fullscreen, Elm calls the main function in UrlParams, converts the Html output into real DOM elements, and renders that in the browser. Since this is a static application, this happens once. More interesting Elm apps have different return types from main, but that's another post.

From here, the data flow of this Elm program looks like this:
The three layers are: a main module, a component, and a library of functions.
The main module has one input port for the params. That String is transformed by init into a Model, which is transformed by view into Html. The Html is returned by main and rendered in the browser. This is the smallest useful form of the Elm Architecture that I came up with.

Here's a piece of the ParameterTable module:
module ParameterTable(view, init) where

import Html exposing (Html)
import UrlParameterParser exposing (ParseResult(..), parseSearchString)

type alias Model = { tableData: ParseResult }

init: String -> Model
init windowLocationSearch =
  { tableData = parseSearchString windowLocationSearch }

-- VIEW
view : Model -> Html
view model =
  Html.div ...
The rest of the code has supporting functions and details of the view. These pieces (Model, init, and view) occur over and over in Elm. Often the Model of one component is composed from the Models of subcomponents, and the same with init and view functions.[2]

All the Elm files are transformed by elm-make into elm.js. Then index.html imports elm.js and calls its Elm.fullscreen function, passing UrlParams as the main module and window.location.search in the extra parameter. And so, a static (but not always the same) web page is created from data-in, data-out Elm functions. And I am a happy programmer.

[1] Apparently there's not a built-in thing in JavaScript for parsing these. Which is shocking. I refused to write such a thing in JavaScript (where by "write" I mean "copy from StackOverflow"), so I wrote it in Elm.

[2] Ditto with update and Action, but that's out of scope. This post is about a static page.

Monday, August 3, 2015

Data-in, Data-out

In functional programming, we try to keep our functions data-in, data-out: they take some data as parameters, return some data as output, and that's it. Nothing else. No dialog boxes pop, no environment variables are read, no database rows are written, no files are accessed. No global state is read or written. The output of the function is entirely determined by the values of its input. The function is isolated from the world around it.

A data-in, data-out function is highly testable, without complicated mocking. The test provides input, looks at the output, and that's all that it needs for a complete test.[1]
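For instance, here's a minimal sketch in Clojure. The function and its name are hypothetical, invented for illustration:

(ns example.duration-test
  (:require [clojure.test :refer [deftest is]]))

;; A made-up example. Data in (a string), data out (a number).
;; No files, no globals, no UI.
(defn parse-minutes
  "Parse a duration like \"1h30m\" into total minutes."
  [s]
  (let [[_ h m] (re-matches #"(?:(\d+)h)?(?:(\d+)m)?" s)]
    (+ (* 60 (if h (Long/parseLong h) 0))
       (if m (Long/parseLong m) 0))))

;; The entire test: provide input, check output. No mocks, no setup, no teardown.
(deftest parse-minutes-works
  (is (= 90 (parse-minutes "1h30m")))
  (is (= 45 (parse-minutes "45m"))))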

A data-in, data-out function is pretty well documented by its declaration; its input types specify everything necessary for the function to work, its output type specifies the entire result of calling it. Give the function a good name that describes its purpose, and you're probably good for docs.

It's faster to comprehend a data-in, data-out function because you know a lot of things it won't do. It won't go rooting around in a database. It won't interrupt the user's flow. It won't need any other program to be running on your computer. It won't write to a file[2]. All these are things I don't have to think about when calling a data-in, data-out function. That leaves more of my brain for what I care about.

If all of our code was data-in, data-out, then our programs would be useless. They wouldn't do anything observable. However, if 85% of our code is data-in, data-out, with some input-gathering and some output-writing and a bit of UI-updating -- then our program can be super useful, and most of it still maximally comprehensible. Restricting our code in this way when we're writing it provides more clarity when we're reading it and freedom when we're refactoring it.

Think about data-in, data-out while you're coding; make any dependencies on the environment and effects on the outside world explicit; and write most of your functions as transformations of data. This gets you many of the benefits of functional programming, no matter what language you write your code in.

[1] Because the output is fixed for a given input, it would be legit to substitute the return value for the function-call-with-that-input at any point. Like, one could cache the return values if that helped with performance, because it's impossible for them to be different next time, and it's impossible to notice that the function wasn't called because calling it has no externally-observable effect. Historically, this property is called referential transparency.
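A quick Clojure sketch of that substitution (total-price is a made-up example, not from this post):

(defn total-price [items]                 ; data-in, data-out
  (reduce + (map :price items)))

;; Memoizing is legit: same input always yields the same output, and no
;; caller can observe whether the function body actually ran.
(def total-price-cached (memoize total-price))

(total-price-cached [{:price 3} {:price 4}])  ; => 7, computed
(total-price-cached [{:price 3} {:price 4}])  ; => 7, cached; nobody can tell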

[2] We often make an exception for logging, especially logging that gets turned off in production.

Saturday, June 6, 2015

Ultratestable Coding Style

Darn side-effecting programs. Programs that change things in the outside world are so darn useful, and such a pain to test.
For every piece of code, there is another piece of code that answers the question, "How do I know that code works?" Sometimes that's more work than the code itself -- but there is hope.

The other day, I made a program to copy some code from one project to another - two file copies, with one small change to the namespace declaration at the top of each file. Sounds trivial, right?

I know better: there are going to be a lot of subtleties. And this isn't throwaway code. I need good, repeatable tests.

Where do I start? Hmm, I'll need a destination directory with the expected structure, an empty source directory, files with the namespace at the top... oh, and cleanup code. All of these are harder than I expected, and the one test I did manage to write is specific to my filesystem. Writing code to verify code is so much harder than just writing the code!

Testing side-effecting code is hard. This is well established. It's also convoluted, complex, generally brittle.
The test process looks like this:

  •   Before the test: create the input, AND go to the filesystem to prepare the input files and the spot where output is expected.
  •   After the test: check the output, AND go to the filesystem to read the output files and check their contents.

Everything is intertwined: the prep, the implementation of the code under test, and the checks at the end. It's specific to my filesystem. And it's slow. No way can I run more than a few of these each build.

The usual solution to this is to mock the filesystem. Use a ports-and-adapters approach. In OO you might use dependency injection; in FP you'd pass functions in for "how to read" and "how to write." This isolates our code from the real filesystem. Tests are faster and less tightly coupled to the environment. The test process looks like this:

  •   Before the test: create the input, AND prepare the mock read results, AND initialize the mock for write captures.
  •   After the test: check the output, AND interrogate the mock for write captures.

It's an improvement, but we can do better. The test is still convoluted. Elaborate mocking frameworks might make it cleaner, but conceptually, all those ties are still there, with the stateful how-to-write that we pass in and then ask later, "What were your experiences during this test?"

If I move the side effects out of the code under test -- gather all input beforehand, perform all writes afterward -- then the decisionmaking part of my program becomes easier and clearer to test. It can look like this (code): input and any file contents we might read go in; the code under test runs; the output comes back along with instructions like "please write this to there."

The input includes everything my decisions need to know from the filesystem: the destination directory and list of all files in it; the source directory and list plus contents of all files in it.
The output includes a list of instructions, for the side effects the code would like to perform. This is super easy to check at the end of a test.

The real main method looks different in this design. It has to gather all the input up front[1], then call the key program logic, then carry out the instructions. In order to keep all the decisionmaking, parsing, etc in the "code under test" block, I keep the interface to that function as close as possible to that of the built-in filesystem-interaction commands. It isn't the cleanest interface, but I want all the parts outside "code-under-test" to be trivial.
The flow: simplest-possible code to gather input, then well-tested code that makes all the decisions, then simplest-possible code to carry out the instructions.
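Here's a minimal Clojure sketch of that shape. It is not the real microlib code; every name is invented, and the namespace-rewriting logic is deliberately simplistic:

(ns example.copier  ; illustrative sketch, not microlib
  (:require [clojure.string :as str]))

;; Pure decision-making: facts about the filesystem in, instructions out.
(defn decide [{:keys [dest-dir files]}]
  (for [{:keys [name contents]} files]
    (if (re-find #"^\(ns " contents)
      {:op :write
       :path (str dest-dir "/" name)
       :contents (str/replace-first contents #"\(ns \S+" "(ns new.namespace")}
      {:op :error :message (str name ": no namespace declaration found")})))

;; Impure shell: gather all input, call the pure logic, carry out instructions.
(defn -main [src-dir dest-dir]
  (let [files (vec (for [f (.listFiles (java.io.File. src-dir))]  ; all reads happen here
                     {:name (.getName f) :contents (slurp f)}))
        instructions (decide {:dest-dir dest-dir :files files})]
    (if-let [err (first (filter #(= :error (:op %)) instructions))]
      (println "Stopping before any writes:" (:message err))  ; near-atomic
      (doseq [{:keys [path contents]} instructions]           ; all writes at the end
        (spit path contents)))))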

With this, I answer "How do I know this code works?" in two components. For the real-filesystem interactions, the documentation plus some playing around in the REPL tell me how they work. For the decisioning part of the program, my tests tell me it works. Manual tests for the hard-to-test bits, lots of tests for the hard-to-get-right bits. Reasoning glues them together.

Of course, I'm keeping my one umbrella test that interacts with the real filesystem. The decisioning part of the program is covered by poncho tests. With an interface like this, I can write property-based tests for my program, asserting things like "I never try to write a file in a directory that doesn't exist" and "the output filename always matches the input filename."[2]
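With test.check, a property over the hypothetical decide function sketched above might look like this (again a sketch, not the real tests):

;; A sketch against the made-up decide function above, not the real microlib tests.
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; Property: every write instruction's output filename matches an input filename.
(def output-name-matches-input
  (prop/for-all [files (gen/vector (gen/hash-map :name gen/string-alphanumeric
                                                 :contents gen/string-ascii))]
    (every? (fn [{:keys [op path]}]
              (or (= op :error)
                  (some #(= path (str "dest/" (:name %))) files)))
            (decide {:dest-dir "dest" :files files}))))

(tc/quick-check 100 output-name-matches-input)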

As a major bonus, error handling becomes more modular. If, on trying to copy the second file, it isn't found or isn't valid, the second write instruction is replaced with an "error" instruction. Before any instructions are carried out, the program checks for "error" anywhere in the list (code). If found, stop before carrying out any real action. This way, validations aren't separated in code from the operations they apply to, and yet all validations happen before operations are carried out. Real stuff happens only when all instructions are possible (as far as the program can tell). It's close to atomic.

There are limitations to this straightforward approach to isolating decisions from side-effects. It works for this program because it can gather all the input, produce all the output, and hold all of it in memory at the same time. For a more general approach to this same goal, see Functional Programming in Scala.

By moving all the "what does the world around me look like?" side effects to the beginning of the program, and all the "change the world around me!" side effects to the end, we achieve maximum testability of program logic. And minimum convolution. And separation of concerns: one module makes the decisions, another carries them out. Consider this possibility the next time you find yourself in testing pain.

The code that inspired this approach is in my microlib repository.
Interesting bits:
Umbrella test (integration)
Poncho tests (around the decisioning module) (I only wrote a few. It's still a play project right now.)
Code under test (decisioning module)
Main program
Instruction carrying-out part

Diagrams made with Monodraw. Wanted to paste them in as ASCII instead of screenshots, but that'd be crap on mobile.

[1] This is Clojure, so I put the "contents of each file" in a delay. Files whose contents are not needed are never opened.
[2] I haven't written property tests, because time.

Monday, May 25, 2015

git: handy alias to find the repository root

To quickly move to the root of the current git repository, I set up this alias:

git config --global alias.home 'rev-parse --show-toplevel'

Now,  git home prints the full path to the root directory of the current project.
To go there, type (Mac/Linux only)

cd `git home`

Notice the backticks. They're not single quotes. This executes the command and then uses its output as the argument to cd.

This trick is particularly useful in Scala, where I have to get to the project root to run sbt compile. (Things that make me miss Clojure!)