Tuesday, August 25, 2015

Functional principles come together in GraphQL at React Rally

Sometimes multiple pieces of the industry converge on an idea from
different directions. This is a sign the idea is important.

Yesterday I spoke at React Rally (video coming later) about the
confluence of React and Flux in the front end with functional
programming in the back end. Both embody principles of composition, declarative
style, isolation, and unidirectional flow of data.

In particular, multiple separate solutions focused on:

  •   components declare what data they need
  •   these small queries compose into one large query, one GET request
  •   the backend gathers everything and responds with data in the requested format

This process is followed by Falcor from Netflix (talk by Brian Hunt) and GraphQL from Facebook (talks by Lee Byron and Nick Schrock, videos later). Falcor adds caching on the client, with cache
invalidation defined by the server (smart; since the server owns
the data, it should own the caching policy). GraphQL adds an IDE for
queries, called GraphiQL (sounds like "graphical"), released as
open source for the occasion! The GraphQL server provides introspection
into the types supported by its query language. GraphiQL uses this to let the developer
work with live, dynamically fetched queries. This lets us explore the available
data. It kicks butt.

Here's an example of GraphQL in action. One React component in a GitHub client might specify that it needs
certain information about each event (syntax is approximate):
{
  event {
    type,
    datetime,
    actor {
      name
    }
  }
}
and another component might ask for different information:
{
  event {
    actor {
      image_url
    }
  }
}

The parent component assembles these and adds context, including
selection criteria:
{
  repository(owner: "org", name: "gameotron") {
    event(first: 30) {
      type,
      datetime,
      actor {
        name,
        image_url
      }
    }
  }
}
Behind the scenes, the server might make one call to retrieve the repository,
another to retrieve the events, and another to retrieve each actor's
data. Both GraphQL and Falcor see the query server as an abstraction
layer over existing code. GraphQL can stand in front of a REST
interface, for instance. Each piece of data can be
fetched with a separate call to a separate microservice, executed in
parallel and assembled into the structure the client wants. One GraphQL
server can support many versions of many applications, since the
structure of returned data is controlled by the client.
The GraphQL server assembles all the
results into a response that parallels the structure of the client's
query:
{  "repository" : {    "events" : [{      "type" : "PushEvent",      "datetime" : "2015-08-25Z23:24:15",      "actor" : {        "name" : "jessitron",        "image_url" : "https://some_cute_pic"      }    }    ...]  }}
It's like this: the query is composed like a fan-in of all the components' desires. It goes to the server, where it fans out into as many back-end calls as needed to retrieve exactly the data requested. The server assembles those responses into the data structure specified in the query, a response isomorphic to the query, and returns it to the client. The client then spreads the data back out through the component tree. This architecture supports composition in the client and modularity on the server.
This happens to minimize network traffic between the client and server.
That's nice, but what excites me are these lovely declarative queries that
compose, the data flowing from the parent component into all the
children, and the isolation of data requests to one place. The exchange
of data is clear. I also love the query server as an abstraction over
existing services; store the data bits in the way that's most convenient
for each part. Assembly sold separately.

Seeing similar architecture in Falcor and GraphQL, as well as in
ClojureScript and Om[1] earlier in the year, demonstrates that this idea is
important in the general case. And it's totally compatible with
microservices! After React Rally, I'm excited about where front ends are
headed.


[1] David Nolen spoke about this process in ClojureScript at Craft Conf
earlier this year. [LINK]

Sunday, August 16, 2015

An Opening Example of Elm: building HTML by parsing parameters

I never enjoyed front-end development, until I found Elm. JavaScript with its `undefined`, its untyped functions, its widely scoped mutable variables. It's like Play-Doh, it's so malleable. And when I try to make a sculpture, the arms fall off. It takes a lot of skill to make Play-Doh look good.

Then Richard talked me into trying Elm. Elm is more like Lego Technics. Fifteen years ago, I bought and built a Lego Technics space shuttle, and twelve years ago I gave up on getting that thing apart. It's still in my attic. Getting those pieces to fit together takes some work, but once you get there, they're solid. You'll never get "method not found on `undefined`" from your Elm code.


Elm is a front-end, typed functional language; it compiles to JavaScript for use in the browser. It's a young language (as of 2015), full of opportunity and surprises. My biggest surprise so far: I do like front-end programming!

To guarantee that you never get `undefined` and never call a method that doesn't exist, all Elm functions are Data in, Data out. All data is immutable. All calls to the outside world are isolated. Want to hit the server? Want to call a JavaScript library? That happens through a port. Ports are declared in the program's main module, so they can never hide deep in the bowels of components. Logic is in one place (Elm), interactions in another.
One section (Elm) has the business logic and is data-in, data-out. It has little ports to another section (JavaScript) that can read input, write files, and draw UI. That section blurs into the whole world, including the user.


This post describes a static Elm program with one tiny port to the outside world. It illustrates the structure of a static page in Elm. Code is here, and you can see the page in action here. The program parses the parameters in the URL's query string and displays them in an HTML table.[1]

All web pages start with the HTML source:
<html>
<head>
  <title>URL Parameters in Elm</title>
  <script src="elm.js" type="text/javascript"></script>
  <link href="http://yui.yahooapis.com/pure/0.6.0/pure-min.css" rel="stylesheet">
</head>
<body>
  <script type="text/javascript">
    var app = Elm.fullscreen(Elm.UrlParams,
                             { windowLocationSearch:
                                 window.location.search
                             });
  </script>
</body>
</html>

This brings in my compiled Elm program and some CSS. Then it calls Elm's function to start the app, giving it the name of the module that contains main, plus extra parameters built with JavaScript's access to the URL search string.

Elm looks for the main function in my module. The output of this function can be a few different types, and this program uses the simplest one: Html. This type is Elm's representation of HTML output, its virtual DOM.

module UrlParams where

import ParameterTable exposing (view, init)
import Html exposing (Html)

main : Html
main = view (init windowLocationSearch)

port windowLocationSearch : String

The extra parameters passed from JavaScript arrive in the windowLocationSearch port. This is the simplest kind of port: input received once at startup. Its type is simply String. This program uses one custom Elm component, ParameterTable. The main function uses the component's view function to render, and passes it a model constructed by the component's init function.

Somewhere inside the JavaScript call to Elm.fullscreen, Elm calls the main function in UrlParams, converts the Html output into real DOM elements, and renders that in the browser. Since this is a static application, this happens once. More interesting Elm apps have different return types from main, but that's another post.

From here, the data flow of this Elm program looks like this:
The three layers are: a main module, a component, and a library of functions.
The main module has one input port for the params. That String is transformed by init into a Model, which is transformed by view into Html. The Html is returned by main and rendered in the browser. This is the smallest useful form of the Elm Architecture that I came up with.

Here's a piece of the ParameterTable module:
module ParameterTable(view, init) where

import Html exposing (Html)
import UrlParameterParser exposing (ParseResult(..), parseSearchString)

--- MODEL
type alias Model = { tableData: ParseResult }

init: String -> Model
init windowLocationSearch =
  { tableData = parseSearchString windowLocationSearch }

--- VIEW
view : Model -> Html
view model =
  Html.div ...

The rest of the code has supporting functions and details of the view. These pieces (Model, init, and view) occur over and over in Elm. Often the Model of one component is composed from the Models of subcomponents, and the same with init and view functions.[2]

All the Elm files are transformed by elm-make into elm.js. Then index.html imports elm.js and calls its Elm.fullscreen function, passing UrlParams as the main module and window.location.search in the extra parameter. And so, a static (but not always the same) web page is created from data-in, data-out Elm functions. And I am a happy programmer.



[1] Apparently there's not a built-in thing in JavaScript for parsing these. Which is shocking. I refused to write such a thing in JavaScript (where by "write" I mean "copy from StackOverflow"), so I wrote it in Elm.

[2] Ditto with update and Action, but that's out of scope. This post is about a static page.





Monday, August 3, 2015

Data-in, Data-out

In functional programming, we try to keep our functions data-in, data-out: they take some data as parameters, return some data as output, and that's it. Nothing else. No dialog boxes pop, no environment variables are read, no database rows are written, no files are accessed. No global state is read or written. The output of the function is entirely determined by the values of its input. The function is isolated from the world around it.

A data-in, data-out function is highly testable, without complicated mocking. The test provides input, looks at the output, and that's all that it needs for a complete test.[1]

A data-in, data-out function is pretty well documented by its declaration; its input types specify everything necessary for the function to work, its output type specifies the entire result of calling it. Give the function a good name that describes its purpose, and you're probably good for docs.

It's faster to comprehend a data-in, data-out function because you know a lot of things it won't do. It won't go rooting around in a database. It won't interrupt the user's flow. It won't need any other program to be running on your computer. It won't write to a file[2]. All these are things I don't have to think about when calling a data-in, data-out function. That leaves more of my brain for what I care about.

If all of our code was data-in, data-out, then our programs would be useless. They wouldn't do anything observable. However, if 85% of our code is data-in, data-out, with some input-gathering and some output-writing and a bit of UI-updating -- then our program can be super useful, and most of it still maximally comprehensible. Restricting our code in this way when we're writing it provides more clarity when we're reading it and freedom when we're refactoring it.
Think about data-in, data-out while you're coding; make any dependencies on the environment and effects on the outside world explicit; and write most of your functions as transformations of data. This gets you many of the benefits of functional programming, no matter what language you write your code in.
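
Here's a tiny example of the style in Clojure (all the names are made up):

;; Data in (prices, discount percent), data out (discounted total).
;; No I/O, no globals: the answer depends only on the arguments.
(defn discounted-total [prices discount-percent]
  (let [total (reduce + 0 prices)]
    (- total (quot (* total discount-percent) 100))))

;; The complete test: provide input, check output. No mocks, no setup, no teardown.
(assert (= 90 (discounted-total [40 60] 10)))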


[1] Because the output is fixed for a given input, it would be legit to substitute the return value for the function-call-with-that-input at any point. Like, one could cache the return values if that helped with performance, because it's impossible for them to be different next time, and it's impossible to notice that the function wasn't called because calling it has no externally-observable effect. Historically, this property is called referential transparency.
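In Clojure, for example, that caching is one word (reusing the made-up discounted-total from above):

;; Safe only because discounted-total is data-in, data-out: the same input
;; always yields the same output, and skipping the call is unobservable.
(def discounted-total-cached (memoize discounted-total))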

[2] We often make an exception for logging, especially logging that gets turned off in production.

Saturday, June 6, 2015

Ultratestable Coding Style

Darn side-effecting programs. Programs that change things in the outside world are so darn useful, and such a pain to test.
(What's better than green? Ultra!)

For every piece of code, there is another piece of code that answers the question, "How do I know that code works?" Sometimes that's more work than the code itself -- but there is hope.

The other day, I made a program to copy some code from one project to another - two file copies, with one small change to the namespace declaration at the top of each file. Sounds trivial, right?

I know better: there are going to be a lot of subtleties. And this isn't throwaway code. I need good, repeatable tests.

Where do I start? Hmm, I'll need a destination directory with the expected structure, an empty source directory, files with the namespace at the top... oh, and cleanup code. All of these are harder than I expected, and the one test I did manage to write is specific to my filesystem. Writing code to verify code is so much harder than just writing the code!

Testing side-effecting code is hard. This is well established. It's also convoluted, complex, generally brittle.
The test process looks like this:
(Grumpy cat rates this "0 out of 10.") Input goes into the code under test and output comes out, but the test also has to prep files in the right place and clear old files out; the code under test reads and writes the filesystem; then the test checks that the files are correct.


Before the test, create the input AND go to the filesystem, prepare the input and the spot where output is expected.
After the test, check the output AND go to the filesystem, read the files from there and check their contents.
Everything is intertwined: the prep, the implementation of the code under test, and the checks at the end. It's specific to my filesystem. And it's slow. No way can I run more than a few of these each build.


The usual solution to this is to mock the filesystem. Use a ports-and-adapters approach. In OO you might use dependency injection; in FP you'd pass functions in for "how to read" and "how to write." This isolates our code from the real filesystem. Tests are faster and less tightly coupled to the environment. The test process looks like this:
input and "how to read" and "how to write" go into the test, plus prepare results in "how to read"; code under test hits "how to read" and "how to write"; check the number and input of calls to "how to write" at the end.

Before the test, create the input AND prepare the mock read results and initialize the mock for write captures.
After the test, check the output AND interrogate the mock for write captures.
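
A sketch of that pass-the-functions-in style in Clojure (invented names, not my actual code): the reader and writer are parameters, so the test can hand in a map as the reader and an atom-backed capturing writer.

(require '[clojure.string :as str])

;; Sketch: "how to read" and "how to write" come in as function arguments.
(defn copy-file-with-new-ns [read-file write-file src-path dest-path new-namespace]
  (let [contents  (read-file src-path)
        rewritten (str/replace-first contents #"\(ns [^\s)]+" (str "(ns " new-namespace))]
    (write-file dest-path rewritten)))

;; In a test, the "mocks" are a map lookup and an atom that captures writes.
(let [written (atom {})]
  (copy-file-with-new-ns {"src/core.clj" "(ns old.core)"}   ; read-file: a map works
                         #(swap! written assoc %1 %2)       ; write-file: capture the call
                         "src/core.clj" "dest/core.clj" "new.core")
  (assert (= "(ns new.core)" (@written "dest/core.clj"))))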

It's an improvement, but we can do better. The test is still convoluted. Elaborate mocking frameworks might make it cleaner, but conceptually, all those ties are still there, with the stateful how-to-write that we pass in and then ask later, "What were your experiences during this test?"

If I move the side effects out of the code under test -- gather all input beforehand, perform all writes afterward -- then the decisionmaking part of my program becomes easier and clearer to test. It can look like this (code):
(Grumpy cat smiles and says "YES.") Input, plus the contents you might read, go in; the code under test runs; "please write this to here" and "please write that to there" come out along with the output.

The input includes everything my decisions need to know from the filesystem: the destination directory and list of all files in it; the source directory and list plus contents of all files in it.
The output includes a list of instructions for the side effects the code would like to perform. This is super easy to check at the end of a test.
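
In Clojure, the decision part of this design might look something like the sketch below (invented names; the real code is in the repository linked at the bottom):

(require '[clojure.string :as str])

;; Sketch: the decision function takes a snapshot of the world as data and
;; returns the side effects it wants as data -- a list of instructions.
(defn plan-copy [dest-dir new-namespace source-files]   ; source-files: map of path -> contents
  (for [[path contents] source-files]
    (if (re-find #"^\(ns " contents)
      [:write (str dest-dir "/" path)
       (str/replace-first contents #"\(ns [^\s)]+" (str "(ns " new-namespace))]
      [:error (str path " has no namespace declaration")])))

;; (plan-copy "dest/src" "newproject.core" {"core.clj" "(ns oldproject.core)"})
;; => ([:write "dest/src/core.clj" "(ns newproject.core)"])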

The real main method looks different in this design. It has to gather all the input up front[1], then call the key program logic, then carry out the instructions. In order to keep all the decisionmaking, parsing, etc in the "code under test" block, I keep the interface to that function as close as possible to that of the built-in filesystem-interaction commands. It isn't the cleanest interface, but I want all the parts outside "code-under-test" to be trivial.
The flow: the simplest possible code gathers the input, well-tested code makes all the decisions, and the simplest possible code carries out the instructions.

With this, I answer "How do I know this code works?" in two components. For the real-filesystem interactions, the documentation plus some playing around in the REPL tell me how they work. For the decisioning part of the program, my tests tell me it works. Manual tests for the hard-to-test bits, lots of tests for the hard-to-get-right bits. Reasoning glues them together.

Of course, I'm keeping my one umbrella test that interacts with the real filesystem. The decisioning part of the program is covered by poncho tests. With an interface like this, I can write property-based tests for my program, asserting things like "I never try to write a file in a directory that doesn't exist" and "the output filename always matches the input filename."[2]

As a major bonus, error handling becomes more modular. If, on trying to copy the second file, it isn't found or isn't valid, the second write instruction is replaced with an "error" instruction. Before any instructions are carried out, the program checks for "error" anywhere in the list (code). If found, stop before carrying out any real action. This way, validations aren't separated in code from the operations they apply to, and yet all validations happen before operations are carried out. Real stuff happens only when all instructions are possible (as far as the program can tell). It's close to atomic.
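
Sketched the same way (still invented names), the carrying-out step refuses to touch the filesystem if any instruction is an error, and the real main is just the three stages wired together:

(declare gather-source-files)   ; hypothetical: does all the reads up front, returns path -> contents

;; Sketch: check the whole plan for errors before doing anything real.
(defn carry-out! [instructions]
  (let [errors (filter #(= :error (first %)) instructions)]
    (if (seq errors)
      (doseq [[_ message] errors] (println "Error:" message))
      (doseq [[op path contents] instructions]
        (when (= :write op) (spit path contents))))))

(defn -main [dest-dir new-namespace & source-paths]
  (->> (gather-source-files source-paths)   ; all reads at the beginning
       (plan-copy dest-dir new-namespace)   ; pure decision-making in the middle
       (carry-out!)))                       ; all writes at the end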

There are limitations to this straightforward approach to isolating decisions from side-effects. It works for this program because it can gather all the input, produce all the output, and hold all of it in memory at the same time. For a more general approach to this same goal, see Functional Programming in Scala.

By moving all the "what does the world around me look like?" side effects to the beginning of the program, and all the "change the world around me!" side effects to the end of the program, we achieve maximum testability of program logic. And minimum convolution. And separation of concerns: one module makes the decisions, another one carries them out. Consider this possibility the next time you find yourself in testing pain.


The code that inspired this approach is in my microlib repository.
Interesting bits:
Umbrella test (integration)
Poncho tests (around the decisioning module) (I only wrote a few. It's still a play project right now.)
Code under test (decisioning module)
Main program
Instruction carrying-out part

Diagrams made with Monodraw. Wanted to paste them in as ASCII instead of screenshots, but that'd be crap on mobile.

[1] This is Clojure, so I put the "contents of each file" in a delay. Files whose contents are not needed are never opened.
[2] I haven't written property tests, because time.



Monday, May 25, 2015

git: handy alias to find the repository root

To quickly move to the root of the current git repository, I set up this alias:

git config --global alias.home 'rev-parse --show-toplevel'

Now,  git home prints the full path to the root directory of the current project.
To go there, type (Mac/Linux only)

cd `git home`

Notice the backticks. They're not single quotes. This executes the command and then uses its output as the argument to cd.

This trick is particularly useful in Scala, where I have to get to the project root to run sbt compile. (Things that make me miss Clojure!)

Saturday, May 2, 2015

Fitting in v. Belonging

In your team, do you feel like you fit in? Do you have a feeling of belonging?

These are very different questions.[2] When I fit in, it's because everyone is sufficiently alike. We have inside jokes, TV shows or sports we talk about, opinions we share and common targets of ridicule. New people can fit in by adopting these opinions and following these sports.

When I belong, it's because everyone wants me to be there, because the group wouldn't be the same without me. We value each other for our differences. We have values that we share, and opinions we discuss. New people are integrated as we come to know and appreciate each other.

"Fitting in," that superficial team culture of common interests and inside jokes, is much easier to establish. And it's easier to hire for, because we can base choices on appearances and social cues. Hoodies and quotes from The Princess Bride. But it doesn't get us a strong team. On a strong team, people share ideas, they pull from their varied perspectives, they emphasize their differences because that's their unique contribution. This weaving together of various strengths, respect for the unexpected -- this is how a team can be stronger than its parts, this is where novel solutions come from. It emerges from feelings of belonging, which come from the group's deeper culture. That's much harder to establish.

On a wholehearted team, we show up as our whole selves and put all our creativity into the team's goals. How can we achieve this? Hire for value fit, not culture fit. Don't settle for the comfort of "fitting in" - aim for the safety of belonging.

I had this kind of team at my last job, at Outpace. We loved each other as people and respected each other as developers. And this was a remote team - we didn't fall back on physical proximity as an appearance of teamwork. We shared goals, discussed them and evolved them. We shared our frustrations, both work and personal. When opinions clashed, we asked why, and learned. On this team, I explored ideas like the sea map. We grew individually and together.

That feeling of belonging makes it safe to express ideas and to run with them. And to take ideas from others and expand on them. Poof: innovation. Without that feeling of belonging, when the aim is to fit in, we express agreement with dominant voices.[1] Superficial cultural fit actively represses new ideas.

How can we move our teams toward a greater sense of belonging? Ask people about their interests that you don't share. Respect each person's experiences and opinions, especially when these are unique among the group. Instead of "We all agree, so we must be right," say, "We all agree. This is dangerous; can we find another view?" When pair programming, if you think your pair has the wrong idea, try it anyway. When someone says something dumb, their perspective differs; respond with curiosity, not judgement. Cherish our differences, not superficial similarities. Sacrifice the comfort of fitting in for the safety to be ourselves.


[1] Research has shown that teams of similar-looking people emphasize similarities. They're driven toward groupthink, the quiet silencing of dissent. When someone breaks the uniformity, the not-obviously-different people start expressing the parts of themselves that are unique. (I can't find the reference, anyone know it?)

[2] The dichotomy between fitting-in and belonging comes from BrenĂ© Brown's book, Daring Greatly.


Wednesday, April 29, 2015

Data v Awareness

In the computer industry, data and conscious thinking are praised, as opposed to an integrated awareness.[1] How is the work going? The task-tracking tools, the commits, and the build results provide data, but only conversations with the team can provide awareness. Awareness of mood and relationships and trends, of uncertainties and risks. Perhaps this is part of organizations' fear of remote work: colocation provides opportunities to read the mood of the team. Data alone can't provide that.

In sociological research like Brené Brown's, she starts with awareness: interviews, a person's story in context. Then she codes (in this context, "to code" is to categorize and label) the answers, and they become data. She aggregates that data to get a broader picture, and that leads to a broader awareness.

The key is: local awareness, to data, to aggregated data, to broader awareness.

On my last team, we were working on this. I wanted to track what was holding us back, and what was helping us move. Which tools in our technology stack cost us the most energy, and which improvements are paying off. To do this, we started posting in Slack whenever something frustrated us or helped us along, with custom emoticons as labels. For instance:
weight: clojure set operations behave unpredictably if passed a vector
lift: test-data generation utility for X service
weight: local elasticsearch version different from prod
This turns our awareness of the current situation into data, which a program can aggregate later. At retro time, I turned the words next to the hot-air balloon ("lift," because it helps us move the project up and forward) into a word cloud.[2] The words next to the kettlebell ("weight," because it's weighing down the balloon, holding us back) formed a separate word cloud. This gave us a visualization to trigger discussion.
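
The aggregation itself can be tiny; something like this Clojure sketch (the real script isn't shown here):

(require '[clojure.string :as str])

;; Sketch: count word frequencies for one label ("lift" or "weight"),
;; so a word-cloud tool can size the words.
(defn word-frequencies [label messages]
  (->> messages
       (filter #(str/starts-with? % (str label ":")))
       (mapcat #(str/split % #"\s+"))
       (remove #{(str label ":")})
       frequencies))

;; (word-frequencies "weight" ["weight: elasticsearch version mismatch"
;;                             "lift: test-data generator"])
;; => {"elasticsearch" 1, "version" 1, "mismatch" 1}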

The aggregation of the data produced a broader level of awareness in our retrospective. This contrasts with our remembered experience of the prior sprint. Our brains are lousy at aggregating these experiences; we remember the peak and the end. The most emotional moment, and the most recent feelings. The awareness -> data -> aggregation -> awareness translation gives us a less biased overview.

The honest recording of local awareness happens when the data is interpreted within the team, within the circle of trust, within context. There's no incentive to game the system, except where that is appropriate and deliberate. For instance, the week after the first word cloud, Tanya posted in the channel:
weight: elasticsearch elasticsearch elasticsearch elasticsearch elasticsearch
She's very deliberately inflating a word in the word cloud, corresponding to the level of pain she's experiencing. (context: we were using Elasticsearch poorly, totally nothing wrong with the tech, it was us.) Her knowledge of how the data would be used allowed her to translate her local awareness into a useful representation.

Data alone is in conflict with a broad, compassionate awareness of the human+technological interactions in the team. But if the data starts with awareness, and is aggregated and interpreted with context, it can help us overcome other limitations and biases of our human brains. In this way, we can use both data and awareness, and perhaps gain wisdom.

----
[1] "Computing: Yet Another Reality Construction," by Rodney Burstall, inside Software Development and Reality Construction
[2] Thank you @hibikir1 for suggesting the first reasonable use of a word cloud in my experience