All I want is a web page. I want this one thing on the left and this other thing on the right. Why is this so hard?? Can I just make a table in HTML like I used to do in the nineties? Why do I have to worry about stylesheets? And why are they so hard?
As a backend developer, I’m used to giving the computer instructions. Like “put this on the left and this on the right.” But that is not how web development works. For good reason!
As the author of a web page, I do not have enough information to decide how that page should be laid out. I don’t know who is using it, on what device, in what program, on what screen, in what window, with what font sizes.
You know who does know that stuff? The user agent. That’s a technical term for an application that presents documents to people. The browser is a user agent. The user agent could also create printed documents, or it could speak the document to a person whose eyes are unavailable.
The user agent runs on a particular device. Computer, phone, TV, whatever. It knows the limitations of the hardware. It can be configured by the user. The user agent can conform to various CSS specifications.
CSS is not a programming language. It is a syntax for rules, rules which give the browser (that user agent) clues about how to display the document. The browser combines that information with what it knows about the world to come up with a format to display (or speak) the document.
It turns out that rule-based programming is hard. It sounds like it should be easier than imperative code, but it is not.
So no, you don’t get to decide that this thing goes on the left and that thing goes on the right. The browser gets that choice.
But here’s something I learned yesterday: put each thing in a div, and give those divs display: inline-block. Then the browser has the option of putting them next to each other, if that fits with those constraints that only it knows.
Lately I’m working on our documentation. We write it in markdown, turn it into a web site, and then serve it from s3. To turn it into a web site, we use mkdocs and the material theme for mkdocs. Mkdocs is written in Python. Then we test it with HtmlProofer, which is in Ruby. Okay.
A week ago, I set out to add an “Edit on GitHub” link to each of our pages.
That’s built-in functionality in mkdocs; define a repo_url and an edit_uri in mkdocs.yml and it should just work. It didn’t work right away (although now I wonder whether I just missed the little pencil symbol because I was looking for text). Before I dug into figuring out why, I upgraded mkdocs and material because we were two breaking versions behind; the latest is 3.0.4 and we were on 1.0.4. If I’m gonna study a tool, I want to use the latest docs.
The broken links
The upgrade was no trouble as far as producing the site. HtmlProofer, though, found a bunch of broken links. To troubleshoot this, I went through pretty much all the docs on mkdocs and on material. (They’re beautiful docs; there’s a reason we use these tools.) Then I dug around in the material templates and the mkdocs Python code. I created an issue on mkdocs (it’s fixed already!) and on material (the maintainer said thanks!) for the bug, and then worked around it by adding an exception to our HtmlProofer invocation, after looking at its docs to find out how to do it. That required breaking our HtmlProofer call into its own script, because we call it in three places.
After that, there was still a broken link. I diagnosed this one also as a bug that could be fixed either in mkdocs or the template, but didn’t have the heart to make another issue report. I worked around it instead, by overriding that page in the template. (Just now, having noticed how nice the maintainers were, I made the effort to create another issue. I even tried to make a PR but the build steps didn’t work on my computer. This is not a surprise.)
A brief interval of work that I wanted to do
Now the tests and build work, and the upgrade is done. After this, I had an adventure getting the edit link working without having a GitHub repo link in the upper-right corner (it was useless and animated, yuck). To have the “Edit on GitHub” I need repo_url defined in the config, but that always results in a repo link as well, so I had to override the entire header.html to remove that link. When we update the theme, that override will be out of date. I considered various ways to use Atomist to make sure I remember to do that, then settled for a detailed commit message.
By the end of the day, I had a PR in to our docs repository with the upgrade. Over the weekend, I got that reviewed, modified and merged into master.
In which I break the entire site
The master branch of this repository gets published to GitHub pages for a final review. It looked fine there, so I deployed it to s3. A few hours later, I stopped by docs.atomist.com, and oh no!
That is not what our site should look like! None of the styles are loading!
Is it me? Is it everyone? Is it my browser? I tried clearing some caches, I tried a few browsers, then I rolled the site back. Then forward, and looked at it some more, then back again. There were some 404s on CSS, but later there weren’t, and then everything was loading OK but still it looked like garbage. Computed styles showed nearly-empty in Firefox but had more stuff in Chrome (later, someone pointed out that Chrome has a bunch of default styles).
This was beyond my paltry web diagnostic skills. The next day I asked for help from the team, and Danny volunteered to be a second pair of eyes. We modified the build process to push to a subdirectory so that we could leave the working site up while inspecting the newer, nonworking version. Danny spotted that the CSS files were loading, but the content type in the headers was wrong. It should be text/css but is text/plain. So the browser is loading the file and then ignoring it with no error. 😠
Useful! I had already noticed that the CSS files had changed name formats after the upgrade. Instead of application-palette-f1354325.css (or something similar) we have application-palette.f231453.css. Ah-ha, what if the dot is causing something to think it is not a .css file but a .f231453.css file? I went looking for that. I searched source for application-palette and found it in the material theme.
I found application-palette.css in the src directory and application-palette-f1354325.css in the material directory, which (according to mkdocs.yml) is where the templates for the theme live. So something is adding that extra number, in some sort of build process. OK, how does it build? There’s a package.json so I check it for scripts. Sure enough, scripts.build contains a call to … make. OK, look for a Makefile. Yup, and that contains a call to … webpack. Gah! I’ve been avoiding webpack because I know it is deep. Where does it get its definition? Probably that webpack.config.js file. Look in there, and it lists plugin after plugin, all of which are unfamiliar. Noooooo. But then I spot it! I found that stupid dot in the webpack.config.js … but changing that would mean rebuilding the theme, so I look for another way.
Which is good, because that wasn’t even the problem. Later I noticed that all the CSS files had the wrong content type, not just the ones with the dots. But I learned something, right? Right.
Next I searched for “s3 content type” since whatever makes a website available from s3 is sending these. That proved fruitful. The content type comes from metadata on s3, associated with each uploaded file. I opened the AWS console and looked in S3, found this bucket, found the CSS, looked at its metadata. Sure enough, it has a content-type element set to text/plain. So how does that get there?
Not pictured: at least half an hour of learning enough about the s3 command-line interfaces to be able to list metadata. (For the record, it’s aws s3api head-object --bucket my.bucket --key path/to/file) This included some frustration of “why is it saying Forbidden when I clearly have read permissions” which resolved to “oh right, because I’m authenticated in this one terminal but not this other one.” The command line is pretty complicated; there are two interfaces, aws s3 and aws s3api. The friendly one, aws s3, does not list metadata. But it did let me manually aws s3 sync some files up, which let me test more things. That program understands that .css files are text/css.
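Once the stored content types can be listed, comparing them against what the file extensions imply is mechanical. Here is a minimal sketch of such a check. It assumes the key-to-Content-Type pairs have already been collected (for example from head-object output); the function name, the EXPECTED_TYPES table, and the example keys are all made up for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Assumption: the handful of extensions this docs site cares about.
EXPECTED_TYPES = {
    ".css": "text/css",
    ".html": "text/html",
}


def find_mismatches(stored: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (key, stored_type, expected_type) for every object whose
    stored Content-Type disagrees with what its extension implies."""
    mismatches = []
    for key, stored_type in sorted(stored.items()):
        expected = EXPECTED_TYPES.get(PurePosixPath(key).suffix.lower())
        # Ignore parameters such as "; charset=utf-8" when comparing.
        actual = stored_type.split(";")[0].strip().lower()
        if expected is not None and actual != expected:
            mismatches.append((key, stored_type, expected))
    return mismatches


# Hypothetical listing, shaped like the metadata described in the post.
listing = {
    "index.html": "text/html",
    "assets/application-palette.f231453.css": "text/plain",
}

for key, stored_type, expected in find_mismatches(listing):
    print(f"{key}: stored {stored_type}, expected {expected}")
```

Run against a staging copy of the bucket, a check like this would flag the CSS file before anyone has to squint at an unstyled page.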
While I’m working on this, I post updates and notes to myself in Slack, in my jessitron-stream channel. David notices and contributes some history and some research. Our files get to s3 using s3cmd. That should be setting the content-type metadata. David remarked “There used to be complaints in the build logs that python-magic was not installed so s3cmd was going to guess the content type,” so he installed python-magic like 3 weeks ago to end that warning.
He linked to this issue: https://github.com/s3tools/s3cmd/issues/198 (and maybe there was another one?) and suggested adding --no-mime-type --no-guess-content-type to the s3 arguments. He also removed python-magic from the build. I tried those arguments. It complained about the first one being invalid (maybe because python-magic was gone?) so I removed it. The upload happened, but when I visited that version of the site, it asked me if I wanted to download this DMG! (I’m on a Mac. That would be a .exe on Windows.) The content-type of index.html was set to octet-stream. Um, no, that’s worse. Deeper in that issue thread I found a suggestion to use --guess-content-type and tried that. My commit message (on a branch) was “Wave the wand this direction” because this is spellcasting, not understanding.
Lo, it worked! Everything worked! I rebased those changes to get rid of all the intermediate things we tried, merged them to master and tagged the new version to trigger deploy. Hooray, we are able to update docs.atomist.com again!
A red herring
In Slack, we got a ping from our designer, who was having trouble building the site now. David and I were like, oh no, the upgrade broke something. Setting up the development environment for these docs, with Python and Ruby, is a pain. Ben was getting an error that resolved on Google to “wrong version of Python,” something about an exception in a loop which was a change between Python 3.6 and 3.7. Ben uninstalled and installed Python eight different ways. Both David and I joined a screenshare to help. Now nothing (including pip, Python’s package manager) can find the Python library zlib, which is a wrapper of a native libz library for compression (which is installed; he has Xcode tools on his Mac). This means pip can’t install packages, because it can’t unzip anything, including virtualenv, which we use to control the version of Python and of libraries. His machine is a giant circle of middle fingers.
I am not even gonna try to list the things we tried here. It was a mess. Homebrew was involved, and sudo rm. The worst part is, you know what the problem was? He hadn’t updated mkdocs. He hadn’t pulled the master branch. What he had done was upgrade Python, which didn’t work with the old version of mkdocs but did work with the new! This was a few hours of all of our lives we would like to have back.
Not so fast
But this story is not over! Oh, no. The next day, some people complained on our Slack that the docs site was not loading. They were seeing the unstyled garbage. Clear the browser caches, same problem. David went to our CDN, CloudFront, and told it to not cache these things, and to manually refresh the caches. But NO. Somewhere in the bowels of the internet, bad content-types are cached for these CSS files. The files haven’t changed, so the caches decline to refresh. They do not notice that the content-type has changed. Having been bad for an hour or two, those files are now bad for some unknown amount of time, only to some people. The URLs are cursed.
The only thing to do is to rename them. I write a script that renames all the .css files and changes all references to them. I kluge that into our build process, after we build the site and before we copy it up to s3. This works.
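A rename step like that can be small. The sketch below is not the actual script from our build; it is an illustration that assumes the built site is a directory of .html and .css files and that references use the bare file name. The function name and the "-v2" suffix are hypothetical.

```python
from pathlib import Path
from typing import Dict


def rename_css(site_dir: Path, suffix: str = "-v2") -> Dict[str, str]:
    """Rename every .css file under site_dir and rewrite references to it
    in the .html files. Returns a mapping of old name -> new name."""
    renames: Dict[str, str] = {}
    # sorted() materializes the list first, so renaming does not disturb
    # the directory walk.
    for css in sorted(site_dir.rglob("*.css")):
        new_name = f"{css.stem}{suffix}.css"
        css.rename(css.with_name(new_name))
        renames[css.name] = new_name

    for html in sorted(site_dir.rglob("*.html")):
        text = html.read_text(encoding="utf-8")
        updated = text
        for old, new in renames.items():
            updated = updated.replace(old, new)
        if updated != text:
            html.write_text(updated, encoding="utf-8")
    return renames
```

It would run after the site is built and before the copy up to s3, for example rename_css(Path("site")). Plain string replacement is crude: it can misfire when one CSS file name is a substring of another, and running it twice renames the files twice, so treat it as a starting point.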
OK. Incident over, as far as we can tell.
There is no such thing as “root cause” in systems this complex. There are “conditions that allowed this to be a problem.” And crucially, there are many conditions and actions that kept it from being worse. Eliminating the former, trying to “make sure this never happens again” is Safety-I. Amplifying the latter, sharpening our vision into potential problems and strengthening our ability to solve them, is Safety-II. In this analysis, I’ll remark briefly on the Safety-I sources of problems and more extensively on Safety-II sources of resilience.
How did this happen?
It seems likely that adding the python-magic library contributed. That changed the behavior of s3cmd, except that it didn’t show until new files were created. So s3cmd’s behavior of not updating the metadata on files that already exist made this problem into a sneaky lurking one, dark debt.
Ironically, that library was added to reduce debt, as a response to this line in the build:
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
It turns out guessing MIME types based on file extensions is a great way to do it. It’s great because it’s predictable by humans, a key property of collaborative automation. This beats clever-but-unpredictable magic.
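To make that predictability concrete, here is a small sketch using Python's standard mimetypes module. This is not s3cmd's own code, only an illustration of extension-based guessing; the helper name guess is made up, and the file names are the ones mentioned earlier in this post.

```python
import mimetypes
from typing import Optional


def guess(name: str) -> Optional[str]:
    """Return the content type implied by the file extension, or None."""
    content_type, _encoding = mimetypes.guess_type(name)
    return content_type


# The old hashed name, the new dotted name, and the front page.
for name in [
    "application-palette-f1354325.css",
    "application-palette.f231453.css",
    "index.html",
]:
    print(f"{name}: {guess(name)}")
```

Both CSS names come out as text/css because the lookup keys off the trailing .css, which fits the earlier finding that the extra dot was never the problem.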
The upgrade of mkdocs and material did trigger the problem, because it met the necessary condition of adding new files. It’s tempting to avoid upgrades, because upgrades often trigger latent problems, just like this. But upgrades also remove problems, like Ben’s upgrade to Python 3.7.
How did it get so bad?
We didn’t know before deploying the live site that this wasn’t going to work. Later, we had to develop a way of deploying to s3 without overwriting the live site in order to diagnose it. I also didn’t notice immediately that it was broken, so the site was garbage for a few hours … long enough for some CDN nodes to grab and keep the evil content-types.
Gosh, there was so much.
I checked the site at all. That was deliberate.
Communication and people: I couldn’t figure this out without Danny and David. Our daily standup, Zoom, and Slack were essential collaboration tools. Our notifications from Atomist in the #docs channel of Atomist community Slack helped us see what the others were doing.
This is not a checklist. This is a set of learnings that affect our priorities for future work.
We want to see the rendering of a version of the site on s3 before we release it. This is something to build in the Atomist SDM I’m making for this site, which will replace the inflexible Travis build.
I want to check the site after each release. I can make my SDM send me a direct message whenever one is done.
We could switch from s3cmd to aws s3 sync for more predictable behavior, testable on more systems.
Setting up the right versions of Python and Ruby on a local computer is bad. I want a development process that uses Docker for isolation. Ideally, an SDM that runs locally in Docker. (That’s already been on my list, and now it seems more important.)
At elm-conf and CodeMesh and YOW! Australia this year, I did live demos using automated code modification with Atomist Rug.
Rug is now officially open source, and the Rug CLI is available so that you can try (and change! and improve!) these editors on your own Elm code. This blog post tells you how.
I usually start a new Elm project as a static page, make it look like something; then turn it into a beginner program, add some interactivity; then turn it into an advanced program and add subscriptions. I like how this flow lets me start super-simple, and then add the pieces for access to the world as I need it.
Now you can do this too!
Watch out: these editors (and the parser behind them) work for the code I’ve tried them on. As you try them, you’ll find cases I didn’t cover. Please file an issue when you do, or find me on Atomist-Community slack.
The local version of the Rug runtime is the Rug CLI. Complete installation instructions are here.
TL;DR for Mac:
brew tap atomist/tap
brew install rug-cli
Generate a project
This will create a directory containing a new static Elm app, with a build script etc. This will put a project named banana under your current directory, make it a git repo and make an initial commit:
Now your src/Main.elm contains the beginnings of a beginner program. The model is empty and the only message is Noop, which does nothing. This is the beginner program template from the Elm tutorial, except that the view function is populated based on your main from the static page.
Now your src/Main.elm contains a new message type, ButtonPushed. Your update function handles it, but does nothing interesting.
type Msg = Noop | ButtonPushed

update : Msg -> Model -> Model
update msg model =
    case msg of
        Noop -> model
        ButtonPushed -> model
Find a new function hanging out at the end of the file, buttonPushedButton. Incorporate that into your view to display the button. Run ./build and refresh target/index.html; push the button and see the message in the debugger.
This adds a function, a message, and a field to the model so that you’ll have access to the content of the text input.
Try passing -R to rug, and it’ll make a commit for you after the editor completes. You have to make a commit yourself right before running rug, or it’ll complain.
For further edit operations, see my elm-rugs repo. You can upgrade to a full program, and add subscriptions to clicks and window size.
Change these editors! Add more!
The best part of running locally is running local versions. Clone my repository:
git clone git@github.com:jessitron/elm-repo.git
Now, go to the secret directory holding the editors:
cd elm-repo/.atomist/editors
Here, you can see the scripts that work on the code, like AddButton.rug.
To run the local versions, be in that elm-rugs directory, and point rug at your project directory with -C:
I don’t have to qualify the editor name with jessitron:elm-rugs when it’s local.
There’s more information in the Atomist docs on how rug works. TL;DR is, the files in the top level of elm-rugs/ are the starting point for a newly generated project. NewStaticPage.rug, as a generator, starts from those and then changes the project name. The editors all start from whatever project they’re invoked on, and they can change files in place, or create new ones from templates in the elm-rugs/.atomist/templates directory. (Most of my templates are straight files, with a .vm suffix to make Rug’s merge function work.)