Line endings in git

Git tries to help translate line endings between operating systems with different standards. This gets sooo frustrating. Here’s what I always want:

On Windows:

git config --global core.autocrlf input
This says, “If I commit a file with the wrong line endings, fix it before other people notice.” Otherwise, leave it alone.

On Linux, Mac, etc:

git config --global core.autocrlf false
This says, “Don’t screw with the line endings.”

Nowhere:

git config --global core.autocrlf true
This says, “Screw with the line endings. Make them all include carriage return on my filesystem, but not have carriage return when I push to the shared repository.” This is not necessary.

Windows and Linux on the same files:

This happens when you’re running Linux in a Docker container and mounting files that are stored on Windows. Generally, stick with the Windows strategy of core.autocrlf=input, unless you have .bat or .cmd files (Windows batch scripts) in your repository.

The VS Code docs have tips for this case. They suggest setting up the repository with a .gitattributes file that says “mostly use LF as line endings, but .bat and .cmd files need CR+LF”:

* text=auto eol=lf
*.{cmd,[cC][mM][dD]} text eol=crlf
*.{bat,[bB][aA][tT]} text eol=crlf

Troubleshooting

When git is surprising you:

Check for overrides

Within a repository, the .gitattributes file can override the autocrlf behavior for all files or sets of files. Watch out for the text and eol attributes. It is incredibly complicated.
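When a particular file is behaving strangely, you can ask git which attributes apply to it (the path here is a placeholder):

git check-attr text eol -- path/to/file.bat

That prints the text and eol attributes in effect for that one file.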

Check your settings

To find out which one is in effect for new clones:
git config --global --get core.autocrlf

Or in one repository of interest:
git config --local --get core.autocrlf

Why is it set that way? Find out:
git config --list --show-origin
This shows all the places the settings are set, including duplicates; it’s OK for there to be multiple entries for one setting.

Why does this even exist?

Historical reasons, of course! (If you have a Ruby Tapas subscription, there’s a great little history lesson on this.)

Back in the day, many Windows programs expected files to have line endings marked with CR+LF characters (carriage return + line feed, or \r\n). These days, these programs work fine with either CR+LF or with LF alone. Meanwhile, Linux/Mac programs expect LF alone.

Use LF alone! There’s no reason to include the CR characters, even if you’re working on Windows.

One danger: new files created in programs like Notepad get CR+LF. Those files look like they have \r on every line when viewed in Linux/Mac programs, or when (in code) they’re read into strings and split on \n.
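To see that stray \r for yourself, here’s a minimal PowerShell sketch:

$text = "first`r`nsecond"      # a CRLF string, like Notepad writes
$lines = $text -split "`n"     # split on LF alone
$lines[0].EndsWith("`r")       # True: the carriage return is still stuck on the end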

That’s why, on Windows, it makes sense to ask git to change line endings from CR+LF to LF on files that it saves. core.autocrlf=input says, screw with the line endings only in one direction. Don’t add CR, but do take it away before other people see it.

Postscript

I love ternary booleans like this: true, false, input. Hilarious! This illustrates: don’t use booleans in your interfaces. Use enums instead. Names are useful. autocrlf=ScrewWithLineEndings|GoAway|HideMyCRs

Layers in software: from data to value

Then

Back in the 2000s, we wrote applications in layers.

Presentation layer, client, data transfer, API, business logic, data access, database. We maintained strict separation between these layers, even though every new feature changed all of them. Teams organized around these layers. Front end, back end, DBAs.

[Diagram: each layer of software is a wide box next to its team. The layers stack on top of each other: frontend stuff, backend stuff, database, each with its team. At the top are some customers. Value flows from them to the DB and back, crossing all the layers.]

Business value exists only by flowing through all the layers to the DB and back.

Layers crisscrossed the flow of data.

Responsibility for any one thing to work fell across many teams.

Interfaces between teams updated with every application change.

Development was slow and painful.

Now

In 2019, we write applications in layers.

A business unit is supported by a feature team. Feature teams are supported by platforms, tooling, UI components. All teams are supported by software as a service from outside the company.

[Diagram: feature teams at the top of the software are multicolored, with multiple components in their software. Under them are platform and component teams, each different. Under them are nice square boxes of external services. Business value flows through the top layer (feature teams), staying close to the business people. Developer value flows between the feature teams, through the internal teams, to external services and back.]

Business value is concentrated in the feature teams; developer value flows through support teams and external services.

Back in the day, front end, back end, operations, and DBAs separated because they needed different skills. Now we accept that a software team needs all the skills. We group by responsibility instead — responsibility for business value, not for activities.

Supporting teams provide pieces in which consistency is essential: UI components and some internal libraries.

Interfaces between teams change less frequently than the software changes.

Layers crisscross the flow of value.

DevEx

Feature teams need to do everything, from the old perspective. But that’s too hard for one team — so we make it easier.

This is where Developer Experience (DevEx) teams come in. (a.k.a. Developer Productivity, Platform and Tools, or inaccurately DevOps Teams.) These undergird the feature teams, making their work smoother. Self-service infrastructure, smooth setup of visibility and control for production software. Tools and expertise to help developers learn and do everything necessary to fulfill each team’s purpose.

Internal services are supported by external services. Managed services like Kubernetes, databases, queueing, observability, logging: we have outsourced the deep expertise of operating these components. Meanwhile, internal service teams like DevEx have enough understanding of the details, plus enough company-specific context, to mediate between what the outside world provides and what feature teams need.

This makes development smoother, and therefore faster and safer.

We once layered by serving data to software. Now we layer by serving value to people.

Development aesthetic: experiments

Software development is a neverending string of “What happens if I…?” Each new runtime, language, or tool is a new world with its own laws of nature. We get to explore each one with experiments.

Objective

Today I added another alias to my PowerShell $profile:

echo "Good morning!"
# ...

Function ListFilesWithMostRecentAtBottom {
    Get-ChildItem | Sort-Object -Property LastWriteTime
}
Set-Alias ll ListFilesWithMostRecentAtBottom

To use that alias in my current shell, I need to source the profile again. I googled how to do this. The first useful page said:

& $profile

So I typed that. It echoed “Good morning!” but the alias did not work.

Hmm, did it not save?

I can test that. I changed the echo to “Good morning yo!” and tried again.

It printed the new text, but it still didn’t pick up the alias.

Hmm, is something wrong with the alias?

I opened a new shell window to test it.

The new alias works in the new window. Therefore, it’s the & $profile command that is not doing what I want.

Investigation

I could ignore the problem and continue work in the new window. My alias is working there. But dang it, I want to understand this. I want to know how to reload my $profile in the future.

Time for more googling. The next post had a new suggestion:

. $profile

I typed that, and it worked. Yay!

But wait, was that the old window or the new window? What if it only worked because I was in the new window?

I want to be certain that running . $profile brings in any new aliases I just added. For a proper experiment, I need to see the difference.

Experiment

I add a new alias to my $profile, and also change the echo so that I’ll be sure it’s running the new version.

echo "Good morning yo tyler!"
# ...

Function ListFilesWithMostRecentAtBottom {
    Get-ChildItem | Sort-Object -Property LastWriteTime
}
Set-Alias ll ListFilesWithMostRecentAtBottom
Set-Alias tyler ListFilesWithMostRecentAtBottom

In my terminal, I run tyler as a test case, then the command I’m investigating (. $profile), then the test case tyler again.
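In commands, that sequence looks something like this (given the $profile edits above):

tyler          # before: fails, because the new alias isn't loaded in this shell yet
. $profile     # the command under investigation
tyler          # after: lists files, most recent at bottom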

Now I can see the before and after, and they’re different. I can tell that . $profile has the desired effect. Now I have learned something about PowerShell.

Epilogue

I remove the extra tyler stuff from $profile.

As far as I can tell, & runs the script in a child scope of its own, and . runs the contents of the script in the current scope. The . command works like this in bash too, so it’s easy for me to remember.
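One way to see the difference, with a hypothetical one-line script scope-test.ps1 containing only $fromScript = "hello":

& .\scope-test.ps1    # call operator: runs the script in a child scope
$fromScript           # prints nothing; the variable died with that scope
. .\scope-test.ps1    # dot-source: runs the script in the current scope
$fromScript           # prints "hello"; the variable survives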

Today I took a few extra minutes and several extra steps to make an experiment and figure out what PowerShell was doing. Now I know how to reload my $profile. Now you know how to run a tiny experiment to ascertain that what just happened, happened for the reason you think it did.

This verbosity makes me happy

Today I learned how to create aliases in PowerShell. I’m switching from Mac to Windows, and I want the terminal in VS Code to do what I want.

No terminal will work for me until it interprets gs as git status. I type that compulsively.

In bash, setting that up looks like this:

alias gs='git status'

But in PowerShell, an alias can only point to a single command name. No parameters. Wat.
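If you try the bash-style move anyway, PowerShell happily creates the alias; it only falls over when you run it, because no single command named "git status" exists. A sketch of the failure mode:

Set-Alias gs "git status"   # accepted: the value can be any command name
gs                          # error at run time: nothing is named "git status"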

You can make a function with the whole command in it, and then set an alias to that function.

Function GitStatus { git status }
Set-Alias gs GitStatus

The first time I did this it felt kinda silly. But then the second time …

Function CommitDangit { 
    git add .
    git commit -m "temp" 
}
Set-Alias c CommitDangit

This alias c makes a crappy commit as quickly as possible. I use it when live coding, to make insta-savepoints when stuff works. (I’m a bit compulsive about committing, too. Just commit, dangit!)

The PowerShell syntax requires a long name for my command before I give it a short one. This is more expressive than the bash:

alias c='git add . && git commit -m "temp"'

My CommitDangit function gets a long name for readability, plus a tiny alias for fast typing.

This is a win. I like it more than the bash syntax. PowerShell is a more modern scripting language, and it shows.

Bonus: in bash I put those aliases in a file like .bashrc or .bash_profile or sometimes another one; it depends. In PowerShell, I put the aliases in a file referenced by $profile. Edit it with code $profile; no figuring out which file it is.

Next: reload the $profile in an existing window with . $profile

Morning Stance

It is 7:09. One child is out, and I have returned to bed. Alexa will wake me at 7:15.

Six minutes: I could make my bed or do tiny morning yoga. Six minutes of rest is useless; I’ll feel worse afterward. What am I likely to do?

I picture the probability space in front of me. Intention, habit, and a better start to the day push me toward yoga. Yet there’s a boundary there, a blockage: it is my current stance.

At 7:09, if I were standing, I’d likely do yoga. But at 7:09 and horizontal, I’m gonna stay horizontal. Only a change in surrounding conditions (beep, beep, beep!) will trigger motion.

Cat Swetel talks about stances. By changing your stance, you change your inclinations.

It is 7:10. I choose to change my stance. I stand up.

I make my bed.

One deliberate change of stance, and positive habits and intentions take it from there.

Developer aesthetic: a command line

Today I typed psql to start a database session. That put me in the wrong place, so I typed \connect org_viz to get into the database I wanted.

But then I stopped myself, quit psql, and typed psql -d org_viz at the command prompt.

Why?

It smooths my work. I knew I would exit and re-enter that database session several times today, and this way pushing up-arrow to get to the last command would get me to the right command. No more “oh, right, I have to \connect” for today.

It makes my work more reproducible. As a dev, every command I type at a shell or REPL is either an experiment or an action. If it’s an experiment, I’ll do different things as fast as I can. If it’s an action, I want it to express my intention.

What I’m not doing is meandering around a toilsome path to complete some task that I know perfectly well how to do. Once known, all those steps belong in one repeatable, intention-expressing automation.

Correcting the command I typed is a tiny thing. It expresses a development aesthetic: repeatability. If I’m not exploring, I’m executing, and I execute in a repeatable fashion. I executed that tiny command to open the database I wanted. Then I re-used it a dozen times. Frustration saved, check. Developer aesthetic satisfied, check.

Don’t build systems. Build subsystems.

Always consider your design a subsystem.

Jabe Bloom

When we build software, we aren’t building it in nowhere. We aren’t building a closed system that doesn’t interact with its environment. We aren’t building it for our own computer (unless we are; personal automation is fun). We are building it for a purpose. Chances are, we build it for a unique purpose — because why else would they pay us to do it?

Understanding that surrounding system, the “why” of our product and each feature, makes a big difference in making good design decisions within the system.

It’s like, the system we’re building is our own house. We build on a floor of infrastructure other people have created (language, runtime, dependency manager, platform), making use of materials that we find in the world (libraries, services, tools). We want to understand how those work, and how our own software works. This is all inside our house.

To do that well, keep the windows open. Look outside, ask questions of the world. What purpose is our system serving? What effects does it have, and what effects from other subsystems does it strengthen?

Whenever you’re designing something, the first step is: What is the system my system lives in? I need to understand that system to understand what my system does.

Jabe Bloom

It is a big world out there, and these are questions we can never answer completely. It’s tempting to stay indoors where it’s warm. We can’t know everything, but we gotta try for more.

Nested learning loops at Netflix

Today in a keynote at Spring One, Tom Gianos from Netflix talked about their internal data platform. He listed several components, ending with a quick mention of the “Insights Services” team, which studies how the platform is used inside Netflix. A team of people that learns about how internal teams use an internal platform to learn about whatever they’re doing. This is some higher-order learning going on.

It’s like, a bunch of teams are making shows for customers. They want to get better at that, so they need data about how the shows are being watched.

So, Netflix builds a data platform, and some teams work on that. The data platform helps the shows teams (and whatever other teams, I’m making this up) complete a feedback loop, so they can get better at making shows.

[Diagram: customers get shows from the shows team; that interaction sends something to the data platform, which sends something back to the shows team. That interaction between the shows team and the data platform sends something to the Insights Services team, which sends info to the data platform team.]

Then the data platform teams want to make a better data platform, so an Insights Services team collects data about how the data platform itself is used. I’m betting they use the data platform for that. I also bet they talk to people on the shows teams. Then Insights Services closes that feedback loop with the data platform team, so that Netflix can get better at getting better at making shows.

Essential links in these loops include telemetry in all these platforms. The software that delivers shows to customers is emitting events. The data platform jobs are emitting events about what they’re doing and for whom.

When a human does a job, reporting what they’re doing is extra work for them. (Usually flight attendants write drink orders on paper, or keep them in memory. The other day I saw them entering orders into iPads. Guess which was faster.) In any human system, gathering information costs money, time, and customer service. In a software system, it’s a little extra network traffic. Woo.

Software systems give us the ability to study them. To really find out what’s going on, what was working, and what wasn’t. The Insights Services team, as part of the data platform organization, can form hypotheses and then test them, adding telemetry as needed. As a team with internal customers, they can talk to the humans to find out what they’re missing. They can get both the data they think they need, and a glimpse into everything else.

Software organizations are a beautiful opportunity for learning about systems. We can do science here: a kind of science where we don’t try to find universal laws, and instead try to find the forces at work in our local situation, learn them and then sometimes change them.

When we get better at getting better — wow. That adds up to some serious acceleration over time. With learning loops about learning loops, Netflix has impressive and growing advantages over competitors.

Don’t just keep trying; report your limits

The other day we had problems with a service dying. It ran out of memory, crashing and failing to respond to all open requests. That service was running analyses of repositories, digging through their files to report on the condition of the code.

It ran out of memory trying to analyze a particular large repository with hundreds of projects within it. This is a monorepo, and Atomist is built to help with microservices — code scattered across many repositories.

This particular day, Rod Johnson and I paired on this problem, and we found a solution that neither of us would have found alone. His instinct was to work on the program, tweaking the way it caches data, until it could handle this particular repository. My reaction was to widen the problem: we’re never going to handle every repository, so how do we fail more gracefully?

The infrastructure of the software delivery machine (the program that runs the analyses) can limit the number of concurrent analyses, but it can’t know how big a particular one will be.

However, the particular analysis can get an idea of how big it is going to be. In this case, one Aspect finds interior projects within the repository under scrutiny. My idea was: make that one special, run it first, and if there are too many projects, decline to do the analysis.

Rod, as a master of abstraction, saw a cleaner way to do it. He added a veto functionality, so that any Aspect can declare itself smart enough to know whether analysis should continue. We could add one that looks at the total number of files, or the size of the files.

We added a step to the analysis that runs these Vetoing Aspects first. We made them return not only “please stop,” but a reason for that stop. Then we put that into the returned analysis.

The result is: for too-large repositories, we can give back a shorter analysis that communicates: “There are too many projects inside this repository, and here is the list of them.” That’s the only information you get, but at least you know why that’s all you got.

And nothing else dies. The service doesn’t crash.

When a program identifies a case it can’t handle and stops, then it doesn’t take out a bunch of innocent-bystander requests. It gives a useful message to humans, who can then make the work easier, or optimize the program until it can handle this case, or add a way to override the precaution. This is a collaborative automation.

When you can’t solve a problem completely, step back and ask instead: can I know when to stop? “FYI, I can’t do this because…” is more useful than OutOfMemoryError.

Stick with “good enough,” until it isn’t

In business, we want to focus on our core domain, and let everything else be “good enough.” We need accounting, payroll, travel. But we don’t need those to be special if our core business is software for hospitals.

As developers, we want to focus on changing our software, because that is our core work. We want other stuff, such as video conferencing, email, and blog platforms to be “good enough.” It should just work, and get out of our way.

The thing is: “good enough” doesn’t stay good enough. Who wants to use Concur for booking travel? No one. It’s incredibly painful and way behind the modern web applications we use for personal travel. Forcing your people into an outdated travel booking system holds them back and makes recruiting a little harder.

When we rent software as a service, then it can keep improving. I shuddered the last time I got invited to a WebEx, but it’s better than it used to be. WebEx is not as slick as Zoom, but it was fine.

There is a lot of value in continuing with the same product that your other systems and people integrate with, and having it improve underneath you. Switching is expensive, especially in the focus it takes. But it beats keeping the anachronism.

DevOps says, “If it hurts, do it more.” This drives you to improve processes that are no longer good enough. Now and then you can turn a drag into a competitive advantage. Now and then, like with deployment, you find out that what you thought was your core business (writing code) is not core after all. (Operating useful software is.)

Limiting what you focus on is important. Let everything else be “good enough,” but check it every once in a while to make sure it still is. Ask the new employee, “What around here seems out of date compared to other places you’ve worked?” Or try a full week of mob programming, and notice when it gets embarrassing to have six people in the same drudgery.

You might learn something important.