Sunday, September 9, 2012

Git: The Good Parts - history is written by the victors

Why change history?

Version control has two reasons for being: save your work, and tell the story of your project. These two goals are in conflict. The more often you save your work, the more jumbled and cluttered the story of your project.

"Time for lunch" is a great reason to save your work, but it makes a lousy commit message for posterity.

Git solves this: it lets you rewrite history. It says, "Do what you need to do, and then decide how you want to remember it." Why take this extra step? Because software is not written file by file, line by line. Software is built feature by feature, fix by fix. This is what you want to see when you look back at history.

How to change history

There are two aspects to telling your story in git: rewriting feature branch history and bringing feature branches into mainline. This post covers the former.

Before I share a change with other people, I use interactive rebase to put the changes in a rational order. It's like going back to a savepoint in a video game to optimize my play, except I don't have to fight the enemies again; I just tell git what order I should have fought them, and git replays the game in that order.

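Let's pretend that I slayed a troll, an ogre, an innocent bystander, the big boss, and then a stray ogre. I want it to look like I killed two ogres at once, then the troll, then the big boss, and left the innocent bystander alone.

Currently, my commit history looks like this:

[image: commit graph with five commits (troll, ogre, innocent bystander, big boss, one more ogre) on top of the green commit where I entered the level]
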
The first step is to find that savepoint. Often this is where our work diverged from origin. The farthest you can go back is the last commit that was fetched or pushed. Never change shared history. (git will let you change it - Bad Part Alert.) Take a look at the commit graph in gitx or gitk, or use git log, and find the last good commit that you don't want to change. In this example, it's the green commit where I entered the level.
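If you'd rather ask git than eyeball the graph, here is a minimal sketch of finding the savepoint. It assumes your branch has an upstream configured. The throwaway repositories, file names, identity, and commit messages are all made up for the demonstration. The last two commands are the part you'd run in a real repository.

```shell
#!/bin/sh
set -e

# Throwaway "origin" and clone, so nothing real is touched (all names hypothetical).
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/game" 2>/dev/null
cd "$work/game"
git config user.name "Player One"
git config user.email "player@example.com"

# One shared commit, pushed with -u so the branch gets an upstream.
echo "level 1" > level.txt
git add level.txt
git commit -q -m "entered the level"
git push -q -u origin HEAD

# Two local commits that nobody else has seen yet.
echo "troll" > troll.txt && git add troll.txt && git commit -q -m "killed a troll"
echo "ogre" > ogre.txt && git add ogre.txt && git commit -q -m "killed an ogre"

# The savepoint: where local work diverged from the upstream branch.
savepoint=$(git merge-base HEAD '@{upstream}')
echo "savepoint: $savepoint"

# Everything after the savepoint has not been shared and is safe to rewrite.
git log --oneline "$savepoint..HEAD"
```

The hash that `git merge-base` prints is the one to hand to `git rebase --interactive`.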

Next, take a deep breath and type:
git rebase --interactive b6f5708
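Put the hash of your commit in place of mine, of course. (Bad Part Alert: if you use a branch name instead of a commit hash, and that branch has commits yours doesn't, git will also move your commits onto the tip of that branch. That's a different operation from the in-place tidying I'm doing here.)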

*Poof* some magic happens and a file opens before your eyes. Each post-savepoint commit is listed, oldest first. Part of the commit hash and the first line of the commit message identify each. Each is prefixed by the mysterious word "pick."
pick a4546fe killed a troll
pick 07c4384 killed an ogre
pick c8370ba killed an innocent bystander
pick 4bc809a killed the big boos
pick f07833b killed one more ogre
The file also contains comments with handy instructions. The concept is: mess with the contents of the file, replacing pick with one of the other spells, until the list of commits looks like what we wish we had done. The good parts here are pick and squash. Pick means "use this commit" and squash means "use these changes, but put them into the previous commit."

There are two less-obvious spells: the order of lines in the file controls the final order of commits, and deleting a line from the file eliminates that commit from the final result. Be careful with that [1].

Oh hey! While I'm in here, I notice a typo in the commit message where I killed the big boos. Changing it in this file has no effect, but changing pick to reword will trigger git to open the commit message for editing.

Change the file by rearranging the lines into the correct order, deleting the innocent bystander line, making the second ogre commit say "squash", and marking the big boos commit "reword":
pick 07c4384 killed an ogre
squash f07833b killed one more ogre
pick a4546fe killed a troll
reword 4bc809a killed the big boos
With much anticipation, I save and quit. I am stymied by another editor window: the squashed-together commit wants a new commit message. In this case, I'll record this as "killed two ogres," then save and quit again. Next the commit message I want to reword opens; fix this, save and quit.

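Finally, the rebase is complete. Our old commits go away, and new ones are created in the order specified. Our history looks like this:

[image: commit graph showing "killed two ogres", "killed a troll", and "killed the big boss" on top of the savepoint]
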
Hurray! Our branch is all pretty.
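For anyone who wants to replay the whole thing without risking a real repository, here is a minimal sketch as a script. The repository, identity, and file names are made up. Instead of editing the rebase file by hand, the script uses GIT_SEQUENCE_EDITOR to swap in a prepared list, and GIT_EDITOR=true to accept the default squash message. That means it skips the reword step, and the squashed commit keeps "killed an ogre" as its first line rather than "killed two ogres".

```shell
#!/bin/sh
set -e

# Throwaway repository (all names hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Player One"
git config user.email "player@example.com"

# One commit per event; each touches its own file, so reordering can't conflict.
# Prints the short hash of the new commit.
slay() {
    echo "$1" > "$2"
    git add "$2"
    git commit -q -m "$1"
    git rev-parse --short HEAD
}

savepoint=$(slay "entered the level" level.txt)
troll=$(slay "killed a troll" troll.txt)
ogre1=$(slay "killed an ogre" ogre1.txt)
slay "killed an innocent bystander" bystander.txt > /dev/null
boss=$(slay "killed the big boss" boss.txt)
ogre2=$(slay "killed one more ogre" ogre2.txt)

# The list we wish we had: ogres together, then troll, then boss.
# The bystander line is simply left out, which drops that commit.
todo=$(mktemp)
cat > "$todo" <<EOF
pick $ogre1 killed an ogre
squash $ogre2 killed one more ogre
pick $troll killed a troll
pick $boss killed the big boss
EOF

# git runs "<editor> <file>", so this copies the prepared list over git's own.
GIT_SEQUENCE_EDITOR="cp '$todo'" GIT_EDITOR=true git rebase -q -i "$savepoint"

# Show the rewritten history, oldest first.
git log --oneline --reverse
```

Doing it by hand, as described above, is still the normal way. The environment variables are only there so the script can run unattended.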

Real life happens on the unhappy path

Rebase doesn't always go this smoothly. As the rebase is recreating each commit, conflicts are possible. For instance, two commits might change the same part of a file, and the rebase re-orders them. When this happens, git stops the rebase and asks you to resolve the conflicts. You have two options:
  • use git status to see the files with conflicts; edit them to resolve the conflicts; use git add <file> to tell git they're ready to go; then run git rebase --continue.
  • abort! abort! git rebase --abort and try something different. This gives up on the entire rebase and puts you back where you started. [2]
Bad Part Alert: until the rebase is either aborted or continued, your repository is in a strange state, and weird things could happen.
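To see the strange state for yourself, here is a minimal sketch that forces a conflict in a throwaway repository and then bails out. The repository, identity, and the player.txt file are made up. The check for a .git/rebase-merge directory is an assumption based on what I've seen interactive rebase leave behind, not a documented interface.

```shell
#!/bin/sh
set -e

# Throwaway repository (all names hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Player One"
git config user.email "player@example.com"

# Three commits that all rewrite the same line, so swapping two of them must conflict.
echo "hp: 100" > player.txt
git add player.txt
git commit -q -m "entered the level"
savepoint=$(git rev-parse HEAD)

echo "hp: 80" > player.txt
git commit -q -am "fought a troll"
troll=$(git rev-parse --short HEAD)

echo "hp: 5" > player.txt
git commit -q -am "fought an ogre"
ogre=$(git rev-parse --short HEAD)
before=$(git rev-parse HEAD)

# Ask for the two fights in the opposite order; the rebase stops on the conflict.
todo=$(mktemp)
printf 'pick %s fought an ogre\npick %s fought a troll\n' "$ogre" "$troll" > "$todo"
if GIT_SEQUENCE_EDITOR="cp '$todo'" git rebase -i "$savepoint" > /dev/null 2>&1; then
    echo "rebase finished cleanly (not expected here)"
fi

# Mid-rebase: git status tells the full story; these two checks summarize it.
conflicted=$(git diff --name-only --diff-filter=U)
if [ -d .git/rebase-merge ]; then in_progress=yes; else in_progress=no; fi
echo "conflicted: $conflicted, rebase in progress: $in_progress"

# Option two from the list above: give up and go back to where we started.
git rebase --abort
after=$(git rev-parse HEAD)
```

After the abort, the branch points at the same commit it did before the rebase started.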

From here, it's time to get these changes into the mainline branch - the different ways to do that are a topic for another post.

If you thought this was fun, there are other reasonable ways to rewrite history: change commit messages; split one commit into multiple commits; move some commits to a different branch. These are stories for another day. For now, keep saving your work, and consciously change your history so your coworkers don't know what time you went to lunch.

[1] deleting a line from the rebase file is not entirely a good part - the functionality is useful, but this is a cruel way to implement it. I wish we could change "pick" to "skip" instead, and receive an error if we accidentally delete a line from the file.
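Git has since grown something close to this wish. As far as I know (Git 2.6 and later), the rebase file accepts "drop" as a spell, and the rebase.missingCommitsCheck setting can turn an accidentally deleted line into an error. A minimal config sketch, worth checking against the documentation for your git version:

```shell
# Refuse to run the rebase if a line was deleted from the rebase file;
# to remove a commit on purpose, change "pick" to "drop" instead.
git config --global rebase.missingCommitsCheck error
```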

[2] if you find yourself restarting rebases or merges, you might want to enable git rerere, because it saves you from repeating the same merges.
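Turning rerere on is a single setting. This sketch enables it for every repository; drop --global to enable it only in the current one:

```shell
# Record conflict resolutions and replay them when the same conflict comes back.
git config --global rerere.enabled true
```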

5 comments:

  1. You say different things happen when you enter a branch name instead of a commit hash. This is not true, because the first argument to git rebase is used to define where to rebase onto.

    If you'd use git rebase -i master, the only difference is that one extra commit is shown in the editor (enter the level).

    And that weird state is called a detached HEAD, where new commits won't belong to any branch, and might get lost track of (except that the reflog keeps them for a while) when checking out a branch or continuing the rebase.

    1. In this particular example, it is true that "rebase -i master" will do very nearly the same thing. However, if master had diverged -- if there were commits on master that are not on this branch -- then "rebase -i master" would apply the new commits to the tip of master. This is something different.
      Rebasing to a branch has some great purposes, but in this post I'm using rebase only to rearrange commits in immediate history, not to move them to a different point in the tree.

      The weird state mid-rebase is weirder than detached HEAD. You can get to detached HEAD with a simple git checkout HEAD^ (or any other address to a commit that isn't at the tip of a branch). Mid-rebase is something else; git status is different, and there's some stuff inside .git that isn't normally there, and generally your repository will not operate normally until you either continue or abort the rebase. (I thought maybe a reset --hard HEAD would get you out, and it sort of does - but not to where you started! Yikes, I need to do a post on the reflog.)

  2. I'm really enjoying your Git posts. Keep 'em coming!
    -deech
