Tuesday, December 11, 2012

TFS and Git: a comparison

TL;DR: If you want an integrated, enforced process, use TFS. If you value developer freedom and individual work strategies, use git. 
The players:

TFS (Team Foundation Server) is Microsoft's answer to version control. TFS was designed big and bad-ass enough to support development of Visual Studio, like 5 million lines of code. [1] Since not everyone lives in Redmond, TFS tries to minimize network traffic and keep the work on the server.

git is open-source. It was created to manage the source code of the Linux kernel, roughly 15 million lines of code. [2] Development is distributed among thousands of contributors around the world. Many features and versions are in process all the time.

The setting:
DevTeach attendees and speakers provided all the TFS information in this post. Thanks everybody!

Why should your team use one or the other?

First, if you don't use Visual Studio, forget about TFS.
[Correction: see Wes MacDonald's comment below. It can be done.]
Second, do you want version control or a full suite including a build server, work item tracking, and analytics? git is version control only. GitHub has some of the same bells and whistles as TFS, but GitHub is outside the scope of this post.

[update: TFS has started to embrace git, and you can use git as version control within TFS. If you've worked this way and have opinions, please add your comment.]

Here's a high-level comparison of features, for those who enjoy such formats. Below that, find discussion of the important feature differences, user experiences, and the one question that will tell you which is right for you.

Feature                                | TFS                   | git
---------------------------------------+-----------------------+----------------
Save source code                       | Yes                   | Yes
Retain all version history             | Yes                   | Yes
Group changes into sets                | Yes (file level only) | Yes
Automatic change detection             | in 2012               | Yes
Branch                                 | Yes (not easy)        | Yes (very easy)
Merge between related branches         | Yes                   | Yes (very easy)
Safe merge between unrelated branches  | No                    | Yes
Offline access to history              | No                    | Yes
Offline commit                         | No                    | Yes
Sneaky automatic merges on the server  | Yes                   | No
Enforce requirements before commit     | Yes                   | No
Private local branches                 | No                    | Yes
Learning curve for users               | Low                   | Medium
Learning curve for administrators      | Very high             | Medium-high
GUI support                            | Strong                | Poor
Conditional commit (if tests work)     | Yes                   | No
Save IDE state                         | Yes                   | No
Work item tracking                     | Yes                   | No
Automated builds and tests             | Yes                   | No
Analytics and charting                 | Yes                   | No
Manual test tracking                   | Yes                   | No
Installation                           | 1/2 day               | 10 minutes
Deployment, central and local          | days to months        | 1 day
Free online hosting                    | Yes                   | Yes
Cost                                   | $$$                   | free
Security                               | Yes                   | No

What does this chart say? It says, "Git does version control, and does it well" and "TFS does version control and build automation and work item tracking and test tracking and analytics in a SQL Server data cube with graphs and charts."

TFS is a whole development workflow suite. It integrates all its parts, and then integrates with Visual Studio, SharePoint, and Active Directory. Git doesn't impose or supply any of these parts; there are other solutions both open-source and commercial. If you want to select each item of furniture separately, hunting through different stores until you find the best chair and couch and table for you, then git is your source control rug. If you prefer to stop at Ethan Allen and choose a coordinating set, that's TFS.

Michael's company uses TFS for source control. On work item tracking, he says, "We haven't made use of this feature because we haven't adapted our business process to it... which come to think of it doesn't make sense." You don't have to use all the features of TFS.

Choosing to use all of TFS touches the whole business of producing software. An impact like this can lead to a lot of meetings, and that is why J.R. Roy and James both said deployment can take months, even though installation is fast.

I'll highlight a few killer features of TFS, and then a few of git.

If you like tight, granular control over security, TFS has this. Use AD groups or define your own; assign read and write permissions down to the individual file level if needed. In contrast, access to whole git repositories is enforced by the file system.

The tight integration of TFS components enforces traceability: for instance, organizations can require that each changeset be associated with a work item. Luc says TFS is "good stuff. All code is related to a work item, everything is linked together."

TFS has an explicit checkout process. This is both bad and good. In TFS 2010 and earlier, files are read-only until Visual Studio checks them out, which drives people nuts. External editors don't integrate. In TFS 2012 there's an improvement: changes are noticed automatically, and the checkout performed behind the scenes. This explicit checkout process is good because when the server knows who is working on each file, a developer can find out who else is changing the file they're about to slaughter. I've wished for this in IntelliJ IDEA with git.

Then again, as Maxime points out, "If you're working on the kernel of Linux you don't want to know" whether sixty other developers in fifty different feature branches are working on the same file.

There's another feature of TFS that will either make you say "fantastic!" or "run away!" TFS requires SQL Server, and it uses the database to store all kinds of tracking data about work items, commits, code quality, and build history. It outputs an entire data cube to generate burndown charts, bugs over time, and all those high-level pictures that give managers the illusion that they know what's going on in the development team.

Now for git's killer features. Git is a distributed version control system (DVCS), which means that every developer's copy can access every version of every file ever, even on an airplane. Incidentally, this serves as a free offline backup for every dev who takes their laptop home. More importantly, people can view history, save their own commits, and branch all without a connection to the central repository. The central depot is accessed only when explicitly requested.

There's a subtler advantage to having a whole repository locally: saving work is separated from sharing work. Developers can save a series of changes and then test thoroughly before transferring those changes to the central depot and build server. As Tri put it: "I like being able to save my code without making my friends suffer."
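Here's a rough sketch of that save-then-share split. Everything in it is made up for illustration: the sandbox directory, the bare repository standing in for the central depot, and the file name. The point is only that commits pile up locally and the depot sees nothing until the push.

```shell
#!/bin/sh
set -e

# Hypothetical sandbox: a bare repo plays the role of the central depot.
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/central.git"
git clone --quiet "$sandbox/central.git" "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Save work: two local commits, nothing shared yet.
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "First draft"
echo "second draft" >> notes.txt
git commit --quiet -am "Second draft"

# Count the refs the central depot knows about before sharing.
central_before=$(git ls-remote origin | wc -l | tr -d ' ')

# Share work: only now does the central depot hear about the commits.
git push --quiet origin HEAD
central_after=$(git ls-remote origin | wc -l | tr -d ' ')

echo "refs on central before push: $central_before, after push: $central_after"
```

In real life the "test thoroughly" part goes between the last commit and the push.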

Then again, as Etienne Tremblay sees it, the extra step of saving locally and then sharing "adds a layer of complexity." It is another step the developer has to think about, another decision to be made.

Good news for the developer who wants the privacy of git while his organization prefers TFS: integration. There's git-tfs, which is a git plugin, and git-tf, written by Microsoft. Git repositories and TFS can talk to each other. Try git without impacting your team.

At this point, git fans are screaming, "What about branching and merging??" The subtleties of the differences here are beyond the scope of this post, but in brief: branches and merges are orders of magnitude smaller and simpler in a git workflow. I create git branches and merge changes several times a day. Contrast with Daniel, a TFS user: "We tried branching one time, but we didn't end up using it."
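For a feel of how small a git branch is, here's a minimal sketch of one branch-and-merge round trip. The repository, the branch name feature/greeting, and the file are all invented for the example; this is one simple workflow, not the only one.

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > app.txt
git add app.txt
git commit --quiet -m "Initial commit"

# Remember the starting branch; its name depends on local git configuration.
trunk=$(git rev-parse --abbrev-ref HEAD)

# Create a branch and commit on it.
git checkout --quiet -b feature/greeting
echo "hello, world" > app.txt
git commit --quiet -am "Friendlier greeting"

# Merge it back and throw the branch away.
git checkout --quiet "$trunk"
git merge --quiet feature/greeting
git branch --quiet -d feature/greeting

merged=$(cat app.txt)
echo "after merge: $merged"
```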

Then again, Etienne points out, "dealing with branches is very complicated for people." Maybe developers don't want this alternative.

At its core, TFS is centralized. To reduce network traffic, Microsoft "chose a design where the vast majority of information (including information about your local workspace) was stored on the server." [1] Git, on the other hand, keeps every local copy fully independent. Even the central depot is only central by convention. The happy rumor is that TFS is embracing more of the distributed model in future releases.

Now for the real, defining difference between TFS and git. TFS says, "This is how to run your process. Follow these steps." There are customization points; these are a lot of work and will be decreed by the organization. Git says, "it's your repository, do what you need to do. Then choose how to tell the story of your project." Git gives you, as a developer and as a team, a lot of choices. That's not always a good thing.

Howard Dierking provides the one question you need to ask yourself: "Do you want something that's easier to apply corporate governance to?" If yes, then TFS.

[1] http://blogs.msdn.com/b/bharry/archive/2011/08/02/version-control-model-enhancements-in-tfs-11.aspx
[2] http://en.wikipedia.org/wiki/Linux_kernel

Sunday, December 9, 2012

Git, the many parts: five categories of files

If you've ever been burned by stash not stashing everything you wanted it to stash, or reset --hard not resetting everything you wanted to reset, then this post is for you.

I used to think there were three places that files can be in git: the working directory, the staging area, and the repository.

I think of them like this: you make your changes, you put some of them on a pallet (the staging area), and when you're satisfied with how the pallet looks, you wrap 'em up and load 'em on a truck (the commit). That's when they get a label (the 40-hex-digit commit identifier), and when you're ready to share them, you send the truck on its way (push).

Three places for files... it's one more than subversion, but not so bad. Recently, after much confusion and cursing, I have learned that there are at least five.

To start with, take a look at git status. Git status compares the staging area and the working directory with the most recent commit (HEAD), and tells us what changes we've made.

The output groups changes into three kinds: staged, modified, and untracked.

What is the difference between modified and untracked? Ay, there's the rub. Untracked files do not exist at all in the HEAD commit.
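To see all three kinds side by side, here's a small sketch in a throwaway repository (the file names are invented). It uses git status --porcelain, the compact two-column form of the same report: a letter in the first column means the change is staged, a letter in the second column means it's modified but not staged, and ?? marks untracked files.

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Baseline commit with two tracked files.
echo one > staged.txt
echo one > modified.txt
git add staged.txt modified.txt
git commit --quiet -m "Baseline"

echo two >> staged.txt
git add staged.txt          # changed and added: staged
echo two >> modified.txt    # changed but not added: modified
echo new > untracked.txt    # never added, not in HEAD: untracked

status=$(git status --porcelain)
printf '%s\n' "$status"
```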

The five categories of changed files

We know about files that are committed; they're saved in the repository. We know about files that are staged; they're ready to go in the next commit. Files that have local changes are in the modified category, and they're not weird. Let's zoom in on the two new designations.

Untracked files are new to git -- that is, they're not in the HEAD commit; they could be in some commit somewhere. Git assumes no responsibility for these files. They're in an unstable state and I recommend moving them asap. Here are your options:

1. git add to put them in the staging area. This tells git to care about them. From here, you can stash them or commit them. IntelliJ IDEA asks me, when I create a new file, "Do you want to add this to git?" I used to tell it, no, go away and leave me alone! But now I realize that my IDE was right: it's better to have these files in the staging area than in this never-never-land of untrackedness.

2. Ignore them. Find details on the two ways to ignore files and which you should choose. It's a good idea to get those files right into ignored status before ever committing them. If you decide to ignore files after they're committed, git keeps tracking them, so they wind up in two states at the same time: modified and ignored. Use git rm --cached to combat this; it stops tracking the file but leaves your local copy alone (plain git rm deletes the local copy too).
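Here's a sketch of that modified-and-ignored trap and one way out of it, assuming you want to keep the file on disk. The file name local.properties is hypothetical, and .gitignore is just one of the places an ignore rule can live.

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Dev"
git config user.email "dev@example.com"

# The mistake: a local settings file gets committed.
echo "setting=1" > local.properties
git add local.properties
git commit --quiet -m "Oops: committed a local settings file"

# Ignoring it afterwards does not help; git still tracks it.
echo "local.properties" > .gitignore
echo "setting=2" >> local.properties
before=$(git status --porcelain local.properties)

# Stop tracking the file but keep the copy on disk, then commit that.
git rm --quiet --cached local.properties
git add .gitignore
git commit --quiet -m "Ignore local.properties"

# Further edits no longer show up in status.
echo "setting=3" >> local.properties
after=$(git status --porcelain)

echo "before: '$before'"
echo "after:  '$after'"
```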

What if untracked files show up when you least expect it?

This happens to me when .gitignore is different in different branches. Rectify this ASAP. (This is a good time to say, "Ooh, I can use cherry-pick!")

Sometimes when I'm trying to get my repo to match HEAD, I want those untracked files to go away dammit. The command for that is:
    git clean -df
The -d says "yes, directories too dammit" and the -f says "no really, freaking delete the untracked files." By default, git clean will refuse to do any work unless you curse at it with -f. You can change this permanently with configuration:
    git config --global clean.requireForce false
That tells git clean "do your job when I call you."
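If deleting files on command makes you nervous, git clean also has a dry-run flag, -n, that only lists what it would remove. A minimal sketch in a throwaway repository (the file and directory names are invented):

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo keep > tracked.txt
git add tracked.txt
git commit --quiet -m "Baseline"

# Some untracked clutter: a file and a directory.
mkdir build
echo junk > build/output.log
echo junk > scratch.txt

# Dry run: report what would be removed, delete nothing.
preview=$(git clean -dn)
printf '%s\n' "$preview"

# The real thing: untracked files and directories are deleted.
git clean -dfq
remaining=$(git status --porcelain)
```

Tracked files are never touched by git clean; only the untracked clutter goes.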

In summary: untracked files are your enemy. Ignore what should be ignored, and get the good ones into the staging area right away.
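And to close the loop on the stash and reset --hard burns from the top of this post, here's a sketch of how both skip untracked files. The repository and file names are hypothetical; --include-untracked is the stash option that picks untracked files up.

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo v1 > tracked.txt
git add tracked.txt
git commit --quiet -m "Baseline"

echo v2 > tracked.txt       # modified
echo draft > new-idea.txt   # untracked

# A plain stash takes the modified file and leaves the untracked one behind.
git stash push --quiet
left_by_stash=$(git status --porcelain)
git stash pop --quiet

# Asking for untracked files gives a clean working directory.
git stash push --quiet --include-untracked
left_by_stash_u=$(git status --porcelain)
git stash pop --quiet

# reset --hard restores tracked files only; the untracked file survives.
git reset --quiet --hard
left_by_reset=$(git status --porcelain)

echo "after stash:                     '$left_by_stash'"
echo "after stash --include-untracked: '$left_by_stash_u'"
echo "after reset --hard:              '$left_by_reset'"
```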