Saturday, August 17, 2013

Converting from svn to git

Moving from svn to git, git-svn is your best friend. However, it is a recent best friend and doesn't understand you some days.

I'm working on moving a repository from sourceforge (in svn) to bitbucket (in git). Theoretically, I should be able to clone the repo into git locally, then push all the history up to bitbucket. That would be too easy. It's more like this:

1) Build user translations. Create a text file (call it users.txt) that looks like this:
bobberdude = Bobber Dude <bobberdude@gmail.com>
coleslaw = Cole Slaw <coleslaw@gmail.com>
...with one line for each committer to your svn repo. [1]
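Before a long clone, it can save a restart to check that every line of users.txt has the "svnuser = Full Name <email>" shape. Here is a minimal sketch in plain sh. The function name check_users_file and the exact pattern are my own assumptions, not something git-svn provides, and the pattern is stricter than git-svn may need (it insists on a non-empty email).

```shell
# Hypothetical helper: report users.txt lines that don't look like
#   svnuser = Full Name <email>
# Blank lines are ignored. Returns non-zero if any line is malformed.
check_users_file() {
  local file="$1" bad=0 lineno=0 line
  # "|| [ -n ... ]" also handles a last line with no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))
    if [ -z "$line" ]; then
      continue
    fi
    if ! printf '%s\n' "$line" | grep -Eq '^[^=]+ = [^<]+ <[^>]+>$'; then
      echo "$file:$lineno: malformed: $line" >&2
      bad=1
    fi
  done < "$file"
  return "$bad"
}
```

Usage: check_users_file users.txt && echo "users.txt looks OK"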

2) Download from svn to git locally. Theoretically this could be "git svn clone <repo url>". But with this friend, ya gotta be more specific. Tell it to do things right.

git svn clone --stdlayout --prefix 'svn/' -A users.txt https://svn.code.sf.net/p/my_project/code/ my_project

git svn clone says "Copy this svn repo with all its history into git"
--stdlayout says "this repo has a trunk, a directory of branches, and a directory of tags. Get all those."
--prefix 'svn/' says "create the branches and tags with names like svn/trunk and svn/my_branch and svn/tags/my_tag." This is how they'd normally look if we got them from a git remote. By default, git svn 1.8.x (the version I used) doesn't prefix them, which is confusing. Starting with Git 2.0, the default prefix is 'origin/'.
-A users.txt says "Use these username translations to set author/committer name and email in commits"
The URL is where svn lives.
my_project is a directory where I want the repo to be. Otherwise git-svn will create one with the same name as the last part of the repo's URL.

This is going to bring down all the branches and tags, but it doesn't do everything we want. It creates the tags as git branches, and it doesn't make local branches to link up to these remote ones.

If this giant download gets interrupted:
Go into the repository directory and run git svn fetch. It should pick up where it left off.

Caution: 
If you missed any users, it'll stop and tell you about the missing user. Add that person to the file, then continue by running git svn fetch or by re-running the clone command. It should pick up where it left off.
Caution:
Branches with spaces in their names can throw it off. In my experience (git 1.8.3.4), this happened after resuming a download after the "you forgot this user" error. I deleted that repo and started over from the beginning with a fuller user file, and it worked then.
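To find missing users before the clone trips over them, you can compare svn's history with users.txt up front. This is a rough sketch in plain sh and awk, assuming "svn log -q" prints revision lines like "r42 | bobberdude | <date>" (the same format the perl one-liner in the comments relies on). The name missing_svn_authors is made up.

```shell
# Hypothetical helper: read `svn log -q` output on stdin and print each svn
# username that has no entry on the left-hand side of users.txt ($1).
missing_svn_authors() {
  local users_file="$1"
  awk -F' [|] ' -v users="$users_file" '
    # First input is users.txt: remember every "username = ..." key.
    FILENAME == users { split($0, kv, " = "); known[kv[1]] = 1; next }
    # Then stdin: revision lines look like "r123 | username | date".
    /^r[0-9]+ [|] / && !($2 in known) { print $2 }
  ' "$users_file" - | sort -u
}
```

Usage: svn log -q https://svn.code.sf.net/p/my_project/code/ | missing_svn_authors users.txt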

3) Create git branches to match all the remote branches. If there are only a few, you can type
git branch my_branch svn/my_branch
for each of them.

If there's a bunch, here is one giant command:

git for-each-ref refs/remotes/svn --format="%(refname:short)" | sed 's#^svn/##' | grep -v '^tags' | while read -r aBranch; do git branch "$aBranch" "svn/$aBranch"; done

This says: Take the name of each remote reference that starts with svn, and give me the short reference name. Strip svn/ from the front of it. Skip any that start with tags. For each line, call the contents of the line $aBranch, and run this command at the shell: create a local branch called $aBranch that starts at the same commit as svn/$aBranch.
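If you'd rather look before you leap, a dry run that prints the git branch commands instead of running them can help. This is a sketch of my own (preview_branch_commands and shell_quote are hypothetical names), assuming the svn/ prefix from step 2. Like the one-liner, it keeps trunk, so you'd still delete that one afterwards.

```shell
# Wrap $1 in single quotes for the shell, escaping embedded single quotes.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Hypothetical dry-run helper: read git-svn ref names such as svn/trunk or
# svn/my_branch on stdin and print the `git branch` command for each,
# without running anything. svn/tags/* refs are skipped (step 4 covers them).
preview_branch_commands() {
  local ref name
  while IFS= read -r ref; do
    case "$ref" in
      svn/tags/*) continue ;;      # tags are handled separately
      svn/*) name=${ref#svn/} ;;   # strip the --prefix from step 2
      *) continue ;;               # not a git-svn ref; ignore it
    esac
    printf 'git branch %s %s\n' "$(shell_quote "$name")" "$(shell_quote "$ref")"
  done
}
```

Usage: git for-each-ref refs/remotes/svn --format="%(refname:short)" | preview_branch_commands
Once the output looks right, add "| sh" to the end to actually run the commands.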

This happened to create a trunk branch that's a duplicate of master, so delete it:
git branch -d trunk 

Whew, that was a lot of magic to do something that sounds simple in English.

4) Create git tags for all of the svn tags. Git-svn brings the svn tags down as if they were branches with a 'tags/' prefix. I don't know why, but I do know how to fix it.
If there are only a few tags, type this for each:
git tag my_tag svn/tags/my_tag

If there are several, here's a hairy shortcut:
git for-each-ref refs/remotes/svn/tags --format="%(refname:short)" | sed 's#^svn/tags/##' | while read -r aTag; do git tag "$aTag" "svn/tags/$aTag"; done

This says: Take the name of each remote reference that starts with svn/tags, and give me the reference name. Strip svn/tags/ from it. For each line, call the contents of the line $aTag, and run this command at the shell: create a local tag called $aTag that points at svn/tags/$aTag.
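Once the tag loop has run, it's easy to double-check that every svn tag got a git tag. This is a sketch in plain sh and awk; untagged_svn_tags and the two list files are my own invention, and it only compares names, not which commit each tag points at.

```shell
# Hypothetical check: which svn tags still lack a git tag of the same name?
# $1: file listing git-svn tag refs (svn/tags/NAME), one per line
# $2: file listing existing git tags (the output of `git tag`)
untagged_svn_tags() {
  awk -v tags="$2" '
    # First input: remember every existing git tag.
    FILENAME == tags { have[$0] = 1; next }
    # Second input: strip svn/tags/ and print names with no git tag.
    sub(/^svn\/tags\//, "") && !($0 in have)
  ' "$2" "$1"
}
```

Usage:
git for-each-ref refs/remotes/svn/tags --format="%(refname:short)" > svn-tags.txt
git tag > git-tags.txt
untagged_svn_tags svn-tags.txt git-tags.txt
No output means every svn tag has a matching git tag.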

5) Hurray! We have all the info we need locally. Now we can do warm-fuzzy git operations.
If your goal is to push this to bitbucket, go there and create a brand-new repo.
Come back to your shiny new local git repo, and tell it about bitbucket:

git remote add origin https://new-bitbucket-repo-url 

This says "Yo, learn about this place called 'origin.' You can talk to it here."

Then shove everything up:

git push -u --all origin
git push --tags origin

The first command says, "Push all branches up to origin, and make my local branches track them." The second says, "Push all the tags, too." You can leave off the -u option if you're planning to re-clone fresh for your own work, which is probably a good idea.
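Before deleting the local copy, it's worth checking that every branch actually arrived at bitbucket. This sketch compares "git ls-remote --heads origin" output (lines of "<sha><TAB>refs/heads/NAME") against your local branch list. The helper name and the file names are hypothetical.

```shell
# Hypothetical check after pushing: list local branches that origin lacks.
# $1: file with `git ls-remote --heads origin` output
# $2: file with local branch names, one per line
branches_missing_on_remote() {
  awk -v remote="$1" '
    # First input: remote heads; strip refs/heads/ from the ref column.
    FILENAME == remote { sub(/^refs\/heads\//, "", $2); pushed[$2] = 1; next }
    # Second input: print any local branch origin does not have.
    !($0 in pushed)
  ' "$1" "$2"
}
```

Usage:
git ls-remote --heads origin > remote-heads.txt
git for-each-ref refs/heads --format="%(refname:short)" > local-branches.txt
branches_missing_on_remote remote-heads.txt local-branches.txt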

Done. Now I can put git-svn on the back burner and hang out with more fun friends, like rebase and reflog.

-----------------------
[1] How to get the list of all committers?
In svn, I don't know. (If you do, please comment)
If you've already cloned the repo into git without supplying a user translation, then do this (mac or linux):
git log --all --format="%aE" | sort -u

git log says "list all the commits"
--all says "log all the branches and tags you know about"
--format="%aE" says "print only the author email"
| (pipe) is linuxy for "send the output to"
sort -u is linuxy for "sort all the lines, removing duplicates."
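If you did clone without -A, the same idea can produce a starting users.txt. This sketch rests on an assumption worth checking against your own log: that git-svn, with no authors file, records the svn username as the author name (and an address like username@<repo-uuid> as the email, which is what %aE shows). users_skeleton is a made-up helper, and you still have to replace the placeholders with real names and emails.

```shell
# Hypothetical helper: turn a list of svn usernames (stdin, duplicates OK)
# into users.txt lines with placeholders to edit by hand.
users_skeleton() {
  sort -u | while IFS= read -r user; do
    if [ -n "$user" ]; then
      printf '%s = FULL NAME <EMAIL>\n' "$user"
    fi
  done
}
```

Usage: git log --all --format="%an" | users_skeleton > users.txt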

5 comments:

  1. Very useful post. Also check out SubGit which does most of this 'out-the-box'.

  2. Thank you, this was extremely helpful.
    I like the way you explain each command and describe what the switches do.

  3. Thank you; I've converted a few, rather haphazardly, but I'll likely put this guide to use shortly.

    To get the list of committers in svn, I've used (assuming https://svn.code.sf.net/p/my_project/code/ as your repository root):

    svn log -q https://svn.code.sf.net/p/my_project/code/ | perl -nE '/r\d+ \| ([^\|]+) \| / and $seen{$1}=1; END { say for sort keys %seen }'

  4. The reason why git-svn translates Subversion tags to Git branches is that although Subversion tags are semantically equivalent to Git tags, they are effectively equivalent to Git branches.

    A Git tag is a first-class object, unlike a Subversion tag. A Git tag is fixed (by a sha): once created it can never change. A Subversion tag is just a copy of a directory, exactly like a Subversion branch is. Once you create a Subversion tag you can make new commits to it, altering what a snapshot looks like when you check out that tag. Convention discourages you from changing a tag, but Subversion does not stop you from doing that.

    With that in mind, how would you have git-svn translate tags? It could translate them as tags hoping for the best, but when it encounters a Subversion tag change what should git-svn do? The answer is that there are a number of equally poor solutions. At the end, git-svn decides to translate them uniformly as branches and let you deal with this headache.

  5. Step 3 below to get the authors file from SVN. Thanks a lot for the tutorial.

    https://www.semitwist.com/articles/article/view/the-better-svn-git-guide
