DTEK

Web Strategy for Progressive Causes and Big Ideas

Migrating projects from Subversion to Git

Thu, 11/17/2011 - 5:12pm -- Benj

A version control system is essential for keeping track of code changes and for providing a repeatable, reversible process for applying those changes to development, staging and production websites. At DTEK, we used Subversion (svn) for years, but recently made the change to Git. I won't get too deep into the reasons why, but the main win for us is fast and simple branching; our workflow seems to be less "work" and more "flow" now that we're using Git. ;-)

The Plan

Our svn repositories contained the site code alongside supporting files like documents, graphic source files and HTML prototypes. For the new Git workflow we developed, we need two separate repositories per project, which makes site deployment easier (and keeps things like internal documentation off the web servers). We chose the simple naming convention of "projectname" for the site files, and "projectname.support" for everything else we'd like to version.

This post will lay out the details of how we migrated each of our projects from a single Subversion repository to multiple bare Git repositories, keeping years of history intact. "Bare" repositories do not have a local working checkout of the files available for modification. They are the recommended format for shared repositories, and also make for a simple transition from Subversion's centralized model.

The basic process will be:

  1. Prep: pull the svn authors and create a text file for git to use
  2. Website: clone just the website directory and its contents, and create a new bare repository on our central server
  3. Support: clone the entire svn repository to a temporary git repository, ignoring paths to the website; use the git filter-branch command to remove all commit references pointing at website files; then push the result to a bare "projectname.support.git" repository for the server.

Let's Dig In

These steps are largely based on this guide from John Albin, though the first couple come from this drupal.org (d.o) post by FatherShawn. I will be using a base directory called 'repos' to contain my work.

Prep: Find svn author info

The following short bash script pulls the author names out of the svn log and prints one placeholder line per author. We'll then fill in those placeholders for git.

#!/usr/bin/env bash
 
authors=$(svn log -q | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
  echo "${author} = NAME <USER@DOMAIN>";
done

Save the script as extractSvnAuthors.sh in the 'repos' directory and make sure it is executable, then run it from the root of the svn working copy.

$ cd projectname/
$ ../extractSvnAuthors.sh > ../mySvnAuthors.txt
$ cat ../mySvnAuthors.txt
arh1 = NAME <USER@DOMAIN>
benj = NAME <USER@DOMAIN>
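To see what each stage of that pipeline contributes, here is a minimal sketch that runs the same grep/awk/sort/uniq chain against a few lines of made-up "svn log -q" output instead of a live repository. The function names and the sample log are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: the same extraction pipeline, fed from stdin instead of a live
# "svn log -q", so it can be tried without a repository.

extract_authors() {
  # Keep only the revision lines (they start with "r"), take the second
  # "|"-separated field (the author), then de-duplicate.
  grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq
}

format_authors() {
  # Word splitting in the for loop strips the padding spaces around each
  # name. It also means a username containing spaces would be split apart.
  local author
  for author in $(extract_authors); do
    echo "${author} = NAME <USER@DOMAIN>"
  done
}

# Hypothetical sample of "svn log -q" output.
sample_log() {
  cat <<'EOF'
------------------------------------------------------------------------
r3 | benj | 2011-11-01 10:15:00 -0700 (Tue, 01 Nov 2011)
------------------------------------------------------------------------
r2 | arh1 | 2011-10-28 16:40:12 -0700 (Fri, 28 Oct 2011)
------------------------------------------------------------------------
r1 | benj | 2011-10-27 09:02:45 -0700 (Thu, 27 Oct 2011)
------------------------------------------------------------------------
EOF
}

sample_log | format_authors
```

With this sample, the two distinct authors come out once each, in sorted order, no matter how many commits they made.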

Next we need to edit that authors file, replacing each NAME <USER@DOMAIN> placeholder with the person's real name and email address, so that our git commit history carries valid author info. We'll use this file in the next step, and again later when importing our svn history for the new git repositories.
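It is easy to miss a line when editing by hand, so a quick check before running the import can save a long re-clone. The following is a rough sketch, not part of the original process: the function name is made up, and the pattern only checks for the general "svnuser = Full Name <email>" shape.

```shell
#!/usr/bin/env bash
# Sketch of a sanity check for the authors file before handing it to git svn.

check_authors_file() {
  local file="$1"
  # Fail if any line still carries the placeholder text (either case).
  if grep -q -i 'NAME <USER@DOMAIN>' "$file"; then
    echo "placeholder entries remain in ${file}" >&2
    return 1
  fi
  # Fail if any non-empty line is not of the form: svnuser = Full Name <email>
  if grep -v -E '^[^ =]+ = .+ <[^<> ]+@[^<> ]+>$' "$file" | grep -q '.'; then
    echo "malformed entries in ${file}" >&2
    return 1
  fi
  return 0
}
```

A finished file would contain lines such as "benj = Benj Example <benj@example.com>" (a made-up identity), and running check_authors_file mySvnAuthors.txt should then return success.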

Website: Clone just the website directory

This is pretty simple, although it can take a while. First back out of the svn working copy, then use the 'git svn' command to clone only our website directory and its contents. As you can see, this step uses the author mapping file we created above. Also, the "--no-metadata" option means you won't get the svn commit metadata (the git-svn-id line) at the bottom of each commit message in git. We're going to call the new git directory projectname_temp.

$ cd ../
$ git svn clone svn+ssh://ourserver.com/path/to/projectname/web/trunk -A mySvnAuthors.txt --no-metadata projectname_temp

Website: Create a .gitignore file

You can either create a .gitignore file manually, or from the svn:ignore properties, as shown below (from within the new git repo). Add and commit this file in the git repo.

$ cd projectname_temp
$ git svn show-ignore > .gitignore
$ git add .gitignore
$ git commit -m "adding our .gitignore file"
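If you go the manual route instead, the file is just a list of patterns, one per line. Here is a small sketch that writes one. The patterns are illustrative guesses for a Drupal-style site and are not taken from our actual projects, so adjust them to whatever your svn:ignore properties covered.

```shell
#!/usr/bin/env bash
# Sketch: write a small hand-made .gitignore into a working copy.
# The patterns below are hypothetical examples, not DTEK's real list.

write_gitignore() {
  local repo_dir="$1"
  cat > "${repo_dir}/.gitignore" <<'EOF'
# Local configuration with database credentials
sites/default/settings.php
# User-uploaded files
sites/default/files/
# Editor and OS clutter
*.swp
.DS_Store
EOF
}
```

Calling write_gitignore projectname_temp would create the file, after which the same add and commit commands apply.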

Website: Create a new bare git repo

Since we are still going to use a centralized repository model, we need to create a 'bare' repository to put on our server, which everyone will then clone from and push to. The steps below will 1) create a new, empty, bare git repository, 2) add the new repo as a 'remote' named 'bare' in our projectname_temp repo, 3) push the git "stuff" to the new bare repo, 4) rename the 'git-svn' branch to 'master' in the bare repo, 5) push our .gitignore commit to the bare repo.

$ cd ../
$ git init --bare projectname.git # (1)
$ cd projectname_temp/
$ git remote add bare ../projectname.git # (2)
$ git config remote.bare.push 'refs/remotes/*:refs/heads/*' # (2)
$ git push bare # (3)
$ cd ../projectname.git
$ git branch -av # verbosely list all (local and remote) branches
$ git branch -m git-svn master # (4)
$ cd ../projectname_temp/
$ git pull bare master
$ git push bare master # (5)

You can now remove the projectname_temp directory.
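The least obvious part of that sequence is the push configuration, which maps every ref under refs/remotes/ onto a branch of the same name in the bare repo. The sketch below reproduces the trick with plain git and no svn at all: a hand-made refs/remotes/git-svn ref stands in for the one 'git svn clone' creates. The directory names and the final symbolic-ref step are my own additions for the demo (the latter guards against git installs whose default branch name is not 'master').

```shell
#!/usr/bin/env bash
# Sketch: reproduce the push-refspec trick with plain git (no svn needed).
# A fake "refs/remotes/git-svn" ref stands in for what git svn clone creates.

demo_bare_push() {
  local base="$1"
  (
    cd "$base" || exit 1
    git init -q work_temp
    git init -q --bare central.git

    cd work_temp || exit 1
    git -c user.name="Demo" -c user.email="demo@example.com" \
        commit -q --allow-empty -m "imported history"
    # Simulate the remote-tracking ref that git svn would have written.
    git update-ref refs/remotes/git-svn HEAD

    git remote add bare ../central.git
    # Map every ref under refs/remotes/ to a branch of the same name.
    git config remote.bare.push 'refs/remotes/*:refs/heads/*'
    git push -q bare

    cd ../central.git || exit 1
    # The bare repo now has a branch called git-svn; rename it.
    git branch -m git-svn master
    # Make sure HEAD names the renamed branch, whatever init.defaultBranch is.
    git symbolic-ref HEAD refs/heads/master
    git branch --list
  )
}
```

Running demo_bare_push "$(mktemp -d)" should list a single branch, master, in the throwaway bare repository.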

Website: Do a local test clone

Before we put this out on our server, let's do a local test clone.

$ cd ../
$ git clone projectname.git projectname_test_clone
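What to look for in the test clone is a judgment call. Two cheap checks are the total number of commits and the list of distinct authors, since an author that was missed in the mapping file would stand out there. A minimal sketch, with a made-up function name:

```shell
#!/usr/bin/env bash
# Sketch: summarize a clone's history so it can be eyeballed against svn.

summarize_history() {
  local repo="$1"
  # Total commits reachable from the checked-out branch.
  echo "commits: $(git -C "$repo" rev-list --count HEAD)"
  # Distinct author identities; leftover placeholders would show up here.
  echo "authors:"
  git -C "$repo" log --format='%an <%ae>' | sort -u
}
```

For example, summarize_history projectname_test_clone prints the commit count followed by one line per author, which you can compare against what you expect from the svn log.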

Website: Push to the central server

Assuming the above test looks ok, we can now put the new bare git repo on our centralized server and begin work by cloning from it.
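How the repo gets to the server depends on your setup. Since a bare repository is just a directory, a recursive copy over ssh is one simple option. The sketch below only prints the commands rather than running them; the user name and the /path/to/git location are placeholders, not our real server layout.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the commands for copying a bare repo to a server
# and cloning it back. Host and paths below are placeholders.

print_deploy_commands() {
  local repo="$1" host="$2" remote_dir="$3"
  local name
  name=$(basename "$repo")
  # A bare repository is just a directory, so a recursive copy is enough.
  echo "scp -r ${repo} ${host}:${remote_dir}/"
  # Afterwards everyone clones from the server copy over ssh.
  echo "git clone ${host}:${remote_dir}/${name}"
}

print_deploy_commands projectname.git user@ourserver.com /path/to/git
```

Review the printed commands, substitute your own host and path, and run them by hand once they look right.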

Support: Clone the supporting files

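We're going to use 'git svn' to clone the repo again, this time starting at the project root and ignoring the website directory. Run this from the 'repos' base directory, which is where the test clone step left us and where mySvnAuthors.txt lives. Note that --ignore-paths takes a regular expression, not a shell glob, so we anchor it on the "web/" directory.

$ git svn clone svn+ssh://ourserver.com/path/to/projectname -A mySvnAuthors.txt --no-metadata --ignore-paths="^web/" projectname.support_temp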
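Because --ignore-paths is a regular expression rather than a shell glob, a pattern written as ^web* means "we" followed by any number of "b" characters, which is broader than it looks. The sketch below compares it with the tighter ^web/ on a few made-up paths. It uses grep -E as a stand-in for git svn's Perl regex matching, which should behave the same for patterns this simple.

```shell
#!/usr/bin/env bash
# Sketch: which paths would each --ignore-paths pattern skip?
# grep -E stands in for git svn's Perl regex matching here.

ignored_by() {
  local pattern="$1"
  shift
  # Print only the paths that match the pattern, one per line.
  printf '%s\n' "$@" | grep -E -- "$pattern"
}

# Hypothetical repository paths.
paths=(
  "web/trunk/index.php"
  "weekly-notes/2011-11.txt"
  "docs/spec.txt"
  "design/web-mockup.psd"
)

echo "pattern ^web* skips:"
ignored_by '^web*' "${paths[@]}"
echo "pattern ^web/ skips:"
ignored_by '^web/' "${paths[@]}"
```

With these sample paths, ^web* also skips weekly-notes/2011-11.txt, while ^web/ skips only the file under web/.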

Support: Create the new bare git repo

Similar to the steps above, except we add the 'filter-branch' command to remove any commits and references that touch the website files. You can do a quick 'git log --oneline' before and after the filter-branch to see the difference.

$ git init --bare projectname.support.git # still in the 'repos' base directory
$ cd projectname.support_temp/
$ git remote add bare ../projectname.support.git
$ git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch web\*" --prune-empty HEAD
$ git config remote.bare.push 'refs/remotes/*:refs/heads/*'
$ git push bare
$ cd ../projectname.support.git
$ git branch -av
$ git branch -m git-svn master
$ cd ../projectname.support_temp/
$ git pull bare master
$ git push bare master

You can now remove the projectname.support_temp directory.
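If you would like to see what the filter-branch step does before trusting it with real history, the following sketch runs the same invocation on a throwaway repository and prints the commit count before and after. All file names and commits here are sample data. The FILTER_BRANCH_SQUELCH_WARNING variable only silences the deprecation notice that newer git versions print.

```shell
#!/usr/bin/env bash
# Sketch: run the same filter-branch invocation on a throwaway repository and
# compare commit counts before and after. Everything here is sample data.

demo_filter_branch() {
  local base="$1"
  (
    cd "$base" || exit 1
    git init -q demo && cd demo || exit 1
    git config user.name "Demo User"
    git config user.email "demo@example.com"

    mkdir docs web
    echo "spec" > docs/spec.txt
    git add docs && git commit -q -m "docs only"
    echo "<html></html>" > web/index.html
    git add web && git commit -q -m "website only"
    echo "more" >> docs/spec.txt
    echo "<!-- tweak -->" >> web/index.html
    git add -A && git commit -q -m "docs and website"

    echo "before: $(git rev-list --count HEAD)"
    # Drop everything matching web* from every commit, then prune commits
    # that no longer change anything.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
      --index-filter "git rm -r -f --cached --ignore-unmatch web\*" \
      --prune-empty HEAD >/dev/null 2>&1
    echo "after: $(git rev-list --count HEAD)"
  )
}
```

Running demo_filter_branch "$(mktemp -d)" should report three commits before and two after: the website-only commit disappears, and the mixed commit survives with just its docs change.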

Support: Do a local test clone

Before we put this out on our server, let's do a local test clone and poke at the logs a bit.

$ cd ../
$ git clone projectname.support.git projectname.support_test_clone
$ cd projectname.support_test_clone
$ git log --decorate

Support: Push to the central server

Assuming the above test looks ok, we can now put the new bare git repo on our centralized server and begin work by cloning from it.

Final Thoughts

What are we "losing"? We previously used svn branches as release tags, which we aren't going to the trouble of keeping. Otherwise, we have the entire history intact, and available in the git repositories. This was actually much smoother than I expected, thanks in large part to the two resources I mentioned above. Once again, the Drupal community has come through with a reasonably elegant solution. We did things manually, but if you have a lot of svn repositories to convert, you might check out John Albin's script and post about automating conversion of 147 svn repos!

Please post a comment if you have questions or clarifications!
