Merging Two Git Repositories Into One Repository Without Losing File History

A while ago my team had code for our project spread out in two different Git repositories.  Over time we realized that there was no good reason for this arrangement; it was just a general hassle and a source of friction, so we decided to combine our two repositories into one repository containing both halves of the code base, with each of the old repositories in its own subdirectory.  However, we wanted to preserve all of the change history from each repo and have it available in the new repository.

The good news is that Git makes this sort of thing very easy to do.  Since a repository in Git is just a directed acyclic graph, it’s trivial to glue two graphs together and make one big graph.  The bad news is that there are a few different ways to do it and some of them end up with a less desirable result (at least for our purposes) than others.  For instance, do a web search on this subject and you’ll get a lot of information about git submodules or subtree merges, both of which are kind of complex and are designed for the situation where you’re trying to bring in source code from an external project or library and you want to bring in more changes from that project in the future, or ship your changes back to them.  One side effect of this is that when you import the source code using a subtree merge all of the files show up as newly added files.  You can see the history of commits for those files in aggregate (i.e. you can view the commits in the DAG) but if you try to view the history for a specific file in your sub-project all you’ll get is one commit for that file – the subtree merge.

This is generally not a problem for the “import an external library” scenario but I was trying to do something different.  I wanted to glue two repositories together and have them look as though they had been one repository all along.  I didn’t need the ability to extract changes and ship them back anywhere because my old repositories would be retired.  Fortunately, after much research and trial-and-error it turned out that it’s actually very easy to do what I was trying to do and it requires just a couple of straightforward Git commands.

The basic idea is that we follow these steps:

  1. Create a new empty repository New.
  2. Make an initial commit because we need one before we do a merge.
  3. Add a remote to old repository OldA.
  4. Merge OldA/master to New/master.
  5. Make a subdirectory OldA.
  6. Move all files into subdirectory OldA.
  7. Commit all of the file moves.
  8. Repeat steps 3-7 for OldB.

A PowerShell script for these steps might look like this:

# Assume the current directory is where we want the new repository to be created
# Create the new repository
git init

# Before we do a merge, we have to have an initial commit, so we'll make a dummy commit
dir > deleteme.txt
git add .
git commit -m "Initial dummy commit"

# Add a remote for and fetch the old repo
git remote add -f old_a <OldA repo URL>

# Merge the files from old_a/master into new/master
# (On Git 2.9 and later, add --allow-unrelated-histories to this merge and the old_b one below)
git merge old_a/master

# Clean up our dummy file because we don't need it any more
git rm .\deleteme.txt
git commit -m "Clean up initial file"

# Move the old_a repo files and folders into a subdirectory so they don't collide with the other repo coming later
mkdir old_a
dir -Exclude old_a | %{git mv $_.Name old_a}

# Commit the move
git commit -m "Move old_a files into subdir"

# Do the same thing for old_b
git remote add -f old_b <OldB repo URL>
git merge old_b/master
mkdir old_b
dir -Exclude old_a,old_b | %{git mv $_.Name old_b}
git commit -m "Move old_b files into subdir"

Very simple.  Now we have all the files from OldA and OldB in repository New, sitting in separate subdirectories, and we have both the commit history and the individual file history for all files.  (Since we did a rename, you have to run “git log --follow <file>” to see that history, but that’s true for any file rename operation, not just for our repo merge.)
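For example, to spot-check that per-file history survived the move, pick any file under one of the new subdirectories and follow it across the rename (the file name below is just a hypothetical placeholder):

# Show the full history of a moved file, following it across the rename
git log --follow --oneline old_a/SomeFile.cs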

Obviously you could instead merge old_b into old_a (which becomes the new combined repo) if you’d rather do that – modify the script to suit.

If we have in-progress feature branches in the old repositories that also need to come over to the new repository, that’s also quite easy:

# Bring over a feature branch from one of the old repos
git checkout -b feature-in-progress
git merge -s recursive -Xsubtree=old_a old_a/feature-in-progress

This is the only non-obvious part of the whole operation.  We’re doing a normal recursive merge here (the “-s recursive” part isn’t strictly necessary because that’s the default) but we’re passing an argument to the recursive merge strategy that tells Git we’ve moved the target into a subdirectory, which helps Git line the histories up correctly.  This is not the same thing as a “subtree merge”.
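The same pattern works for branches that live in the other old repository; just point the subtree argument at the other subdirectory (the branch name here is hypothetical):

# Bring over a feature branch from the other old repo
git checkout -b other-feature-in-progress
git merge -s recursive -Xsubtree=old_b old_b/other-feature-in-progress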

So, if you’re simply trying to merge two repositories together into one repository and make it look like it was that way all along, don’t mess with submodules or subtree merges.  Just do a few regular, normal merges and you’ll have what you want.


The Windows Experience Index for your system could not be computed

I recently had a problem with a Windows 8 computer where I couldn’t run the Windows Experience Index.  I had previously run the WEI just fine on this computer but all of a sudden it started giving me this error message:

[Screenshot: an error dialog reading “The Windows Experience Index for your system could not be computed.”]

A cursory search online didn’t yield a solution but I did find out that WEI writes a log file to C:\Windows\Performance\WinSAT\winsat.log.  Looking in that log file showed me this at the end of the log:

338046 (4136) - exe\syspowertools.cpp:0983: > Read the active power scheme as '381b4222-f694-41f0-9685-ff5bb260df2e'
338046 (4136) - exe\main.cpp:2925: > power policy saved.
338078 (4136) - exe\syspowertools.cpp:1018: ERROR: Cannot set the current power scheme to '8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c': The instance name passed was not recognized as valid by a WMI data provider.
338078 (4136) - exe\main.cpp:2942: ERROR: Can't set high power state.
338078 (4136) - exe\processwinsaterror.cpp:0298: Unspecified error 29 occured.
338078 (4136) - exe\processwinsaterror.cpp:0319: Writing exit code, cant msg and why msg to registry
338078 (4136) - exe\syspowertools.cpp:1015: > Set the active power scheme to '381b4222-f694-41f0-9685-ff5bb260df2e'
338078 (4136) - exe\main.cpp:2987: > Power state restored.
338078 (4136) - exe\main.cpp:3002: > Successfully reenabled EMD.
338109 (4136) - exe\watchdog.cpp:0339: Watch dog system shutdown
338109 (4136) - exe\main.cpp:5341: > exit value = 29.

Ah! This computer is used by my entire family and I had been using the Local Group Policy Editor to lock down some settings that I didn’t want people to change, including the power management policy.  Apparently, if users can’t change the power management policy then WEI can’t change it either, and it gets grumpy about that.

The solution was to turn off enforcement of the active power management plan, run WEI (which now worked fine), then re-enable enforcement.
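Incidentally, if you’d rather kick off the assessment from a shell than hunt through the Control Panel UI, the underlying winsat tool can run it directly; as far as I know it needs an elevated prompt:

# Run the full Windows System Assessment from an elevated prompt
winsat formal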

Fixing the error “Unable to launch the IIS Express Web server. Failed to register URL. Access is denied.”

I had a Visual Studio web project that was configured to use IIS Express on port 8080.  That should normally be fine, since IIS Express is supposed to be able to run without administrator privileges, but when I would try to run the app I would get this error:

Unable to launch the IIS Express Web server.

Failed to register URL "http://localhost:8080/" for site "MySite" application "/". Error description: Access is denied. (0x80070005).

Launching Visual Studio with administrator credentials allowed it to run the application successfully, but that kind of defeated the purpose of using IIS Express in the first place.

It turns out that the problem in my case was that something else had previously created a URL reservation for http://+:8080/.  I have no idea what did it, but running this command in PowerShell showed the culprit:

[C:\Users\Eric] 5/3/2012 2:39 PM
14> netsh http show urlacl | select-string "8080"

    Reserved URL            : http://+:8080/

The solution was to run this command in an elevated shell:

[C:\Users\Eric] 5/3/2012 2:39 PM
2> netsh http delete urlacl url=http://+:8080/

URL reservation successfully deleted

Now I can run my web app from VS without elevation.
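As an aside, deleting the reservation was the right call here, but if you ever need the opposite (allowing a specific non-admin account to bind a URL), netsh can add a reservation scoped to that account; the account name below is just a placeholder:

# Reserve the URL for a specific non-admin account instead of deleting the reservation
netsh http add urlacl url=http://+:8080/ user=MYDOMAIN\SomeUser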

Fixing “Could not load file or assembly ‘Microsoft.SqlServer.Management.SqlParser’”

It’s incredibly frustrating that this still happens, but it turns out that if you have Visual Studio 2010 and SQL Server Express 2008 (or R2) on your machine, and you uninstall SQL Server Express 2008 and install SQL Server Express 2012 instead, you’ll get an error trying to load database projects in Visual Studio 2010: “Could not load file or assembly ‘Microsoft.SqlServer.Management.SqlParser’”.

Why can’t SQL Server 2012 install the stuff it knows Visual Studio requires?  Fine, whatever.

The fix for this problem is the same as the last time I posted something like this:

  1. Locate your Visual Studio 2010 installation media.
  2. In the \WCU\DAC folder, you’ll find three MSIs: DACFramework_enu.msi, DACProjectSystemSetup_enu.msi, and TSqlLanguageService_enu.msi. Run and install each of them; a command-line sketch follows this list. (Possibly just the third one is required in this case.  I’m not sure.)
  3. Reapply Visual Studio 2010 SP1.
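If the installation media is mounted as a drive, installing all three from a shell might look like this (the D: drive letter is an assumption; adjust to wherever your media actually lives):

# Install the three DAC/T-SQL MSIs from the VS2010 media
msiexec /i D:\WCU\DAC\DACFramework_enu.msi
msiexec /i D:\WCU\DAC\DACProjectSystemSetup_enu.msi
msiexec /i D:\WCU\DAC\TSqlLanguageService_enu.msi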

You should be back to a working state.

I Hate #Regions

This is mostly a note to myself so I don’t lose this awesome Visual Studio extension.

I hate C# #region tags in Visual Studio.  I hate opening a class file and then having to expand all the regions in order to read the code.  If you feel the need to use regions to make your class file more manageable, then it’s quite likely that your class is actually too big and desperately needs to be refactored.  Just . . . no.

But since not everyone is as enlightened as myself (ahem), sometimes I have to work with code that uses lots of #region tags.  In those cases where I can’t delete them all with extreme prejudice, I can at least install the excellent I Hate #Regions Visual Studio extension that will auto-expand regions whenever I open a file and will also display the #region tags in very small type so that they’re not as obnoxious.  Ah, relief!

Eliminating Friction, Part 3: The Build Server

Once we had a build script for Jumala we could do something about setting up a continuous integration build server.

Before the build server got going, we had several chronic problems that sapped time and morale from our team:

  • People would forget to include a new file with a commit or break the build in various other ways. This wouldn’t be discovered until someone else updated from source control and couldn’t build any more. That’s never fun.
  • Because there wasn’t any official build, deploying to the live web sites pretty much meant having someone build on his/her local machine and then manually copy the resulting binaries to the web server. Every once in a while a bad build would go out with personal configuration options or uncommitted code changes that shouldn’t have been made public.  That’s never fun either.
  • As a result of the previous point, we could never tell exactly what build was running on the web servers at any given moment.  The best we could do was inspect timestamps on the files but that didn’t tell us exactly what commits were included in a build.

Needless to say, that wasn’t a great experience for our development team. The solution was to set up a continuous integration build server that would sync, build, and report automatically after each commit.

Solving the problem

I downloaded and installed TeamCity from JetBrains.  TeamCity is one of those rare tools that manages to combine both ridiculously awesome ease of use and ridiculously powerful configurability into one package.  I highly recommend it if you’re looking for a CI build server.  JetBrains offers very generous licensing terms for TeamCity which makes it free to use for most small teams.  By the time your team is large enough to need more than what the free license provides, you won’t mind paying for it.

I set up a build configuration for our main branch in SVN, set it to build and run unit tests after every commit, and configured TeamCity to send email to all users on build failures. This way when someone breaks the build we know about it instantly and we know exactly what caused it.  Sure, we still have occasional build breaks but they’re not a big deal any more.

I extended our Rake build script to copy all binaries to a designated output folder in the source tree then configured TeamCity to grab that output folder and store it as the build artifacts.  Now when we want to deploy a build to our web servers, we can go to the TeamCity interface, download the build artifacts for a particular build, and deploy them without fear of random junk polluting them.  No more deploying personal dev builds!
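In TeamCity terms that’s just an artifact path rule on the build configuration, something like the following (the output folder name is our convention, not anything TeamCity mandates):

# TeamCity -> General Settings -> Artifact paths
build-output/**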

I also set up a build numbering scheme for the TeamCity builds so that every build is stamped with a <major>.<minor>.<TeamCity build counter>.<SVN revision number> scheme.  This lets us unambiguously trace back any build binary to exactly where it came from and what it contains.
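Concretely, that’s set via the build number format in the build configuration’s general settings; %build.counter% and %build.vcs.number% are TeamCity-provided parameters, and the leading major/minor digits here are just example values:

# TeamCity -> General Settings -> Build number format
1.2.%build.counter%.%build.vcs.number%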

Finally, once the TeamCity build was working smoothly I created a template from the main branch build, and now I use that template to quickly define new build configurations for our release branches as we create them. When we create a new release branch I copy the version number information from the main branch build definition to the new release branch build definition, then bump the minor version number and reset the TeamCity build counter on the main branch.  This way all builds from all branches are unambiguous and we know when and where they came from.

A dedicated build server has been considered a best practice for decades now, and a CI build server doing builds on every single commit has been a best practice for a long time as well, but small startup teams can sometimes get so caught up in building the product that they forget to attend to the necessary engineering infrastructure.  Setting up a build server isn’t a very sexy job but it pays huge dividends in reduced friction over time.  These days it’s amazingly easy to do it, too.  If your team doesn’t have a CI build server, you should get one right away.  Once you do you’ll wonder how you ever got along without it.

Fixing “The ‘VSTS for Database Professionals Sql Server Data-tier Application’ package did not load correctly”

Visual Studio 2010 and SQL Server Express have an uneasy alliance, at best.  When you install Visual Studio 2010 it installs SQL Server Express 2008 for you, but only the database engine, not SQL Server Management Studio.  If you mess with SQL Server Express in order to install the management tools, or upgrade to 2008 R2, or install the Advanced Services version, things break and you can no longer reliably use the Visual Studio database projects.

In particular, if you remove SQL Server Express 2008 and install SQL Server Express 2008 R2, you’ll probably run into an issue where if you try to open a schema object in a Visual Studio database project you’ll get an error that says:

The ‘VSTS for Database Professionals Sql Server Data-tier Application’ package did not load correctly.

The fix for this problem can be found here.  Here’s the short version:

  1. Locate your Visual Studio 2010 installation media.
  2. In the \WCU\DAC folder, you’ll find three MSIs: DACFramework_enu.msi, DACProjectSystemSetup_enu.msi, and TSqlLanguageService_enu.msi.  Run and install each of them.
  3. Reapply Visual Studio 2010 SP1.

You should be back to a working state.