
Merging Two Git Repositories Into One Repository Without Losing File History

January 22, 2013

A while ago my team had code for our project spread out in two different Git repositories. Over time we realized that there was no good reason for this arrangement and that it was just a general source of hassle and friction, so we decided to combine the two into one repository containing both halves of the code base, with each of the old repositories in its own subdirectory. However, we wanted to preserve all of the change history from each repo and have it available in the new repository.

The good news is that Git makes this sort of thing very easy to do. Since a repository in Git is just a directed acyclic graph, it’s trivial to glue two graphs together and make one big graph. The bad news is that there are a few different ways to do it, and some of them end up with a less desirable result (at least for our purposes) than others. For instance, search the web on this subject and you’ll get a lot of information about git submodules or subtree merges. Both are fairly complex and are designed for the situation where you’re bringing in source code from an external project or library and want to pull in more changes from that project in the future, or ship your changes back to them. One side effect of a subtree merge is that all of the imported files show up as newly added files. You can see the history of commits for those files in aggregate (i.e., you can view the commits in the DAG), but if you try to view the history of a specific file in your sub-project, all you’ll get is one commit for that file: the subtree merge.

This is generally not a problem for the “import an external library” scenario, but I was trying to do something different. I wanted to glue two repositories together and have them look as though they had always been one repository all along. I didn’t need the ability to extract changes and ship them back anywhere, because my old repositories would be retired. Fortunately, after much research and trial and error, it turned out that what I wanted is actually very easy to do and requires just a few straightforward Git commands.

The basic idea is that we follow these steps:

  1. Create a new empty repository New.
  2. Make an initial commit because we need one before we do a merge.
  3. Add a remote for old repository OldA and fetch it.
  4. Merge OldA/master into New/master.
  5. Make a subdirectory OldA.
  6. Move all files into subdirectory OldA.
  7. Commit all of the file moves.
  8. Repeat steps 3-7 for OldB.

A PowerShell script for these steps might look like this:

# Assume the current directory is where we want the new repository to be created
# Create the new repository
git init

# Before we do a merge, we have to have an initial commit, so we’ll make a dummy commit
dir > deleteme.txt
git add .
git commit -m "Initial dummy commit"

# Add a remote for and fetch the old repo
git remote add -f old_a <OldA repo URL>

# Merge the files from old_a/master into new/master
# (Git 2.9 and later refuse to merge unrelated histories unless told to)
git merge --allow-unrelated-histories old_a/master

# Clean up our dummy file because we don't need it any more
git rm .\deleteme.txt
git commit -m "Clean up initial file"

# Move the old_a repo files and folders into a subdirectory so they don't collide with the other repo coming later
mkdir old_a
dir -Exclude old_a | %{git mv $_.Name old_a}

# Commit the move
git commit -m "Move old_a files into subdir"

# Do the same thing for old_b
git remote add -f old_b <OldB repo URL>
git merge --allow-unrelated-histories old_b/master
mkdir old_b
dir -Exclude old_a,old_b | %{git mv $_.Name old_b}
git commit -m "Move old_b files into subdir"

Very simple. Now we have all the files from OldA and OldB in repository New, sitting in separate subdirectories, and we have both the commit history and the individual file history for all files. (Since we did a rename, you have to use “git log --follow <file>” to see that history, but that’s true for any file rename operation, not just for our repo merge.)
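For anyone who wants to check the history claim before touching real repositories, here is a throwaway sketch (not part of the original procedure) that builds two tiny local repos standing in for OldA and OldB, runs steps 1-7 on each, and then asks git log --follow about a moved file. It assumes Git 2.28 or later (for --initial-branch) and Windows PowerShell 5.1 or PowerShell 7; the temp-directory layout, the readme.txt file and all commit messages are made up for the demo, and local paths stand in for the repo URLs.

```powershell
# Throwaway demo of steps 1-7: build two tiny source repos, combine them,
# and ask git log --follow whether a moved file kept its history.
# Assumptions: Git 2.28+ on PATH, Windows PowerShell 5.1 or PowerShell 7;
# all repo names, file names and commit messages are illustrative only.
$env:GIT_AUTHOR_NAME = 'Demo'; $env:GIT_AUTHOR_EMAIL = 'demo@example.com'
$env:GIT_COMMITTER_NAME = 'Demo'; $env:GIT_COMMITTER_EMAIL = 'demo@example.com'

$root = Join-Path ([System.IO.Path]::GetTempPath()) ("merge-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null

# Stand-ins for OldA and OldB: two commits each to a file with the same name
foreach ($name in 'old_a', 'old_b') {
    $src = Join-Path $root $name
    git init --quiet --initial-branch=master $src
    Set-Content -Path (Join-Path $src 'readme.txt') -Value "first line of $name"
    git -C $src add readme.txt
    git -C $src commit --quiet -m "$name first commit"
    Add-Content -Path (Join-Path $src 'readme.txt') -Value "second line of $name"
    git -C $src commit --quiet -am "$name second commit"
}

# Steps 1-2: the new repository and its dummy initial commit
$new = Join-Path $root 'new'
git init --quiet --initial-branch=master $new
Push-Location $new
Set-Content -Path deleteme.txt -Value 'dummy'
git add deleteme.txt
git commit --quiet -m 'Initial dummy commit'

# Steps 3-7, once per old repository; local paths stand in for the repo URLs
foreach ($name in 'old_a', 'old_b') {
    git remote add $name (Join-Path $root $name)
    git fetch --quiet $name
    git merge --quiet --allow-unrelated-histories -m "Merge $name" "$name/master"
    if (Test-Path deleteme.txt) {
        git rm --quiet deleteme.txt
        git commit --quiet -m 'Clean up initial file'
    }
    New-Item -ItemType Directory -Path $name | Out-Null
    # -Force also lists dotfiles such as .gitignore; .git and the subdirectories stay put
    Get-ChildItem -Force |
        Where-Object { $_.Name -notin '.git', 'old_a', 'old_b' } |
        ForEach-Object { git mv $_.Name $name }
    git commit --quiet -m "Move $name files into subdir"
}

# Subjects of the commits git log --follow attributes to the moved file
$history = git log --follow '--format=%s' old_a/readme.txt
$layout = git ls-files
Pop-Location
Remove-Item -Recurse -Force $root
$history
```

Because each old repository's files go into their own subdirectory, the two readme.txt files don't collide, and the followed log for old_a/readme.txt should list the move commit followed by the two commits made back in old_a.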

Obviously you could instead merge old_b into old_a (which becomes the new combined repo) if you’d rather do that – modify the script to suit.

If we have in-progress feature branches in the old repositories that also need to come over to the new repository, that’s also quite easy:

# Bring over a feature branch from one of the old repos
git checkout -b feature-in-progress
git merge -s recursive -Xsubtree=old_a old_a/feature-in-progress

This is the only non-obvious part of the whole operation. We’re doing a normal recursive merge here (the “-s recursive” part isn’t strictly necessary because recursive was the default strategy when this was written; Git 2.34 and later default to the “ort” strategy, which accepts the same option), but we’re passing the option -Xsubtree=old_a, which tells Git to shift the incoming branch’s tree into the old_a subdirectory so that its files line up with the ones we moved. This is not the same thing as the technique called a “subtree merge”.
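The -Xsubtree behavior can be checked the same way. This is again a throwaway sketch with made-up names and the same assumptions (Git 2.28+, Windows PowerShell 5.1 or PowerShell 7): a single old repository with a master branch and an unmerged feature-in-progress branch that edits the same file, combined with the steps from the script, after which the feature branch is merged with -Xsubtree=old_a.

```powershell
# Throwaway demo of the -Xsubtree feature-branch merge.
# Assumptions: Git 2.28+ on PATH, Windows PowerShell 5.1 or PowerShell 7;
# repo, branch and file names are illustrative only.
$env:GIT_AUTHOR_NAME = 'Demo'; $env:GIT_AUTHOR_EMAIL = 'demo@example.com'
$env:GIT_COMMITTER_NAME = 'Demo'; $env:GIT_COMMITTER_EMAIL = 'demo@example.com'

$root = Join-Path ([System.IO.Path]::GetTempPath()) ("subtree-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null

# Old repo: master plus an unmerged feature branch that edits the same file
$old = Join-Path $root 'old_a'
git init --quiet --initial-branch=master $old
Set-Content -Path (Join-Path $old 'readme.txt') -Value 'base line'
git -C $old add readme.txt
git -C $old commit --quiet -m 'Base commit'
git -C $old checkout --quiet -b feature-in-progress
Add-Content -Path (Join-Path $old 'readme.txt') -Value 'feature line'
git -C $old commit --quiet -am 'Feature work'
git -C $old checkout --quiet master

# Combined repo: dummy commit, merge old_a/master, move its files into old_a/
$new = Join-Path $root 'new'
git init --quiet --initial-branch=master $new
Push-Location $new
Set-Content -Path deleteme.txt -Value 'dummy'
git add deleteme.txt
git commit --quiet -m 'Initial dummy commit'
git remote add old_a $old
git fetch --quiet old_a
git merge --quiet --allow-unrelated-histories -m 'Merge old_a' old_a/master
git rm --quiet deleteme.txt
New-Item -ItemType Directory -Path old_a | Out-Null
git mv readme.txt old_a
git commit --quiet -m 'Remove dummy file and move old_a files into subdir'

# The step from the post: branch off master, then merge the old feature
# branch with its tree (and the merge base) shifted under old_a/
git checkout --quiet -b feature-in-progress
git merge --quiet -m 'Merge feature-in-progress' '-Xsubtree=old_a' old_a/feature-in-progress
$content = Get-Content old_a/readme.txt
$tracked = git ls-files
Pop-Location
Remove-Item -Recurse -Force $root
$content
```

If the shift behaves as described, the feature line ends up in old_a/readme.txt and no readme.txt reappears at the top level of the combined repository.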

So, if you’re simply trying to merge two repositories together into one repository and make it look like it was that way all along, don’t mess with submodules or subtree merges.  Just do a few regular, normal merges and you’ll have what you want.

Categories: Uncategorized
  1. Donal
    February 15, 2013 at 1:36 am

    Hi, thank you for this. I am new to Git. I am testing this as a method to move an SVN repo into a Git repo. I have applied the steps on this page. After the commit I can view differences between a file’s commits using gitk; however, TortoiseGit “diff with previous” does not work on the same file. It tells me “Could not determine the last commited revision”. Why does this no longer work? Have I lost history during the merge?

  2. June 20, 2013 at 2:38 pm

    This solution is perfect; I just merged two fat repos without any problem!
    I did it on Linux, just replacing your PowerShell move with a mv * oldA.

    Thanks for sharing!

  3. Nilesh
    October 8, 2013 at 10:40 am

    This is a life saver. Out of all the links on this problem out there, this one worked for me! Most of the others use Linux-based commands, and I needed Windows. This one suits me perfectly.

  4. October 16, 2013 at 4:34 am

    I would recommend doing the move on the old repository before merging it into the new repo. Then you can simply merge the repos without subtree.

  5. Jorge
    February 16, 2014 at 11:55 am

    Awesome, and very helpful! Thanks!
    Note that you don’t really need to create another new repo: if you want to merge repo_small into repo_big, you can just add repo_small as a remote for repo_big, fetch it, and then merge. It helps if you first restructure (and commit) the files in repo_small to where you want them to be in repo_big.

  6. Marc Towersap
    April 16, 2014 at 3:31 pm

    A few questions, though. Let's say I want to merge repos A and B into a new repo AB. I do a git init --bare AB.git, then clone AB, then do these remote/merge/move/commits for A and B per your instructions. I can git push origin master and push all of this to the new AB shared repo. But the tags and remote branches stay behind. My questions are:
    1) If A and B both have the tag REL2.2, post-consolidation I have a single REL2.2.
    I need to bugfix REL2.2. If I do a simple git checkout -b fixbug REL2.2, all the moves disappear and mkdirs disappear, and I’m left with one of the repos I remoted in.

    Is that to be expected? Because I thought a tag was attached to a commit, and if post-merge, that tag exists across both A and B, creating a branch off a tag would shift me back, and I’m not sure if the REL2.2 tag is for A or B, but it’s definitely not both.

    As for remote branches, if I do a git branch -a, I’d see
    * master
    remotes/A/branch1
    remotes/A/branch2
    remotes/A/master
    remotes/B/branch1
    remotes/B/mybranch
    remotes/B/master
    remotes/origin/master

    Creating a tracking branch off any of those remote branches would also send me back as if only A (if creating tracking branch branch2), B (if mybranch), or if checking out branch1, it fails on pathspec.

    If it truly behaved as if it was always one combined repo, I would be able to check out any of those branches and it would work.

    Would I need to create a local branch1 and merge each branch in like I did master, for all remote branches that are non-origin?

    And back to tags, uh, how would that work since a typical tag is on a particular commit?

  7. Marc Towersap
    April 16, 2014 at 4:13 pm

    And as for creating a local branch and then merging each branch, would that really work? A new local branch likely starts from master’s HEAD, which would result in one or more merge conflicts, since those branches are likely for older releases. I tried it, and I had to manually fix the merge conflicts, which can be a pain depending on how many files are conflicting, and I am not sure it’s going to pick the right versions of each file/directory. I suppose I can go back and loop (something like find . -type f | xargs diff in the subdirectory vs. the original repo, assuming a tracking branch was created in the original repo). I probably want to diff on directories too.

  8. Marc Towersap
    April 21, 2014 at 8:26 am

    Also, I get merge conflicts in merging repo_b/master with common files. For example, if both repo_A and repo_b contain .gitignore and pom.xml at root level (repo_A/.gitignore, repo_B/.gitignore, repo_A/pom.xml, repo_B/pom.xml).

    I pull in repo_a first. Obviously, there is nothing to conflict with when merging repo_a/master to master.

    Your process is:
    init
    add deleteme.txt file
    commit (to create master branch, no problem)
    git remote (importing in repo_A)
    git merge repo_A/master (repo_A’s pom.xml and .gitignore are in root)
    remove deleteme.txt file
    commit
    then you mkdir/move/commit.

    Then, when merging repo_B/master, it conflicts over .gitignore and pom.xml, because those were repo_A’s files and now they conflict with the merged repo_B’s .gitignore and pom.xml.

