Merging Two Git Repositories Into One Repository Without Losing File History

A while ago my team had the code for our project spread across two different Git repositories.  Over time we realized that there was no good reason for this arrangement and that it was just a general hassle and source of friction, so we decided to combine our two repositories into one repository containing both halves of the code base, with each of the old repositories in its own subdirectory.  However, we wanted to preserve all of the change history from each repo and have it available in the new repository.

The good news is that Git makes this sort of thing very easy to do.  Since a repository in Git is just a directed acyclic graph, it’s trivial to glue two graphs together and make one big graph.  The bad news is that there are a few different ways to do it and some of them end up with a less desirable result (at least for our purposes) than others.  For instance, do a web search on this subject and you’ll get a lot of information about git submodules or subtree merges, both of which are kind of complex and are designed for the situation where you’re trying to bring in source code from an external project or library and you want to bring in more changes from that project in the future, or ship your changes back to them.  One side effect of this is that when you import the source code using a subtree merge all of the files show up as newly added files.  You can see the history of commits for those files in aggregate (i.e. you can view the commits in the DAG) but if you try to view the history for a specific file in your sub-project all you’ll get is one commit for that file – the subtree merge.

This is generally not a problem for the “import an external library” scenario but I was trying to do something different.  I wanted to glue two repositories together and have them look as though they had been one repository all along.  I didn’t need the ability to extract changes and ship them back anywhere because my old repositories would be retired.  Fortunately, after much research and trial-and-error it turned out that it’s actually very easy to do what I was trying to do and it requires just a couple of straightforward git commands.

The basic idea is that we follow these steps:

  1. Create a new empty repository New.
  2. Make an initial commit because we need one before we do a merge.
  3. Add a remote to old repository OldA.
  4. Merge OldA/master to New/master.
  5. Make a subdirectory OldA.
  6. Move all files into subdirectory OldA.
  7. Commit all of the file moves.
  8. Repeat 3-7 for OldB.

A PowerShell script for these steps might look like this:

# Assume the current directory is where we want the new repository to be created
# Create the new repository
git init

# Before we do a merge, we have to have an initial commit, so we’ll make a dummy commit
dir > deleteme.txt
git add .
git commit -m "Initial dummy commit"

# Add a remote for and fetch the old repo
git remote add -f old_a <OldA repo URL>

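# Merge the files from old_a/master into new/master
# (Git 2.9 and later refuse to merge unrelated histories without the extra flag)
git merge --allow-unrelated-histories old_a/master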

# Clean up our dummy file because we don’t need it any more
git rm .\deleteme.txt
git commit -m "Clean up initial file"

# Move the old_a repo files and folders into a subdirectory so they don’t collide with the other repo coming later
mkdir old_a
dir -exclude old_a | %{git mv $_.Name old_a}

# Commit the move
git commit -m "Move old_a files into subdir"

# Do the same thing for old_b
git remote add -f old_b <OldB repo URL>
git merge --allow-unrelated-histories old_b/master
mkdir old_b
dir -exclude old_a,old_b | %{git mv $_.Name old_b}
git commit -m "Move old_b files into subdir"

Very simple.  Now we have all the files from OldA and OldB in repository New, sitting in separate subdirectories, and we have both the commit history and the individual file history for all files.  (Since we did a rename, you have to do “git log --follow <file>” to see that history, but that’s true for any file rename operation, not just for our repo-merge.)
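
For readers on bash or zsh rather than PowerShell, the same steps can be sketched as a plain shell script.  This is only an illustration under a few assumptions: the two "old" repositories are throwaway stand-ins created in a temp directory, the helper names make_old_repo and import_repo are made up for this example, an empty commit stands in for the deleteme.txt dummy file, and --allow-unrelated-histories is added because Git 2.9 and later refuse to merge unrelated histories without it.

```shell
#!/bin/sh
# Sketch only: a POSIX-shell take on the PowerShell steps, exercised against
# two throwaway repositories under a temp directory. All names are made up.
set -eu

# Throwaway identity so the commits succeed on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for an existing repository: one file, two commits on master.
make_old_repo() {   # usage: make_old_repo <dir> <file>
  git -c init.defaultBranch=master init -q "$1"
  echo "$2 line one" > "$1/$2"
  git -C "$1" add .
  git -C "$1" commit -q -m "$2: first"
  echo "$2 line two" >> "$1/$2"
  git -C "$1" commit -q -am "$2: second"
}
make_old_repo "$work/OldA" a.txt
make_old_repo "$work/OldB" b.txt

# Steps 3-7: fetch an old repo, merge its master, move its files into <name>/.
import_repo() {     # usage: import_repo <name> <url-or-path>
  git remote add -f "$1" "$2"
  # Git 2.9 and later refuse to merge unrelated histories without this flag.
  git merge -q --allow-unrelated-histories -m "Merge $1/master" "$1/master"
  mkdir "$1"
  # Move exactly the top-level entries that came from the old repo (dotfiles
  # included). Line-based, so unusual file names (newlines, characters that
  # Git quotes) are not handled.
  git ls-tree --name-only "$1/master" |
    while IFS= read -r entry; do git mv -- "$entry" "$1/"; done
  git commit -q -m "Move $1 files into subdir"
}

# Steps 1-2: new repository plus the initial commit made before the merge.
git -c init.defaultBranch=master init -q "$work/New"
cd "$work/New"
git commit -q --allow-empty -m "Initial dummy commit"

import_repo old_a "$work/OldA"
import_repo old_b "$work/OldB"

# Per-file history should reach back through the move into the old repo.
git log --follow --format=%s -- old_a/a.txt
```

To try this on real repositories, drop the make_old_repo calls and pass your own repository URLs to import_repo.  Listing the entries with git ls-tree rather than a directory listing is a design choice for this sketch: it moves only what the old repository actually tracked, so nothing has to be excluded by name.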

Obviously you could instead merge old_b into old_a (which becomes the new combined repo) if you’d rather do that – modify the script to suit.

If we have in-progress feature branches in the old repositories that also need to come over to the new repository, that’s also quite easy:

# Bring over a feature branch from one of the old repos
git checkout -b feature-in-progress
git merge -s recursive -Xsubtree=old_a old_a/feature-in-progress

This is the only non-obvious part of the whole operation.  We’re doing a normal recursive merge here (the “-s recursive” part isn’t strictly necessary because that was the default when this was written; newer Git versions default to the “ort” strategy, which accepts the same option) but we’re passing an argument to the merge that tells Git that we’ve renamed the target, which helps Git line the files up correctly.  This is not the same thing as what Git calls a “subtree merge”.
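
A small end-to-end sketch of that feature-branch merge, again in plain shell, may make the effect of -Xsubtree easier to see.  Everything here is an assumption made for illustration: the repositories are throwaway ones in a temp directory, the file a.txt and its contents are invented, and the strategy is left at Git's default rather than spelled out with -s recursive.

```shell
#!/bin/sh
# Sketch only: throwaway repositories with made-up names, used to show a
# feature branch from an old repo landing under its new subdirectory.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for OldA: master has a.txt, feature-in-progress edits it.
git -c init.defaultBranch=master init -q "$work/OldA"
cd "$work/OldA"
printf 'line one\nline two\n' > a.txt
git add a.txt
git commit -q -m "a.txt on master"
git checkout -q -b feature-in-progress
echo "feature change" >> a.txt
git commit -q -am "a.txt on feature branch"
git checkout -q master

# Combined repo, built as in the article: merge master, move into old_a/.
git -c init.defaultBranch=master init -q "$work/New"
cd "$work/New"
git commit -q --allow-empty -m "Initial dummy commit"
git remote add -f old_a "$work/OldA"
git merge -q --allow-unrelated-histories -m "Merge old_a/master" old_a/master
mkdir old_a
git mv a.txt old_a/
git commit -q -m "Move old_a files into subdir"

# Bring the feature branch over. -Xsubtree=old_a shifts the incoming tree
# under old_a/ so the edit to a.txt is applied to old_a/a.txt.
git checkout -q -b feature-in-progress
git merge -q --no-edit -Xsubtree=old_a old_a/feature-in-progress

# Show the merged file; the feature branch's line should now be in it.
cat old_a/a.txt
```

After the merge the feature branch's edit sits in old_a/a.txt and no a.txt reappears at the top level, which is the "line them up" behavior described above.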

So, if you’re simply trying to merge two repositories together into one repository and make it look like it was that way all along, don’t mess with submodules or subtree merges.  Just do a few regular, normal merges and you’ll have what you want.


41 thoughts on “Merging Two Git Repositories Into One Repository Without Losing File History”

  1. Hi – Thank you for this. I am new to git. I am testing this as a method to use to move an SVN repo into a Git repo. I have applied the steps on this page. After the commit I can view differences between files’ commits using gitk. However, TortoiseGit “diff with previous” does not work on the same file. It tells me “Could not determine the last commited revision”. Why does this no longer work? Have I lost history during the merge?

  2. This is a life saver. Out of all those links on this problem out there, this worked for me! Most of the others use Linux-based commands. I wanted Windows. This one suits perfectly.

  3. Awesome, and very helpful! Thanks!
    Note that you don’t really need to create another new repo, i.e., if you want to merge repo_small into repo_big, you can just add repo_small as a remote for repo_big, fetch it, and then merge. It helps if you first restructure (and commit) the files in repo_small to where you want them to be in repo_big.

  4. Questions, though. Let’s say I want to merge repos A and B into new repo AB. I do a git init --bare AB.git, then clone AB, then do these remote/merge/move/commits for A and B per your instruction. I can push origin master, and push this all to the new AB shared repo. But the tags and remote branches stay behind. Questions I have are:
    1) If A and B both have the tag REL2.2, then post-consolidation I have a single REL2.2.
    I need to bugfix REL2.2. If I do a simple git checkout -b fixbug REL2.2, all the moves disappear and mkdirs disappear, and I’m left with one of the repos I remoted in.

    Is that to be expected? Because I thought a tag was attached to a commit, and if post-merge, that tag exists across both A and B, creating a branch off a tag would shift me back, and I’m not sure if the REL2.2 tag is for A or B, but it’s definitely not both.

    As for remote branches, if I do a git branch -a, I’d see
    * master

    Creating a tracking branch off any of those remote branches would also send me back as if only A (if creating tracking branch branch2), B (if mybranch), or if checking out branch1, it fails on pathspec.

    If it truly behaved as if it was always one combined repo, I would be able to check out any of those branches and it would work.

    Would I need to create a local branch1 and merge each branch in like I did master, for all remote branches that are non-origin?

    And back to tags, uh, how would that work since a typical tag is on a particular commit?

  5. And as for creating a local branch then merging each branch, would that really work? Creating a local branch is likely initially off master’s HEAD, and would result in one or more merge conflicts since it’s likely those branches are for older releases. I tried it, and I had to manually fix the merge conflict, which can be a pain depending on how many files are conflicting, and I am not sure it’s going to pick the right versions of each file/directory. I suppose I can go back and loop (something like find . -type f | xargs diff in the subdirectory vs the original repo, assuming a tracking branch was created in the original repo). Probably want to diff on directories too.

  6. Also, I get merge conflicts in merging repo_b/master with common files. For example, if both repo_A and repo_b contain .gitignore and pom.xml at root level (repo_A/.gitignore, repo_B/.gitignore, repo_A/pom.xml, repo_B/pom.xml).

    I pull in repo_a first. Obviously, there is nothing to conflict with when merging repo_a/master to master.

    Your process is:
    add deleteme.txt file
    commit (to create master branch, no problem)
    git remote (importing in repo_A)
    git merge repo_A/master (repo_A’s pom.xml and .gitignore are in root)
    remove deleteme.txt file
    then you mkdir/move/commit.

    Then, when merging repo_B/master, it conflicts over .gitignore and pom.xml because it was repo_A’s files, and now it’s conflicting with the merged repo_B’s .gitignore and pom.xml

  7. Should there be a space between ‘-X’ and ‘subtree=old_a’ in ‘git merge -s recursive -Xsubtree=old_a old_a/feature-in-progress’? Having trouble with this one…

  8. Hi, I have been fighting with this issue as i have had a couple of projects which need merging into a single repository. I have now managed to create a script thanks to your ideas and others which handles branches and tags of the old repositories.

    Should it interest anyone, then here it is:

      1. Create a file with the list of URLs of the projects you want to merge. e.g.:

        Then call the script giving two arguments: first the name of the new repository to create, then the path to the file containing the URLs:

        ./ my_new_repo repos.list

    1. You are awesome, eitch, thanks for sharing 🙂 mukul, you create a file, name it whatever you want, put the repo URLs in it one per line, and then execute it like this: ./ newRepo myrepos.lst

      1. Glad to be of help – who hasn’t had to merge repos and wanted to keep the history, incl. tags and any current branches?

    2. Using eitch’s script with Git 2.11:
      -change in commit_merge() function (now outputs “working tree clean”)
      CHANGES=$(git status | grep "working directory clean")
      MERGING=$(git status | grep "merging")
      if [[ "$CHANGES" != "" ]] && [[ "$MERGING" == "" ]] ; then
      CHANGES=$(git status --porcelain)
      if [[ "$CHANGES" = "" ]] ; then

      -when merging branches add to the git merge (as it is no longer the default)

      -If you’re using git bash on Windows, before running make sure the list of repositories has unix line endings:

  9. It works fine until we merge back some changes from the original `old_a` with `git merge -Xsubtree=old_a`
    The git log result looks weird for a single file

    If I use a normal log command, it shows two commits: one for the add during ‘move to old_a’, and the other for the bulk changes from ‘merge branch … -Xsubtree=old_a’. Detailed changes are lost, which is maybe acceptable

    If I use `git log --follow old_a/path/to/any/single/file`, the history after the move commit is totally lost. It looks like that file was never changed after the -Xsubtree merge, very strange behavior

    BTW, `git blame` works well in this situation

  10. So I followed the steps, and the commit history is in the repository. However, when I try to show the log for a certain file, all history is wiped; I only see the commit that moved the files into the subfolder. Any help?

  11. This doesn’t merge the repos seamlessly. Your workflow might be broken in a fundamental way, because git log --follow won’t work with folders, but only with files. If you go to old_a and type git log --follow . or gitk --follow . it doesn’t return old_a’s commits. This is likely a current limitation of git log.

  12. Here is how to push changes upstream. It also works for other subtree merge strategies:

    git fetch old_a
    # use an integration branch, to be on the safe side
    git checkout -b old_a-integration old_a/master
    # merge changes from master using subtree strategy
    git cherry-pick -x --strategy=subtree -Xsubtree=old_a/ master

    # Or, if you organized your old_a commits directly
    # into eg old_a-integration, just rebase:

    git branch feature-reb feature-branch
    # new branch needed to rebase without losing the commits in master
    git rebase -s subtree -Xsubtree=old_a --onto old_a-integration master feature-reb
    git checkout -b old_a-integration old_a/master
    git merge feature-reb
    git push old_a HEAD:master
    git checkout master
    git branch -D feature-reb
    git branch -D old_a-integration # I actually keep this one and even push it

    Rebase is nice: you can do it interactively, which means that you have a chance to edit the commits which are rebased (inserting move-related info such as the origin sha1 etc). You can reorder the commits, and you can remove them (weeding out bad or otherwise unwanted patches).

  13. I’m stuck at
    dir -exclude old_a | %{git mv $_.Name old_a}

    I’m using zsh on Mac and keep getting this error
    zsh: parse error near `}’

    Any Mac devs reading that can help?

  14. I’m also stuck at
    dir -exclude old_a | %{git mv $_.Name old_a}
    what shell is this? Is it some Windows cmd magic (not bash that I can see!!)

  15. Thanks for that – I’ll have to think about it as I don’t fancy running a script that contains lines like “INFO: Removing remote”. I’ll want to test drive it all locally before I let it loose on the real world!

    1. I understand. But the script only does local changes. The removal of remotes is local, so that the resulting merged repository does not point to the original repositories anymore.
