Collating repositories or grafting earlier history with Git

August 5th 2015 Nicola Paolucci in Git

Let's add another arrow to our already full quiver of version control tools and techniques. Did you know that the Linux kernel you normally clone contains only part of its entire history? If you need access to its uninterrupted evolution since the first commit, you have to "graft" a few separate repositories together chronologically. In this post I'd like to show you how it works and why you would want to do that with your projects.


Why?

There are several reasons why you might need to collate histories from different repositories. Here are a few:

  • You are migrating to Git from a different version control system like Subversion. You want your developers to move fast, but you also want access to the whole history of your project inside Git. With Grafts your team can start working on a new Git repository (a shallow clone) right away and plug in the full history later, once the migration is complete.
  • You have two or more separate repositories that should really be one, and just adding the files and folders from one to the other would lose one of the project's histories.
  • If your job involves performing Subversion merges in a Git branch (may the coding lords have mercy on your soul), you can track integration points using Grafts too.

(You can later solidify all the above scenarios with git filter-branch to make the changes permanent.)
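As a sketch of that last step, assuming the temporary graft is made with Git's replace mechanism (covered below): the git filter-branch manual states that the command honors .git/info/grafts and the refs/replace/ namespace, so rewriting a branch turns any active graft or replacement into real parent links. The script builds a throwaway repository in a temporary directory; the branch names mirror the sample repository used later, while the commit messages and the demo identity are hypothetical. It uses git replace --graft, which only changes a commit's parents, and FILTER_BRANCH_SQUELCH_WARNING, which merely silences the deprecation notice that newer Git versions print for filter-branch.

```shell
#!/bin/sh
set -e
# Hypothetical demo identity so commits work without a global Git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Only silences filter-branch's deprecation notice (and its pause)
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository; commit messages are illustrative
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/legacy     # older history lives on "legacy"
git commit -q --allow-empty -m "Outline of the story"
git commit -q --allow-empty -m "Episode III: Revenge of the Sith"
git checkout -q --orphan restarted          # unrelated, newer history
git commit -q --allow-empty -m "[initial] empty commit"
git commit -q --allow-empty -m "Restarted repository"

restarted_root=$(git rev-list --max-parents=0 restarted)
legacy_tip=$(git rev-parse legacy)

# Temporary graft: make legacy's tip the parent of restarted's root
git replace --graft "$restarted_root" "$legacy_tip"

# Rewrite "restarted"; filter-branch reads commits through the replacement
git filter-branch -- restarted >/dev/null

# Drop the replacement: the rewritten branch no longer needs it
git replace -d "$restarted_root"
git --no-replace-objects log --oneline restarted
```

The final log is printed with --no-replace-objects on purpose: it shows all four commits even though no replacement is active anymore, i.e. the graft is now part of the real history (and every rewritten commit has a new id, so collaborators have to re-fetch).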

What are Git grafts?

Git has a local, per-repository mechanism to explicitly change the parents of existing commits: these are called Grafts. In a repository they live in the file ".git/info/grafts" (check the Git repository layout manpage for details).

This feature has been available in Git for a long time, but it has a drawback: Grafts are purely local, so you always have to set them up again in each repository. To overcome this problem, and more, a new command has been available since version 1.6.5: git replace, which, as the name implies, can replace any object with any other object. This command has the added benefit of tracking these swaps via refs, which you can push and pull between repositories.

How is the Linux kernel split?

From the Git Wiki:

When Linus started using Git for maintaining his kernel tree there didn't exist any tools to convert the old kernel history. Later, when the old kernel history was imported into Git from the bkcvs gateway, grafts was created as a method for making it possible to tie the two different repositories together.

To re-assemble the complete kernel history you need these three repositories:

The syntax of the Grafts file in ".git/info/grafts" is simple: each line lists a commit followed by its fake parent, both as SHA-1 identifiers. So to re-assemble the full history of the Linux kernel, add the following grafts to .git/info/grafts:

1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
7a2deb32924142696b8174cdf9b38cd72a11fc96 379a6be1eedb84ae0d476afbc4b4070383681178

With these grafts, you can get a complete and continuous history of the kernel since 0.01. More info on the process here and here.
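The grafts file format can be tried out safely on a throwaway repository. The sketch below (temporary directory, hypothetical commit messages and demo identity) writes one "<commit> <parent>" line by hand, checks that git rev-list now walks across both histories, and then runs git replace --convert-graft-file. That option exists since Git 2.18, the release that deprecated the grafts file; it turns each graft into a replace ref and deletes the file. Recent Git versions also print a deprecation hint while a grafts file is in use.

```shell
#!/bin/sh
set -e
# Hypothetical demo identity so commits work without a global Git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository with two unrelated histories
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/legacy
git commit -q --allow-empty -m "Outline of the story"
git commit -q --allow-empty -m "Episode III: Revenge of the Sith"
git checkout -q --orphan restarted
git commit -q --allow-empty -m "[initial] empty commit"
git commit -q --allow-empty -m "Restarted repository"

restarted_root=$(git rev-list --max-parents=0 restarted)
legacy_tip=$(git rev-parse legacy)

# One graft per line: <commit> <fake parent> [<more parents>...]
mkdir -p .git/info
echo "$restarted_root $legacy_tip" >> .git/info/grafts
echo "with graft: $(git rev-list --count restarted) commits"

# Git 2.18+: convert grafts into refs/replace/* and remove the file
git replace --convert-graft-file
echo "after conversion: $(git rev-list --count restarted) commits"
git replace -l
```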

How to do it yourself?

So now that you have a bit of background, let me guide you through the process using a sample repository so that you can replicate it yourself:

  • Clone the sample repository:
git clone git@bitbucket.org:nicolapaolucci/starwars-summary.git
  • On this sample repository I have set up a couple of branches that represent different remotes covering two parts of the project's history.
  • legacy represents the branch where I stored the earlier history of the project.
  • restarted represents the more recent repository, which started as a shallow clone and onto which we will collate legacy. The git log of the restarted branch looks like this:
9c5045e (HEAD -> restarted, origin/restarted, master) Add tiny .gitignore [Nicola Paolucci]
08431c2 Add README.md [Nicola Paolucci]
e287529 Add chapter markdown files [Nicola Paolucci]
451b911 Word wrap summary at 80 characters [Nicola Paolucci]
6a4ecd3 Episode VI: Return of the Jedi [Nicola Paolucci]
a5aa482 Episode V: The Empire Strikes Back [Nicola Paolucci]
7795c6a Restarted repository [Nicola Paolucci]
56eacfe (tag: initial) [initial] empty commit [Nicola Paolucci]
  • Let's switch to the legacy branch so that we can find the commit id we need (SHA-1):
git checkout -b legacy origin/legacy
git rev-parse --verify legacy
84abb39d9aab234dfba2e41f13f693fa5edbfe22

The resulting id is the last commit of the legacy branch. Now let's retrieve the first commit of the restarted project in the restarted branch:

git checkout -b restarted origin/restarted
git rev-list restarted | tail -n 1
56eacfe37267edd674fba5ceb66395891a34f7cc

This id is the first commit in the restarted branch. Now we want to "graft" the last commit of legacy onto the restarted branch by replacing the first commit there:

git replace "56eacfe37267edd674fba5ceb66395891a34f7cc" "84abb39d9aab234dfba2e41f13f693fa5edbfe22"

To verify that it worked, you can check that the folder .git/refs/replace contains the correct graft:

cat .git/refs/replace/56eacfe37267edd674fba5ceb66395891a34f7cc
84abb39d9aab234dfba2e41f13f693fa5edbfe22
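Looking inside .git/refs/replace works as long as the refs are stored as loose files; after git gc or git pack-refs they may live in .git/packed-refs instead. git replace -l lists replacements regardless of how they are stored, and git --no-replace-objects (or the GIT_NO_REPLACE_OBJECTS environment variable) shows the history as if no replacement existed, which gives a quick before/after comparison. A sketch on a throwaway repository that mimics the sample branches, with hypothetical commit messages and a demo identity, using the same whole-object replacement as above:

```shell
#!/bin/sh
set -e
# Hypothetical demo identity so commits work without a global Git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository with two unrelated histories
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/legacy
git commit -q --allow-empty -m "Outline of the story"
git commit -q --allow-empty -m "Episode III: Revenge of the Sith"
git checkout -q --orphan restarted
git commit -q --allow-empty -m "[initial] empty commit"
git commit -q --allow-empty -m "Restarted repository"

restarted_root=$(git rev-list --max-parents=0 restarted)
# Replace the whole root commit object with legacy's tip
git replace "$restarted_root" "$(git rev-parse legacy)"
git pack-refs --all      # pack refs into .git/packed-refs, as git gc may do

git replace -l                                     # still lists the replaced id
git log --oneline restarted                        # collated view
git --no-replace-objects log --oneline restarted   # original view
```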

And in fact git log now shows the entire collated history:

9c5045e (HEAD -> restarted, origin/restarted, master) Add tiny .gitignore [Nicola Paolucci]
08431c2 Add README.md [Nicola Paolucci]
e287529 Add chapter markdown files [Nicola Paolucci]
451b911 Word wrap summary at 80 characters [Nicola Paolucci]
6a4ecd3 Episode VI: Return of the Jedi [Nicola Paolucci]
a5aa482 Episode V: The Empire Strikes Back [Nicola Paolucci]
7795c6a Restarted repository [Nicola Paolucci]
56eacfe (tag: initial, replaced) Episode III: Revenge of the Sith [Nicola Paolucci]
75b32cf Episode II: Attack of the Clones (2) [Nicola Paolucci]
5aa055b Episode II: Attack of the Clones [Nicola Paolucci]
d10384f Episode I: The Phantom Menace [Nicola Paolucci]
70df805 Outline of the story [Nicola Paolucci]
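Notice that 56eacfe now shows "Episode III: Revenge of the Sith": git replace <object> <replacement> swaps the whole object, so the empty initial commit no longer appears in the collated view. If you would rather keep that root commit and only give it a parent, git replace also offers a --graft mode (in more recent Git releases) that writes a copy of the commit with the parents you list and replaces the original with that copy. A sketch on a throwaway repository, with hypothetical commit messages and a demo identity:

```shell
#!/bin/sh
set -e
# Hypothetical demo identity so commits work without a global Git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository with two unrelated histories
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/legacy
git commit -q --allow-empty -m "Outline of the story"
git commit -q --allow-empty -m "Episode III: Revenge of the Sith"
git checkout -q --orphan restarted
git commit -q --allow-empty -m "[initial] empty commit"
git commit -q --allow-empty -m "Restarted repository"

restarted_root=$(git rev-list --max-parents=0 restarted)
legacy_tip=$(git rev-parse legacy)

# Keep the root commit's tree and message; only its parent list changes
git replace --graft "$restarted_root" "$legacy_tip"
git log --format='%h %s' restarted
```

With --graft all four commits appear, including the original empty commit, whereas the whole-object replacement above collapses the restarted root into legacy's tip.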

Because this replacement is stored in a ref, we can push it and share it with the team!

git push origin 'refs/replace/*'
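Pushing is only half of the story: git clone and a plain git fetch use the refs/heads/* refspec, so teammates do not receive refs/replace/* unless they ask for it, for example by adding an extra fetch refspec to their remote. The sketch below simulates this with a local bare repository standing in for Bitbucket and spells the push refspec out in full; the directory names (shared.git, alice, bob), commit messages and demo identity are all hypothetical.

```shell
#!/bin/sh
set -e
# Hypothetical demo identity so commits work without a global Git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

base=$(mktemp -d)
git init -q --bare "$base/shared.git"
# Let clones check out "restarted" by default
git --git-dir="$base/shared.git" symbolic-ref HEAD refs/heads/restarted

# Alice builds both histories, collates them and pushes everything
git init -q "$base/alice"
cd "$base/alice"
git symbolic-ref HEAD refs/heads/legacy
git commit -q --allow-empty -m "Outline of the story"
git commit -q --allow-empty -m "Episode III: Revenge of the Sith"
git checkout -q --orphan restarted
git commit -q --allow-empty -m "[initial] empty commit"
git commit -q --allow-empty -m "Restarted repository"
git replace "$(git rev-list --max-parents=0 restarted)" "$(git rev-parse legacy)"
git remote add origin "$base/shared.git"
git push -q origin legacy restarted 'refs/replace/*:refs/replace/*'

# Bob clones: replace refs are not fetched by default
git clone -q "$base/shared.git" "$base/bob"
cd "$base/bob"
before=$(git rev-list --count HEAD)
# Fetch refs/replace/* now and on every later fetch
git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'
git fetch -q origin
after=$(git rev-list --count HEAD)
echo "before: $before commits, after: $after commits"
```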

Simply fantastic. Credit for helping me put together these instructions goes to this great Stack Overflow post.


Conclusions

That's it for now; I hope you found this technique interesting or useful for your projects! In any case, if you liked this, why not follow me at @durdn or my awesome team at @atlassiandev? (Clip icon credit: Thomas Helbig, from the Noun Project.)