Let's add another arrow to our already full quiver of version control tools and techniques. Did you know that the Linux kernel you clone normally contains only part of its entire history? If you need access to its uninterrupted evolution since the first commit, you have to "graft" a few separate repositories together chronologically. In this post I'd like to show you how it works and why you would want to do that with your projects.
There are several reasons why you might need to collate histories from different repositories. Here are a few:
- You are migrating to Git from a different version control system like Subversion. You want your developers to move fast, but you also want access to the whole history of your project inside Git. With Grafts your team can start working on a new Git repository (a shallow clone) right away and plug in the full history later, once the migration is complete.
- You have two or more separate repositories that should really be one, and just adding the files and folders from one to the other would lose one of the project's histories.
- If your job involves performing Subversion merges in a Git branch (may the coding lords have mercy on your soul), you can track integration points using Grafts too.
(In all of the above scenarios you can later use git filter-branch to make the changes permanent.)
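As a rough sketch of that last step, the script below joins two unrelated histories with a replacement in a throwaway repository and then runs git filter-branch with no filters. The git filter-branch manpage states that it honors grafts and refs under refs/replace/ and makes them permanent, so the rewritten commits no longer depend on the replace ref. It assumes Git 2.1 or later (for git replace --graft); the repository and commit messages are hypothetical. Recent Git releases also warn that git filter-branch has pitfalls and point to the separate git filter-repo tool.

```shell
#!/bin/sh
# Sketch: make a replacement permanent with git filter-branch.
# Assumes Git >= 2.1; throwaway repo, hypothetical commit messages.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning and pause

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
tree=$(git mktree </dev/null)            # write the empty tree object

# An older history and an unrelated newer one.
old1=$(git commit-tree "$tree" -m "Legacy start")
old2=$(git commit-tree "$tree" -p "$old1" -m "Legacy tip")
new1=$(git commit-tree "$tree" -m "Restarted root")
new2=$(git commit-tree "$tree" -p "$new1" -m "Restarted work")
git update-ref refs/heads/main "$new2"

# Pretend new1 has old2 as its parent (stored as a ref under refs/replace/).
git replace --graft "$new1" "$old2"

# Rewrite main with no filters: the grafted parent becomes a real parent.
git filter-branch -- main

# The joined history survives even when replace refs are ignored.
git --no-replace-objects rev-list --count main   # 4
```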
What are Git grafts?
Git has a local, per-repository mechanism to explicitly change the parents of existing commits: these are called Grafts. In a repository they live in the file ".git/info/grafts" (see the Git repository layout manpage, gitrepository-layout, for details).
This feature has been available in Git for a long time, but it has a drawback: you always have to set up Grafts locally in each repository. To overcome this problem, and more, Git later gained a new command, git replace, which, as the name implies, can replace any object with any other object. This command has the added benefit of tracking these swaps in refs, which you can push to and pull from other repositories.
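To contrast the two mechanisms, here is a minimal sketch in a throwaway repository with hypothetical commit messages. It first records a join the old way, as a "<commit> <parent>" line in .git/info/grafts, and then uses git replace --convert-graft-file (available since Git 2.18) to turn that file into replace refs, which can be shared. Recent Git versions print a deprecation hint whenever they read a grafts file, which is one more reason to prefer replace refs.

```shell
#!/bin/sh
# Sketch: a local grafts file vs. shareable replace refs.
# Assumes Git >= 2.18 (for --convert-graft-file); hypothetical commits.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q
tree=$(git mktree </dev/null)
old=$(git commit-tree "$tree" -m "Old history")
new=$(git commit-tree "$tree" -m "New root")
git update-ref refs/heads/main "$new"

# Old way: one "<commit> <fake parent>" line, visible only in this repository.
mkdir -p .git/info
echo "$new $old" >> .git/info/grafts

# New way: convert each graft line into a ref under refs/replace/.
# The grafts file is deleted on success.
git replace --convert-graft-file
git for-each-ref refs/replace/
git rev-list --count main    # 2: the join now comes from a replace ref
```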
How is the Linux kernel split?
From the Git Wiki:
When Linus started using Git for maintaining his kernel tree there didn't exist any tools to convert the old kernel history. Later, when the old kernel history was imported into Git from the bkcvs gateway, grafts was created as a method for making it possible to tie the two different repositories together.
To re-assemble the complete kernel history you need these three repositories:
The syntax of the Grafts file in ".git/info/grafts" is simple: each line lists a commit and its fake parent using their SHA-1 identifiers. So to re-assemble the full history of the Linux kernel, add the following grafts to that file:
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
7a2deb32924142696b8174cdf9b38cd72a11fc96 379a6be1eedb84ae0d476afbc4b4070383681178
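Adding those lines can be scripted. This is a minimal sketch that appends the two kernel graft lines quoted above to a clone's grafts file; the helper name add_kernel_grafts and the assumption of a regular (non-bare) clone whose Git directory is <clone>/.git are illustrative, not part of the original instructions.

```shell
#!/bin/sh
# Sketch: append the kernel graft lines ("<commit> <parent>") to a clone.
# add_kernel_grafts is a hypothetical helper; assumes a non-bare clone at $1.
set -e

add_kernel_grafts() {
    mkdir -p "$1/.git/info"
    cat >> "$1/.git/info/grafts" <<'EOF'
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
7a2deb32924142696b8174cdf9b38cd72a11fc96 379a6be1eedb84ae0d476afbc4b4070383681178
EOF
}

# Usage (hypothetical path): add_kernel_grafts ~/src/linux
```

Once the clone also contains the older history objects, git replace --convert-graft-file (Git 2.18 and later) could turn these lines into replace refs.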
How to do it yourself?
Now that you have a bit of background, let me guide you through the process using a sample repository so that you can replicate it yourself:
- Clone the sample repository:
git clone firstname.lastname@example.org:nicolapaolucci/starwars-summary.git
- In this sample repository I have set up two branches that represent different remotes, each covering one part of the project's history:
legacy represents the branch where I stored the earlier history of the project.
restarted represents the more recent repository, which started as a shallow clone and onto which we will collate the legacy history.
The git log of the restarted branch looks like this:
9c5045e (HEAD -> restarted, origin/restarted, master) Add tiny .gitignore [Nicola Paolucci]
08431c2 Add README.md [Nicola Paolucci]
e287529 Add chapter markdown files [Nicola Paolucci]
451b911 Word wrap summary at 80 characters [Nicola Paolucci]
6a4ecd3 Episode VI: Return of the Jedi [Nicola Paolucci]
a5aa482 Episode V: The Empire Strikes Back [Nicola Paolucci]
7795c6a Restarted repository [Nicola Paolucci]
56eacfe (tag: initial) [initial] empty commit [Nicola Paolucci]
- Let's switch to the legacy branch so that we can find the commit id we need:
git checkout -b legacy origin/legacy
git rev-parse --verify legacy
84abb39d9aab234dfba2e41f13f693fa5edbfe22
The resulting id is the last commit of the legacy branch. Now let's retrieve the first commit of the restarted project in the restarted branch:
git checkout -b restarted origin/restarted
git rev-list restarted | tail -n 1
56eacfe37267edd674fba5ceb66395891a34f7cc
This id is the first commit in branch restarted. Now we want to "graft" the last commit of legacy onto the restarted branch, replacing its first commit:
git replace "56eacfe37267edd674fba5ceb66395891a34f7cc" "84abb39d9aab234dfba2e41f13f693fa5edbfe22"
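The three steps so far (find the tip of legacy, find the root of restarted, replace one with the other) can be reproduced end to end in a throwaway repository. This is a minimal sketch with hypothetical commit messages, not the sample repository itself. It locates the root commit with git rev-list --max-parents=0, which lists only parentless commits and so does not depend on tail, and then runs the same kind of git replace invocation as above.

```shell
#!/bin/sh
# Sketch: graft a legacy history under a restarted one with git replace.
# Throwaway repo; commit messages are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q
tree=$(git mktree </dev/null)

# legacy: two commits of earlier history.
l1=$(git commit-tree "$tree" -m "Legacy start")
l2=$(git commit-tree "$tree" -p "$l1" -m "Legacy tip")
git update-ref refs/heads/legacy "$l2"

# restarted: an empty initial commit plus newer work.
r0=$(git commit-tree "$tree" -m "Initial empty commit")
r1=$(git commit-tree "$tree" -p "$r0" -m "Restarted work")
git update-ref refs/heads/restarted "$r1"

legacy_tip=$(git rev-parse --verify legacy)
restarted_root=$(git rev-list --max-parents=0 restarted)

# From now on, every lookup of the root commit yields the legacy tip.
git replace "$restarted_root" "$legacy_tip"
git log --format=%s restarted
```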
To verify that it worked, check that the folder .git/refs/replace contains the correct graft:
cat .git/refs/replace/56eacfe37267edd674fba5ceb66395891a34f7cc
84abb39d9aab234dfba2e41f13f693fa5edbfe22
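Reading the file directly works for the loose ref that git replace has just created, but Git may later move refs into .git/packed-refs (git gc does this), after which that file no longer exists. A sketch, in a throwaway repository with hypothetical commits, of checking the replacement with commands that work either way: git replace -l lists the replaced object ids, and git rev-parse resolves the replace ref wherever it is stored.

```shell
#!/bin/sh
# Sketch: inspect replacements without relying on loose ref files.
# Throwaway repo; commit messages are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q
tree=$(git mktree </dev/null)
old=$(git commit-tree "$tree" -m "Legacy tip")
root=$(git commit-tree "$tree" -m "Initial empty commit")
git replace "$root" "$old"

# Pack all refs, as git gc would; the loose file under .git/refs/replace/ is pruned.
git pack-refs --all

# These still work regardless of how the ref is stored.
git replace -l                          # ids of replaced objects
git rev-parse "refs/replace/$root"      # the object it is replaced with
git for-each-ref refs/replace/
```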
And indeed, git log now shows the entire collated history:
9c5045e (HEAD -> restarted, origin/restarted, master) Add tiny .gitignore [Nicola Paolucci]
08431c2 Add README.md [Nicola Paolucci]
e287529 Add chapter markdown files [Nicola Paolucci]
451b911 Word wrap summary at 80 characters [Nicola Paolucci]
6a4ecd3 Episode VI: Return of the Jedi [Nicola Paolucci]
a5aa482 Episode V: The Empire Strikes Back [Nicola Paolucci]
7795c6a Restarted repository [Nicola Paolucci]
56eacfe (tag: initial, replaced) Episode III: Revenge of the Sith [Nicola Paolucci]
75b32cf Episode II: Attack of the Clones (2) [Nicola Paolucci]
5aa055b Episode II: Attack of the Clones [Nicola Paolucci]
d10384f Episode I: The Phantom Menace [Nicola Paolucci]
70df805 Outline of the story [Nicola Paolucci]
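Because replacements are applied transparently, it helps to know how to see the history as it was actually recorded. A small sketch in a throwaway repository with hypothetical commits: the global --no-replace-objects option, or the GIT_NO_REPLACE_OBJECTS environment variable, makes Git ignore refs/replace/ for a single command.

```shell
#!/bin/sh
# Sketch: compare the collated history with the recorded one.
# Throwaway repo; commit messages are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q
tree=$(git mktree </dev/null)
l1=$(git commit-tree "$tree" -m "Legacy start")
l2=$(git commit-tree "$tree" -p "$l1" -m "Legacy tip")
r0=$(git commit-tree "$tree" -m "Initial empty commit")
r1=$(git commit-tree "$tree" -p "$r0" -m "Restarted work")
git update-ref refs/heads/restarted "$r1"
git replace "$r0" "$l2"

git rev-list --count restarted                          # collated view: 3
git --no-replace-objects rev-list --count restarted     # as recorded: 2
GIT_NO_REPLACE_OBJECTS=1 git log --oneline restarted    # same, via the environment
```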
Because this replacement is stored in a ref, we can push it and share it with the team!
git push origin 'refs/replace/*'
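Pushing is only half of the story: the default fetch refspec covers branches (and clone also brings tags), so teammates do not receive refs/replace/ automatically. A minimal end-to-end sketch, with a local bare repository standing in for origin; all paths and commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: share replace refs through a remote and fetch them in a clone.
# A local bare repo stands in for origin; commit messages are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

origin=$(mktemp -d)
git init -q --bare "$origin"
git -C "$origin" symbolic-ref HEAD refs/heads/main

# Author's repository: restarted history on main, legacy grafted underneath.
cd "$(mktemp -d)"
git init -q
tree=$(git mktree </dev/null)
l1=$(git commit-tree "$tree" -m "Legacy start")
l2=$(git commit-tree "$tree" -p "$l1" -m "Legacy tip")
r0=$(git commit-tree "$tree" -m "Initial empty commit")
r1=$(git commit-tree "$tree" -p "$r0" -m "Restarted work")
git update-ref refs/heads/main "$r1"
git replace "$r0" "$l2"
git push -q "$origin" main 'refs/replace/*'

# Teammate's clone: replace refs must be fetched explicitly.
clone=$(mktemp -d)/clone
git clone -q "$origin" "$clone"
cd "$clone"
git rev-list --count origin/main     # 2: no replace refs yet
git fetch -q origin 'refs/replace/*:refs/replace/*'
git rev-list --count origin/main     # 3: legacy history now visible
```

To make every later fetch include them, the refspec can also be stored in the remote's configuration, for example with git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'.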
Simply fantastic. Credit for helping me put together these instructions goes to this excellent Stack Overflow post.
That's it for now. I hope you found this technique interesting or useful for your projects! In any case, if you liked this, why not follow me at @durdn or my awesome team at @atlassiandev? (Clip icon credit: Thomas Helbig from the Noun Project.)