The power of Git subtree

May 18th 2015 Nicola Paolucci in Git, Subtree, Vim

Git subtree allows you to insert any repository as a sub-directory of another one. It is one of several ways Git projects can manage project dependencies. Readers with a good memory may recall that I covered the usage and advantages of the command in an earlier piece on Git submodule alternatives.

The basics of Git subtree

Let's review the basics so that you can decide whether git subtree is useful for you. Imagine you want to add some external project to your own repository, but you do not want to add much overhead to your daily workflow or to that of your peers. The subtree command works well in this case.

For example, to inject a Vim extension into a repository that stores your Vim setup, you could run:

git subtree add --prefix .vim/bundle/fireplace https://github.com/tpope/vim-fireplace.git master --squash 

This command squashes the entire history of the vim-fireplace project into a single commit whose content lands in your folder .vim/bundle/fireplace, recording the SHA-1 of master at that time for future reference. The result of a squashed "git subtree add" is two commits:

commit 8d6089b3faea64e1e31f8d7eb5e1bc82e3876e07
Merge: 96fa982 ce87dab
Author: Bob Marley <bob@mahrleey.com>
Date:   Tue May 12 13:37:03 2015 +0200

    Merge commit 'ce87dab198fecdff6043d88a26c55d7cd95e8bf9' as '.vim/bundle/fireplace'

commit ce87dab198fecdff6043d88a26c55d7cd95e8bf9
Author: Bob Marley <bob@mahrleey.com>
Date:   Tue May 12 13:37:03 2015 +0200

    Squashed '.vim/bundle/fireplace/' content from commit b999b09

    git-subtree-dir: .vim/bundle/fireplace
    git-subtree-split: b999b09cd9d69f359fa5668e81b09dcfde455cca

If after a while you want to update that sub-folder to the latest version of the child repository, you can issue a "subtree pull" with the same parameters:

git subtree pull --prefix .vim/bundle/fireplace https://github.com/tpope/vim-fireplace.git master --squash

That's it for the basic usage. If you want to be more careful and structured, you can add or pull only tagged revisions (e.g. v1.0) of your child project by passing the tag name in place of master. This keeps you from importing code from a master branch that might not be stable yet.

Note: git-subtree stores sub-project commit ids, not refs, in its metadata. That is rarely a problem: as long as a branch or tag on the remote still points at that commit, you can recover its symbolic name from the commit id (SHA-1) with a command like ls-remote:

git ls-remote https://github.com/tpope/vim-fireplace.git | grep <sha-1>
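If you would rather not rely on grep matching anywhere on the line, here is a minimal sketch of the same lookup wrapped in a shell function. The name refs_for_commit is hypothetical and not part of git-subtree; it assumes bash and awk, and it prefix-matches the first column of git ls-remote so that an abbreviated id such as the b999b09 in a squash commit title also works.

```shell
#!/usr/bin/env bash
# refs_for_commit (hypothetical helper, not part of git-subtree):
# print every ref on <remote> whose tip is <sha>. The match is a prefix
# match on the first column of `git ls-remote`, so abbreviated ids work.
refs_for_commit() {
  local remote="$1" sha="$2"
  git ls-remote "$remote" |
    awk -v sha="$sha" 'index($1, sha) == 1 { print $2 }'
}
```

For example, refs_for_commit https://github.com/tpope/vim-fireplace.git b999b09. Annotated tags show up with a ^{} suffix (e.g. refs/tags/v1.0^{}), because ls-remote lists the commit a tag peels to on that separate line. An empty result means no ref on the remote points at that commit any more.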

Git subtree aliases

If you use subtree commands often, you can shorten and streamline them with a couple of simple aliases in your $HOME/.gitconfig:

[alias]
    # the acronym stands for "subtree add"
    sba = "!f() { git subtree add --prefix $2 $1 master --squash; }; f"
    # the acronym stands for "subtree update"
    sbu = "!f() { git subtree pull --prefix $2 $1 master --squash; }; f"

The aliases flip the original order of parameters because I like to think of adding a subtree a bit like an scp command (scp <remote src> <dest>). You use them like this:

git sba <repository uri> <destination folder>

git sba https://bitbucket.org/vim-plugins-mirror/vim-surround.git .vim/bundle/tpope-vim-surround
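A possible extension of these aliases, sketched as a bash function that writes them with git config: an optional third argument selects a branch or tag (for instance a release tag, as suggested earlier for more careful updates) and falls back to master when omitted. The function name install_subtree_aliases is hypothetical; the alias names sba and sbu are the ones above. Quoting "$1" and "$2" also keeps prefixes containing spaces intact.

```shell
#!/usr/bin/env bash
# install_subtree_aliases (hypothetical): write tag-aware variants of the
# sba/sbu aliases. Any arguments are passed to `git config` as the scope,
# e.g. --global or --file <path>. The third alias argument defaults to master.
install_subtree_aliases() {
  git config "$@" alias.sba \
    '!f() { git subtree add --prefix "$2" "$1" "${3:-master}" --squash; }; f'
  git config "$@" alias.sbu \
    '!f() { git subtree pull --prefix "$2" "$1" "${3:-master}" --squash; }; f'
}
```

Run install_subtree_aliases --global once; after that, git sbu <repository uri> <destination folder> <tag> pulls a specific tag, while the two-argument form behaves like the original aliases.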

Under the hood of git subtree

I recently had a look at the implementation of git-subtree and boy, is it clever! The first insight (deep, I know) is that Git subtree is implemented as a shell script, and a nicely readable one.

The core technique is this: git-subtree stores extra metadata about the code it imports directly in the commit messages. For squashed pulls, for example, it stores these two values in the message of the squash commit that precedes the merge:

git-subtree-dir: .vim/bundle/scrooloose-nerdcommenter
git-subtree-split: 0b3d928dce8262dedfc2f83b9aeb59a94e4f0ae4

The "git-subtree-split" field records the commit id (SHA-1) of the subproject that has been injected at folder "git-subtree-dir". Simple enough! Using this information, a subsequent git subtree pull can retrieve the previous integration point as the base for the next squash/merge.
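To make the lookup concrete, here is a minimal sketch of it, assuming bash and a squashed history like the one above. The name latest_subtree_split is hypothetical, and it is a simplification: git-subtree.sh itself reads more fields and handles more cases than this. Like git-subtree, it puts the prefix into the regular expression passed to git log --grep.

```shell
#!/usr/bin/env bash
# latest_subtree_split (hypothetical, simplified): print the
# git-subtree-split value from the newest commit on the current branch
# whose message contains "git-subtree-dir: <prefix>".
latest_subtree_split() {
  local dir="${1%/}"   # tolerate a trailing slash on the prefix
  # git log lists newest first; --grep anchors ^ and $ to message lines.
  git log --grep="^git-subtree-dir: $dir/*\$" --format='%B' |
    sed -n 's/^git-subtree-split: *//p' |
    head -n 1
}
```

In the repository from the earlier example, latest_subtree_split .vim/bundle/fireplace should print b999b09cd9d69f359fa5668e81b09dcfde455cca, the value recorded in the squash commit.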

Rebase after a git subtree

How do you rebase a repository with subtrees mixed in? From what I could gather from this Stack Overflow discussion, there is no silver bullet.

A workable process seems to be: run a manual rebase --interactive, remove the git subtree add commits, run rebase --continue, and re-execute the git subtree add command once the rebase is done.
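Before starting the interactive rebase, it helps to know which commits came from git subtree add. The sketch below lists them for a range; subtree_add_commits is a hypothetical bash helper that relies only on the message formats shown earlier (the git-subtree-dir line in the squash commit and the "Merge commit '<sha>' as '<dir>'" title of the merge). It will also list the squash commits of later squashed pulls, since those carry the same git-subtree-dir line, so review the output before dropping anything.

```shell
#!/usr/bin/env bash
# subtree_add_commits (hypothetical, not part of git-subtree): list commits
# in <upstream>..HEAD that look like the pair a squashed `git subtree add`
# creates, as "<short sha> <subject>" lines.
subtree_add_commits() {
  local upstream="$1"
  # The squash commit carries a "git-subtree-dir:" line in its message.
  git log --format='%h %s' --grep='^git-subtree-dir:' "$upstream..HEAD"
  # The merge that grafts it in is titled "Merge commit '<sha>' as '<dir>'".
  git log --merges --format='%h %s' \
    --grep="^Merge commit '[0-9a-f]*' as '" "$upstream..HEAD"
}
```

For example, run subtree_add_commits origin/master just before git rebase --interactive origin/master, then re-run the matching git subtree add commands once the rebase is finished.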

Hacking on git-subtree

One tiny thing I found missing from the defaults of the command is that it does not store the URL of the original repository you are adding. I was reminded of this recently while trying to update all the Vim extensions I track: I had forgotten the source repository URLs of everything I had previously injected using git subtree add.

Since attending Git Merge 2015, I've been energized to find ways to contribute to the project, so I said to myself: "instead of complaining about this, I can fix it!".

So I've started tweaking the git-subtree.sh script to do something extra.

I changed git subtree add to annotate the squash commit with an extra field, git-subtree-repo. So issuing:

git-subtree.sh add --prefix .vim/bundle/fireplace https://github.com/tpope/vim-fireplace.git master --squash

Results in a commit with that extra field:

commit ce87dab198fecdff6043d88a26c55d7cd95e8bf9
Author: Bob Marley <bob@mahrleey.com>
Date:   Tue May 12 13:37:03 2015 +0200

    Squashed '.vim/bundle/fireplace/' content from commit b999b09

    git-subtree-dir: .vim/bundle/fireplace
    git-subtree-split: b999b09cd9d69f359fa5668e81b09dcfde455cca
    git-subtree-repo: https://github.com/tpope/vim-fireplace.git
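The patch itself lives on the fork and is not reproduced in this post. As a rough illustration of the message format above, here is a hypothetical bash function that assembles a squash-commit message with the extra field; its name and arguments are assumptions, and it only covers the "content from commit" wording used when adding.

```shell
#!/usr/bin/env bash
# squash_msg_with_repo (hypothetical): build a squash-commit message in the
# format shown above, appending git-subtree-repo only when a URL is given.
squash_msg_with_repo() {
  local dir="$1" split="$2" repo="$3"
  printf "Squashed '%s/' content from commit %s\n\n" "$dir" "${split:0:7}"
  printf 'git-subtree-dir: %s\n' "$dir"
  printf 'git-subtree-split: %s\n' "$split"
  if [ -n "$repo" ]; then
    printf 'git-subtree-repo: %s\n' "$repo"
  fi
}
```

Its output could be piped into git commit-tree, which reads the commit message from standard input.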

With this relatively small addition, I can now write a new subtree command that lists all the folders injected from other repositories:

git subtree list

Which helpfully outputs:

.vim/bundle/fireplace https://github.com/tpope/vim-fireplace.git b999b0
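The fork's implementation is not shown here, but a rough approximation of list can be written with stock git log and awk, assuming the metadata looks like the commit above. subtree_list is a hypothetical name, its output format is only modeled on the example, and it prints the full SHA-1 rather than an abbreviation. Because unpatched git-subtree never writes git-subtree-repo, the repo column shows "-" when no URL was recorded.

```shell
#!/usr/bin/env bash
# subtree_list (hypothetical approximation, not the fork's code): print
# "<dir> <repo> <newest split>" for each subtree prefix found in the
# commit messages of the current branch.
subtree_list() {
  git log --grep='^git-subtree-dir:' --format='%B%n@@end@@' |
    awk '
      /^git-subtree-dir: /   { dir = $2 }
      /^git-subtree-repo: /  { repo = $2 }
      /^git-subtree-split: / { sha = $2 }
      /^@@end@@$/ {
        if (dir != "") {
          # git log is newest first: the first split seen per dir is the latest.
          if (!(dir in latest)) { latest[dir] = sha; order[++n] = dir }
          # Keep the first URL found; it may come from an older add commit.
          if (!(dir in url) && repo != "") url[dir] = repo
        }
        dir = repo = sha = ""
      }
      END {
        for (i = 1; i <= n; i++) {
          d = order[i]
          print d, ((d in url) ? url[d] : "-"), latest[d]
        }
      }'
}
```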

Update 11th March 2016: since the "list" command finds commit ids for the subtrees injected into the checked-out branch, the --resolve flag tries to look up the repositories recorded in git-subtree-repo and retrieve the symbolic refs associated with the commit ids it found. Example:

$ git subtree list --resolve
vim-airline  https://repo/bling/vim-airline.git 4fa37e5e[...]
vim-airline  https://repo/bling/vim-airline.git HEAD
vim-airline  https://repo/bling/vim-airline.git refs/heads/master

The above changes and the "list" command implementation have been submitted to the Git mailing list for review and are currently sitting on my Git fork if you want to try them out.

Conclusions

As soon as I have proper, solid tests for this change, I'll submit a patch to the core Git mailing list and see if they find this addition useful. Hopefully they will! In any case, I hope you enjoyed the above knowledge dump; ping me @durdn and @atlassiandev for more Git shenanigans.
