A better pull request

January 22nd 2015 Tim Pettersen in Git, Bitbucket

If you're using Git, you're probably using pull requests. They've been around in some form or other since the dawn of DVCS. Back before Bitbucket and GitHub built fancy web UIs, a pull request might've simply been an email from Alice asking you to pull some changes from her repo. If that sounded like a good idea, you could run a few commands to pull the changes into your master branch:

$ git remote add alice git://bitbucket.org/alice/bleak.git
$ git checkout master
$ git pull alice master

Of course, randomly pulling Alice's changes into master isn't a great idea. master represents the code that you're intending to ship to your customers, so you typically want to keep a close eye on what gets merged in. Rather than pulling into master, a better pattern is to pull them down into a separate branch and inspect the changes before merging them in:

$ git fetch alice
$ git diff master...alice/master

Using git diff's "triple dot" syntax shows us the changes between the tip of alice/master and its merge base (or common ancestor) with our local master branch. This effectively shows us all of the changes that Alice wants us to pull.

A simple branch git diff master...alice/master is equivalent to git diff A B

At first glance, this seems like a reasonable way to review the changes involved in a pull request. In fact, at time of writing, this appears to be how most git hosting tools have implemented their pull request diffing algorithms.

However there are a couple of problems with using the "triple dot" diff approach to generate a diff for a pull request. In a real project, the master branch is going to significantly diverge from any given feature branch. Other developers will be working on their own branches and merging them in to master. Once master has progressed, a simple git diff from the feature branch tip back to its merge base is no longer adequate to show the real differences between the two branches. You're only seeing the difference between the branch tip, and some older version of master.

Master progresses The "triple-dot" git diff master...alice/master doesn't take into account changes to master

Why is not seeing these changes in the pull request diff a problem? Two reasons.

Merge conflicts

The first problem is something you probably run into fairly regularly: merge conflicts. If you modify a file on your feature branch that has also been modified on master, git diff is still just going to show you the changes that have been made on your feature branch. git merge on the other hand will spit out an error and spew conflict markers all over your working copy, showing that your branches have irreconcilable differences. Or at least differences beyond the capabilities of git's sophisticated merge strategies.

Merge conflict

No-one enjoys resolving merge conflicts, but they're a fact of life for all version control systems. At least, version control systems that don't support file-level locking, which has its own problems.

But merge conflicts are far preferred to the second problem you can run into if you use a "triple dot" git diff for pull requests: a special type of logical conflict that will merge cleanly, but can introduce subtle bugs into your codebase.

Logical conflicts that merge cleanly

If developers modify different parts of the same file on different branches, you might be in for some trouble. In some cases, different changes that work independently and appear to merge happily without conflicts, can actually create a logic bug when combined.

This can happen in a few different ways, but one common way is when two or more developers incidentally notice and fix the same bug on two different branches. Consider the following javascript for calculating the ticket price for an airfare:

// flat fees and taxes
var customsFee          = 5.5;
var immigrationFee      = 7;
var federalTransportTax = .025;

function calculateAirfare(baseFare) {
    var fare = baseFare;                
    fare += immigrationFee;
    fare *= (1 + federalTransportTax);
    return fare;
}

There's a clear bug here - the author has neglected to include the customs fee in the calculation!

Now imagine two different developers, Alice and Bob, each notice this bug and fix it independently on two different branches.

Alice adds the customsFee before the immigrationFee:

function calculateAirfare(baseFare) {
    var fare = baseFare;                
+++ fare += customsFee; // Fixed it! Phew. Glad we didn't ship that! - Alice
    fare += immigrationFee;
    fare *= (1 + federalTransportTax);
    return fare;
}

And Bob makes a similar fix, but on the line after the immigrationFee:

function calculateAirfare(baseFare) {
    var fare = baseFare;                
    fare += immigrationFee;
+++ fare += customsFee; // Fixed it! Gee, lucky I caught that one. - Bob
    fare *= (1 + federalTransportTax);
    return fare;
}

Because different lines were modified on each branch, these two branches will both merge cleanly into master, one after the other. However, master will then have both lines. And, a serious bug that will cause customers to be double-charged the customs fee:

function calculateAirfare(baseFare) {
    var fare = baseFare;                
    fare += customsFee; // Fixed it! Phew. Glad we didn't ship that! - Alice
    fare += immigrationFee;
    fare += customsFee; // Fixed it! Gee, lucky I caught that one. - Bob
    fare *= (1 + federalTransportTax);
    return fare;
}

(This is obviously a contrived example, but duplicated code or logic can cause pretty serious problems: goto fail; anyone?)

Assuming you merged Alice's pull request into master first, here's what Bob's pull request would look like if you used a "triple-dot" git diff from the branch tip to the common ancestor:

function calculateAirfare(baseFare) {
    var fare = baseFare;                
    fare += immigrationFee;
+++ fare += customsFee; // Fixed it! Gee, lucky I caught that one. - Bob
    fare *= (1 + federalTransportTax);
    return fare;
}

Because you're reviewing a diff against the ancestor, there is no warning of the impending doom that will occur when you hit the merge button.

What you really want to see in a pull request is how master will change when you merge Bob's branch in:

function calculateAirfare(baseFare) {
    var fare = baseFare;                
    fare += customsFee; // Fixed it! Phew. Glad we didn't ship that! - Alice
    fare += immigrationFee;
+++ fare += customsFee; // Fixed it! Gee, lucky I caught that one. - Bob
    fare *= (1 + federalTransportTax);
    return fare;
}

This diff clearly shows the problem. A pull request reviewer will spot the duplicated line (hopefully) and let Bob know that the code needs some rework, thus preventing a serious bug from reaching master and eventually production.

This is how we decided to implement pull request diffs in Bitbucket and Stash. When you view a pull request, you're seeing what the resultant merge commit will actually look like. We do this by actually creating a merge commit behind the scenes, and showing you the difference between it and the tip of the target branch:

Merge result git diff C D where D is a merge commit shows all differences between the two branches

If you're curious, I've pushed the same repository to a few different hosting providers so you can see the different diff algorithms in action:

The "merge commit" diff used in Bitbucket and Stash shows the actual changes that will be applied when you merge. The catch is that it's trickier to implement, and more expensive to execute.

Moving targets

The first problem is that the merge commit D doesn't actually exist yet, and creating a merge commit is a relatively expensive process. The second problem is that you can't simply create D and be done with it. B and C, the parents of our merge commit, could change at any time. We call a change to one of these parents rescoping the pull request, because it effectively changes the diff that will be applied when the pull request is merged. If your pull request is targeting a busy branch like master, your pull request is likely being rescoped very frequently.

Rescoping Merge commits are created any time either branch changes.

In fact, every time someone pushes to or merges a branch into master or your feature branch, Bitbucket or Stash is potentially going to need to calculate a new merge in order to show you an accurate diff.

Handling merge conflicts

The other problem with performing merges to generate pull request diffs is that, every now and then, you're going to have to handle a merge conflict. Since your git server is running non-interactively, there won't be anyone around to resolve them. This makes things a bit more complicated, but actually turns out to be an advantage. In Bitbucket and Stash, we actually commit the conflict markers as part of the merge commit D, and then mark them up in the diff to show you how your pull request is conflicting:

Conflicts on Bitbucket In Bitbucket and Stash diffs: green lines are added, red lines are removed, and orange lines are conflicting.

This means we can not only detect ahead of time that your pull request is conflicting, we can also let reviewers discuss how the conflict should be resolved. Since conflicts always involve at least two parties, we feel that the pull request is the best place to determine an appropriate resolution.

Despite the additional complexity and cost, I believe the approach we've taken in Stash and Bitbucket provides the most accurate and useful pull request diff. If you have questions or feedback, please let me know in the comments or on Twitter. If you like, you can follow me (@kannonboy) for occasional updates on Git, Bitbucket, and other neat stuff.