A better pull request
January 22nd 2015
Tim Pettersen
in Git, Bitbucket
If you're using Git, you're probably using pull requests. They've been around in
some form or other since the dawn of DVCS. Back before Bitbucket and GitHub
built fancy web UIs, a pull request might've simply been an email from Alice
asking you to pull some changes from her repo. If that sounded like a good
idea, you could run a few commands to pull the changes into your master
branch:
$ git remote add alice git://bitbucket.org/alice/bleak.git
$ git checkout master
$ git pull alice master
Of course, randomly pulling Alice's changes into master isn't a great idea.
master represents the code that you're intending to ship to your customers, so
you typically want to keep a close eye on what gets merged in. Rather than
pulling into master, a better pattern is to pull them down into a separate
branch and inspect the changes before merging them in:
$ git fetch alice
$ git diff master...alice/master
Using git diff's "triple dot" syntax shows us the changes between the tip of
alice/master and its merge base (or common ancestor) with our local master
branch. This effectively shows us all of the changes that Alice wants us to
pull.
git diff master...alice/master is equivalent to
git diff A B
At first glance, this seems like a reasonable way to review the changes involved in a pull request. In fact, at time of writing, this appears to be how most git hosting tools have implemented their pull request diffing algorithms.
However there are a couple of problems with using the "triple dot" diff approach
to generate a diff for a pull request. In a real project, the master branch is
going to significantly diverge from any given feature branch. Other developers
will be working on their own branches and merging them in to master.
Once master has progressed, a simple git diff from the feature branch tip
back to its merge base is no longer adequate to show the real differences
between the two branches. You're only seeing the difference between the branch
tip, and some older version of master.
The "triple-dot" git diff master...alice/master
doesn't take into account changes to master
Why is not seeing these changes in the pull request diff a problem? Two reasons.
Merge conflicts
The first problem is something you probably run into fairly regularly:
merge conflicts. If you modify a file on your feature branch that has also
been modified on master, git diff is still just going to show you the
changes that have been made on your feature branch. git merge on the other
hand will spit out an error and spew conflict markers all over your working
copy, showing that your branches have irreconcilable differences. Or at least
differences beyond the capabilities of git's sophisticated merge strategies.
No-one enjoys resolving merge conflicts, but they're a fact of life for all version control systems. At least, version control systems that don't support file-level locking, which has its own problems.
But merge conflicts are far preferred to the second problem you can run into if
you use a "triple dot" git diff for pull requests: a special type of logical
conflict that will merge cleanly, but can introduce subtle bugs into your
codebase.
Logical conflicts that merge cleanly
If developers modify different parts of the same file on different branches, you might be in for some trouble. In some cases, different changes that work independently and appear to merge happily without conflicts, can actually create a logic bug when combined.
This can happen in a few different ways, but one common way is when two or more developers incidentally notice and fix the same bug on two different branches. Consider the following javascript for calculating the ticket price for an airfare:
// flat fees and taxes
var customsFee = 5.5;
var immigrationFee = 7;
var federalTransportTax = .025;
function calculateAirfare(baseFare) {
var fare = baseFare;
fare += immigrationFee;
fare *= (1 + federalTransportTax);
return fare;
}
There's a clear bug here - the author has neglected to include the customs fee in the calculation!
Now imagine two different developers, Alice and Bob, each notice this bug and fix it independently on two different branches.
Alice adds the customsFee before the immigrationFee:
function calculateAirfare(baseFare) {
var fare = baseFare;
+++ fare += customsFee; // Fixed it! Phew. Glad we didn't ship that! - Alice
fare += immigrationFee;
fare *= (1 + federalTransportTax);
return fare;
}
And Bob makes a similar fix, but on the line after the immigrationFee:
function calculateAirfare(baseFare) {
var fare = baseFare;
fare += immigrationFee;
+++ fare += customsFee; // Fixed it! Gee, lucky I caught that one. - Bob
fare *= (1 + federalTransportTax);
return fare;
}
Because different lines were modified on each branch, these two branches will
both merge cleanly into master, one after the other. However, master
will then have both lines. And, a serious bug that will cause customers to be
double-charged the customs fee:
function calculateAirfare(baseFare) {
var fare = baseFare;
fare += customsFee; // Fixed it! Phew. Glad we didn't ship that! - Alice
fare += immigrationFee;
fare += customsFee; // Fixed it! Gee, lucky I caught that one. - Bob
fare *= (1 + federalTransportTax);
return fare;
}
(This is obviously a contrived example, but duplicated code or logic can cause
pretty serious problems: goto fail; anyone?)
Assuming you merged Alice's pull request into master first, here's what
Bob's pull request would look like if you used a "triple-dot" git diff from
the branch tip to the common ancestor:
function calculateAirfare(baseFare) {
var fare = baseFare;
fare += immigrationFee;
+++ fare += customsFee; // Fixed it! Gee, lucky I caught that one. - Bob
fare *= (1 + federalTransportTax);
return fare;
}
Because you're reviewing a diff against the ancestor, there is no warning of the impending doom that will occur when you hit the merge button.
What you really want to see in a pull request is how master will change when
you merge Bob's branch in:
function calculateAirfare(baseFare) {
var fare = baseFare;
fare += customsFee; // Fixed it! Phew. Glad we didn't ship that! - Alice
fare += immigrationFee;
+++ fare += customsFee; // Fixed it! Gee, lucky I caught that one. - Bob
fare *= (1 + federalTransportTax);
return fare;
}
This diff clearly shows the problem. A pull request reviewer will spot the
duplicated line (hopefully) and let Bob know that the code needs some rework,
thus preventing a serious bug from reaching master and eventually production.
This is how we decided to implement pull request diffs in Bitbucket and Stash. When you view a pull request, you're seeing what the resultant merge commit will actually look like. We do this by actually creating a merge commit behind the scenes, and showing you the difference between it and the tip of the target branch:
git diff C D where D is
a merge commit shows all differences between the two branches
If you're curious, I've pushed the same repository to a few different hosting providers so you can see the different diff algorithms in action:
- a pull request with GitHub's "triple-dot" diff
- a pull request with Bitbucket's "merge commit" diff
- a pull request with GitLab's "triple-dot" diff
The "merge commit" diff used in Bitbucket and Stash shows the actual changes that will be applied when you merge. The catch is that it's trickier to implement, and more expensive to execute.
Moving targets
The first problem is that the merge commit D doesn't actually exist yet, and
creating a merge commit is a relatively expensive process. The second problem is
that you can't simply create D and be done with it. B and C, the parents
of our merge commit, could change at any time. We call a change to one of these
parents rescoping the pull request, because it effectively changes the diff
that will be applied when the pull request is merged. If your pull request is
targeting a busy branch like master, your pull request is likely being
rescoped very frequently.
Merge commits are created any time either branch changes.
In fact, every time someone pushes to or merges a branch into master or
your feature branch, Bitbucket or Stash is potentially going to need to
calculate a new merge in order to show you an accurate diff.
Handling merge conflicts
The other problem with performing merges to generate pull request diffs is that,
every now and then, you're going to have to handle a merge conflict. Since your
git server is running non-interactively, there won't be anyone around to resolve
them. This makes things a bit more complicated, but actually turns out to be an
advantage. In Bitbucket and Stash, we actually commit the conflict markers as
part of the merge commit D, and then mark them up in the diff to show you how
your pull request is conflicting:
In Bitbucket and Stash diffs:
green lines are added,
red lines are removed, and
orange lines are conflicting.
This means we can not only detect ahead of time that your pull request is conflicting, we can also let reviewers discuss how the conflict should be resolved. Since conflicts always involve at least two parties, we feel that the pull request is the best place to determine an appropriate resolution.
Despite the additional complexity and cost, I believe the approach we've taken in Stash and Bitbucket provides the most accurate and useful pull request diff. If you have questions or feedback, please let me know in the comments or on Twitter. If you like, you can follow me (@kannonboy) for occasional updates on Git, Bitbucket, and other neat stuff.
