Aliasing authors in Git

June 27th 2017 Ian Buchanan in Git, Bitbucket

So you've got a Git repository where the same person has used multiple email addresses. Maybe some commits were created with a mistake in the Git configuration. Maybe you used your home email address by mistake. Maybe your company merged with another one, so your old email address isn't valid anymore. However it happened, you want to fix it the Git way; otherwise, you lose the portability of the Git repository. In other words, you want the mapping of emails to work just as well on a repository manager, like Bitbucket, as it does with local tools like git log. The Git solution is the .mailmap file.

Why .mailmap?

The .mailmap file is a way to tell Git about user aliases. You might find that Git repository managers, like Bitbucket Cloud and GitHub, allow you to add multiple email addresses to your account. While that may change the web view of which user made commits, it doesn't change the information stored within Git. So if you ever move your repo to another repo manager, you have to set up all those aliases again. Moreover, your local history will still look like commits came from multiple people.

Why not rewrite Git history? Because it can create a number of problems. Rewriting history, even just to change authors, changes the hash of every rewritten commit and of every commit that follows it. This invalidates any signed commits. It also forces everyone with a clone to rebase their branches, and maybe to reclone. Atlassian generally recommends not changing Git history for commits that have been shared with other people.

The advantage of .mailmap is that it's a built-in capability of Git. Even if it's not a perfect solution, it avoids the problems of rewriting history, while providing a standard way to signal that different names or email addresses really belong to the same committer.

How to create a .mailmap

To start, let's check our repository for duplicates with git shortlog. The shortlog command is a special version of git log intended for creating release announcements. This is an easy way to see who's been working on what. In this case, we'll use -s to summarize by author, providing a count of commits, and we'll use -e to show the email of each author. We'll only need to make a fix if we see duplicate names or emails.

git shortlog -se

This is the result for one of Atlassian's repositories:

 1  Adam Saint-Prix [Atlassian] <asaintprix@example.com>
 5  Alex Yakovlev <alyakovlev@example.com>
 2  Brent Plump <bplump@example.com>
 1  Dan Radigan <dan@danradigan.com>
 1  Dan Radigan <dradigan@example.com>
13  David Jenkins <djenkins@example.com>
 2  David Jenkins <djenkins@cli-4902.office.example.com>
 1  EC2 Default User <ec2-user@ip-172-31-43-152.ec2.internal>
 5  Krystian Brazulewicz <kbrazulewicz@example.com>
 1  Krystian Brazulewicz [Atlassian] <kbrazulewicz@example.com>
 2  Marcin Oles <moles@example.com>
 2  Marcin Oleś [Atlassian] <moles@example.com>
 1  Martin Suntinger [Atlassian] <msuntinger@example.com>
 2  Matt Shelton <mshelton@example.com>
 2  Neal RIley <demo@Demonstration-Computer.local>
10  Neal RIley <demo@demonstcomputer.office.example.com>
46  Neal Riley <nriley@example.com>
 2  Neal Riley <nriley@salesengineering.office.example.com>
 5  Norman Ma <nma@example.com>
 1  Pawel Skierczynski <pskierczynski@example.com>
 8  Peter Koczan <pkoczan@example.com>
15  Piotr Święcicki <pswiecicki@example.com>
 1  Tim Wong [Atlassian] <twong@example.com>
 4  dan radigan <dan@danradigan.com>
29  davidglennjenkins <djenkins@example.com>

We can see a number of variations on people's names and email addresses, so it's worth creating a .mailmap file.
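If you'd like to try the following steps without touching a real project, here's a minimal sketch that builds a throwaway repository in a temporary directory with a few sample identities modeled on the output above. The commit_as helper is a hypothetical function for this sketch, not a Git command. The script passes HEAD explicitly because, when no revision is given and standard input is not a terminal (as in a script or CI job), git shortlog reads a log from standard input instead of walking the repository.

```shell
set -e
# Throwaway sandbox repository in a temporary directory.
cd "$(mktemp -d)"
git init -q

# Hypothetical helper for this sketch (not a Git command):
# make an empty commit authored and committed as "$1 <$2>".
commit_as() {
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
    git commit -q --allow-empty -m "commit by $1"
}

# Sample identities modeled on the shortlog output above.
commit_as 'Dan Radigan' 'dan@danradigan.com'
commit_as 'dan radigan' 'dan@danradigan.com'
commit_as 'dan radigan' 'dan@danradigan.com'
commit_as 'Dan Radigan' 'dradigan@example.com'

# Pass HEAD explicitly: with no revision and a non-terminal stdin,
# git shortlog would read a log from standard input instead.
git shortlog -se HEAD
```

Without a .mailmap, this should list three separate entries: the two capitalizations of the name at dan@danradigan.com are counted separately, and so is the example.com address.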

Create initial mapping file

Let's use the existing information to create an initial mail mapping file. We'll simply pipe the output of the previous command through a Perl one-liner that strips the leading commit counts.

git shortlog -se | perl -ple "s/^\s*\d+\t//" > .mailmap

This yields a similar list in the file, but without the number of commits per author:

Adam Saint-Prix [Atlassian] <asaintprix@example.com>
Alex Yakovlev <alyakovlev@example.com>
Brent Plump <bplump@example.com>
Dan Radigan <dan@danradigan.com>
Dan Radigan <dradigan@example.com>
David Jenkins <djenkins@example.com>
David Jenkins <djenkins@cli-4902.office.example.com>
EC2 Default User <ec2-user@ip-172-31-43-152.ec2.internal>
Krystian Brazulewicz <kbrazulewicz@example.com>
Krystian Brazulewicz [Atlassian] <kbrazulewicz@example.com>
Marcin Oles <moles@example.com>
Marcin Oleś [Atlassian] <moles@example.com>
Martin Suntinger [Atlassian] <msuntinger@example.com>
Matt Shelton <mshelton@example.com>
Neal RIley <demo@Demonstration-Computer.local>
Neal RIley <demo@demonstcomputer.office.example.com>
Neal Riley <nriley@example.com>
Neal Riley <nriley@salesengineering.office.example.com>
Norman Ma <nma@example.com>
Pawel Skierczynski <pskierczynski@example.com>
Peter Koczan <pkoczan@example.com>
Piotr Święcicki <pswiecicki@example.com>
Tim Wong [Atlassian] <twong@example.com>
dan radigan <dan@danradigan.com>
davidglennjenkins <djenkins@example.com>

We'll modify this generated .mailmap file until all the users and emails are aliased the way we want.
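If Perl isn't available, the same starting file can be produced with cut, because git shortlog separates each count from the name and email with a tab. Here's a minimal sketch in a throwaway repository with hypothetical sample commits; it passes HEAD explicitly so that it also works when standard input isn't a terminal. Since the shell's redirection can empty an existing .mailmap before git shortlog gets to read it, this approach is only safe when no .mailmap exists yet.

```shell
set -e
# Throwaway repository with hypothetical sample commits.
cd "$(mktemp -d)"
git init -q
for who in 'Norman Ma:nma@example.com' 'Peter Koczan:pkoczan@example.com'; do
  name=${who%%:*}   # text before the colon
  email=${who#*:}   # text after the colon
  GIT_AUTHOR_NAME="$name" GIT_AUTHOR_EMAIL="$email" \
  GIT_COMMITTER_NAME="$name" GIT_COMMITTER_EMAIL="$email" \
    git commit -q --allow-empty -m "commit by $name"
done

# Each shortlog line is "<padded count><TAB>Name <email>";
# cut -f2- keeps everything after the first tab.
git shortlog -se HEAD | cut -f2- > .mailmap
cat .mailmap
# prints:
# Norman Ma <nma@example.com>
# Peter Koczan <pkoczan@example.com>
```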

Change names

To remove the [Atlassian] suffix from people's names, we simply delete it from the entry. For example:

Adam Saint-Prix [Atlassian] <asaintprix@example.com>

So this becomes:

Adam Saint-Prix <asaintprix@example.com>

Effectively, we're telling Git, "Whenever you see this email address, use this name."
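To check a single entry without reading a full log, git check-mailmap prints the canonical identity for each contact you pass it, according to the repository's .mailmap. Here's a minimal sketch in a throwaway repository; no commits are needed, because the command only consults the mailmap.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# One-address form: "Proper Name <email>" fixes the name for that email.
printf '%s\n' 'Adam Saint-Prix <asaintprix@example.com>' > .mailmap

git check-mailmap 'Adam Saint-Prix [Atlassian] <asaintprix@example.com>'
# prints: Adam Saint-Prix <asaintprix@example.com>
```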

Resolve duplicate names

When two entries share the same email address, we can collapse them into the one we want to keep:

Krystian Brazulewicz <kbrazulewicz@example.com>
Krystian Brazulewicz [Atlassian] <kbrazulewicz@example.com>

So this becomes:

Krystian Brazulewicz <kbrazulewicz@example.com>
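Because a one-address entry matches on the email alone, the single remaining line covers both name variants. A minimal sketch of the effect on git shortlog, assuming a throwaway repository with two hypothetical commits and a hypothetical commit_as helper:

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Hypothetical helper for this sketch: empty commit as "$1 <$2>".
commit_as() {
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
    git commit -q --allow-empty -m "commit by $1"
}
commit_as 'Krystian Brazulewicz' 'kbrazulewicz@example.com'
commit_as 'Krystian Brazulewicz [Atlassian]' 'kbrazulewicz@example.com'

git shortlog -se HEAD    # two entries, one commit each

# Keep just the entry we want; it now covers both name variants.
printf '%s\n' 'Krystian Brazulewicz <kbrazulewicz@example.com>' > .mailmap
git shortlog -se HEAD    # one entry with a count of 2
```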

Resolve duplicate emails

Where the name is correct, we can map one email to another:

Neal Riley <nriley@example.com>
Neal Riley <nriley@salesengineering.office.example.com>

So this becomes:

Neal Riley <nriley@example.com> <nriley@salesengineering.office.example.com>

Effectively, we're telling Git, "Whenever you see the email in angle brackets on the right, use the name and email you see on the left."
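With this two-address form, Git matches on the commit email alone, so any name recorded with that address is replaced too. A minimal sketch using git check-mailmap in a throwaway repository; the second contact, N. Riley, is a hypothetical variant.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Two-address form: proper name and email, then the email used in commits.
printf '%s\n' \
  'Neal Riley <nriley@example.com> <nriley@salesengineering.office.example.com>' \
  > .mailmap

git check-mailmap 'Neal Riley <nriley@salesengineering.office.example.com>'
# prints: Neal Riley <nriley@example.com>

# Hypothetical name variant: still matched, because only the email is checked.
git check-mailmap 'N. Riley <nriley@salesengineering.office.example.com>'
# prints: Neal Riley <nriley@example.com>
```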

Resolve duplicate name and email

Sometimes we need both an email alias and a name change:

Neal RIley <demo@demonstcomputer.office.example.com>
Neal Riley <nriley@example.com>

So this becomes:

Neal Riley <nriley@example.com> Neal RIley <demo@demonstcomputer.office.example.com>
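This four-part form is narrower than the two-address form: Git applies it only when both the name and the email on the right match a commit (case-insensitively, per the gitmailmap documentation). That can matter for a generic address such as demo@..., which a demo machine might record for more than one person. A minimal sketch using git check-mailmap in a throwaway repository; the second contact is hypothetical.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Four-part form: proper name/email, then the commit name/email to replace.
printf '%s\n' \
  'Neal Riley <nriley@example.com> Neal RIley <demo@demonstcomputer.office.example.com>' \
  > .mailmap

git check-mailmap 'Neal RIley <demo@demonstcomputer.office.example.com>'
# prints: Neal Riley <nriley@example.com>

# Hypothetical contact: same address, different name, so no entry applies.
git check-mailmap 'Someone Else <demo@demonstcomputer.office.example.com>'
# prints: Someone Else <demo@demonstcomputer.office.example.com>
```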

Multiple duplicate resolutions

And sometimes we need multiple emails and name changes:

Neal RIley <demo@Demonstration-Computer.local>
Neal RIley <demo@demonstcomputer.office.example.com>
Neal Riley <nriley@example.com>
Neal Riley <nriley@salesengineering.office.example.com>

So this becomes:

Neal Riley <nriley@example.com> Neal RIley <demo@Demonstration-Computer.local>
Neal Riley <nriley@example.com> Neal RIley <demo@demonstcomputer.office.example.com>
Neal Riley <nriley@example.com> <nriley@salesengineering.office.example.com>
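To confirm that every identity Neal has committed under ends up in the same place, we can feed all four to git check-mailmap and collapse the results. This is a minimal sketch in a throwaway repository using the three entries above. The canonical Neal Riley <nriley@example.com> needs no line of its own: with no entry for that address, check-mailmap passes it through unchanged.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# The three entries from the example above, written into the sandbox.
cat > .mailmap <<'EOF'
Neal Riley <nriley@example.com> Neal RIley <demo@Demonstration-Computer.local>
Neal Riley <nriley@example.com> Neal RIley <demo@demonstcomputer.office.example.com>
Neal Riley <nriley@example.com> <nriley@salesengineering.office.example.com>
EOF

# Map every identity Neal has committed under; sort -u collapses duplicates.
mapped=$(git check-mailmap \
  'Neal RIley <demo@Demonstration-Computer.local>' \
  'Neal RIley <demo@demonstcomputer.office.example.com>' \
  'Neal Riley <nriley@example.com>' \
  'Neal Riley <nriley@salesengineering.office.example.com>' | sort -u)
printf '%s\n' "$mapped"
# prints a single line: Neal Riley <nriley@example.com>
```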

Putting it all together

After making multiple kinds of changes, our full .mailmap file looks like this:

Adam Saint-Prix <asaintprix@example.com>
Alex Yakovlev <alyakovlev@example.com>
Brent Plump <bplump@example.com>
Dan Radigan <dradigan@example.com> dan radigan <dan@danradigan.com>
David Jenkins <djenkins@example.com> <djenkins@cli-4902.office.example.com>
David Jenkins <djenkins@example.com> davidglennjenkins <djenkins@example.com>
David Jenkins <djenkins@example.com> EC2 Default User <ec2-user@ip-172-31-43-152.ec2.internal>
Krystian Brazulewicz <kbrazulewicz@example.com>
Marcin Oleś <moles@example.com>
Martin Suntinger <msuntinger@example.com>
Matt Shelton <mshelton@example.com>
Neal Riley <nriley@example.com> Neal RIley <demo@Demonstration-Computer.local>
Neal Riley <nriley@example.com> Neal RIley <demo@demonstcomputer.office.example.com>
Neal Riley <nriley@example.com> <nriley@salesengineering.office.example.com>
Norman Ma <nma@example.com>
Pawel Skierczynski <pskierczynski@example.com>
Peter Koczan <pkoczan@example.com>
Piotr Święcicki <pswiecicki@example.com>
Tim Wong <twong@example.com>

Now, when we run git shortlog -se, we get:

 1  Adam Saint-Prix <asaintprix@example.com>
 5  Alex Yakovlev <alyakovlev@example.com>
 2  Brent Plump <bplump@example.com>
 6  Dan Radigan <dradigan@example.com>
45  David Jenkins <djenkins@example.com>
 6  Krystian Brazulewicz <kbrazulewicz@example.com>
 4  Marcin Oleś <moles@example.com>
 1  Martin Suntinger <msuntinger@example.com>
 2  Matt Shelton <mshelton@example.com>
60  Neal Riley <nriley@example.com>
 5  Norman Ma <nma@example.com>
 1  Pawel Skierczynski <pskierczynski@example.com>
 8  Peter Koczan <pkoczan@example.com>
15  Piotr Święcicki <pswiecicki@example.com>
 1  Tim Wong <twong@example.com>
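The Dan Radigan total adds up to 1 + 1 + 4 = 6 because Git matches the old name and email in each entry case-insensitively, as the gitmailmap documentation states. So the single dan radigan <dan@danradigan.com> entry also catches the commit recorded as Dan Radigan <dan@danradigan.com>. A minimal sketch of that behavior, assuming a throwaway repository with hypothetical commits and a hypothetical commit_as helper:

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Hypothetical helper for this sketch: empty commit as "$1 <$2>".
commit_as() {
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
    git commit -q --allow-empty -m "commit by $1"
}
commit_as 'Dan Radigan' 'dan@danradigan.com'
commit_as 'dan radigan' 'dan@danradigan.com'
commit_as 'dan radigan' 'dan@danradigan.com'
commit_as 'Dan Radigan' 'dradigan@example.com'

# One entry written with the lowercase name still matches "Dan Radigan".
printf '%s\n' 'Dan Radigan <dradigan@example.com> dan radigan <dan@danradigan.com>' > .mailmap
git shortlog -se HEAD
# one entry: 4 commits for Dan Radigan <dradigan@example.com>
```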

All of this works similarly to the .gitignore file: we can create this file, edit it, and see the results locally before we commit or push anything. Once we're satisfied, though, we should add, commit, and push the file so that everyone gets consistent behavior.
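Committing the file is what makes the mapping travel with the repository. As a minimal sketch of that portability, assuming a throwaway repository and a hypothetical mis-named commit, the script below commits a .mailmap and then runs git shortlog in a bare clone. A bare repository has no working tree, so Git reads HEAD:.mailmap instead (the documented default for mailmap.blob in bare repositories). How a particular hosting service uses the file is up to that service; this only shows what Git itself does.

```shell
set -e
root=$(mktemp -d)
git init -q "$root/work"
cd "$root/work"

# Hypothetical identity with the unwanted [Atlassian] suffix.
export GIT_AUTHOR_NAME='Tim Wong [Atlassian]' GIT_AUTHOR_EMAIL='twong@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

printf '%s\n' 'Tim Wong <twong@example.com>' > .mailmap
git add .mailmap
git commit -q -m 'Add .mailmap'

# The bare clone has no working tree; Git falls back to HEAD:.mailmap.
git clone -q --bare "$root/work" "$root/shared.git"
git -C "$root/shared.git" shortlog -se HEAD
# one entry for: Tim Wong <twong@example.com>
```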

While this is the Git way to fix the problem, it comes with some wrinkles. Although shortlog uses this file automatically, not every subcommand does. For example, if you want the full log to use it, you'll have to pass the --use-mailmap option:

git log --use-mailmap
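Here's a minimal sketch comparing the views in a throwaway repository with one hypothetical commit. It also uses the %aN and %aE placeholders, which are the mailmap-aware counterparts of %an and %ae in --format strings. Note that in current Git releases the log.mailmap setting defaults to true, so git log may already apply the mailmap without the flag, depending on your Git version; git log --no-use-mailmap shows the raw identities.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# One hypothetical commit under the unwanted identity.
GIT_AUTHOR_NAME='davidglennjenkins' GIT_AUTHOR_EMAIL='djenkins@example.com' \
GIT_COMMITTER_NAME='davidglennjenkins' GIT_COMMITTER_EMAIL='djenkins@example.com' \
  git commit -q --allow-empty -m 'sample commit'
printf '%s\n' 'David Jenkins <djenkins@example.com>' > .mailmap

# With --use-mailmap, the Author: line shows the canonical identity.
git log --use-mailmap -1 | grep '^Author:'

# %aN and %aE always consult the mailmap.
git log -1 --format='%aN <%aE>'
```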

If you found this quick tip useful, you might find more interesting tips on Getting Git Right.