As of Jira Software Data Center 7.3 and Jira Service Desk Data Center 3.6 (now known as Jira Service Management), you can upgrade with zero downtime for your users. We call this ZDU (zero downtime upgrade) for short. It works by allowing the nodes of your cluster to run different versions of Jira simultaneously while you're upgrading. Upgrading with ZDU does not require re-indexing, and the state of the index after ZDU should be the same as before. This page gives you a brief technical overview of how ZDU works.

So what happens to a Jira cluster during ZDU?

ZDU introduces a cluster state which describes what's going on with your cluster in terms of the upgrade process.

Your cluster can be in one of the following states:

Stable: Stable means that the cluster currently functions as normal and there is no upgrade in progress.
Ready to upgrade: This means the cluster is ready for an upgrade. An upgrade is performed by removing a node from the cluster, upgrading it, and then adding it back to the cluster. In this state the cluster is ready for that to occur, but nothing has happened yet. At this point, an admin can remove a node from the load balancer and either shut it down gracefully to upgrade it or just 'kill' it. We never recommend that an admin 'kills' a running Jira node, but technically they could, and as a developer you should be aware of it.
Mixed: This state means that at least one node in the cluster is running a newer version of Jira and at least one node is running the original version. At this point, the upgrade hasn't been finalized and Jira hasn't run its upgrade tasks. If required, though, Jira has already changed the database schema to suit the newer version.
Ready to run upgrade tasks: This state means that all the nodes in the cluster are now running the new version of Jira, but the upgrade hasn't been finalized yet. Things can go wrong, so we give admins a chance to stop at this point and roll back. An admin needs to approve the upgrade before Jira runs upgrade tasks and enables the newer features.
Running upgrade tasks: This state means that an admin has approved the upgrade, and one of the Jira nodes is applying the changes needed to bring the cluster up to date and enable the new features.
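
To make the state descriptions above more concrete, here is a minimal sketch that derives a state from the versions the nodes report. It is an illustrative model only, not Jira's implementation: ClusterUpgradeState, ClusterStateModel, and the two boolean flags are hypothetical names introduced for this example, and it assumes that any version other than the original counts as the newer version.

```java
import java.util.List;

// Hypothetical names for illustration; this is not Jira's API.
enum ClusterUpgradeState {
    STABLE,
    READY_TO_UPGRADE,
    MIXED,
    READY_TO_RUN_UPGRADE_TASKS,
    RUNNING_UPGRADE_TASKS
}

final class ClusterStateModel {
    private ClusterStateModel() {}

    /**
     * Derives the cluster state from what the nodes are running.
     *
     * @param upgradeStarted  true once an admin has put the cluster into upgrade mode
     * @param upgradeApproved true once an admin has approved the upgrade
     * @param originalVersion the Jira version the cluster ran before the upgrade
     * @param nodeVersions    the Jira version running on each active node
     */
    static ClusterUpgradeState derive(boolean upgradeStarted, boolean upgradeApproved,
                                      String originalVersion, List<String> nodeVersions) {
        if (!upgradeStarted) {
            return ClusterUpgradeState.STABLE;
        }
        // Assumption: any version other than the original is the newer one.
        long onOriginal = nodeVersions.stream().filter(originalVersion::equals).count();
        if (onOriginal == nodeVersions.size()) {
            // Upgrade mode is on, but no node has been upgraded yet.
            return ClusterUpgradeState.READY_TO_UPGRADE;
        }
        if (onOriginal > 0) {
            // At least one node on each version.
            return ClusterUpgradeState.MIXED;
        }
        // Every node runs the new version; approval decides the rest.
        return upgradeApproved
                ? ClusterUpgradeState.RUNNING_UPGRADE_TASKS
                : ClusterUpgradeState.READY_TO_RUN_UPGRADE_TASKS;
    }
}
```

For example, derive(true, false, "7.3.0", List.of("7.4.0", "7.3.0")) yields MIXED in this model, and downgrading the first node back to "7.3.0" yields READY_TO_UPGRADE again.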

The process:

What happens to a Jira node during ZDU?

Nothing special, to be honest. An admin upgrades the node as they would a regular Jira instance and then adds it back to the cluster. There are a few important things to note, though:

A node receives events whenever another node changes the cluster state
A node can go down SUDDENLY
A node can be upgraded to a newer version of Jira and switch the cluster state to MIXED
An upgraded node can be downgraded back to the original version of Jira and possibly switch the cluster state back to READY TO UPGRADE if no nodes remain on a newer version
Once an admin approves the upgrade, a node will start running any required upgrade tasks

What happens to plugins during ZDU?

Currently, Atlassian strongly discourages admins from updating plugins during ZDU. However, we can't stop people from doing this, so Jira freezes all plugins on all nodes running the original version of Jira. This means that even if an admin upgrades a plugin, Jira nodes on the original version will still run the old version of the plugin.

Nodes running the new version of Jira, however, will pick up the upgraded plugin, so we recommend not making any breaking changes to the database schema.

If your plugin needs to be aware of Jira's cluster upgrade state, you can use our public API:

com.atlassian.jira.cluster.zdu.ClusterStateManager

A plugin can receive events when the cluster state changes:

com.atlassian.jira.cluster.zdu.JiraUpgradeStartedEvent
com.atlassian.jira.cluster.zdu.JiraUpgradeCancelledEvent
com.atlassian.jira.cluster.zdu.JiraUpgradeApprovedEvent
com.atlassian.jira.cluster.zdu.JiraUpgradeFinishedEvent

Gotchas

com.atlassian.jira.cluster.zdu.ClusterStateManager#getUpgradeState makes sure that Jira isn't stuck in one of the upgrade states, which can involve cluster-wide locking, so it is an expensive operation. Try to avoid calling it, and use events instead.
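
One way to follow this advice is to remember what the events have reported and answer questions from that cached view. The sketch below is a minimal illustration under some assumptions. UpgradeStatusTracker and its methods are hypothetical. The four event classes are empty stand-ins so the example compiles on its own, and in a real plugin they would be the com.atlassian.jira.cluster.zdu events listed above. The meaning attached to each event is inferred from its name rather than from documented behavior. Registering the handlers with Jira's event system is not shown.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-ins so the sketch compiles without Jira on the classpath.
// In a plugin, use the event classes from com.atlassian.jira.cluster.zdu instead.
final class JiraUpgradeStartedEvent {}
final class JiraUpgradeCancelledEvent {}
final class JiraUpgradeApprovedEvent {}
final class JiraUpgradeFinishedEvent {}

/** Hypothetical helper that caches what the ZDU events have reported so far. */
final class UpgradeStatusTracker {
    private final AtomicBoolean upgradeInProgress = new AtomicBoolean(false);
    private final AtomicBoolean upgradeTasksRunning = new AtomicBoolean(false);

    // Assumed meaning: an admin has put the cluster into upgrade mode.
    void onStarted(JiraUpgradeStartedEvent event) {
        upgradeInProgress.set(true);
    }

    // Assumed meaning: the upgrade was abandoned and the cluster is stable again.
    void onCancelled(JiraUpgradeCancelledEvent event) {
        upgradeInProgress.set(false);
        upgradeTasksRunning.set(false);
    }

    // Assumed meaning: an admin approved the upgrade, so upgrade tasks start.
    void onApproved(JiraUpgradeApprovedEvent event) {
        upgradeTasksRunning.set(true);
    }

    // Assumed meaning: upgrade tasks completed and the cluster is stable again.
    void onFinished(JiraUpgradeFinishedEvent event) {
        upgradeInProgress.set(false);
        upgradeTasksRunning.set(false);
    }

    /** Cheap local read; no cluster-wide locking involved. */
    boolean isUpgradeInProgress() {
        return upgradeInProgress.get();
    }

    /** Cheap local read; no cluster-wide locking involved. */
    boolean isUpgradeTasksRunning() {
        return upgradeTasksRunning.get();
    }
}
```

A tracker created while an upgrade is already under way starts with both flags false, so a plugin that needs an accurate view at startup may still have to seed it with a single getUpgradeState call and rely on events from then on.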