Last updated Apr 24, 2019

How do I ensure my app works properly in a cluster?

Clustering in Confluence should work transparently for developers. However, there are a few things to be aware of in more advanced apps (also known as plugins or add-ons).

Before you begin, head to Developing apps for Atlassian Data Center products for an overview of the requirements you'll need to meet to get your app approved for Data Center.

Installing an app in a cluster

Installing an app in a cluster is the same as in a single node instance. Uploading an app through the web interface will store the app in the PLUGINDATA table in the database, and ensure that it's loaded on all nodes of the cluster.

Cluster instances must be homogeneous, so you can assume the same performance characteristics and version of Confluence running on each instance.

Testing your app in a cluster

It's important to test your app in a cluster to make sure it works properly. Setting up a Confluence cluster is as easy as setting up two new instances on the same machine with a cluster license. It shouldn't take more than ten minutes to test your app manually.

If you need access to a cluster license for Confluence, get a timebomb license.  

Using the Confluence Data Center Plugin Validator

You can use the Confluence Data Center Plugin Validator to check your app. The tool finds places where apps attempt to store non-Serializable data in an Atlassian Cache. Read more about using the Confluence Data Center Plugin Validator.

Caching in a cluster

In many simple apps, it's common to cache data in a field in your object - typically a ConcurrentMap or WeakHashMap. This caching will not work correctly in a cluster because updating the data on one node will make the cached data on the other node stale.

The solution is to use the caching API provided with Confluence, Atlassian Cache. For example code and a description of how to cache data correctly in Confluence, see How do I cache data in a plugin?

Both keys and values of data stored in a cache in a Confluence cluster must implement Serializable.
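One way to catch non-serializable cache entries early is a unit-test helper that pushes a key or value through plain Java serialization. The sketch below uses only the JDK. `SpaceStatsKey` and `SerializationCheck` are hypothetical names, not part of Atlassian Cache, and passing this check complements the validator and a real cluster test rather than replacing them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical cache key: immutable and Serializable, with equals/hashCode
// so that a deserialized copy still matches the original entry.
final class SpaceStatsKey implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String spaceKey;
    private final long userId;

    SpaceStatsKey(String spaceKey, long userId) {
        this.spaceKey = Objects.requireNonNull(spaceKey);
        this.userId = userId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SpaceStatsKey)) return false;
        SpaceStatsKey other = (SpaceStatsKey) o;
        return userId == other.userId && spaceKey.equals(other.spaceKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(spaceKey, userId);
    }
}

// Hypothetical test helper: serializes an object and reads it back.
final class SerializationCheck {
    private SerializationCheck() {}

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                // Throws NotSerializableException for types that cannot be serialized.
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalArgumentException("Not safely serializable: " + original, e);
        }
    }
}
```

Calling `SerializationCheck.roundTrip(key)` in a test and comparing the result with the original using `equals` exercises both the Serializable requirement and the key's equality contract.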

Scheduled tasks

Without any intervention, scheduled tasks will execute independently on each Confluence node in a cluster. In some circumstances, this is desirable behaviour. In other situations, you will need to use cluster-wide locking to ensure that jobs are only executed once per cluster.

The easiest way to do this is to use the perClusterJob attribute on your job module declaration, as documented on the Job Module page.

In some cases you may need to implement locking manually to ensure the proper execution of scheduled tasks on different instances. See the locking section below for more information on this.

Cluster-wide locks

The locking primitives provided with Java (java.util.concurrent.locks.Lock, synchronized, etc.) will not properly ensure serialised access to data in a cluster. Instead, you need to use the cluster-wide lock that is provided through the Beehive ClusterLockService API.

Confluence 5.5 onwards

Below is an example of using a cluster-wide lock via ClusterLockService.getLockForName() under Confluence 5.5:

ClusterLock lock = clusterLockService.getLockForName(getClass().getName() + ".taskExecutionLock");
if (lock.tryLock()) {
    try {
        log.info("Acquired lock to execute task");
        executeTask();
    }
    finally {
        lock.unlock();
    }
}
else {
    log.info("Task is running on another instance");
}

Backward compatibility

For compatibility across versions of Confluence prior to 5.5, Beehive provides the compatibility library beehive-compat, which supports the Beehive API across older versions of Confluence.

Event handling

By default, Confluence events are only propagated on the instance on which they occur. This is normally desirable behaviour for apps, which can rely on this to only respond once to a particular event in a cluster. It also ensures that the Hibernate-backed objects which are often included in an event will still be attached to the database session when interacting with them in your plugin code.

If your plugin needs to publish events to other nodes in the cluster, we recommend you do the following:

  1. Ensure the event extends the ConfluenceEvent class and implements the ClusterEvent interface.
  2. Listen for the ClusterEventWrapper event and perform an instanceof check on wrapper.getEvent() in order to receive the event on remote nodes.

Example clustered event listener

public class MyClusterEventListener {
    @EventListener
    public void handleLocalEvent(MyClusterEvent event) {
        // Handle event originating from local node
    }

    @EventListener
    public void handleRemoteEvent(ClusterEventWrapper wrapper) {
        Event event = wrapper.getEvent();
        if (event instanceof MyClusterEvent) {
            // Handle event originating from remote node
        }
    }
}

Like clustered cache data, events which are republished across a cluster can only contain fields which implement Serializable or are marked transient. In some cases, it may be preferable to create a separate event class for cluster events which includes object IDs rather than Hibernate-backed objects. Other instances can then retrieve the data from the database themselves when processing the event.
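The ID-based approach described above can be sketched in plain Java. The names here are hypothetical: `PageChangedPayload` stands in for the fields of a cluster event class (which in a real plugin would also extend ConfluenceEvent and implement ClusterEvent), and the `LongFunction` stands in for whatever manager or DAO lookup the receiving node uses.

```java
import java.io.Serializable;
import java.util.Objects;
import java.util.Optional;
import java.util.function.LongFunction;

// Hypothetical payload: carries only the ID of the changed page, so every
// field is trivially Serializable and no Hibernate-backed object is shipped.
final class PageChangedPayload implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long pageId;

    PageChangedPayload(long pageId) {
        this.pageId = pageId;
    }

    long getPageId() {
        return pageId;
    }
}

// Hypothetical receiver: re-loads the object by ID on the node that handles
// the event, instead of trusting an object attached to another node's session.
final class RemotePageHandler<T> {
    private final LongFunction<T> pageLoader; // assumption: stand-in for a database lookup

    RemotePageHandler(LongFunction<T> pageLoader) {
        this.pageLoader = Objects.requireNonNull(pageLoader);
    }

    Optional<T> handle(PageChangedPayload payload) {
        // The lookup may return null if the page was deleted in the meantime.
        return Optional.ofNullable(pageLoader.apply(payload.getPageId()));
    }
}
```

Returning an `Optional` makes the "object no longer exists" case explicit, which can happen because the event is processed after the originating transaction has completed.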

Confluence will only publish cluster events when the current transaction is committed and complete. This is to ensure that any data you store in the database will be available to other instances in the cluster when the event is received and processed.

Home Directory

In a clustered environment Confluence has both a local home and a shared home. The following table shows examples of what is stored in each.

Local Home
  • Logs
  • Lucene Index
  • temp
  • confluence.cfg.xml

Shared Home
  • add-ons
  • Data including attachments and avatars
  • Backup/Restore files
  • temp

The now-deprecated method BootstrapManager.getConfluenceHome() will return the shared home. Two new methods, getSharedHome() and getLocalHome(), return the shared home and the local home respectively.

Apps will need to decide the most appropriate place to store any data they place on the file system. In most scenarios, the shared home is the correct place.

In a standalone environment, BootstrapManager.getConfluenceHome() will return the local home, whereas getSharedHome() will return the "shared-home" directory inside the local home directory.

Operation complexity

The difference between Server and Data Center is not only architectural. Our largest customers choose Data Center for its stability and performance characteristics. When developing a plugin, you should therefore estimate every operation in terms of time and memory complexity.

See Testing your app on a large instance for examples of large datasets for different products. When thinking about performance, it's vital to always keep those numbers in mind. The ability of your plugin to perform well with large datasets is a major part of the Data Center approval process.

It is also important to think about memory complexity, because a large memory allocation risks producing an OutOfMemoryError, which causes a node to die. A common approach to achieving a constant memory footprint is to use limits and pagination wherever data is loaded or processed. If you are facing a tradeoff between operation speed and memory requirements, it is usually better to err on the side of less memory.
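As a minimal sketch of limits and pagination, the helper below processes a dataset one fixed-size page at a time, so that only `pageSize` items are held in memory at once. `PageSource` and `PagedProcessor` are hypothetical names; the sketch assumes the underlying query supports an offset and a limit and returns items in a stable order.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical page source: returns up to `limit` items starting at `offset`.
// In a plugin this would typically wrap a paginated database or API query.
interface PageSource<T> {
    List<T> load(int offset, int limit);
}

final class PagedProcessor {
    private PagedProcessor() {}

    // Processes every item while holding at most one page in memory.
    // Returns the total number of items processed.
    static <T> int processAll(PageSource<T> source, int pageSize, Consumer<? super T> action) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int processed = 0;
        while (true) {
            List<T> page = source.load(processed, pageSize);
            for (T item : page) {
                action.accept(item);
            }
            processed += page.size();
            // A short (or empty) page means the end of the data was reached.
            if (page.size() < pageSize) {
                return processed;
            }
        }
    }
}
```

The page size bounds memory use regardless of how large the dataset grows; the cost is more round trips, which matches the advice to err on the side of less memory.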

Handling network operations

If your plugin communicates with an external system over the network, it is important to consider the following.

Using the connection pool

Make sure that your plugin doesn’t open and close a TCP connection for every communication with an external system. We recommend using Apache HttpClient as an implementation of a generic connection pool.

Timeouts

Every network communication should be protected by meaningful timeouts. The application should be able to gracefully handle network system downtime or other network issues. Make sure you test the scenario and write your plugin code defensively to catch any errors and show them to the user.

Ability to override timeout values

On some instances it helps to have the ability to override timeout values. For example, if the network is slow, it might be useful to increase the timeout. One of the ways to achieve this is to provide a system property for each timeout value with a sensible default value.
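A minimal sketch of this approach, using only the JDK, reads each timeout from a system property and falls back to a default. The property names and default values below are hypothetical; choose names that are namespaced to your own plugin.

```java
final class RemoteTimeouts {
    // Hypothetical property names and defaults, for illustration only.
    static final String CONNECT_PROP = "myplugin.remote.connect.timeout.ms";
    static final String READ_PROP = "myplugin.remote.read.timeout.ms";
    static final int DEFAULT_CONNECT_MS = 5_000;
    static final int DEFAULT_READ_MS = 10_000;

    private RemoteTimeouts() {}

    static int connectTimeoutMs() {
        return positiveOrDefault(Integer.getInteger(CONNECT_PROP), DEFAULT_CONNECT_MS);
    }

    static int readTimeoutMs() {
        return positiveOrDefault(Integer.getInteger(READ_PROP), DEFAULT_READ_MS);
    }

    // Integer.getInteger returns null when the property is missing or not a
    // number. Zero and negative values are also rejected, because many HTTP
    // clients treat a zero timeout as "wait forever".
    private static int positiveOrDefault(Integer configured, int fallback) {
        return (configured != null && configured > 0) ? configured : fallback;
    }
}
```

An administrator could then start the node with, for example, `-Dmyplugin.remote.read.timeout.ms=30000`, and the resolved values can be passed to whichever HTTP client the plugin uses.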

Load network data in an asynchronous way

Since a network operation can take a long time to execute (up to the timeout), it's important to block as few things as possible while waiting for the response. For example, if the plugin displays information obtained from a network call, it shouldn't block the rest of the page. In some cases where eventual consistency is acceptable, remote data can be cached and updated independently of when it is displayed.
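One possible shape for this, sketched with the JDK's CompletableFuture, is to run the remote call on a separate executor and wait only a short, bounded time before falling back to a placeholder. `AsyncRemoteValue` and its parameters are hypothetical names, and the caller is assumed to supply and manage the executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class AsyncRemoteValue {
    private AsyncRemoteValue() {}

    // Runs remoteCall on the given executor and waits at most maxWaitMs for it.
    // Returns the fallback if the call is too slow or fails.
    static <T> T fetchOrFallback(Supplier<T> remoteCall, ExecutorService executor,
                                 long maxWaitMs, T fallback) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(remoteCall, executor);
        try {
            return future.get(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Note: the background task keeps running after a timeout; the
            // remote call's own timeouts are what eventually stop it.
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return fallback;
        }
    }
}
```

The bounded wait keeps page rendering responsive, while the network timeouts discussed above still limit how long the background thread itself is occupied.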

Performance and scale testing

It is essential that your app performs under the types of load typical of large Data Center installations.

See Performance and scale testing your Data Center app to find out how to test your app.

Mark your app as cluster compatible and submit it for review

Finally, you need to mark your app as cluster compatible. This is done in the plugin-info section of your app descriptor (atlassian-plugin.xml).

See Submitting your Data Center app to Atlassian Marketplace for step-by-step instructions on how to mark your app as compatible and submit it for approval.

Important note: plugins should not cache licenses or license states as this will prevent license changes being correctly propagated to all nodes in the cluster. UPM will handle any caching and performance improvement. 

How do I cache data in a plugin?
Confluence Data Center Plugin Validator
Technical Overview of Clustering in Confluence
Confluence Clustering Overview