Last updated Oct 28, 2024

Apps operations guide

When you release an app publicly on the Atlassian Marketplace, administrators of a cloud application can install it in their instance, meaning they rely on your service to deliver content to their users. Therefore, ensuring consistent operation of your app is critical. You can avoid many potential service disruptions by planning carefully. There are some important considerations to take into account:

Our customers are all around the world, covering all timezones.
While some instances have a handful of users, others have thousands of users depending on our products to run their business.
We have designed our cloud-based products to be both secure and reliable, boasting 99.9% uptime and 24x7x365 support.

Below, we discuss strategies for running your Atlassian Connect app as scalable, reliable software as a service. Some of these aspects need to be addressed early in the design, because retrofitting them later can be difficult.

Defining your Service Level Agreement (SLA)

You should define your service level targets, which you can validate during performance testing, use as a basis to monitor your apps at runtime, and guarantee by scaling your deployment. The following table lists some examples of indicators you could track:

Category	Indicator	Description
Performance	Uptime	Time during which the app is operational, outside of your documented maintenance windows (e.g., 99%)
	User interface response times	e.g., mean response time, median response time, 90th or 95th percentile response time
	Service calls (e.g., REST) response times	e.g., mean response time, median response time, 90th or 95th percentile response time
Business Continuity	RTO (Recovery Time Objective)	Duration of time within which the service must be restored after a major incident (e.g., 8h)
Business Continuity	RPO (Recovery Point Objective)	Maximum tolerable period in which data might be lost due to a major incident (e.g., 24h)
Support	Availability	Hours of operation for the support team (e.g., 24x7x365, or 8 hours a day/5 days a week in your timezone)
	Initial response time	Time elapsed between the customer's first request and the initial support response. For example: Level 1: 1 hour; Level 2: 4 hours; Level 3: 8 hours; Level 4: 24 hours
	Resolution time	Time elapsed between the customer's first request and the issue being resolved

You should publish a Service Level Agreement (SLA) outlining your support and service level terms online.
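As a rough illustration of how the performance indicators in the table above could be computed from raw measurements, here is a minimal Python sketch. The sample data, the function names, and the choice of the nearest-rank percentile method are assumptions made for this example, not part of any Atlassian tooling.

```python
import math
from statistics import mean, median


def percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def uptime_percent(period_minutes, maintenance_minutes, outage_minutes):
    """Uptime outside documented maintenance windows, as a percentage."""
    scheduled = period_minutes - maintenance_minutes
    return 100.0 * (scheduled - outage_minutes) / scheduled


# Hypothetical response times in milliseconds
response_times_ms = [120, 95, 110, 480, 130, 105, 98, 150, 101, 900]

summary = {
    "mean_ms": mean(response_times_ms),
    "median_ms": median(response_times_ms),
    "p90_ms": percentile(response_times_ms, 90),
    "p95_ms": percentile(response_times_ms, 95),
}

# Hypothetical 30-day month: 120 minutes of maintenance, 43 minutes of outage
monthly_uptime = uptime_percent(30 * 24 * 60, 120, 43)
```

In this sample the two slow requests pull the mean well above the median, which is why percentile targets are often more informative than averages alone.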

Managing your app performance

Scalability

There are two ways to design your apps to scale with a growing number of installations and users:

Vertical scaling: you scale by adding more resources (e.g., CPU, memory) to existing nodes
Horizontal scaling: you scale by adding more nodes (e.g., servers)

It can be difficult to predict exactly which resources your apps will need. For this reason, and because your apps will operate in a cloud environment targeting thousands of customers, we encourage you to design your apps to scale horizontally.

Existing cloud providers can help you scale your implementations. One such provider is Heroku, a cloud application platform that can host applications developed in Java, Node.js, Python, Ruby, Scala, or Clojure. Heroku leverages Amazon Web Services (AWS) technology, and mostly supports horizontal scaling. Other examples of world-class platforms include the Google Cloud Platform and Salesforce One.

Performance testing

We recommend you run performance tests for your apps. This will help you define the resources your apps require when you first deploy them, and understand how new versions of your apps impact resource utilization. The following classes of tests are particularly useful:

Test Type	Objectives
Load testing	Test the app under the load that is expected when the app is live, to validate that it is behaving as expected.
Stress testing	Identify the limits of the app, and understand how the app behaves when the load is much higher than the expected load.
Soak testing	Identify potential memory leaks, degrading performance because of poor database indexing, etc. A soak test is the equivalent of a load test that runs over a long period of time.
Spike testing	Understand how the app will react to a sudden burst of requests.
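The four test types above differ mainly in how many simulated users they run and for how long. The following Python sketch expresses that difference as a load profile. All numbers (expected users, ramp steps, burst size, durations) and names are hypothetical values chosen for illustration only.

```python
# Hypothetical durations, in minutes, for each class of test
DURATION_MINUTES = {"load": 30, "stress": 60, "soak": 480, "spike": 20}


def target_users(test_type, minute, expected_users=100):
    """Return how many simulated users to run at a given minute of the test."""
    if test_type in ("load", "soak"):
        # Hold the expected production load; a soak test simply runs longer
        return expected_users
    if test_type == "stress":
        # Add half of the expected load every 10 minutes to find the limit
        return expected_users + (minute // 10) * (expected_users // 2)
    if test_type == "spike":
        # Quiet baseline with a sudden 5x burst between minutes 10 and 12
        if 10 <= minute < 12:
            return expected_users * 5
        return expected_users // 10
    raise ValueError(f"unknown test type: {test_type}")
```

A load testing tool would then be driven by this profile, for example by adjusting the number of active virtual users once per minute.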

You should run performance tests for your apps:

In isolation, using mock implementations of Atlassian products REST APIs. This helps identify any issues (memory leaks, etc.) limited to your implementation.
Using a real-life deployment environment for end-to-end performance tests. For this we recommend you use a performance testing environment that is as close as possible to the production environment. You should set up cloud instances of Atlassian products for this purpose.

There are a number of tools to help you design and run performance tests for your apps. Examples of load testing frameworks include The Grinder and Locust, which help you run distributed tests using many load injector machines.
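To show what a load injector does at its simplest, here is a single-machine sketch using only the Python standard library. It is not a substitute for the frameworks mentioned above. `fake_request` is a hypothetical stand-in for a real HTTP call to your app, and the user and request counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request():
    """Hypothetical stand-in for one HTTP call to the app under test."""
    time.sleep(0.01)


def timed_call(send_request):
    """Run one request and return its latency in seconds."""
    start = time.perf_counter()
    send_request()
    return time.perf_counter() - start


def run_load(send_request, concurrent_users, requests_per_user):
    """Issue requests from a pool of worker threads and collect latencies."""
    total = concurrent_users * requests_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(lambda _: timed_call(send_request), range(total)))


latencies = run_load(fake_request, concurrent_users=5, requests_per_user=4)
```

The collected latencies can then be summarized as mean and percentile values and compared against your service level targets.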

Monitoring your SLA

You should have tools to monitor your app performance at runtime, and procedures in place to scale resources once specific thresholds are met. At a minimum, you should monitor utilization of resources by your apps (CPU, memory, disk space, etc.). When using a cloud provider, you can look at strategies to automatically scale the resources allocated to your apps based on load.
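One possible way to turn a monitored threshold into a scaling decision is a proportional rule: size the deployment so that average CPU utilization moves back toward a target. The sketch below assumes horizontally scalable nodes. The target of 60%, the node limits, and the function name are hypothetical; a real cloud provider's autoscaler has its own configuration.

```python
def desired_nodes(current_nodes, cpu_percent, target_percent=60,
                  min_nodes=2, max_nodes=20):
    """Return the node count that would bring average CPU near the target.

    cpu_percent and target_percent are integers (0-100) so that the
    ceiling division below stays exact.
    """
    # Ceiling of current_nodes * cpu_percent / target_percent
    wanted = -(-current_nodes * cpu_percent // target_percent)
    # Never scale below the floor or above the ceiling
    return max(min_nodes, min(max_nodes, wanted))
```

For instance, four nodes running at 90% CPU against a 60% target would be scaled out to six nodes, while a nearly idle deployment would shrink only as far as `min_nodes`.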

Maintaining your apps

Versioning and upgrading

We automatically detect updates to apps with a polling service. This way, you can easily release fixes and new features without having to manually create new version entries in the Marketplace. For more information on how to upgrade your apps and manage versions, you should read Upgrading your App.

**Test, test, and when in doubt... Test some more!**

Make sure you not only test new features, but also run regression tests to ensure existing functionality is not broken when releasing new versions.

Maintenance windows

Since your app and Atlassian products are decoupled, you can decide when to upgrade your apps independently of Atlassian's maintenance windows. Ideally, your solution should be architected in a way that ensures maintenance is transparent to end users. If this is not possible, make sure you publish your maintenance windows online, and provide a meaningful error message to users trying to access your apps during a window.
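If maintenance cannot be made transparent, one common approach is to answer requests with HTTP status 503 (Service Unavailable) and a `Retry-After` header while the window is open. The following sketch is framework-agnostic; the function name, the message wording, and the example window are assumptions for illustration.

```python
from datetime import datetime, timezone


def maintenance_response(now, window_start, window_end):
    """Return (status, headers, body) during the window, otherwise None."""
    if not (window_start <= now < window_end):
        return None  # Outside the window: handle the request normally
    seconds_left = int((window_end - now).total_seconds())
    headers = {
        "Retry-After": str(seconds_left),
        "Content-Type": "text/plain; charset=utf-8",
    }
    body = (
        "This app is undergoing scheduled maintenance until "
        f"{window_end:%Y-%m-%d %H:%M} UTC. Please try again later."
    )
    return 503, headers, body


# Hypothetical one-hour window, expressed in UTC
start = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
during = maintenance_response(
    datetime(2024, 1, 1, 2, 30, tzinfo=timezone.utc), start, end
)
after = maintenance_response(
    datetime(2024, 1, 1, 4, 0, tzinfo=timezone.utc), start, end
)
```

Stating the end of the window in the message tells users when to come back, which is more helpful than a generic error page.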

Addressing business continuity planning

You should address the following aspects when looking at potential major outages:

Data backups: you should have a data backup strategy that ensures your RPO (Recovery Point Objective) is met. For example, for an RPO of 24h, you should do a full backup of all app data overnight, keeping the backups at a different site from the one that is running the app.
Recovery procedures: you should have procedures in place to restore your apps in the case of a major outage, and we suggest you do a few dry runs. Ideally, you should be testing your disaster recovery procedures regularly. Hope for the best, but plan for the worst!
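A simple monitoring check can confirm that the backup strategy described above still meets the RPO: compare the age of the most recent successful backup with the objective. The sketch below uses the 24h example from the text. The function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(hours=24)  # Example objective taken from the text


def data_at_risk(last_backup_at, now):
    """Span of time whose data would be lost if an incident happened now."""
    return now - last_backup_at


def rpo_met(last_backup_at, now, rpo=RPO):
    """True if the most recent successful backup is recent enough."""
    return data_at_risk(last_backup_at, now) <= rpo


# Hypothetical timestamps in UTC
now = datetime(2024, 10, 28, 9, 0, tzinfo=timezone.utc)
nightly_backup = datetime(2024, 10, 28, 1, 0, tzinfo=timezone.utc)
stale_backup = datetime(2024, 10, 26, 1, 0, tzinfo=timezone.utc)
```

Running such a check on a schedule, and alerting when it fails, catches silently broken backup jobs before a major incident does.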

Note that using a world-class cloud provider minimizes the risk of a major outage impacting the users of your apps. For example, when using Heroku with Heroku Postgres, the platform automatically backs up deployed applications and data, and automatically brings the application back online in case of a data center outage, with minimal data loss.

Providing support

First, check out the Atlassian Support Offerings. We are well known for our great support! Here is what we recommend you focus on:

Recommendation	Details
Provide a support URL for all paid-via-Atlassian apps.	Your support URL clearly outlines the avenues a customer can take to get technical support.
Offer support at least 8 hours a day, 5 days a week in your local time zone for all paid-via-Atlassian apps.	Support hours can be any time, relative to your local timezone.
Use an issue tracker like Jira to resolve and track customer-reported bugs and feature requests, for all paid-via-Atlassian apps.	You don't need to use an Atlassian product to track your issues, but use some kind of tracker to keep on top of customer-reported bugs and improvement requests.
Provide Atlassian with 24 hours a day/7 days a week emergency contact information.	Provide an email address or phone number to Atlassian just in case we need to contact you for emergency support issues, such as those involving customer data loss or downtime. If something goes wrong, we should be able to reach you via this contact information 24 hours a day/7 days a week.