Last updated Nov 6, 2017

Cloud app operations guide

When you release an app publicly on the Atlassian Marketplace, administrators of a cloud product can install it in their site, meaning they rely on your service to deliver content to their users. Therefore, ensuring consistent operation of your app is critical. You can avoid many potential service disruptions by planning carefully. There are some important considerations to take into account:

  • Our customers are all around the world, covering all timezones.
  • While some instances have a handful of users, others have thousands of users depending on our products to run their business.
  • We have designed our cloud-based products to be both secure and reliable, boasting 99.9% uptime and 24x7x365 support.

Below, we discuss strategies for running your cloud app as scalable, reliable software as a service. Some of these aspects need to be addressed early in the design, because implementing them after the fact can be difficult.

Define a service level agreement (SLA)

You should define your service level targets, which you can validate during performance testing, use as a basis to monitor your apps at runtime, and guarantee by scaling your deployment. The following table lists some examples of indicators you could track:

Category | Indicator | Description
Performance | Uptime | Time during which the app is operational, outside of your documented maintenance windows (e.g., 99%).
Performance | User interface response times | For example, average (mean) response time, or 90th or 95th percentile response time.
Performance | Service call response times (e.g., REST) | For example, average (mean) response time, or 90th or 95th percentile response time.
Business continuity | Recovery time objective (RTO) | Duration of time within which the service must be restored after a major incident (e.g., 8h).
Business continuity | Recovery point objective (RPO) | Maximum tolerable period in which data might be lost due to a major incident (e.g., 24h).
Support | Availability | Hours of operation for the support team (e.g., 24x7x365, or 8 hours a day / 5 days a week in your timezone).
Support | Initial response time | Time elapsed between the customer's first request and the initial support response. For example: Level 1: 1 hour; Level 2: 4 hours; Level 3: 8 hours; Level 4: 24 hours.
Support | Resolution time | Time elapsed between the customer's first request and the issue being resolved.
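
As an illustration of how the performance indicators above could be computed from monitoring data, here is a minimal Python sketch. The function names, the sample response times, and the downtime figures are all hypothetical, and the percentile uses the simple nearest-rank method; your monitoring tool may use a different definition.

```python
import math
from statistics import mean


def uptime_percent(period_minutes, downtime_minutes, maintenance_minutes=0.0):
    """Uptime as a percentage of the period, excluding documented maintenance windows."""
    scheduled = period_minutes - maintenance_minutes
    if scheduled <= 0:
        raise ValueError("period must be longer than the maintenance windows")
    return 100.0 * (scheduled - downtime_minutes) / scheduled


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds for one monitoring interval.
response_times_ms = [120, 95, 110, 480, 105, 130, 90, 2500, 115, 100]

summary = {
    "mean_ms": mean(response_times_ms),
    "p90_ms": percentile(response_times_ms, 90),
    "p95_ms": percentile(response_times_ms, 95),
    # Hypothetical 30-day month: 4 hours of maintenance, 30 minutes of unplanned downtime.
    "uptime_pct": uptime_percent(30 * 24 * 60, 30, maintenance_minutes=240),
}
```

In this sample the two slow requests pull the mean well above what most users experienced, which is one reason to track percentiles alongside the average.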

Publish your SLA

You should publish an SLA online that outlines your support and service level terms.

Manage your app performance

Scalability

There are two ways to design your apps to scale with a growing number of installations and users:

  • Vertical scaling: You scale by adding more resources (e.g., CPU, memory) to existing nodes.
  • Horizontal scaling: You scale by adding more nodes (e.g., servers).

It may be difficult to predict exactly the resources your apps will need. For this reason, and because your apps will operate in a cloud environment targeting thousands of customers, we encourage you to design your apps to scale horizontally.

Existing cloud providers can help you scale your implementations. For example, Heroku is a cloud application platform that can host applications developed in Java, Node.js, Python, Ruby, Scala, or Clojure. Heroku runs on Amazon Web Services (AWS), and mostly supports horizontal scaling. Other examples include Google Cloud Platform and Salesforce One.

Performance testing

We recommend you run performance tests for your apps. This will help you define the resources your apps require when you first deploy them, and understand how new versions of your apps impact resource utilization. The following classes of tests are particularly useful:

Test type | Objectives
Load testing | Test the app under the load that is expected when the app is live, to validate that it is behaving as expected.
Stress testing | Identify the limits of the app, and understand how the app behaves when the load is much higher than the expected load.
Soak testing | Identify potential memory leaks, degrading performance because of poor database indexing, etc. A soak test is the equivalent of a load test that runs over a long period of time.
Spike testing | Understand how the app will react to a sudden burst of requests.

You should run performance tests for your apps:

  • In isolation, using mock implementations of the Atlassian product REST APIs. This helps identify any issues (memory leaks, etc.) limited to your implementation.
  • Using a real-life deployment environment for end-to-end performance tests. For this we recommend you use a performance testing environment that is as close as possible to the production environment. You should set up cloud instances of Atlassian products for this purpose.

There are a number of tools to help you design and run performance tests for your apps. Examples of load testing frameworks include The Grinder and Locust. They help you run distributed tests using many load injector machines.
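
To show the basic shape of a load test, here is a minimal sketch that uses only the Python standard library. It is not a substitute for a framework such as The Grinder or Locust: it runs on a single machine, and `call_app` is a hypothetical stand-in that sleeps instead of sending a real request. The function names and parameters are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def call_app():
    """Stand-in for one request to the app under test (hypothetical).

    A real test would issue an HTTP request here; this stub just sleeps briefly.
    """
    time.sleep(0.01)
    return True


def timed_call(request_fn):
    """Run one request and return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        ok = bool(request_fn())
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


def run_load(request_fn, concurrent_users, requests_per_user):
    """Send requests while keeping at most `concurrent_users` of them in flight."""
    total = concurrent_users * requests_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        results = list(pool.map(lambda _: timed_call(request_fn), range(total)))
    latencies = [elapsed for _, elapsed in results]
    return {
        "requests": total,
        "failures": sum(1 for ok, _ in results if not ok),
        "median_seconds": median(latencies),
        "max_seconds": max(latencies),
    }


report = run_load(call_app, concurrent_users=5, requests_per_user=4)
```

Raising `concurrent_users` well beyond the expected load turns the same loop into a rough stress test, and running it for hours approximates a soak test.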

Monitor your SLA

You should have tools to monitor your app performance at runtime, and procedures in place to scale resources once specific thresholds are met. At a minimum, you should monitor utilization of resources by your apps (CPU, memory, disk space, etc.). When using a cloud provider, you can look at strategies to automatically scale the resources allocated to your apps based on load.
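
A threshold-based scaling procedure could look like the following Python sketch. The thresholds, node limits, and the `desired_nodes` function are hypothetical; in practice you would usually rely on your cloud provider's own autoscaling features rather than write this logic yourself.

```python
def desired_nodes(current_nodes, cpu_utilization,
                  scale_out_at=0.75, scale_in_at=0.30,
                  min_nodes=2, max_nodes=10):
    """Return the node count after applying a simple CPU threshold rule.

    cpu_utilization is the average across nodes, as a fraction between 0 and 1.
    All thresholds and limits are illustrative defaults, not recommendations.
    """
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be between 0 and 1")
    if cpu_utilization >= scale_out_at:
        target = current_nodes + 1   # add a node when load is high
    elif cpu_utilization <= scale_in_at:
        target = current_nodes - 1   # remove a node when load is low
    else:
        target = current_nodes       # stay put inside the comfortable band
    # Never drop below the minimum or grow past the maximum.
    return max(min_nodes, min(max_nodes, target))
```

Keeping a gap between the scale-out and scale-in thresholds helps avoid adding and removing nodes repeatedly when load hovers around a single value.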

Maintain your apps

Versioning and upgrading

We automatically detect updates to apps with a polling service. This way, you can easily release fixes and new features without having to manually create new version entries in the Marketplace. For more information on how to upgrade your apps and manage versions, see Upgrading and versioning cloud apps.

Test, test, and if you're in doubt... Test some more!

Make sure you test new features and run regression tests to ensure existing functionality is not broken when you release new versions.

Deprecating or terminating service

We understand that occasionally services need to be deprecated and can no longer be supported. There are a number of considerations to take into account when an app service is deprecated:

Business continuity planning

You should address the following aspects when looking at potential major outages:

  • Data backups: you should have a data backup strategy that ensures your RPO (Recovery Point Objective) is met. For example, for an RPO of 24h, you should do a full backup of all app data overnight, and keep the backups at a different site from the one that is running the app.
  • Recovery procedures: you should have procedures in place to restore your apps in the case of a major outage, and we suggest you do a few dry runs. Ideally, you should be testing your disaster recovery procedures regularly. Hope for the best, plan for the worst!
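
One way to keep the backup strategy honest is to check regularly that the newest backup is still within the RPO. The following Python sketch assumes you can obtain the timestamp of the last successful backup; the function name and the dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def backup_meets_rpo(last_backup_at, rpo, now=None):
    """Return True if the newest backup is recent enough to satisfy the RPO.

    If an incident happened at `now`, the data at risk is everything written
    since `last_backup_at`, so that gap must not exceed `rpo`.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_backup_at <= rpo


# Hypothetical values: a 24h RPO checked at 09:00 UTC.
rpo = timedelta(hours=24)
checked_at = datetime(2017, 11, 6, 9, 0, tzinfo=timezone.utc)
overnight_backup = datetime(2017, 11, 6, 1, 0, tzinfo=timezone.utc)
stale_backup = datetime(2017, 11, 4, 1, 0, tzinfo=timezone.utc)

overnight_ok = backup_meets_rpo(overnight_backup, rpo, now=checked_at)
stale_ok = backup_meets_rpo(stale_backup, rpo, now=checked_at)
```

A check like this only confirms that a backup exists and is recent; restoring from it during a dry run is still the only way to confirm that it is usable.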

Note that using a world-class cloud provider minimizes the risk of a major outage impacting the users of your apps. For example, when using Heroku with Heroku Postgres, the platform automatically backs up deployed applications and data, and automatically brings the application back online in case of a data center outage, with minimum data loss.

Provide support

First, check out the Atlassian support offerings. We are well known for our great support! Here is what we recommend you focus on:

Recommendation | Details
Provide a support URL for all paid-via-Atlassian apps. | Your support URL clearly outlines the avenues a customer can take to get technical support.
Offer support at least 8 hours a day, 5 days a week in your local time zone for all paid-via-Atlassian apps. | Support hours can be any time, relative to your local timezone.
Use an issue tracker like Jira to resolve and track customer-reported bugs and feature requests, for all paid-via-Atlassian apps. | You don't need to use an Atlassian product to track your issues, but use some kind of tracker to keep on top of customer-reported bugs and improvement requests.
Provide Atlassian with 24x7 emergency contact information. | Provide an email address or phone number to Atlassian just in case we need to contact you for emergency support issues, such as those involving customer data loss or downtime. If something goes wrong, we should be able to reach you via this contact information 24x7.
