Add-ons operations guide
When you release an add-on publicly on the Atlassian Marketplace, administrators of a cloud application can install it in their instance, meaning they rely on your service to deliver content to their users. Therefore, ensuring consistent operation of your add-on is critical. You can avoid many potential service disruptions by planning carefully. There are some important considerations to take into account:
- Our customers are all around the world, covering all timezones.
- While some instances have a handful of users, others have thousands of users depending on our products to run their business.
- We have designed our cloud-based products to be both secure and reliable, boasting 99.9% uptime and 24x7x365 support.
Below, we discuss strategies for running your Atlassian Connect add-on as scalable, reliable software as a service. Some of these aspects need to be addressed early in the design, because retrofitting them later can be difficult.
Table of contents
- Defining your Service Level Agreement (SLA)
- Managing your add-on performance
- Maintaining your add-ons
- Addressing business continuity planning
- Providing support
1. Defining your Service Level Agreement (SLA)
You should define your service level targets, which you can validate during performance testing, use as a basis to monitor your add-ons at runtime, and guarantee by scaling your deployment. The following table lists some examples of indicators you could track:
|Category||Indicator||Description|
|Performance||Uptime||Time during which the add-on is operational, outside of your documented maintenance windows (e.g. 99%)|
|Performance||User interface response times||e.g. mean or median response time, 90th or 95th percentile response time|
|Performance||Service calls (e.g. REST) response times||e.g. mean or median response time, 90th or 95th percentile response time|
|Business Continuity||RTO (Recovery Time Objective)||Duration of time within which the service must be restored after a major incident (e.g. 8h)|
|Business Continuity||RPO (Recovery Point Objective)||Maximum tolerable period in which data might be lost due to a major incident (e.g. 24h)|
|Support||Availability||Hours of operation for the support team (e.g. 24x7x365, or 8 hours a day / 5 days a week in your timezone)|
|Support||Initial response time||Time elapsed between the customer's first request and the initial support response. For example: Level 1: 1 hour; Level 2: 4 hours; Level 3: 8 hours; Level 4: 24 hours.|
|Support||Resolution time||Time elapsed between the customer's first request and the issue being resolved.|
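As an illustration of how the performance indicators above could be computed, here is a minimal Python sketch. The sample response times, the 30-day reporting period, and the function names `percentile` and `allowed_downtime_minutes` are assumptions made for this example, not part of any Atlassian API.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def allowed_downtime_minutes(uptime_target_pct, period_days=30):
    """Downtime budget (in minutes) implied by an uptime target over a reporting period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_target_pct) / 100


# Hypothetical response times in milliseconds, e.g. collected from access logs.
response_times_ms = [120, 135, 150, 160, 180, 200, 240, 310, 450, 900]

summary = {
    "mean_ms": statistics.mean(response_times_ms),
    "median_ms": statistics.median(response_times_ms),
    "p90_ms": percentile(response_times_ms, 90),
    "p95_ms": percentile(response_times_ms, 95),
}
```

In this sample the mean (284.5 ms) is well above the median (190 ms) because of a single slow request, which is one reason to track percentiles alongside averages. A 99% uptime target over 30 days leaves a budget of 432 minutes (7.2 hours) of unplanned downtime.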
2. Managing your add-on performance
There are two ways to design your add-ons to scale with a growing number of installations and users:
- Vertical scaling: you scale by adding more resources (e.g. CPU, memory) to existing nodes
- Horizontal scaling: you scale by adding more nodes (e.g. servers)
It may be difficult to predict exactly the resources your add-ons will need. For this reason, and because your add-ons will operate in a cloud environment targeting thousands of customers, we encourage you to design your add-ons to scale horizontally.
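To make horizontal scaling more concrete, the following sketch estimates how many nodes a given peak load would need. The per-node capacity, the headroom fraction, and the minimum node count are hypothetical values; in practice you would measure them during performance testing.

```python
import math


def nodes_needed(peak_requests_per_sec, per_node_capacity_rps, headroom=0.3, min_nodes=2):
    """Estimate node count, keeping a fraction of each node's capacity in reserve.

    min_nodes defaults to 2 so that a single node failure does not take the add-on offline.
    """
    usable_capacity = per_node_capacity_rps * (1 - headroom)
    return max(min_nodes, math.ceil(peak_requests_per_sec / usable_capacity))
```

For example, with a hypothetical peak of 500 requests per second and nodes that each sustain 100 requests per second, `nodes_needed(500, 100)` returns 8 when 30% headroom is reserved.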
Existing cloud providers can help you scale your implementations. One example of such providers is Heroku, a cloud application platform that can host applications developed in Java, Node.js, Python, Ruby, Scala or Clojure. Heroku leverages Amazon Web Services (AWS) technology, and mostly supports horizontal scaling. Other examples of world-class platforms include the Google Cloud Platform and Salesforce One.
We recommend you run performance tests for your add-ons. This will help you define the resources your add-ons require when you first deploy them, and understand how new versions of your add-ons impact resource utilization. The following classes of tests are particularly useful:
|Load testing||Test the add-on under the load that is expected when the add-on is live, to validate that it is behaving as expected.|
|Stress testing||Identify the limits of the add-on, and understand how the add-on behaves when the load is much higher than the expected load.|
|Soak testing||Identify potential memory leaks, degrading performance because of poor database indexing, etc. A soak test is the equivalent of a load test that runs over a long period of time.|
|Spike testing||Understand how the add-on will react to a sudden burst of requests.|
You should run performance tests for your add-ons:
- In isolation, using mock implementations of the Atlassian product REST APIs. This helps identify any issues (memory leaks, etc.) limited to your implementation.
- Using a real-life deployment environment for end-to-end performance tests. For this we recommend you use a performance testing environment that is as close as possible to the production environment. You should set up cloud instances of Atlassian products for this purpose.
There are a number of tools to help you design and run performance tests for your add-ons. Examples of load testing frameworks include The Grinder and Locust. They help you run distributed tests using many load injector machines.
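To show the basic idea behind a load test, here is a minimal sketch that uses only the Python standard library. It is not The Grinder or Locust, and it runs on a single machine rather than distributed injectors. The `run_load` helper and the `fake_addon_endpoint` stand-in are hypothetical names for this example; a real test would call your add-on over HTTP.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(target, total_requests, concurrency):
    """Call target() total_requests times across `concurrency` worker threads.

    Returns a list of per-call latencies in seconds and the number of failed calls.
    """
    def one_call(_):
        start = time.perf_counter()
        try:
            target()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(total_requests)))

    latencies = [elapsed for elapsed, _ in results]
    errors = sum(1 for _, ok in results if not ok)
    return latencies, errors


def fake_addon_endpoint():
    """Stand-in for a request to the add-on; sleeps to simulate server work."""
    time.sleep(0.01)


latencies, errors = run_load(fake_addon_endpoint, total_requests=50, concurrency=10)
```

Raising `concurrency` well beyond the expected load turns the same harness into a rough stress test, and running it for hours approximates a soak test.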
Monitoring your SLA
You should have tools to monitor your add-on performance at runtime, and procedures in place to scale resources once specific thresholds are met. At a minimum, you should monitor utilization of resources by your add-ons (CPU, memory, disk space, etc.). When using a cloud provider, you can look at strategies to automatically scale the resources allocated to your add-ons based on load.
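A threshold-based scaling rule of the kind described above could look like the following sketch. The thresholds, the node limits, and the function name `desired_node_count` are assumptions for illustration; most cloud providers offer their own auto-scaling configuration, which you would normally use instead.

```python
def desired_node_count(current_nodes, cpu_utilization,
                       scale_up_at=0.75, scale_down_at=0.30,
                       min_nodes=2, max_nodes=20):
    """Return the node count to aim for, given average CPU utilization (0.0 to 1.0).

    Adds one node above the upper threshold, removes one below the lower
    threshold, and always stays within [min_nodes, max_nodes].
    """
    if cpu_utilization >= scale_up_at:
        target = current_nodes + 1
    elif cpu_utilization <= scale_down_at:
        target = current_nodes - 1
    else:
        target = current_nodes
    return max(min_nodes, min(max_nodes, target))
```

Keeping a gap between the two thresholds avoids repeatedly adding and removing nodes when utilization hovers around a single value.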
3. Maintaining your add-ons
Versioning and upgrading
We automatically detect updates to add-ons with a polling service. This way, you can easily release fixes and new features without having to manually create new version entries in the Marketplace. For more information on how to upgrade your add-ons and manage versions, you should read the Upgrading your Add-on section.
Since your add-on and Atlassian products are decoupled, you can decide when to upgrade your add-ons independently from the maintenance windows. Ideally, your solution should be architected in a way that ensures maintenance is transparent to end-users. If this is not possible, make sure you publish your maintenance windows online, and provide a meaningful error message to users trying to access your add-ons at this time.
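If maintenance cannot be made transparent, one way to return a meaningful error is to answer requests with HTTP 503 and a `Retry-After` header while a published window is in effect. The sketch below assumes a hypothetical `MAINTENANCE_WINDOWS` list and a framework-neutral response dictionary; adapt both to your own web framework.

```python
from datetime import datetime, timezone

# Hypothetical published maintenance windows as (start, end) pairs in UTC.
MAINTENANCE_WINDOWS = [
    (datetime(2024, 1, 14, 2, 0, tzinfo=timezone.utc),
     datetime(2024, 1, 14, 4, 0, tzinfo=timezone.utc)),
]


def maintenance_response(now, windows=MAINTENANCE_WINDOWS):
    """Return a 503 response description if `now` falls inside a window, else None."""
    for start, end in windows:
        if start <= now < end:
            retry_after_seconds = int((end - now).total_seconds())
            return {
                "status": 503,
                "headers": {"Retry-After": str(retry_after_seconds)},
                "body": (
                    "This add-on is undergoing scheduled maintenance until "
                    f"{end:%Y-%m-%d %H:%M} UTC. Please try again later."
                ),
            }
    return None
```

A request handler would call `maintenance_response(datetime.now(timezone.utc))` first and only continue with normal processing when it returns `None`.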
Deprecating or terminating service
We understand that occasionally a service needs to be deprecated and can no longer be supported. There are a number of considerations to be made when an add-on service is deprecated.
Free add-ons: Give a two-month deprecation warning to your customers. You must do this in a manner which is appropriate to your add-on; for example, by placing a banner within UI modules which are rendered in the page, or by emailing the customer contact. The customer contact information can be retrieved from the add-ons REST API.
Paid add-ons: You must maintain paid add-ons as long as a customer has paid for service, in accordance with the Marketplace Vendor Agreement. Please reach out to the Atlassian Marketplace team for assistance. If a scope increase requires administrator approval, then you must continue support for the previous version of your add-on until all customers have upgraded to the newer version.
4. Addressing business continuity planning
You should address the following aspects when looking at potential major outages:
- Data backups: you should have a data backup strategy that ensures your RPO (Recovery Point Objective) is met. For example, for an RPO of 24h, you should do a full backup of all add-on data overnight, keeping the backups on a different site from the one that is running the add-on.
- Recovery procedures: you should have procedures in place to restore your add-ons in the case of a major outage, and we suggest you do a few dry runs. Ideally, you should be testing your disaster recovery procedures regularly. Hope for the best, plan for the worst!
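A simple check that the backup schedule still satisfies the RPO can be sketched as follows. The function name `rpo_at_risk` and the way the last backup timestamp is obtained are assumptions; in practice the timestamp would come from your backup tooling.

```python
from datetime import datetime, timedelta, timezone


def rpo_at_risk(last_backup_at, now, rpo=timedelta(hours=24)):
    """Return True if the most recent backup is older than the RPO allows.

    If an incident happened at `now`, all data written since `last_backup_at`
    would be lost, so that gap must not exceed the RPO.
    """
    return now - last_backup_at > rpo


# Example with hypothetical timestamps: the last backup finished 30 hours ago.
now = datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc)
last_backup_at = now - timedelta(hours=30)
needs_alert = rpo_at_risk(last_backup_at, now)  # True: 30h exceeds the 24h RPO
```

Running such a check on a schedule, and alerting when it returns `True`, catches silently failing backup jobs before a major incident does.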
Note that using a world-class cloud provider minimizes the risk of a major outage impacting the users of your add-ons. For example, when using Heroku with Heroku Postgres, the platform automatically backs up deployed applications and data, and automatically brings the application back online in case of a data center outage, with minimum data loss.
5. Providing support
First, check out the Atlassian Support Offerings. We are well known for our great support! Here is what we recommend you focus on:
|Provide a support URL for all paid-via-Atlassian add-ons||Your support URL clearly outlines the avenues a customer can take to get technical support.|
|Offer support at least 8 hours a day, 5 days a week in your local timezone for all paid-via-Atlassian add-ons||Support hours can be any time, relative to your local timezone.|
|Use an issue tracker like JIRA to resolve and track customer-reported bugs and feature requests, for all paid-via-Atlassian add-ons||You don't need to use an Atlassian product to track your issues, but use some kind of tracker to keep on top of customer-reported bugs and improvement requests.|
|Provide Atlassian with 24/7 emergency contact information||Provide an email address or phone number to Atlassian just in case we need to contact you for emergency support issues, such as those involving customer data loss or downtime. If something goes wrong, we should be able to reach you via this contact information 24/7.|