Last updated Jul 6, 2021


Cloud Fortified apps program reliability requirements

Principles

The reliability requirements for Cloud Fortified apps are built around these principles:

  • A consistent reliability experience for customers: The focus Atlassian has given to reliability over the past 12 months has helped lift our services' quality and reduce the number and severity of incidents. To keep this experience consistent for customers, we want to ensure app reliability matches or betters product reliability.
  • Detect issues before customers: The program should drive patterns and behaviours that enable partners and Atlassian to detect issues before customers feel their impact.
  • A program that scales: While the initial program may involve manual processes as we iterate the boarding experience, the goal is to scale the reliability program to cope with new apps and partners.

Goals

A Service Level Indicator (SLI) is the measurement you use to track your app's capabilities (such as uptime or response time).

A Service Level Objective (SLO) is the target declared for a specific SLI (for example, 99.95% uptime).

The reliability requirements enable Cloud Fortified apps to:

  1. Benefit from a consistent measure of the reliability of their core capabilities by using SLIs and SLOs. These measures provide shared visibility and understanding in reliability-related discussions with Atlassian and ultimately with customers.
  2. Reduce their Mean Time To Detect (MTTD) through automated issue detection. More issues are detected by monitoring and not by customers.
  3. Reduce their Mean Time To Recover (MTTR) by using an incident process that interfaces with Atlassian's incident process.
  4. Continuously raise their quality bar by having partners conduct Post Incident Reviews with actionable improvements, signed off by Atlassian.

Key concepts

Reliability metrics

Generic SLIs

When you begin the onboarding process as a Marketplace Partner, we measure the following for your app:

  1. App availability: Measures whether your app is available. Determined by periodically calling a health check URL for your app that you supply, which is expected to return a 200 (any other result counts as a failure). Atlassian will call this URL about once per minute from an automated monitoring system, so we recommend keeping it simple and lightweight. It does not need to exercise end-to-end user experiences. A simple check to confirm your app is up and responding to requests is sufficient. Consistent response times above 3s are interpreted as failures.
  2. Iframe load success rate & latency: Measures whether your app is serving iframes correctly and responsively. By default, we can only measure whether the iframe establishes a bridge connection to the host within a reasonable time. We cannot determine whether the iframe's content is as expected. So any iframe load that successfully initiates the host bridge within 3s is counted as a success. You can refine this metric by emitting your own success and failure events when your iframe finishes rendering. See "Refine your iframe success metrics" below for more info.
  3. Lifecycle event success rate & latency: Measures whether your app is processing install and uninstall requests successfully and responsively.
  4. Webhook delivery success rate & latency: Measures whether your app is processing webhooks successfully and responsively.

Target SLOs and alerts

Atlassian will detect when an SLI is in breach of its SLO and notify you.

When you sign up for the Cloud Fortified apps program, you provide Atlassian with an email address that will receive notifications on breach events as per the alert conditions described below. If your team already uses an alerting or on-call management platform such as Opsgenie, you can use these emails to trigger alerts in your system.

| SLI | Current Target SLO (over 28 days) | Alert condition |
| --- | --- | --- |
| App availability | 99.9% | Healthcheck success rate is less than 99.9% for the past 15m, or healthcheck latency is over 3s for the past 1h for 10% or more of requests |
| Iframe load success rate | 95% | Iframe load success rate is less than 95% for the past 15m |
| Iframe load latency | p90 @ 3s | Iframe load latency is over 3s for the past 1h for 10% or more of requests |
| Lifecycle event success rate | 99% | Lifecycle success rate is less than 99% for the past 24h¹ |
| Lifecycle event latency | p90 @ 3s | Lifecycle latency is over 3s for the past 1h for 10% or more of requests |
| Webhook delivery success rate | 99% | Webhook delivery success rate is less than 99% for the past 15m |
| Webhook delivery latency | p90 @ 3s | Webhook latency is over 3s for the past 1h for 10% or more of requests |

¹ Lifecycle events are infrequent, so to avoid noisy alerts, we use a wide time window for the success rate calculations. Your metrics dashboard also includes success and failure counts so that you can check for fluctuations over shorter time periods.
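
For illustration, the alert conditions above amount to simple checks over a window of request samples. The sketch below is not Atlassian's implementation; the `RequestSample` shape and function names are assumptions made for this example:

```typescript
// Illustrative evaluation of the success-rate and latency alert conditions.
// Atlassian performs this evaluation on its own systems; this sketch only
// shows the arithmetic behind the thresholds in the table above.

interface RequestSample {
  success: boolean;
  latencyMillis: number;
}

// True when the success rate over the window falls below the target,
// e.g. successRateBreached(samples, 0.999) for the availability SLI.
function successRateBreached(samples: RequestSample[], target: number): boolean {
  if (samples.length === 0) return false;
  const ok = samples.filter((s) => s.success).length;
  return ok / samples.length < target;
}

// True when 10% or more of requests in the window exceeded the threshold,
// matching the "over 3s for 10% or more of requests" latency conditions.
function latencyBreached(samples: RequestSample[], thresholdMillis = 3000): boolean {
  if (samples.length === 0) return false;
  const slow = samples.filter((s) => s.latencyMillis > thresholdMillis).length;
  return slow / samples.length >= 0.1;
}
```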

Impact of SLO breaches for Cloud Fortified apps

Partners will not be penalized if their app is in breach of SLOs due to:

  • Atlassian failure. System-at-fault analysis can be done retrospectively as part of the PIR.
  • Planned downtime, if Atlassian is notified in advance. Please inform us in a comment on your Cloud Fortified app approval ticket if you have upcoming planned downtime.

Our aim is to help Marketplace Partners improve their apps' reliability. If an app breaches its SLOs continually, it's in our interest to get it back up to Cloud Fortified app standards. Therefore, there is an escalation of actions depending on the frequency and consistency of SLO breaches:

  • Notification: For each SLO breach, we notify you as described above. You can assess the breach's impact and initiate the EcoHOT process, as described below, if the app is degraded and needs an emergency response.
  • Remediation: For repeated breaches, a member of the Atlassian team contacts you to identify what is causing the breaches and assist in forming a strategy to bring the app back up to scratch.
  • Demotion: Apps that continuously breach SLOs and fail to fix problems are demoted to a non-Cloud Fortified designation in the marketplace. We want to avoid this scenario because it has ramifications for customer expectations.

Monitoring your SLIs

While Atlassian detects when an SLI is in breach and notifies you, you also need to consider ongoing insight into your app's metrics (for example, identifying trends that suggest impending issues or reviewing the metrics before an app is fully enrolled to identify any required remediation).

As part of your onboarding, you are granted access to the Cloud Fortified apps Statuspage, which includes an overview of the key metrics for your apps:

SLI metrics dashboard

The dashboard provides access to 1 month of data at a 5-minute resolution, with a latency of approximately 10 minutes. Only your team and Atlassian can see metrics for your apps (different partners cannot see each others' metrics). Data collection begins when Atlassian starts processing your application to join the program.

Non-SLI metrics

In addition to the metrics tracked for SLIs, the following reliability-related metrics are tracked for Cloud Fortified apps:

Data sources

SLI metrics are collected from production data, so SLOs will be assessed against Prod Groups, as shown in this diagram. At the same time, the synthetic checks are run against the Developer-First Rollout group.

SLOs data sources

Incident management

An incident is an event that disrupts or reduces the quality of a service and requires an emergency response.

Incidents can be raised by the following methods, all of which produce an EcoHOT ticket.

Sources of EcoHOT tickets

The EcoHOT ticket is the shared point of reference for partner and Atlassian efforts. We may also open a Slack channel for faster communication between Atlassian and partners.

EcoHOT workflow


Developer Relations Incident Manager

The Developer Relations Incident Manager is an on-call role within Atlassian that is paged if the partner or Atlassian Support believes Atlassian to be the cause of the incident. The Developer Relations IM then initiates Atlassian's internal incident management process to resolve the problem, providing feedback to the partner through the EcoHOT ticket. The Developer Relations IM is also responsible for signing off any actions a partner proposes after a Post Incident Review is conducted.

Steps to follow

Identify or implement a health check resource (required for Connect apps)

To monitor your app's availability, we need a URL we can regularly poll to ensure your app is "up." This URL:

  • Must return 200 if the app is healthy. Any other result counts as a failure.
  • Must not return 200 if the app is down.
  • Must be unauthenticated.
  • Must return in under 3s. Consistent response times above 3s are interpreted as failures.
  • Must be lightweight. Atlassian will poll this URL approximately every 60s.

The URL does not need to exercise end-to-end user experiences. A simple check to confirm your app is up, responding to requests, and able to access its datastores is sufficient.

Feel free to use an existing URL that meets the above criteria (you may have such a resource, for example, for load balancer health-checking or provided by your application framework).
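
A health check meeting these criteria can be very small. The sketch below (Node.js with TypeScript) illustrates one way to do it; the `/healthcheck` path and the datastore probe are assumptions for this example, not requirements:

```typescript
import { createServer } from "node:http";

// Hypothetical probe of the app's datastore (an assumption for illustration);
// in a real app this would be a cheap query such as SELECT 1.
async function datastoreReachable(): Promise<boolean> {
  return true;
}

// Map health to the status code Atlassian's poller expects:
// 200 only when the app is healthy; anything else counts as a failure.
function healthStatusCode(healthy: boolean): number {
  return healthy ? 200 : 503;
}

const server = createServer(async (req, res) => {
  if (req.url === "/healthcheck") {
    const healthy = await datastoreReachable();
    res.statusCode = healthStatusCode(healthy);
    res.end(healthy ? "OK" : "UNHEALTHY");
  } else {
    res.statusCode = 404;
    res.end();
  }
});
server.unref(); // call server.listen(...) in a real deployment
```

Keep the handler fast: Atlassian polls roughly every 60s and treats responses slower than 3s as failures.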

Production readiness documentation (required)

As part of the Cloud Fortified app approval process, we need information about your app and organization in the areas below. For each area, we describe the type of information required and why we need it.

Your app's scalability characteristics

  • What are your app's scaling factors (for example, database accesses, concurrent request processing, queues, bulk operations, non-linear operations, pagination, or N+1 API calls for additional data)?

  • What do you expect your app to do in the presence of:

    • thousands of concurrent users

    • large datasets

    • distributed users?

  • What do you do to respond to rate limits from Atlassian APIs?

  • What testing do you undertake to assess your app’s ability to work at scale?

We’re looking to ensure you have considered your app's scalability characteristics and have validated them to a reasonable degree.

Testing against Developer First Rollout instances

If you haven't already, sign up for a Developer First Rollout instance for validating your app (more info and sign up for Jira and Confluence). This is a prerequisite for the Cloud Fortified apps program.

Describe any pre- or post-deployment testing you do against the Developer First Rollout instance.

This may include synthetic tests, as described below.

We expect Cloud Fortified apps to use Developer First Rollout instances to detect unexpected behavior as early as possible when product changes are rolled out.

Your service restoration plan

Please describe the recovery plan for your app. Consider:

  • What process do you follow when you determine a new deployment of your app has an issue that is severely impacting customers (for example, rollback, roll forward, or a similar mechanism)?

  • What is the typical duration from the point where the fix is merged into the code to the point where it is fully deployed to production?

  • If your app stores data, how would you recover from lost or corrupt datastores? What is your backup frequency and retention? How fast can a data restore be performed?

  • Do you test your data restore process? How?

We’re looking to ensure you have a well-trodden path for rolling out fixes to production and for managing data restores where applicable.

Your existing incident management process

Describe your incident management process. Consider:

  • During what hours is your team available to respond to issues? If they respond out of business hours, how are they notified of issues?

  • How do you typically discover production issues with your app (for example, from support tickets, monitoring, or other means)?

  • Do you have automated alerting on metrics in production? If so, please summarize the types of metrics you monitor and the alert thresholds.

  • How do your team members communicate with each other during incidents?

  • How does your team communicate with customers during incidents?

We’re looking to understand your approach to incident management so we can collaborate on improving incident response.

Marketplace Partners submit this information in the approval ticket.

Refine your iframe success metrics (recommended)

One of the most important measures for your app is its success rate for correctly serving iframes. By default, however, Atlassian can only measure whether the iframe establishes a bridge connection to the host within a reasonable time; we cannot determine whether the content of the iframe is as expected. Therefore, any iframe load that successfully establishes the bridge within 3s is counted as a success. This means that if the iframe loads but displays an error to the user because, say, a product API returned an incorrect result, the event is not recorded as a failure.

To improve the accuracy of this metric, modify your code to emit success or failure events when your iframe finishes rendering by sending a PUT request to the addons-metrics API with metricsType set to IFRAME.
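
A sketch of emitting such an event from your app's server follows. The JWT creation and base URL handling are assumptions that depend on your Connect framework; the payload fields match the metrics publish API documented at the end of this page:

```typescript
// Illustrative sketch: report an iframe render result to the
// addons-metrics API. The JWT token is a placeholder for however your
// Connect framework signs requests (see the Connect JWT docs).

interface AddonMetric {
  metricsType: "IFRAME" | "SYNTHETIC";
  status: "SUCCESS" | "FAIL";
  durationInMillis?: number;
  moduleType?: string;
}

function iframeMetric(renderedOk: boolean, millis: number): AddonMetric {
  // Latency is only meaningful for successful checks; failed metrics
  // do not publish latency data.
  return renderedOk
    ? { metricsType: "IFRAME", status: "SUCCESS", durationInMillis: millis }
    : { metricsType: "IFRAME", status: "FAIL" };
}

async function publishMetrics(
  baseUrl: string, // host product base URL (assumption)
  addonKey: string,
  jwt: string, // token from your framework's JWT helper
  metrics: AddonMetric[],
): Promise<number> {
  const res = await fetch(
    `${baseUrl}/rest/atlassian-connect/latest/addons-metrics/${addonKey}/publish`,
    {
      method: "PUT",
      headers: { Authorization: `JWT ${jwt}`, "Content-Type": "application/json" },
      body: JSON.stringify(metrics),
    },
  );
  return res.status;
}
```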

Implement synthetic tests (recommended)

We expect Cloud Fortified apps to make some use of Developer First Rollout instances so that you can detect unexpected behavior as early as possible when product changes are rolled out. One way to make the most of these instances is to run regular synthetic tests against them.

Synthetic tests are automated tests that simulate real user interactions to validate core app capabilities and experiences. They are usually implemented with emulated web browsers or recorded web requests. In this context, we suggest you implement automated tests that simulate users interacting with your app through Jira or Confluence, run them regularly against your Developer First Rollout instance, and publish the test result to Atlassian. This enables you to quickly spot cases where product changes degrade your app's core capabilities and gives us visibility into when a product change impacts apps.

To implement synthetic tests, build automated tests that exercise your app's core capabilities, run them on a regular schedule against your Developer First Rollout instance, and publish each result to the addons-metrics API with metricsType set to SYNTHETIC.

Note: The authentication on the metric API means that the publish request must emanate from the app, not from the test framework. This means that you need to add logic to your app to report synthetic test results, which is suboptimal. We're exploring ways to remove this complexity.
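
For example, a scheduled job could time a core capability check and capture a SYNTHETIC result in the shape the metrics API expects. This is a sketch; the check being wrapped is hypothetical, and because of the authentication constraint above, the result would need to be forwarded to your app server for publishing:

```typescript
// Sketch: wrap a synthetic check so its outcome and duration are
// captured in the shape the addons-metrics API expects. The check
// function is a hypothetical stand-in for your real end-to-end test.

interface SyntheticResult {
  metricsType: "SYNTHETIC";
  status: "SUCCESS" | "FAIL";
  durationInMillis?: number;
}

async function timedSyntheticCheck(
  check: () => Promise<void>,
): Promise<SyntheticResult> {
  const start = Date.now();
  try {
    await check();
    return {
      metricsType: "SYNTHETIC",
      status: "SUCCESS",
      durationInMillis: Date.now() - start,
    };
  } catch {
    // Failed checks report no latency, per the API field notes.
    return { metricsType: "SYNTHETIC", status: "FAIL" };
  }
}
```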

Review and Monitor your SLIs (required)

Once Atlassian has granted you access, validate access to and familiarize yourself with the Cloud Fortified apps Statuspage.

Confirm the metrics you're seeing match your expectations and raise any unexpected behavior (for example, suspicious or missing data) with Atlassian.

Confirm you're receiving email notifications when SLIs breach their SLOs.

If your team uses an alerting or on-call management platform, such as Opsgenie, you can use these emails to trigger alerts in your system.

Trial the EcoHOT process (required)

Once Atlassian grants you access, familiarize yourself with the EcoHOT project.

  • Confirm you can raise an EcoHOT ticket and that your team can access it.
  • If you'd like to ensure other partners cannot see your incident ticket, set the "Restrict to" field to just your app and administrators.
  • Have your operations contacts watch the incident and PIR video training material (coming soon, not required at this time).

EcoHOT ticket example

Approximate effort

These are estimates of the effort required to fulfil the onboarding requirements for the reliability program.

| Task | Effort (days) | Required or recommended |
| --- | --- | --- |
| Identify or implement a health check resource to enable availability metrics | 1 | Required |
| Review and document your app's production-readiness characteristics | 1-3 | Required |
| Refine your iframe success rate metrics | 2 | Recommended |
| Implement synthetic checks | 2-5 | Recommended |
| Miscellaneous learning and setup | 1 | Required |
| Once Atlassian has granted you access, review and monitor your SLIs | 1 | Required |
| Once Atlassian has granted you access, trial the EcoHOT process | 1 | Required |

Metrics publish API doc

`PUT /rest/atlassian-connect/latest/addons-metrics/${addon_key}/publish`

This API is used by Cloud Fortified apps to:

  • Refine iframe success rate metrics by submitting custom success/failure events
  • Publish synthetic test results

Headers

| Header | Description |
| --- | --- |
| Authorization | `JWT ${token}` — [see Connect JWT documentation](/cloud/jira/platform/understanding-jwt-for-connect-apps/) |
| Content-Type | `application/json` |

Parameters

| Name | Description |
| --- | --- |
| `appKey` (path parameter, required) | App key |
| `body` (body, required) | `Array<AddonMetrics>` — list of metrics data to publish |


AddonMetrics:

| Field | Type | Description |
| --- | --- | --- |
| `metricsType` (required) | enum: `IFRAME` or `SYNTHETIC` | Metrics type, used to construct the metrics name (e.g. `metrics.external.connect.synthetic.successful`) |
| `moduleType` | string | Module type of the check, if available. The value is validated as a known moduleType for the product |
| `durationInMillis` | long | Duration of successful checks, used for capability metrics. Metrics with a failed status do not publish latency data |
| `status` | enum: `SUCCESS` or `FAIL` | Indicates whether the check was successful. Used when calculating reliability metrics |
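
For example, a request body publishing one successful synthetic check could look like the following (illustrative values, shown here as a TypeScript literal):

```typescript
// Example publish body: an array with a single successful synthetic
// check. All values are illustrative.
const body = [
  {
    metricsType: "SYNTHETIC",
    status: "SUCCESS",
    durationInMillis: 420,
  },
];
```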

Responses

| Code | Description |
| --- | --- |
| 200 Success | Successfully published metrics |
| 403 Forbidden | Request not allowed for appKey; wrong authentication signature; requests are only allowed from the app server with a valid installation |
| 408 Bad Request | Latency should not be a negative value; metrics type not supported; unknown module type |
