Last updated Nov 14, 2022


Cloud Fortified Apps Program reliability requirements

Key principles

The reliability requirements for Cloud Fortified apps are built around these principles:

Creating a reliable experience for customers

The Cloud Fortified program measures the reliability of apps using specific metrics called Service Level Indicators (SLIs). Each SLI has a target value, its Service Level Objective (SLO).

  • Service Level Indicator (SLI) is the measurement you use to track your app's capabilities (such as uptime or response time).

  • Service Level Objective (SLO) is the target declared about a specific SLI (for example, 99.95% uptime). In the Cloud Fortified program, SLOs are measured over 28 days.

Example:

SLI (metric) | SLO (target value)
App availability success rate | 99.9%

A Cloud Fortified app is considered reliable when it consistently meets the SLO of each SLI.
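As a rough illustration of how an SLI relates to its SLO, the sketch below computes a success rate from check results and compares it to a target. The class and function names, the sample counts, and the handling of an empty window are assumptions made for illustration; this is not the program's actual calculation.

```python
from dataclasses import dataclass


@dataclass
class SliWindow:
    """Check results collected over one measurement window (for example, 28 days)."""
    successes: int
    failures: int

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        # With no data there is nothing to measure; treating that as 0% is an assumption.
        return 100.0 * self.successes / total if total else 0.0


def meets_slo(window: SliWindow, slo_percent: float) -> bool:
    """Return True when the measured SLI is at or above the SLO target."""
    return window.success_rate >= slo_percent


# Hypothetical numbers: 40,320 one-minute checks in 28 days, 30 of which failed.
window = SliWindow(successes=40_290, failures=30)
print(f"{window.success_rate:.3f}% meets 99.9%: {meets_slo(window, 99.9)}")
```

At a 99.9% target, a window of this size tolerates roughly 40 failed one-minute checks before the SLO is missed.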

Detecting incidents before customers

The Cloud Fortified program reduces the Mean Time To Detect (MTTD) by sending alerts when an SLO is breached. It drives patterns and behaviors that enable partners to detect issues through monitoring rather than relying on customers to report them.

How to comply

1. Familiarize yourself with metrics to measure your app's reliability:

2. Implement synthetic tests:

Synthetic tests are automated tests that simulate real user interactions to validate core app capabilities and experiences. They are usually implemented with emulated web browsers or recorded web requests.

In this context, we suggest you implement automated tests that simulate users interacting with your app through Jira or Confluence and run them regularly against your Developer First Rollout instance.

Synthetic tests let you quickly spot cases where product changes degrade your app's core capabilities.
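A minimal sketch of a synthetic check runner is shown below, assuming each check is a plain function that raises an exception on failure. The function names and the placeholder check are hypothetical; a real check would drive an emulated browser or replay recorded web requests against your Developer First Rollout instance.

```python
import time
from typing import Callable, Dict


def run_synthetic_check(check: Callable[[], None]) -> Dict[str, object]:
    """Run one check, time it, and map any exception to a FAIL result."""
    start = time.monotonic()
    try:
        check()
        status = "SUCCESS"
    except Exception:
        status = "FAIL"
    duration_ms = int((time.monotonic() - start) * 1000)
    return {"status": status, "duration_ms": duration_ms}


def open_issue_panel() -> None:
    """Hypothetical check: replace with a real browser or HTTP interaction."""
    rendered = True  # stand-in for "the app's panel rendered inside Jira"
    if not rendered:
        raise RuntimeError("panel did not render")


print(run_synthetic_check(open_issue_panel))
```

Running such checks on a schedule (for example, every few minutes) and alerting on consecutive failures is one way to notice regressions before customers do.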

To implement synthetic tests:

The Metrics publish API has been deprecated and will be removed after April 24, 2023.

Metrics publish API reference

PUT - /rest/atlassian-connect/latest/addons-metrics/${addon_key}/publish

This API is used by Cloud Fortified apps to:

  • Refine iframe success rate metrics by submitting custom success/failure events.
  • Publish synthetic test results

These metrics were removed from the program on October 24, 2022.

Headers

Header | Description
Authorization | JWT ${token} (see Connect JWT documentation)
Content-Type | application/json

Parameters

Name | Location | Description
appKey (required) | Path parameter | App Key
body (required) | Body | Array<AddonMetrics>: list of metrics data to publish


AddonMetrics:

Field | Type | Description
metricsType (required) | enum: IFRAME or SYNTHETIC | Metrics type, used to construct the metrics name (e.g. metrics.external.connect.synthetic.successful).
moduleType | string | Module type of the metric. Only applicable to IFRAME metrics. This value must be a valid Connect module type for the product, that is, a valid key in the "modules" field of the app descriptor (e.g. "adminPages").
durationInMillis | long | Duration of the successful checks, used for Capability metrics. Metrics with a failed status do not publish this latency data.
status (required) | enum: SUCCESS or FAIL | Indicates whether the check was successful. Used when calculating reliability metrics.

Responses

Code | Status | Description
200 | Success | Successfully published metrics
400 | Bad Request | Latency should not be a negative value; metrics type not supported; unknown module type
403 | Forbidden | Request not allowed for appKey; wrong authentication signature; request is only allowed from the app server with a valid installation
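Because this API is deprecated, the following is only a sketch of the request shape, built from the field descriptions above. It constructs the path and JSON body without sending anything; JWT generation is omitted, and the helper names and the example app key are hypothetical. The client-side validation simply mirrors the conditions listed under Bad Request.

```python
import json
from typing import Optional

METRICS_TYPES = {"IFRAME", "SYNTHETIC"}
STATUSES = {"SUCCESS", "FAIL"}


def build_addon_metric(metrics_type: str, status: str,
                       duration_ms: Optional[int] = None,
                       module_type: Optional[str] = None) -> dict:
    """Build one AddonMetrics entry, rejecting values the API would refuse."""
    if metrics_type not in METRICS_TYPES:
        raise ValueError("Metrics type not supported")
    if status not in STATUSES:
        raise ValueError("status must be SUCCESS or FAIL")
    if duration_ms is not None and duration_ms < 0:
        raise ValueError("Latency should not be a negative value")
    metric = {"metricsType": metrics_type, "status": status}
    # moduleType only applies to IFRAME metrics.
    if metrics_type == "IFRAME" and module_type is not None:
        metric["moduleType"] = module_type
    # Latency is not published for failed checks, so it is omitted here
    # (omitting it client-side is a choice made in this sketch).
    if status == "SUCCESS" and duration_ms is not None:
        metric["durationInMillis"] = duration_ms
    return metric


def publish_path(app_key: str) -> str:
    """Return the publish path for the given app key."""
    return f"/rest/atlassian-connect/latest/addons-metrics/{app_key}/publish"


# "com.example.my-app" is a made-up app key.
body = json.dumps([build_addon_metric("SYNTHETIC", "SUCCESS", duration_ms=420)])
print("PUT", publish_path("com.example.my-app"))
print(body)
```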

3. Identify or implement a health check resource (for Connect apps):

To enable Atlassian to obtain a baseline measurement of whether a Connect app is up, you must provide the URL of a health check resource for your app.

See App availability success rate for detailed requirements for health checks.
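For orientation, a health check resource can be as small as the sketch below, which uses only the Python standard library. The `/healthcheck` path, the response body, and the `dependencies_ok` placeholder are assumptions; the detailed requirements referenced above take precedence over anything shown here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_PATH = "/healthcheck"  # hypothetical path; use the URL you actually submit


def dependencies_ok() -> bool:
    """Placeholder: check the database, queues, or other critical dependencies."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != HEALTH_PATH:
            self.send_error(404)
            return
        healthy = dependencies_ok()
        payload = json.dumps({"status": "ok" if healthy else "unavailable"}).encode()
        # 200 signals the app is up; 503 signals it cannot serve requests.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args) -> None:
        pass  # keep frequent health probes out of the access log


def make_server(port: int = 8080) -> HTTPServer:
    """Create the server; call serve_forever() on the result to start it."""
    return HTTPServer(("127.0.0.1", port), HealthHandler)
```

In practice the same route would usually live inside the app's existing web framework so that it reflects the health of the process that actually serves customer traffic.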

4. Submit this information in your approval ticket:

Each item below lists the information to provide, the type of information we expect, and why we need it.

Your app's scalability characteristics

  • Explain your app's scaling factors (for example, database accesses, concurrent request processing, queues, bulk operations, non-linear operations, pagination, or N+1 API calls for additional data).

  • Explain what you expect your app to do in the presence of:

    • thousands of concurrent users

    • large datasets

    • distributed users

  • Explain how you'll respond to rate limits from Atlassian APIs.

  • Explain the testing you'll undertake to assess your app’s ability to work at scale.

We want to make sure you have considered your app's scalability characteristics and have validated them to a reasonable degree.
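One common way to respond to rate limits is to back off and retry. The sketch below assumes the API signals rate limiting with HTTP 429 and may include a `Retry-After` header expressed in whole seconds; the function name, the default values, and the header lookup are illustrative assumptions rather than a documented contract.

```python
import random
from typing import Mapping, Optional


def retry_delay_seconds(status: int, headers: Mapping[str, str], attempt: int,
                        base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return how long to wait before retrying, or None if no retry is needed."""
    if status != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        # Prefer the server's hint when it is given in whole seconds.
        return min(float(retry_after), cap)
    # Otherwise fall back to exponential backoff with full jitter.
    return random.uniform(0, min(cap, base * 2 ** attempt))


print(retry_delay_seconds(429, {"Retry-After": "5"}, attempt=0))
```

Jitter spreads retries from many tenants over time, which helps avoid every instance of the app retrying at the same moment.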

Testing against Developer First Rollout instances

Sign up for a Developer First Rollout instance to validate your app. See the sign up form: Jira and Confluence.

Describe any pre- or post-deployment testing you do against the Developer First Rollout instance.

We expect Cloud Fortified apps to use Developer First Rollout instances to detect unexpected behavior as early as possible when product changes are rolled out.

Your service recovery plan

Define the recovery plan for your app.

  • Describe the process you follow when you determine that a new deployment of your app has a severe issue (for example, rollback, rollforward, or a similar mechanism).

  • Describe the typical duration from when a fix is merged into the code to when it is fully deployed to production.

  • If your app stores data, describe how you would recover from lost or corrupt data stores.

  • Define your backup frequency and retention. Tell us how fast you can perform a data restore.

  • Describe testing of your data restore process, if there is any.

We want to ensure you have developed a straightforward approach for rolling out fixes to production and managing data restores if necessary.

Your existing incident management process

Describe your incident management process:

  • Define the hours during which your team is available to respond to issues. Tell us how they are notified of issues if they respond outside business hours.

  • Describe how you typically discover production issues with your app (for example, support tickets, monitoring, or other means).

  • If you have automated alerting on metrics in production, please summarize the types of metrics you monitor and the alert thresholds.

  • Describe how your team members communicate with each other during incidents.

  • Describe how your team communicates with customers during incidents.

We're looking to understand your approach to incident management so we can collaborate on improving the incident response.

5. Familiarize yourself with our incident management process.

6. Final steps

Once we have granted you access, validate the access and familiarize yourself with the developer console (for Connect apps).

Confirm the metrics on the developer console match your expectations. Please contact us if there is any unexpected behavior (for example, suspicious or missing data).

Make sure you receive email notifications when SLIs breach their SLOs.

Deprecation

Act on deprecation notices within the deprecation period.
