Last updated Nov 17, 2019

Data Center App Performance Toolkit User Guide For Confluence

To use the Data Center App Performance Toolkit, you'll need to first clone its repo.

git clone git@github.com:atlassian/dc-app-performance-toolkit.git

Follow installation instructions described in the dc-app-performance-toolkit/README.md file.

If you need performance testing results at a production level, follow instructions in this chapter to set up Confluence Data Center with the corresponding dataset.

For spiking, testing, or development, a local Confluence instance is sufficient. In that case, you can skip this chapter and proceed to Testing scenarios, although you may need to adjust the scripts for your local dataset.

Setting up Confluence Data Center

We recommend that you use the AWS Quick Start for Confluence Data Center to deploy a Confluence Data Center testing environment. This Quick Start will allow you to deploy Confluence Data Center with a new Atlassian Standard Infrastructure (ASI) or into an existing one.

The ASI is a Virtual Private Cloud (VPC) consisting of subnets, NAT gateways, security groups, bastion hosts, and other infrastructure components required by all Atlassian applications. When you deploy with a new ASI, the Quick Start creates this VPC and then deploys Confluence into it. Deploying Confluence with a new ASI takes around 50 minutes. With an existing one, it'll take around 30 minutes.

Using the AWS Quick Start for Confluence

If you are a new user, perform an end-to-end deployment. This involves deploying Confluence into a new ASI.

If you have already deployed the ASI separately by using the ASI Quick Start or by deploying another Atlassian product (Jira, Bitbucket, or Confluence Data Center), deploy Confluence into your existing ASI.

You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional price for using this Quick Start. For more information, go to aws.amazon.com/pricing.

To reduce costs, we recommend keeping your deployment up and running only during the performance runs.

Quick Start parameters

All important parameters are listed and described in this section. For all other remaining parameters, we recommend using the Quick Start defaults.

Confluence setup

Parameter | Recommended Value
Collaborative editing mode | synchrony-local
Confluence Version | 6.13.8

The Data Center App Performance Toolkit officially supports:

  • The latest Confluence Platform Release version: 7.0.x (Coming soon)
  • The latest Confluence Enterprise Release: 6.13.8

Cluster nodes

Parameter | Recommended Value
Cluster node instance type | c5.4xlarge
Maximum number of cluster nodes | 1
Minimum number of cluster nodes | 1
Cluster node instance volume size | 200

We recommend c5.4xlarge to strike the balance between cost and hardware we see in the field for our enterprise customers.

The Data Center App Performance Toolkit framework is also set up for concurrency we expect on this instance size. As such, underprovisioning will likely show a larger performance impact than expected.

Database

Parameter | Recommended Value
Database instance class | db.m4.xlarge
RDS Provisioned IOPS | 1000
Master (admin) password | Password1!
Enable RDS Multi-AZ deployment | true
Application user database password | Password1!
Database storage | 200

The Master (admin) password will be used later when restoring the SQL database dataset. If you set a password other than the recommended value, you'll need to manually change the database password variable (CONFLUENCE_DB_PASS in the Variables section) in the restore database dump script (see Preloading your Confluence deployment with an enterprise-scale dataset).

Networking (for new ASI)

Parameter | Recommended Value
Trusted IP range | 0.0.0.0/0 (for public access) or your own trusted IP range
Availability Zones | Select two availability zones in your region. Both zones must support EFS (see Supported AWS regions for details).
Permitted IP range | 0.0.0.0/0 (for public access) or your own trusted IP range
Make instance internet facing | true
Key Name | The EC2 Key Pair to allow SSH access. See Amazon EC2 Key Pairs for more info.

Networking (for existing ASI)

Parameter | Recommended Value
Make instance internet facing | true
Permitted IP range | 0.0.0.0/0 (for public access) or your own trusted IP range
Key Name | The EC2 Key Pair to allow SSH access. See Amazon EC2 Key Pairs for more info.

Running the setup wizard

After successfully deploying Confluence Data Center in AWS, you'll need to configure it:

  1. In the AWS console, go to Services > CloudFormation > Stack > Stack details > Select your stack.
  2. On the Outputs tab, copy the value of the LoadBalancerURL key.
  3. Open LoadBalancerURL in your browser. This will take you to the Confluence setup wizard.
  4. On the Get apps page, do not select additional apps, just click Next.
  5. On the next page, populate the Your License Key field by either:
    • Using your existing license, or
    • Generating an evaluation license, or
    • Contacting Atlassian to be provided with two time-bomb licenses for testing. Ask for them in your DCHELP ticket.
    Then click Next.
  6. On the Load Content page, click Empty Site.
  7. On the Configure User Management page, click Manage users and groups within Confluence.
  8. On the Configure System Administrator Account page, populate the following fields:
    • Username: admin (recommended)
    • Name: admin (recommended)
    • Email Address: email address of the admin user
    • Password: admin (recommended)
    • Confirm Password: admin (recommended)
    Then click Next.
  9. On the Setup Successful page, click Start.
  10. After going through the welcome setup, enter any Space name to create an initial space and click Continue.
  11. Enter the first page title and click Publish.

After Preloading your Confluence deployment with an enterprise-scale dataset, the admin user will have admin/admin credentials.

Preloading your Confluence deployment with an enterprise-scale dataset

Data dimensions and values for an enterprise-scale dataset are listed and described in the following table.

Data dimensions | Value for an enterprise-scale dataset
Pages | ~900 000
Blogposts | ~100 000
Attachments | ~2 300 000
Comments | ~6 000 000
Spaces | ~5 000
Users | ~5 000

All the datasets use the standard admin/admin credentials.

Pre-loading the dataset is a three-step process:

  1. Importing the main dataset. To help you out, we provide an enterprise-scale dataset you can import via the populate_db.sh script.
  2. Restoring attachments. We also provide attachments, which you can pre-load via an upload_attachments.sh script.
  3. Re-indexing Confluence Data Center. For more information, go to Re-indexing Confluence.

The following subsections explain each step in greater detail.

Importing the main dataset

You can load this dataset directly into the database (via a populate_db.sh script).

Loading the dataset via populate_db.sh script (~90 min)

We recommend doing this via the CLI.

To populate the database with SQL:

  1. In the AWS console, go to Services > EC2 > Instances.
  2. On the Description tab, do the following:
    • Copy the Public IP of the Bastion instance.
    • Copy the Private IP of the Confluence node instance.
  3. Using SSH, connect to the Confluence node via the Bastion instance:

    For Windows, use Putty to connect to the Confluence node over SSH. For Linux or MacOS:

    ssh-add path_to_your_private_key_pem
    export BASTION_IP=bastion_instance_public_ip
    export NODE_IP=node_private_ip
    ssh -o "proxycommand ssh -W %h:%p ec2-user@$BASTION_IP" ec2-user@${NODE_IP}

    For more information, go to Connecting your nodes over SSH.

  4. Download the populate_db.sh script and make it executable:

    wget https://raw.githubusercontent.com/atlassian/dc-app-performance-toolkit/master/app/util/confluence/populate_db.sh && chmod +x populate_db.sh
  5. Review the following Variables section of the script:

    INSTALL_PSQL_CMD="amazon-linux-extras install -y postgresql10"
    DB_CONFIG="/var/atlassian/application-data/confluence/confluence.cfg.xml"
    CONFLUENCE_CURRENT_DIR="/opt/atlassian/confluence/current"
    CONFLUENCE_DB_NAME="confluence"
    CONFLUENCE_DB_USER="postgres"
    CONFLUENCE_DB_PASS="Password1!"
    CONFLUENCE_VERSION_FILE="/media/atl/confluence/shared-home/confluence.version"
    DATASETS_AWS_BUCKET="https://centaurus-datasets.s3.amazonaws.com/confluence"
  6. Run the script:

    ./populate_db.sh | tee -a populate_db.log

Do not close or interrupt the session. Restoring the SQL database takes some time (about 90 minutes). When the restore is finished, the admin user will have admin/admin credentials.

In case of a failure, check the Variables section and run the script one more time.
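
Before re-running the script, it can help to list the values it will actually use. The following is a minimal sketch, not part of the toolkit, which assumes that the Variables section consists of plain NAME="value" assignments like those shown above. The helper name parse_shell_variables is hypothetical.

```python
import re

# Matches simple shell assignments of the form NAME="value" on a single line.
ASSIGNMENT = re.compile(r'^([A-Z_][A-Z0-9_]*)="([^"]*)"\s*$')


def parse_shell_variables(script_text):
    """Return a dict of NAME -> value for plain NAME="value" lines."""
    variables = {}
    for line in script_text.splitlines():
        match = ASSIGNMENT.match(line.strip())
        if match:
            variables[match.group(1)] = match.group(2)
    return variables


if __name__ == "__main__":
    # Sample taken from the Variables section listed in this guide.
    sample = (
        'CONFLUENCE_DB_NAME="confluence"\n'
        'CONFLUENCE_DB_USER="postgres"\n'
        'CONFLUENCE_DB_PASS="Password1!"\n'
    )
    for name, value in parse_shell_variables(sample).items():
        print(f"{name} = {value}")
```

To check a downloaded copy, you could pass the contents of populate_db.sh to the function and compare the printed values with your Quick Start parameters.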

Restoring attachments (~3 hours)

After Importing the main dataset, you'll now have to pre-load an enterprise-scale set of attachments.

  1. Using SSH, connect to the Confluence node via the Bastion instance:

    For Windows, use Putty to connect to the Confluence node over SSH. For Linux or MacOS:

    ssh-add path_to_your_private_key_pem
    export BASTION_IP=bastion_instance_public_ip
    export NODE_IP=node_private_ip
    ssh -o "proxycommand ssh -W %h:%p ec2-user@$BASTION_IP" ec2-user@${NODE_IP}

    For more information, go to Connecting your nodes over SSH.

  2. Download the upload_attachments.sh script and make it executable:

    wget https://raw.githubusercontent.com/atlassian/dc-app-performance-toolkit/master/app/util/confluence/upload_attachments.sh && chmod +x upload_attachments.sh
  3. Review the following Variables section of the script:

    DATASETS_AWS_BUCKET="https://centaurus-datasets.s3.amazonaws.com/confluence"
    ATTACHMENTS_TAR="attachments.tar.gz"
    ATTACHMENTS_DIR="attachments"
    TMP_DIR="/tmp"
    EFS_DIR="/media/atl/confluence/shared-home"
  4. Run the script:

    ./upload_attachments.sh | tee -a upload_attachments.log

Do not close or interrupt the session. It will take some time to upload attachments to Elastic File System (EFS).

Re-indexing Confluence Data Center (~2-4 hours)

Before re-indexing, go to cog icon > General Configuration > General Configuration, click Edit for Site Configuration, and set Base URL to the LoadBalancerURL value.

For more information, go to Re-indexing Confluence.

  1. Log in as a user with the Confluence System Administrators global permission.
  2. Go to cog icon > General Configuration > Content Indexing.
  3. Click Rebuild and wait until re-indexing is completed.

Confluence will be unavailable for some time during the re-indexing process.

Create Index Snapshot (~30 min)

For more information, go to Administer your Data Center search index.

  1. Log in as a user with the Confluence System Administrators global permission.
  2. Create any new page with random content (without a new page, the index snapshot job will not be triggered).
  3. Go to cog icon > General Configuration > Scheduled Jobs.
  4. Find Clean Journal Entries job and click Run.
  5. Make sure that Confluence index snapshot was created. To do that, use SSH to connect to the Confluence node via Bastion (where NODE_IP is the IP of the node):

    ssh-add path_to_your_private_key_pem
    export BASTION_IP=bastion_instance_public_ip
    export NODE_IP=node_private_ip
    ssh -o "proxycommand ssh -W %h:%p ec2-user@$BASTION_IP" ec2-user@${NODE_IP}
  6. Download the index-snapshot.sh file. Then, make it executable and run it:

    wget https://raw.githubusercontent.com/atlassian/dc-app-performance-toolkit/master/app/util/confluence/index-snapshot.sh && chmod +x index-snapshot.sh
    ./index-snapshot.sh | tee -a index-snapshot.log

    Index snapshot creation time is about 20-30 minutes. When index snapshot is successfully created, the following will be displayed in console output:

    Snapshot was created successfully.
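
Because the command above also writes its output to index-snapshot.log, the success message can be checked after the fact. The following is a minimal sketch, assuming the message appears verbatim on one line of the log. The function name snapshot_succeeded is hypothetical and not part of the toolkit.

```python
from pathlib import Path

# Success message quoted from this guide.
SUCCESS_LINE = "Snapshot was created successfully."


def snapshot_succeeded(log_text):
    """Return True if any line of the log contains the success message."""
    return any(SUCCESS_LINE in line for line in log_text.splitlines())


if __name__ == "__main__":
    log_path = Path("index-snapshot.log")  # file written by `tee -a` above
    if log_path.exists():
        print(snapshot_succeeded(log_path.read_text(errors="replace")))
    else:
        print(f"{log_path} not found")
```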

Testing scenarios

Using the Data Center App Performance Toolkit for performance and scale testing of your Data Center app involves two test scenarios:

  • Scenario 1: Performance regression
  • Scenario 2: Scalability testing

Each scenario will involve multiple test runs. The following subsections explain both in greater detail.

Scenario 1: Performance regression

This scenario helps to identify basic performance issues without a need to spin up a multi-node Confluence DC. Make sure the app does not have any performance impact when it is not exercised.

Run 1 (~50 min)

To receive performance baseline results without an app installed:

  1. On the computer where you cloned the Data Center App Performance Toolkit, navigate to dc-app-performance-toolkit/app folder.
  2. Open the confluence.yml file and fill in the following variables:
    • application_hostname: your_dc_confluence_instance_hostname without protocol
    • application_protocol: HTTP or HTTPS
    • application_port: for HTTP - 80, for HTTPS - 443, or your instance-specific port. Self-signed certificates are not supported.
    • admin_login: admin user username
    • admin_password: admin user password
    • concurrency: number of concurrent users for JMeter scenario - 200 by default
    • test_duration: duration of the performance run - 45min by default
  3. Run bzt.

    bzt confluence.yml
  4. View the following main results of the run in the dc-app-performance-toolkit/app/results/confluence/YY-MM-DD-hh-mm-ss folder:

    • results.csv: aggregated .csv file with all actions and timings
    • bzt.log: logs of the Taurus tool execution
    • jmeter.*: logs of the JMeter tool execution
    • pytest.*: logs of Pytest-Selenium execution
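
A quick sanity check of the connection settings from step 2 can catch the most common mistake, a hostname that still includes the protocol. The sketch below is illustrative only: the guide does not state how the toolkit combines these values, so the URL format and the helper name build_base_url are assumptions.

```python
def build_base_url(hostname, protocol, port):
    """Combine application_hostname, application_protocol and application_port."""
    protocol = protocol.lower()
    if protocol not in ("http", "https"):
        raise ValueError(f"unsupported protocol: {protocol}")
    if "://" in hostname:
        raise ValueError("application_hostname must not include the protocol")
    default_port = 80 if protocol == "http" else 443
    port = int(port)
    # Omit the port when it is the default for the chosen protocol.
    suffix = "" if port == default_port else f":{port}"
    return f"{protocol}://{hostname}{suffix}"


if __name__ == "__main__":
    # confluence.example.com is a placeholder hostname.
    print(build_base_url("confluence.example.com", "HTTPS", 443))
    print(build_base_url("confluence.example.com", "http", 8090))
```

Opening the resulting URL in a browser before running bzt is a simple way to confirm that the instance is reachable with these values.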

When the execution is successfully completed, the INFO: Artifacts dir: line with the full path to results directory will be displayed in console output. Save this full path to the run results folder. Later you will have to insert it under runName: "without app" for report generation.
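
If you capture the console output of a run in a file, the results path can be pulled out programmatically. This is a minimal sketch, assuming the line contains the literal text "INFO: Artifacts dir:" followed by the path, as described above. The function name find_artifacts_dir and the sample path are hypothetical.

```python
import re

# Captures everything after the "Artifacts dir:" marker on a console line.
ARTIFACTS_LINE = re.compile(r"INFO: Artifacts dir:\s*(\S.*)$")


def find_artifacts_dir(console_output):
    """Return the last reported artifacts directory, or None if absent."""
    found = None
    for line in console_output.splitlines():
        match = ARTIFACTS_LINE.search(line)
        if match:
            found = match.group(1).strip()
    return found


if __name__ == "__main__":
    sample = "12:00:01 INFO: Artifacts dir: /tmp/results/confluence/run-1\n"
    print(find_artifacts_dir(sample))
```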

Run 2 (~50 min)

To receive performance results with an app installed:

  1. Install the app you want to test.
  2. Run bzt.

    bzt confluence.yml

When the execution is successfully completed, the INFO: Artifacts dir: line with the full path to results directory will be displayed in console output. Save this full path to the run results folder. Later you will have to insert it under runName: "with app" for report generation.

Generating a performance regression report

To generate a performance regression report:

  1. Navigate to the dc-app-performance-toolkit/app/reports_generation folder.
  2. Edit the performance_profile.yml file:
    • Under runName: "without app", in the fullPath key, insert the full path to results directory of Run 1.
    • Under runName: "with app", in the fullPath key, insert the full path to results directory of Run 2.
  3. Run the following command:

    python csv_chart_generator.py performance_profile.yml
  4. In the dc-app-performance-toolkit/app/results/reports/YY-MM-DD-hh-mm-ss folder, view the .csv file (with consolidated scenario results) and the .png file.

Analyzing report

Once completed, you will be able to review the action timings with and without your app to see its impact on the performance of the instance. If you see a significant impact (>10%) on any action timing, we recommend taking a look into the app implementation to understand the root cause of this delta.
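
The 10% rule of thumb can be expressed as a small calculation. The sketch below is not part of the toolkit: it assumes you have already read the per-action timings of both runs into dictionaries, interprets "impact" as a slowdown relative to the baseline, and uses hypothetical action names and helper names.

```python
def percent_delta(baseline_ms, with_app_ms):
    """Relative change of the with-app timing against the baseline, in percent."""
    return (with_app_ms - baseline_ms) / baseline_ms * 100.0


def flag_regressions(baseline, with_app, threshold_percent=10.0):
    """Return {action: delta} for actions slower than the threshold."""
    flagged = {}
    for action, base in baseline.items():
        if action in with_app and base > 0:
            delta = percent_delta(base, with_app[action])
            if delta > threshold_percent:
                flagged[action] = round(delta, 1)
    return flagged


if __name__ == "__main__":
    # Illustrative timings in milliseconds, not real measurements.
    without_app = {"view_page": 1000, "login": 500}
    with_app = {"view_page": 1150, "login": 520}
    print(flag_regressions(without_app, with_app))
```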

Scenario 2: Scalability testing

The purpose of scalability testing is to reflect the impact on the customer experience when operating across multiple nodes. For this, you have to run scale testing on your app.

For many apps and extensions to Atlassian products, there should not be a significant performance difference between operating on a single node or across many nodes in Confluence DC deployment. To demonstrate performance impacts of operating your app at scale, we recommend testing your Confluence DC app in a cluster.

Extending the base action

Extension scripts, which extend the base JMeter (confluence.jmx) and Selenium (confluence-ui.py) scripts, are located in a separate folder (dc-app-performance-toolkit/app/extension/confluence). You can modify these scripts to include your app-specific actions.

Modifying JMeter

JMeter scripts are written in XML and require the JMeter GUI to view and edit. You can launch the JMeter GUI by running the ~/.bzt/jmeter-taurus/<jmeter_version>/bin/jmeter command.

Make sure you run this command inside the dc-app-performance-toolkit/app directory. The main jmeter/confluence.jmx file contains relative paths to other scripts and will throw errors if run and loaded elsewhere.

Here's a snippet of the base JMeter script (confluence.jmx):

[Image: Base JMeter script]

For every base action, there is an extension script executed after the base script. In most cases, you should modify only the extension.jmx file. For example, if there are additional REST APIs introduced as part of viewing an issue, you can include these calls in the extension.jmx file under the view issue transaction.

Here's a snippet of the extension JMeter script (extension.jmx):

[Image: Extended JMeter script]

This ensures that these APIs are called as part of the view issue transaction with minimal intrusion (for example, no additional logins). For a fairer comparison, you have to keep the same number of base transactions before and after the plugin is installed.

The controllers in the extension script, which are executed along with the base action, are named after the corresponding base action (for example, extend_search_jql, extend_view_issue).

When debugging, if you want to only test transactions in the extend_view_issue action, you can comment out other transactions in the confluence.yml config file and set the percentage of the base execution to 100. Alternatively, you can change percentages of others to 0.

#      perc_create_issue: 4
#      perc_search_jql: 16
      perc_view_issue: 100
#      perc_view_project_summary: 4
#      perc_view_dashboard: 8

If multiple actions are affected, add transactions to multiple extension controllers.
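
The debugging setup described above, where one action runs at 100 and the others at 0, can be sketched as a small transformation over the percentage settings. This is an illustration only: it works on a plain dictionary rather than on the real config file, the sample values are illustrative, and the helper name isolate_action is hypothetical.

```python
def isolate_action(percentages, action):
    """Return a copy where `action` is 100 and every other action is 0."""
    if action not in percentages:
        raise KeyError(action)
    return {name: (100 if name == action else 0) for name in percentages}


if __name__ == "__main__":
    # Keys follow the snippet above; the numbers are placeholders.
    current = {
        "perc_create_issue": 4,
        "perc_search_jql": 16,
        "perc_view_issue": 43,
    }
    print(isolate_action(current, "perc_view_issue"))
```

Remember to restore the original percentages before running the measured test runs, so that the workload stays comparable between runs.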

Extending a stand-alone transaction

You can run your script independently of the base action under a specific workload if, for example, your plugin introduces a separate URL and has no correlation to the base transactions.

In such a case, you extend the extend_standalone_extension controller, which is also located in the extension.jmx file. With this option, you can define the execution percentage by the perc_standalone_extension parameter in the confluence.yml config file.

The following configuration ensures that extend_standalone_extension controller is executed 10% of the total transactions.

      perc_standalone_extension: 10

Using JMeter variables from the base script

The following variables are set by the base script and can be used in the extension script:

  • ${blog_id} - blog post id being viewed or modified (e.g. 23766699)
  • ${blog_space_key} - blog space key (e.g. PFSEK)
  • ${page_id} - page id being viewed or modified (e.g. 360451)
  • ${space_key} - page space key (e.g. TEST)
  • ${file_path} - path of file to upload (e.g. datasets/confluence/static-content/upload/test5.jpg)
  • ${file_type} - type of the file (e.g. image/jpeg)
  • ${file_name} - name of the file (e.g. test5.jpg)
  • ${username} - the logged in username (e.g. admin)

If there are some additional variables from the base script required by the extension script, you can add variables to the base script using extractors. For more information, go to Regular expression extractors.
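
An extractor captures a value from a response with a regular expression and stores it in a variable, falling back to a default when nothing matches. The same idea is sketched below in Python so that a pattern can be tried out before it is entered in the JMeter GUI. The response fragment, the pattern, and the helper name extract_first are assumptions made for illustration, and Python's regex syntax may differ from JMeter's in edge cases.

```python
import re


def extract_first(pattern, text, default="NOT_FOUND"):
    """Return the first capture group of the first match, or the default."""
    match = re.search(pattern, text)
    return match.group(1) if match else default


if __name__ == "__main__":
    # Hypothetical fragment of a page response.
    response = '<meta name="ajs-page-id" content="360451">'
    print(extract_first(r'name="ajs-page-id" content="(\d+)"', response))
```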

Modifying Selenium

In addition to JMeter, you can extend Selenium scripts to measure the end-to-end browser timings.

We use Pytest to drive Selenium tests. The confluence-ui.py executor script is located in the app/selenium_ui/ folder. This file contains all browser actions, defined by the test_ functions. These actions are executed one by one during the testing.

In the confluence-ui.py script, view the following block of code:

# def test_1_selenium_custom_action(webdriver, datasets, screen_shots):
#     custom_action(webdriver, datasets)

This is a placeholder to add an extension action. The custom action can be moved to a different line, depending on the required workflow, as long as it is between the login (test_0_selenium_a_login) and logout (test_2_selenium_z_log_out) actions.

To implement the custom_action function, modify the extension_ui.py file in the extension/confluence/ directory. The following is an example of the custom_action function, where Selenium navigates to a URL and waits until an element is visible:

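from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# print_timing, APPLICATION_URL and timeout are provided by the toolkit;
# import them the same way the existing extension_ui.py does.

def custom_action(webdriver, datasets):
    @print_timing
    def measure(webdriver, interaction):
        # Open the app page and wait until the app element is visible.
        webdriver.get(f'{APPLICATION_URL}/plugins/servlet/some-app/reporter')
        WebDriverWait(webdriver, timeout).until(EC.visibility_of_element_located((By.ID, 'plugin-element')))
    measure(webdriver, 'selenium_app_custom_action:view_report')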

To view more examples, see the modules.py file in the selenium_ui/confluence directory.

Running tests with your modification

To ensure that the test runs without errors in parallel, run your extension scripts with the base scripts as a sanity check.

Run 3 (~50 min)

To receive scalability benchmark results for one-node Confluence DC with app-specific actions, run bzt:

bzt confluence.yml

When the execution is successfully completed, the INFO: Artifacts dir: line with the full path to results directory will be displayed. Save this full path to the run results folder. Later you will have to insert it under runName: "Node 1" for report generation.

Run 4 (~50 min)

To receive scalability benchmark results for two-node Confluence DC with app-specific actions:

  1. In the AWS console, go to CloudFormation > Stack details > Select your stack.
  2. On the Update tab, select Use current template, and then click Next.
  3. Enter 2 in the Maximum number of cluster nodes and the Minimum number of cluster nodes fields.
  4. Click Next > Next > Update stack and wait until stack is updated.
  5. Make sure that Confluence index successfully synchronized to the second node. To do that, use SSH to connect to the second node via Bastion (where NODE_IP is the IP of the second node):

    ssh-add path_to_your_private_key_pem
    export BASTION_IP=bastion_instance_public_ip
    export NODE_IP=node_private_ip
    ssh -o "proxycommand ssh -W %h:%p ec2-user@$BASTION_IP" ec2-user@${NODE_IP}
  6. Once you're in the second node, download the index-sync.sh file. Then, make it executable and run it:

    wget https://raw.githubusercontent.com/atlassian/dc-app-performance-toolkit/master/app/util/confluence/index-sync.sh && chmod +x index-sync.sh
    ./index-sync.sh | tee -a index-sync.log

    Index synchronizing time is about 10-30 minutes. When index synchronizing is successfully completed, the following lines will be displayed in console output:

    Log file: /var/atlassian/application-data/confluence/logs/atlassian-confluence.log
    Index recovery is required for main index, starting now
    main index recovered from shared home directory
  7. Run bzt.

    bzt confluence.yml

When the execution is successfully completed, the INFO: Artifacts dir: line with the full path to results directory will be displayed in console output. Save this full path to the run results folder. Later you will have to insert it under runName: "Node 2" for report generation.

Run 5 (~50 min)

To receive scalability benchmark results for four-node Confluence DC with app-specific actions:

  1. Scale your Confluence Data Center deployment to 4 nodes the same way as in Run 4.
  2. Check that the index is synchronized to the new nodes the same way as in Run 4.
  3. Run bzt.

    bzt confluence.yml

When the execution is successfully completed, the INFO: Artifacts dir: line with the full path to results directory will be displayed in console output. Save this full path to the run results folder. Later you will have to insert it under runName: "Node 4" for report generation.

Generating a report for scalability scenario

To generate a scalability report:

  1. Navigate to the dc-app-performance-toolkit/app/reports_generation folder.
  2. Edit the scale_profile.yml file:
    • For runName: "Node 1", in the fullPath key, insert the full path to results directory of Run 3.
    • For runName: "Node 2", in the fullPath key, insert the full path to results directory of Run 4.
    • For runName: "Node 4", in the fullPath key, insert the full path to results directory of Run 5.
  3. Run the following command:

    python csv_chart_generator.py scale_profile.yml
  4. In the dc-app-performance-toolkit/app/results/reports/YY-MM-DD-hh-mm-ss folder, view the .csv file (with consolidated scenario results) and the .png file.

Analyzing report

Once completed, you will be able to review action timings on Confluence Data Center with different numbers of nodes. If you see a significant variation in any action timings between configurations, we recommend taking a look into the app implementation to understand the root cause of this delta.
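
One way to quantify "variation between configurations" is the relative spread of each action's timing across node counts. The sketch below is not part of the toolkit: the guide does not define a threshold for this scenario, so the threshold is left to the caller, and the action names, timings, and helper names are hypothetical.

```python
def timing_spread(timings_by_nodes):
    """Relative spread (max - min) / min across node counts, in percent."""
    values = list(timings_by_nodes.values())
    low, high = min(values), max(values)
    return (high - low) / low * 100.0


def actions_with_variation(results, threshold_percent):
    """Return {action: spread} for actions whose spread exceeds the threshold."""
    flagged = {}
    for action, timings in results.items():
        spread = timing_spread(timings)
        if spread > threshold_percent:
            flagged[action] = round(spread, 1)
    return flagged


if __name__ == "__main__":
    # Keys are node counts; values are illustrative timings in milliseconds.
    results = {
        "view_page": {1: 1000, 2: 1020, 4: 1300},
        "login": {1: 500, 2: 505, 4: 510},
    }
    print(actions_with_variation(results, threshold_percent=10.0))
```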

After completing all your tests, delete your Confluence Data Center stacks.