Internal use only

Adding a Task to the EPM Component Workflow Orchestrator

This document outlines the steps required to add a new task to the EPM Component Workflow Orchestrator. Follow the instructions carefully to ensure successful integration.

Prerequisites

Before adding a task to the workflow, it is recommended to watch this demo for a quick example of creating a workflow and invoking it end to end.

This guide assumes that the functionality you want to invoke as part of your task is identified and agreed upon with the team.

Step 1: Create an enum value representing your task.

Go to the CriticalWorkflowName class and add an enum value representing your task. For the sake of this guide, we will name our task DEMO_WORKFLOW; this is the value you will later see in the workflow_name column and in the jobStatus response.
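
For illustration, here is a minimal sketch of what that addition might look like. The surrounding entries and the exact shape of CriticalWorkflowName are placeholders, not the real contents of the class:

    enum class CriticalWorkflowName {
        // ... existing workflow entries (placeholders) ...
        DEMO_WORKFLOW, // the new task added for this guide
    }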

Step 2: Create a Runner class that extends the BaseWorkflowRunner class.

Under the package src/main/kotlin/io/atlassian/micros/responsibility/model/workflow/scheduler/v1/runner/concrete, create a class that will be responsible for invoking the service layer that contains your functionality.

  • For this example, we would create a DemoWorkflowRunner class. It needs to override and implement the runWorkflow function.
  • In the runWorkflow function, define the business logic of your task, as sketched below.
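
Below is a minimal sketch of such a runner. The BaseWorkflowRunner contract, the runWorkflow signature, and the constructor parameters are assumptions for illustration; mirror an existing runner in the concrete package for the real shapes.

    // Sketch only: types such as EpmWorkflowTrackingHistoryRepository and the
    // BaseWorkflowRunner contract come from the existing codebase and may differ.
    class DemoWorkflowRunner(
        private val epmWorkflowTrackingHistoryRepository: EpmWorkflowTrackingHistoryRepository,
        private val meterRegistry: MeterRegistry,
    ) : BaseWorkflowRunner() {

        override fun runWorkflow() {
            // Invoke the service layer that owns the functionality identified in the
            // prerequisites; this is where the task's business logic lives.
        }
    }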

Step 3: Add the task to the Component Workflow Orchestrator

Identify the order in which you want your task to execute. Once identified, create an instance of the runner class you created in Step 2 above.

Example:

    runner = DemoWorkflowRunner(
        // Define your Constructor fields to inject
        epmWorkflowTrackingHistoryRepository,
        meterRegistry
        // Etc.
    )
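
The constructor arguments above are illustrative; inject whatever repositories, clients, and registries your runWorkflow implementation actually depends on, following the pattern of the existing runners.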

Step 4: Run and validate the task via the Component Workflow

This step shows how to run the workflow orchestrator and validate the functionality provided by your new task.

Local/Staging with DB Access

  1. Run the workflow orchestrator by invoking the API endpoint /api/asyncJobs/componentWorkflows.
  2. Check that the orchestrator kicked off the workflow in the async_job_tracking_history table. You should see a row with job_type=COMPONENT_WORKFLOW_SCHEDULER showing status IN-PROGRESS.
  3. Check that your task is being executed in the epm_workflow_status_tracking_history table. Depending on the order in which your task is defined, you will not see a row for your task until the previous tasks are completed. In our case, we should see a row where workflow_name=DEMO_WORKFLOW with the appropriate status.
  4. Once the task is complete, look for the status to show SUCCESS.

Staging/Prod without DB Access

  1. Run the workflow orchestrator by invoking the API endpoint /api/asyncJobs/componentWorkflows. The response object will contain an id.
  2. Invoke the jobStatus GET API via the endpoint /api/asyncJobs/{asyncJobId}, where asyncJobId is the id from step 1.
  3. You will see a response object with each task in the workflow as it gets kicked off. Each invocation of this endpoint will return more data as successive tasks are kicked off. Wait for your task to kick off and complete, and the response object will show its details. In our case, we can expect to see "workflowName":"DEMO_WORKFLOW" with a "status":"SUCCESS". A scripted version of these two calls is sketched below.
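
For illustration, here is the same two-call flow as a small Kotlin script using the JDK HTTP client. The base URL, the HTTP method for the kickoff endpoint, and any authentication are assumptions; adjust them to match how you normally call the service.

    import java.net.URI
    import java.net.http.HttpClient
    import java.net.http.HttpRequest
    import java.net.http.HttpResponse

    fun main() {
        val base = "https://responsibility-model.example.internal" // placeholder for the real service URL
        val client = HttpClient.newHttpClient()

        // Kick off the workflow orchestrator (HTTP method assumed to be POST).
        val kickoff = HttpRequest.newBuilder(URI.create("$base/api/asyncJobs/componentWorkflows"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build()
        val kickoffResponse = client.send(kickoff, HttpResponse.BodyHandlers.ofString())
        println("Kickoff response: ${kickoffResponse.body()}") // contains the async job id

        // Poll the jobStatus endpoint with the id from the kickoff response.
        val asyncJobId = "REPLACE_WITH_ID_FROM_KICKOFF" // placeholder
        val statusRequest = HttpRequest.newBuilder(URI.create("$base/api/asyncJobs/$asyncJobId"))
            .GET()
            .build()
        val statusResponse = client.send(statusRequest, HttpResponse.BodyHandlers.ofString())
        println("Job status: ${statusResponse.body()}") // look for DEMO_WORKFLOW with status SUCCESS
    }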

Step 5: Metrics in Staging and Prod

Once tested in Staging, it is recommended to create a chart for your task on the WorkflowOrchestrator SignalFX Dashboard. The metrics emitted by the orchestrator by default, such as job success/failure and duration, should each get a chart for your task. Reference existing charts on the dashboard as a blueprint for creating one for your workflow.
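
If your task needs metrics beyond those defaults, you can record them from your runner through the injected MeterRegistry. Below is a minimal sketch, assuming Micrometer is the metrics library behind meterRegistry; the metric names and tags are placeholders, not the orchestrator's actual metric names.

    import io.micrometer.core.instrument.MeterRegistry
    import io.micrometer.core.instrument.Timer

    // Placeholder helper showing success/failure counters and a duration timer.
    fun runWithMetrics(meterRegistry: MeterRegistry, task: () -> Unit) {
        val sample = Timer.start(meterRegistry)
        try {
            task()
            meterRegistry.counter("demo.workflow.task.outcome", "workflow", "DEMO_WORKFLOW", "result", "success").increment()
        } catch (e: Exception) {
            meterRegistry.counter("demo.workflow.task.outcome", "workflow", "DEMO_WORKFLOW", "result", "failure").increment()
            throw e
        } finally {
            sample.stop(meterRegistry.timer("demo.workflow.task.duration", "workflow", "DEMO_WORKFLOW"))
        }
    }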

Debugging via Splunk

One of the enhancements introduced with the workflow orchestrator is easy debugging via Splunk. If at any point during your implementation of a new task or workflow an error occurs, a task returns a FAILURE status, or an unknown exception is thrown, you can leverage Splunk to debug and pinpoint the cause.

This is achieved by having the EPM backend code use a single traceId for the duration of a workflow.

  • Start by searching Splunk for the orchestrator's job-creation log:
`micros_responsibility-model` env="prod-east" "Creating Workflow Orchestrator Job."
  • You should see an output containing a traceId as in the example below.
{
   ec2: { ... }
   env: prod-east
   level: INFO
   logger_name: io.atlassian.micros.responsibility.model.service.workflow.scheduler.v1.orchestrator.BaseComponentWorkflowOrchestrator
   m: { ... }
   message: Creating Workflow Orchestrator Job.
   micros_container: responsibility-model
   spanId: 1edce45e4364d14d
   thread_name: boundedElastic-170
   time: 2025-03-18T15:51:57.548643498Z
   traceId: 7cc02e860238ab5dff4543783114fb6b
   traceSampled: false
}
  • Now you can take that traceId and filter your Splunk search to return only the logs pertaining to it, as shown below. This gives you the end-to-end flow of logs for the workflow. If you sort the results in descending order, you will likely see what caused your workflow to fail within the first page of results. This is by design: the orchestrator catches and logs any exception and stops the workflow, so very few logs should appear after the exception.
`micros_responsibility-model` env="prod-east" "7cc02e860238ab5dff4543783114fb6b"
