Last updated Sep 24, 2024

Investigating alerts triggered by your DROID detectors

This page explains how to investigate alerts generated by the detectors you set up by following our Setup anomaly detection tutorial. If you haven't done so already, we recommend going through the tutorial first to set up anomaly detection for your integration.

Ingestion error alerts

These alerts are generated by the detector you set up here.

Example alert title

Alert title: "Ingestion parsing errors attributed to service my-service (Production)"

Cause

DROID is failing to parse the payload of your service's ingestion events sent via Streamhub.

Although Streamhub validates the schema of the events at the source, DROID performs some additional validation that is not possible via Streamhub, so part of the payload can still be malformed when it reaches DROID.

It is recommended to check the following:

To aid with finding the root cause, you can also check the Transformer Service's logs to track down the exact error encountered with the following Splunk query:

`micros_transformerservice` env=prod* level IN (error, warn) logger_name="*.ExternalRecordParser" contextMap.streamHubEventId="YOUR_STREAMHUB_EVENT_ID"

Transformation error alerts

These alerts are generated by the detector you set up here.

Example alert title

Alert title: "Transformation failures attributed to transformer my-service (Production)"

Cause

DROID is failing to transform the ingestion entity that was sent by your service.

After DROID validates your Streamhub event payload, it attempts to run the transformer associated with your entity type. By default, your transformer behaves as a pass-through transformer, so you are unlikely to see any transformation errors. However, if your transformer contains additional business logic that goes beyond the default behaviour, it has most likely failed to find the properties it expects in the content field of one or more of your ingestion entities.

It is recommended to review your transformer's code and ensure that your service is sending data in the expected format. To find the root cause of the transformation error, you can check the Transformer Service's logs with the following Splunk query:

`micros_transformerservice` env=prod* level IN (error, warn) logger_name="*YOUR_TRANSFORMER_JAVA_CLASS_NAME*"
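
If the logs point to properties missing from the content field, a check on the sending side can help catch the problem before the entity reaches DROID. The following is a minimal sketch, not part of DROID or Streamhub: the function name `missing_content_properties` and the property names in the usage example are hypothetical, and the only detail taken from this page is that the transformer reads its properties from the entity's content field.

```python
from typing import Any, Dict, Iterable, List


def missing_content_properties(
    entity: Dict[str, Any], required: Iterable[str]
) -> List[str]:
    """Return the required property names that are absent or null in entity["content"].

    Hypothetical helper: the list of required names should mirror whatever
    your own transformer reads from the content field.
    """
    required_names = list(required)
    content = entity.get("content")
    if not isinstance(content, dict):
        # No usable content object at all, so every required property is missing.
        return required_names
    return [name for name in required_names if content.get(name) is None]


if __name__ == "__main__":
    # Property names below are placeholders for illustration only.
    example_entity = {"content": {"title": "Quarterly report", "ownerId": None}}
    print(missing_content_properties(example_entity, ["title", "ownerId", "status"]))
```

Running a check like this in your service's tests, or logging its result before publishing, makes it easier to tell whether a transformation failure comes from the data your service sends or from the transformer's own logic.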

Ingestion traffic anomalies

These alerts are generated by the detector you set up here.

Example alert title

This detector can generate alerts for both abnormal growth and abnormal drops in ingestion traffic.

For abnormal ingestion traffic growth you may see an alert titled: "Abnormal growth in ingestion traffic (>50%) attributed to my-service (Production)". For abnormal ingestion traffic drops you may see an alert titled: "Abnormal drop in ingestion traffic (<50%) attributed to my-service (Production)".

Cause

The detector counts the ingestion entities sent by your service over the past day and compares this to the number of entities sent on the same day one week ago. If the difference exceeds the threshold set in your detector (e.g. 50%), an alert is generated.
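
To make the comparison concrete, here is a minimal sketch of the week-over-week check described above. It is an illustration under stated assumptions, not the detector's actual implementation: the function names are hypothetical, the change is assumed to be measured as a percentage of last week's count, and the same threshold is assumed to apply to both growth and drops.

```python
from typing import Optional


def percent_change(current: int, baseline: int) -> Optional[float]:
    """Relative change of current versus baseline, in percent.

    Returns None when the baseline is zero, because the ratio is undefined.
    """
    if baseline == 0:
        return None
    return (current - baseline) / baseline * 100.0


def classify_traffic(current: int, baseline: int, threshold_pct: float = 50.0) -> str:
    """Classify today's entity count against the count from one week ago.

    Assumption: a symmetric threshold, with alerts only when the change is
    strictly greater than the threshold in either direction.
    """
    change = percent_change(current, baseline)
    if change is None:
        return "no-baseline"
    if change > threshold_pct:
        return "abnormal-growth"
    if change < -threshold_pct:
        return "abnormal-drop"
    return "normal"


if __name__ == "__main__":
    # 1,600 entities today versus 1,000 a week ago is a 60% increase.
    print(classify_traffic(1600, 1000))
    # 400 entities today versus 1,000 a week ago is a 60% decrease.
    print(classify_traffic(400, 1000))
```

Working through the numbers this way can help you judge whether an alert reflects a real change in your service's behaviour or a threshold that no longer fits your normal traffic.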

It is important to verify whether the growth or drop in traffic was anticipated.

If the growth or drop in traffic was expected:

  • Communicate this to the DROID team, as it may have cost implications for DROID.
  • Traffic patterns are expected to change over time, so review the thresholds set in your detectors to ensure they are still relevant and to avoid unnecessary alerts.

If the growth or drop in traffic was not expected:

Additional resources

Here are some additional resources that may help you in your investigation:
