This page provides guidance on investigating alerts generated by the detectors you set up in our Setup anomaly detection tutorial. If you haven't done so already, it is recommended to go through that tutorial first to set up anomaly detection for your integration.
These alerts are generated by the detector you set up here.
Alert title: "Backing store API fetch errors attributed to service my-service (Production)"
DROID has encountered one or more errors when trying to fetch records from your Read-through service's backing store API.
Errors can occur for a number of reasons, such as DROID being unable to reach your service (e.g. a transient network issue on either end) or your service returning an unexpected response.
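As a rough illustration of the two failure modes above, the sketch below shows how a connectivity problem differs from an unexpected response when fetching a record from a backing store API. The URL, timeout, and error handling are hypothetical placeholders, not DROID's actual fetch client or your service's real contract.

```python
# Minimal sketch of the two failure modes, using the `requests` library.
# The URL and timeout are placeholders, not DROID's actual behaviour.
import requests

BACKING_STORE_URL = "https://my-service.example.com/records/{record_id}"  # hypothetical

def fetch_record(record_id: str) -> dict:
    try:
        response = requests.get(
            BACKING_STORE_URL.format(record_id=record_id),
            timeout=5,  # a slow or unreachable service surfaces as a timeout
        )
    except requests.exceptions.ConnectionError as exc:
        # "Unable to reach your service": transient network issues land here.
        raise RuntimeError(f"Could not reach backing store: {exc}") from exc
    except requests.exceptions.Timeout as exc:
        raise RuntimeError(f"Backing store request timed out: {exc}") from exc

    if response.status_code != 200:
        # "Unexpected response": non-2xx status codes, malformed bodies, etc.
        raise RuntimeError(f"Unexpected status {response.status_code}: {response.text[:200]}")

    return response.json()
```

Checking which of these two paths your service's own logs show (connection failures versus non-2xx responses) usually narrows the investigation quickly.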
It is recommended to:
These alerts are generated by the detector you set up here.
Alert title: "Backing store API latency is too high attributed to service my-service (Production)"
DROID has detected that the mean latency of your Read-through service's backing store API has exceeded the threshold set in your detector.
High latency can be caused by a number of factors, such as high load on your service or cross-region network latency.
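One way to tell these apart is to measure latency inside your own handler and compare it with the end-to-end latency DROID reports, which also includes network time. The sketch below is illustrative only; the decorator and logger names are hypothetical, not a DROID API.

```python
# Minimal sketch: time the backing store handler server-side so it can be
# compared with the end-to-end latency observed by the caller.
import logging
import time
from functools import wraps

logger = logging.getLogger("backing-store-latency")

def timed(handler):
    @wraps(handler)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return handler(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # If this stays low while the alert fires, the extra latency is
            # likely spent on the network (e.g. cross-region hops) rather
            # than inside your service.
            logger.info("handled %s in %.1f ms", handler.__name__, elapsed_ms)
    return wrapper

@timed
def get_record(record_id: str) -> dict:
    ...  # your existing backing store lookup
    return {"id": record_id}
```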
It is recommended to:
These alerts are generated by the detector you set up here.
Alert title: "Invalidation event processing errors attributed to service my-service (Production)"
DROID has encountered one or more errors when trying to process cache invalidation events sent by your service via Streamhub.
Although Streamhub validates the schema of the events at the source, DROID performs additional validation that Streamhub cannot, so it is still possible for part of the payload to be malformed.
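To illustrate the distinction, the sketch below shows checks that a schema validator alone would not catch. The field names and rules are hypothetical and do not reflect DROID's actual invalidation event schema; the point is that a payload can have the right fields and types yet still carry unusable values.

```python
# Illustrative only: hypothetical field names, not DROID's real schema.
from datetime import datetime

def semantic_problems(event: dict) -> list[str]:
    """Return problems that a schema check (field present, correct type) misses."""
    problems = []
    if not event.get("entityId", "").strip():
        problems.append("entityId is present but blank")
    try:
        datetime.fromisoformat(event.get("occurredAt", ""))
    except ValueError:
        problems.append("occurredAt is a string but not a valid ISO-8601 timestamp")
    return problems

# A schema requiring "entityId: string" and "occurredAt: string" accepts this
# payload, but the values are unusable:
print(semantic_problems({"entityId": "   ", "occurredAt": "not-a-date"}))
```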
It is recommended to check the following:
To aid with finding the root cause, you can also check the Tenant Context Service's logs to track down the exact error encountered with the following Splunk query:
`micros_tenant-context-service` env=prod* logger_name="*.StreamhubReceiver" contextMap.streamhubEventId="YOUR_STREAMHUB_EVENT_ID"
These alerts are generated by the detector you set up here.
This detector can generate alerts for both abnormal growth and abnormal drops in ingestion traffic.
For abnormal ingestion traffic growth you may see an alert titled: "Abnormal growth in invalidation events received (>50%) attributed to my-service (Production)". For abnormal ingestion traffic drops you may see an alert titled: "Abnormal decrease in invalidation events received (<50%) attributed to my-service (Production)".
The detector checks the number of invalidation events sent by your service over the past day and compares it to the number sent on the same day one week ago. If the variance exceeds the threshold set in your detector (e.g. 50%), an alert is generated.
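As a worked example of this week-over-week comparison, the sketch below computes the variance and checks it against a 50% threshold. The threshold and counts are illustrative; substitute whatever threshold you configured in your detector.

```python
# Minimal sketch of the week-over-week variance check described above.
def exceeds_threshold(events_last_day: int, events_same_day_last_week: int,
                      threshold: float = 0.5) -> bool:
    if events_same_day_last_week == 0:
        # Any traffic where there was none before counts as abnormal growth.
        return events_last_day > 0
    variance = abs(events_last_day - events_same_day_last_week) / events_same_day_last_week
    return variance > threshold

# 12,000 events yesterday vs 30,000 on the same day last week is a 60% drop,
# so a detector with a 50% threshold would alert.
print(exceeds_threshold(12_000, 30_000))  # True
```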
It is important to verify whether the growth or drop in traffic was anticipated.
If the growth or drop in traffic was expected:
If the growth or drop in traffic was not expected:
Here are some additional resources that may help you in your investigation: