Last updated Sep 30, 2024

SLOs and eventual consistency

DROID uses TCS Sidecar for its access layer, which is eventually consistent. Eventual consistency is a trade-off made in exchange for low latency and high reliability. Because DROID has a layered architecture, several SLOs are needed to communicate and track expected performance.

Read-through DROID integrations have lower integration and ongoing costs, in exchange for relaxed consistency guarantees. SLOs are tracked separately for the two DROID integration types, External Ingestion and Read-through.

Unsure which type applies to your use-case? See the DROID Registry.

Reliability and performance

TCS Sidecar GET

Tome Capability

Percentage (%) of GET responses from TCS Sidecars returned in under 20 ms.

Under significant load, a misconfigured TCS Sidecar (e.g. one with a poor cache-hit ratio) could exceed this threshold. To troubleshoot performance issues, see our DROID dashboards guide.

TCS Sidecar performs best in supported Micros environments. In other environments, cross-region latency affects performance on a cache miss.
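As a rough illustration of how this latency SLI could be computed, the sketch below takes a list of observed GET latencies and returns the percentage that finished in under 20 ms. The function name `latency_sli` and the sample values are hypothetical; this is not how DROID itself publishes the metric.

```python
def latency_sli(latencies_ms, threshold_ms=20.0):
    """Return the percentage of responses faster than threshold_ms.

    Hypothetical helper for illustration; the 20 ms default mirrors the
    threshold quoted for the TCS Sidecar GET SLO.
    """
    if not latencies_ms:
        return None  # no traffic: the SLI is undefined rather than 0% or 100%
    good = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return 100.0 * good / len(latencies_ms)


# Made-up sample: six of the eight requests complete in under 20 ms.
samples = [3.2, 7.9, 12.0, 19.9, 20.0, 45.1, 5.5, 8.8]
print(latency_sli(samples))
```

A response taking exactly 20 ms is treated as a miss here, because the threshold is stated as strictly under 20 ms.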

TCS Sidecar GET returns non-5xx status

Tome Capability

Percentage (%) of GET requests that return a valid, non-5xx response.

Authorisation failures (401, 403) and rate-limiting (429) responses are not counted.
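The exclusion rule above can be sketched as follows. This assumes that every counted response with a status below 500 is "good", which is our reading of the SLO title rather than a confirmed definition; `availability_sli` and `EXCLUDED_STATUSES` are hypothetical names.

```python
# Statuses left out of the calculation entirely: authorisation failures
# and rate limiting, as described in the text.
EXCLUDED_STATUSES = {401, 403, 429}


def availability_sli(status_codes):
    """Return the percentage of counted GET responses that are non-5xx.

    Assumption: any counted response below 500 is good (including 404).
    """
    counted = [s for s in status_codes if s not in EXCLUDED_STATUSES]
    if not counted:
        return None  # nothing left to measure after exclusions
    good = sum(1 for s in counted if s < 500)
    return 100.0 * good / len(counted)


# Made-up sample: 401, 429 and 403 are ignored, leaving 4 good out of 5.
statuses = [200, 200, 200, 404, 401, 429, 500, 403]
print(availability_sli(statuses))
```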

Consistency

TCS End-to-End (E2E) Invalidation Delay

Tome Capability

Percentage (%) of DROID key updates invalidated within the advertised delay tolerance.

End-to-end (E2E) delay is the time from when a producer publishes to StreamHub until the TCS Sidecar is notified.
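A minimal sketch of this measurement, assuming each update carries a publish timestamp and, once observed, a sidecar notification timestamp. The `Invalidation` record, the `invalidation_sli` function, and the 5-second tolerance are hypothetical and for illustration only; the real tolerance is whatever is advertised for your integration type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invalidation:
    key: str
    published_at: float           # seconds: producer publishes to StreamHub
    notified_at: Optional[float]  # seconds: sidecar notified, None if never seen


def invalidation_sli(events, tolerance_s):
    """Return the percentage of updates invalidated within tolerance_s."""
    if not events:
        return None
    good = sum(
        1
        for e in events
        if e.notified_at is not None
        and e.notified_at - e.published_at <= tolerance_s
    )
    return 100.0 * good / len(events)


# Made-up events with E2E delays of 1.0 s, 2.5 s, 6.0 s, and one lost update.
events = [
    Invalidation("a", 10.0, 11.0),
    Invalidation("b", 20.0, 22.5),
    Invalidation("c", 30.0, 36.0),
    Invalidation("d", 40.0, None),
]
print(invalidation_sli(events, tolerance_s=5.0))
```

An update whose notification is never observed is counted as a miss in this sketch, on the assumption that a lost invalidation should not be silently dropped from the denominator.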

Consistent responses from TCS sidecar

Tome Capability

Percentage (%) of sidecar responses matching the latest known version of requested keys.

TCS Sidecar may occasionally return a stale version; for example, replication delay in an upstream dependency can delay invalidation. If your use-case is sensitive to eventual consistency, your client retry behaviour should align with the matching invalidation delay SLO threshold.
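One way a client might align its retries with the invalidation delay threshold is sketched below. It assumes the client knows the minimum version it expects (for example, because it has just written that version) and that a read returns a `(value, version)` pair. Both are assumptions; `read_with_retry`, `fetch`, and the parameters are hypothetical names, not part of the TCS Sidecar API.

```python
import time


def read_with_retry(fetch, min_version, max_wait_s, interval_s=0.5,
                    sleep=time.sleep, clock=time.monotonic):
    """Re-read a key until it reaches min_version or max_wait_s elapses.

    fetch       -- hypothetical callable returning (value, version)
    max_wait_s  -- set this to the invalidation delay SLO threshold that
                   applies to your integration type
    sleep/clock -- injectable so the loop can be tested without waiting
    """
    deadline = clock() + max_wait_s
    while True:
        value, version = fetch()
        if version >= min_version:
            return value, version
        if clock() + interval_s > deadline:
            # Budget exhausted: hand back the stale read and let the
            # caller decide how to handle it.
            return value, version
        sleep(interval_s)
```

Bounding the wait by the SLO threshold means the client does not retry for longer than the delay the platform commits to. A read that is still stale after that point suggests degradation rather than normal eventual consistency.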

Stale response detection depends on the E2E delay of updates. Responses within the delay window will not be marked stale/inconsistent.

Example: suppose a key is updated at t=0s and the sidecar is notified at t=3s. A request for this key at t=2s returns the old value but does not affect the consistency SLO. A request at t=4s returns the new value, unless TCS or one of its dependencies is degraded. If a newer version is not returned after the sidecar has been notified, a consistency SLO failure is published.
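The timeline in this example can be expressed as a small classification function. This is an illustrative sketch of the rule as described here, not the actual detection logic; `classify_response` and its labels are hypothetical.

```python
def classify_response(request_at, returned_version, latest_version, notified_at):
    """Classify a sidecar response against the consistency SLO.

    request_at  -- time of the GET, in seconds
    notified_at -- time the sidecar was notified of latest_version,
                   or None if it has not been notified yet
    """
    if returned_version >= latest_version:
        return "consistent"
    if notified_at is None or request_at < notified_at:
        # Stale, but still inside the E2E delay window: not an SLO failure.
        return "within_delay_window"
    # Stale after the sidecar was notified: counts against the SLO.
    return "slo_failure"


# Key updated to version 2 at t=0s, sidecar notified at t=3s.
print(classify_response(2.0, 1, 2, 3.0))  # old value at t=2s
print(classify_response(4.0, 2, 2, 3.0))  # new value at t=4s
print(classify_response(4.0, 1, 2, 3.0))  # old value at t=4s
```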

Using dashboards

DROID dashboards can help to verify performance and consistency for your service.

Monitoring for DROID producers

External Ingestion producers should monitor TCS ingestion metrics. (External Ingestion monitoring)

Read-through producers should implement monitoring and SLOs for TCS endpoints. (Read-through monitoring)

DROID dependencies

StreamHub

DROID depends on StreamHub for both integration types. StreamHub's SLOs can be found here.

Transformer Service

DROID's External Ingestion integration type depends on Transformer Service.
