DROID (Distributed Read Optimised Inter-region Datastore) is a highly reliable, highly available, read-optimised cross-region platform for metadata distribution. The DROID sidecar runs alongside your application, providing reliable, low-latency access to a metadata cache with invalidation handling.
DROID is an evolution of the Tenant Context Service (TCS) infrastructure to support metadata distribution beyond the original Tenant Platform use case.
With DROID we are building on and evolving the proven TCS infrastructure to allow more teams and systems to leverage the high availability and low latency of TCS as a general read-optimised metadata caching platform.
DROID is an evolution of TCS, a platform that is already running, not a new system. We are building on a foundational high-scale system that has already solved the problem of metadata distribution at scale for Tenant Platform.
Even without DROID, several key pieces described below (especially sharding) have been built as TCS usage continues to grow. This means the difference between DROID and organic TCS evolution is actually quite small.
The Context team have many years of experience operating a multi-region, tier 0 metadata distribution system at scale. TCS was built to solve a hard problem: reliable distribution and cache invalidation for tenant metadata. This problem is not unique to Tenant Platform; Atlassian as a whole needs a system that solves it. Our vision is to have the DROID sidecar running on every node, providing reliable, fast access to metadata.
Rather than every team trying to solve this, we aim to open up our metadata distribution platform so teams can easily onboard and benefit as we evolve our system.
Several features we are building as part of DROID (for example TCS sharding, dynamic entity types) are also needed for TCS as our usage more than doubles every 12 months. The gap between what we need to evolve TCS organically and what we need for DROID is actually not that large. Extended spikes have validated the size of this gap. TCS is already a critical piece of the Atlassian cloud platform and is a natural fit to grow to a more general platform offering.
We have been working on DROID focusing on building out the foundations of the system, spiking new ingestion methods and working with early adopters.
DROID has progressed while the team continued to support TCS as a tier 0 system and delivered other non-DROID programs such as FedRAMP, DaRes, Sliver, External User Security, UPP and Bitbucket/Trello Admin Hub Integration.
To help us focus on TCS/DROID we formed a new team called Tenant Catalogue, which supports and evolves the current Cloud Provisioner → Tenant Catalogue → Transformer Service pipeline.
We started this journey by refactoring the core of TCS's invalidation engine, moving it from SQS to S3. This was necessary because of scaling limits and cost constraints.
A key part of DROID will be the ability to shard by record type. This is needed to limit the blast radius of a misbehaving or overloaded record type. To prepare for this, we have separated record types in code.
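As a rough illustration of why sharding by record type limits the blast radius, the sketch below routes each record type to its own shard, so an unhealthy shard affects only the record types mapped to it. All names here (`ShardRouter`, the record types and the shard names) are hypothetical and are not taken from the TCS or DROID codebase.

```python
class ShardRouter:
    """Maps record types to shards (illustrative only, not the DROID implementation)."""

    def __init__(self, shard_for_type, default_shard="shared"):
        # shard_for_type: dict of record type -> shard name (hypothetical values)
        self.shard_for_type = dict(shard_for_type)
        self.default_shard = default_shard
        self.unhealthy = set()

    def shard_of(self, record_type):
        # Record types without a dedicated shard fall back to the default shard.
        return self.shard_for_type.get(record_type, self.default_shard)

    def mark_unhealthy(self, shard):
        # Simulates a shard that is overloaded or misbehaving.
        self.unhealthy.add(shard)

    def can_serve(self, record_type):
        # A record type is only affected when its own shard is unhealthy.
        return self.shard_of(record_type) not in self.unhealthy


router = ShardRouter({"tenant": "shard-tenant", "licence": "shard-licence"})
router.mark_unhealthy("shard-licence")

print(router.can_serve("tenant"))   # unaffected by the licence shard
print(router.can_serve("licence"))  # affected
```

The point of the sketch is the isolation property: marking one shard unhealthy leaves every record type on other shards servable.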
Prior to DROID, the only ingestion path for metadata into TCS was the Cloud Provisioner → Tenant Catalogue → Transformer Service pipeline, with this metadata forming part of the Catalogue Service Record (CSR). This pattern has served us well and will continue to be the ingestion pipeline for CSR metadata.
However, this method has the following limitations on wider usage:
For DROID we have built two methods to ingest metadata: Read-through and External Ingestion.
Integration Type | Overview |
---|---|
Read-through | TCS loads metadata from external backing stores (REST endpoints) in case of cache misses. |
External Ingestion | Metadata Providers publish source records to StreamHub for DROID to listen to, transform and store in DynamoDB. Cache misses will be loaded from DynamoDB. |
Which method to use depends on the needs, data size and access patterns of the system being integrated.
Both of these methods allow us to bypass the CP -> Catalogue Record pipeline. DROID allows TCS to distribute metadata for systems without requiring Cloud Provisioner (CP) or Tenant Catalogue (TC) integration, which should greatly simplify onboarding.
Both methods are in early beta with selected systems. Onboarding is currently a high-touch “white glove” process while we build and learn from the first couple of integrations.
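To make the difference between the two ingestion methods concrete, here is a minimal sketch under stated assumptions. A plain callable stands in for the REST backing store, an in-memory dict stands in for DynamoDB, and a direct method call stands in for a StreamHub event. The class and method names are hypothetical, and dropping the cached copy when a new source record arrives is an assumption about how invalidation might work, not a description of DROID's behaviour.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Read-through: on a cache miss, load from an external backing store."""

    def __init__(self, load_from_backing_store: Callable[[str], Optional[dict]]):
        self._load = load_from_backing_store  # stand-in for a REST lookup
        self._cache: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        if key in self._cache:
            return self._cache[key]
        value = self._load(key)  # miss: go to the backing store
        if value is not None:
            self._cache[key] = value
        return value


class ExternalIngestionStore:
    """External Ingestion: providers publish records; misses load from our own store."""

    def __init__(self, transform: Callable[[dict], dict]):
        self._transform = transform
        self._table: Dict[str, dict] = {}  # stand-in for DynamoDB
        self._cache: Dict[str, dict] = {}

    def on_source_record(self, key: str, source_record: dict) -> None:
        # Called once per published source record (stand-in for a StreamHub event).
        self._table[key] = self._transform(source_record)
        self._cache.pop(key, None)  # assumption: drop any stale cached copy

    def get(self, key: str) -> Optional[dict]:
        if key not in self._cache:
            if key not in self._table:
                return None
            self._cache[key] = self._table[key]  # miss: load from the table
        return self._cache[key]


calls = []

def fake_rest_lookup(key: str) -> Optional[dict]:
    calls.append(key)  # record how often the backing store is hit
    return {"id": key, "source": "rest"}

read_through = ReadThroughCache(fake_rest_lookup)
read_through.get("site-1")
read_through.get("site-1")  # second read is served from the cache

ingested = ExternalIngestionStore(transform=lambda rec: {"name": rec["name"].lower()})
ingested.on_source_record("site-2", {"name": "ACME"})
```

In the read-through case the provider only needs to expose a lookup; in the external ingestion case the provider pushes records and reads never touch the provider's system.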