Last updated Sep 24, 2024

Integrating with Batch Ingestion

Background

Batch ingestion does not go through StreamHub, because of concerns about its ability to handle the large volume of bulk events that batch ingestion requires.

Instead, you implement REST endpoints that DROID calls to fetch all of your source data in a paginated fashion.

Step 1: Implement the Scan and Query Batch Ingestion Endpoints

For each of your entity types, you must implement two endpoints with specific contracts: scan, which provides us with a paginated list of IDs, and query, which fetches the content for a given list of IDs. Details can be found here: DROID External Ingestion Batch ingestion SPI
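The division of labour between the two endpoints can be sketched as follows. This is a minimal illustration only, not the SPI contract: the function names, the cursor-based pagination, the `next_cursor` field, the page size, and the in-memory `ENTITIES` store are all assumptions made for the example. The real request and response shapes are defined in the SPI document referenced above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical in-memory source data; a real service would read from its datastore.
ENTITIES: Dict[str, dict] = {
    f"entity-{i:03d}": {"id": f"entity-{i:03d}", "name": f"Item {i}"}
    for i in range(1, 8)
}


@dataclass
class ScanPage:
    ids: List[str]
    next_cursor: Optional[str]  # None once the last page has been returned (assumed convention)


def scan(cursor: Optional[str] = None, limit: int = 3) -> ScanPage:
    """Return one page of IDs in a stable (sorted) order, starting after `cursor`."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    remaining = [i for i in sorted(ENTITIES) if cursor is None or i > cursor]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    return ScanPage(ids=page, next_cursor=page[-1] if has_more else None)


def query(ids: List[str]) -> List[dict]:
    """Return the content for the requested IDs, skipping IDs that no longer exist."""
    return [ENTITIES[i] for i in ids if i in ENTITIES]


def scan_all_ids(limit: int = 3) -> List[str]:
    """Walk every scan page, the way a caller of the scan endpoint might."""
    ids: List[str] = []
    cursor: Optional[str] = None
    while True:
        page = scan(cursor, limit)
        ids.extend(page.ids)
        if page.next_cursor is None:
            return ids
        cursor = page.next_cursor
```

Ordering the IDs deterministically and resuming strictly after the last returned ID is one way to keep pagination stable while the dataset changes between calls. Whether the SPI expects a cursor, an offset, or something else should be checked against the SPI document.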

(Optional) Step 2: Backfill your initial dataset

The External Ingestion integration type aims to maintain a copy of your source data in DROID. A backfill is therefore needed whenever the data in your source and the data in DROID diverge, for example:

when you are a new producer with an existing dataset that is not yet in DROID
when the transformer is updated

To request a backfill, raise a JSD request in the DROID help channel here. (Please select Yes for the question "Is this a DROID external transformer" to ensure the request is routed to the DROID team.)

Batch ingestion considerations

During batch ingestion, we call your scan and query endpoints at predetermined rate limits. Please reach out to us if your service needs lower limits or if you think a higher limit is warranted.

Default scan and query rate limits

Scan: 1 worker doing up to 1 request per second
Query: 10 workers doing up to 20 requests per second in total (across all workers); each request is for a batch of 50 entities
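To judge whether the defaults suit your service, it can help to translate them into entity throughput: 20 query requests per second at 50 entities each is at most 1,000 entities per second. The sketch below turns that into a rough lower bound on backfill duration. The scan page size is not stated on this page, so `scan_page_size` is a hypothetical parameter, and the assumption that scan and query run concurrently (so the slower stage dominates) is ours, not a documented behaviour.

```python
import math

# Default limits taken from the list above.
SCAN_REQUESTS_PER_SECOND = 1
QUERY_REQUESTS_PER_SECOND = 20
ENTITIES_PER_QUERY = 50

# Peak entity throughput of the query stage at the default limits.
PEAK_QUERY_ENTITIES_PER_SECOND = QUERY_REQUESTS_PER_SECOND * ENTITIES_PER_QUERY


def estimate_backfill_seconds(total_entities: int, scan_page_size: int) -> float:
    """Rough lower bound on backfill time at the default rate limits.

    Assumes scan and query overlap, so the slower of the two stages
    determines the total. `scan_page_size` is a hypothetical value.
    """
    scan_requests = math.ceil(total_entities / scan_page_size)
    query_requests = math.ceil(total_entities / ENTITIES_PER_QUERY)
    scan_seconds = scan_requests / SCAN_REQUESTS_PER_SECOND
    query_seconds = query_requests / QUERY_REQUESTS_PER_SECOND
    return max(scan_seconds, query_seconds)
```

For example, under these assumptions a dataset of 1,000,000 entities with a scan page size of 500 is bounded by the scan stage (2,000 scan requests at 1 per second), while a page size of 1,000 or more leaves the query stage as the limit (20,000 requests at 20 per second).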