Remove start of run flag from ingest job tracker #4256

patchwork01 · 2025-02-13T12:30:03Z

Background

There's a difference between how the ingest job tracker handles bulk import and standard ingest.

For bulk import, a run of a job starts when it's validated in the bulk import starter lambda. There's then a separate started status update when the Spark driver picks up the job. These are correlated into a single run of the job with a job run ID that's generated in the starter lambda and passed into the Spark driver. The validation and job run ID generation happens in BulkImportExecutor, and the further status updates are done in BulkImportJobDriver.

For standard ingest, a run of a job starts when an ingest task receives the job from the queue and starts running it. The updates were previously correlated by just the ID of the task, but there's now a job run ID, as there is in bulk import. The status updates and job run ID generation happen in IngestTask.

When the updates were correlated by task ID for standard ingest, we were detecting the start of a new run in JobRunsBuilder by checking for whether the status update should start a new run or not. Because the started status update is only the start of a run for standard ingest, there's a flag on that update for whether it starts a run or not. This is no longer used outside of tests, since everything uses the job run ID instead.

Description

We'd like to make the job run ID mandatory in updates to the ingest job tracker, and remove the flag that sets whether an ingest started status update should start a run or not.

This should avoid confusion in tests when no job runs are detected because the flag was not set. We had some trouble with this when we adjusted the logic for how the tracker updates are applied in #4253.

Analysis

There are a number of tests that still correlate the status updates based on the task ID, and don't set a job run ID. We'll need to adjust those tests to set a job run ID.

We can then make the job run ID mandatory in the ingest job events that have it by adding a requireNonNull call in the constructor (e.g. in IngestJobStartedEvent).

We can then remove the code to correlate on task ID in JobRunsBuilder.getBuilderIfCorrelatable, and remove the startOfRun field from IngestJobStartedEvent and IngestJobStartedStatus.

patchwork01 added the core-module label Feb 13, 2025

patchwork01 added this to the 0.29.0 milestone Feb 13, 2025

patchwork01 self-assigned this Feb 19, 2025

This was referenced Feb 19, 2025

Issue 4256 - Remove job tracker start of run flag #4284

Merged

Stop job run model caring about update types #4291

Closed

patchwork01 closed this as completed in #4284 Feb 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove start of run flag from ingest job tracker #4256

Remove start of run flag from ingest job tracker #4256

patchwork01 commented Feb 13, 2025 •

edited

Loading

Remove start of run flag from ingest job tracker #4256

Remove start of run flag from ingest job tracker #4256

Comments

patchwork01 commented Feb 13, 2025 • edited Loading

Background

Description

Analysis

patchwork01 commented Feb 13, 2025 •

edited

Loading