You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There's a difference between how the ingest job tracker handles bulk import and standard ingest.
For bulk import, a run of a job starts when it's validated in the bulk import starter lambda. There's then a separate started status update when the Spark driver picks up the job. These are correlated into a single run of the job with a job run ID that's generated in the starter lambda and passed into the Spark driver. The validation and job run ID generation happens in BulkImportExecutor, and the further status updates are done in BulkImportJobDriver.
For standard ingest, a run of a job starts when an ingest task receives the job from the queue and starts running it. The updates were previously correlated by just the ID of the task, but there's now a job run ID, as there is in bulk import. The status updates and job run ID generation happen in IngestTask.
When the updates were correlated by task ID for standard ingest, we were detecting the start of a new run in JobRunsBuilder by checking for whether the status update should start a new run or not. Because the started status update is only the start of a run for standard ingest, there's a flag on that update for whether it starts a run or not. This is no longer used outside of tests, since everything uses the job run ID instead.
Description
We'd like to make the job run ID mandatory in updates to the ingest job tracker, and remove the flag that sets whether an ingest started status update should start a run or not.
This should avoid confusion in tests when no job runs are detected because the flag was not set. We had some trouble with this when we adjusted the logic for how the tracker updates are applied in #4253.
Analysis
There are a number of tests that still correlate the status updates based on the task ID, and don't set a job run ID. We'll need to adjust those tests to set a job run ID.
We can then make the job run ID mandatory in the ingest job events that have it by adding a requireNonNull call in the constructor (e.g. in IngestJobStartedEvent).
We can then remove the code to correlate on task ID in JobRunsBuilder.getBuilderIfCorrelatable, and remove the startOfRun field from IngestJobStartedEvent and IngestJobStartedStatus.
The text was updated successfully, but these errors were encountered:
Background
There's a difference between how the ingest job tracker handles bulk import and standard ingest.
For bulk import, a run of a job starts when it's validated in the bulk import starter lambda. There's then a separate started status update when the Spark driver picks up the job. These are correlated into a single run of the job with a job run ID that's generated in the starter lambda and passed into the Spark driver. The validation and job run ID generation happens in BulkImportExecutor, and the further status updates are done in BulkImportJobDriver.
For standard ingest, a run of a job starts when an ingest task receives the job from the queue and starts running it. The updates were previously correlated by just the ID of the task, but there's now a job run ID, as there is in bulk import. The status updates and job run ID generation happen in IngestTask.
When the updates were correlated by task ID for standard ingest, we were detecting the start of a new run in JobRunsBuilder by checking for whether the status update should start a new run or not. Because the started status update is only the start of a run for standard ingest, there's a flag on that update for whether it starts a run or not. This is no longer used outside of tests, since everything uses the job run ID instead.
Description
We'd like to make the job run ID mandatory in updates to the ingest job tracker, and remove the flag that sets whether an ingest started status update should start a run or not.
This should avoid confusion in tests when no job runs are detected because the flag was not set. We had some trouble with this when we adjusted the logic for how the tracker updates are applied in #4253.
Analysis
There are a number of tests that still correlate the status updates based on the task ID, and don't set a job run ID. We'll need to adjust those tests to set a job run ID.
We can then make the job run ID mandatory in the ingest job events that have it by adding a requireNonNull call in the constructor (e.g. in IngestJobStartedEvent).
We can then remove the code to correlate on task ID in JobRunsBuilder.getBuilderIfCorrelatable, and remove the startOfRun field from IngestJobStartedEvent and IngestJobStartedStatus.
The text was updated successfully, but these errors were encountered: