-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ingest Error Handling Fixes [VS-261] #7841
Conversation
Codecov Report
@@ Coverage Diff @@
## ah_var_store #7841 +/- ##
================================================
Coverage ? 86.293%
Complexity ? 35189
================================================
Files ? 2170
Lines ? 164876
Branches ? 17784
================================================
Hits ? 142277
Misses ? 16276
Partials ? 6323 |
@@ -85,63 +87,6 @@ workflow GvsQuickstartIntegration { | |||
} | |||
} | |||
|
|||
task Setup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost completely a cut/paste of a task from the quickstart integration test to GvsUtils.wdl
so it could be reused.
break; | ||
} catch (Exception e) { | ||
@SuppressWarnings("ThrowableNotThrown") | ||
StatusRuntimeException se = handleCausalStatusRuntimeException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this method doesn't actually handle anything, maybe rename to extract
instead of handle
or something… but only if you're making another commit to this PR anyways
0c9ef52
to
ebe9b72
Compare
ebe9b72
to
5744b0b
Compare
The main issue was that the
StatusRuntimeException
s that the baseline error handling code was trying to catch in practice always seem to be wrapped in at least one layer of exception of a different type. There was no catch handing for these wrapper exception types so theCreateVariantIngestFiles
tool would simply crash.The changes here also more generally try to follow the recommendations in the BQ Write API documentation, in particularclose
ing theJsonStreamWriter
before retrying error codes not explicitly called out by the documentation.EDIT: actually closing the writer didn't work out too well as we use the writer in
PENDING
mode and closing it seems to lose all pending writes. 😬 So in this circumstance we just throw and let WDL-levelmaxRetries
start the data loading over from the beginning.An exponential backoff was also added before retry attempts.
Parallel logic was also added to load status writing which should reduce (but not eliminate) the possibility of inconsistent sample status writes that require manual intervention. There is still the possibility of an inopportunely timed preemption, which is why VS-262 exists.
All of the WDL changes here are in support of a 2000-sample tieout, a large enough set that intermittent BigQuery errors are almost always observed. The tieout confirms that errors of the two major classes are seen (retryable and non-retryable) and that the number of rows per sample in the tieout dataset matches those in a reference dataset.