roachtest: import/tpch/nodes=8 failed #34694
Seems like a node crashed, but I can't find anything in the logs or dmesg. My first instinct is to blame #34241, but I don't have much context.
Blaming #34241 without evidence is a bit unfair to that poor issue. The other instances of #34241 have always had a stack trace. The error message suggests to me a problem reading from GCS. @dt or @mjibson, is there anything to be done here? We have retry loops on the reads from cloud providers, right?
We have a retry around the creation of the
Yes. This was a failure partway through reading a file. We can't retry ourselves because we don't know how many bytes have already been read, so we can't resume from there. Some SDKs do that automatically, including, I think, GCS. I'm not sure what to do here besides upgrading to the latest GCS SDK and seeing whether it has improved its internal retrying.
Yeah, I'm fine just closing the occasional failure caused by external cloud or connectivity issues. If it were frequent and came up in actual usage, I might want to add more retries. As is, I'd rather assume the cloud readers aren't too flaky and, if we get an error from one, report it right away (since maybe they should fix their network config or something) instead of sitting in a retry loop.
SHA: https://github.com/cockroachdb/cockroach/commits/66f13c1d9a12c31e18a198da4ff5ac0bbe2db781
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1127849&tab=buildLog