Frequent nondeterministic errors in dbt run and dbt seed when using multiple threads #41
Comments
How many threads do you use? Are you sure you aren't being throttled by AWS? |
I'm using 5 threads. The service quotas for Athena allow at least 20 simultaneous queries. Even if you exceed the quota, my understanding is that Athena places your queries into a queue.
Looking at the adapter's code, it sounds to me like throttling should raise a different exception, which the adapter will catch and retry:
retry_config=RetryConfig(
    attempt=creds.num_retries,
    exceptions=(
        "ThrottlingException",
        "TooManyRequestsException",
        "InternalServerException",
    ),
Googling a bit, I found this boto3 issue with a similar error that I also see sometimes (
Looking at dbt issues, this one mentions that their intent is to have one connection object per thread. Superficially it seems to me that the Athena adapter is abiding by that, but I'm new to this codebase so maybe I'm missing something. The connection objects being used here, if I understand correctly, are from PyAthena, which adds another layer of complexity: the code path that creates the boto3 client objects spans three codebases that I'm unfamiliar with. |
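For illustration only, here is a minimal standard-library sketch of the "one connection object per thread" idea mentioned above. `FakeConnection`, `get_connection`, and `run_all` are hypothetical names invented for this example; they are not part of dbt, PyAthena, or boto3, and this is not how the adapter is actually implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a real connection or client object."""

    def __init__(self):
        # Remember which thread created this connection.
        self.owner = threading.get_ident()


# Each thread sees its own independent attributes on this object.
_local = threading.local()


def get_connection():
    """Return the calling thread's connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = FakeConnection()
        _local.conn = conn
    return conn


def run_model(name):
    """Pretend to build one model using this thread's own connection."""
    conn = get_connection()
    # True when the connection was created by the thread now using it.
    return name, conn.owner == threading.get_ident()


def run_all(names, threads):
    """Run all models on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(run_model, names))
```

With this layout no connection object is ever shared between two threads, which is the property the dbt issue describes. Whether the shared state behind these errors lives in the connection or somewhere deeper is not something this sketch can show.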
@Dandandan Maybe related: I have noticed that on query failures, such as |
Yeah, this was noticed recently too. I guess we have to list the specific exceptions here as well. |
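As a rough sketch of what "retry only on specific exceptions" could look like, the following standard-library example retries a call only when the error code is in an allow-list. The three codes come from the adapter snippet quoted earlier in this thread; `ServiceError`, `call_with_retry`, and the backoff parameters are hypothetical and do not reflect PyAthena's actual `RetryConfig` behavior.

```python
import time

# Error codes quoted from the adapter's retry configuration in this thread.
RETRYABLE = (
    "ThrottlingException",
    "TooManyRequestsException",
    "InternalServerException",
)


class ServiceError(Exception):
    """Hypothetical error type carrying a service error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def call_with_retry(fn, attempts=5, base_delay=0.1, retryable=RETRYABLE):
    """Call fn(), retrying with exponential backoff on retryable codes only."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ServiceError as err:
            # Give up on non-retryable codes or once attempts are exhausted.
            if err.code not in retryable or attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Any error whose code is not in the allow-list is raised immediately, so a genuine query failure is not retried.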
But I think it's not related to this issue. |
I tried the fix with 20 threads at a time and I didn't see any errors now. Thanks folks! |
Using multiple threads to rebuild models is a huge performance win under Athena because it scales up extremely well, but when I use it with this adapter I randomly get strange, nondescript errors that say things like
credential_provider
that sound like they're component or variable names in the AWS client library:
Workarounds
dbt run --threads 1 --select hhs_hospitals+