vdk-impala: Handle errors on refresh/invalidate metadata #2511
projects/vdk-plugins/vdk-impala/src/vdk/plugin/impala/impala_error_handler.py
I do not fully understand the problem, and it clashes with my (independent) understanding of the overall problem. Related problems I know about are:
A) Impala is eventually consistent and sometimes raises AuthorizationException even if the data job user has been authorized recently. So retrying with back-off makes sense while waiting for the new authorization rules to sync.
B) The Impala recovery mechanism, when it queries a table and detects that the table metadata is out of sync, tries to execute invalidate metadata on that table (to sync it) before retrying the original query. But if it doesn't have "write" permission to that table, invalidate metadata fails. No amount of retrying would prevent that, since the permission would not change: it still would not have "write" permission.
The code seems like a solution for A), but the description seems more like B), so I am confused. Is there an option C) I am not seeing?
Hi, @antoniivanov. When Impala raises an authorization error, the message indicates what type of operation was attempted.
b0f3a7c to 3247472
Currently, if a data job executes a select query that fails due to a metadata-related error, the error handling logic executes a refresh/invalidate metadata query to sync the table metadata and then retries the initial query. This works if the data job has write access to the table. However, if the job has only read access, the refresh/invalidate metadata query fails with an authorization error. This leads to an overall job execution failure, which is not ideal, because vdk, not the job itself, is the one attempting the unauthorized operation.
This change updates the error handling logic so that, when an authorization error is raised on refresh/invalidate metadata, vdk retries the query with a backoff. If the job does not have write access, we cannot know when the metadata for the table has been updated, so the best we can do is sleep and retry.
Testing Done: Added test
Signed-off-by: Andon Andonov <[email protected]>
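The backoff behaviour described in this commit message could be sketched roughly as follows. This is only an illustration under stated assumptions: `MetadataError`, `AuthorizationError`, `retry_after_metadata_error`, and the `run_query`/`refresh_metadata` callables are hypothetical names, not taken from vdk-impala, and the exponential delay schedule is an assumed choice.

```python
import time


class MetadataError(Exception):
    """Hypothetical stand-in for a metadata-related query failure."""


class AuthorizationError(Exception):
    """Hypothetical stand-in for an Impala authorization failure."""


def retry_after_metadata_error(run_query, refresh_metadata, retries=3,
                               base_delay=1.0, sleep=time.sleep):
    """Retry run_query after a metadata error.

    Before each retry, try to sync the metadata. If the sync is not
    permitted (read-only job), sleep with an exponentially growing delay
    instead, since there is no way to know when the metadata is updated.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            refresh_metadata()
        except AuthorizationError:
            # The job lacks write access: wait and hope the metadata syncs.
            sleep(base_delay * 2 ** attempt)
        try:
            return run_query()
        except MetadataError as error:
            last_error = error
    # All retries failed with a metadata error: surface the last one.
    raise last_error
```

Passing `sleep` as a parameter is only a convenience so the delays can be observed without actually waiting.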
Currently, if a data job executes a query that fails due to a metadata issue, the ImpalaErrorHandler logic tries to refresh or invalidate the metadata and retry the query. If the job has only read access to the table, an authorization error is raised and the job fails with User Error. This should not happen, as the refresh/invalidate operation is initiated by vdk, not the user.
This change updates the error handler logic by adding try/except blocks around the refresh/invalidate metadata operations, so that errors arising from these operations do not cause job failures.
Testing Done: Added test
Signed-off-by: Andon Andonov <[email protected]>
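The try/except approach described in this commit message might look something like the sketch below. It is not the actual ImpalaErrorHandler code: `run_metadata_statement_safely` and the `execute` callable are hypothetical, and catching the broad `Exception` is an assumption made to keep the example short.

```python
import logging

log = logging.getLogger(__name__)


def run_metadata_statement_safely(execute, statement):
    """Run a vdk-initiated metadata statement without failing the job.

    Returns True if the statement succeeded and False if it raised.
    The caller can then retry the original query either way.
    """
    try:
        execute(statement)
        return True
    except Exception as error:  # assumption: any failure here is non-fatal
        # The statement was issued by the handler, not by the user's job,
        # so its failure is logged instead of being propagated.
        log.warning("Ignoring failure of %r: %s", statement, error)
        return False
```

With this shape, an authorization error on the metadata statement is reported in the logs, while the outcome of the job still depends only on the user's own query.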
3247472 to d963763
Hi, @antoniivanov. I've addressed the feedback and updated the comments and PR description.
Looks good to me. Thanks. I updated the title. The prefix is just the name of the plugin (vdk-impala), without [vdk-plugins].