Add `retry_from_failure` parameter to DbtCloudRunJobOperator #38868
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Hey @josh-fell, I've created this PR based on your comments here. I would really appreciate it if you could take a look at it!
Hi @josh-fell, did you have a chance to look at this PR? Would appreciate your comments!
Maybe we can check whether `try_number` is greater than 1?
Hi @Lee-W, I have also implemented the `try_number` check. And also, if `retry_from_failure` is used together with the override parameters, should we raise a warning or an error?
Yep, just found it
I feel like an error might make more sense 🤔 I don't personally use dbt that much, but I guess those parameters change the behavior significantly?
Yes, those parameters change the behavior significantly. My only concern is with limiting the users. For me, it feels like the second approach is more suitable as we do not limit the users, but it all depends on the expected behavior. Let me know what you think :)
@boraberke Just want to confirm the second point you mentioned. Because the first run already uses the overrides, would the rerun use them as well?
Let's update this PR based on our discussion. 🚀 Thanks!
@Lee-W, I will double-check if the rerun uses the overrides, i.e. runs exactly the way the first job run did, then add the warning or error accordingly! Thanks for your comments!
Many thanks! Not urgent. Just change the state of this PR so that everyone has a better understanding of its status 🙂
I have tested it, and here is how it works: when the first run already uses the overrides and fails, the rerun retries from the point of failure with the same configuration. However, if the first run with the overrides succeeds while the Airflow task itself fails, the rerun triggers a run without the overrides. This means in any case where the operator had failed but the dbt job is successful, when try_number > 1, it would run without the overrides. I think it is best to raise an error if any of `steps_override`, `schema_override` or `additional_run_config` is used together with `retry_from_failure`. Let me know what you think!
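For illustration, a minimal sketch of the validation described above; the parameter names match the operator's, but the standalone helper itself is hypothetical:

```python
from __future__ import annotations


def validate_retry_args(
    retry_from_failure: bool,
    steps_override: list[str] | None = None,
    schema_override: str | None = None,
    additional_run_config: dict | None = None,
) -> None:
    """Reject override parameters when retry_from_failure is enabled (hypothetical helper)."""
    overrides = (steps_override, schema_override, additional_run_config)
    if retry_from_failure and any(arg is not None for arg in overrides):
        raise ValueError(
            "retry_from_failure cannot be used together with steps_override, "
            "schema_override, or additional_run_config."
        )
```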
Indeed, I think this is what we should do. Thanks for the testing and update!
left a few nitpicks, but overall it looks great!
@Lee-W, thanks for the review! Fixed per your latest comments as well :)
Looks good to me. Thanks @boraberke!
@boraberke Thanks for the contribution, killer stuff! This will be a great addition to the provider.
I'm trying to understand what happens if a user sets `retry_from_failure=True` on the operator, provides either `steps_override`, `schema_override`, or `additional_run_config` initially, and the task is naturally retried in Airflow. It seems like with the most recent changes, the task would fail because those args were supplied originally once `retry_from_failure()` is called in the DbtCloudHook. Can you clarify that for me?
If yes, maybe it's worth adding the `try_number` check to the `DbtCloudHook.trigger_job_run()` method using `get_current_context()` too, and then raising a warning instead of an error? We wouldn't want users to set up a task correctly only for it to fail because the task retried. Although it might seem redundant, adding the check again would keep the same functionality you propose, but make it applicable to users calling `DbtCloudHook.trigger_job_run()` directly without using the operator.
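A rough sketch of this suggestion; `get_current_context` is the real Airflow helper (it raises `AirflowException` when called outside a running task), while the wrapper function name is hypothetical:

```python
from airflow.exceptions import AirflowException
from airflow.operators.python import get_current_context


def is_airflow_retry() -> bool:
    """Return True when the surrounding task instance is on a retry (hypothetical helper)."""
    try:
        context = get_current_context()
    except AirflowException:
        # Hook called outside a running task (e.g. from a plain script).
        return False
    return context["ti"].try_number > 1
```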
Another scenario I'm thinking about, albeit presumably a rare one, relative to the `try_number` check: let's say the same task configured with `retry_from_failure` and an override previously succeeded, but a user wants to clear the task so the dbt Cloud job runs again because of some upstream/downstream issue in their pipelines. I would suspect the user would think the task isn't being "retried" from a failure context; they just want to run it again. The overrides wouldn't be used (assuming the logic is updated to a warning from above).
Maybe to alleviate both scenarios, when `retry_from_failure=True`, the `trigger_job_run()` method actually retrieves the job's latest run status from dbt Cloud and assesses whether or not to call the retry endpoint based on success/failure? This would completely remove using Airflow internals to control how the job triggering behaves.
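A sketch of the status-based decision; the status codes follow the dbt Cloud v2 API, while the function and field names are illustrative:

```python
from __future__ import annotations

from enum import IntEnum


class DbtRunStatus(IntEnum):
    # Run status codes as documented for the dbt Cloud v2 API.
    SUCCESS = 10
    ERROR = 20
    CANCELLED = 30


def choose_trigger_endpoint(latest_run: dict | None) -> str:
    """Pick 'run' or 'rerun' from the latest run's status (illustrative helper)."""
    if latest_run is None or latest_run["status"] != DbtRunStatus.ERROR:
        # No previous run, or it did not fail: start a fresh run.
        return "run"
    # The previous run failed: retry from the point of failure.
    return "rerun"
```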
Hi @josh-fell, thank you for your feedback!
Yes, it is correct.
I agree that an additional check of the previously run jobs, deciding upon the state of the latest job, would make it better. In that scenario, there will be 3 cases:
- the job has no previous run, so a new run is triggered;
- the latest run succeeded, so a new run is triggered;
- the latest run failed, so the rerun endpoint is used to retry from the point of failure.
I will also change the error to a warning whenever the overrides are used together with `retry_from_failure`. I will make the necessary changes and let you know!
LGTM, just one small question.
Add `retry_from_failure` parameter to DbtCloudRunJobOperator (#38868)
* Add `retry_from_failure` parameter to DbtCloudRunJobOperator
* Use rerun endpoint only when ti.try_number is greater than 1
* Fix docstring links
* Do not allow override parameters to be used when retry_from_failure is True
* Fix base endpoint url prefix
* Split test cases and update docstring
* Use `rerun` only if the previous job run has failed
Hi, thanks for implementing this new feature. Do you have an estimate on when this will be released?
@boraberke: Are we using this rerun API correctly? I feel that if the DAG task calling the dbt job failed, it should trigger the rerun API only if we are clearing the DAG from the failed task. If we are rerunning the DAG from the start, it should start the dbt job from the beginning, rather than from the failure step of the previous run. Currently, are we only calling the rerun API, or are we looking at the DAG's failed task to decide between the run and rerun APIs?
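For reference, a hedged sketch of calling dbt Cloud's retry endpoint directly with `requests`; the URL shape follows the public v2 API docs, and the account id, job id, and token are placeholders:

```python
import requests

ACCOUNT_ID = 12345  # placeholder
JOB_ID = 67890      # placeholder
TOKEN = "<dbt-cloud-api-token>"  # placeholder

# POST .../jobs/{id}/rerun/ retries the job's last run from its point of
# failure, or starts a new run if the last run did not fail.
response = requests.post(
    f"https://cloud.getdbt.com/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/rerun/",
    headers={"Authorization": f"Token {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json()["data"]["id"])  # id of the run that was triggered
```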
This PR adds a new `retry_from_failure` parameter to the `DbtCloudRunJobOperator` to retry a failed run of a dbt Cloud job from the point of failure. The implementation uses the new rerun endpoint in the dbt API, which itself handles the lookup of the last run for a given job and decides whether to start a new run of the job or not. The rerun endpoint is only used when `retry_from_failure` is True and the previous dbt Cloud run failed. It also cannot be used in conjunction with `steps_override`, `schema_override` and `additional_run_config`.

Closes: #35772
See also: #38001
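A minimal usage sketch of the new parameter; the connection id and job id are placeholders, and the import path follows the dbt Cloud provider's layout:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator

with DAG(
    dag_id="dbt_cloud_retry_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    run_job = DbtCloudRunJobOperator(
        task_id="run_dbt_job",
        dbt_cloud_conn_id="dbt_cloud_default",  # placeholder connection
        job_id=12345,  # placeholder job id
        retry_from_failure=True,  # use dbt Cloud's rerun endpoint when the previous run failed
        retries=2,  # Airflow-level retries that exercise the rerun path
        # steps_override / schema_override / additional_run_config cannot be
        # combined with retry_from_failure.
    )
```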