Summary
Implement a retry mechanism for the polling process in the HexRunProjectOperator to handle temporary API failures.
Description
Currently, when the HexRunProjectOperator is set to run synchronously, it polls the Hex API at regular intervals to check the status of a project run. If an API call fails during this polling process, the entire task is marked as failed in Airflow. This can lead to unnecessary task failures, especially when using a high polling frequency.
I propose adding a retry mechanism for these API calls to improve the robustness of the operator and reduce false failure reports.
Proposed Changes
Add new parameters to the HexRunProjectOperator:
max_poll_retries: Maximum number of retries for a failed poll (default: 3)
poll_retry_delay: Delay between retries in seconds (default: 5)
Modify the run_and_poll method in the HexHook class to implement the retry logic:
Wrap the run_status call in a retry loop
Use exponential backoff for retry delays, starting from poll_retry_delay
Only raise an AirflowException if all retries are exhausted
Update the operator's documentation to reflect these new parameters and behavior
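The retry logic proposed above could look roughly like the sketch below. It is not the provider's code: `call_with_retries` and `PollRetriesExhausted` are hypothetical names, the exception class stands in for AirflowException so the snippet runs without Airflow installed, and `fetch_status` stands in for the hook's run_status call. The backoff assumes poll_retry_delay is the first delay and doubles on each retry.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class PollRetriesExhausted(Exception):
    """Stand-in for AirflowException so the sketch runs without Airflow."""


def call_with_retries(
    fetch_status: Callable[[], T],
    max_poll_retries: int = 3,
    poll_retry_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch_status, retrying failed calls with exponential backoff."""
    attempt = 0  # number of retries already performed
    while True:
        try:
            return fetch_status()
        except Exception as exc:  # a real hook would catch only transient errors
            if attempt >= max_poll_retries:
                # All retries used up: surface a single, final failure.
                raise PollRetriesExhausted(
                    f"status poll failed after {max_poll_retries} retries"
                ) from exc
            # Delay doubles on each retry: d, 2d, 4d, ...
            sleep(poll_retry_delay * 2 ** attempt)
            attempt += 1
```

With the defaults, a poll that keeps failing is attempted four times in total (one initial call plus three retries), with waits of 5, 10 and 20 seconds in between. The `sleep` argument is injectable only to keep the sketch easy to test.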
Implementation Details
Use Airflow's built-in retry utilities if available, or implement a custom retry decorator
Ensure that the total time spent on retries counts towards the overall timeout parameter
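One way to make retry waits count towards the overall timeout is to compute a single deadline up front and cap every sleep, whether a normal poll interval or a retry delay, at the time remaining. The sketch below illustrates that idea under stated assumptions: `poll_until_done`, `poll_interval`, the injectable `clock` and `sleep`, and the set of terminal status strings are all illustrative and not taken from the Hex provider. Here the retry counter tracks consecutive failures and resets after any successful poll.

```python
import time
from typing import Callable

# Assumed terminal states, for illustration only.
TERMINAL_STATUSES = {"COMPLETED", "ERRORED", "KILLED"}


def poll_until_done(
    fetch_status: Callable[[], str],
    timeout: float,
    poll_interval: float = 3.0,
    max_poll_retries: int = 3,
    poll_retry_delay: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status, charging retry waits to one shared deadline."""
    deadline = clock() + timeout
    failures = 0  # consecutive failed polls
    while True:
        try:
            status = fetch_status()
            failures = 0  # a successful poll resets the retry budget
            if status in TERMINAL_STATUSES:
                return status
            wait = poll_interval
        except ConnectionError:
            if failures >= max_poll_retries:
                raise  # retries exhausted: let the failure propagate
            wait = poll_retry_delay * 2 ** failures
            failures += 1
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("run did not finish within the timeout")
        # Never sleep past the deadline, whether waiting to poll or to retry.
        sleep(min(wait, remaining))
```

Because the deadline is fixed before the first poll, a long backoff cannot extend the task beyond `timeout`; the last wait is simply shortened and one final poll is made at the deadline.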
Acceptance Criteria
HexRunProjectOperator accepts max_poll_retries and poll_retry_delay parameters