Add startup_check_interval_seconds to PodManager's await_pod_start #31008
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
Would it be possible to add/update a unit test for that one?
I had trouble getting the unit tests running locally (even though the documentation on it seems quite extensive).
With breeze they should work out-of-the-box (same with static checks). Without you fixing them, we cannot do much to merge it, because failing tests would break other people's workflows (we have 60-70 commits from about ~50 people a week), so you need to fix those to get them merged.
Yeah, absolutely understandable. Will try to get them running and report back!
@potiuk: Please check again. I fixed all failing tests and added a quite naive one for the newly added parameter.
@stelsemeyer-m60 can you add an entry in
Good idea. Done ✅
LGTM
Will merge when CI is green
Thanks. Sorry, did not have the chance to trace the reason for the remaining errors yet.
Force-pushed from a7520e3 to 7bc5bc8
It looks like some intermittent problems were involved. I rebased the PR to re-run the tests.
@@ -174,6 +174,7 @@ class KubernetesPodOperator(BaseOperator):
        during the next try. If False, always create a new pod for each try.
    :param labels: labels to apply to the Pod. (templated)
    :param startup_timeout_seconds: timeout in seconds to startup the pod.
    :param startup_check_interval_seconds: interval in seconds to check if the pod has already started
I think we should use startup_timeout_check_interval_seconds instead.
@@ -595,6 +602,7 @@ def invoke_defer_method(self):
        should_delete_pod=self.is_delete_operator_pod,
        get_logs=self.get_logs,
        startup_timeout=self.startup_timeout_seconds,
        startup_check_interval=self.startup_check_interval_seconds,
I'm not convinced this does anything. The trigger uses poll_interval instead. Feels a little duplicative. Maybe we deprecate poll_interval and just use this new one?
Agree, it is not doing anything here. Will fix this. Moreover, I have no strong opinion on deprecating poll_interval, or on renaming the new parameter (startup_check_interval_seconds / startup_timeout_check_interval_seconds) to poll_interval in the new implementation.
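(For context, a minimal stand-in for the trigger-side polling that poll_interval already controls in airflow.providers.cncf.kubernetes.triggers.pod.KubernetesPodTrigger.run. The helper name read_phase and the loop shape are illustrative assumptions; the point is only that the deferred path already sleeps poll_interval seconds between status checks, which is why also forwarding startup_check_interval there would be duplicative.)

import asyncio

async def poll_pod_phase(read_phase, poll_interval: float = 2.0) -> str:
    # `read_phase` is a hypothetical async callable returning the pod phase.
    # The deferred/trigger path already waits `poll_interval` seconds between checks.
    while True:
        phase = await read_phase()
        if phase != "Pending":
            return phase
        await asyncio.sleep(poll_interval)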
Ok, going through the code base, I came across different occurrences of unparametrized time.sleep calls (with either 1 or 2 seconds):

1. airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager.await_pod_start: the startup polling, set to 1s
2. airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager.fetch_container_logs: polling that follows the logs, set to 1s
3. airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager.await_container_completion: polling of the container state if log following is deactivated, set to 1s
4. airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager.await_pod_completion: same as above, but for pod completion, set to 2s
5. airflow.providers.cncf.kubernetes.triggers.pod.KubernetesPodTrigger.run: set to 2s

I think we want to change 1 but not 2-4. I am not sure about 5 though, because it also checks the status once the pod is running, no? Please correct me if I am wrong.
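(To make item 1 concrete, a rough sketch of what a parametrized startup poll could look like. This is a free-function approximation of PodManager.await_pod_start: read_pod is passed in as a stand-in for the manager's own method, and a generic TimeoutError replaces the provider's specific exception, so treat those details as assumptions.)

import time
from datetime import datetime

def await_pod_start(read_pod, pod, startup_timeout: int = 120, startup_check_interval: int = 1) -> None:
    # Poll `read_pod(pod)` until the pod leaves the Pending phase or the timeout is hit.
    start_time = datetime.now()
    while True:
        remote_pod = read_pod(pod)
        if remote_pod.status.phase != "Pending":
            return
        if (datetime.now() - start_time).total_seconds() >= startup_timeout:
            raise TimeoutError(f"Pod took longer than {startup_timeout} seconds to start")
        # Previously a hard-coded time.sleep(1); now configurable.
        time.sleep(startup_check_interval)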
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
(Test if comment reopens.)
Parametrize the interval at which the Kubernetes pod status is polled when launching a new pod.
When using serverless Kubernetes services like Google GKE Autopilot, the pod startup time can be longer due to cold starts. The default check every second (see above) can then spam the logs, so a lower check frequency might be desired.
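For example, on a cluster with slow cold starts the new parameter could be used roughly like this (the task and image are placeholders, the import path may differ by provider version, and the parameter name may still change per the review discussion above):

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Placeholder task: allow up to 10 minutes for an Autopilot cold start,
# but only poll (and log) the pod status every 30 seconds instead of every second.
run_job = KubernetesPodOperator(
    task_id="run_job",
    image="busybox",
    cmds=["sh", "-c", "echo hello"],
    startup_timeout_seconds=600,
    startup_check_interval_seconds=30,
)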