Add startup_check_interval_seconds to PodManager's await_pod_start #34231

Merged
merged 24 commits into from
Nov 1, 2023

stelsemeyer-m60
Contributor


Parametrize the interval at which the Kubernetes pod status is polled when launching a new pod.

When using serverless Kubernetes services like Google GKE Autopilot, pod startup can be expected to take longer because of a cold start. With the default check of once per second, the logs get spammed with warnings (see below), so a lower check frequency might be desired:

[2023-05-02, 05:33:22 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:23 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:24 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:25 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:26 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:27 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:28 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:29 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:30 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:31 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:32 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:33 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:34 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:35 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:36 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
[2023-05-02, 05:33:37 UTC] {pod_manager.py:187} WARNING - Pod not yet started: some-pod-he2j8139
...
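
For illustration, the idea can be sketched as a standalone polling loop. This is only a sketch under assumptions, not the provider's actual `await_pod_start`: the `read_phase`, `sleep`, and `clock` callables and the `PodLaunchTimeout` exception are hypothetical stand-ins so the example runs without a cluster. The only point it makes is that one warning is logged per poll, so a larger `startup_check_interval` directly reduces the number of log lines.

```python
import logging
import time
from typing import Callable

log = logging.getLogger(__name__)


class PodLaunchTimeout(Exception):
    """Hypothetical error: the pod did not leave Pending within the timeout."""


def await_pod_start(
    read_phase: Callable[[], str],  # hypothetical: returns the current pod phase
    startup_timeout: float = 120,
    startup_check_interval: float = 1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll until the pod leaves Pending and return the number of polls made."""
    start = clock()
    polls = 0
    while True:
        polls += 1
        if read_phase() != "Pending":
            return polls
        if clock() - start >= startup_timeout:
            raise PodLaunchTimeout(
                f"Pod took longer than {startup_timeout} seconds to start."
            )
        # One warning per poll: a longer interval means fewer log lines.
        log.warning("Pod not yet started")
        sleep(startup_check_interval)
```

With `startup_check_interval=10`, a pod that needs 60 seconds to start would produce roughly 6 warnings instead of roughly 60.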

@boring-cyborg boring-cyborg bot added area:providers kind:documentation provider:cncf-kubernetes Kubernetes provider related issues provider:google Google (including GCP) related issues labels Sep 9, 2023
@stelsemeyer-m60
Contributor Author

Refers to #31008.
Finishing off work in this PR.

Member

You need to add an else after

if delta.total_seconds() >= self.startup_timeout:
    message = (
        f"Pod took longer than {self.startup_timeout} seconds to start. "
        "Check the pod events in kubernetes to determine why."
    )
    yield TriggerEvent(
        {
            "name": self.pod_name,
            "namespace": self.pod_namespace,
            "status": "timeout",
            "message": message,
        }
    )
    return

and await asyncio.sleep(self.startup_check_interval)
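
A minimal, self-contained sketch of what this suggestion amounts to, assuming a simplified trigger loop: the function name `wait_for_pod_start`, the `read_phase` coroutine, and the plain-dict events are hypothetical stand-ins for the trigger's real attributes and `TriggerEvent`. The `else` branch awaits `asyncio.sleep` so that a pending pod is re-checked only once per interval, without blocking the event loop.

```python
import asyncio
import time
from typing import AsyncIterator, Awaitable, Callable


async def wait_for_pod_start(
    read_phase: Callable[[], Awaitable[str]],  # hypothetical phase reader
    startup_timeout: float,
    startup_check_interval: float,
) -> AsyncIterator[dict]:
    """Yield a single event once the pod has started or the timeout is hit."""
    start = time.monotonic()
    while True:
        phase = await read_phase()
        if phase != "Pending":
            yield {"status": "started", "phase": phase}
            return
        if time.monotonic() - start >= startup_timeout:
            yield {
                "status": "timeout",
                "message": f"Pod took longer than {startup_timeout} seconds to start.",
            }
            return
        else:
            # Still pending and within the timeout: wait before polling again.
            await asyncio.sleep(startup_check_interval)
```

Without the sleep in the `else` branch, a loop like this would poll again immediately on every iteration, so the configured interval would have no effect.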

@stelsemeyer-m60 stelsemeyer-m60 marked this pull request as ready for review September 12, 2023 23:02
@eladkal eladkal requested a review from hussein-awala October 1, 2023 09:15
@eladkal eladkal merged commit 2b0bfea into apache:main Nov 1, 2023
boring-cyborg bot commented Nov 1, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Nov 10, 2023
…pache#34231)

* add startup_check_interval_seconds

* change default value in method

* fix static checks, add missing param, fix typo

* default is 1s

* fix outdated docs

* add test to check time.sleep is called with specific value

* add more documentation

* rephrase

* add sleep in else clause

* Update airflow/providers/cncf/kubernetes/triggers/pod.py

---------

Co-authored-by: eladkal <[email protected]>
Co-authored-by: Hussein Awala <[email protected]>