Remove reference to undead tasks from documentation #43536

Merged
merged 14 commits into from
Jan 26, 2025
25 changes: 11 additions & 14 deletions docs/apache-airflow/core-concepts/tasks.rst
@@ -167,25 +167,22 @@ These can be useful if your code has extra knowledge about its environment and w

.. _concepts:zombies:

Zombie/Undead Tasks
-------------------
Zombie Tasks
------------

No system runs perfectly, and task instances are expected to die once in a while. Airflow detects two kinds of task/process mismatch:
No system runs perfectly, and task instances are expected to die once in a while.

* *Zombie tasks* are ``TaskInstances`` stuck in a ``running`` state despite their associated jobs being inactive
(e.g. their process did not send a recent heartbeat as it got killed, or the machine died). Airflow will find these
periodically, clean them up, and either fail or retry the task depending on its settings. Tasks can become zombies for
many reasons, including:
*Zombie tasks* are ``TaskInstances`` stuck in a ``running`` state despite their associated jobs being inactive
(e.g. their process did not send a recent heartbeat as it got killed, or the machine died). Airflow will find these
periodically, clean them up, and either fail or retry the task depending on its settings. Tasks can become zombies for
many reasons, including:

* The Airflow worker ran out of memory and was OOMKilled.
* The Airflow worker failed its liveness probe, so the system (for example, Kubernetes) restarted the worker.
* The system (for example, Kubernetes) scaled down and moved an Airflow worker from one node to another.
* The Airflow worker ran out of memory and was OOMKilled.
* The Airflow worker failed its liveness probe, so the system (for example, Kubernetes) restarted the worker.
* The system (for example, Kubernetes) scaled down and moved an Airflow worker from one node to another.

* *Undead tasks* are tasks that are *not* supposed to be running but are, often caused when you manually edit Task
Instances via the UI. Airflow will find them periodically and terminate them.


Below is the code snippet from the Airflow scheduler that runs periodically to detect zombie/undead tasks.
Below is the code snippet from the Airflow scheduler that runs periodically to detect zombie tasks.

.. exampleinclude:: /../../airflow/jobs/scheduler_job_runner.py
:language: python
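
The zombie definition in the diff above (a task instance left in ``running`` while its process has stopped sending heartbeats) can be illustrated with a small sketch. This is not the scheduler code that the ``exampleinclude`` directive pulls in; ``TaskRecord``, ``find_zombies``, and the 300-second threshold are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TaskRecord:
    """Hypothetical stand-in for a task instance row; not Airflow's model."""

    task_id: str
    state: str
    last_heartbeat: datetime


def find_zombies(tasks, now, threshold=timedelta(seconds=300)):
    """Return tasks still marked 'running' whose last heartbeat is older than the threshold.

    The 300-second default is an assumption made for this sketch.
    """
    cutoff = now - threshold
    return [t for t in tasks if t.state == "running" and t.last_heartbeat < cutoff]


# Fixed timestamp so the example is deterministic.
now = datetime(2025, 1, 26, 12, 0, tzinfo=timezone.utc)
tasks = [
    TaskRecord("extract", "running", now - timedelta(seconds=30)),     # healthy
    TaskRecord("transform", "running", now - timedelta(seconds=900)),  # stale heartbeat
    TaskRecord("load", "success", now - timedelta(seconds=900)),       # finished, ignored
]

zombie_ids = [t.task_id for t in find_zombies(tasks, now)]
print(zombie_ids)
```

Only ``transform`` is both in the ``running`` state and past the heartbeat cutoff, so it is the one a periodic check like this would hand off to be failed or retried according to the task's settings.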
3 changes: 3 additions & 0 deletions docs/apache-airflow/redirects.txt
@@ -172,3 +172,6 @@ howto/define_extra_link.rst howto/define-extra-link.rst

# Use test config (it's not a howto for users but a howto for developers so we redirect it back to index)
howto/use-test-config.rst index.rst

# Removing reference to undead tasks
core-concepts/tasks.html#zombie-undead-tasks core-concepts/tasks.html#zombie-tasks