DAG Run Started Time Appears To Move As the DAG executes, which leads to misleading durations #30656
-
Apache Airflow versionOther Airflow 2 version (please specify below) What happenedThis is happening in Airflow 2.3.4, and thus may have been fixed in later versions We've been running DAGs for a while in our production Airflow setup, however we have a problem which is that as DAGs execute the value that displays next to What you think should happen insteadI would expect that the start time would reflect the start of the execution of the first task in the DAG and then the end would be the end time of the latest executing task in the DAG. How to reproduceRun a DAG that runs for a particularly long period of time and watch as the Operating SystemDebian GNU/Linux 11 (bullseye) Versions of Apache Airflow Providersapache-airflow-providers-cncf-kubernetes==4.3.0 DeploymentOther Deployment detailsWe're using a modified version of the official Helm Chart deployed in GKE, to run tasks we're using the CeleryExecutor atop Kubernetes in GKE. Anything elseTried to dig through the code for the UI but I didn't get very far so wanted to raise this as a bug to see if there was any more help to be had here. Are you willing to submit PR?
Code of Conduct
|
Beta Was this translation helpful? Give feedback.
Replies: 6 comments 2 replies
-
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval. |
Beta Was this translation helpful? Give feedback.
-
Can you please show some screenshots of what you observe - Ideally showing the tasks in progress - several of them during the execution at different times? I just want to make absolutely sure that the start date /end date and "running DAGs" are well understood here. |
Beta Was this translation helpful? Give feedback.
-
So here's an example of the kind of thing we see, this DAG is set up to run every three hours, and as we can see from the data interval it should start running at some point after 2pm BST. Now if we look into the actual tasks we can see in the screenshot below that the first task, which has one downstream, which further has everything else in the DAG as a downstream starts at 14:00:14 BST and ran for around 17 minutes which is in-line with what we'd expect. However if we look again at the first screenshot it reports that the DAG Run started at 15:57:15 BST, which is clearly wrong. Now curiously, the times reported by the DAG Run are the same as the start and end times of the last task in the DAG to run chronologically. Could it possibly be that the start and end time are both getting updated when it should only be the end time? |
Beta Was this translation helpful? Give feedback.
-
I wonder if you cleared a task, distinct from the one shown in the second screenshot as well as any of its upstream tasks, at 15:57:15. This action would have resulted in resetting the state and start date of the DagRun. (necessary to reset the timeout) |
Beta Was this translation helpful? Give feedback.
-
Ah now that's interesting, there certainly was a task retried at that time, but it would have been cleared with its downstreams not upstreams. That being said the behaviour seems somewhat unexpected. I would have thought that once the first task had started executing the start time of the run would be fixed, and then any retries would simply extend out the finished time of the DAG Run rather than the start as well, because retrying a task doesn't change the past even if you completely restarted the run. Is this behaviour documented anywhere? |
Beta Was this translation helpful? Give feedback.
-
Let me explain what happens when you clear a task:
When you clear the task t2, we reset the state and the dates of t2 and the dag run, and you will have something like:
Then when they finish:
The run duration is the duration between the start and the end of the DagRun, if we don't reset the start date, the run duration will not be correct, and that affects a lot of things like the DagRun timeout. We can discuss the possibility to add a new column to store the first start_date if there is a real use case, but as a new feature and not a bug fix. |
Beta Was this translation helpful? Give feedback.
Let me explain what happens when you clear a task:
If you have two tasks in your dag:
t1 --> t2
When the first task succeeds but the second fails, the run state will be failed and you will have something like:
When you clear the task t2, we reset the state and the dates of t2 and the dag run, and you will have something like:
Then when they finish: