SPIKE: Get `dag.test` working with Task SDK #45549

kaxil · 2025-01-10T11:51:54Z

`dag.test` isn't implemented with the Task SDK. Let's figure out what the developer experience / DAG development story would look like with the Task SDK.

The big change here (other than just moving code around) is to introduce a conceptual separation between Definition/Execution time and Scheduler time. This means that the expansion of tasks (creating the TaskInstance rows with different map_index values) is now done on the scheduler, and we now deserialize to different classes. For example, when we deserialize the `DictOfListsExpandInput` it gets turned into an instance of `SchedulerDictOfListsExpandInput`. This is primarily designed so that DB access is kept 100% out of the TaskSDK. Some of the changes here are on the "wat" side of the scale; they are mostly designed to not break 100% of our tests, and we have #45549 to look at that more holistically.

To support the "reduce" style task, which takes as input a sequence of all the pushed (mapped) XCom values, and to keep the previous behaviour of not loading all values into memory at once, we have added a new HEAD route to the Task Execution interface that returns the number of mapped XCom values, so that it is possible to implement `__len__` on the new `LazyXComSequence` class.

This change also changes when and where in the TaskSDK execution-time code we render templates and send RTIF fields to the server. This is needed because calling `render_templates` also expands the Mapped operator. As a result, the `startup` call parses the dag, renders templates, performs the runtime checks (currently checking Inlets and Outlets with the API server) and returns the context. This context is important because the `ti.task` _in that context_ is unmapped if required.

I have deleted a tranche of tests from tests/models that were to do with runtime behaviour and are now tested in the TaskSDK instead.
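As a rough illustration of the lazy "reduce" input described above, the sketch below shows how a sequence can answer `len()` from a cheap count lookup while fetching individual values only on access. It is not the actual `LazyXComSequence` implementation: the class name `LazySequenceSketch`, the `fetch_count` / `fetch_item` callables, and the in-memory fake backend are all hypothetical stand-ins for the HEAD route and per-value XCom requests.

```python
from collections.abc import Callable, Iterator, Sequence
from typing import Any, Optional


class LazySequenceSketch(Sequence):
    """Hypothetical lazy sequence; NOT the real LazyXComSequence."""

    def __init__(
        self,
        fetch_count: Callable[[], int],    # assumed stand-in for the HEAD count route
        fetch_item: Callable[[int], Any],  # assumed stand-in for a per-value fetch
    ) -> None:
        self._fetch_count = fetch_count
        self._fetch_item = fetch_item
        self._len: Optional[int] = None

    def __len__(self) -> int:
        # Ask the backend for the count once and cache it; no values are loaded.
        if self._len is None:
            self._len = self._fetch_count()
        return self._len

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self._fetch_item(i) for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._fetch_item(index)

    def __iter__(self) -> Iterator[Any]:
        # Fetch one value at a time instead of materialising the whole list.
        for i in range(len(self)):
            yield self._fetch_item(i)


# Fake in-memory backend that counts how often each "route" is hit.
stored = {0: "a", 1: "b", 2: "c"}
calls = {"count": 0, "item": 0}


def fake_count() -> int:
    calls["count"] += 1
    return len(stored)


def fake_item(i: int) -> Any:
    calls["item"] += 1
    return stored[i]


values = LazySequenceSketch(fake_count, fake_item)
```

With this shape, a downstream task can call `len(values)` without triggering any per-value fetch, which is the property the count route is meant to preserve.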

kaxil added area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK area:task-sdk labels Jan 10, 2025

kaxil moved this to Icebox in AIP-72 - Task Execution Interface and SDK Jan 10, 2025

kaxil added this to AIP-72 - Task Execution Interface and SDK Jan 10, 2025

kaxil moved this from Icebox to Todo in AIP-72 - Task Execution Interface and SDK Jan 10, 2025

kaxil added this to the Airflow 3.0.0 milestone Jan 17, 2025

ashb mentioned this issue Jan 31, 2025

Add dynamic task mapping into TaskSDK runtime #46032

Merged

eladkal mentioned this issue Feb 21, 2025

Status of testing Providers that were prepared on February 21, 2025 #46973

Closed

Cesar290522 mentioned this issue Feb 23, 2025

Create nice way for provider's own unittests to "run dags" on Airflow 2 and 3 Cesar290522/Chava29#12

Closed
