Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPIKE: Get dag.test working with Task SDK #45549

Open
kaxil opened this issue Jan 10, 2025 · 0 comments
Open

SPIKE: Get dag.test working with Task SDK #45549

kaxil opened this issue Jan 10, 2025 · 0 comments
Labels
area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK area:task-sdk
Milestone

Comments

@kaxil
Copy link
Member

kaxil commented Jan 10, 2025

dag.test isn't implemented with Task SDK, let's figure out how the developer experience/ DAG developing story would look like with Task SDK

@kaxil kaxil added area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK area:task-sdk labels Jan 10, 2025
@kaxil kaxil added this to the Airflow 3.0.0 milestone Jan 17, 2025
ashb added a commit that referenced this issue Jan 30, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.
ashb added a commit that referenced this issue Jan 30, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.
ashb added a commit that referenced this issue Jan 31, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.
ashb added a commit that referenced this issue Feb 4, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

This change also changes when and where in the TaskSDK exeuction time code we
render templates and send RTIF fields to the server. This is needed because
calling `render_templates` also expands the Mapped operator. As a result the
`startup` call parses the dag, renders templates and performs the runtime
checks (currently checking Inlets and Outlets with the API server) and returns
the context. This context is important as the `ti.task` _in that context_ is
unnmapped if required.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 5, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

This change also changes when and where in the TaskSDK exeuction time code we
render templates and send RTIF fields to the server. This is needed because
calling `render_templates` also expands the Mapped operator. As a result the
`startup` call parses the dag, renders templates and performs the runtime
checks (currently checking Inlets and Outlets with the API server) and returns
the context. This context is important as the `ti.task` _in that context_ is
unnmapped if required.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 5, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ashb added a commit that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
insomnes pushed a commit to insomnes/airflow that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have apache#45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
insomnes pushed a commit to insomnes/airflow that referenced this issue Feb 6, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have apache#45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
niklasr22 pushed a commit to niklasr22/airflow that referenced this issue Feb 8, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have apache#45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
ambika-garg pushed a commit to ambika-garg/airflow that referenced this issue Feb 17, 2025
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.

This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
`DictOfListsExpandInput` it gets turned into an instance of
SchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.

Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have apache#45549 to look at
that more holistically.

To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement `__len__` on the new LazyXComSequence class.

I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK area:task-sdk
Development

No branches or pull requests

1 participant