Add teardown task #1529

Merged
merged 14 commits into from
Feb 13, 2025

Contributor

@pankajastro pankajastro commented Feb 10, 2025

This PR introduces a teardown node in the Cosmos DAG that deletes SQL files from the remote location that were uploaded by the setup task. This ensures proper cleanup of resources after the task execution.

Key Changes:

Teardown Node Addition:

A new teardown task is added to the Cosmos DAG that deletes the SQL files from the remote location. The files deleted are the ones uploaded by the setup task earlier in the DAG.
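
The setup/teardown pairing described above can be illustrated with a minimal sketch. It is not the Cosmos implementation: a local temporary directory stands in for the remote location, and `setup_upload` and `teardown_delete` are hypothetical names chosen for illustration. The point it shows is that teardown removes only the files setup uploaded and leaves anything else in place.

```python
import tempfile
from pathlib import Path


def setup_upload(remote_dir: Path, sql_by_name: dict[str, str]) -> list[Path]:
    """Hypothetical stand-in for the setup task: write SQL files, return what was uploaded."""
    remote_dir.mkdir(parents=True, exist_ok=True)
    uploaded = []
    for name, sql in sql_by_name.items():
        target = remote_dir / name
        target.write_text(sql)
        uploaded.append(target)
    return uploaded


def teardown_delete(uploaded: list[Path]) -> int:
    """Hypothetical stand-in for the teardown task: delete only what setup uploaded."""
    deleted = 0
    for path in uploaded:
        if path.exists():
            path.unlink()
            deleted += 1
    return deleted


# A temporary directory plays the role of the remote location (an assumption).
with tempfile.TemporaryDirectory() as tmp:
    unrelated = Path(tmp) / "keep.sql"
    unrelated.write_text("select 1")  # a file the setup task did not upload

    uploaded = setup_upload(
        Path(tmp) / "run_1",
        {"model_a.sql": "select 1", "model_b.sql": "select 2"},
    )
    deleted_count = teardown_delete(uploaded)
    remaining = sorted(p.name for p in Path(tmp).rglob("*.sql"))
```

After this runs, both uploaded files are gone and `keep.sql` is still present, which is the cleanup guarantee the teardown task is meant to provide.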

New Config: enable_teardown_async_task:

A configuration option, enable_teardown_async_task, is introduced to enable or disable this feature. It is enabled by default.
Setting it to False skips the teardown step; setting it to True ensures the deletion task runs after the setup task.
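
As a rough illustration of how such a flag could be resolved, the sketch below reads it from an environment variable that follows Airflow's `AIRFLOW__{SECTION}__{KEY}` naming convention. The `cosmos` section name is an assumption, `teardown_enabled` is a hypothetical helper, and the boolean parsing is simplified compared with what Airflow itself does. The only behaviour taken from this PR is the default of enabled.

```python
import os
from typing import Mapping, Optional

# Assumption: the option lives in a [cosmos] section of the Airflow config,
# so Airflow's env var convention would give this name.
ENV_VAR = "AIRFLOW__COSMOS__ENABLE_TEARDOWN_ASYNC_TASK"

_TRUE_VALUES = {"1", "t", "true"}
_FALSE_VALUES = {"0", "f", "false"}


def teardown_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: return whether the teardown task should be added."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None:
        return True  # enabled by default, as stated in the PR description
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"{ENV_VAR} must be a boolean, got {raw!r}")
```

Passing the environment as an argument keeps the helper easy to exercise without touching the real process environment.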

Applicable Only for AIRFLOW_ASYNC Execution Mode:

The teardown task is added only when the DAG runs in the AIRFLOW_ASYNC execution mode.
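
The two conditions above (execution mode and config flag) can be combined as in the following sketch. The `ExecutionMode` enum here is a local stand-in rather than the Cosmos class, `plan_tasks` and the task names are hypothetical, and the sketch assumes the setup task is always present in async mode and that teardown runs last.

```python
from enum import Enum
from typing import List, Sequence


class ExecutionMode(Enum):
    """Local stand-in for the Cosmos execution modes (only two shown)."""

    LOCAL = "local"
    AIRFLOW_ASYNC = "airflow_async"


def plan_tasks(
    mode: ExecutionMode,
    model_tasks: Sequence[str],
    enable_teardown: bool = True,
) -> List[str]:
    """Hypothetical helper: return task names in the order they would run."""
    if mode is not ExecutionMode.AIRFLOW_ASYNC:
        # Other execution modes get neither setup nor teardown in this sketch.
        return list(model_tasks)
    plan = ["setup", *model_tasks]
    if enable_teardown:
        plan.append("teardown")  # cleanup runs after everything else
    return plan
```

For example, `plan_tasks(ExecutionMode.AIRFLOW_ASYNC, ["model_a"])` includes a trailing teardown entry, while the same call with `ExecutionMode.LOCAL` returns only the model task.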

closes: #1232

Collaborator

@tatiana tatiana left a comment

Hi @pankajastro, some early feedback. Could you please confirm that the teardown task is not emitting any datasets (outlets)?

codecov bot commented Feb 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.32%. Comparing base (a43a991) to head (81ae672).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1529      +/-   ##
==========================================
+ Coverage   97.30%   97.32%   +0.02%     
==========================================
  Files          80       80              
  Lines        4784     4821      +37     
==========================================
+ Hits         4655     4692      +37     
  Misses        129      129              

@pankajastro pankajastro marked this pull request as ready for review February 12, 2025 14:12
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 12, 2025
@dosubot dosubot bot added area:config Related to configuration, like YAML files, environment variables, or executer configuration area:docs Relating to documentation, changes, fixes, improvement area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc labels Feb 12, 2025
@pankajastro
Contributor Author

> Hi @pankajastro, some early feedback. Could you please confirm that the teardown task is not emitting any datasets (outlets)?

Yes, the teardown task does not emit any datasets.

Contributor

@pankajkoti pankajkoti left a comment

LGTM. Looks like we have a conflict with main that needs to be resolved.

@pankajastro pankajastro merged commit 745ed14 into main Feb 13, 2025
69 checks passed
@pankajastro pankajastro deleted the teardown_task branch February 13, 2025 16:36
@pankajkoti pankajkoti mentioned this pull request Feb 14, 2025
pankajkoti added a commit that referenced this pull request Feb 20, 2025
Breaking changes

* When using ``LoadMode.DBT_LS``, Cosmos will now attempt to use the
``dbtRunner`` as opposed to subprocess to run ``dbt ls``.
While this represents significant performance improvements (half the
vCPU usage and some memory consumption improvement), this may not work
in scenarios where users had multiple Python virtual environments to
manage different versions of dbt and its adapters. In those cases,
please set ``RenderConfig(invocation_mode=InvocationMode.SUBPROCESS)``
to have the same behaviour Cosmos had in previous versions.
Additional information `here
<https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls>`_
and `here
<https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#how-to-run-dbt-ls-invocation-mode>`_.

Features

* Use ``dbtRunner`` in the DAG Processor when using ``LoadMode.DBT_LS``
if ``dbt-core`` is available by @tatiana in #1484. Additional
information `here
<https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls>`_.
* Allow users to opt-out of ``dbtRunner`` during DAG parsing with
``InvocationMode.SUBPROCESS`` by @tatiana in #1495. Check out the
`documentation
<https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#how-to-run-dbt-ls-invocation-mode>`_.
* Add structure to support multiple db for async operator execution by
@pankajastro in #1483
* Support overriding the ``profile_config`` per dbt node or folder using
config by @tatiana in #1492. More information `here
<https://astronomer.github.io/astronomer-cosmos/profiles/#profile-customise-per-node>`_.
* Create and run accurate SQL statements when using
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti, @tatiana and
@pankajastro in #1474
* Add AWS ECS task run execution mode by @CarlosGitto and @aoelvp94 in
#1507
* Add support for running ``DbtSourceOperator`` individually by
@victormacaubas in #1510
* Add setup task for async executions by @pankajastro in #1518
* Add teardown task for async executions by @pankajastro in #1529
* Add ``ProjectConfig.install_dbt_deps`` & change operator
``install_deps=True`` as default by @tatiana in #1521
* Extend Virtualenv operator and mock dbt adapters for setup & teardown
tasks in ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti, @tatiana and
@pankajastro in #1544

Bug Fixes

* Fix select complex intersection of three tag-based graph selectors by
@tatiana in #1466
* Fix custom selector behaviour when the model name contains periods by
@yakovlevvs and @60098727 in #1499
* Filter dbt and non-dbt kwargs correctly for async operator by
@pankajastro in #1526

Enhancement

* Fix OpenLineage deprecation warning by @CorsettiS in #1449
* Move ``DbtRunner`` related functions into ``dbt/runner.py`` module by
@tatiana in #1480
* Add ``on_warning_callback`` to ``DbtSourceKubernetesOperator`` and
refactor previous operators by @LuigiCerone in #1501
* Gracefully error when users set incompatible ``RenderConfig.dbt_deps``
and ``operator_args`` ``install_deps`` by @tatiana in #1505
* Store compiled SQL as template field for
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1534

Docs

* Improve ``RenderConfig`` arguments documentation by @tatiana in #1514
* Improve callback documentation by @tatiana in #1516
* Improve partial parsing docs by @tatiana in #1520
* Fix typo in selecting & excluding docs by @pankajastro in #1523
* Document ``async_py_requirements`` added in ``ExecutionConfig`` for
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1545

Others

* Ignore dbt package tests when running Cosmos tests by @tatiana in
#1502
* Refactor to consolidate async dbt adapter code by @pankajkoti in #1509
* Log elapsed time for sql file(s) upload/download by @pankajastro in
#1536
* Remove the fallback operator for async task by @pankajastro in #1538
* GitHub Actions Dependabot: #1487
* Pre-commit updates: #1473, #1493, #1503, #1531