Ability to Run Only Tests for a Specific dbt Model Without Executing the Model #1242
Labels
area:config
Related to configuration, like YAML files, environment variables, or executer configuration
dbt:test
Primarily related to dbt test command or functionality
enhancement
New feature or request
stale
Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed
triage-needed
Items need to be reviewed / assigned to milestone
Comments
@ame589 This feels highly related to #959. Would it be an option to use the astronomer-cosmos/cosmos/operators/local.py Line 720 in 4c9b28f
As illustrated in: astronomer-cosmos/tests/operators/test_local.py Lines 458 to 465 in 4c9b28f
This was referenced Oct 29, 2024
This issue is stale because it has been open for 30 days with no activity.
tatiana added a commit that referenced this issue Dec 30, 2024
If these two circumstances are met:

1. The dbt project has tests that rely on multiple parent models; and
2. The `DbtDag` or `DbtTaskGroup` uses `TestBehavior.AFTER_EACH` (default) or `TestBehavior.BUILD`

Cosmos 1.8.0 and previous versions would attempt to run the same test multiple times, once after each parent model run, likely failing if any of the parents hadn't been run yet.

This PR aims to fix this behaviour by not running tests with multiple dependencies within each task group / build task, and by adding those tests to run only once, after all parents have run.

# Related issues

Closes: #978
Closes: #1365

This change also sets the ground for adding support to tests that don't have any dependencies, a problem discussed in the following tickets:

* #959
* #1242
* #1279

# How to reproduce

There are two steps to reproduce this problem:

1. Create a representative dbt project
2. Create a Cosmos `DbtDag` that uses this dbt project to reproduce the original problem

## Representative dbt project

We created a dbt project named `multiple_parents_test` that has a test called `custom_test_combined_model` that depends on two models:

* combined_model
* model_a

The expectation from a user perspective is that, since `combined_model` depends on `model_a`, the test will only be run once, after both models have run.

Definition of the test:

```
{% test custom_test_combined_model(model) %}
WITH source_data AS (
    SELECT id FROM {{ ref('model_a') }}
),
combined_data AS (
    SELECT id FROM {{ model }}
)
SELECT s.id
FROM source_data s
LEFT JOIN combined_data c ON s.id = c.id
WHERE c.id IS NULL
{% endtest %}
```

By running the following `dbt build` command, we confirm that the test depends on both models:

```
dbt build --select "+custom_test_combined_model_combined_model_"
11:59:29 Running with dbt=1.8.2
11:59:29 Registered adapter: postgres=1.8.1
11:59:29 Found 3 models, 6 data tests, 414 macros
11:59:29
11:59:30 Concurrency: 4 threads (target='dev')
11:59:30
11:59:30 1 of 9 START sql view model public.model_a ..................................... [RUN]
11:59:30 2 of 9 START sql view model public.model_b ..................................... [RUN]
11:59:30 1 of 9 OK created sql view model public.model_a ................................ [CREATE VIEW in 0.18s]
11:59:30 2 of 9 OK created sql view model public.model_b ................................ [CREATE VIEW in 0.18s]
11:59:30 3 of 9 START test unique_model_a_id ............................................ [RUN]
11:59:30 4 of 9 START test unique_model_b_id ............................................ [RUN]
11:59:30 4 of 9 PASS unique_model_b_id .................................................. [PASS in 0.05s]
11:59:30 3 of 9 PASS unique_model_a_id .................................................. [PASS in 0.06s]
11:59:30 5 of 9 START sql view model public.combined_model .............................. [RUN]
11:59:30 5 of 9 OK created sql view model public.combined_model ......................... [CREATE VIEW in 0.03s]
11:59:30 6 of 9 START test custom_test_combined_model_combined_model_ ................... [RUN]
11:59:30 7 of 9 START test not_null_combined_model_created_at ........................... [RUN]
11:59:30 8 of 9 START test not_null_combined_model_id ................................... [RUN]
11:59:30 9 of 9 START test not_null_combined_model_name ................................. [RUN]
11:59:30 7 of 9 PASS not_null_combined_model_created_at ................................. [PASS in 0.07s]
11:59:30 9 of 9 PASS not_null_combined_model_name ....................................... [PASS in 0.07s]
11:59:30 8 of 9 PASS not_null_combined_model_id ......................................... [PASS in 0.07s]
11:59:30 6 of 9 PASS custom_test_combined_model_combined_model_ ......................... [PASS in 0.08s]
11:59:30
11:59:30 Finished running 3 view models, 6 data tests in 0 hours 0 minutes and 0.50 seconds (0.50s).
11:59:30
11:59:30 Completed successfully
11:59:30
11:59:30 Done. PASS=9 WARN=0 ERROR=0 SKIP=0 TOTAL=9
```

This is what the pipeline topology looks like:

<img width="1020" alt="Screenshot 2024-12-27 at 11 39 31" src="https://github.com/user-attachments/assets/d8a8e628-2fd7-4959-b13f-3d289e7250ed" />

The source code structure for this dbt project:

```
├── dbt_project.yml
├── macros
│   └── custom_test_combined_model.sql
├── models
│   ├── combined_model.sql
│   ├── model_a.sql
│   ├── model_b.sql
│   └── schema.yml
└── profiles.yml
```

When running `dbt ls`, it displays:

```
dbt ls
11:40:58 Running with dbt=1.8.2
11:40:58 Registered adapter: postgres=1.8.1
11:40:58 Unable to do partial parsing because saved manifest not found. Starting full parse.
11:40:59 [WARNING]: Deprecated functionality The `tests` config has been renamed to `data_tests`. Please see https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more information.
11:40:59 Found 3 models, 6 data tests, 414 macros
my_dbt_project.combined_model
my_dbt_project.model_a
my_dbt_project.model_b
my_dbt_project.custom_test_combined_model_combined_model_
my_dbt_project.not_null_combined_model_created_at
my_dbt_project.not_null_combined_model_id
my_dbt_project.not_null_combined_model_name
my_dbt_project.unique_model_a_id
my_dbt_project.unique_model_b_id
```

## Behavior in Cosmos

The DAG `example_multiple_parents_test` uses this new dbt project:

```
import os
from datetime import datetime
from pathlib import Path

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

DEFAULT_DBT_ROOT_PATH = Path(__file__).parent / "dbt"
DBT_ROOT_PATH = Path(os.getenv("DBT_ROOT_PATH", DEFAULT_DBT_ROOT_PATH))

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="example_conn",
        profile_args={"schema": "public"},
        disable_event_tracking=True,
    ),
)

example_multiple_parents_test = DbtDag(
    # dbt/cosmos-specific parameters
    project_config=ProjectConfig(
        DBT_ROOT_PATH / "multiple_parents_test",
    ),
    profile_config=profile_config,
    # normal dag parameters
    start_date=datetime(2023, 1, 1),
    dag_id="example_multiple_parents_test",
)
```

When trying to run it using:

```
airflow dags test example_multiple_parents_test
```

Users face the original error because Cosmos attempts to run the test after `model_a` has run but before `combined_model` has run:

<img width="861" alt="Screenshot 2024-12-27 at 12 10 36" src="https://github.com/user-attachments/assets/33ea7b71-ba49-4418-b194-4d3590fff1b8" />

Excerpt from the logs of the failing task:

```
[2024-12-27T12:07:33.564+0000] {taskinstance.py:2905} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 401, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 796, in execute
    result = self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 654, in build_and_run_cmd
    result = self.run_command(cmd=dbt_cmd, env=env, context=context)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 509, in run_command
    self.handle_exception(result)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 237, in handle_exception_dbt_runner
    raise AirflowException(f"dbt invocation completed with errors: {error_message}")
airflow.exceptions.AirflowException: dbt invocation completed with errors: custom_test_combined_model_combined_model_: Database Error in test custom_test_combined_model_combined_model_ (models/schema.yml)
  relation "public.combined_model" does not exist
  LINE 12: SELECT id FROM "postgres"."public"."combined_model"
  ^
  compiled Code at target/run/my_dbt_project/models/schema.yml/custom_test_combined_model_combined_model_.sql
```

## Behaviour after this change

With this change, running the DAG mentioned above results in:

<img width="1264" alt="Screenshot 2024-12-27 at 15 44 17" src="https://github.com/user-attachments/assets/e0395a4d-dbae-4b63-a3c3-69ca79ad0b04" />

And it can successfully be run.

## Breaking Change?

This PR slightly changes the behaviour of Cosmos DAG rendering when using `TestBehavior.AFTER_EACH` or `TestBehavior.BUILD` when there are tests with multiple parents. Some may consider it a breaking change, but a bug fix is a better classification, since Cosmos did not support rendering many dbt projects that met these circumstances.

The behaviour change in those cases is that we're isolating tests that depend on multiple parents and running them outside of the `TestBehavior.AFTER_EACH` dbt node Cosmos TaskGroup or the `TestBehavior.BUILD` task.

This change will likely highlight any tests that depended on multiple models and were not failing previously, but were running as part of the tests of both models.
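The isolation rule described in the commit message can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Cosmos implementation: the `NODES` graph and the `partition_tests` helper are hypothetical, and the graph is loosely modelled on the example project (the parents listed for `combined_model` are assumed).

```python
from collections import defaultdict

# Hypothetical node graph: name -> (resource type, parent nodes).
# Loosely modelled on the example dbt project; not read from a real manifest.
NODES = {
    "model_a": ("model", []),
    "model_b": ("model", []),
    "combined_model": ("model", ["model_a", "model_b"]),  # assumed parents
    "unique_model_a_id": ("test", ["model_a"]),
    "unique_model_b_id": ("test", ["model_b"]),
    "not_null_combined_model_id": ("test", ["combined_model"]),
    "custom_test_combined_model_combined_model_": ("test", ["combined_model", "model_a"]),
}


def partition_tests(nodes):
    """Split tests into those run with their single parent and those detached.

    Tests with exactly one parent stay attached to that parent.
    Tests with several parents are detached so they can run once,
    after every parent has finished.
    Tests without parents are ignored in this sketch.
    """
    attached = defaultdict(list)  # parent model -> tests run right after it
    detached = {}                 # test -> all parents it must wait for
    for name, (resource_type, parents) in nodes.items():
        if resource_type != "test":
            continue
        if len(parents) == 1:
            attached[parents[0]].append(name)
        elif len(parents) > 1:
            detached[name] = sorted(parents)
    return dict(attached), detached


attached, detached = partition_tests(NODES)
print(attached)
print(detached)
```

In this sketch the multi-parent test ends up only in `detached`, so a scheduler would place it downstream of both `combined_model` and `model_a` instead of repeating it after each of them.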
Description
I would like to propose a feature that allows users to execute only the tests defined at the YAML level for a specific dbt model, without triggering the execution of the model itself, in a `DbtTaskGroup` within an Airflow pipeline.
Use case/motivation
This could be achieved by adding a configuration that allows users to specify a model in the select parameter and set the test behavior to ONLY_TEST. Here’s an example of how this could look:
With this configuration:
This would make the testing process more flexible and tailored to specific use cases.
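At the dbt level, the requested behaviour roughly corresponds to `dbt test --select <model>`, which runs the tests selected by a model without building the model. The sketch below only assembles that command line; the helper name `build_test_only_command` is hypothetical and is not part of Cosmos, and `ONLY_TEST` remains the name proposed in this issue, not an existing `TestBehavior` value.

```python
import shlex


def build_test_only_command(model_name, extra_flags=()):
    """Return the argv for running only the tests selected by `model_name`.

    Hypothetical helper for illustration; it does not invoke dbt.
    """
    return ["dbt", "test", "--select", model_name, *extra_flags]


cmd = build_test_only_command("combined_model")
print(shlex.join(cmd))
```

Note that selecting by model name may also pick up tests that depend on other models as well, depending on dbt's indirect selection settings.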
Related issues
No response
Are you willing to submit a PR?