Ability to Run Only Tests for a Specific dbt Model Without Executing the Model #1242

ame589 · 2024-10-04T10:34:47Z

Description

I would like to propose a feature that allows users to execute only the tests defined at the YAML level for a specific dbt model, without triggering the execution of the model itself, in a DbtTaskGroup Airflow pipeline.

Use case/motivation

This could be achieved by adding a configuration that allows users to specify a model in the select parameter and set the test behavior to ONLY_TEST. Here’s an example of how this could look:

    return DbtTaskGroup(
        group_id=f"{model_name}",
        project_config=project_config,
        profile_config=profile_config,
        execution_config=execution_config,
        render_config=RenderConfig(
            select=[mod],  # Select the specific flow_name to run
            exclude=[],  # Exclude the dependency if provided
            test_behavior=TestBehavior.ONLY_TEST # Run only tests, not the model itself
        ),
        default_args={"retries": 2, "trigger_rule": "all_success"},
        operator_args={
            "install_deps": True
        }
    )

With this configuration:

* The select parameter will focus the dbt task on the specific model.
* The test_behavior=TestBehavior.ONLY_TEST setting will ensure that only the tests for the model are executed, while the actual data transformations are skipped.

This would make the testing process more flexible and tailored to specific use cases.

Related issues

No response

Are you willing to submit a PR?

Yes, I am willing to submit a PR!


tatiana · 2024-10-29T13:02:51Z

@ame589 This feels highly related to #959.

Would it be an option to use the DbtTestLocalOperator (astronomer-cosmos/cosmos/operators/local.py, line 720 in 4c9b28f: `class DbtTestLocalOperator(DbtTestMixin, DbtLocalBaseOperator):`) or an equivalent directly to run tests only?

As illustrated in astronomer-cosmos/tests/operators/test_local.py (lines 458 to 465 in 4c9b28f):

    test_operator = DbtTestLocalOperator(
        profile_config=real_profile_config,
        project_dir=DBT_PROJ_DIR,
        task_id="test",
        dbt_cmd_flags=["--models", "stg_customers"],
        install_deps=True,
        append_env=True,
    )
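
For comparison, outside Airflow the same intent maps to a plain `dbt test` call restricted to one model. The following is a minimal sketch of building that command line, assuming dbt is installed; `dbt_test_command` is a hypothetical helper name, `--select` is the newer spelling of the `--models` flag used in the snippet above, and the command Cosmos actually builds may include further flags not shown here.

```python
import shlex
from typing import List, Optional


def dbt_test_command(model: str, project_dir: Optional[str] = None) -> List[str]:
    """Build an argv list for running `dbt test` restricted to `model`."""
    cmd = ["dbt", "test", "--select", model]
    if project_dir is not None:
        # --project-dir points dbt at the directory containing dbt_project.yml.
        cmd += ["--project-dir", project_dir]
    return cmd


cmd = dbt_test_command("stg_customers")
# Print a shell-safe rendering; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```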

github-actions · 2024-12-04T11:04:16Z

This issue is stale because it has been open for 30 days with no activity.

If these two circumstances are met:

1. The dbt project has tests that rely on multiple parent models; and
2. The `DbtDag` or `DbtTaskGroup` uses `TestBehavior.AFTER_EACH` (default) or `TestBehavior.BUILD`

Cosmos 1.8.0 and previous versions would attempt to run the same test multiple times, once after each parent model run, likely failing if any of the parents hadn't been run yet.

This PR aims to fix this behaviour by not running tests with multiple dependencies within each task group / build task, and by adding those tests to run only once, after all parents have run.

# Related issues

Closes: #978
Closes: #1365

This change also sets the ground for adding support to tests that don't have any dependencies, a problem discussed in the following tickets:

* #959
* #1242
* #1279

# How to reproduce

There are two steps to reproduce this problem:

1. Create a representative dbt project
2. Create a Cosmos `DbtDag` that uses this dbt project to reproduce the original problem

## Representative dbt project

We created a dbt project named `multiple_parents_test` that has a test called `custom_test_combined_model` that depends on two models:

* combined_model
* model_a

The expectation from a user perspective is that, since `combined_model` depends on `model_a`, the `custom_test_combined_model` test will be run only once, after both models have run.

Definition of the test:

```
{% test custom_test_combined_model(model) %}
WITH source_data AS (
    SELECT id FROM {{ ref('model_a') }}
),
combined_data AS (
    SELECT id FROM {{ model }}
)
SELECT s.id
FROM source_data s
LEFT JOIN combined_data c ON s.id = c.id
WHERE c.id IS NULL
{% endtest %}
```

By running the following `dbt build` command, we confirm that the test depends on both models:

```
dbt build --select "+custom_test_combined_model_combined_model_"
11:59:29  Running with dbt=1.8.2
11:59:29  Registered adapter: postgres=1.8.1
11:59:29  Found 3 models, 6 data tests, 414 macros
11:59:29
11:59:30  Concurrency: 4 threads (target='dev')
11:59:30
11:59:30  1 of 9 START sql view model public.model_a ..................................... [RUN]
11:59:30  2 of 9 START sql view model public.model_b ..................................... [RUN]
11:59:30  1 of 9 OK created sql view model public.model_a ................................ [CREATE VIEW in 0.18s]
11:59:30  2 of 9 OK created sql view model public.model_b ................................ [CREATE VIEW in 0.18s]
11:59:30  3 of 9 START test unique_model_a_id ............................................ [RUN]
11:59:30  4 of 9 START test unique_model_b_id ............................................ [RUN]
11:59:30  4 of 9 PASS unique_model_b_id .................................................. [PASS in 0.05s]
11:59:30  3 of 9 PASS unique_model_a_id .................................................. [PASS in 0.06s]
11:59:30  5 of 9 START sql view model public.combined_model .............................. [RUN]
11:59:30  5 of 9 OK created sql view model public.combined_model ......................... [CREATE VIEW in 0.03s]
11:59:30  6 of 9 START test custom_test_combined_model_combined_model_ ................... [RUN]
11:59:30  7 of 9 START test not_null_combined_model_created_at ........................... [RUN]
11:59:30  8 of 9 START test not_null_combined_model_id ................................... [RUN]
11:59:30  9 of 9 START test not_null_combined_model_name ................................. [RUN]
11:59:30  7 of 9 PASS not_null_combined_model_created_at ................................. [PASS in 0.07s]
11:59:30  9 of 9 PASS not_null_combined_model_name ....................................... [PASS in 0.07s]
11:59:30  8 of 9 PASS not_null_combined_model_id ......................................... [PASS in 0.07s]
11:59:30  6 of 9 PASS custom_test_combined_model_combined_model_ ......................... [PASS in 0.08s]
11:59:30
11:59:30  Finished running 3 view models, 6 data tests in 0 hours 0 minutes and 0.50 seconds (0.50s).
11:59:30
11:59:30  Completed successfully
11:59:30
11:59:30  Done. PASS=9 WARN=0 ERROR=0 SKIP=0 TOTAL=9
```

This is what the pipeline topology looks like:

<img width="1020" alt="Screenshot 2024-12-27 at 11 39 31" src="https://github.com/user-attachments/assets/d8a8e628-2fd7-4959-b13f-3d289e7250ed" />

The source code structure for this dbt project:

```
├── dbt_project.yml
├── macros
│   └── custom_test_combined_model.sql
├── models
│   ├── combined_model.sql
│   ├── model_a.sql
│   ├── model_b.sql
│   └── schema.yml
└── profiles.yml
```

When running `dbt ls`, it displays:

```
dbt ls
11:40:58  Running with dbt=1.8.2
11:40:58  Registered adapter: postgres=1.8.1
11:40:58  Unable to do partial parsing because saved manifest not found. Starting full parse.
11:40:59  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
11:40:59  Found 3 models, 6 data tests, 414 macros
my_dbt_project.combined_model
my_dbt_project.model_a
my_dbt_project.model_b
my_dbt_project.custom_test_combined_model_combined_model_
my_dbt_project.not_null_combined_model_created_at
my_dbt_project.not_null_combined_model_id
my_dbt_project.not_null_combined_model_name
my_dbt_project.unique_model_a_id
my_dbt_project.unique_model_b_id
```

## Behavior in Cosmos

The DAG `example_multiple_parents_test` uses this new dbt project:

```
import os
from datetime import datetime
from pathlib import Path

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

DEFAULT_DBT_ROOT_PATH = Path(__file__).parent / "dbt"
DBT_ROOT_PATH = Path(os.getenv("DBT_ROOT_PATH", DEFAULT_DBT_ROOT_PATH))

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="example_conn",
        profile_args={"schema": "public"},
        disable_event_tracking=True,
    ),
)

example_multiple_parents_test = DbtDag(
    # dbt/cosmos-specific parameters
    project_config=ProjectConfig(
        DBT_ROOT_PATH / "multiple_parents_test",
    ),
    profile_config=profile_config,
    # normal dag parameters
    start_date=datetime(2023, 1, 1),
    dag_id="example_multiple_parents_test",
)
```

When trying to run it using:

```
airflow dags test example_multiple_parents_test
```

Users hit the original error because Cosmos attempts to run the test after `model_a` has run but before `combined_model` has run:

<img width="861" alt="Screenshot 2024-12-27 at 12 10 36" src="https://github.com/user-attachments/assets/33ea7b71-ba49-4418-b194-4d3590fff1b8" />

Excerpt from the logs of the failing task:

```
[2024-12-27T12:07:33.564+0000] {taskinstance.py:2905} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 401, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 796, in execute
    result = self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 654, in build_and_run_cmd
    result = self.run_command(cmd=dbt_cmd, env=env, context=context)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 509, in run_command
    self.handle_exception(result)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 237, in handle_exception_dbt_runner
    raise AirflowException(f"dbt invocation completed with errors: {error_message}")
airflow.exceptions.AirflowException: dbt invocation completed with errors: custom_test_combined_model_combined_model_: Database Error in test custom_test_combined_model_combined_model_ (models/schema.yml)
  relation "public.combined_model" does not exist
  LINE 12:     SELECT id FROM "postgres"."public"."combined_model"
                              ^
  compiled Code at target/run/my_dbt_project/models/schema.yml/custom_test_combined_model_combined_model_.sql
```

## Behaviour after this change

With this change, running the DAG mentioned above results in:

<img width="1264" alt="Screenshot 2024-12-27 at 15 44 17" src="https://github.com/user-attachments/assets/e0395a4d-dbae-4b63-a3c3-69ca79ad0b04" />

And it can successfully be run.

## Breaking Change?

This PR slightly changes the behaviour of Cosmos DAG rendering when using `TestBehavior.AFTER_EACH` or `TestBehavior.BUILD` when there are tests with multiple parents. Some may consider it a breaking change, but a bug fix is a better classification since Cosmos did not support rendering many dbt projects that met these circumstances.

The behaviour change in those cases is that we're isolating tests that depend on multiple parents and running them outside of the `TestBehavior.AFTER_EACH` dbt node Cosmos TaskGroup or the `TestBehavior.BUILD` task.

This change will likely highlight any tests that depended on multiple models and were not failing previously but running as part of the tests of both models.
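
The idea of isolating multi-parent tests can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code in the PR: `TEST_PARENTS` is a hand-written map based on the example project described above, and `partition_tests` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hand-written map of test name -> parent models, based on the example project.
TEST_PARENTS: Dict[str, List[str]] = {
    "unique_model_a_id": ["model_a"],
    "unique_model_b_id": ["model_b"],
    "not_null_combined_model_id": ["combined_model"],
    "not_null_combined_model_name": ["combined_model"],
    "not_null_combined_model_created_at": ["combined_model"],
    "custom_test_combined_model_combined_model_": ["combined_model", "model_a"],
}


def partition_tests(
    test_parents: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split tests into those attached to one model and those run separately.

    Returns (attached, detached):
    - attached maps a model to the tests that depend on that model alone;
    - detached maps each multi-parent test to all of its parents, so it can be
      scheduled once, downstream of every parent.
    Tests without any parent are ignored in this sketch.
    """
    attached: Dict[str, List[str]] = defaultdict(list)
    detached: Dict[str, List[str]] = {}
    for test, parents in test_parents.items():
        unique_parents = sorted(set(parents))
        if len(unique_parents) == 1:
            attached[unique_parents[0]].append(test)
        elif len(unique_parents) > 1:
            detached[test] = unique_parents
    return dict(attached), detached


attached, detached = partition_tests(TEST_PARENTS)
print(attached)
print(detached)
```

In this toy graph only `custom_test_combined_model_combined_model_` ends up detached, which matches the test that failed when it was scheduled right after `model_a`.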

ame589 added the enhancement (New feature or request) and triage-needed (Items need to be reviewed / assigned to milestone) labels Oct 4, 2024

dosubot bot added the area:config (Related to configuration, like YAML files, environment variables, or executer configuration) and dbt:test (Primarily related to dbt test command or functionality) labels Oct 4, 2024

This was referenced Oct 29, 2024

[Feature] Add functionality to only run tests #1279

Open

Support rendering test tasks even when they are not attached to models/seeds/snapshots #959

Open

github-actions bot added the stale (Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed) label Dec 4, 2024

tatiana mentioned this issue Dec 27, 2024

Fix rendering dbt tests with multiple parents #1433

Merged
