Ability to Run Only Tests for a Specific dbt Model Without Executing the Model #1242

Open
1 task
ame589 opened this issue Oct 4, 2024 · 2 comments
Labels
area:config Related to configuration, like YAML files, environment variables, or executer configuration dbt:test Primarily related to dbt test command or functionality enhancement New feature or request stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed triage-needed Items need to be reviewed / assigned to milestone

Comments

@ame589

ame589 commented Oct 4, 2024

Description

I would like to propose a feature that lets users run only the tests defined in the YAML of a specific dbt model, without triggering execution of the model itself, in a DbtTaskGroup Airflow pipeline.

Use case/motivation

This could be achieved by adding a configuration that allows users to specify a model in the select parameter and set the test behavior to ONLY_TEST. Here’s an example of how this could look:

    return DbtTaskGroup(
        group_id=f"{model_name}",
        project_config=project_config,
        profile_config=profile_config,
        execution_config=execution_config,
        render_config=RenderConfig(
            select=[model_name],  # Select the specific model whose tests should run
            exclude=[],  # Nodes to exclude, if any
            test_behavior=TestBehavior.ONLY_TEST  # Proposed value: run only tests, not the model itself
        ),
        default_args={"retries": 2, "trigger_rule": "all_success"},
        operator_args={
            "install_deps": True
        }
    )

With this configuration:

  1. The select parameter will focus the dbt task on the specific model.
  2. The test_behavior=TestBehavior.ONLY_TEST setting will ensure that only the tests for the model are executed, while the actual data transformations are skipped.

This would make the testing process more flexible and tailored to specific use cases.

Related issues

No response

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
@ame589 ame589 added enhancement New feature or request triage-needed Items need to be reviewed / assigned to milestone labels Oct 4, 2024
@dosubot dosubot bot added area:config Related to configuration, like YAML files, environment variables, or executer configuration dbt:test Primarily related to dbt test command or functionality labels Oct 4, 2024
@tatiana
Collaborator

tatiana commented Oct 29, 2024

@ame589 This feels highly related to #959.

Would it be an option to use the DbtTestLocalOperator (class DbtTestLocalOperator(DbtTestMixin, DbtLocalBaseOperator)) or an equivalent directly to run tests only?

As illustrated in:

    test_operator = DbtTestLocalOperator(
        profile_config=real_profile_config,
        project_dir=DBT_PROJ_DIR,
        task_id="test",
        dbt_cmd_flags=["--models", "stg_customers"],
        install_deps=True,
        append_env=True,
    )
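
For comparison, the operator call above amounts to running `dbt test` restricted to one model. A minimal sketch of building that command line follows. The helper name `build_dbt_test_command` is hypothetical and is not part of Cosmos; the sketch uses `--select`, which as far as I know is the current spelling of the older `--models` flag shown in the snippet.

```python
import shlex
from typing import List, Optional


def build_dbt_test_command(model: str, project_dir: Optional[str] = None) -> List[str]:
    """Build a `dbt test` argv that runs only the tests selected by `model`.

    Hypothetical helper for illustration; it does not run the model itself,
    because `dbt test` only executes tests.
    """
    cmd = ["dbt", "test", "--select", model]
    if project_dir is not None:
        # --project-dir points dbt at the directory holding dbt_project.yml
        cmd += ["--project-dir", project_dir]
    return cmd


command = build_dbt_test_command("stg_customers", project_dir="/tmp/proj")
command_line = shlex.join(command)  # shell-safe string form of the argv
```

The resulting list could be passed to `subprocess.run` outside Airflow; inside a DAG, the operator shown above remains the more natural fit.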


github-actions bot commented Dec 4, 2024

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Dec 4, 2024
tatiana added a commit that referenced this issue Dec 30, 2024
If both of these conditions are met:
1. The dbt project has tests that rely on multiple parent models; and
2. The `DbtDag` or `DbtTaskGroup` uses `TestBehavior.AFTER_EACH`
(default) or `TestBehavior.BUILD`,

Cosmos 1.8.0 and earlier versions would attempt to run the same test
once after each parent model run, likely failing if any of the other
parents hadn't been run yet.

This PR aims to fix this behaviour by not running tests with multiple
dependencies within each task group / build task - and by adding those
tests to run only once and after all parents have run.
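
The idea described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in this PR: the function name `split_tests_by_parent_count` and the input mapping are hypothetical, and the node names are taken from the example project described later in this message.

```python
from typing import Dict, List, Tuple


def split_tests_by_parent_count(
    test_parents: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split tests into those kept with one model and those detached.

    Returns (attached, detached):
      attached maps each model to the tests that depend on that model only;
      detached maps each multi-parent test to the sorted list of its parents.
    Tests without any parent are ignored in this sketch.
    """
    attached: Dict[str, List[str]] = {}
    detached: Dict[str, List[str]] = {}
    for test, parents in test_parents.items():
        unique_parents = sorted(set(parents))
        if len(unique_parents) == 1:
            # Single parent: the test can run right after that model.
            attached.setdefault(unique_parents[0], []).append(test)
        elif len(unique_parents) > 1:
            # Multiple parents: run once, after every parent has run.
            detached[test] = unique_parents
    return attached, detached


test_parents = {
    "unique_model_a_id": ["model_a"],
    "unique_model_b_id": ["model_b"],
    "not_null_combined_model_id": ["combined_model"],
    "custom_test_combined_model_combined_model_": ["combined_model", "model_a"],
}
attached, detached = split_tests_by_parent_count(test_parents)
```

Each entry in `detached` would become a standalone task downstream of all its listed parents, while the `attached` tests stay with their model.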

# Related issues

Closes: #978
Closes: #1365

This change also lays the groundwork for supporting tests that don't
have any dependencies, a problem discussed in the following tickets:

* #959
* #1242
* #1279

# How to reproduce

There are two steps to reproduce this problem:

1. To create a representative dbt project
2. To create a Cosmos `DbtDag` that uses this dbt project to reproduce
the original problem

## Representative dbt project

We created a dbt project named `multiple_parents_test` that has a test
called `custom_test_combined_model` that depends on two models:
* combined_model
* model_a

The expectation from a user perspective is that, since `combined_model`
depends on `model_a`, the `custom_test_combined_model` test will be run
only once, after both models have run.
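
This expected ordering can be checked with a small topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+) and includes only the dependencies stated above; it is an illustration, not how dbt or Cosmos compute their execution order.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on.
dependencies = {
    "model_a": set(),
    "combined_model": {"model_a"},
    "custom_test_combined_model": {"combined_model", "model_a"},
}

# static_order() yields every node exactly once, parents before children.
order = list(TopologicalSorter(dependencies).static_order())
position = {node: index for index, node in enumerate(order)}

# The test appears once and only after both of its parents.
test_runs_last = (
    position["custom_test_combined_model"] > position["combined_model"]
    and position["custom_test_combined_model"] > position["model_a"]
)
```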

Definition of the test:
```
{% test custom_test_combined_model(model) %}
WITH source_data AS (
    SELECT id FROM {{ ref('model_a') }}
),
combined_data AS (
    SELECT id FROM {{ model }}
)
SELECT
    s.id
FROM
    source_data s
LEFT JOIN
    combined_data c
    ON s.id = c.id
WHERE
    c.id IS NULL
{% endtest %}
```

By running the following `dbt build` command, we confirm that the test
depends on both models:

```
dbt build --select "+custom_test_combined_model_combined_model_"
11:59:29  Running with dbt=1.8.2
11:59:29  Registered adapter: postgres=1.8.1
11:59:29  Found 3 models, 6 data tests, 414 macros
11:59:29  
11:59:30  Concurrency: 4 threads (target='dev')
11:59:30  
11:59:30  1 of 9 START sql view model public.model_a ..................................... [RUN]
11:59:30  2 of 9 START sql view model public.model_b ..................................... [RUN]
11:59:30  1 of 9 OK created sql view model public.model_a ................................ [CREATE VIEW in 0.18s]
11:59:30  2 of 9 OK created sql view model public.model_b ................................ [CREATE VIEW in 0.18s]
11:59:30  3 of 9 START test unique_model_a_id ............................................ [RUN]
11:59:30  4 of 9 START test unique_model_b_id ............................................ [RUN]
11:59:30  4 of 9 PASS unique_model_b_id .................................................. [PASS in 0.05s]
11:59:30  3 of 9 PASS unique_model_a_id .................................................. [PASS in 0.06s]
11:59:30  5 of 9 START sql view model public.combined_model .............................. [RUN]
11:59:30  5 of 9 OK created sql view model public.combined_model ......................... [CREATE VIEW in 0.03s]
11:59:30  6 of 9 START test custom_test_combined_model_combined_model_ ................... [RUN]
11:59:30  7 of 9 START test not_null_combined_model_created_at ........................... [RUN]
11:59:30  8 of 9 START test not_null_combined_model_id ................................... [RUN]
11:59:30  9 of 9 START test not_null_combined_model_name ................................. [RUN]
11:59:30  7 of 9 PASS not_null_combined_model_created_at ................................. [PASS in 0.07s]
11:59:30  9 of 9 PASS not_null_combined_model_name ....................................... [PASS in 0.07s]
11:59:30  8 of 9 PASS not_null_combined_model_id ......................................... [PASS in 0.07s]
11:59:30  6 of 9 PASS custom_test_combined_model_combined_model_ ......................... [PASS in 0.08s]
11:59:30  
11:59:30  Finished running 3 view models, 6 data tests in 0 hours 0 minutes and 0.50 seconds (0.50s).
11:59:30  
11:59:30  Completed successfully
11:59:30  
11:59:30  Done. PASS=9 WARN=0 ERROR=0 SKIP=0 TOTAL=9
```

This is what the pipeline topology looks like:

<img width="1020" alt="Screenshot 2024-12-27 at 11 39 31"
src="https://github.com/user-attachments/assets/d8a8e628-2fd7-4959-b13f-3d289e7250ed"
/>

The source code structure for this dbt project:

```
├── dbt_project.yml
├── macros
│   └── custom_test_combined_model.sql
├── models
│   ├── combined_model.sql
│   ├── model_a.sql
│   ├── model_b.sql
│   └── schema.yml
└── profiles.yml
```

When running `dbt ls`, it displays:

```
dbt ls
11:40:58  Running with dbt=1.8.2
11:40:58  Registered adapter: postgres=1.8.1
11:40:58  Unable to do partial parsing because saved manifest not found. Starting full parse.
11:40:59  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
11:40:59  Found 3 models, 6 data tests, 414 macros
my_dbt_project.combined_model
my_dbt_project.model_a
my_dbt_project.model_b
my_dbt_project.custom_test_combined_model_combined_model_
my_dbt_project.not_null_combined_model_created_at
my_dbt_project.not_null_combined_model_id
my_dbt_project.not_null_combined_model_name
my_dbt_project.unique_model_a_id
my_dbt_project.unique_model_b_id
```

## Behavior in Cosmos

The DAG `example_multiple_parents_test` uses this new dbt project:

```
import os
from datetime import datetime
from pathlib import Path

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

DEFAULT_DBT_ROOT_PATH = Path(__file__).parent / "dbt"
DBT_ROOT_PATH = Path(os.getenv("DBT_ROOT_PATH", DEFAULT_DBT_ROOT_PATH))

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="example_conn",
        profile_args={"schema": "public"},
        disable_event_tracking=True,
    ),
)

example_multiple_parents_test = DbtDag(
    # dbt/cosmos-specific parameters
    project_config=ProjectConfig(
        DBT_ROOT_PATH / "multiple_parents_test",
    ),
    profile_config=profile_config,
    # normal dag parameters
    start_date=datetime(2023, 1, 1),
    dag_id="example_multiple_parents_test",
)
```

When trying to run it using:

```
airflow dags test example_multiple_parents_test
```

Users hit the original error because Cosmos attempts to run the test
after `model_a` has run but before `combined_model` has run:

<img width="861" alt="Screenshot 2024-12-27 at 12 10 36"
src="https://github.com/user-attachments/assets/33ea7b71-ba49-4418-b194-4d3590fff1b8"
/>

Excerpt from the logs of the failing task:

```
[2024-12-27T12:07:33.564+0000] {taskinstance.py:2905} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 401, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 796, in execute
    result = self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 654, in build_and_run_cmd
    result = self.run_command(cmd=dbt_cmd, env=env, context=context)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 509, in run_command
    self.handle_exception(result)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 237, in handle_exception_dbt_runner
    raise AirflowException(f"dbt invocation completed with errors: {error_message}")
airflow.exceptions.AirflowException: dbt invocation completed with errors: custom_test_combined_model_combined_model_: Database Error in test custom_test_combined_model_combined_model_ (models/schema.yml)
  relation "public.combined_model" does not exist
  LINE 12:     SELECT id FROM "postgres"."public"."combined_model"
                              ^
  compiled Code at target/run/my_dbt_project/models/schema.yml/custom_test_combined_model_combined_model_.sql
```

## Behaviour after this change

With this change, when running the DAG mentioned above, it results in:
<img width="1264" alt="Screenshot 2024-12-27 at 15 44 17"
src="https://github.com/user-attachments/assets/e0395a4d-dbae-4b63-a3c3-69ca79ad0b04"
/>

The DAG then runs successfully.

## Breaking Change?

This PR slightly changes the behaviour of Cosmos DAG rendering when
using `TestBehavior.AFTER_EACH` or `TestBehavior.BUILD` when there are
tests with multiple parents. Some may consider it a breaking change, but
a bug fix is a better classification since Cosmos did not support
rendering many dbt projects that met these circumstances.

The behaviour change in those cases is that we're isolating tests that
depend on multiple parents and running them outside of the
`TestBehavior.AFTER_EACH` dbt node Cosmos TaskGroup or the
`TestBehavior.BUILD` task.

This change will likely highlight any tests that depend on multiple
models and previously did not fail, but ran as part of the tests of
each parent model.
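
To find which tests in a project are affected, one option is to inspect the `manifest.json` that dbt writes to the `target/` directory. The sketch below assumes the manifest layout I am aware of, where each entry under `nodes` has a `resource_type` and a `depends_on.nodes` list. The function name is hypothetical, and the unique ids in the sample are simplified (real test ids usually carry a hash suffix).

```python
from typing import Any, Dict, List


def find_multi_parent_tests(manifest: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return {test unique_id: sorted parent ids} for tests with 2+ parents."""
    multi_parent: Dict[str, List[str]] = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "test":
            continue  # only tests are of interest here
        parents = sorted(set(node.get("depends_on", {}).get("nodes", [])))
        if len(parents) > 1:
            multi_parent[unique_id] = parents
    return multi_parent


# Simplified, hand-written stand-in for a parsed target/manifest.json.
sample_manifest = {
    "nodes": {
        "model.my_dbt_project.model_a": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
        },
        "test.my_dbt_project.unique_model_a_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.my_dbt_project.model_a"]},
        },
        "test.my_dbt_project.custom_test_combined_model_combined_model_": {
            "resource_type": "test",
            "depends_on": {
                "nodes": [
                    "model.my_dbt_project.model_a",
                    "model.my_dbt_project.combined_model",
                ]
            },
        },
    }
}
affected = find_multi_parent_tests(sample_manifest)
```

With a real project, the dictionary would come from `json.load` on the manifest file instead of the hand-written sample.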