[Feature] Add functionality to only run tests #1279

Open
1 task
luis-fnogueira opened this issue Oct 23, 2024 · 4 comments
Labels
area:rendering Related to rendering, like Jinja, Airflow tasks, etc dbt:test Primarily related to dbt test command or functionality enhancement New feature or request triage-needed Items need to be reviewed / assigned to milestone

Comments

@luis-fnogueira

Description

Hi y'all!

I'd like to periodically run only the dbt tests of my project using cosmos, using an approach similar to:


        render_config=RenderConfig(
            test_behavior=TestBehavior.ONLY_TESTS,
        )

Would that be possible in a future release? Or is there already a workaround to do that? I could not figure it out from the documentation.

Use case/motivation

I'd like to periodically receive an alert on Slack summarizing the status of my dbt tests.

Related issues

No response

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
@luis-fnogueira luis-fnogueira added enhancement New feature or request triage-needed Items need to be reviewed / assigned to milestone labels Oct 23, 2024
@dosubot dosubot bot added area:rendering Related to rendering, like Jinja, Airflow tasks, etc dbt:test Primarily related to dbt test command or functionality labels Oct 23, 2024
@tatiana
Collaborator

tatiana commented Oct 29, 2024

@luis-fnogueira This feels like a duplicate of #1242 and potentially related to #959.

Would it be an option to use the DbtTestLocalOperator (defined as class DbtTestLocalOperator(DbtTestMixin, DbtLocalBaseOperator)) or an equivalent directly to run tests only? Would this solve your use case?

As illustrated in:

test_operator = DbtTestLocalOperator(
    profile_config=real_profile_config,
    project_dir=DBT_PROJ_DIR,
    task_id="test",
    dbt_cmd_flags=["--models", "stg_customers"],
    install_deps=True,
    append_env=True,
)


This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Nov 29, 2024
@luis-fnogueira
Author

DbtTestLocalOperator

Hi! I'd have to test it, because I haven't found it documented at https://astronomer.github.io/astronomer-cosmos/. Thanks a lot for the suggestion!

@luis-fnogueira luis-fnogueira closed this as not planned Dec 2, 2024

dosubot bot commented Dec 2, 2024

Thank you for closing the issue, luis-fnogueira! We appreciate your contribution to the Cosmos project.

@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Dec 2, 2024
@tatiana tatiana reopened this Dec 27, 2024
tatiana added a commit that referenced this issue Dec 30, 2024
If these two circumstances are met:
1. The dbt project has tests that rely on multiple parent models; and
2. The `DbtDag` or `DbtTaskGroup` uses `TestBehavior.AFTER_EACH`
(default) or `TestBehavior.BUILD`

Cosmos 1.8.0 and previous versions would attempt to run the same test
once after each parent model run, likely failing if any of the other
parents hadn't been run yet.

This PR aims to fix this behaviour by excluding tests with multiple
parents from each model's task group / build task, and by running each
of those tests only once, after all of its parents have run.
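
A minimal sketch of the idea, not Cosmos's actual implementation: given a mapping from each test to the models it depends on, tests with a single parent stay attached to that parent, while every other test is detached so it can be scheduled once, downstream of all its parents. The helper name `split_tests_by_parent_count` and the plain-dict representation of the graph are assumptions made for illustration; the node names come from the example project described further down.

```python
from typing import Dict, List, Tuple


def split_tests_by_parent_count(
    test_parents: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Hypothetical helper, for illustration only.

    Returns (attached, detached):
      attached maps a model name to the tests that can run right after it;
      detached maps a test name to every parent it has to wait for.
    """
    attached: Dict[str, List[str]] = {}
    detached: Dict[str, List[str]] = {}
    for test, parents in test_parents.items():
        unique_parents = sorted(set(parents))
        if len(unique_parents) == 1:
            # Exactly one parent: safe to run as part of that model's group.
            attached.setdefault(unique_parents[0], []).append(test)
        else:
            # Several parents (or none): run once, outside any model's group.
            detached[test] = unique_parents
    return attached, detached


test_parents = {
    "unique_model_a_id": ["model_a"],
    "unique_model_b_id": ["model_b"],
    "not_null_combined_model_id": ["combined_model"],
    "custom_test_combined_model_combined_model_": ["combined_model", "model_a"],
}

attached, detached = split_tests_by_parent_count(test_parents)
print(attached)
print(detached)
```

In this sketch only `custom_test_combined_model_combined_model_` ends up detached, waiting for both `combined_model` and `model_a`.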

# Related issues

Closes: #978
Closes: #1365

This change also sets the ground for adding support to tests that don't
have any dependencies, a problem discussed in the following tickets:

* #959
* #1242
* #1279

# How to reproduce

There are two steps to reproduce this problem:

1. Create a representative dbt project
2. Create a Cosmos `DbtDag` that uses this dbt project to reproduce
the original problem

## Representative dbt project

We created a dbt project named `multiple_parents_test` that has a test
called `custom_test_combined_model` that depends on two models:
* combined_model
* model_a

From a user's perspective, since `combined_model` depends on `model_a`,
the expectation is that the `custom_test_combined_model` test will be
run only once, after both models have run.
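
That expectation can be checked with a small topological sort. This is a sketch under stated assumptions: it uses Python's standard `graphlib` module (3.9+) rather than anything from Cosmos or dbt, and the edge from `model_b` to `combined_model` is inferred from the `dbt build` log in this description, not from the model source.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps to the set of nodes that must finish before it can start.
# Assumption: combined_model is built from model_a and model_b; the custom
# test reads both combined_model and model_a.
dependencies = {
    "model_a": set(),
    "model_b": set(),
    "combined_model": {"model_a", "model_b"},
    "custom_test_combined_model_combined_model_": {"combined_model", "model_a"},
}

test_name = "custom_test_combined_model_combined_model_"
order = list(TopologicalSorter(dependencies).static_order())

# The test appears exactly once, after every node it depends on.
print(order)
```

Any valid ordering of this graph places the test last, because it depends on `combined_model`, which in turn depends on both other models.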

Definition of the test:
```
{% test custom_test_combined_model(model) %}
WITH source_data AS (
    SELECT id FROM {{ ref('model_a') }}
),
combined_data AS (
    SELECT id FROM {{ model }}
)
SELECT
    s.id
FROM
    source_data s
LEFT JOIN
    combined_data c
    ON s.id = c.id
WHERE
    c.id IS NULL
{% endtest %}
```

By running the following `dbt build` command, we confirm that the test
depends on both models:

```
dbt build --select "+custom_test_combined_model_combined_model_"
11:59:29  Running with dbt=1.8.2
11:59:29  Registered adapter: postgres=1.8.1
11:59:29  Found 3 models, 6 data tests, 414 macros
11:59:29  
11:59:30  Concurrency: 4 threads (target='dev')
11:59:30  
11:59:30  1 of 9 START sql view model public.model_a ..................................... [RUN]
11:59:30  2 of 9 START sql view model public.model_b ..................................... [RUN]
11:59:30  1 of 9 OK created sql view model public.model_a ................................ [CREATE VIEW in 0.18s]
11:59:30  2 of 9 OK created sql view model public.model_b ................................ [CREATE VIEW in 0.18s]
11:59:30  3 of 9 START test unique_model_a_id ............................................ [RUN]
11:59:30  4 of 9 START test unique_model_b_id ............................................ [RUN]
11:59:30  4 of 9 PASS unique_model_b_id .................................................. [PASS in 0.05s]
11:59:30  3 of 9 PASS unique_model_a_id .................................................. [PASS in 0.06s]
11:59:30  5 of 9 START sql view model public.combined_model .............................. [RUN]
11:59:30  5 of 9 OK created sql view model public.combined_model ......................... [CREATE VIEW in 0.03s]
11:59:30  6 of 9 START test custom_test_combined_model_combined_model_ ................... [RUN]
11:59:30  7 of 9 START test not_null_combined_model_created_at ........................... [RUN]
11:59:30  8 of 9 START test not_null_combined_model_id ................................... [RUN]
11:59:30  9 of 9 START test not_null_combined_model_name ................................. [RUN]
11:59:30  7 of 9 PASS not_null_combined_model_created_at ................................. [PASS in 0.07s]
11:59:30  9 of 9 PASS not_null_combined_model_name ....................................... [PASS in 0.07s]
11:59:30  8 of 9 PASS not_null_combined_model_id ......................................... [PASS in 0.07s]
11:59:30  6 of 9 PASS custom_test_combined_model_combined_model_ ......................... [PASS in 0.08s]
11:59:30  
11:59:30  Finished running 3 view models, 6 data tests in 0 hours 0 minutes and 0.50 seconds (0.50s).
11:59:30  
11:59:30  Completed successfully
11:59:30  
11:59:30  Done. PASS=9 WARN=0 ERROR=0 SKIP=0 TOTAL=9
```
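
For the use case raised in this issue (a periodic summary of test results), the final `Done.` line of the log above is straightforward to parse. The sketch below is illustrative only: `parse_dbt_summary` is a hypothetical helper, the pattern matches the line format shown in the dbt 1.8.2 output above and may need adjusting for other dbt versions, and sending the message to a chat tool is left out.

```python
import re
from typing import Dict

# Matches e.g. "Done. PASS=9 WARN=0 ERROR=0 SKIP=0 TOTAL=9"
SUMMARY_PATTERN = re.compile(r"Done\.\s+(?P<counts>(?:[A-Z]+=\d+\s*)+)")


def parse_dbt_summary(line: str) -> Dict[str, int]:
    """Hypothetical helper: extract the KEY=count pairs from a dbt summary line."""
    match = SUMMARY_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a dbt summary line: {line!r}")
    pairs = (pair.split("=") for pair in match.group("counts").split())
    return {key: int(value) for key, value in pairs}


summary = parse_dbt_summary("11:59:30  Done. PASS=9 WARN=0 ERROR=0 SKIP=0 TOTAL=9")
message = (
    "dbt: {PASS} passed, {WARN} warned, {ERROR} errored, "
    "{SKIP} skipped (total {TOTAL})".format(**summary)
)
print(message)
```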

This is what the pipeline topology looks like:

<img width="1020" alt="Screenshot 2024-12-27 at 11 39 31"
src="https://github.com/user-attachments/assets/d8a8e628-2fd7-4959-b13f-3d289e7250ed"
/>

The source code structure for this dbt project:

```
├── dbt_project.yml
├── macros
│   └── custom_test_combined_model.sql
├── models
│   ├── combined_model.sql
│   ├── model_a.sql
│   ├── model_b.sql
│   └── schema.yml
└── profiles.yml
```

When running `dbt ls`, it displays:

```
dbt ls
11:40:58  Running with dbt=1.8.2
11:40:58  Registered adapter: postgres=1.8.1
11:40:58  Unable to do partial parsing because saved manifest not found. Starting full parse.
11:40:59  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
11:40:59  Found 3 models, 6 data tests, 414 macros
my_dbt_project.combined_model
my_dbt_project.model_a
my_dbt_project.model_b
my_dbt_project.custom_test_combined_model_combined_model_
my_dbt_project.not_null_combined_model_created_at
my_dbt_project.not_null_combined_model_id
my_dbt_project.not_null_combined_model_name
my_dbt_project.unique_model_a_id
my_dbt_project.unique_model_b_id
```

## Behavior in Cosmos

The DAG `example_multiple_parents_test` uses this new dbt project:

```
import os
from datetime import datetime
from pathlib import Path

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

DEFAULT_DBT_ROOT_PATH = Path(__file__).parent / "dbt"
DBT_ROOT_PATH = Path(os.getenv("DBT_ROOT_PATH", DEFAULT_DBT_ROOT_PATH))

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="example_conn",
        profile_args={"schema": "public"},
        disable_event_tracking=True,
    ),
)

example_multiple_parents_test = DbtDag(
    # dbt/cosmos-specific parameters
    project_config=ProjectConfig(
        DBT_ROOT_PATH / "multiple_parents_test",
    ),
    profile_config=profile_config,
    # normal dag parameters
    start_date=datetime(2023, 1, 1),
    dag_id="example_multiple_parents_test",
)
```

When trying to run it using:

```
airflow dags test example_multiple_parents_test
```

Users hit the original error because Cosmos attempts to run the test
after `model_a` has run but before `combined_model` has run:

<img width="861" alt="Screenshot 2024-12-27 at 12 10 36"
src="https://github.com/user-attachments/assets/33ea7b71-ba49-4418-b194-4d3590fff1b8"
/>

Excerpt from the logs of the failing task:

```
[2024-12-27T12:07:33.564+0000] {taskinstance.py:2905} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 401, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 796, in execute
    result = self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 654, in build_and_run_cmd
    result = self.run_command(cmd=dbt_cmd, env=env, context=context)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 509, in run_command
    self.handle_exception(result)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 237, in handle_exception_dbt_runner
    raise AirflowException(f"dbt invocation completed with errors: {error_message}")
airflow.exceptions.AirflowException: dbt invocation completed with errors: custom_test_combined_model_combined_model_: Database Error in test custom_test_combined_model_combined_model_ (models/schema.yml)
  relation "public.combined_model" does not exist
  LINE 12:     SELECT id FROM "postgres"."public"."combined_model"
                              ^
  compiled Code at target/run/my_dbt_project/models/schema.yml/custom_test_combined_model_combined_model_.sql
```

## Behaviour after this change

With this change, running the DAG mentioned above results in:
<img width="1264" alt="Screenshot 2024-12-27 at 15 44 17"
src="https://github.com/user-attachments/assets/e0395a4d-dbae-4b63-a3c3-69ca79ad0b04"
/>

And it runs successfully.

## Breaking Change?

This PR slightly changes the behaviour of Cosmos DAG rendering when
using `TestBehavior.AFTER_EACH` or `TestBehavior.BUILD` when there are
tests with multiple parents. Some may consider it a breaking change, but
a bug fix is a better classification since Cosmos did not support
rendering many dbt projects that met these circumstances.

The behaviour change in those cases is that we're isolating tests that
depend on multiple parents and running them outside of the
`TestBehavior.AFTER_EACH` dbt node Cosmos TaskGroup or the
`TestBehavior.BUILD` task.

This change will likely highlight any tests that depended on multiple
models and were not failing previously but were running as part of the
tests of both models.