Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Reflect macro dependencies on model nodes in depends_on in manifest.json #9982

Closed
2 tasks done
epapineau opened this issue Apr 18, 2024 · 10 comments
Closed
2 tasks done
Labels
enhancement New feature or request manifest stale Issues that have gone stale

Comments

@epapineau
Copy link
Contributor

epapineau commented Apr 18, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I am not 100% sure this is a 🐛, but it doesn't meet my assumptions of what I'd find in manifest.json:

If I have a macro, defined as:

{%- macro generate_model_sql() -%}
    select
        column_name_a,
        column_name_b
    from {{ ref('some_model') }}
{%- endmacro -%}

I would expect macros.macro.project.generate_model_sql.depends_on to contain: {'node': ['model.project.generate_model_sql']}. However, AFAICT, this depends_on only contains metadata about referenced macros and not referenced nodes / models.

Expected Behavior

Ref'd models to be included in the depends_on object

Steps To Reproduce

  1. Create a macro using ref
  2. dbt parse
  3. Inspect this macro's depends_on value in manifest.json

Relevant log output

No response

Environment

- OS:
- Python:
- dbt: 1.7.5

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

@epapineau epapineau added bug Something isn't working triage labels Apr 18, 2024
@dbeatty10
Copy link
Contributor

Heya @epapineau 👋

This looks the same to me as needing to force dependencies like this:

 -- depends_on: {{ ref('some_model') }}

 {{ generate_model_sql() }}

If you don't do the -- depends_on: {{ ref('some_model') }} bit, does your project display the expected lineage graph and run in the correct order?

@epapineau
Copy link
Contributor Author

epapineau commented Apr 18, 2024

@dbeatty10 thanks for the suggestion! The issue is not lineage or ordering - that is working as expected. However, after adding the suggested --depends_on and executing dbt parse, the ref'd models are still not present in the depends_on object for the macro in manifest.json.

@dbeatty10
Copy link
Contributor

After touching base with @gshank, I can confirm that it's known behavior that we don't track model refs when they are in a macro. So I'm going to update this to be a feature request.

@dbeatty10 dbeatty10 added enhancement New feature or request and removed bug Something isn't working labels Apr 18, 2024
@dbeatty10 dbeatty10 changed the title [Bug] Macro node dependency not reflected in manifest.json [Feature] Reflect macro dependencies on model nodes in depends_on in manifest.json Apr 18, 2024
@jtcohen6
Copy link
Contributor

Correct!

  • Macros can only depend on other macros — this is part of our spec
  • However, a model which calls this macro transitively gets dependencies on the model ref'd in the macro

Conceptually, this is a bit similar to "vars referenced in macros are resolved based on the scope (project) of the model calling the macro, rather than the scope (package) where the macro is defined":

@epapineau
Copy link
Contributor Author

@jtcohen6 interesting. Is there an alternative recommendation to understand macro dependencies on models from manifest.json?

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 8, 2024

@epapineau Thinking aloud - could you add an ephemeral model which calls the macro, and then look at the ephemeral model's depends_on.nodes in manifest.json?

@jtcohen6 jtcohen6 removed the triage label May 8, 2024
@epapineau
Copy link
Contributor Author

That use case would work for one or two macros, but would quickly become unfeasible at the scale of 100+ macros 🤔

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 9, 2024

Fair! Okay, taking a step back — why not track model dependencies within macros?

Conceptually, macros are resources in your project, but they aren't actually nodes in the DAG. They're functions that return code, which can call (and depend on) other functions that return code — whereas depends_on.nodes describes actual edges within the DAG. A model is only directly depended on by another DAG node, not by the function called within that DAG node which returns code.

You're allowed to define a macro that references a model that doesn't exist:

{%- macro generate_bad_model_sql() -%}
    select
        column_name_a,
        column_name_b
    from {{ ref('does_not_exist') }}
{%- endmacro -%}

Until you call that macro within an actual model (DAG node), it doesn't pose a problem for constructing a viable DAG. As soon as you do, you'll see a Compilation Error.

It's also tricky to think about: How would we track model → macro dependencies when that macro takes an argument, to modify the reference based on some input or parse-time condition?

{%- macro generate_model_sql(model_name='some_model') -%}
    select
        column_name_a,
        column_name_b
    from {{ ref(model_name) }}
{%- endmacro -%}

I do see the value in tracking which macros depend on ref within their depends_on.macros. I don't think we support that for builtin ref right now (it's not a macro per se), but it does work for custom ref overrides. Not actually recommending this, just to illustrate how you might be able to programmatically determine which macros / how many macros directly reference ref:

{%- macro generate_model_sql() -%}
    select
        column_name_a,
        column_name_b
    from {{ my_project.ref('some_model') }}
{%- endmacro -%}

{%- macro custom_ref() -%}
    {{ return(builtins.ref(*varargs, **kwargs)) }}
{%- endmacro -%}

Then in manifest.json:

    "macro.my_project.generate_model_sql": {
      "name": "generate_model_sql",
      "resource_type": "macro",
      ...
      "depends_on": {
        "macros": [
          "macro.my_project.ref"
        ]
      },

Copy link
Contributor

github-actions bot commented Nov 6, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Nov 6, 2024
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request manifest stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

3 participants