Support for configuring sources using resource paths #3298

Closed
kdeggelman opened this issue Apr 26, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@kdeggelman

Describe the feature

I'd like to be able to configure sources in dbt_project.yml using the resource-path pattern, in a similar way to how I configure models. My current understanding is that I can only configure sources using package name, source name, and table name. This forces me to configure each source individually, as opposed to configuring multiple sources within a directory at once.

Describe alternatives you've considered

Right now I have to add database: "<database name>" to every source that lives in a different database.
Another option would be to specify the database on a per-file level to avoid having to copy database: "<database name>" in every source in that file.

Additional context

We use BigQuery and have data spread between multiple projects (databases). Within my models/ directory I have a couple of directories, each with a file like src_other_database.yml that defines a bunch of different datasets (sources) that all live in that project.

Who will this benefit?

In general, it seems like a good idea to have similar functionality for model configuration and source configuration. This feature would help keep source definitions "DRY" for many projects.

Are you interested in contributing this feature?

I'm happy to help any way that I can, though I suspect the actual implementation would be better left to some of the core contributors.

@kdeggelman kdeggelman added enhancement New feature or request triage labels Apr 26, 2021
@jtcohen6 jtcohen6 removed the triage label Apr 27, 2021
@jtcohen6
Contributor

Hey @kdeggelman, thanks for opening!

There are two prerequisites to this, and I'll say more about each of them below:

  1. Sources need to have subdirectory paths inside their FQNs. (Turns out this is true today, just not documented right!)
  2. Source properties need to be configurable from dbt_project.yml. This is a big lift, and it's roughly the change proposed in "Set configs in schema.yml files" (#2401).

In dbt, resources are configured in dbt_project.yml based on their fully qualified name (FQN). Source FQNs actually do include the relative file path of the YAML file where they're defined. That's documented right here in the codebase:

https://github.com/fishtown-analytics/dbt/blob/39f350fe89bd11215208be2513bb97020287a636/core/dbt/parser/schemas.py#L796-L798

This is not correct in the docs, which say:

Unlike models, source configurations are not applied hierarchically based on folder paths. Instead, source configurations are applied based on:

  • The package which contains the source
  • The source name
  • The table name

There should be another item, between the first and the second: the relative path of the source's YAML definition. Here, for instance, is a snippet from a source's manifest.json entry:

        "source.my_project.my_src.my_tbl": {
            "fqn": [
                "my_project",
                "subfolder",
                "my_src",
                "my_tbl"
            ],

And I would configure it like so:

sources:
  my_project:
    subfolder:  # required for this to work!
      my_src:
        +enabled: False
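
To make the matching concrete, here's a rough Python sketch of the idea. It is not dbt's actual code: `source_fqn` and `resolve_config` are hypothetical helpers, and the file name `subfolder/sources.yml` is an assumption for illustration. The sketch assumes the FQN is the package name, then the subdirectories of the YAML file (relative to the source path), then the source and table names, and that `+`-prefixed keys are collected level by level along that FQN.

```python
from pathlib import PurePosixPath


def source_fqn(package, yaml_relpath, source_name, table_name):
    """Hypothetical helper: build a source FQN from its parts.

    The subdirectories of the YAML file sit between the package name
    and the source name; the file name itself is not included.
    """
    subdirs = list(PurePosixPath(yaml_relpath).parent.parts)
    return [package, *subdirs, source_name, table_name]


def resolve_config(config_tree, fqn):
    """Hypothetical helper: walk a dbt_project.yml-style nested dict
    along the FQN, collecting '+'-prefixed keys. Deeper levels override
    shallower ones; the walk stops at the first missing level.
    """
    resolved = {}
    node = config_tree
    for part in [None, *fqn]:  # None stands for the root of the tree
        if part is not None:
            node = node.get(part)
            if not isinstance(node, dict):
                break
        for key, value in node.items():
            if key.startswith("+"):
                resolved[key[1:]] = value
    return resolved


fqn = source_fqn("my_project", "subfolder/sources.yml", "my_src", "my_tbl")

# Config that includes the subfolder level: the setting is picked up.
with_subfolder = {"my_project": {"subfolder": {"my_src": {"+enabled": False}}}}

# Config that skips the subfolder level: the walk stops early, nothing applies.
without_subfolder = {"my_project": {"my_src": {"+enabled": False}}}

print(fqn)
print(resolve_config(with_subfolder, fqn))
print(resolve_config(without_subfolder, fqn))
```

In this sketch, leaving out the `subfolder` key means the lookup never reaches `my_src`, which is the reason the extra level is required in the config above.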

Now, for the second matter at hand: The attributes of source definitions are technically resource properties, rather than node configs. Today, sources can only carry one general node config: enabled. Everything else (database, freshness, etc.) is a property, defined in wherever/*.yml and not configurable from dbt_project.yml.

This is a technical distinction, and not a tremendously meaningful one. Ideally, we'll manage to do away with it before dbt v1.0, so that it will be possible to set property-configs like:

sources:
  my_package:
    my_subfolder:
      +database: raw

There are two takeaways from this:

  1. Update the docs site to reflect that configuring sources from dbt_project.yml requires full path specification
  2. Proceed with "Set configs in schema.yml files" (#2401), which we're hoping to tackle later this year

In the meantime, I'm going to close this issue, since the required code change is well captured in the existing issue—it's just going to be a tricky maneuver to pull off.

@kdeggelman
Author

@jtcohen6 Thanks for the thorough response, I learned a lot from reading it.
