[CT-2050] Use of "path" and "original_file_path" in nodes is confused #6879

Open
gshank opened this issue Feb 6, 2023 · 1 comment
Labels
file_system How dbt-core interoperates with file systems to read/write data tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality

Comments

@gshank
Contributor

gshank commented Feb 6, 2023

The "original_file_path" field identifies the file from which this node was parsed, such as "models/schema.yml" for a schema file, or "models/my_model.sql" for a model. The "path" field, however, means a number of different things depending on where the node came from.

For generic tests, "original_file_path" holds the schema file, but "path" contains the name of the generic test, e.g. "unique_raw_customers_id.sql". If the same schema file implements the same generic test in multiple places, these tests would overwrite each other under the compiled "target/compiled/<project_name>/models/schema.yml" directory. (I think that at one point these written test names contained the generated unique id name, such as "unique_stg_customers_customer_id.c7614daada", but that seems to have been lost somewhere.)

In addition, files that can produce multiple nodes, such as macros, may have a name in "path" to distinguish the various blocks.

The "path" field is used to build the file path for writing out the compiled files (in the "compiled" and "run" directories). The code does this by checking whether "path" is equal to "original_file_path"; if it isn't (it usually is), it appends "path" to "original_file_path" to get the path for writing the compiled file.
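As a rough illustration of the rule just described (not the actual dbt-core code), the sketch below rebuilds the write path from the two fields. The function name `compiled_write_path`, the `target_dir` and `project_name` arguments, and the project name "my_project" are all hypothetical, and the "compiled" directory layout is assumed from the path quoted earlier in this comment.

```python
import posixpath


def compiled_write_path(target_dir, project_name, path, original_file_path):
    """Hypothetical helper: mirrors the rule described in this issue only."""
    if path == original_file_path:
        # "path" already names the source file, so use it directly.
        relative = original_file_path
    else:
        # "path" holds something else (a generic test filename, a block name),
        # so it is appended underneath the original file path.
        relative = posixpath.join(original_file_path, path)
    return posixpath.join(target_dir, "compiled", project_name, relative)


# Two generic tests parsed from the same schema file that end up with the
# same "path" value resolve to the same write location.
first = compiled_write_path(
    "target", "my_project", "unique_raw_customers_id.sql", "models/schema.yml"
)
second = compiled_write_path(
    "target", "my_project", "unique_raw_customers_id.sql", "models/schema.yml"
)
collides = first == second
```

Under these assumptions, nothing in the resulting path distinguishes the two tests, which is the overwrite problem mentioned above.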

So sometimes the "path" field has the normal path of the file, sometimes it has the filename of a generic test, sometimes it has a block name.

I propose that we change "path" to always have the path of the file, and that we create a new field ("path_extra") to contain the necessary extra pieces of the compiled file paths. That way we can more easily construct the "build_path" and "compiled_path" (for example, see "write_node" in ParsedNode).
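A minimal sketch of what the proposal could look like, assuming "path_extra" is an optional string that is only set for nodes needing an extra path segment. The `NodePaths` class, its `compiled_path` method signature, and the project name "my_project" are hypothetical and are not taken from dbt-core.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodePaths:
    """Hypothetical stand-in for the path-related fields on a node."""

    path: str  # under the proposal: always the path of the file
    original_file_path: str
    path_extra: Optional[str] = None  # proposed field: extra compiled-path piece

    def compiled_path(self, target_dir: str, project_name: str) -> str:
        # No equality check between fields is needed: the extra segment is
        # appended only when it is present.
        parts = [target_dir, "compiled", project_name, self.path]
        if self.path_extra:
            parts.append(self.path_extra)
        return posixpath.join(*parts)


model = NodePaths(path="models/my_model.sql", original_file_path="models/my_model.sql")
generic_test = NodePaths(
    path="models/schema.yml",
    original_file_path="models/schema.yml",
    path_extra="unique_raw_customers_id.sql",
)
```

With the fields separated like this, "path" keeps a single meaning, and whatever makes a compiled filename unique (for example, a generated unique id) would only need to go into "path_extra".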

@github-actions github-actions bot changed the title Use of "path" and "original_file_path" in nodes is confused [CT-2050] Use of "path" and "original_file_path" in nodes is confused Feb 6, 2023
@gshank gshank added Team:Language tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality labels Feb 6, 2023
@jtcohen6
Contributor

jtcohen6 commented Feb 7, 2023

@gshank Thanks for opening! I'm reading this as good-old-fashioned tech debt: a thing that's currently inconsistent, and ought to be more consistent. I don't see this as blocking for or blocked by any of the work in #6873.

@jtcohen6 jtcohen6 added the file_system How dbt-core interoperates with file systems to read/write data label Feb 10, 2023