Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vdk-impala: Add optional parameter for staging table prefix #1666

Merged
merged 6 commits into from
Mar 1, 2023

Conversation

sbuldeev
Copy link
Collaborator

What:
Adding the option to pass a staging table prefix as well as code refactoring and readme file enhancements

Why:
It is linked to the issue

Signed-off-by: Stefan Buldeev [email protected]

What:
Adding the option to pass a staging table prefix as well as code refactoring and readme file enhancements

Why:
It is linked to the issue

Signed-off-by: Stefan Buldeev [email protected]
@sbuldeev sbuldeev linked an issue Feb 22, 2023 that may be closed by this pull request
@antoniivanov
Copy link
Collaborator

I left a few questions in the source issue trying to understand the problem. I will take a look at the PR after we that.

Changing the staging table format from "{custom_prefix}_{target_table}" to "vdk_check_{target_schema}_{target_table}" and removing the optional parameter for custom prefix from the template.
Raising an error in case of combination of vdk_check_{target_schema}_{target_table} longer than 128 characters.
@antoniivanov
Copy link
Collaborator

Please add a test covering this new scenario - e.g execute_template being called twice with different schemas and same table name..

There are multiple reasons for why Adding a test to verify a bug fix is considered good practice:

  1. The most obvious one - ensuring the bug is fixed: Adding a test that verifies the bug fix confirms that the issue has been resolved and will not recur.
  2. Preventing regression in the future: A regression occurs when a new change breaks a previously working functionality. Adding a test that verifies a bug fix helps prevent regression by ensuring that the bug is not reintroduced in the future which tends to happen.
  3. Providing documentation: Adding a test that verifies a bug fix serves as a form of documentation for the code. The test provides a clear and concise example of the expected behavior of the code, which can be used as a reference for future development.
  4. Improving code quality: By writing a test that verifies the bug fix, you need to consider the full scope of the bug and its potential impact on the code. This helps improve the overall quality of the code by identifying potential issues that may have been missed.

@sbuldeev
Copy link
Collaborator Author

Actually this change is in order to improve performance instead of fixing a bug.
Referring to the current logic, if there are same table names in different schemas and if we try to check both tables in a same staging schema this is what will happen:

Lets say the tables are called vmc_schema.esx and vsphere_schema.esx. I would call the staging schema "check_schema".
When processing vmc_schema.esx a new table will be created with the name check_schema.vdk_check_esx. This table will be a copy of vmc_schema.esx. After that if we want to process data to vsphere_schema.esx, the template will check if table named
"check_schema.vdk_check_esx" exists and if yes it will check if its structure matches the structure of vsphere_schema.esx. If yes the check_schema.vdk_check_esx table will be reused for checking purposes. If no the check_schema.vdk_check_esx will be recreated to match vsphere_schema.esx and after that the checks will be executed.

With that said I assume we can have same table names in different schema reusing same staging schema for checks. But if the structure of these tables is different(which I guess would be most of the times), the shared staging table between them will be dropped and created on each processing run, which I want to prevent with this change.

@sbuldeev sbuldeev merged commit 474d2fc into main Mar 1, 2023
@sbuldeev sbuldeev deleted the person/sbuldeev/add-staging-table-prefix-to-scd1 branch March 1, 2023 09:56
yonitoo pushed a commit that referenced this pull request Mar 1, 2023
What:
Adding the option to pass a staging table prefix as well as code
refactoring and readme file enhancements

Why:
It is linked to the issue

Signed-off-by: Stefan Buldeev [email protected]

---------

Signed-off-by: Stefan Buldeev [email protected]
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add staging table prefix as optional parameter to scd1 template
4 participants