vdk-impala: Truncate table before inserting data #2369
Why:
Currently, when the insert template is used with quality checks, it only appends data to the staging table and never truncates it. This will most likely lead to duplicated data on the next run, because the old source data is still in the staging table when the new data is appended.
More details are explained in #1361.
What:
- Add a truncate statement before the data is inserted into the staging table, so that no leftover data from previous runs remains.
Signed-off-by: Stefan Buldeev [email protected]
When we fix a bug, it's important to add a test to guard against regressions. Otherwise it's quite possible for the same issue to resurface in the future after some refactoring. So let's add a case that reproduces the issue (e.g. we can execute the template twice and observe the duplicated rows), so that the test fails before the fix and succeeds after it.
...gins/vdk-impala/src/vdk/plugin/impala/templates/load/fact/insert/02-handle-quality-checks.py
Sure, I added a test that checks whether the staging table data differs after two consecutive executions.
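For readers following the thread, here is a minimal sketch of the kind of check discussed above: load the same source rows into a staging table twice and compare the row count with and without truncation. It is not the test added in this PR. It uses Python's built-in `sqlite3` as a stand-in for Impala, `DELETE FROM` as a stand-in for `TRUNCATE TABLE`, and the names `load_staging`, `staging_row_count_after_two_runs`, and the `staging` table are hypothetical.

```python
import sqlite3


def load_staging(conn, source_rows, truncate_first):
    """Load source_rows into the staging table, optionally emptying it first."""
    if truncate_first:
        # SQLite has no TRUNCATE TABLE; DELETE without WHERE stands in for it here.
        conn.execute("DELETE FROM staging")
    conn.executemany("INSERT INTO staging (id, value) VALUES (?, ?)", source_rows)
    conn.commit()


def staging_row_count_after_two_runs(truncate_first):
    """Run the same load twice and return how many rows end up in staging."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (id INTEGER, value TEXT)")
    source_rows = [(1, "a"), (2, "b")]

    # Two consecutive executions against the same staging table.
    load_staging(conn, source_rows, truncate_first)
    load_staging(conn, source_rows, truncate_first)

    count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
    conn.close()
    return count
```

Without truncation the second run appends on top of the first, so the staging table holds every source row twice; with truncation it holds only the rows from the latest run.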
Looks good. Thanks for fixing this. It's approved.
I added a few comments which, if addressed, would help with future troubleshooting and debugging, so I encourage you to address them. But you can merge whenever you want.
projects/vdk-plugins/vdk-impala/tests/functional/template_regression_test.py
…ttps://github.com/vmware/versatile-data-kit into person/sbuldeev/insert-template-quality-checks-fix
for more information, see https://pre-commit.ci