vdk-impala: Truncate table before inserting data #2369

Merged
merged 6 commits into from
Jul 12, 2023

Collaborator

@sbuldeev sbuldeev commented Jul 7, 2023

Why:
Currently, when the insert template is used with quality checks, it only appends data to the staging table and never truncates it. This will most likely produce duplicated data on the next run, because the old source data is still in the staging table when the new data is appended.

More details explained in
#1361

What:
- Add a truncate statement before data is inserted into the staging table, so that no data from previous runs is left over.

Signed-off-by: Stefan Buldeev [email protected]
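
To illustrate the duplication described above, here is a minimal sketch that uses Python's built-in `sqlite3` rather than Impala. The table names (`source`, `staging`) and the `load_staging` helper are hypothetical and are not taken from the vdk-impala template. SQLite has no `TRUNCATE TABLE`, so `DELETE FROM` stands in for it.

```python
import sqlite3


def load_staging(conn, truncate_first):
    """Copy all rows from source into staging, optionally emptying staging first."""
    if truncate_first:
        # Stand-in for Impala's TRUNCATE TABLE (SQLite has no TRUNCATE).
        conn.execute("DELETE FROM staging")
    conn.execute("INSERT INTO staging SELECT id, value FROM source")
    conn.commit()


def staging_rows_after_two_runs(truncate_first):
    """Run the load twice against an unchanged source and count staging rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER, value TEXT)")
    conn.execute("CREATE TABLE staging (id INTEGER, value TEXT)")
    conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, "a"), (2, "b")])
    for _ in range(2):
        load_staging(conn, truncate_first)
    count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
    conn.close()
    return count


print(staging_rows_after_two_runs(truncate_first=False))  # 4: rows duplicated
print(staging_rows_after_two_runs(truncate_first=True))   # 2: only the latest load
```

With append-only loading the second run stacks a second copy of the source rows on top of the first, while emptying the staging table first keeps only the rows from the latest run.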

@antoniivanov
Collaborator

When we fix a bug, it's important to add a test to guard against regressions. Otherwise the same issue could easily resurface after some future refactoring.

So let's add a case that reproduces the issue (e.g. execute the template twice and observe the duplicated rows), so that the test fails before the fix and passes after it.
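
A regression test along these lines might look like the sketch below. It is only an illustration under stated assumptions: it runs against an in-memory SQLite database rather than Impala, and `run_insert_step` and `replace_source` are hypothetical helpers, not part of vdk-impala or its test suite. The source data changes between the two runs, so any leftover rows from the first run are easy to spot.

```python
import sqlite3


def run_insert_step(conn):
    """Hypothetical stand-in for one template execution: empty staging, then load it."""
    conn.execute("DELETE FROM staging")  # stands in for TRUNCATE TABLE
    conn.execute("INSERT INTO staging SELECT id FROM source")
    conn.commit()


def replace_source(conn, ids):
    """Replace the contents of the source table with the given ids."""
    conn.execute("DELETE FROM source")
    conn.executemany("INSERT INTO source VALUES (?)", [(i,) for i in ids])
    conn.commit()


def test_staging_has_no_leftovers_after_second_run():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER)")
    conn.execute("CREATE TABLE staging (id INTEGER)")

    replace_source(conn, [1, 2])
    run_insert_step(conn)
    replace_source(conn, [3])
    run_insert_step(conn)

    rows = [r[0] for r in conn.execute("SELECT id FROM staging ORDER BY id")]
    conn.close()
    # Without the truncate step this would be [1, 2, 3].
    assert rows == [3]


test_staging_has_no_leftovers_after_second_run()
```

Removing the `DELETE FROM staging` line makes the assertion fail, which is the fail-before, pass-after behavior the test is meant to capture.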

@sbuldeev
Collaborator Author

Sure, I added a test that checks whether the staging table data differs after two consecutive executions.

Collaborator

@antoniivanov antoniivanov left a comment

Looks good. Thanks for fixing this. It's approved.

I added a few comments that, if addressed, would help future troubleshooting and debugging, so I encourage you to address them. But you can merge whenever you want.

@sbuldeev sbuldeev merged commit 3dc95ac into main Jul 12, 2023
@sbuldeev sbuldeev deleted the person/sbuldeev/insert-template-quality-checks-fix branch July 12, 2023 15:22

5 participants