-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: default test TargetDuplicateRecords
failing, records not deduped using key_properties
#41
Comments
pnadolny13
added a commit
that referenced
this issue
Jun 6, 2023
Closes #28 - implements a fix for the test suites not running properly, also added it to the SDK meltano/sdk#1749 - implements a lot of the default test validates methods to make asserts - comment out the tests that fail for legitimate bugs - logged the bugs - #43 - #41 - #40 - I also logged #42 because I wrote a test to assert the exception but I'm not actually sure if we want that behavior or not --------- Co-authored-by: Ken Payne <[email protected]>
pnadolny13
added a commit
that referenced
this issue
Jun 14, 2023
Closes #41 The challenge is that we're using a merge statement which is successfully deduplicating against what already exists in the target table but within the batch of records in the stage there are also dupes. The test was failing because no data existed in the destination table so we weren't updating any records, only inserting, but within our staging file we had multiple primary keys ID 1 and 2 so they all get inserting and the result is duplicates in the destination table. The way I fixed it in this PR is by adding a qualify row_num = 1 to deduplicate within our staging file select query. It uses the SEQ8 function, which I've never used before, to order the records based on their place in the file i.e. the bottom of the table takes precedence over the top. I looks to work as expected but it feels a little sketchy, I wonder if unsorted streams would have issues where the wrong record gets selected. Ideally the user would tell us a sort by column to know how to take the latest. --------- Co-authored-by: Ken Payne <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The test data https://github.com/meltano/sdk/blob/main/singer_sdk/testing/target_test_streams/duplicate_records.singer sends 5 records but only 2 distinct key property IDs so they should be updating instead of appending. I added an assertion to the test to make sure only 2 records were present in the final table but theres still 5.
I think in #15 there was an attempt to dedupe using temp tables, maybe that was to fix this issue.
The text was updated successfully, but these errors were encountered: