
bug: default test TargetDuplicateRecords failing, records not deduped using key_properties #41

Closed
pnadolny13 opened this issue Jun 5, 2023 · 0 comments · Fixed by #47

@pnadolny13
Contributor

The test data https://github.com/meltano/sdk/blob/main/singer_sdk/testing/target_test_streams/duplicate_records.singer sends 5 records but contains only 2 distinct key property IDs, so later records should update earlier ones rather than being appended. I added an assertion to the test to verify that only 2 records are present in the final table, but there are still 5.

I think #15 attempted to dedupe using temp tables; maybe that was meant to fix this issue.
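
A minimal sketch of what that assertion amounts to, assuming a SQLAlchemy engine pointed at the target warehouse and a hypothetical duplicate_records table with an id key column (this is not the actual TargetDuplicateRecords test code):

```python
import sqlalchemy


def assert_deduped(engine: sqlalchemy.Engine, table: str = "duplicate_records") -> None:
    """Hypothetical check: after loading duplicate_records.singer, an upsert
    on key_properties should leave one row per distinct id."""
    with engine.connect() as conn:
        row_count = conn.execute(
            sqlalchemy.text(f"SELECT COUNT(*) FROM {table}")
        ).scalar_one()
        distinct_keys = conn.execute(
            sqlalchemy.text(f"SELECT COUNT(DISTINCT id) FROM {table}")
        ).scalar_one()
    # 5 records are sent but only 2 distinct ids exist, so only 2 rows should
    # remain; the bug reported here is that all 5 rows end up in the table.
    assert distinct_keys == 2
    assert row_count == distinct_keys
```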

pnadolny13 added a commit that referenced this issue Jun 6, 2023
Closes #28

- implements a fix for the test suites not running properly; also contributed it to the SDK in meltano/sdk#1749
- implements many of the default tests' validate methods to make assertions
- comments out the tests that fail for legitimate bugs
- logged those bugs:
  - #43
  - #41
  - #40
- also logged #42 because I wrote a test to assert the exception, but I'm not sure whether we actually want that behavior

---------

Co-authored-by: Ken Payne <[email protected]>
pnadolny13 self-assigned this Jun 8, 2023
pnadolny13 added a commit that referenced this issue Jun 14, 2023
Closes #41

The challenge is that we're using a merge statement, which successfully deduplicates against what already exists in the target table, but the batch of records in the stage also contains dupes. The test was failing because no data existed in the destination table, so we weren't updating any records, only inserting; and because the staging file contained multiple rows for primary keys 1 and 2, they all got inserted and the result was duplicates in the destination table.

The way I fixed it in this PR is by adding a QUALIFY row_num = 1 clause to deduplicate within the staging file select query. It uses the SEQ8 function, which I've never used before, to order the records by their position in the file, i.e. the bottom of the file takes precedence over the top. It looks to work as expected, but it feels a little sketchy; I wonder whether unsorted streams would have issues where the wrong record gets selected. Ideally the user would tell us a sort-by column so we know how to take the latest record.

---------

Co-authored-by: Ken Payne <[email protected]>
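
For illustration only, the deduplicating merge described in the commit above might look roughly like the following; the table, stage, file format, and column names are hypothetical, and this is a sketch of the query shape rather than the exact SQL the target generates:

```python
# Illustrative sketch: the MERGE source query keeps only the last occurrence
# of each primary key within the staged file, using Snowflake's SEQ8() to
# approximate file order. target_table, my_stage, json_format, id, and name
# are placeholder names.
merge_sql = """
MERGE INTO target_table AS t
USING (
    SELECT
        $1:id::INTEGER   AS id,
        $1:name::VARCHAR AS name
    FROM @my_stage/duplicate_records.jsonl.gz (FILE_FORMAT => 'json_format')
    -- Keep only the last row per primary key within this batch.
    QUALIFY ROW_NUMBER() OVER (PARTITION BY $1:id ORDER BY SEQ8() DESC) = 1
) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.name = s.name
WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)
"""
```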
github-project-automation bot moved this to Planned in Data Team Jun 14, 2023