Eradicate errors in newly-introduced history-mode tables #41

fivetran-jamie · 2023-08-17T18:46:31Z

PR Overview

This PR will address the following Issue/Feature:
extension of #40

The addition of a _fivetran_active field altered the grain of the *_history tables, as certain changes in Google Ads (such as budgetary changes) may or may not change the updated_at field (but will still pass new records to the Fivetran connector).
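To make the grain problem concrete, here is a minimal Python sketch of roughly what a uniqueness check on (ad_group_id, updated_at) runs into. The rows are made up for illustration and are not from any real connector, and `duplicate_keys` is a hypothetical helper, not part of the package or of dbt_utils.

```python
from collections import Counter

# Hypothetical history-mode rows. The second row stands in for a change
# (e.g. a budget edit) that produced a new record without moving updated_at.
rows = [
    {"ad_group_id": 1, "updated_at": "2023-08-01", "_fivetran_active": False},
    {"ad_group_id": 1, "updated_at": "2023-08-01", "_fivetran_active": True},
    {"ad_group_id": 2, "updated_at": "2023-08-02", "_fivetran_active": True},
]

def duplicate_keys(rows, columns):
    """Return the key tuples that appear more than once for the given columns."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# The old grain is no longer unique once history-mode rows arrive.
dupes = duplicate_keys(rows, ["ad_group_id", "updated_at"])
print(dupes)
```

Any key returned here corresponds to a row that a uniqueness test on the old grain would flag.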

This PR will result in the following new package version:

v0.9.3 -- nothing should change for people without the _fivetran_active column. For those with it, this PR will fix the errors popping up around the new grain.

Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

Before marking this PR as "ready for review" the following have been applied:

The appropriate issue has been linked and tagged
You are assigned to the corresponding issue and this PR
BuildKite integration tests are passing

Detailed Validation

Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":

You have validated these changes and assure this PR will address the respective Issue/Feature.
You are reasonably confident these changes will not impact any other components of this package or any dependent packages.
You have provided details below around the validation steps performed to gain confidence in these changes.

I actually did find an internal dataset with the new _fivetran_active field. Here's the output of a dbt run + dbt test using main:

18:35:32  Completed with 3 errors and 0 warnings:
18:35:32  
18:35:32  Failure in test dbt_utils_unique_combination_of_columns_stg_google_ads__ad_group_history_ad_group_id__updated_at (models/stg_google_ads.yml)
18:35:32    Got 174 results, configured to fail if != 0
18:35:32  
18:35:32    compiled Code at target/compiled/google_ads_source/models/stg_google_ads.yml/dbt_utils_unique_combination_o_0c1cbeb5a9539431a7fbce6af1a21d7a.sql
18:35:32  
18:35:32  Failure in test dbt_utils_unique_combination_of_columns_stg_google_ads__ad_history_ad_id__ad_group_id__updated_at (models/stg_google_ads.yml)
18:35:32    Got 7 results, configured to fail if != 0
18:35:32  
18:35:32    compiled Code at target/compiled/google_ads_source/models/stg_google_ads.yml/dbt_utils_unique_combination_o_0cf5dbf0b60dae1b36794a079a6f8b74.sql
18:35:32  
18:35:32  Failure in test dbt_utils_unique_combination_of_columns_stg_google_ads__campaign_history_campaign_id__updated_at (models/stg_google_ads.yml)
18:35:32    Got 1 result, configured to fail if != 0
18:35:32  
18:35:32    compiled Code at target/compiled/google_ads_source/models/stg_google_ads.yml/dbt_utils_unique_combination_o_bd5040437362e14b36ab7ce3eaa14d1d.sql
18:35:32  
18:35:32  Done. PASS=23 WARN=0 ERROR=3 SKIP=0 TOTAL=26

and using this working branch:

18:38:44  Completed successfully
18:38:44  
18:38:44  Done. PASS=26 WARN=0 ERROR=0 SKIP=0 TOTAL=26

Moreover, @kaogilvie verified in the thread of #40 that this fix worked for them.

Standard Updates

Please acknowledge that your PR contains the following standard updates:

Package versioning has been appropriately indexed in the following locations:
- indexed within dbt_project.yml
- indexed within integration_tests/dbt_project.yml
CHANGELOG has individual entries for each respective change in this PR
[NA] README updates have been applied (if applicable)
[NA] DECISIONLOG updates have been updated (if applicable)
Appropriate yml documentation has been added (if applicable)

dbt Docs

Please acknowledge that after the above were all completed the below were applied to your branch:

docs were regenerated (unless this PR does not include any code or yml updates)

If you had to summarize this PR in an emoji, which would it be?

🧯

fivetran-avinash

Hey @fivetran-jamie, is the intent of the PR to filter out all _fivetran_active = false records? This PR does eliminate the test failures but it also filters out all the false records and there are generally a lot of them.

For example, comparing the ad group history model before and after, over 90% of the records have been removed. I know it's eventually filtered out in the final google_ads package, but not sure if we want to be that restrictive in the source package.

I wonder if the best way to eliminate the test failures is to take the min/max of _fivetran_start and _fivetran_end on the id/updated_at grains (if _fivetran_start and _fivetran_end exist), although applying that logic would make for a heftier PR.
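One way to picture the min/max idea, as a rough Python sketch rather than the actual model SQL: collapse rows that share the id/updated_at grain into a single row whose validity window runs from the earliest _fivetran_start to the latest _fivetran_end. The sample rows, the far-future _fivetran_end placeholder for active rows, and the `collapse_to_grain` helper are all assumptions made for illustration.

```python
# Hypothetical history-mode rows; the far-future _fivetran_end on active rows
# is an assumption about what the connector writes.
rows = [
    {"ad_group_id": 1, "updated_at": "2023-08-01", "_fivetran_start": "2023-08-01", "_fivetran_end": "2023-08-05"},
    {"ad_group_id": 1, "updated_at": "2023-08-01", "_fivetran_start": "2023-08-05", "_fivetran_end": "9999-12-31"},
    {"ad_group_id": 2, "updated_at": "2023-08-02", "_fivetran_start": "2023-08-02", "_fivetran_end": "9999-12-31"},
]

def collapse_to_grain(rows, grain):
    """Keep one row per grain key, spanning min(_fivetran_start) to max(_fivetran_end)."""
    collapsed = {}
    for row in rows:
        key = tuple(row[c] for c in grain)
        if key not in collapsed:
            collapsed[key] = dict(row)
        else:
            kept = collapsed[key]
            # ISO-8601 date strings sort chronologically, so min/max work on them directly.
            kept["_fivetran_start"] = min(kept["_fivetran_start"], row["_fivetran_start"])
            kept["_fivetran_end"] = max(kept["_fivetran_end"], row["_fivetran_end"])
    return list(collapsed.values())

collapsed = collapse_to_grain(rows, ["ad_group_id", "updated_at"])
print(len(collapsed))
```

Note that this sketch keeps the first row's values for every other column. Which version's attributes to keep is a design decision the suggestion above leaves open.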

Also, do we need to create an issue for this PR for tags/GitHub tracking?

fivetran-jamie · 2023-08-18T20:00:27Z

that is a very good point, as the staging models basically become non-historical with this filter on... we could simply swap updated_at with _fivetran_start in the uniqueness tests. i do wonder though if users would prefer to filter out non-active records for computational reasons
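As a quick illustration of the swap, here is a small Python sketch comparing the two candidate grains on made-up rows. It assumes each history-mode version of a record gets its own _fivetran_start; `is_unique` is a hypothetical helper, not the dbt_utils test itself.

```python
# Hypothetical rows: two versions of ad group 1 share updated_at
# but start at different times.
rows = [
    {"ad_group_id": 1, "updated_at": "2023-08-01", "_fivetran_start": "2023-08-01"},
    {"ad_group_id": 1, "updated_at": "2023-08-01", "_fivetran_start": "2023-08-05"},
    {"ad_group_id": 2, "updated_at": "2023-08-02", "_fivetran_start": "2023-08-02"},
]

def is_unique(rows, columns):
    """True when no two rows share the same values for the given columns."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(keys) == len(set(keys))

old_grain_ok = is_unique(rows, ["ad_group_id", "updated_at"])
new_grain_ok = is_unique(rows, ["ad_group_id", "_fivetran_start"])
print(old_grain_ok, new_grain_ok)
```

With this approach all historical rows stay in the staging model, and only the test's column combination changes.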

curious what @fivetran-joemarkiewicz thinks (and if any package users want to chime in, i'm all ears 👂 🌽 )

fivetran-joemarkiewicz · 2023-08-21T15:08:51Z

@fivetran-avinash thank you for critically reviewing this PR and having a keen eye on how we may possibly keep the historical records so customers may still leverage them. However, after discussing this with the product team we decided the best immediate approach is to filter out the historical records for this first phase of the history rollouts.

I think this is something we should discuss more as a team to determine how we should best approach these newer history tables in connectors, as they will be added to more connectors in the future. In the past, we have simply taken the approach of filtering out any non-active records to make it easier for users to leverage the data in the staging models without needing to account for any historical nuance. This is similar to what we did in the Salesforce package originally to counteract historical data (although we did add a variable to introduce history records if the customer wanted). That said, I do not feel the variable is the correct route going forward.

We can discuss how we want to handle these going forward at our next data team meeting and with customers, but for right now we should filter them out to avoid the errors the customers are seeing.
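For reference, the interim approach of keeping only active records can be sketched in Python as below. The rows are invented toy data (so the share of rows removed says nothing about real tables), and the real change lives in the package's staging models rather than in code like this.

```python
# Hypothetical history-mode rows for two ad groups.
rows = [
    {"ad_group_id": 1, "updated_at": "2023-08-01", "_fivetran_active": False},
    {"ad_group_id": 1, "updated_at": "2023-08-01", "_fivetran_active": True},
    {"ad_group_id": 2, "updated_at": "2023-08-02", "_fivetran_active": True},
]

# Keep only the currently active version of each record.
active_rows = [row for row in rows if row["_fivetran_active"]]

# The original (id, updated_at) grain is unique again on the filtered rows...
keys = [(row["ad_group_id"], row["updated_at"]) for row in active_rows]
grain_is_unique = len(keys) == len(set(keys))

# ...at the cost of dropping every historical (inactive) row.
rows_removed = len(rows) - len(active_rows)
print(grain_is_unique, rows_removed)
```

This is the trade-off raised in the review: the uniqueness tests pass, but the staging model no longer carries history.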

fivetran-avinash

Based on the note from @fivetran-joemarkiewicz above, I've gone ahead and approved!

fivetran-jamie added 3 commits August 16, 2023 11:04

trying this out

be985fc

prep for release

8669346

Docs

c7c2d69

fivetran-jamie self-assigned this Aug 17, 2023

fivetran-avinash self-requested a review August 17, 2023 21:30

fivetran-avinash reviewed Aug 18, 2023

fivetran-avinash approved these changes Aug 21, 2023

fivetran-jamie merged commit f2412c9 into main Aug 21, 2023

fivetran-jamie mentioned this pull request Aug 17, 2023

prevent-errors-in-tests-since-the-table-is-in-History-Mode #40

Closed

2 tasks
