-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eradicate errors in newly-introduced history-mode tables #41
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @fivetran-jamie, is the intent of the PR to filter out all _fivetran_active = false
records? This PR does eliminate the test failures but it also filters out all the false records and there are generally a lot of them.
For example, comparing the ad group history model before and after, over 90% of the records have been removed. I know it's eventually filtered out in the final google_ads
package, but not sure if we want to be that restrictive in the source package.
I wonder if the best way to eliminate the test failures is to min/max _fivetran_start
and _fivetran_end
on the id
/updated_at
grains (if _fivetran_start
and _fivetran_end
exist). Although that is a bit of a heftier PR to apply that logic.
Also, do we also need to create an issue for this PR for tags/Github tracking?
that is a very good point, as the staging models basically become non-historical with this filter on... we could just simply swap curious what @fivetran-joemarkiewicz thinks (and if any package users want to chime in, i'm all ears 👂 🌽 ) |
@fivetran-avinash thank you for critically reviewing this PR and having a keen eye on how we may possibly keep the historical records so customers may still leverage them. However, after discussing this with the product team we decided the best immediate approach is to filter out the historical records for this first phase of the history rollouts. I think this is something we should discuss more as a team to determine how we should best approach these newer history tables in connectors as they will be added to more connectors in the future. In the past, we have simply taken the approach of filtering out any non active records to make it easier for users to leverage the data in the staging models without needing to account for any historical nuance. This is similar to what we did in the Salesforce package originally to counteract historical data (although we did add a variable to introduce history records if the customer wanted). Although I do not feel the variable is the correct route going forward. We can discuss this in our next data team and with customers for how we may want to handle these going forward, but for right now we should filter them out to avoid the errors the customers are seeing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the note from @fivetran-joemarkiewicz above, I've gone ahead and approved!
PR Overview
This PR will address the following Issue/Feature:
extension of #40
The addition of a
_fivetran_active
field altered the grain of the*_history
tables, as certain changes in Google ads (suc as budgetary changes) may or may not change theupdated_at
field (but will still pass new records to the Fivetran connector)This PR will result in the following new package version:
v0.9.3 -- nothing should change for people without the
_fivetran_active
column. for those with it, this PR will fix errors popping up around the new grainPlease detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:
PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":
I actually did find an internal dataset with the new
_fivetran_active
field. here's the output of a run +dbt test
usingmain
and using this working branch:
Moreover, @kaogilvie verified in the thread of #40 that this fix worked for them
Standard Updates
Please acknowledge that your PR contains the following standard updates:
dbt Docs
Please acknowledge that after the above were all completed the below were applied to your branch:
If you had to summarize this PR in an emoji, which would it be?
🧯