Update/ad-reporting-v2 #20

Merged: 25 commits merged into main on Sep 1, 2022

Conversation

@fivetran-sheringuyen (Contributor) commented on Jul 5, 2022

Pull Request
Are you a current Fivetran customer?
Fivetran employee

What change(s) does this PR introduce?
Breaking Changes
PR [] makes the below updates that may affect your workflow:

  • modified_timestamp columns have been renamed to modified_at and is_most_recent_version has been renamed to is_most_recent_record to reflect more recent package coding standards for the below models:
    • stg_microsoft_ads__account_history
    • stg_microsoft_ads__ad_group_history
    • stg_microsoft_ads__ad_history
    • stg_microsoft_ads__ad_performance_daily_report
    • stg_microsoft_ads__campaign_history
  • Deprecating *_version_id fields in *_history models.

Feature Enhancements
We have added the below feature enhancements to this package in (PR #)[]:

  • Add get_*_columns macros for previously included models and newly added models.
  • Updated staging model standards on old models to conform with recent package development standards. Updates were made to the below models:
    • stg_microsoft_ads__account_history
    • stg_microsoft_ads__ad_group_history
    • stg_microsoft_ads__ad_history
    • stg_microsoft_ads__ad_performance_daily_report
    • stg_microsoft_ads__campaign_history
  • New history and daily performance staging models including:
    • stg_microsoft_ads__account_daily_report
    • stg_microsoft_ads__campaign_daily_report
    • stg_microsoft_ads__ad_group_daily_report
    • stg_microsoft_ads__search_daily_report
    • stg_microsoft_ads__keyword_daily_report
    • stg_microsoft_ads__keyword_history
  • README updates for easier navigation and use of the package.
  • Addition of identifier variables for each of the source tables to allow for further flexibility in pointing the package at the correct source tables within the dbt project (see the illustrative sketch after this list).
  • More robust testing for better data integrity including:
    • Freshness tests
    • Model grain tests
  • Passthrough metric functionality for more flexibility in reporting.
  • Additional documentation for new models added.
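
For context on the identifier variables mentioned above, Fivetran source packages generally expose one variable per source table so users can redirect the package to a differently named table. A minimal sketch of what that configuration might look like; the exact variable names below are assumptions for illustration, not confirmed by this PR:

# dbt_project.yml (illustrative sketch; variable names are assumptions)
vars:
    microsoft_ads_account_history_identifier: "renamed_account_history_table"   # redirect the account_history source to a custom table name
    microsoft_ads_campaign_history_identifier: "renamed_campaign_history_table" # the same pattern applies to the other source tables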

Did you update the CHANGELOG?

  • Yes

Does this PR introduce a breaking change?

  • Yes (please provide breaking change details below.)
  • No (please provide an explanation as to how the change is non-breaking below.)

Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)

  • Yes

Is this PR in response to a previously created Bug or Feature Request?

N/A

How did you test the PR changes?

  • CircleCI
  • Local (please provide additional testing details below)

Select which warehouse(s) were used to test the PR

  • BigQuery
  • Redshift
  • Snowflake
  • Postgres
  • Databricks
  • Other (provide details below)

Provide an emoji that best describes your current mood

💃

Feedback

We are so excited you decided to contribute to the Fivetran community dbt package! We continue to work to improve the packages and would greatly appreciate your feedback on our existing dbt packages or what you'd like to see next.

@fivetran-joemarkiewicz (Contributor) left a comment:

@fivetran-sheringuyen thanks so much for this PR and putting this package through a much needed overhaul! I have a few comments you can find below and inline. Let me know if you want to chat more about any of them!

  • I really like the depth you put into the staging model tests. I do wonder if this would break if individuals add more metrics via the passthrough columns? Is this something we should consider?
  • It looks like the is_most_recent_record field was not included in the yml documentation for all of the history staging models. Would you be able to add that in?
  • I have a pretty open ended question/suggestion on what fields to include in the staging column macros and how to pass through the variable metrics. See my deeper comments in the review, happy to chat more about the approach we should take.
  • Address any other inline comments.

CHANGELOG.md Outdated
# dbt_microsoft_ads_source v0.6.0

## 🚨 Breaking Changes 🚨
PR [] makes the below updates that may affect your workflow:
Contributor:

Reminder to add the PR link.

CHANGELOG.md Outdated
- Deprecating `*_version_id` fields in `*_history` models.

## 🎉 Feature Enhancements 🎉
We have added the below feature enhancements to this package in (PR #)[]:
Contributor:

Similar reminder to add the PR link.

{"name": "bid_match_type", "datatype": dbt_utils.type_string()},
{"name": "clicks", "datatype": dbt_utils.type_int()},
{"name": "conversion_rate", "datatype": dbt_utils.type_float()},
{"name": "conversions", "datatype": dbt_utils.type_int()},
Contributor:

The more I think about it, the more I am in the camp of removing fields from the macro that are not included in the final CTE of the staging model. The reason being: if a customer has a field that they want to pass through as a variable, but we don't have that field in this macro, then it would fail.

I think it would be better if we just add the passthrough jinja in both CTE pieces instead. That way there would be no possibility of the model failing if we do not have the field in our macro. Thoughts?

So we would have the passthrough logic as follows:
[screenshot attached in the original comment: proposed placement of the passthrough jinja]
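
Since the attached screenshot is not reproduced here, the following is a rough sketch of the placement being suggested, reconstructed from the description above: the passthrough loop appears both in the CTE that fills/casts the staging columns and again in the final CTE, so a field never has to exist in the get_*_columns macro in order to be passed through. The fill_staging_columns call follows the usual fivetran_utils pattern; the macro name and the non-metric column names are illustrative assumptions.

-- stg_microsoft_ads__account_daily_report.sql (illustrative sketch)
with base as (

    select *
    from {{ ref('stg_microsoft_ads__account_daily_report_tmp') }}

),

fields as (

    select
        {{
            fivetran_utils.fill_staging_columns(
                source_columns=adapter.get_columns_in_relation(ref('stg_microsoft_ads__account_daily_report_tmp')),
                staging_columns=get_account_daily_report_columns()
            )
        }}
        -- passthrough jinja here so variable metrics survive even if they are absent from the macro
        {% for metric in var('microsoft_ads__account_report_passthrough_metrics', []) %}
        , {{ metric }}
        {% endfor %}
    from base

),

final as (

    select
        account_id,
        date_day,
        clicks,
        conversions
        -- ...and again in the final CTE so the metrics reach the model output
        {% for metric in var('microsoft_ads__account_report_passthrough_metrics', []) %}
        , {{ metric }}
        {% endfor %}
    from fields

)

select *
from final

One nuance, presumably part of what the thread wants to walk through as a team, is avoiding a duplicate column when a passed-through metric already exists in the macro.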

Contributor:

This same comment will apply to all the macros.
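
For readers following along, the column dictionaries quoted above typically live inside a get_*_columns macro shaped roughly like the following; the macro name matches the illustrative sketch above, and the contents are a sketch rather than the package's verbatim code:

{% macro get_account_daily_report_columns() %}

{% set columns = [
    {"name": "bid_match_type", "datatype": dbt_utils.type_string()},
    {"name": "clicks", "datatype": dbt_utils.type_int()},
    {"name": "conversion_rate", "datatype": dbt_utils.type_float()},
    {"name": "conversions", "datatype": dbt_utils.type_int()}
] %}

{{ return(columns) }}

{% endmacro %}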

Contributor (PR author):

I've been thinking about this for a while as well. I actually have some questions here:

  1. The only cases I could imagine where a customer would include a field that isn't already in the macro are:
  • (A) we've added new columns to the table in the connector, and the customer adds them as passthrough columns before we have updated the package macros to reflect that change; or
  • (B) the customer has custom fields that we don't see on our end.
  Outside of those two cases, are there others where the fields added as passthroughs aren't already included in the macro?
  2. In the case that a customer adds a passthrough column that doesn't actually have any values (off chance), would we want to leverage the macro as a safety net?

models/stg_microsoft_ads__account_daily_report.sql Outdated (thread resolved)

select *
from {{ ref('stg_microsoft_ads__account_daily_report_tmp') }}

Contributor:

Super small style note/question: should we remove this extra blank line? The other style guide suggests there is no whitespace following the `from` and before the `),`.

@@ -0,0 +1 @@
select * from {{ var('account_performance_daily_report') }}
Contributor:

Super small style note/question - should we have this query as two lines instead of one? I notice this is pretty inconsistent across packages. My preference would be two lines.

README.md Outdated
# dbt_project.yml
vars:
    microsoft_ads__account_report_passthrough_metrics: ['the', 'list', 'of', 'metric', 'columns', 'to', 'include'] # from microsoft_ads.account_performance_daily_report
    microsoft_ads__campaign_report_passthrough_columns: ['the', 'list', 'of', 'metric', 'columns', 'to', 'include'] # from microsoft_ads.campaign_performance_daily_report
Contributor:

I was just passing by and noticed that the `_columns` suffix on these passthrough variables should be renamed to `_metrics`!
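
In other words, the second line would presumably become:

    microsoft_ads__campaign_report_passthrough_metrics: ['the', 'list', 'of', 'metric', 'columns', 'to', 'include'] # from microsoft_ads.campaign_performance_daily_report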

Contributor (PR author):

good catch, thanks!

@fivetran-sheringuyen (Contributor, PR author) commented:

Hey @fivetran-joemarkiewicz, re:

> I really like the depth you put into the staging model tests. I do wonder if this would break if individuals add more metrics via the passthrough columns? Is this something we should consider?

In my head I've only been thinking of customers using that variable to pass through metrics, in which case this shouldn't be a problem at all. If they are passing through non-metrics, that would be a concern worth considering; however, it would also break some of the sum( {{ metric }} ) logic that we have incorporated. That opens another can of worms, one I think can be avoided since we do name the variables *_metrics, so the assumption should be that passthrough columns are only metrics.

Additionally, before adding changes for the macro + CTE suggestions, I'd like to discuss that as a team first and walk through the nuances.
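
To make the sum( {{ metric }} ) concern concrete, the downstream aggregation presumably loops over the same variable roughly like the sketch below (model and non-metric column names are illustrative assumptions), so a non-numeric passthrough column would fail inside sum():

-- illustrative sketch of the sum( {{ metric }} ) pattern referenced above
select
    date_day,
    account_id,
    sum(clicks) as clicks,
    sum(conversions) as conversions
    {% for metric in var('microsoft_ads__account_report_passthrough_metrics', []) %}
    , sum({{ metric }}) as {{ metric }}
    {% endfor %}
from {{ ref('stg_microsoft_ads__account_daily_report') }}
group by 1, 2

Constraining the variable to metrics, as the comment above suggests, keeps both the staging tests and these aggregations safe.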

@fivetran-joemarkiewicz (Contributor) commented:

> Hey @fivetran-joemarkiewicz, re:
>
> > I really like the depth you put into the staging model tests. I do wonder if this would break if individuals add more metrics via the passthrough columns? Is this something we should consider?
>
> In my head I've only been thinking of customers using that variable to pass through metrics, in which case this shouldn't be a problem at all. If they are passing through non-metrics, that would be a concern worth considering; however, it would also break some of the sum( {{ metric }} ) logic that we have incorporated. That opens another can of worms, one I think can be avoided since we do name the variables *_metrics, so the assumption should be that passthrough columns are only metrics.
>
> Additionally, before adding changes for the macro + CTE suggestions, I'd like to discuss that as a team first and walk through the nuances.

@fivetran-sheringuyen this makes sense to me! Maybe we can add a small clarifying statement in the README to only include metrics in this passthrough variable. This way there is nothing left up to interpretation (although it does seem very obvious to me haha).

I also agree that we should discuss the passthrough + macro behavior closer as a team. We can do so during standup this morning!

@fivetran-sheringuyen (Contributor, PR author) commented:

Hey @fivetran-joemarkiewicz! Just updated the repo for PR changes. Let me know if you have any follow up questions!

@fivetran-joemarkiewicz (Contributor) left a comment:

@fivetran-sheringuyen thanks so much for applying the review notes to this PR! After re-reviewing, this looks good to go on my end. Great job with this overhaul 🏋️

@fivetran-sheringuyen merged commit 3d56e17 into main on Sep 1, 2022