Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update/default-schema-change #11

Merged
merged 11 commits into from
Aug 27, 2024

Conversation

fivetran-joemarkiewicz
Copy link
Collaborator

@fivetran-joemarkiewicz fivetran-joemarkiewicz commented Aug 15, 2024

PR Overview

This PR will address the following Issue/Feature: Twitter Organic Source Issue #10

This PR will result in the following new package version: v0.3.0

While there are no breaking changes in this PR, there will be breaking changes in the upstream source package dependency. As such, this will also be breaking.

Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:

Upstream Breaking Changes

PR #12 from the upstream dbt_twitter_organic_source package includes the following breaking change updates:

  • The source defined in the src_twitter_organic.yml file has been renamed from twitter_organic to twitter to align with the default schema name used by the upstream Fivetran connector.
    • If you're referencing sources from this package, please update your source references as needed. See below for the full scope of source changes.
New Source Reference Old Source Reference
"{{ source('twitter','account_history') }}" "{{ source('twitter_organic','account_history') }}"
"{{ source('twitter','organic_tweet_report') }}" "{{ source('twitter_organic','organic_tweet_report') }}"
"{{ source('twitter','tweet') }}" "{{ source('twitter_organic','tweet') }}"
"{{ source('twitter','twitter_user_history') }}" "{{ source('twitter_organic','twitter_user_history') }}"
  • The default schema name has been modified from twitter_organic to now be twitter to more closely align with the default schema name generated by the Fivetran connector. Please be aware if you were leveraging the previous default schema then you will need to update the twitter_organic_schema variable accordingly.
  • All identifier variables in the src_twitter_organic.yml file have been renamed. If you’re using any of these in your project, please update them accordingly. The changes include:
    • Prepending twitter_organic_* has been updated to twitter_* to align with the schema change.
    • The spelling of *_identifer has been corrected to *_identifier.
New Identifier Variable Name Old Identifier Variable Name
twitter_account_history_identifier twitter_organic_account_history_identifer
twitter_organic_tweet_report_identifier twitter_organic_organic_tweet_report_identifer
twitter_tweet_identifier twitter_organic_tweet_identifer
twitter_twitter_user_history_identifier twitter_organic_twitter_user_history_identifer

Bug Fixes

  • Updated the is_most_recent_record window function in the following models to exclude the source_relation field from the partition statement when twitter_organic_union_schemas or twitter_organic_union_databases variables are empty. Also, modified it to skip the window function if the upstream table is empty, using the new window_function_if_table_exists() and is_table_empty() macros. This change addresses Redshift's issue with partitioning by constant expressions. (PR #11)
    • int_twitter_organic__latest_account
    • int_twitter_organic__latest_user

Under the Hood

  • Consistency validations for integration tests has been added for the twitter_organic__tweets model. (PR #11)
  • Renamed the seed files to allow for more testing functionality. (PR #11)
  • Updated the maintainer PR, Issue, Feature Request, and Config templates to resemble the most up to date format. (PR #11)
  • Incorporated the new fivetran_utils.drop_schemas_automation macro into the end of each Buildkite integration test job. (#7)

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run –full-refresh && dbt test
  • [n/a] dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

Please see the Twitter Organic Source PR #12 for evidence that the default schema name is now changed.

Additionally, I was able to verify with new consistency validation tests that the results are what we expect and there are no changes between dev and prod.

image

Lastly, in order to confirm the new window function conditional works as expected I was able to confirm the following compilation results:

  • ✅ When using a schema which does have an upstream table and not using the union data feature we can see the expected window function is being used.
    • image
  • ✅ When using a schema with doesn't have an upstream table and not using the union data feature we can see the window function is not used and instead we use the 1=1 logic.
    • image
  • ✅ When using the union data feature and upstream tables do exist then you can see the expected window function is used (the presence of the source_relation field)
    • image
  • ✅ When no tables are found upstream then no window function is used and we end with just a 1=1 as is_most_recent_record.
    • image

If you had to summarize this PR in an emoji, which would it be?

🐦

Comment on lines -14 to -16
models:
twitter_organic_source:
+materialized: table
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was masking the window function issues for Redshift. I'm removing this so we will always test to ensure the window functions work as expected across all warehouses per the default materialization.

Comment on lines -7 to +11
twitter_organic_source:
twitter_organic_account_history_identifer: "twitter_organic_account_history_data"
twitter_organic_organic_tweet_report_identifer: "twitter_organic_organic_tweet_report_data"
twitter_organic_tweet_identifer: "twitter_organic_tweet_data"
twitter_organic_twitter_user_history_identifer: "twitter_organic_twitter_user_history_data"
twitter_organic_schema: twitter_organic_integration_tests_1
twitter_account_history_identifier: "account_history"
twitter_organic_tweet_report_identifier: "organic_tweet_report"
twitter_tweet_identifier: "tweet"
twitter_twitter_user_history_identifier: "twitter_user_history"
twitter_organic_schema: twitter_organic_integration_tests_1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes were made to ensure the union_data macro was working as expected. We cannot effectively test the union_data macro when the seed files are named different from the expected source names.

Copy link
Contributor

@fivetran-avinash fivetran-avinash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-joemarkiewicz Confirmed the macros works as expected across all four cases you tested! One small changelog request but you're good for release review

- `int_twitter_organic__latest_account`
- `int_twitter_organic__latest_user`

## Under the Hood
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a line around the new automation release bot as it's a part of this PR.

Also the Secret needs to be added.
Screenshot 2024-08-21 at 11 59 01 AM (Update looks like this was just added so all good here)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-joemarkiewicz bumping this one!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-catfritz this was already added

Copy link
Contributor

@fivetran-catfritz fivetran-catfritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need to add the auto-releaser line but otherwise lgtm for release!

- `int_twitter_organic__latest_account`
- `int_twitter_organic__latest_user`

## Under the Hood
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-joemarkiewicz bumping this one!

@fivetran-joemarkiewicz fivetran-joemarkiewicz merged commit d9033a1 into main Aug 27, 2024
8 checks passed
@fivetran-joemarkiewicz fivetran-joemarkiewicz deleted the update/default-schema-change branch August 27, 2024 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants