Feature: Union schema compatibility #43

Merged (19 commits) on Oct 12, 2023

17 changes: 16 additions & 1 deletion CHANGELOG.md
@@ -1,3 +1,18 @@
# dbt_google_ads_source v0.10.0

[PR #43](https://github.com/fivetran/dbt_google_ads_source/pull/43) includes the following updates:
## Feature update 🎉
- Unioning capability! This adds the ability to union source data from multiple google_ads connectors. Refer to the [Union Multiple Connectors README section](https://github.com/fivetran/dbt_google_ads_source/blob/main/README.md#union-multiple-connectors) for more details.

## Under the Hood 🚘
- Updated tmp models to union source data using the `fivetran_utils.union_data` macro.
- To identify which source each record comes from, added a `source_relation` column to each staging model and applied the `fivetran_utils.source_relation` macro.
- Updated tests to account for the new `source_relation` column.
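To see why the uniqueness tests needed the new column, consider two connectors that each contain an account with the same `account_id` and `updated_at`. The sketch below is a hypothetical illustration using an in-memory SQLite table; the package itself unions data in SQL through `fivetran_utils.union_data`. `union_with_source_relation` and `duplicate_count` are made-up helper names, and the duplicate count only approximates what `dbt_utils.unique_combination_of_columns` reports.

```python
import sqlite3
from typing import Dict, List, Tuple


def union_with_source_relation(
    connectors: Dict[str, List[Tuple[int, str]]],
) -> sqlite3.Connection:
    """Union each connector's (account_id, updated_at) rows into one table,
    tagging every row with the name of the connector it came from."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table account_history "
        "(source_relation text, account_id integer, updated_at text)"
    )
    for source_relation, rows in connectors.items():
        conn.executemany(
            "insert into account_history values (?, ?, ?)",
            [(source_relation, account_id, updated_at) for account_id, updated_at in rows],
        )
    return conn


def duplicate_count(conn: sqlite3.Connection, key_columns: List[str]) -> int:
    """Count key combinations that occur more than once, roughly what a
    unique-combination-of-columns test would flag as failures."""
    cols = ", ".join(key_columns)  # trusted, hard-coded column names in this sketch
    query = (
        f"select count(*) from (select {cols} from account_history "
        f"group by {cols} having count(*) > 1)"
    )
    return conn.execute(query).fetchone()[0]


connectors = {
    "google_ads_usa": [(1, "2023-01-01")],
    "google_ads_canada": [(1, "2023-01-01"), (2, "2023-01-02")],
}
conn = union_with_source_relation(connectors)
print(duplicate_count(conn, ["account_id", "updated_at"]))                     # 1
print(duplicate_count(conn, ["source_relation", "account_id", "updated_at"]))  # 0
conn.close()
```

Without `source_relation` in the key, the shared `(1, "2023-01-01")` row looks like a duplicate; with it, each row is unique.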

[PR #47](https://github.com/fivetran/dbt_google_ads_source/pull/47) includes the following update:
## Dependency Updates
- Removes the dependency on [dbt-expectations](https://github.com/calogica/dbt-expectations/releases). Specifically, we removed the `dbt_expectations.expect_column_values_to_not_match_regex_list` test.
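For projects that still want the warning the removed test provided (flagging `source_final_urls` values that contain a comma, which suggests more than one URL), a minimal Python sketch of the same regex check follows. It assumes the array values have been exported as strings; `rows_with_multiple_urls` is a hypothetical helper, not part of this package or of dbt-expectations.

```python
import re
from typing import Iterable, List, Optional


def rows_with_multiple_urls(
    values: Iterable[Optional[str]], pattern: str = ","
) -> List[str]:
    """Return the values matching `pattern`; a comma suggests the final-URL
    array holds more than one URL. Null values are skipped."""
    regex = re.compile(pattern)
    return [value for value in values if value is not None and regex.search(value)]


sample = [
    '["https://example.com/a"]',
    '["https://example.com/a","https://example.com/b"]',
    None,
]
flagged = rows_with_multiple_urls(sample)
if flagged:
    # Mirrors a "warn" severity: report the rows but do not fail anything.
    print(f"warning: {len(flagged)} value(s) contain more than one URL")
```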

# dbt_google_ads_source v0.9.5
## Rollback
[PR #46](https://github.com/fivetran/dbt_google_ads_source/pull/46) rolls back [PR #45](https://github.com/fivetran/dbt_google_ads_source/pull/45).
@@ -142,4 +157,4 @@ PR [#29](https://github.com/fivetran/dbt_google_ads_source/pull/29) includes the
- [NoToWarAlways](https://github.com/NoToWarAlways) ([#19](https://github.com/fivetran/dbt_google_ads_source/pull/19))

# dbt_google_ads_source v0.1.0 -> v0.4.0
Refer to the relevant release notes on the GitHub repository for specific details on previous releases. Thank you!
27 changes: 13 additions & 14 deletions README.md
@@ -26,14 +26,11 @@ To use this dbt package, you must have the following:
- A **BigQuery**, **Snowflake**, **Redshift**, **PostgreSQL**, or **Databricks** destination.

### Databricks Dispatch Configuration
If you are using a Databricks destination with this package you will need to add the below (or a variation of the below) dispatch configuration within your `dbt_project.yml`. This is required in order for the package to accurately search for macros within the `dbt-labs/spark_utils` then the `dbt-labs/dbt_utils` as well as the `calogica/dbt_expectations` then the `google_ads_source` packages respectively.
If you are using a Databricks destination with this package, you will need to add the below (or a variation of the below) dispatch configuration within your `dbt_project.yml`. This is required so that the package searches for macros in `dbt-labs/spark_utils` first and then in `dbt-labs/dbt_utils`.
```yml
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']

  - macro_namespace: dbt_expectations
    search_order: ['google_ads_source', 'dbt_expectations']
```
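The `search_order` setting controls which package's implementation of a dispatched macro is used. The simplified Python model below assumes dbt's behavior of trying each package in `search_order` in turn and, within each package, the adapter-specific prefixes before `default__`. The macro names in `REGISTRY` and the `databricks`, `spark`, `default` prefix order are illustrative assumptions, not a listing of what these packages actually define.

```python
from typing import Dict, List, Optional, Set


def resolve_dispatch(
    macro_name: str,
    search_order: List[str],
    registry: Dict[str, Set[str]],
    adapter_prefixes: List[str],
) -> Optional[str]:
    """Return 'package.prefix__macro' for the first implementation found, or None.

    Packages are tried in search_order; inside each package the adapter-specific
    prefixes are tried in order, ending with 'default'.
    """
    for package in search_order:
        defined = registry.get(package, set())
        for prefix in adapter_prefixes:
            candidate = f"{prefix}__{macro_name}"
            if candidate in defined:
                return f"{package}.{candidate}"
    return None


# Hypothetical registry of dispatched implementations defined by each package.
REGISTRY: Dict[str, Set[str]] = {
    "spark_utils": {"spark__example_macro"},
    "dbt_utils": {"default__example_macro", "default__other_macro"},
}
# Assumed prefix order on a Databricks target, whose adapter builds on Spark.
DATABRICKS_PREFIXES = ["databricks", "spark", "default"]
SEARCH_ORDER = ["spark_utils", "dbt_utils"]  # mirrors the dbt_utils dispatch entry

print(resolve_dispatch("example_macro", SEARCH_ORDER, REGISTRY, DATABRICKS_PREFIXES))
# spark_utils.spark__example_macro
print(resolve_dispatch("other_macro", SEARCH_ORDER, REGISTRY, DATABRICKS_PREFIXES))
# dbt_utils.default__other_macro
```

In this model, removing `spark_utils` from the search order makes `example_macro` resolve to the `default__` version in `dbt_utils`.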

## Step 2: Install the package (skip if also using the `google_ads` transformation package)
@@ -43,7 +40,7 @@ If you are **not** using the [Google Ads transformation package](https://github
```yml
packages:
  - package: fivetran/google_ads_source
    version: [">=0.9.0", "<0.10.0"] # we recommend using ranges to capture non-breaking changes automatically
    version: [">=0.10.0", "<0.11.0"] # we recommend using ranges to capture non-breaking changes automatically
```
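A range such as `[">=0.10.0", "<0.11.0"]` accepts every 0.10.x patch release but stops before 0.11.0, where breaking changes may land. The sketch below is a simplified evaluator that assumes plain `major.minor.patch` versions with no pre-release tags; it is not dbt's own version resolver.

```python
import operator
from typing import Callable, List, Tuple

# Supported comparison operators; two-character symbols come first so that
# ">=" is not mistaken for ">".
_OPERATORS: List[Tuple[str, Callable[..., bool]]] = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.10.0' into (0, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, constraints: List[str]) -> bool:
    """True if `version` meets every constraint, e.g. ['>=0.10.0', '<0.11.0']."""
    parsed = parse_version(version)
    for constraint in constraints:
        for symbol, compare in _OPERATORS:
            if constraint.startswith(symbol):
                if not compare(parsed, parse_version(constraint[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported constraint: {constraint!r}")
    return True


for candidate in ["0.9.5", "0.10.0", "0.10.3", "0.11.0"]:
    print(candidate, satisfies(candidate, [">=0.10.0", "<0.11.0"]))
```

Comparing integer tuples rather than strings matters here: as strings, `"0.10.0" < "0.9.5"` would be true.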
## Step 3: Define database and schema variables
By default, this package runs using your destination and the `google_ads` schema. If this is not where your Google Ads data is (for example, if your google_ads schema is named `google_ads_fivetran`), add the following configuration to your root `dbt_project.yml` file:
@@ -55,7 +52,17 @@ vars:
```

## (Optional) Step 4: Additional configurations
<details><summary>Expand for configurations</summary>
### Union multiple connectors
If you have multiple google_ads connectors in Fivetran and would like to use this package on all of them simultaneously, the package can union all of the data together and pass the unioned table into the transformations. The `source_relation` column in each model shows which source each record came from. To use this functionality, set either the `google_ads_union_schemas` or the `google_ads_union_databases` variable (not both) in your root `dbt_project.yml` file:

```yml
vars:
    google_ads_union_schemas: ['google_ads_usa','google_ads_canada'] # use this if the data is in different schemas/datasets of the same database/project
    # google_ads_union_databases: ['google_ads_usa','google_ads_canada'] # use this instead if the data is in different databases/projects but uses the same schema name
```
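The either/or rule for the two union variables can be summarized as a small decision function. The Python sketch below is hypothetical: `relations_to_union` is not part of the package, which implements this logic in Jinja via `fivetran_utils.union_data`. It enforces the "not both" rule by raising an error, which may differ from how the package itself treats a configuration that sets both, and it assumes unioned databases share the schema passed as `default_schema`.

```python
from typing import Dict, List, Tuple


def relations_to_union(
    project_vars: Dict[str, List[str]],
    default_database: str,
    default_schema: str = "google_ads",
) -> List[Tuple[str, str]]:
    """Return the (database, schema) pairs whose Google Ads tables would be unioned."""
    schemas = project_vars.get("google_ads_union_schemas")
    databases = project_vars.get("google_ads_union_databases")
    if schemas and databases:
        # The README says to set one variable or the other, never both.
        raise ValueError(
            "Set google_ads_union_schemas or google_ads_union_databases, not both."
        )
    if schemas:
        # Different schemas/datasets inside the same database/project.
        return [(default_database, schema) for schema in schemas]
    if databases:
        # Different databases/projects that share one schema name.
        return [(database, default_schema) for database in databases]
    # No union variables: fall back to the single source in src_google_ads.yml.
    return [(default_database, default_schema)]


print(relations_to_union(
    {"google_ads_union_schemas": ["google_ads_usa", "google_ads_canada"]},
    default_database="analytics",  # hypothetical database name
))
# [('analytics', 'google_ads_usa'), ('analytics', 'google_ads_canada')]
```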
Please be aware that the package's native `source.yml` connection will not be used when the union schema/database feature is enabled. Although the data will be combined correctly, the sources will not appear linked to the package models in the Directed Acyclic Graph (DAG), because the package defines only a single `source.yml`.

To connect your multiple schema/database sources to the package models, follow the steps outlined in the [Union Data Defined Sources Configuration](https://github.com/fivetran/dbt_fivetran_utils/tree/releases/v0.4.latest#union_data-source) section of the Fivetran Utils documentation for the `union_data` macro. This ensures a proper configuration and correct visualization of connections in the DAG.

### Passing Through Additional Metrics
By default, this package selects `clicks`, `impressions`, and `cost` from the source reporting tables and stores them in the staging models. If you would like to pass through additional metrics to the staging models, add the below configurations to your `dbt_project.yml` file. These variables optionally allow the pass-through fields to be aliased (`alias`). Use the below format for declaring the respective pass-through variables:
@@ -96,8 +103,6 @@ vars:
google_ads_<default_source_table_name>_identifier: your_table_name
```

</details>

## (Optional) Step 5: Orchestrate your models with Fivetran Transformations for dbt Core™
<details><summary>Expand for more details</summary>

@@ -116,12 +121,6 @@ packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]

  - package: calogica/dbt_expectations
    version: [">=0.8.0", "<0.9.0"]

  - package: calogica/dbt_date
    version: [">=0.7.0", "<0.8.0"]

  - package: dbt-labs/spark_utils
    version: [">=0.3.0", "<0.4.0"]
```
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: 'google_ads_source'
version: '0.9.5'
version: '0.10.0'
config-version: 2
require-dbt-version: [">=1.3.0", "<2.0.0"]
vars:
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

24 changes: 12 additions & 12 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/run_results.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion integration_tests/ci/sample.profiles.yml
@@ -49,6 +49,6 @@ integration_tests:
host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
schema: google_ads_source_integration_tests_3
threads: 2
threads: 8
token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
type: databricks
5 changes: 1 addition & 4 deletions integration_tests/dbt_project.yml
@@ -1,5 +1,5 @@
name: 'google_ads_source_integration_tests'
version: '0.9.5'
version: '0.10.0'
profile: 'integration_tests'
config-version: 2

@@ -32,6 +32,3 @@ seeds:
dispatch:
- macro_namespace: dbt_utils
search_order: ['spark_utils', 'dbt_utils']

- macro_namespace: dbt_expectations
search_order: ['google_ads_source', 'dbt_expectations']
3 changes: 0 additions & 3 deletions macros/regexp_instr.sql

This file was deleted.

4 changes: 4 additions & 0 deletions models/docs.md
@@ -146,3 +146,7 @@ The Google Ad network type used across the account.
{% docs device %}
Account ad performance per unique device where the ads were served.
{% enddocs %}

{% docs source_relation %}
The source of the record if the unioning functionality is being used. If not, this field will be empty.
{% enddocs %}
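A rough Python approximation of how a `source_relation` value could be derived follows. It assumes each unioned row carries the quoted, fully qualified name of the table it came from (for example, a `_dbt_source_relation` column like the one `dbt_utils.union_relations` adds) and matches that name against the configured schemas or databases. The helper name and matching rules are simplifications, not the actual source of the `fivetran_utils.source_relation` macro.

```python
from typing import List, Optional


def derive_source_relation(
    qualified_table: str,
    union_schemas: Optional[List[str]] = None,
    union_databases: Optional[List[str]] = None,
) -> Optional[str]:
    """Map a quoted 'database.schema.table' name to the configured schema or
    database it belongs to. Returns '' when no union variable is set and
    None when the name matches none of the configured values."""
    if not union_schemas and not union_databases:
        return ""  # unioning not in use: the column stays empty
    # Strip identifier quotes (" or `) and compare case-insensitively.
    name = qualified_table.replace('"', "").replace("`", "").lower()
    database, schema, _table = name.split(".")
    if union_schemas:
        candidates, actual = union_schemas, schema
    else:
        candidates, actual = union_databases or [], database
    for candidate in candidates:
        if candidate.lower() == actual:
            return candidate.lower()
    return None


print(derive_source_relation(
    '"ANALYTICS"."GOOGLE_ADS_USA"."ACCOUNT_HISTORY"',
    union_schemas=["google_ads_usa", "google_ads_canada"],
))
# google_ads_usa
```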
2 changes: 1 addition & 1 deletion models/src_google_ads.yml
@@ -1,7 +1,7 @@
version: 2

sources:
- name: google_ads
- name: google_ads # This source will only be used if you are using a single google_ads source connector. If multiple sources are being unioned, their tables will be directly referenced via adapter.get_relation.
schema: "{{ var('google_ads_schema', 'google_ads') }}"
database: "{% if target.type != 'spark' %}{{ var('google_ads_database', target.database) }}{% endif %}"

35 changes: 30 additions & 5 deletions models/stg_google_ads.yml
@@ -6,9 +6,12 @@ models:
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- source_relation
- account_id
- updated_at
columns:
- name: source_relation
description: "{{ doc('source_relation') }}"
- name: account_id
description: "{{ doc('account_id') }}"
tests:
@@ -31,9 +34,12 @@ models:
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- source_relation
- ad_group_id
- updated_at
columns:
- name: source_relation
description: "{{ doc('source_relation') }}"
- name: ad_group_id
description: "{{ doc('ad_group_id') }}"
tests:
@@ -58,10 +64,13 @@ models:
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- source_relation
- ad_id
- ad_group_id
- updated_at
columns:
- name: source_relation
description: "{{ doc('source_relation') }}"
- name: ad_group_id
description: "{{ doc('ad_group_id') }}"
- name: ad_id
@@ -80,11 +89,6 @@ models:
description: "{{ doc('is_most_recent_record') }}"
- name: source_final_urls
description: The original list of final urls expressed as an array. Please be aware the test used on this field is intended to warn you if you have fields with multiple urls. If you do, the `final_url` field will filter down the urls within the array to just the first. Therefore, this package will only leverage one of possibly many urls within this field array.
tests:
- dbt_expectations.expect_column_values_to_not_match_regex_list:
regex_list: ","
match_on: any
severity: warn
- name: final_url
description: The first url in the list of the urls within the `final_urls` source field.
- name: base_url
@@ -109,13 +113,16 @@ models:
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- source_relation
- ad_id
- ad_network_type
- device
- ad_group_id
- keyword_ad_group_criterion
- date_day
columns:
- name: source_relation
description: "{{ doc('source_relation') }}"
- name: account_id
description: "{{ doc('external_customer_id') }}"
- name: date_day
@@ -148,9 +155,12 @@ models:
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- source_relation
- campaign_id
- updated_at
columns:
- name: source_relation
description: "{{ doc('source_relation') }}"
- name: campaign_id
description: "{{ doc('campaign_id') }}"
tests:
@@ -183,10 +193,13 @@ models:
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- source_relation
- criterion_id
- ad_group_id
- updated_at
columns:
- name: source_relation
description: "{{ doc('source_relation') }}"
- name: criterion_id
description: Unique identifier of the ad group criterion.
tests:
@@ -213,11 +226,14 @@ models:
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- source_relation
- ad_group_id
- device
- ad_network_type
- date_day
columns:
- name: source_relation
description: "{{ doc('source_relation') }}"
- name: account_id
description: "{{ doc('external_customer_id') }}"
- name: date_day
@@ -246,11 +262,14 @@ models:
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- source_relation
- campaign_id
- ad_network_type
- device
- date_day
columns:
- name: source_relation
description: "{{ doc('source_relation') }}"
- name: account_id
description: "{{ doc('external_customer_id') }}"
- name: date_day
@@ -277,9 +296,12 @@ models:
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- source_relation
- keyword_id
- date_day
columns:
- name: source_relation
description: "{{ doc('source_relation') }}"
- name: account_id
description: "{{ doc('external_customer_id') }}"
- name: date_day
@@ -308,11 +330,14 @@ models:
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- source_relation
- account_id
- device
- ad_network_type
- date_day
columns:
- name: source_relation
description: "{{ doc('source_relation') }}"
- name: account_id
description: "{{ doc('external_customer_id') }}"
tests:
13 changes: 10 additions & 3 deletions models/stg_google_ads__account_history.sql
@@ -17,19 +17,26 @@ fields as (
)
}}


{{ fivetran_utils.source_relation(
union_schema_variable='google_ads_union_schemas',
union_database_variable='google_ads_union_databases')
}}

from base
),

final as (

select

select
source_relation,
id as account_id,
updated_at,
currency_code,
auto_tagging_enabled,
time_zone,
descriptive_name as account_name,
row_number() over (partition by id order by updated_at desc) = 1 as is_most_recent_record
row_number() over (partition by source_relation, id order by updated_at desc) = 1 as is_most_recent_record
from fields
where coalesce(_fivetran_active, true)
)
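Adding `source_relation` to the `partition by` clause keeps `is_most_recent_record` correct when two connectors share an account `id`. The Python sketch below runs the same window expression against an in-memory SQLite table (window functions need SQLite 3.25 or newer); the table contents and the `most_recent_flags` helper are illustrative only.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[str, int, str]  # (source_relation, id, updated_at)


def most_recent_flags(rows: List[Row], partition_by: str = "source_relation, id") -> List[tuple]:
    """Flag the most recent version of each record within every partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table fields (source_relation text, id integer, updated_at text)")
    conn.executemany("insert into fields values (?, ?, ?)", rows)
    # partition_by is a trusted, hard-coded string in this sketch.
    query = f"""
        select
            source_relation,
            id as account_id,
            updated_at,
            row_number() over (partition by {partition_by} order by updated_at desc) = 1
                as is_most_recent_record
        from fields
        order by source_relation, account_id, updated_at
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


rows = [
    ("google_ads_usa", 1, "2023-01-01"),
    ("google_ads_usa", 1, "2023-02-01"),
    ("google_ads_canada", 1, "2023-01-15"),
]
for row in most_recent_flags(rows):
    print(row)
# ('google_ads_canada', 1, '2023-01-15', 1)
# ('google_ads_usa', 1, '2023-01-01', 0)
# ('google_ads_usa', 1, '2023-02-01', 1)
```

With `partition_by="id"` (the previous behavior), only one row across both connectors is flagged as most recent, so the Canadian account's latest record would be excluded by any filter on `is_most_recent_record`.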
11 changes: 9 additions & 2 deletions models/stg_google_ads__account_stats.sql
@@ -15,12 +15,19 @@ fields as (
staging_columns=get_account_stats_columns()
)
}}

{{ fivetran_utils.source_relation(
union_schema_variable='google_ads_union_schemas',
union_database_variable='google_ads_union_databases')
}}

from base
),

final as (

select

select
source_relation,
customer_id as account_id,
date as date_day,
ad_network_type,
13 changes: 10 additions & 3 deletions models/stg_google_ads__ad_group_criterion_history.sql
@@ -15,12 +15,19 @@ fields as (
staging_columns=get_ad_group_criterion_history_columns()
)
}}

{{ fivetran_utils.source_relation(
union_schema_variable='google_ads_union_schemas',
union_database_variable='google_ads_union_databases')
}}

from base
),

final as (

select

select
source_relation,
id as criterion_id,
cast(ad_group_id as {{ dbt.type_string() }}) as ad_group_id,
base_campaign_id,
@@ -29,7 +36,7 @@ final as (
status,
keyword_match_type,
keyword_text,
row_number() over (partition by id order by updated_at desc) = 1 as is_most_recent_record
row_number() over (partition by source_relation, id order by updated_at desc) = 1 as is_most_recent_record
from fields
where coalesce(_fivetran_active, true)
)