Handle schema pattern on BQ #275

Merged: 3 commits, Sep 14, 2020
54 changes: 46 additions & 8 deletions macros/sql/get_tables_by_pattern_sql.sql
@@ -5,7 +5,7 @@

{% macro default__get_tables_by_pattern_sql(schema_pattern, table_pattern, exclude='', database=target.database) %}

select distinct
select distinct
table_schema as "table_schema", table_name as "table_name"
from {{database}}.information_schema.tables
where table_schema ilike '{{ schema_pattern }}'
@@ -16,13 +16,51 @@


{% macro bigquery__get_tables_by_pattern_sql(schema_pattern, table_pattern, exclude='', database=target.database) %}

select distinct
table_schema, table_name

from {{adapter.quote(database)}}.{{schema}}.INFORMATION_SCHEMA.TABLES
where table_schema = '{{schema_pattern}}'
and lower(table_name) like lower ('{{table_pattern}}')
and lower(table_name) not like lower ('{{exclude}}')
{% if '%' in schema_pattern %}

Contributor Author

Technically, _ is also a wildcard in a like pattern: it matches any single character (docs).

Contributor Author

This means that we actually introduced a slight regression by switching from = operators to like operators.

Contributor

Not sure I follow how this is a regression?

Question is, should we amend that line to be

{% if '%' in schema_pattern or '_' in schema_pattern %}

knowing we're going to get a lot more false positives...
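
To make the false-positive concern concrete, here is a small Python sketch of the two candidate checks. The function names and the sample schema names are hypothetical, chosen only for illustration; the real check lives in the Jinja `{% if %}` above.

```python
def has_wildcard_percent_only(schema_pattern: str) -> bool:
    # Mirrors the check in the diff: only % triggers the schema lookup.
    return "%" in schema_pattern


def has_wildcard_with_underscore(schema_pattern: str) -> bool:
    # Mirrors the proposed amendment: _ also triggers the schema lookup.
    return "%" in schema_pattern or "_" in schema_pattern


# Hypothetical schema patterns a user might pass.
names = ["analytics", "dbt_utils", "raw_%"]

percent_only = [n for n in names if has_wildcard_percent_only(n)]
with_underscore = [n for n in names if has_wildcard_with_underscore(n)]

print(percent_only)
print(with_underscore)
```

Under the amended check, any ordinary snake_case schema name such as `dbt_utils` would be treated as a pattern and go through the extra schema lookup, even when the user meant it literally.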

Contributor Author

@clrcrl clrcrl Sep 8, 2020

Not sure I follow how this is a regression?

Let's say we have schemas named a_c and abc.

I think previously for get_relations_by_prefix we used a strict = for the schema name.

where schema_name = 'a_c'

This would return only a_c.

Now, we pass it to a like operator (i.e. here and here)

where schema_name like 'a_c'

This would return both a_c and abc.

It's a teeny tiny regression for anyone that uses get_relations_by_prefix
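
The difference between the two predicates can be reproduced locally. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for the warehouse (an assumption made only so the example runs anywhere; the table name `schemata` and helper `matching` are hypothetical). SQLite treats `_` in a `like` pattern as a single-character wildcard in the same way.

```python
import sqlite3

# SQLite is only a stand-in here; the PR itself targets BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("create table schemata (schema_name text)")
conn.executemany(
    "insert into schemata values (?)",
    [("a_c",), ("abc",)],
)


def matching(where_clause):
    """Return the schema names that satisfy the given where clause."""
    rows = conn.execute(
        f"select schema_name from schemata where {where_clause} "
        "order by schema_name"
    ).fetchall()
    return [row[0] for row in rows]


exact = matching("schema_name = 'a_c'")    # strict equality
fuzzy = matching("schema_name like 'a_c'")  # _ acts as a wildcard

print(exact)
print(fuzzy)
```

With equality only `a_c` comes back; with `like`, `abc` matches as well, which is the regression described above.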

Contributor

Ah, I see your point. To match only a_c, the user should pass the pattern as a\\_c. I still think it's okay to ship this in a patch release, and maybe link to the BQ docs in the README.
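
A sketch of what escaping could look like on the caller's side. The helper `escape_like` is hypothetical and not part of dbt_utils, and the check runs against SQLite, which needs an explicit `escape` clause; in BigQuery the escape is written with two backslashes inside the string literal, as in the a\\_c pattern above.

```python
import sqlite3


def escape_like(literal: str, escape_char: str = "\\") -> str:
    """Escape like wildcards so the pattern matches `literal` only."""
    # Escape the escape character first so it is not doubled again later.
    out = literal.replace(escape_char, escape_char + escape_char)
    out = out.replace("%", escape_char + "%")
    out = out.replace("_", escape_char + "_")
    return out


# SQLite is only a stand-in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table schemata (schema_name text)")
conn.executemany(
    "insert into schemata values (?)",
    [("a_c",), ("abc",)],
)

pattern = escape_like("a_c")  # a\_c
rows = conn.execute(
    "select schema_name from schemata "
    "where schema_name like ? escape '\\'",
    (pattern,),
).fetchall()
escaped_matches = [row[0] for row in rows]

print(pattern)
print(escaped_matches)
```

With the underscore escaped, only the literal `a_c` schema matches again.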

Contributor Author

@clrcrl clrcrl Sep 8, 2020

I think let's just ship this as-is, and if someone notices the regression we can fix it heh

{% set schemata=dbt_utils._bigquery__get_matching_schemata(schema_pattern, database) %}
{% else %}
{% set schemata=[schema_pattern] %}
{% endif %}

{% set sql %}
{% for schema in schemata %}
select distinct
table_schema, table_name

from {{ adapter.quote(database) }}.{{ schema }}.INFORMATION_SCHEMA.TABLES
where lower(table_name) like lower ('{{ table_pattern }}')
and lower(table_name) not like lower ('{{ exclude }}')

{% if not loop.last %} union all {% endif %}

{% endfor %}
{% endset %}

{{ return(sql) }}

{% endmacro %}


{% macro _bigquery__get_matching_schemata(schema_pattern, database) %}
{% if execute %}

{% set sql %}
select schema_name from {{ adapter.quote(database) }}.INFORMATION_SCHEMA.SCHEMATA
where lower(schema_name) like lower('{{ schema_pattern }}')
{% endset %}

{% set results=run_query(sql) %}

{% set schemata=results.columns['schema_name'].values() %}

{{ return(schemata) }}

{% else %}

{{ return([]) }}

{% endif %}


{% endmacro %}
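
For readers less familiar with Jinja, the control flow of the new BigQuery macro can be sketched in plain Python. This is an illustration under assumptions, not the macro itself: `build_tables_sql` is a hypothetical name, the list of schemata is passed in directly instead of being fetched with `run_query`, and backticks stand in for whatever `adapter.quote` produces.

```python
def build_tables_sql(database, schemata, table_pattern, exclude=""):
    """Mimic the Jinja loop: one select per schema, joined by union all."""
    selects = []
    for schema in schemata:
        selects.append(
            "select distinct table_schema, table_name\n"
            # Backtick quoting is an assumption about adapter.quote.
            f"from `{database}`.{schema}.INFORMATION_SCHEMA.TABLES\n"
            f"where lower(table_name) like lower('{table_pattern}')\n"
            f"and lower(table_name) not like lower('{exclude}')"
        )
    # Equivalent to emitting "union all" after every item except the last.
    return "\nunion all\n".join(selects)


# Hypothetical inputs: two schemata matched by a pattern such as 'a%c'.
sql = build_tables_sql("my-project", ["a_c", "abc"], "stg_%")
empty_sql = build_tables_sql("my-project", [], "stg_%")

print(sql)
```

Note that an empty list of schemata yields an empty string. In the macro this appears to be the case whenever `_bigquery__get_matching_schemata` returns `[]`, for example when `execute` is false or no schema matches the pattern.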