Bigquery model-level unique tests on concat() columns #35
Comments
Seems to be a Bigquery thing:
Therefore the compiled test query for BigQuery adapters should look like this to work:
select
    count(*) as failures
from (
    select
        concat(id, field, changed_at) as unique_field,
        count(*) as n_records
    from base_test
    where concat(id, field, changed_at) is not null
    group by unique_field
    having count(*) > 1
) dbt_internal_test
I've also experienced a number of tests start failing with the same message: After upgrading to version 0.20.2, these errors started appearing. There was no issue when we were running on 0.20.1. We're also running on BigQuery. It seems to be affecting generic I'm going to quickly check the compiled test code for both versions and see what has changed.
@Dimi727 @DigUpTheHatchet Thanks for opening and commenting. This one is on me: dbt-labs/dbt-core#3812. I added an alias to the column produced by the I have a few comments:
Per @artamoshin's detailed research in dbt-labs/dbt-core#3905 (comment), the vast majority of databases want us to One way to fix this for sure for sure would be to reimplement the
{% macro bigquery__test_unique(model, column_name) %}
with dbt_test__target as (
    select {{ column_name }} as unique_field
    from {{ model }}
    where {{ column_name }} is not null
)
select
    unique_field,
    count(*) as n_records
from dbt_test__target
group by unique_field
having count(*) > 1
{% endmacro %}
That CTE-based approach would also have the nice side-benefit of fixing another BigQuery-specific issue (https://github.com/dbt-labs/dbt/issues/3489), when the column name and table name match.
Personally, I'd love to make the CTE-based query the default, but I know that some databases don't support CTEs nested inside subqueries, and dbt now wraps generic test queries in subqueries when executing them. We can't win them all, unfortunately.
Given that other databases by and large support the existing syntax, I don't think it's unreasonable to make this a BigQuery-specific change. Would either of you be interested in contributing:
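For illustration only, here is roughly what the CTE-based macro sketched above might render to for the case in this issue. It assumes the model resolves to base_test and the tested column expression is concat(id, field, changed_at), both taken from the first comment; in a real project dbt would substitute the fully qualified relation name, so treat this as a sketch rather than actual compiled output.

```sql
-- Hypothetical rendering of the bigquery__test_unique sketch,
-- assuming model = base_test and column_name = concat(id, field, changed_at).
with dbt_test__target as (
    -- The expression is evaluated once here and given a plain column name.
    select concat(id, field, changed_at) as unique_field
    from base_test
    where concat(id, field, changed_at) is not null
)
select
    unique_field,
    count(*) as n_records
from dbt_test__target
-- Grouping refers to an ordinary column of the CTE, not to an aliased expression.
group by unique_field
having count(*) > 1
```

Because the outer query only ever sees unique_field as a regular column, the grouping no longer depends on how the adapter matches an aliased expression against the group by clause. dbt would still wrap this in its own subquery when running the test, as described above.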
@jtcohen6 I've been wanting to make a first contribution so I'll give this a go over the weekend. I can't promise how successful I will be but I will try!
Well :-) I think it's kind of redundant if there is no use for this column except a dbt-internal unique test or an incremental model which needs a unique field (which is not always used).
Resolved by #10
Hi everyone.
I did not find this issue anywhere yet, which seems odd to me, and I don't know if it is on my side.
Using BigQuery and dbt 0.20.2.
Having defined a test for a model like this:
Which was working before dbt 0.20.0. Now I get an error:
I checked in projects where I use dbt 0.19.x and the test above compiled to this:
Now it compiles to this:
which BigQuery seems not to like. Using the query above and removing the alias unique_field works for BigQuery.
With Postgres and MySQL both of the above queries run. Did not test Snowflake and Redshift.