diff --git a/CHANGELOG.md b/CHANGELOG.md index 5bb4df53..366a4964 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,14 +1,15 @@ - # dbt-utils v0.8.3 ## New features - A macro for deduplicating data ([#335](https://github.com/dbt-labs/dbt-utils/issues/335), [#512](https://github.com/dbt-labs/dbt-utils/pull/512)) +## Quality of life +- Updated references to 'schema test' in project file structure and documentation referred to in [#485](https://github.com/dbt-labs/dbt-utils/issues/485) + # dbt-utils v0.8.2 ## Fixes - Fix union_relations error from [#473](https://github.com/dbt-labs/dbt-utils/pull/473) when no include/exclude parameters are provided ([#505](https://github.com/dbt-labs/dbt-utils/issues/505), [#509](https://github.com/dbt-labs/dbt-utils/pull/509)) # dbt-utils v0.8.1 - ## New features - A cross-database implementation of `any_value()` ([#497](https://github.com/dbt-labs/dbt-utils/issues/497), [#501](https://github.com/dbt-labs/dbt-utils/pull/501)) - A cross-database implementation of `bool_or()` ([#504](https://github.com/dbt-labs/dbt-utils/pull/504)) @@ -92,12 +93,12 @@ ## Features -- Add `not_null_proportion` schema test that allows the user to specify the minimum (`at_least`) tolerated proportion (e.g., `0.95`) of non-null values ([#411](https://github.com/dbt-labs/dbt-utils/pull/411)) +- Add `not_null_proportion` generic test that allows the user to specify the minimum (`at_least`) tolerated proportion (e.g., `0.95`) of non-null values ([#411](https://github.com/dbt-labs/dbt-utils/pull/411)) ## Under the hood - Allow user to provide any case type when defining the `exclude` argument in `dbt_utils.star()` ([#403](https://github.com/dbt-labs/dbt-utils/pull/403)) -- Log whole row instead of just column name in 'accepted_range' schema test to allow better visibility into failures ([#413](https://github.com/dbt-labs/dbt-utils/pull/413)) +- Log whole row instead of just column name in 'accepted_range' generic test to allow better visibility into failures 
([#413](https://github.com/dbt-labs/dbt-utils/pull/413)) - Use column name to group in 'get_column_values ' to allow better cross db functionality ([#407](https://github.com/dbt-labs/dbt-utils/pull/407)) # dbt-utils v0.7.1 @@ -154,7 +155,7 @@ If you were relying on the position to match up your optional arguments, this ma ## Features * Add new argument, `order_by`, to `get_column_values` (code originally in [#289](https://github.com/fishtown-analytics/dbt-utils/pull/289/) from [@clausherther](https://github.com/clausherther), merged via [#349](https://github.com/fishtown-analytics/dbt-utils/pull/349/)) * Add `slugify` macro, and use it in the pivot macro. :rotating_light: This macro uses the `re` module, which is only available in dbt v0.19.0+. As a result, this feature introduces a breaking change. ([#314](https://github.com/fishtown-analytics/dbt-utils/pull/314)) -* Add `not_null_proportion` schema test that allows the user to specify the minimum (`at_least`) tolerated proportion (e.g., `0.95`) of non-null values +* Add `not_null_proportion` generic test that allows the user to specify the minimum (`at_least`) tolerated proportion (e.g., `0.95`) of non-null values ## Under the hood * Update the default implementation of concat macro to use `||` operator ([#373](https://github.com/fishtown-analytics/dbt-utils/pull/314) from [@ChristopheDuong](https://github.com/ChristopheDuong)). Note this may be a breaking change for adapters that support `concat()` but not `||`, such as Apache Spark. @@ -165,18 +166,18 @@ If you were relying on the position to match up your optional arguments, this ma ## Fixes -- make `sequential_values` schema test use `dbt_utils.type_timestamp()` to allow for compatibility with db's without timestamp data type. 
[#376](https://github.com/fishtown-analytics/dbt-utils/pull/376) from [@swanderz](https://github.com/swanderz) +- make `sequential_values` generic test use `dbt_utils.type_timestamp()` to allow for compatibility with db's without timestamp data type. [#376](https://github.com/fishtown-analytics/dbt-utils/pull/376) from [@swanderz](https://github.com/swanderz) # dbt-utils v0.6.5 ## Features * Add new `accepted_range` test ([#276](https://github.com/fishtown-analytics/dbt-utils/pull/276) [@joellabes](https://github.com/joellabes)) * Make `expression_is_true` work as a column test (code originally in [#226](https://github.com/fishtown-analytics/dbt-utils/pull/226/) from [@elliottohara](https://github.com/elliottohara), merged via [#313](https://github.com/fishtown-analytics/dbt-utils/pull/313/)) -* Add new schema test, `not_accepted_values` ([#284](https://github.com/fishtown-analytics/dbt-utils/pull/284) [@JavierMonton](https://github.com/JavierMonton)) +* Add new generic test, `not_accepted_values` ([#284](https://github.com/fishtown-analytics/dbt-utils/pull/284) [@JavierMonton](https://github.com/JavierMonton)) * Support a new argument, `zero_length_range_allowed` in the `mutually_exclusive_ranges` test ([#307](https://github.com/fishtown-analytics/dbt-utils/pull/307) [@zemekeneng](https://github.com/zemekeneng)) -* Add new schema test, `sequential_values` ([#318](https://github.com/fishtown-analytics/dbt-utils/pull/318), inspired by [@hundredwatt](https://github.com/hundredwatt)) +* Add new generic test, `sequential_values` ([#318](https://github.com/fishtown-analytics/dbt-utils/pull/318), inspired by [@hundredwatt](https://github.com/hundredwatt)) * Support `quarter` in the `postgres__last_day` macro ([#333](https://github.com/fishtown-analytics/dbt-utils/pull/333/files) [@seunghanhong](https://github.com/seunghanhong)) * Add new argument, `unit`, to `haversine_distance` ([#340](https://github.com/fishtown-analytics/dbt-utils/pull/340) 
[@bastienboutonnet](https://github.com/bastienboutonnet)) -* Add new schema test, `fewer_rows_than` (code originally in [#221](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@dmarts](https://github.com/dmarts), merged via [#343](https://github.com/fishtown-analytics/dbt-utils/pull/343/)) +* Add new generic test, `fewer_rows_than` (code originally in [#221](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@dmarts](https://github.com/dmarts), merged via [#343](https://github.com/fishtown-analytics/dbt-utils/pull/343/)) ## Fixes * Handle booleans gracefully in the unpivot macro ([#305](https://github.com/fishtown-analytics/dbt-utils/pull/305) [@avishalom](https://github.com/avishalom)) @@ -250,7 +251,7 @@ enabling users of community-supported database plugins to add or override macro specific to their database ([#267](https://github.com/fishtown-analytics/dbt-utils/pull/267)) * Use `add_ephemeral_prefix` instead of hard-coding a string literal, to support database adapters that use different prefixes ([#267](https://github.com/fishtown-analytics/dbt-utils/pull/267)) -* Implement a quote_columns argument in the unique_combination_of_columns schema test ([#270](https://github.com/fishtown-analytics/dbt-utils/pull/270) [@JoshuaHuntley](https://github.com/JoshuaHuntley)) +* Implement a quote_columns argument in the unique_combination_of_columns generic test ([#270](https://github.com/fishtown-analytics/dbt-utils/pull/270) [@JoshuaHuntley](https://github.com/JoshuaHuntley)) ## Quality of life * Remove deprecated macros `get_tables_by_prefix` and `union_tables` ([#268](https://github.com/fishtown-analytics/dbt-utils/pull/268)) diff --git a/README.md b/README.md index 80c6cbbb..c9caa8cc 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this ---- ## Contents -**[Schema tests](#schema-tests)** +**[Generic tests](#generic-tests)** - 
[equal_rowcount](#equal_rowcount-source) - [fewer_rows_than](#fewer_rows_than-source) - [equality](#equality-source) @@ -69,9 +69,9 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this - [insert_by_period](#insert_by_period-source) ---- -### Schema Tests -#### equal_rowcount ([source](macros/schema_tests/equal_rowcount.sql)) -This schema test asserts the that two relations have the same number of rows. +### Generic Tests +#### equal_rowcount ([source](macros/generic_tests/equal_rowcount.sql)) +Asserts that two relations have the same number of rows. **Usage:** ```yaml @@ -85,8 +85,8 @@ models: ``` -#### fewer_rows_than ([source](macros/schema_tests/fewer_rows_than.sql)) -This schema test asserts that this model has fewer rows than the referenced model. +#### fewer_rows_than ([source](macros/generic_tests/fewer_rows_than.sql)) +Asserts that the respective model has fewer rows than the model being compared. Usage: ```yaml @@ -99,8 +99,8 @@ models: compare_model: ref('other_table_name') ``` -#### equality ([source](macros/schema_tests/equality.sql)) -This schema test asserts the equality of two relations. Optionally specify a subset of columns to compare. +#### equality ([source](macros/generic_tests/equality.sql)) +Asserts the equality of two relations. Optionally specify a subset of columns to compare. **Usage:** ```yaml @@ -116,8 +116,13 @@ models: - second_column ``` -#### expression_is_true ([source](macros/schema_tests/expression_is_true.sql)) -This schema test asserts that a valid sql expression is true for all records. This is useful when checking integrity across columns, for example, that a total is equal to the sum of its parts, or that at least one column is true. +#### expression_is_true ([source](macros/generic_tests/expression_is_true.sql)) +Asserts that a valid SQL expression is true for all records. This is useful when checking integrity across columns. 
+Examples: + +- Verify an outcome based on the application of basic algebraic operations between columns. +- Verify the length of a column. +- Verify the truth value of a column. **Usage:** ```yaml @@ -164,8 +169,8 @@ models: condition: col_a = 1 ``` -#### recency ([source](macros/schema_tests/recency.sql)) -This schema test asserts that there is data in the referenced model at least as recent as the defined interval prior to the current timestamp. +#### recency ([source](macros/generic_tests/recency.sql)) +Asserts that a timestamp column in the reference model contains data that is at least as recent as the defined date interval. **Usage:** ```yaml @@ -180,8 +185,8 @@ models: interval: 1 ``` -#### at_least_one ([source](macros/schema_tests/at_least_one.sql)) -This schema test asserts if column has at least one value. +#### at_least_one ([source](macros/generic_tests/at_least_one.sql)) +Asserts that a column has at least one value. **Usage:** ```yaml @@ -195,8 +200,8 @@ models: - dbt_utils.at_least_one ``` -#### not_constant ([source](macros/schema_tests/not_constant.sql)) -This schema test asserts if column does not have same value in all rows. +#### not_constant ([source](macros/generic_tests/not_constant.sql)) +Asserts that a column does not have the same value in all rows. **Usage:** ```yaml @@ -210,8 +215,8 @@ models: - dbt_utils.not_constant ``` -#### cardinality_equality ([source](macros/schema_tests/cardinality_equality.sql)) -This schema test asserts if values in a given column have exactly the same cardinality as values from a different column in a different model. +#### cardinality_equality ([source](macros/generic_tests/cardinality_equality.sql)) +Asserts that values in a given column have exactly the same cardinality as values from a different column in a different model. 
**Usage:** ```yaml @@ -227,8 +232,8 @@ models: to: ref('other_model_name') ``` -#### unique_where ([source](macros/schema_tests/test_unique_where.sql)) -This test validates that there are no duplicate values present in a field for a subset of rows by specifying a `where` clause. +#### unique_where ([source](macros/generic_tests/test_unique_where.sql)) +Asserts that there are no duplicate values present in a field for a subset of rows by specifying a `where` clause. *Warning*: This test is no longer supported. Starting in dbt v0.20.0, the built-in `unique` test supports a `where` config. [See the dbt docs for more details](https://docs.getdbt.com/reference/resource-configs/where). @@ -245,8 +250,8 @@ models: where: "_deleted = false" ``` -#### not_null_where ([source](macros/schema_tests/test_not_null_where.sql)) -This test validates that there are no null values present in a column for a subset of rows by specifying a `where` clause. +#### not_null_where ([source](macros/generic_tests/test_not_null_where.sql)) +Asserts that there are no null values present in a column for a subset of rows by specifying a `where` clause. *Warning*: This test is no longer supported. Starting in dbt v0.20.0, the built-in `not_null` test supports a `where` config. [See the dbt docs for more details](https://docs.getdbt.com/reference/resource-configs/where). @@ -263,8 +268,8 @@ models: where: "_deleted = false" ``` -#### not_null_proportion ([source](macros/schema_tests/not_null_proportion.sql)) -This test validates that the proportion of non-null values present in a column is between a specified range [`at_least`, `at_most`] where `at_most` is an optional argument (default: `1.0`). +#### not_null_proportion ([source](macros/generic_tests/not_null_proportion.sql)) +Asserts that the proportion of non-null values present in a column is between a specified range [`at_least`, `at_most`] where `at_most` is an optional argument (default: `1.0`). 
**Usage:** ```yaml @@ -279,8 +284,8 @@ models: at_least: 0.95 ``` -#### not_accepted_values ([source](macros/schema_tests/not_accepted_values.sql)) -This test validates that there are no rows that match the given values. +#### not_accepted_values ([source](macros/generic_tests/not_accepted_values.sql)) +Asserts that there are no rows that match the given values. Usage: ```yaml @@ -295,8 +300,8 @@ models: values: ['Barcelona', 'New York'] ``` -#### relationships_where ([source](macros/schema_tests/relationships_where.sql)) -This test validates the referential integrity between two relations (same as the core relationships schema test) with an added predicate to filter out some rows from the test. This is useful to exclude records such as test entities, rows created in the last X minutes/hours to account for temporary gaps due to ETL limitations, etc. +#### relationships_where ([source](macros/generic_tests/relationships_where.sql)) +Asserts the referential integrity between two relations (same as the core relationships assertions) with an added predicate to filter out some rows from the test. This is useful to exclude records such as test entities, rows created in the last X minutes/hours to account for temporary gaps due to ETL limitations, etc. **Usage:** ```yaml @@ -314,9 +319,9 @@ models: to_condition: created_date >= '2020-01-01' ``` -#### mutually_exclusive_ranges ([source](macros/schema_tests/mutually_exclusive_ranges.sql)) -This test confirms that for a given lower_bound_column and upper_bound_column, -the ranges of between the lower and upper bounds do not overlap with the ranges +#### mutually_exclusive_ranges ([source](macros/generic_tests/mutually_exclusive_ranges.sql)) +Asserts that for a given lower_bound_column and upper_bound_column, +the ranges between the lower and upper bounds do not overlap with the ranges of another row. 
**Usage:** @@ -383,6 +388,7 @@ models: Additional `gaps` and `zero_length_range_allowed` examples **Understanding the `gaps` argument:** + Here are a number of examples for each allowed `gaps` argument. * `gaps: not_allowed`: The upper bound of one record must be the lower bound of the next record. @@ -431,7 +437,7 @@ models: -#### sequential_values ([source](macros/schema_tests/sequential_values.sql)) +#### sequential_values ([source](macros/generic_tests/sequential_values.sql)) This test confirms that a column contains sequential values. It can be used for both numeric values, and datetime values, as follows: ```yml @@ -459,8 +465,8 @@ seeds: * `interval` (default=1): The gap between two sequential values * `datepart` (default=None): Used when the gaps are a unit of time. If omitted, the test will check for a numeric gap. -#### unique_combination_of_columns ([source](macros/schema_tests/unique_combination_of_columns.sql)) -This test confirms that the combination of columns is unique. For example, the +#### unique_combination_of_columns ([source](macros/generic_tests/unique_combination_of_columns.sql)) +Asserts that the combination of columns is unique. For example, the combination of month and product is unique, however neither column is unique in isolation. @@ -495,8 +501,8 @@ An optional `quote_columns` argument (`default=false`) can also be used if a col ``` -#### accepted_range ([source](macros/schema_tests/accepted_range.sql)) -This test checks that a column's values fall inside an expected range. Any combination of `min_value` and `max_value` is allowed, and the range can be inclusive or exclusive. Provide a `where` argument to filter to specific records only. +#### accepted_range ([source](macros/generic_tests/accepted_range.sql)) +Asserts that a column's values fall inside an expected range. Any combination of `min_value` and `max_value` is allowed, and the range can be inclusive or exclusive. Provide a `where` argument to filter to specific records only. 
In addition to comparisons to a scalar value, you can also compare to another column's values. Any data type that supports the `>` or `<` operators can be compared, so you could also run tests like checking that all order dates are in the past. diff --git a/integration_tests/README.md b/integration_tests/README.md index 243af411..4f9f0131 100644 --- a/integration_tests/README.md +++ b/integration_tests/README.md @@ -26,14 +26,14 @@ Where possible, targets are being run in docker containers (this works for Postg ### Creating a new integration test -This directory contains an example dbt project which tests the macros in the `dbt-utils` package. An integration test typically involves making 1) a new seed file 2) a new model file 3) a schema test. +This directory contains an example dbt project which tests the macros in the `dbt-utils` package. An integration test typically involves making 1) a new seed file 2) a new model file 3) a generic test to assert anticipated behaviour. For an example integration tests, check out the tests for the `get_url_parameter` macro: 1. [Macro definition](https://github.com/fishtown-analytics/dbt-utils/blob/master/macros/web/get_url_parameter.sql) 2. [Seed file with fake data](https://github.com/fishtown-analytics/dbt-utils/blob/master/integration_tests/data/web/data_urls.csv) 3. [Model to test the macro](https://github.com/fishtown-analytics/dbt-utils/blob/master/integration_tests/models/web/test_urls.sql) -4. [A schema test to assert the macro works as expected](https://github.com/fishtown-analytics/dbt-utils/blob/master/integration_tests/models/web/schema.yml#L2) +4. 
[A generic test to assert the macro works as expected](https://github.com/fishtown-analytics/dbt-utils/blob/master/integration_tests/models/web/schema.yml#L2) Once you've added all of these files, you should be able to run: diff --git a/integration_tests/models/datetime/test_date_spine.sql b/integration_tests/models/datetime/test_date_spine.sql index 93cd07f1..fa4ae52b 100644 --- a/integration_tests/models/datetime/test_date_spine.sql +++ b/integration_tests/models/datetime/test_date_spine.sql @@ -1,6 +1,6 @@ -- snowflake doesn't like this as a view because the `generate_series` --- call creates a CTE called `unioned`, as does the `equality` schema test. +-- call creates a CTE called `unioned`, as does the `equality` generic test. -- Ideally, Snowflake would be smart enough to know that these CTE names are -- different, as they live in different relations. TODO: use a less common cte name diff --git a/integration_tests/models/schema_tests/schema.yml b/integration_tests/models/generic_tests/schema.yml similarity index 100% rename from integration_tests/models/schema_tests/schema.yml rename to integration_tests/models/generic_tests/schema.yml diff --git a/integration_tests/models/schema_tests/test_equal_column_subset.sql b/integration_tests/models/generic_tests/test_equal_column_subset.sql similarity index 100% rename from integration_tests/models/schema_tests/test_equal_column_subset.sql rename to integration_tests/models/generic_tests/test_equal_column_subset.sql diff --git a/integration_tests/models/schema_tests/test_equal_rowcount.sql b/integration_tests/models/generic_tests/test_equal_rowcount.sql similarity index 100% rename from integration_tests/models/schema_tests/test_equal_rowcount.sql rename to integration_tests/models/generic_tests/test_equal_rowcount.sql diff --git a/integration_tests/models/schema_tests/test_fewer_rows_than.sql b/integration_tests/models/generic_tests/test_fewer_rows_than.sql similarity index 100% rename from 
integration_tests/models/schema_tests/test_fewer_rows_than.sql rename to integration_tests/models/generic_tests/test_fewer_rows_than.sql diff --git a/integration_tests/models/schema_tests/test_recency.sql b/integration_tests/models/generic_tests/test_recency.sql similarity index 100% rename from integration_tests/models/schema_tests/test_recency.sql rename to integration_tests/models/generic_tests/test_recency.sql diff --git a/integration_tests/models/sql/test_generate_series.sql b/integration_tests/models/sql/test_generate_series.sql index a943cf6c..11370b7b 100644 --- a/integration_tests/models/sql/test_generate_series.sql +++ b/integration_tests/models/sql/test_generate_series.sql @@ -1,6 +1,6 @@ -- snowflake doesn't like this as a view because the `generate_series` --- call creates a CTE called `unioned`, as does the `equality` schema test. +-- call creates a CTE called `unioned`, as does the `equality` generic test. -- Ideally, Snowflake would be smart enough to know that these CTE names are -- different, as they live in different relations. 
TODO: use a less common cte name diff --git a/macros/schema_tests/accepted_range.sql b/macros/generic_tests/accepted_range.sql similarity index 100% rename from macros/schema_tests/accepted_range.sql rename to macros/generic_tests/accepted_range.sql diff --git a/macros/schema_tests/at_least_one.sql b/macros/generic_tests/at_least_one.sql similarity index 100% rename from macros/schema_tests/at_least_one.sql rename to macros/generic_tests/at_least_one.sql diff --git a/macros/schema_tests/cardinality_equality.sql b/macros/generic_tests/cardinality_equality.sql similarity index 100% rename from macros/schema_tests/cardinality_equality.sql rename to macros/generic_tests/cardinality_equality.sql diff --git a/macros/schema_tests/equal_rowcount.sql b/macros/generic_tests/equal_rowcount.sql similarity index 100% rename from macros/schema_tests/equal_rowcount.sql rename to macros/generic_tests/equal_rowcount.sql diff --git a/macros/schema_tests/equality.sql b/macros/generic_tests/equality.sql similarity index 100% rename from macros/schema_tests/equality.sql rename to macros/generic_tests/equality.sql diff --git a/macros/schema_tests/expression_is_true.sql b/macros/generic_tests/expression_is_true.sql similarity index 100% rename from macros/schema_tests/expression_is_true.sql rename to macros/generic_tests/expression_is_true.sql diff --git a/macros/schema_tests/fewer_rows_than.sql b/macros/generic_tests/fewer_rows_than.sql similarity index 100% rename from macros/schema_tests/fewer_rows_than.sql rename to macros/generic_tests/fewer_rows_than.sql diff --git a/macros/schema_tests/mutually_exclusive_ranges.sql b/macros/generic_tests/mutually_exclusive_ranges.sql similarity index 100% rename from macros/schema_tests/mutually_exclusive_ranges.sql rename to macros/generic_tests/mutually_exclusive_ranges.sql diff --git a/macros/schema_tests/not_accepted_values.sql b/macros/generic_tests/not_accepted_values.sql similarity index 100% rename from 
macros/schema_tests/not_accepted_values.sql rename to macros/generic_tests/not_accepted_values.sql diff --git a/macros/schema_tests/not_constant.sql b/macros/generic_tests/not_constant.sql similarity index 100% rename from macros/schema_tests/not_constant.sql rename to macros/generic_tests/not_constant.sql diff --git a/macros/schema_tests/not_null_proportion.sql b/macros/generic_tests/not_null_proportion.sql similarity index 100% rename from macros/schema_tests/not_null_proportion.sql rename to macros/generic_tests/not_null_proportion.sql diff --git a/macros/schema_tests/recency.sql b/macros/generic_tests/recency.sql similarity index 100% rename from macros/schema_tests/recency.sql rename to macros/generic_tests/recency.sql diff --git a/macros/schema_tests/relationships_where.sql b/macros/generic_tests/relationships_where.sql similarity index 100% rename from macros/schema_tests/relationships_where.sql rename to macros/generic_tests/relationships_where.sql diff --git a/macros/schema_tests/sequential_values.sql b/macros/generic_tests/sequential_values.sql similarity index 100% rename from macros/schema_tests/sequential_values.sql rename to macros/generic_tests/sequential_values.sql diff --git a/macros/schema_tests/test_not_null_where.sql b/macros/generic_tests/test_not_null_where.sql similarity index 100% rename from macros/schema_tests/test_not_null_where.sql rename to macros/generic_tests/test_not_null_where.sql diff --git a/macros/schema_tests/test_unique_where.sql b/macros/generic_tests/test_unique_where.sql similarity index 100% rename from macros/schema_tests/test_unique_where.sql rename to macros/generic_tests/test_unique_where.sql diff --git a/macros/schema_tests/unique_combination_of_columns.sql b/macros/generic_tests/unique_combination_of_columns.sql similarity index 100% rename from macros/schema_tests/unique_combination_of_columns.sql rename to macros/generic_tests/unique_combination_of_columns.sql