Add order_by argument to get_column_values (#349)
Co-authored-by: Claus Herther <[email protected]>
Claire Carroll and clausherther authored Mar 31, 2021
1 parent b3a7b0e commit c304881
Showing 4 changed files with 77 additions and 28 deletions.
40 changes: 38 additions & 2 deletions CHANGELOG.md
@@ -1,4 +1,40 @@
# dbt-utils v0.7.0 (unreleased)

## :rotating_light: Breaking changes


### get_column_values
The order of (optional) arguments has changed in the `get_column_values` macro:
Before:
```jinja
{% macro get_column_values(table, column, max_records=none, default=none) -%}
...
{% endmacro %}
```

After:
```jinja
{% macro get_column_values(table, column, order_by='count(*) desc', max_records=none, default=none) -%}
...
{% endmacro %}
```
If you were relying on position to match up your optional arguments, this may be a breaking change. In general, we recommend that you explicitly declare any optional arguments (if not all of your arguments!).
```
-- before: the `50` will now be passed through as the `order_by` argument
{% set payment_methods = dbt_utils.get_column_values(
ref('stg_payments'),
'payment_method',
50
) %}
-- after
{% set payment_methods = dbt_utils.get_column_values(
ref('stg_payments'),
'payment_method',
max_records=50
) %}
```

* Added optional `where` clause in `unique_combination_of_columns` test macro [#295](https://github.com/fishtown-analytics/dbt-utils/pull/295) [findinpath](https://github.com/findinpath)

## Features
@@ -7,8 +43,8 @@
* Allow individual columns in star macro to be aliased (code originally in [#230](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@elliottohara](https://github.com/elliottohara), merged via [#245])
* Allow star macro to be case insensitive, and improve docs (code originally in [#281](https://github.com/fishtown-analytics/dbt-utils/pull/230/) via [@mdimercurio](https://github.com/mdimercurio), merged via [#348](https://github.com/fishtown-analytics/dbt-utils/pull/348/))
* Add new schema test, `not_accepted_values` ([#284](https://github.com/fishtown-analytics/dbt-utils/pull/284) [@JavierMonton](https://github.com/JavierMonton))
* Add new schema test, `fewer_rows_than` (code originally in [#221](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@dmarts](https://github.com/dmarts), merged via [#343])

* Add new schema test, `fewer_rows_than` (code originally in [#221](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@dmarts](https://github.com/dmarts), merged via [#343](https://github.com/fishtown-analytics/dbt-utils/pull/343/))
* Add new argument, `order_by`, to `get_column_values` (code originally in [#289](https://github.com/fishtown-analytics/dbt-utils/pull/289/) from [@clausherther](https://github.com/clausherther), merged via [#349](https://github.com/fishtown-analytics/dbt-utils/pull/349/))

## Fixes
* Handle booleans gracefully in the unpivot macro ([#305](https://github.com/fishtown-analytics/dbt-utils/pull/305) [@avishalom](https://github.com/avishalom))
39 changes: 32 additions & 7 deletions README.md
@@ -569,24 +569,49 @@ group by 1
```

#### get_column_values ([source](macros/sql/get_column_values.sql))
This macro returns the unique values for a column in a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation).
It takes an optional `default` argument for compiling when the relation does not already exist.
This macro returns the unique values for a column in a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation) as an array.

Arguments:
- `table` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `column` (required): The name of the column you wish to find the column values of
- `order_by` (optional, default=`'count(*) desc'`): How the results should be ordered. The default is to order by `count(*) desc`, i.e. decreasing frequency. Setting this to `'my_column'` will sort alphabetically, while `'min(created_at)'` will sort by when the value was first observed.
- `max_records` (optional, default=`none`): The maximum number of column values you want to return
- `default` (optional, default=`[]`): The results this macro should return if the relation has not yet been created (and therefore has no column values).
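
As a rough sketch of how these arguments fit together: going by the query template in `macros/sql/get_column_values.sql`, a call with `column='payment_method'`, `order_by='min(created_at)'` and `max_records=50` would render to something along these lines. The relation name `analytics.stg_payments` and the `created_at` column are assumptions made for illustration, and the exact rendered SQL may differ by adapter.

```sql
-- Illustrative rendering only; `analytics.stg_payments` and `created_at` are assumed names
select
    payment_method as value   -- {{ column }}
from analytics.stg_payments   -- {{ target_relation }}
group by 1
order by min(created_at)      -- {{ order_by }}; defaults to count(*) desc
limit 50                      -- emitted only when max_records is not none
```

Because the query groups by the column, `order_by` should be either the column itself or an aggregate expression.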

Usage:
```
-- Returns a list of the top 50 states in the `users` table
{% set states = dbt_utils.get_column_values(table=ref('users'), column='state', max_records=50, default=[]) %}
```sql
-- Returns a list of the payment_methods in the stg_payments model
{% set payment_methods = dbt_utils.get_column_values(table=ref('stg_payments'), column='payment_method') %}
{% for state in states %}
{% for payment_method in payment_methods %}
...
{% endfor %}
...
```

```sql
-- Returns the list sorted alphabetically
{% set payment_methods = dbt_utils.get_column_values(
table=ref('stg_payments'),
column='payment_method',
order_by='payment_method'
) %}
```

#### get_relations_by_pattern ([source](macros/sql/get_relations_by_pattern.sql))
```sql
-- Returns the list sorted by most recently observed
{% set payment_methods = dbt_utils.get_column_values(
table=ref('stg_payments'),
column='payment_method',
order_by='max(created_at) desc',
max_records=50,
default=['bank_transfer', 'coupon', 'credit_card']
) %}
...
```

#### get_relations_by_prefix
Returns a list of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation)
that match a given schema- or table-name pattern.

2 changes: 1 addition & 1 deletion integration_tests/models/sql/test_get_column_values.sql
@@ -1,5 +1,5 @@

{% set columns = dbt_utils.get_column_values(ref('data_get_column_values'), 'field', default = []) %}
{% set columns = dbt_utils.get_column_values(ref('data_get_column_values'), 'field', default=[], order_by="field") %}


{% if target.type == 'snowflake' %}
24 changes: 6 additions & 18 deletions macros/sql/get_column_values.sql
@@ -1,26 +1,13 @@
{#
This macro fetches the unique values for `column` in the table `table`

Arguments:
table: A model `ref`, or a schema.table string for the table to query (Required)
column: The column to query for unique values
max_records: If provided, the maximum number of unique records to return (default: none)

Returns:
A list of distinct values for the specified columns
#}

{% macro get_column_values(table, column, max_records=none, default=none) -%}
{{ return(adapter.dispatch('get_column_values', packages = dbt_utils._get_utils_namespaces())(table, column, max_records, default)) }}
{% macro get_column_values(table, column, order_by='count(*) desc', max_records=none, default=none) -%}
{{ return(adapter.dispatch('get_column_values', packages = dbt_utils._get_utils_namespaces())(table, column, order_by, max_records, default)) }}
{% endmacro %}

{% macro default__get_column_values(table, column, max_records=none, default=none) -%}
{% macro default__get_column_values(table, column, order_by='count(*) desc', max_records=none, default=none) -%}

{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
{%- if not execute -%}
{{ return('') }}
{% endif %}
{#-- #}

{%- set target_relation = adapter.get_relation(database=table.database,
schema=table.schema,
@@ -40,12 +27,13 @@ Returns:

{%- else -%}


select
{{ column }} as value

from {{ target_relation }}
group by 1
order by count(*) desc
order by {{ order_by }}

{% if max_records is not none %}
limit {{ max_records }}
