Adding Pytest Tests #65
Conversation
I have reviewed five files! I think that is a representative set of the whole, but if there are things you think I should have looked at that got radio silence then let me know.
It's very possible that some of my complaints here come from being unfamiliar with pytest, but the things that are in here feel very fragile to me.
dev-requirements.txt
dbt-core==1.2.0rc1
dbt-redshift==1.2.0rc1
dbt-snowflake==1.2.0rc1
dbt-bigquery==1.2.0rc1
How will these stay up to date over time?
We'll manually update these based on the version that we want to test. Right now it's the release candidate, but we'll update it to v1.2 once it's no longer an RC.
@@ -1,4 +1,4 @@
 select
   *
-  ,round(order_total - (order_total/2)) as discount_total
+  ,1 as discount_total
❓ this is weird, shouldn't it be a percentage or something? I just spent a bunch of time digging through the files trying to work out why the average discount was always 1 in your expectation seed.
The thing that worries me here is that if your source data isn't that varied then you don't actually get the benefit of testing the aggregations - what if something has gone horribly wrong and you're actually calculating max/min or something?
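To make that concern concrete, a quick illustration (plain Python, made-up numbers, not from the project): with a constant column every aggregation collapses to the same value, so an equality test against the expected seed can't tell them apart.

# Made-up data: a constant column vs. a varied one.
uniform = [1, 1, 1]
varied = [1, 2, 6]

for name, data in [("uniform", uniform), ("varied", varied)]:
    print(name, "avg:", sum(data) / len(data), "max:", max(data), "min:", min(data))

# uniform -> avg 1.0, max 1, min 1: any aggregation matches an expected value of 1
# varied  -> avg 3.0, max 6, min 1: only the correct aggregation matches the expected seed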
Very good point - I was trying to keep it as simple as possible, but if it doesn't represent the actual calculation then that's not helpful.
Oh actually this is still a holdover from integration testing! These will mainly just be used for local development as we move the CI functionality testing over to pytest. So the old version with round is retained in the pytest models 👯
2022-01-01,FALSE,1.00000000000000000000
2022-02-01,FALSE,1.00000000000000000000
2022-02-01,TRUE,1.00000000000000000000
This isn't testing much of anything (as mentioned above)
This is fixed in the pytest framework - everything within the integration tests folder is a holdover from when we were doing it that way and I'm hesitant to delete anything until we've got it all migrated over.
base_average_metric_yml = """
version: 2
models:
  - name: base_average_metric
    tests:
      - dbt_utils.equality:
          compare_model: ref('base_average_metric__expected')
metrics:
  - name: base_average_metric
    model: ref('fact_orders')
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    type: average
    sql: discount_total
    dimensions:
      - had_discount
      - order_country
"""

# seeds/base_average_metric__expected.csv
base_average_metric__expected_csv = """
date_month,base_average_metric
2022-01-01,1.00000000000000000000
2022-02-01,1.3333333333333333
""".lstrip()
Okkkkkk so am I understanding this correctly:
These lines of hardcoded logic are the same as the ones that are up above as proper .sql/.yml/.csv files? So we're duplicating everything? How do they stay in sync? And why do we need the former if it's all just turning into string constants down here?
idk if this is pythonic or not, but I would strongly prefer something like
# https://stackoverflow.com/a/49564464/14007029
from pathlib import Path
base_average_metric_sql = Path('../models/base_average_metric.sql').read_text()
base_average_metric_yml = Path('../models/base_average_metric.yml').read_text()
Such that we don't have to redefine everything from scratch in two places 🤢
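A variant of that sketch which doesn't depend on the working directory pytest happens to be invoked from (the relative paths are the same assumptions as above):

from pathlib import Path

# Resolve the fixture paths relative to this test file rather than the
# process's current working directory, so the test loads the same files
# no matter where pytest is launched from.
HERE = Path(__file__).parent
base_average_metric_sql = (HERE / "../models/base_average_metric.sql").read_text()
base_average_metric_yml = (HERE / "../models/base_average_metric.yml").read_text()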
Not quite. There are static datasets/definitions contained in fixtures.py and those are then imported. The reason that I keep some components outside of it (like the definition of the metric) is so we can understand what metric is being tested while just looking at the single file. If we're trying to be more pythonic then I should move that definition into the fixtures file.
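For context, a rough sketch of the pattern being described, using dbt's pytest framework (the fixture module and its contents are assumptions for illustration, not the project's actual files): shared seeds and models are imported from fixtures.py, while the metric definition under test stays in the test file.

import pytest
from dbt.tests.util import run_dbt

# Shared, static inputs live in fixtures.py and are imported here
# (these names are assumed for illustration).
from fixtures import fact_orders_source_csv, fact_orders_sql

# The metric definition stays in the test file so it's obvious what is under test.
base_average_metric_yml = """
version: 2
metrics:
  - name: base_average_metric
    model: ref('fact_orders')
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    type: average
    sql: discount_total
"""

class TestBaseAverageMetric:
    @pytest.fixture(scope="class")
    def seeds(self):
        return {"fact_orders_source.csv": fact_orders_source_csv}

    @pytest.fixture(scope="class")
    def models(self):
        return {
            "fact_orders.sql": fact_orders_sql,
            "base_average_metric.yml": base_average_metric_yml,
        }

    def test_metric_builds(self, project):
        run_dbt(["seed"])
        run_dbt(["run"])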
OK! I think I was caught up on thinking that the stuff in the integration tests project was this stuff's twin. Carry on!
# test tests
results = run_dbt(["test"])  # expect passing test
assert len(results) == 1
What is the results object here? Is it an array of passing tests? I would sorta prefer that it returned all results, and a pass/fail, and then we could do whatever the python version of results.all(result => result.statuscode == 'PASS') is.
Because otherwise anytime we add new tests, we have to go through and update the expected number of passes (or, worse, we add a test which fails, we forget to update the expected count, and it sits there failing forever).
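Something like this would capture that intent without pinning a count (a sketch; it assumes each element of the run_dbt result carries a status attribute, as in the quoted test code below):

results = run_dbt(["test"])

# Every test that ran must have passed; adding new tests doesn't require
# updating a hard-coded expected count.
assert len(results) > 0
assert all(result.status == "pass" for result in results)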
@dbeatty10 would have more context around this area. I suspect it returns the results of each command and then we confirm that it matches what we expect. So if we added a new test we'd want to assert that only two tests were run.
The example containing hard-coded numbers comes from here. dbt_metrics doesn't need to do it that way if it doesn't want to. The important part is to verify that it works as expected for all the possible edge cases, and that it fails in the ways expected with bad input.
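For the "fails in the ways expected" half, dbt's run_dbt test helper accepts an expect_pass argument, so a sketch along these lines is possible (assuming the project has been set up with deliberately bad input, e.g. a model whose SQL references a missing column):

from dbt.tests.util import run_dbt

# expect_pass=False tells the helper that this invocation is supposed to fail,
# so the failure itself doesn't abort the test.
results = run_dbt(["run"], expect_pass=False)
assert any(result.status == "error" for result in results)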
result_statuses = sorted(r.status for r in results)
assert result_statuses == ["pass"]
OK, maybe that's what this line is? But then I don't know what the above stuff is, and I still feel very uncomfortable about hardcoded numbers of things in test files.
(Deleted my original comment here and moved it above)
Description
This PR adds pytest tests. It does NOT implement them as part of CI. That will take place in a following PR so that people don't have to review one massive PR and it's instead 1 massive PR and one tiny PR. Wait....
Test Types Added: