-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Publish PR 9029: clickhouse normalization #9072
Conversation
/test connector=bases/base-normalization
|
/test connector=bases/base-normalization
|
/test connector=bases/base-normalization
|
/test connector=bases/base-normalization
|
/test connector=bases/base-normalization
|
@@ -699,6 +699,7 @@ def generate_scd_type_2_model(self, from_table: str, column_names: Dict[str, Tup | |||
{{ sql_table_comment }} | |||
), | |||
{{ '{% endif %}' }} | |||
{{ '{%- if var("destination") == "clickhouse" %}' }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this if
as jinja (runtime) ? instead of doing a if in python and change the generated SQL code when normalization runs (compile)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the code block inside the if have some jinja variables, and I don't think there is an option to deep jinja variable attribution. I thought this would be more easy to read.
file.sql
{{ my_custom_code }}
run.py
s = "{{ another_jinja }}"
a = "select * from table"
render(my_custom_code=s, another_jinja=a)
This will break
What do you think @ChristopheDuong ? I can move the logic to a python if clause too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's move it to python and avoid adding a dbt environment variable for this in the dbt_project.yml
Here is a PR to split the jinja templating in multiple rendering:
#9278
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather if we don't have a dbt env var but refactor the python/jinja to handle the code divergence, I created a PR on top of yours to solve this
/test connector=bases/base-normalization
|
I merged Chris code into my branch, tests worked locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just one question, everything else makes sense!
with | ||
|
||
input_data as ( | ||
select * | ||
from _airbyte_test_normalization.dedup_cdc_excluded_ab3 | ||
from _airbyte_test_normalization.dedup_cdc_excluded_stg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for my education: what do ab3
and stg
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you use incremental + dedup
sync mode dbt will create the stg tables for you, but other sync methods the extraction of data, data type conversion and hashing are made using sub-queries/tables with suffix ab1
, ab2
and ab3
.
@@ -65,7 +66,7 @@ dedup_data as ( | |||
-- additionally, we generate a unique key for the scd table | |||
row_number() over ( | |||
partition by _airbyte_unique_key, _airbyte_start_at, _airbyte_emitted_at, accurateCastOrNull(_ab_cdc_deleted_at, 'String'), accurateCastOrNull(_ab_cdc_updated_at, 'String') | |||
order by _airbyte_ab_id | |||
order by _airbyte_active_row desc, _airbyte_ab_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it expected that we're now sorting on active_row
in additional to ab_id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this prevents cases having multiple _airbyte_ab_id
where there is some rows with 0 (not active) and making sure to select the active row.
/publish connector=bases/base-normalization
|
Hi @jrhizor , thank you for the PR merge. I just upgraded from 0.35.3-alpha to 0.35.4-alpha on my laptop (git clone this repo and run Let me know whether you can reproduce this or I did anything wrong. Log:
|
@jovezhong you should be thanking @ChristopheDuong for working on normalization :) |
it looks like it failed to produce the profiles.yml because it didn't find the "password" field in your configuration, did you set one up when configuring clickhouse in airbyte UI?
|
This is not my PR though 😛 |
Oh, good point, @ChristopheDuong Yes, I am connecting to my local clickhouse with Currently in destination-clickhouse's spec.json while normalizing the data via dbt, it'll be great to use empty string if the password is not set. Again, really appreciate you guys' prompt replies. |
What
This PR implements the code submitted in #9029 by community member.
Also corrects the normalization code to run Clickhouse destination.
I added two if clauses when creating the scd2 tables.
At the moment I added global vars in dbt_project to help me doing the logic.
I'll try creating a variable inline when creating the jinja2 template.
How
Describe the solution
Recommended reading order
x.java
y.python
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/SUMMARY.md
docs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing./publish
command described hereUpdating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing./publish
command described hereConnector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changes