Publish PR 9029: clickhouse normalization #9072

Merged
merged 11 commits into from
Jan 5, 2022

Member

@marcosmarxm marcosmarxm commented Dec 23, 2021

What

This PR implements the code submitted in #9029 by a community member.
It also fixes the normalization code so that it runs for the ClickHouse destination.

I added two if clauses that apply when creating the SCD2 tables.
For now I added global vars in dbt_project to drive that logic.
I'll try to create the variable inline when building the Jinja2 template instead.

@marcosmarxm marcosmarxm temporarily deployed to more-secrets December 23, 2021 02:23 Inactive
@marcosmarxm
Member Author

marcosmarxm commented Dec 23, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1613888259
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1613888259
🐛 https://gradle.com/s/4a2gdwudna3xi

@jrhizor jrhizor temporarily deployed to more-secrets December 23, 2021 02:33 Inactive
@github-actions github-actions bot added area/platform issues related to the platform area/worker Related to worker labels Dec 23, 2021
@marcosmarxm
Member Author

marcosmarxm commented Dec 23, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1616537057
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1616537057
🐛 https://gradle.com/s/he6a2u6xpanqe

@marcosmarxm marcosmarxm temporarily deployed to more-secrets December 23, 2021 17:44 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 23, 2021 17:45 Inactive
@marcosmarxm marcosmarxm temporarily deployed to more-secrets December 23, 2021 18:59 Inactive
@marcosmarxm
Member Author

marcosmarxm commented Dec 23, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1616745878
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1616745878
🐛 https://gradle.com/s/rmqmwcaviuq6q

@jrhizor jrhizor temporarily deployed to more-secrets December 23, 2021 19:02 Inactive
@marcosmarxm
Member Author

marcosmarxm commented Dec 27, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1627800284
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1627800284
🐛

@marcosmarxm marcosmarxm temporarily deployed to more-secrets December 27, 2021 17:44 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 27, 2021 17:45 Inactive
@marcosmarxm
Member Author

marcosmarxm commented Dec 29, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1632342020
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1632342020
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/base-python/.venv/lib/python3.8/site-packages/coverage/report.py:87: CoverageWarning: Couldn't parse '/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/base-python/rep-32>': No source for code: '/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/base-python/rep-32>'. (couldnt-parse)
	   coverage._warn(msg, slug="couldnt-parse")
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            89     64    28%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/cdk/utils/event_timing.py         47      3    94%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     15    55%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        713    397    44%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    13      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     124      6    95%
	 normalization/transform_catalog/reserved_keywords.py                 13      0   100%
	 normalization/transform_catalog/stream_processor.py                 496    315    36%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         146     32    78%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1201    503    58%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      6    92%
	 source_acceptance_test/conftest.py                     109    109     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              242     96    60%
	 source_acceptance_test/tests/test_full_refresh.py       38      0   100%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     17    69%
	 source_acceptance_test/utils/compare.py                 62     23    63%
	 source_acceptance_test/utils/connector_runner.py       110     48    56%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  979    404    59%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    13      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     12    92%
	 normalization/transform_catalog/destination_name_transformer.py     124      4    97%
	 normalization/transform_catalog/reserved_keywords.py                 13      0   100%
	 normalization/transform_catalog/stream_processor.py                 496     37    93%
	 normalization/transform_catalog/table_name_registry.py              174     51    71%
	 normalization/transform_catalog/transform.py                         45     30    33%
	 normalization/transform_catalog/utils.py                             33      0   100%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         146     45    69%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1201    185    85%

@marcosmarxm marcosmarxm temporarily deployed to more-secrets December 29, 2021 00:46 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 29, 2021 00:47 Inactive
@@ -699,6 +699,7 @@ def generate_scd_type_2_model(self, from_table: str, column_names: Dict[str, Tup
{{ sql_table_comment }}
),
{{ '{% endif %}' }}
+{{ '{%- if var("destination") == "clickhouse" %}' }}
Contributor

Why do we need this if in Jinja (evaluated at runtime), instead of an if in Python that changes the generated SQL when normalization runs (at compile time)?

Member Author

@marcosmarxm marcosmarxm Jan 3, 2022

Because the code block inside the if contains some Jinja variables, and I don't think Jinja can render variables nested inside another variable's value. I also thought this would be easier to read.

file.sql:
{{ my_custom_code }}

run.py:
s = "{{ another_jinja }}"
a = "select * from table"
render(my_custom_code=s, another_jinja=a)

This will break, because the inner {{ another_jinja }} is not rendered.
What do you think, @ChristopheDuong? I can move the logic to a Python if clause too.
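
The failure mode described in this comment can be reproduced without dbt. The sketch below uses a small regex-based `render` helper as a stand-in for a single Jinja rendering pass (the helper is hypothetical, not Jinja's API). Like a Jinja render, it substitutes each placeholder once and does not scan the substituted values again.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, **variables: str) -> str:
    """Single-pass stand-in for a Jinja render (hypothetical helper).

    Each {{ name }} is replaced once; substituted values are not re-scanned.
    Unknown names are left as-is, which is a simplification.
    """
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), template)


file_sql = "{{ my_custom_code }}"
s = "{{ another_jinja }}"
a = "select * from table"

# One pass: the inner placeholder survives as literal text.
one_pass = render(file_sql, my_custom_code=s, another_jinja=a)

# Two passes: rendering the output again resolves the inner placeholder.
two_pass = render(one_pass, another_jinja=a)
```

After one pass the result is still the literal text `{{ another_jinja }}`; only a second pass turns it into the SQL.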

Contributor

Yes, let's move it to Python and avoid adding a dbt environment variable for this in dbt_project.yml.

Here is a PR that splits the Jinja templating into multiple rendering passes:
#9278
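
A minimal sketch of the approach suggested in this comment: branch on the destination in Python while the model is generated, so the emitted SQL needs no dbt variable. `DestinationType` is a local stand-in for the enum in normalization/destination_type.py, and the function name and SQL fragments are hypothetical placeholders, not the code from this PR or from #9278.

```python
from enum import Enum


class DestinationType(Enum):
    # Local stand-in; only the members needed for the illustration.
    CLICKHOUSE = "clickhouse"
    POSTGRES = "postgres"


def build_scd_model(destination: DestinationType) -> str:
    """Assemble the model text at generation time, instead of emitting a
    Jinja `if var("destination") == ...` for dbt to evaluate later."""
    parts = ["select * from {{ from_table }}"]
    if destination is DestinationType.CLICKHOUSE:
        # Placeholder for whatever ClickHouse needs differently.
        parts.append("-- clickhouse-specific clause (placeholder)")
    return "\n".join(parts)


clickhouse_sql = build_scd_model(DestinationType.CLICKHOUSE)
postgres_sql = build_scd_model(DestinationType.POSTGRES)
```

With this shape the generated file for each destination contains only the SQL that destination needs, and nothing has to be declared in dbt_project.yml.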

Contributor

@ChristopheDuong ChristopheDuong left a comment

I would rather we not have a dbt env var and instead refactor the Python/Jinja to handle the code divergence. I created a PR on top of yours to solve this.

@edgao edgao temporarily deployed to more-secrets January 4, 2022 18:39 Inactive
@marcosmarxm marcosmarxm temporarily deployed to more-secrets January 4, 2022 20:11 Inactive
@marcosmarxm
Member Author

marcosmarxm commented Jan 4, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1655883054
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1655883054
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            89     64    28%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/cdk/utils/event_timing.py         47      3    94%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     15    55%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        713    397    44%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    13      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     124      6    95%
	 normalization/transform_catalog/reserved_keywords.py                 13      0   100%
	 normalization/transform_catalog/stream_processor.py                 502    321    36%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         146     32    78%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1207    509    58%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      6    92%
	 source_acceptance_test/conftest.py                     109    109     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              242     96    60%
	 source_acceptance_test/tests/test_full_refresh.py       38      0   100%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     17    69%
	 source_acceptance_test/utils/compare.py                 62     23    63%
	 source_acceptance_test/utils/connector_runner.py       110     48    56%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  979    404    59%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    13      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     12    92%
	 normalization/transform_catalog/destination_name_transformer.py     124      4    97%
	 normalization/transform_catalog/reserved_keywords.py                 13      0   100%
	 normalization/transform_catalog/stream_processor.py                 502     37    93%
	 normalization/transform_catalog/table_name_registry.py              174     51    71%
	 normalization/transform_catalog/transform.py                         45     30    33%
	 normalization/transform_catalog/utils.py                             33      0   100%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         146     45    69%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1207    185    85%

@marcosmarxm marcosmarxm temporarily deployed to more-secrets January 4, 2022 22:45 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets January 4, 2022 22:47 Inactive
@marcosmarxm
Member Author

I merged Chris's code into my branch; tests passed locally.

Contributor

@edgao edgao left a comment

just one question, everything else makes sense!

with

input_data as (
select *
-from _airbyte_test_normalization.dedup_cdc_excluded_ab3
+from _airbyte_test_normalization.dedup_cdc_excluded_stg
Contributor

for my education: what do ab3 and stg mean?

Member Author

When you use the incremental + dedup sync mode, dbt creates the stg tables for you; for the other sync modes, data extraction, data type conversion, and hashing are done in sub-queries/tables with the suffixes ab1, ab2, and ab3.
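
As a rough illustration of the three steps named in this comment, the sketch below applies them to one raw record in plain Python. The field names, the `_airbyte_hashid` column name, and the choice of SHA-256 are assumptions made for the example; the real steps are SQL models generated by normalization.

```python
import hashlib
import json


def ab1_extract(raw: dict) -> dict:
    """ab1: pull fields out of the raw JSON blob (values still untyped)."""
    data = json.loads(raw["_airbyte_data"])
    return {"id": data.get("id"), "name": data.get("name")}


def ab2_cast(row: dict) -> dict:
    """ab2: convert each field to its declared type."""
    return {"id": int(row["id"]), "name": str(row["name"])}


def ab3_hash(row: dict) -> dict:
    """ab3: add a hash over the typed columns to identify the row."""
    payload = "|".join(str(row[key]) for key in sorted(row))
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return {**row, "_airbyte_hashid": digest}


raw_record = {"_airbyte_data": json.dumps({"id": "7", "name": "ada"})}
final_row = ab3_hash(ab2_cast(ab1_extract(raw_record)))
```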

@@ -65,7 +66,7 @@ dedup_data as (
-- additionally, we generate a unique key for the scd table
row_number() over (
partition by _airbyte_unique_key, _airbyte_start_at, _airbyte_emitted_at, accurateCastOrNull(_ab_cdc_deleted_at, 'String'), accurateCastOrNull(_ab_cdc_updated_at, 'String')
-order by _airbyte_ab_id
+order by _airbyte_active_row desc, _airbyte_ab_id
Contributor

is it expected that we're now sorting on active_row in addition to ab_id?

Member Author

Adding this handles cases with multiple _airbyte_ab_id values where some rows are 0 (not active), and makes sure the active row is selected.
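
The effect of the extra sort key can be sketched in Python by emulating "keep row number 1 per partition". For brevity the example partitions only by `_airbyte_unique_key` (the real query partitions by more columns), and the sample rows are made up.

```python
from itertools import groupby

rows = [
    {"_airbyte_unique_key": "k1", "_airbyte_active_row": 0, "_airbyte_ab_id": "a"},
    {"_airbyte_unique_key": "k1", "_airbyte_active_row": 1, "_airbyte_ab_id": "b"},
    {"_airbyte_unique_key": "k2", "_airbyte_active_row": 1, "_airbyte_ab_id": "c"},
]


def first_per_partition(rows, order_key):
    """Emulate row_number() over (partition by ... order by ...) = 1."""
    partition_key = lambda r: r["_airbyte_unique_key"]
    picked = {}
    for key, group in groupby(sorted(rows, key=partition_key), key=partition_key):
        # The row that sorts first is the one that would get row number 1.
        picked[key] = min(group, key=order_key)
    return picked


# Old ordering: order by _airbyte_ab_id
old = first_per_partition(rows, lambda r: r["_airbyte_ab_id"])

# New ordering: order by _airbyte_active_row desc, _airbyte_ab_id
new = first_per_partition(
    rows, lambda r: (-r["_airbyte_active_row"], r["_airbyte_ab_id"])
)
```

For partition `k1`, the old ordering keeps the inactive row `a`, while the new ordering keeps the active row `b`.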

@marcosmarxm
Member Author

marcosmarxm commented Jan 5, 2022

/publish connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1656148208
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1656148208

@jrhizor jrhizor temporarily deployed to more-secrets January 5, 2022 00:18 Inactive
@marcosmarxm marcosmarxm temporarily deployed to more-secrets January 5, 2022 01:55 Inactive
@marcosmarxm marcosmarxm merged commit de56d47 into master Jan 5, 2022
@marcosmarxm marcosmarxm deleted the marcos/test-pr-9029 branch January 5, 2022 02:28
@jovezhong
Contributor

jovezhong commented Jan 7, 2022

Hi @jrhizor, thank you for merging the PR.

I just upgraded from 0.35.3-alpha to 0.35.4-alpha on my laptop (git clone of this repo, then docker-compose up). The ClickHouse normalization still fails, with FileNotFoundError: [Errno 2] No such file or directory: '/data/12/0/normalize/profiles.yml'. I am simply loading a CSV file as the source and sending it to a local ClickHouse as the destination.

Let me know whether you can reproduce this or whether I did something wrong.

Log:

2022-01-07 15:56:51 INFO i.a.c.i.LineGobbler(voidCall):82 - Pulled airbyte/normalization-clickhouse:0.1.63 from remote.
2022-01-07 15:56:51 INFO i.a.w.p.DockerProcessFactory(create):171 - Preparing command: docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/12/0/normalize --network host --log-driver none airbyte/normalization-clickhouse:0.1.63 run --integration-type clickhouse --config destination_config.json --catalog destination_catalog.json
2022-01-07 15:56:52 normalization > Running: transform-config --config destination_config.json --integration-type clickhouse --out /data/12/0/normalize
2022-01-07 15:56:52 normalization > Namespace(config='destination_config.json', integration_type=<DestinationType.clickhouse: 'clickhouse'>, out='/data/12/0/normalize')
2022-01-07 15:56:52 normalization > Traceback (most recent call last):
2022-01-07 15:56:52 normalization > transform_clickhouse
2022-01-07 15:56:52 normalization > File "/usr/local/bin/transform-config", line 8, in
2022-01-07 15:56:52 normalization > sys.exit(main())
2022-01-07 15:56:52 normalization > File "/usr/local/lib/python3.8/site-packages/normalization/transform_config/transform.py", line 316, in main
2022-01-07 15:56:52 normalization > TransformConfig().run(args)
2022-01-07 15:56:52 normalization > File "/usr/local/lib/python3.8/site-packages/normalization/transform_config/transform.py", line 33, in run
2022-01-07 15:56:52 normalization > transformed_config = self.transform(integration_type, original_config)
2022-01-07 15:56:52 normalization > File "/usr/local/lib/python3.8/site-packages/normalization/transform_config/transform.py", line 62, in transform
2022-01-07 15:56:52 normalization > transformed_integration_config = {
2022-01-07 15:56:52 normalization > File "/usr/local/lib/python3.8/site-packages/normalization/transform_config/transform.py", line 278, in transform_clickhouse
2022-01-07 15:56:52 normalization > "password": config["password"],
2022-01-07 15:56:52 normalization > KeyError: 'password'
2022-01-07 15:56:52 normalization > Running: transform-catalog --integration-type clickhouse --profile-config-dir /data/12/0/normalize --catalog destination_catalog.json --out /data/12/0/normalize/models/generated/ --json-column _airbyte_data
2022-01-07 15:56:52 normalization > Traceback (most recent call last):
2022-01-07 15:56:52 normalization > File "/usr/local/bin/transform-catalog", line 8, in
2022-01-07 15:56:52 normalization > sys.exit(main())
2022-01-07 15:56:52 normalization > File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/transform.py", line 82, in main
2022-01-07 15:56:52 normalization > TransformCatalog().run(args)
2022-01-07 15:56:52 normalization > File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/transform.py", line 34, in run
2022-01-07 15:56:52 normalization > self.parse(args)
2022-01-07 15:56:52 normalization > File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/transform.py", line 45, in parse
2022-01-07 15:56:52 normalization > profiles_yml = read_profiles_yml(parsed_args.profile_config_dir)
2022-01-07 15:56:52 normalization > File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/transform.py", line 66, in read_profiles_yml
2022-01-07 15:56:52 normalization > with open(os.path.join(profile_dir, "profiles.yml"), "r") as file:
2022-01-07 15:56:52 normalization > FileNotFoundError: [Errno 2] No such file or directory: '/data/12/0/normalize/profiles.yml'

@davinchia
Contributor

@jovezhong you should be thanking @ChristopheDuong for working on normalization :)

@ChristopheDuong
Contributor

ChristopheDuong commented Jan 7, 2022

Let me know whether you can reproduce this or I did anything wrong.

Log:

2022-01-07 15:56:52 normalization > File "/usr/local/lib/python3.8/site-packages/normalization/transform_config/transform.py", line 278, in transform_clickhouse
2022-01-07 15:56:52 normalization > "password": config["password"],
2022-01-07 15:56:52 normalization > KeyError: 'password'

It looks like it failed to produce the profiles.yml because it didn't find the "password" field in your configuration. Did you set one up when configuring ClickHouse in the Airbyte UI?

  • The transform_clickhouse method should probably check if the field is populated before accessing it though
  • or the ClickHouse destination should declare the password field as required.
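
A minimal sketch of the first option, assuming the config arrives as a plain dict. The function name `transform_clickhouse_sketch` and the output keys other than `"password"` are hypothetical and are not taken from the actual `transform.py`; only the `config["password"]` lookup appears in the traceback above.

```python
from typing import Any, Dict


def transform_clickhouse_sketch(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for transform_clickhouse, not the real Airbyte code."""
    return {
        # These keys are illustrative assumptions about the generated profile.
        "host": config["host"],
        "port": config["port"],
        "user": config["username"],
        # dict.get avoids the KeyError when the user left the password blank;
        # an empty string is used as the fallback.
        "password": config.get("password", ""),
    }


# A config without a password no longer raises KeyError.
profile = transform_clickhouse_sketch(
    {"host": "localhost", "port": 8123, "username": "default"}
)
```

With a guard like this, the config transformation would complete instead of aborting, so the later step that reads profiles.yml would have a file to open.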

@ChristopheDuong
Contributor

@jovezhong you should be thanking @ChristopheDuong for working on normalization :)

This is not my PR though 😛

@jovezhong
Contributor

Let me know whether you can reproduce this or I did anything wrong.
Log:

2022-01-07 15:56:52 normalization > File "/usr/local/lib/python3.8/site-packages/normalization/transform_config/transform.py", line 278, in transform_clickhouse
2022-01-07 15:56:52 normalization > "password": config["password"],
2022-01-07 15:56:52 normalization > KeyError: 'password'

it looks like it failed to produce the profiles.yml because it didn't find the "password" field in your configuration, did you set one up when configuring clickhouse in airbyte UI?

  • The transform_clickhouse method should probably check if the field is populated before accessing it though
  • or the destination clickhouse should declare the password field as required..

Oh, good point, @ChristopheDuong. Yes, I am connecting to my local ClickHouse with the default database, the default username, and an empty password. In production environments, access will be password protected, so it should be okay.

Currently, destination-clickhouse's spec.json has "required": ["host", "port", "database", "username"]. That makes sense, although we could also set default values: host=localhost, port=8123, database=default, username=default.
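
As a rough illustration of what those defaults would mean for a user's config, here is a small sketch. The names `CLICKHOUSE_DEFAULTS` and `with_defaults` are hypothetical; in practice the defaults would be declared in spec.json rather than applied in Python.

```python
from typing import Any, Dict

# Hypothetical defaults, copied from the values suggested above.
CLICKHOUSE_DEFAULTS: Dict[str, Any] = {
    "host": "localhost",
    "port": 8123,
    "database": "default",
    "username": "default",
}


def with_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config where missing fields take the default values."""
    # Later entries win, so anything the user supplied overrides the defaults.
    return {**CLICKHOUSE_DEFAULTS, **config}


# Only the host is overridden; the other fields fall back to the defaults.
merged = with_defaults({"host": "clickhouse.internal"})
```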

When normalizing the data via dbt, it would be great to use an empty string if the password is not set.

Again, I really appreciate your prompt replies.
