🚨 Snowflake produces permanent tables 🚨 #9063

Merged (38 commits, merged Jan 6, 2022)

Commits
c6ae621  add normalization-clickhouse docker build step (jzcruiser, Dec 22, 2021)
b88225e  Merge branch 'patch-4' of github.com:jzcruiser/airbyte into marcos/te… (marcosmarxm, Dec 23, 2021)
cc499c2  bump normalization version (marcosmarxm, Dec 23, 2021)
a2517a3  small changes gradle (marcosmarxm, Dec 23, 2021)
4a51799  Merge branch 'master' into marcos/test-pr-9029 (marcosmarxm, Dec 23, 2021)
a57495c  fix settings gradle (marcosmarxm, Dec 27, 2021)
c56ef54  fix eof file (marcosmarxm, Dec 27, 2021)
f8ccfd6  correct clickhouse normalization (marcosmarxm, Dec 29, 2021)
f24ea4b  Merge branch 'master' into marcos/test-pr-9029 (edgao, Jan 4, 2022)
a6e4c31  Refactor jinja template for scd (#9278) (ChristopheDuong, Jan 4, 2022)
4f9f8ae  merge chris code and regenerate sql files (marcosmarxm, Jan 4, 2022)
621ae20  add snowflake as copy of standard (edgao, Dec 22, 2021)
d0ca72a  snowflake creates permanent tables (edgao, Dec 22, 2021)
3aef7c9  dockerfile respects updated dbt_project.yml (edgao, Dec 22, 2021)
05b0b0f  add to docker-compose (edgao, Dec 22, 2021)
5a9bacd  hook up custom normalization image (edgao, Dec 22, 2021)
f7c2d9b  add to test (edgao, Dec 22, 2021)
8c96d2d  more fixes? (edgao, Dec 22, 2021)
a0e7399  build.gradle; some sort of test? (edgao, Dec 22, 2021)
c0a755b  add to integration test (edgao, Dec 23, 2021)
4f92a66  case-sensitive patterns (edgao, Dec 23, 2021)
be09211  handle m1 error (edgao, Dec 23, 2021)
93c0b15  more snowflake-specific handling (edgao, Dec 23, 2021)
0bb000c  ran tests (edgao, Dec 24, 2021)
7e73552  docs + version bumps (edgao, Dec 24, 2021)
2f7dc29  inject :dev normalization version during test (edgao, Dec 24, 2021)
b31f5ec  try hardcoding :dev image (edgao, Dec 27, 2021)
405d038  add destination variable (edgao, Dec 29, 2021)
0c55399  regenerate test output (edgao, Dec 29, 2021)
b8db991  typo (edgao, Dec 30, 2021)
d8abb0c  exclude snowflake dbt template from spotless (edgao, Dec 30, 2021)
baf0db2  clarify documentation (edgao, Jan 5, 2022)
fe10d82  Merge branch 'master' into edgao/snowflake_permanent_tables (edgao, Jan 5, 2022)
01bcb67  regenerate normalization_test_output (edgao, Jan 5, 2022)
ab662e3  delete unused variable (edgao, Jan 6, 2022)
03e38dc  minor bump actually (edgao, Jan 6, 2022)
dffe792  bump definition (edgao, Jan 6, 2022)
663b498  Merge branch 'master' into edgao/snowflake_permanent_tables (edgao, Jan 6, 2022)
@@ -9,3 +9,4 @@
!dbt-project-template-mysql
!dbt-project-template-oracle
!dbt-project-template-clickhouse
!dbt-project-template-snowflake
11 changes: 11 additions & 0 deletions airbyte-integrations/bases/base-normalization/.gitignore
@@ -20,18 +20,29 @@ integration_tests/normalization_test_output/**/*.yml
# Simple Streams
!integration_tests/normalization_test_output/**/dedup_exchange_rate*.sql
!integration_tests/normalization_test_output/**/exchange_rate.sql
!integration_tests/normalization_test_output/**/DEDUP_EXCHANGE_RATE*.sql
!integration_tests/normalization_test_output/**/EXCHANGE_RATE.sql
# Nested Streams
# Parent table
!integration_tests/normalization_test_output/**/nested_stream_with*_names_ab*.sql
!integration_tests/normalization_test_output/**/nested_stream_with*_names_scd.sql
!integration_tests/normalization_test_output/**/nested_stream_with*_names.sql
!integration_tests/normalization_test_output/**/NESTED_STREAM_WITH*_NAMES_AB*.sql
!integration_tests/normalization_test_output/**/NESTED_STREAM_WITH*_NAMES_SCD.sql
!integration_tests/normalization_test_output/**/NESTED_STREAM_WITH*_NAMES.sql
# Nested table
!integration_tests/normalization_test_output/**/nested_stream_with_*_partition_ab1.sql
!integration_tests/normalization_test_output/**/nested_stream_with_*_data_ab1.sql
!integration_tests/normalization_test_output/**/nested_stream_with*_partition_scd.sql
!integration_tests/normalization_test_output/**/nested_stream_with*_data_scd.sql
!integration_tests/normalization_test_output/**/nested_stream_with*_partition.sql
!integration_tests/normalization_test_output/**/nested_stream_with*_data.sql
!integration_tests/normalization_test_output/**/NESTED_STREAM_WITH_*_PARTITION_AB1.sql
!integration_tests/normalization_test_output/**/NESTED_STREAM_WITH_*_DATA_AB1.sql
!integration_tests/normalization_test_output/**/NESTED_STREAM_WITH*_PARTITION_SCD.sql
!integration_tests/normalization_test_output/**/NESTED_STREAM_WITH*_DATA_SCD.sql
!integration_tests/normalization_test_output/**/NESTED_STREAM_WITH*_PARTITION.sql
!integration_tests/normalization_test_output/**/NESTED_STREAM_WITH*_DATA.sql

# but we keep all sql files for Postgres
!integration_tests/normalization_test_output/postgres/**/*.sql
5 changes: 5 additions & 0 deletions airbyte-integrations/bases/base-normalization/build.gradle
@@ -73,11 +73,16 @@ task airbyteDockerClickhouse(type: Exec, dependsOn: checkSshScriptCopy) {
configure buildAirbyteDocker('clickhouse')
dependsOn assemble
}
task airbyteDockerSnowflake(type: Exec, dependsOn: checkSshScriptCopy) {
configure buildAirbyteDocker('snowflake')
dependsOn assemble
}

airbyteDocker.dependsOn(airbyteDockerMSSql)
airbyteDocker.dependsOn(airbyteDockerMySql)
airbyteDocker.dependsOn(airbyteDockerOracle)
airbyteDocker.dependsOn(airbyteDockerClickhouse)
airbyteDocker.dependsOn(airbyteDockerSnowflake)

task("customIntegrationTestPython", type: PythonTask, dependsOn: installTestReqs) {
module = "pytest"
@@ -0,0 +1,64 @@
# This file is necessary to install dbt-utils with dbt deps
# the content will be overwritten by the transform function

# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: "airbyte_utils"
version: "1.0"
config-version: 2

# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: "normalize"

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that source models can be found
# in the "models/" directory. You probably won't need to change these!
source-paths: ["models"]
docs-paths: ["docs"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]

target-path: "../build" # directory which will store compiled SQL files
log-path: "../logs" # directory which will store DBT logs
modules-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies

clean-targets: # directories to be removed by `dbt clean`
- "build"
- "dbt_modules"

quoting:
database: true
# Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785)
# all schemas should be unquoted
schema: false
identifier: true

# You can define configurations for models in the `source-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!
models:
+transient: false
airbyte_utils:
+materialized: table
generated:
airbyte_ctes:
+tags: airbyte_internal_cte
+materialized: ephemeral
airbyte_incremental:
+tags: incremental_tables
+materialized: incremental
+on_schema_change: sync_all_columns
airbyte_tables:
+tags: normalized_tables
+materialized: table
airbyte_views:
+tags: airbyte_internal_views
+materialized: view

dispatch:
- macro_namespace: dbt_utils
search_order: ["airbyte_utils", "dbt_utils"]
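
For context, the root-level +transient: false entry in the models block above is the heart of this change: dbt's Snowflake adapter builds transient tables by default, and transient tables skip Snowflake's fail-safe period and carry reduced Time Travel retention. A minimal sketch of the compiled DDL before and after the override (the table and schema names are borrowed from the test fixtures further down in this diff, and "select 1 as id" is only a placeholder body):

-- Default dbt-on-Snowflake behavior: a transient table, no fail-safe period
create or replace transient table "AIRBYTE_DATABASE".TEST_NORMALIZATION."EXCHANGE_RATE" as (
  select 1 as id
);

-- With +transient: false, as in this PR: a permanent table
create or replace table "AIRBYTE_DATABASE".TEST_NORMALIZATION."EXCHANGE_RATE" as (
  select 1 as id
);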
@@ -36,3 +36,10 @@ services:
context: .
labels:
io.airbyte.git-revision: ${GIT_REVISION}
normalization-snowflake:
image: airbyte/normalization-snowflake:${VERSION}
build:
dockerfile: snowflake.Dockerfile
context: .
labels:
io.airbyte.git-revision: ${GIT_REVISION}
@@ -12,3 +12,5 @@ services:
image: airbyte/normalization-oracle:${VERSION}
normalization-clickhouse:
image: airbyte/normalization-clickhouse:${VERSION}
normalization-snowflake:
image: airbyte/normalization-snowflake:${VERSION}
@@ -374,6 +374,8 @@ def get_normalization_image(destination_type: DestinationType) -> str:
return "airbyte/normalization-oracle:dev"
elif DestinationType.CLICKHOUSE.value == destination_type.value:
return "airbyte/normalization-clickhouse:dev"
elif DestinationType.SNOWFLAKE.value == destination_type.value:
return "airbyte/normalization-snowflake:dev"
else:
return "airbyte/normalization:dev"

@@ -445,6 +447,8 @@ def run_check_dbt_command(normalization_image: str, command: str, cwd: str, forc
"Configuration paths exist in your dbt_project.yml", # When no cte / view are generated
"Error loading config file: .dockercfg: $HOME is not defined", # ignore warning
"depends on a node named 'disabled_test' which was not found", # Tests throwing warning because it is disabled
"The requested image's platform (linux/amd64) does not match the detected host platform "
+ "(linux/arm64/v8) and no specific platform was requested", # temporary patch until we publish images for arm64
]:
if except_clause in str_line:
is_exception = True
25 changes: 13 additions & 12 deletions ...integration_tests/normalization_test_output/snowflake/test_nested_streams/dbt_project.yml
100755 → 100644
@@ -4,13 +4,13 @@
# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'airbyte_utils'
version: '1.0'
name: "airbyte_utils"
version: "1.0"
config-version: 2

# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: 'normalize'
profile: "normalize"

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that source models can be found
@@ -22,25 +22,26 @@ test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]

target-path: "../build" # directory which will store compiled SQL files
log-path: "../logs" # directory which will store DBT logs
modules-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies
target-path: "../build" # directory which will store compiled SQL files
log-path: "../logs" # directory which will store DBT logs
modules-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies

clean-targets: # directories to be removed by `dbt clean`
- "build"
- "dbt_modules"
clean-targets: # directories to be removed by `dbt clean`
- "build"
- "dbt_modules"

quoting:
database: true
# Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785)
# all schemas should be unquoted
# Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785)
# all schemas should be unquoted
schema: false
identifier: true

# You can define configurations for models in the `source-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!
models:
+transient: false
airbyte_utils:
+materialized: table
generated:
@@ -60,4 +61,4 @@ models:

dispatch:
- macro_namespace: dbt_utils
search_order: ['airbyte_utils', 'dbt_utils']
search_order: ["airbyte_utils", "dbt_utils"]
@@ -1,6 +1,6 @@


create or replace transient table "AIRBYTE_DATABASE".TEST_NORMALIZATION."NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES" as
create or replace table "AIRBYTE_DATABASE".TEST_NORMALIZATION."NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES" as
(select * from(

-- Final base SQL model
@@ -1,6 +1,6 @@


create or replace transient table "AIRBYTE_DATABASE".TEST_NORMALIZATION."NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_PARTITION" as
create or replace table "AIRBYTE_DATABASE".TEST_NORMALIZATION."NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_PARTITION" as
(select * from(

with __dbt__cte__NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_PARTITION_AB1 as (
@@ -1,6 +1,6 @@


create or replace transient table "AIRBYTE_DATABASE".TEST_NORMALIZATION."NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_PARTITION_DATA" as
create or replace table "AIRBYTE_DATABASE".TEST_NORMALIZATION."NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_PARTITION_DATA" as
(select * from(

with __dbt__cte__NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_PARTITION_DATA_AB1 as (
@@ -1,6 +1,6 @@


create or replace transient table "AIRBYTE_DATABASE".TEST_NORMALIZATION."NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_PARTITION_DOUBLE_ARRAY_DATA" as
create or replace table "AIRBYTE_DATABASE".TEST_NORMALIZATION."NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_PARTITION_DOUBLE_ARRAY_DATA" as
(select * from(

with __dbt__cte__NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_PARTITION_DOUBLE_ARRAY_DATA_AB1 as (
@@ -1,6 +1,6 @@


create or replace transient table "AIRBYTE_DATABASE".TEST_NORMALIZATION."NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_SCD" as
create or replace table "AIRBYTE_DATABASE".TEST_NORMALIZATION."NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_SCD" as
(select * from(

-- depends_on: ref('NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_STG')
25 changes: 13 additions & 12 deletions ...integration_tests/normalization_test_output/snowflake/test_simple_streams/dbt_project.yml
100755 → 100644
@@ -4,13 +4,13 @@
# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'airbyte_utils'
version: '1.0'
name: "airbyte_utils"
version: "1.0"
config-version: 2

# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: 'normalize'
profile: "normalize"

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that source models can be found
@@ -22,25 +22,26 @@ test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]

target-path: "../build" # directory which will store compiled SQL files
log-path: "../logs" # directory which will store DBT logs
modules-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies
target-path: "../build" # directory which will store compiled SQL files
log-path: "../logs" # directory which will store DBT logs
modules-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies

clean-targets: # directories to be removed by `dbt clean`
- "build"
- "dbt_modules"
clean-targets: # directories to be removed by `dbt clean`
- "build"
- "dbt_modules"

quoting:
database: true
# Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785)
# all schemas should be unquoted
# Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785)
# all schemas should be unquoted
schema: false
identifier: true

# You can define configurations for models in the `source-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!
models:
+transient: false
airbyte_utils:
+materialized: table
generated:
@@ -60,4 +61,4 @@ models:

dispatch:
- macro_namespace: dbt_utils
search_order: ['airbyte_utils', 'dbt_utils']
search_order: ["airbyte_utils", "dbt_utils"]
@@ -1,6 +1,6 @@


create or replace transient table "AIRBYTE_DATABASE".TEST_NORMALIZATION."DEDUP_EXCHANGE_RATE" as
create or replace table "AIRBYTE_DATABASE".TEST_NORMALIZATION."DEDUP_EXCHANGE_RATE" as
(select * from(

-- Final base SQL model
@@ -1,6 +1,6 @@


create or replace transient table "AIRBYTE_DATABASE".TEST_NORMALIZATION."DEDUP_EXCHANGE_RATE_SCD" as
create or replace table "AIRBYTE_DATABASE".TEST_NORMALIZATION."DEDUP_EXCHANGE_RATE_SCD" as
(select * from(

-- depends_on: ref('DEDUP_EXCHANGE_RATE_STG')
@@ -1,6 +1,6 @@


create or replace transient table "AIRBYTE_DATABASE".TEST_NORMALIZATION."EXCHANGE_RATE" as
create or replace table "AIRBYTE_DATABASE".TEST_NORMALIZATION."EXCHANGE_RATE" as
(select * from(

with __dbt__cte__EXCHANGE_RATE_AB1 as (
@@ -1,6 +1,6 @@


create or replace transient table "AIRBYTE_DATABASE".TEST_NORMALIZATION."EXCHANGE_RATE" as
create or replace table "AIRBYTE_DATABASE".TEST_NORMALIZATION."EXCHANGE_RATE" as
(select * from(

with __dbt__cte__EXCHANGE_RATE_AB1 as (
@@ -134,6 +134,8 @@ def setup_test_dir(integration_type: str) -> str:
copy_tree("../dbt-project-template-mysql", test_root_dir)
elif integration_type == DestinationType.ORACLE.value:
copy_tree("../dbt-project-template-oracle", test_root_dir)
elif integration_type == DestinationType.SNOWFLAKE.value:
copy_tree("../dbt-project-template-snowflake", test_root_dir)
return test_root_dir


@@ -192,6 +192,9 @@ def setup_test_dir(destination_type: DestinationType, test_resource_name: str) -
elif destination_type.value == DestinationType.CLICKHOUSE.value:
copy_tree("../dbt-project-template-clickhouse", test_root_dir)
dbt_project_yaml = "../dbt-project-template-clickhouse/dbt_project.yml"
elif destination_type.value == DestinationType.SNOWFLAKE.value:
copy_tree("../dbt-project-template-snowflake", test_root_dir)
dbt_project_yaml = "../dbt-project-template-snowflake/dbt_project.yml"
dbt_test_utils.copy_replace(dbt_project_yaml, os.path.join(test_root_dir, "dbt_project.yml"))
return test_root_dir

33 changes: 33 additions & 0 deletions airbyte-integrations/bases/base-normalization/snowflake.Dockerfile
@@ -0,0 +1,33 @@
FROM fishtownanalytics/dbt:0.21.1
Contributor: are we creating a special dockerfile for snowflake because its DBT yamls deviate from the standard yamls?

Contributor Author: yeah. Chris suggested modifying entrypoint.sh to use the modified yaml instead of publishing a new docker image, but my thinking was to keep entrypoint.sh integration-agnostic.

COPY --from=airbyte/base-airbyte-protocol-python:0.1.1 /airbyte /airbyte
Contributor: do we need to get this docker image anymore? can't we just pip install airbyte-cdk which contains the CDK models? not blocking but a good sweep

Contributor Author: is the COPY command equivalent to that pip install? (looks like the docker image just contains base_python_structs)


# Install SSH Tunneling dependencies
RUN apt-get update && apt-get install -y jq sshpass

WORKDIR /airbyte
COPY entrypoint.sh .
COPY build/sshtunneling.sh .

WORKDIR /airbyte/normalization_code
COPY normalization ./normalization
COPY setup.py .
COPY dbt-project-template/ ./dbt-template/
COPY dbt-project-template-snowflake/* ./dbt-template/

# Install python dependencies
WORKDIR /airbyte/base_python_structs
RUN pip install .

WORKDIR /airbyte/normalization_code
RUN pip install .

WORKDIR /airbyte/normalization_code/dbt-template/
# Download external dbt dependencies
RUN dbt deps

WORKDIR /airbyte
ENV AIRBYTE_ENTRYPOINT "/airbyte/entrypoint.sh"
ENTRYPOINT ["/airbyte/entrypoint.sh"]

LABEL io.airbyte.version=0.1.62
LABEL io.airbyte.name=airbyte/normalization-snowflake
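
One hedged way to spot-check the effect after running normalization (not part of this PR, and using the integration-test database and schema referenced above; adjust the names for a real deployment) is to inspect the "kind" column of SHOW TABLES, which should now read TABLE instead of TRANSIENT for the normalized output tables:

-- Illustrative verification query against the test fixture schema
show tables like 'EXCHANGE_RATE' in schema "AIRBYTE_DATABASE".TEST_NORMALIZATION;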