vdk-impala: Add optional parameter for staging table prefix (#1666)
What:
Adds the option to pass a staging table prefix, along with code
refactoring and README enhancements.

Why:
It is linked to the issue

Signed-off-by: Stefan Buldeev [email protected]

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2 people authored and yonitoo committed Mar 1, 2023
1 parent 1bdda95 commit c84fd29
Showing 2 changed files with 12 additions and 8 deletions.
@@ -25,17 +25,19 @@ def run(job_input: IJobInput):
     source_view = job_arguments.get("source_view")
     target_schema = job_arguments.get("target_schema")
     target_table = job_arguments.get("target_table")
-    staging_schema = job_arguments.get("staging_schema", target_schema)
     insert_query = get_query("02-insert-into-target.sql")
-    staging_table_name = f"vdk_check_{target_table}"

     if check:
-        if not staging_schema:
+        staging_schema = job_arguments.get("staging_schema", target_schema)
+        staging_table_name = f"vdk_check_{target_schema}_{target_table}"
+
+        if len(staging_table_name) > 128:
             raise ValueError(
-                "No staging_schema specified to execute the defined data checks against."
+                f"Staging table - {staging_table_name} exceeds the 128 character limit."
             )
+
         staging_table = f"{staging_schema}.{staging_table_name}"

         align_stg_table_with_target(
             f"{target_schema}.{target_table}", staging_table, job_input
         )
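
The staging-table naming and length check in the hunk above can be sketched as a standalone function. This is a minimal sketch and not the plugin's actual code: `build_staging_table` and `MAX_TABLE_NAME_LENGTH` are hypothetical names introduced for illustration, and the 128-character limit is taken from the error message in the diff.

```python
MAX_TABLE_NAME_LENGTH = 128  # limit quoted in the diff's error message


def build_staging_table(target_schema, target_table, staging_schema=None):
    """Return the fully qualified staging table name (hypothetical helper)."""
    # Fall back to the target schema when no staging schema is given,
    # roughly mirroring job_arguments.get("staging_schema", target_schema).
    schema = staging_schema if staging_schema is not None else target_schema

    # The new naming scheme includes the target schema in the table name.
    name = f"vdk_check_{target_schema}_{target_table}"

    # Only the bare table name (without the schema prefix) is length-checked.
    if len(name) > MAX_TABLE_NAME_LENGTH:
        raise ValueError(
            f"Staging table - {name} exceeds the {MAX_TABLE_NAME_LENGTH} character limit."
        )
    return f"{schema}.{name}"
```

Including `target_schema` in the name presumably keeps staging tables for same-named targets in different schemas from colliding when they share one staging schema.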
@@ -12,10 +12,12 @@ In summary, it overwrites the target table with the source data.

 ### Template Parameters (template_args):

-- target_schema - SC Data Warehouse schema, where target data is loaded
-- target_table - SC Data Warehouse table of DW type 'Slowly Changing Dimension Type 1', where target data is loaded
-- source_schema - SC Data Lake schema, where source raw data is loaded from
-- source_view - SC Data Lake view, where source raw data is loaded from
+- target_schema - SC Data Warehouse schema, where target data is loaded
+- target_table - SC Data Warehouse table of DW type 'Slowly Changing Dimension Type 1', where target data is loaded
+- source_schema - SC Data Lake schema, where source raw data is loaded from
+- source_view - SC Data Lake view, where source raw data is loaded from
+- check - (Optional) Callback function responsible for checking the quality of the data
+- staging_schema - (Optional) Schema where the checks will be executed. If not provided target_schema will be used as default

 ### Prerequisites:

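
The template parameters listed in the README hunk above could be assembled as in the following sketch. The schema, view, and table names and the `row_count_check` callback are hypothetical. The signature the template expects for `check` is not shown in this commit, so a single-argument callable that returns a boolean is an assumption.

```python
def row_count_check(staging_table):
    """Hypothetical data-quality callback; a real check would query staging_table."""
    # Assumption: the callback receives the staging table name and returns a bool.
    return bool(staging_table)


template_args = {
    "source_schema": "example_lake",   # hypothetical name
    "source_view": "vw_dim_org",       # hypothetical name
    "target_schema": "example_dw",     # hypothetical name
    "target_table": "dim_org",         # hypothetical name
    "check": row_count_check,          # optional
    "staging_schema": "example_stg",   # optional; defaults to target_schema
}
```

In a data job step, a dictionary like this would typically be passed to `job_input.execute_template` together with the template name, which this page does not show.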
