Fix/records count (#33)
* fix: total records and lambda dockerfile

* sum record count

* data-validator update
lchen-2101 authored Jan 28, 2025
1 parent cf6fb99 commit 65d5e63
Showing 4 changed files with 7 additions and 4 deletions.
4 changes: 2 additions & 2 deletions Lambda_Dockerfile
@@ -1,4 +1,4 @@
-FROM --platform=linux/amd64 public.ecr.aws/lambda/python:3.12
+FROM public.ecr.aws/lambda/python:3.12

 RUN pip install "poetry==1.8.4"

@@ -13,7 +13,7 @@ RUN poetry install --only main,processors

 ARG LAMBDA_PATH

-COPY ./${LAMBDA_PATH}/lambda_function.py ${LAMBDA_TASK_ROOT}
+COPY ./src/sbl_validation_processor/${LAMBDA_PATH}/lambda_function.py ${LAMBDA_TASK_ROOT}

 # Pass the name of the function handler as an argument to the runtime
 CMD [ "lambda_function.lambda_handler" ]
4 changes: 2 additions & 2 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 2 additions & 0 deletions src/sbl_validation_processor/parquet_validator.py
@@ -139,6 +139,7 @@ def combine_results(results: List[ValidationResults]):
     ):
         syntax_error_counts = sum([r.error_counts.single_field_count for r in results])
         val_res = {
+            "total_records": sum([r.record_count for r in results]),
             "syntax_errors": {
                 "single_field_count": syntax_error_counts,
                 "multi_field_count": 0,  # this will always be zero for syntax errors
@@ -148,6 +149,7 @@ def combine_results(results: List[ValidationResults]):
         }
     else:
         val_res = {
+            "total_records": sum([r.record_count for r in results if r.phase == ValidationPhase.LOGICAL]),
             "syntax_errors": {
                 "single_field_count": 0,
                 "multi_field_count": 0,
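
The two `total_records` additions above can be read as one rule. The sketch below is illustrative and is not the project's code: `ChunkResult`, the `total_records` helper, the `has_syntax_errors` flag, and the enum string values are hypothetical stand-ins, and only `record_count`, `phase`, and `ValidationPhase.LOGICAL` come from the diff. It assumes that when validation gets past the syntax phase each chunk contributes one result per phase, so the `else` branch counts only the logical-phase results, presumably to avoid counting every record twice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ValidationPhase(str, Enum):
    # Assumed values; only the member name LOGICAL appears in the diff.
    SYNTACTICAL = "Syntactical"
    LOGICAL = "Logical"


@dataclass
class ChunkResult:
    """Hypothetical stand-in for ValidationResults, reduced to the fields used here."""

    record_count: int
    phase: ValidationPhase


def total_records(results: List[ChunkResult], has_syntax_errors: bool) -> int:
    """Sum record counts the way the two branches of the diff do."""
    if has_syntax_errors:
        # Syntax-error branch: every result is counted.
        return sum(r.record_count for r in results)
    # Otherwise only logical-phase results are counted.
    return sum(r.record_count for r in results if r.phase == ValidationPhase.LOGICAL)


# Two chunks (100 and 50 records) that each produced a result for both phases.
both_phases = [
    ChunkResult(100, ValidationPhase.SYNTACTICAL),
    ChunkResult(100, ValidationPhase.LOGICAL),
    ChunkResult(50, ValidationPhase.SYNTACTICAL),
    ChunkResult(50, ValidationPhase.LOGICAL),
]
# The same two chunks when validation stopped at the syntax phase.
syntax_only = [
    ChunkResult(100, ValidationPhase.SYNTACTICAL),
    ChunkResult(50, ValidationPhase.SYNTACTICAL),
]

clean_total = total_records(both_phases, has_syntax_errors=False)
syntax_total = total_records(syntax_only, has_syntax_errors=True)
```

Under these assumptions both calls give the same total for the same input file, which is what a per-submission record count should do regardless of the phase at which validation ended.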
1 change: 1 addition & 0 deletions src/sbl_validation_processor/results_aggregator.py
@@ -105,6 +105,7 @@ def aggregate_validation_results(bucket, key, results):
         SubmissionState.VALIDATION_EXPIRED,
         SubmissionState.SUBMISSION_UPLOAD_MALFORMED,
     ]:
+        submission.total_records = results["total_records"]
         file_paths, storage_options = get_parquet_paths(bucket, key)

         # scan each result parquet into a lazyframe then diagonally concat so all columns are merged into the final lf. Otherwise