Fixes #14215: Add missing decode stage to gz/zip files in json ingestion reader. #14375
Files that were zip/gz were not being decoded, which led to an error when reading them.
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
…n json ingestion reader. (open-metadata#14375)
* add decoding stage to gz/zip files. Files that were zip/gz were not being decoded. This was leading to an error when we wanted them to be.
* remove unnecessary comment
Co-authored-by: Carl Kristensen <[email protected]>
Describe your changes:
Fixes #14215
The decode stage was missing from the storage reader
I worked on fixing the s3 reader for json.gz files because it was giving me an error:
[2023-12-04 10:05:23] ERROR {metadata.Utils:datalake_utils:75} - Error fetching file [bucket-name/path/tofolder/part-00062-sda-c000.json.gz] using [S3Config] due to: [Error reading dataframe due to [a bytes-like object is required, not 'str']]
When I dug into the code, I noticed that gz and zip are already supported, but there is a bug in the implementation.
The decode stage was missing for files that were zip/gz.
So I fixed that.
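For reviewers, here is a minimal sketch of the kind of decode stage described above. It is not the actual reader code from this PR: `read_json_bytes` is a hypothetical helper, and the UTF-8 encoding and JSON-lines layout are assumptions made for illustration. It shows one way the `a bytes-like object is required, not 'str'` error can arise: decompressing yields `bytes`, and string operations such as `split("\n")` fail until the bytes are decoded.

```python
import gzip
import io
import json
import zipfile


def read_json_bytes(key: str, raw: bytes) -> list:
    """Hypothetical helper: decompress if needed, decode, then parse JSON lines."""
    if key.endswith(".gz"):
        raw = gzip.decompress(raw)  # still bytes after decompression
    elif key.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(raw)) as archive:
            # Assumption: the archive holds a single JSON file.
            raw = archive.read(archive.namelist()[0])
    # The decode stage: without it, text.split("\n") below would raise
    # TypeError: a bytes-like object is required, not 'str'
    text = raw.decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Usage with an in-memory gzip payload (no S3 access needed).
payload = gzip.compress(b'{"id": 1}\n{"id": 2}\n')
records = read_json_bytes("part-0.json.gz", payload)
```

Uncompressed `.json` keys pass straight through to the same decode step, so all three cases share one code path after decompression.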