🎉 Source S3: support of Parquet format #5305
Conversation
/test connector=connectors/source-s3
/test connector=connectors/source-s3
/test connector=connectors/source-s3
/test connector=connectors/source-s3
Please fix the branch conflicts, and fix the airbyte-integrations/connectors/source-hubspot/source_hubspot/api.py file conflict on this branch.
/test connector=connectors/source-s3
/test connector=connectors/source-s3
/test connector=connectors/source-s3
…es_abstract/formats/parquet_spec.py Co-authored-by: George Claireaux <[email protected]>
…es_abstract/formats/parquet_spec.py Co-authored-by: George Claireaux <[email protected]>
Great, LGTM! One small note on buffer_size to make that more clear.
Due to the way we're iterating through individual files at the abstract level, I anticipate issues with partitioned Parquet datasets. I think we should make clear in the documentation that partitioned Parquet datasets are unsupported for now.
For more context: it should work, but the performance could be very poor, and the columns used for partitioning would be missing from the output (I think).
/test connector=connectors/source-s3
/publish connector=connectors/source-s3
/test connector=connectors/source-s3
/test connector=connectors/source-s3
/publish connector=connectors/source-s3
How
Using the same lib, 'pyarrow', as for CSV parsing.
Recommended reading order
1. formats/parquet_spec.py
2. formats/parquet_parser.py
Pre-merge Checklist
Updating a connector
Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Integration tests are passing: ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Connector's README.md is updated
- docs/integrations/<source or destination>/<name>.md is updated, including the changelog. See changelog example
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- The /test connector=connectors/<name> command is passing
- The /publish command described here has been run