-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎉 Destination S3 Support writing timestamps #7732
🎉 Destination S3 Support writing timestamps #7732
Conversation
vmaltsev seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
# Conflicts: # airbyte-integrations/connectors/destination-s3/src/main/java/io/airbyte/integrations/destination/s3/avro/JsonToAvroSchemaConverter.java # airbyte-integrations/connectors/destination-s3/src/main/java/io/airbyte/integrations/destination/s3/parquet/S3ParquetWriter.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay.
I spent quite some time on this, and the main concern I have for the current implementation is that we need to iterate through the Json object and modify it.
I think a better approach is to 1) keep the Json -> Avro schema converter change in this PR, 2) update the Json -> Avro object converter in https://github.com/airbytehq/json-avro-converter. Here is probably the entry point.
In this way, the Json object does not need to be mutated, and it is only iterated through once (in the Json -> Avro object converter). It also makes support more formats easier in the future, because we don't need to maintain the mutation methods in AvroRecordHelper.java
. Instead, the object converter already has a pretty solid iteration framework that we can extend.
Let me know what you think.
created PR airbytehq/json-avro-converter#9. Please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
Can you also update the doc here to include how the logical types you added here are processed?
if ((nonNullFieldTypes | ||
.stream().anyMatch(schema -> schema.getLogicalType() != null)) && | ||
(!nonNullFieldTypes.contains(Schema.create(Schema.Type.STRING)))) { | ||
nonNullFieldTypes.addLast(Schema.create(Schema.Type.STRING)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the nonNullFieldTypes
from the generic List
to LinkedList
does not seem necessary. The add
method on the List
interface is equivalent to addLast
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -240,6 +254,11 @@ Schema getNullableFieldTypes(final String fieldName, final JsonNode fieldDefinit | |||
if (!nonNullFieldTypes.contains(NULL_SCHEMA)) { | |||
nonNullFieldTypes.add(0, NULL_SCHEMA); | |||
} | |||
if ((nonNullFieldTypes | |||
.stream().anyMatch(schema -> schema.getLogicalType() != null)) && | |||
(!nonNullFieldTypes.contains(Schema.create(Schema.Type.STRING)))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably better to save Schema.create(Schema.Type.STRING))
as a static private variable to avoid creating the string schema repeatedly. This if
block does it twice: one in the check, and one to be appended to the field type list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -240,6 +254,11 @@ Schema getNullableFieldTypes(final String fieldName, final JsonNode fieldDefinit | |||
if (!nonNullFieldTypes.contains(NULL_SCHEMA)) { | |||
nonNullFieldTypes.add(0, NULL_SCHEMA); | |||
} | |||
if ((nonNullFieldTypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comments before this if
block to explain why the string schema is added at the end?
Something like:
Logical types are converted to a union of logical type itself and string. The purpose is to default the logical type field to a string, if the value of the logical type field is invalid and cannot be properly processed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/publish connector=connectors/destination-s3
|
# Conflicts: # airbyte-config/init/src/main/resources/config/STANDARD_DESTINATION_DEFINITION/4816b78f-1489-44c1-9060-4b19d5fa9362.json # airbyte-integrations/connectors/source-mongodb-strict-encrypt/src/test-integration/java/io/airbyte/integrations/source/mongodb/MongodbSourceStrictEncryptAcceptanceTest.java
/publish connector=connectors/destination-s3
|
Follow ups:
|
* get date-time format form json schema * created universal date-time converter * implemented jsonnode transformation for avro and parquet * removed unneeded dependency from build.gradle * fix checkstyle * add DateTimeUtilsTest * add AvroRecordHelperTest * resolve merge conflicts | fix checkstyle * update LocalTime parsing * added String type to avro schema for Logical Types, removed date-time conversion * fix checkstyle * fix checkstyle * added static String schema, added comments * bump version Co-authored-by: vmaltsev <[email protected]>
What
Support writing timestamps for S3 in Avro and Parquet formats
How
According avro documentation https://avro.apache.org/docs/current/spec.html#Date.
Date
A date logical type annotates an Avro int, where the int stores the number of days from the unix epoch, 1 January 1970 (ISO calendar).
Time (microsecond precision)
A time-micros logical type annotates an Avro long, where the long stores the number of microseconds after midnight, 00:00:00.000000.
Timestamp (microsecond precision)
A timestamp-micros logical type annotates an Avro long, where the long stores the number of microseconds from the unix epoch, 1 January 1970 00:00:00.000000 UTC.
Recommended reading order
x.java
y.python
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/SUMMARY.md
docs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing./publish
command described hereUpdating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing./publish
command described hereConnector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changes