move S3Config into destination-s3; update dependencies accordingly #8562
Conversation
LGTM modulo unit tests
Resolved review thread on ...ctors/destination-s3/src/main/java/io/airbyte/integrations/destination/s3/S3Destination.java
s3.deleteObject(s3Bucket, outputTableName);
}

public static AmazonS3 getAmazonS3(final S3DestinationConfig s3Config) {
would a unit test make sense here?
added a test, but it's pretty mediocre. AmazonS3 objects don't actually expose any methods to extract their configurations, so all the test accomplishes is to check that this method doesn't throw an exception, and that it returns something non-null :/
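For reference, a minimal sketch of what such a test ends up looking like. The constructor arguments and class shape here are assumptions for illustration, not the repo's actual code; the point is just that "doesn't throw, returns non-null" is all that can be asserted:

```java
// Hypothetical sketch of the "mediocre" test described above: AmazonS3 exposes no getters
// for the configuration it was built with, so we can only check construction succeeds.
import static org.junit.jupiter.api.Assertions.assertNotNull;

import com.amazonaws.services.s3.AmazonS3;
import org.junit.jupiter.api.Test;

class S3DestinationTest {

  @Test
  void getAmazonS3ReturnsAClient() {
    // Constructor arguments are illustrative; the real S3DestinationConfig may differ.
    final S3DestinationConfig config = new S3DestinationConfig(
        "",                       // endpoint (empty = default AWS endpoint)
        "fake-bucket",
        "fake-path",
        "us-east-1",
        "fake-access-key-id",
        "fake-secret-access-key",
        null);                    // format config is irrelevant for client construction

    final AmazonS3 client = S3Destination.getAmazonS3(config);
    assertNotNull(client);        // no way to inspect the region/credentials it was built with
  }
}
```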
actually, I can just kill this method and use S3DestinationConfig#getS3Client instead

hm. I'm pretty sure
hmmm. between the docs and (I think?) the spec, is spec.json canonical or are we building towards what docs.airbyte describes? Specifically, the docs say that part size is a top-level option for destination-s3, but the spec is putting it inside the lower-level avro/csv/jsonl format objects, with parquet not exposing it at all (the partSize field on S3DestinationConfig is actually redundant with S3FormatConfig#getPartSize, but not sure which one would be better to remove)
LGTM
Resolved review thread on ...s/destination-s3/src/test/java/io/airbyte/integrations/destination/s3/S3DestinationTest.java
Page size is relevant for Parquet. Part size is not. Here is how we construct the S3 Parquet writer; it requires the page size for compression. Here is how we construct the routine S3 writer (for JSONL and CSV); it requires the part size for uploading.
@edgao, the doc is outdated. The part size is only needed for Avro, CSV, and JSONL. It is not required for Parquet, at least not for now, because the Parquet writer is constructed differently and does not require this parameter. So the current spec.json is correct.

If I remember correctly, when the S3 destination was first rolled out, it only supported CSV. That's why the part size was at the top level initially. But when we added Parquet as the second format, we realized that part size should not be a top-level parameter, and moved it inside format. However, the doc did not get updated accordingly.
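To make the nesting concrete, here's a rough sketch of reading the part size out of a config shaped the way the current spec describes. The field names are illustrative and may not match the real spec.json exactly:

```java
// Illustrative only: part size hangs off the per-format object (Avro/CSV/JSONL),
// not the top level of the S3 destination config, and Parquet doesn't have one at all.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class PartSizeExample {

  public static void main(String[] args) throws Exception {
    final String configJson = """
        {
          "s3_bucket_name": "my-bucket",
          "s3_bucket_region": "us-east-1",
          "format": {
            "format_type": "CSV",
            "part_size_mb": 5
          }
        }
        """;

    final JsonNode format = new ObjectMapper().readTree(configJson).get("format");
    // A Parquet config would have no part size here; its writer takes a page size instead.
    final long partSizeMb = format.has("part_size_mb") ? format.get("part_size_mb").asLong() : 5L;
    System.out.println("part size = " + partSizeMb + " MB");
  }
}
```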
Resolved review thread on ...integration/java/io/airbyte/integrations/destination/jdbc/JdbcDestinationAcceptanceTest.java
sounds good, will update the docs to match the spec.
Resolved review thread on ...integration/java/io/airbyte/integrations/destination/jdbc/JdbcDestinationAcceptanceTest.java
latest commits just do some final code cleanup (replacing calls to S3StreamCopier.attemptWriteAndDelete with the S3Destination method) and updating the S3 docs to match the spec. Doc update isn't super detailed; it just removes the part_size param and links to spec.json instead of referencing a nonexistent paragraph. Do I need to run through all the "Airbyter" stuff in the PR template? I.e.
/test connector=destination-snowflake

/test connector=destination-snowflake

/test connector=destination-redshift

/test connector=destination-gcs
* upgrade gradle
* upgrade to Java 17 (and fix a few of the node versioning misses)
* oops
* try to run a different format version
* fix spotless by upgrading / reformatting some files
* fix ci settings
* upgrade mockito to avoid other errors
* undo bad format
* fix "incorrect" sql comments
* fmt
* add debug flag
* remove
* bump
* bump jooq to a version that has a java 17 dist
* fix
* remove logs
* oops
* revert jooq upgrade
* fix
* set up java for connector test
* fix yaml
* generate std source tests
* fail zombie job attempts and add failure reason (#8709)
* fail zombie job attempts and add failure reason
* remove failure reason
* bump gcp dependencies to pick up grpc update (#8713)
* Bump Airbyte version from 0.33.9-alpha to 0.33.10-alpha (#8714) Co-authored-by: jrhizor <[email protected]>
* Change CDK "Caching" header to "nested streams & caching"
* Update fields in source-connectors specifications: file, freshdesk, github, google-directory, google-workspace-admin-reports, iterable (#8524) Signed-off-by: Sergey Chvalyuk <[email protected]> Co-authored-by: Serhii Chvaliuk <[email protected]> Co-authored-by: Sherif A. Nada <[email protected]>
* move S3Config into destination-s3; update dependencies accordingly (#8562) Co-authored-by: Lake Mossman <[email protected]> Co-authored-by: jrhizor <[email protected]> Co-authored-by: Sherif A. Nada <[email protected]> Co-authored-by: Iryna Grankova <[email protected]> Co-authored-by: Serhii Chvaliuk <[email protected]> Co-authored-by: Edward Gao <[email protected]>
@@ -11,6 +11,9 @@ WORKDIR /airbyte

ENV APPLICATION destination-snowflake

# Needed for JDK17 (in turn, needed on M1 macs) - see https://github.com/snowflakedb/snowflake-jdbc/issues/589#issuecomment-983944767
ENV DESTINATION_SNOWFLAKE_OPTS "--add-opens java.base/java.nio=ALL-UNNAMED"
yeah, I'm not sure what happened - snowflake was definitely passing for me locally, but after the java 17 upgrade it's failing again. Haven't been able to find a fix, either.
@edgao @VitaliiMaltsev Were you guys able to find the fix?
(also https://github.com/airbytehq/airbyte/pull/8854/files#diff-a3dd6be9c3819c1fcfd593b0f11c29663912e686e0dd74cae14610ac9c282480R50, but technically we only needed to do one or the other)
@edgao is there any side effect of using the JVM option jvmArgs = ["--add-opens=java.base/java.nio=ALL-UNNAMED"]? Any potential security issues?
my understanding is that this isn't a security check; it's more so that the Java devs can modify some Java-internal implementation in the future. I.e. they want to discourage reliance on Java's internal APIs, so access is now disabled by default.
@edgao thanks for the response. However, based on JEP 261:

The --add-exports and --add-opens options must be used with great care. You can use them to gain access to an internal API of a library module, or even of the JDK itself, but you do so at your own risk: If that internal API is changed or removed then your library or application will fail.

I'm not that familiar with how gaining more internal access to java.nio could be concerning, so I'm asking if you have insight into it.
The risk occurs when upgrading Java versions - there's no guarantee that the underlying implementation will remain the same. So in theory, the Java devs could release a new version of Java, which modifies those internal APIs, and therefore break the snowflake JDBC driver. For Airbyte, this risk is mitigated by the source/destination acceptance tests, which actually build the Docker image and check that it can interact with a real Snowflake instance, i.e. this would be caught before release. But we would be unable to upgrade to that new Java version until an updated snowflake-jdbc is released.
I'm not actually sure which issue is being used to track progress on removing snowflake-jdbc's dependency on those internal APIs, but would recommend asking in https://github.com/snowflakedb/snowflake-jdbc if you're interested in the details / how they're thinking about the problem. Some potentially related issues: snowflakedb/snowflake-jdbc#484, snowflakedb/snowflake-jdbc#589, snowflakedb/snowflake-jdbc#533
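For anyone curious what the flag actually unlocks, here's a small standalone demo. This is not snowflake-jdbc's actual code path, just an illustration of the kind of reflective access into java.nio that the flag re-enables on JDK 17:

```java
// Run with:  java --add-opens=java.base/java.nio=ALL-UNNAMED AddOpensDemo.java
// Without the flag, setAccessible(true) throws InaccessibleObjectException on JDK 17,
// because java.base does not open the java.nio package to unnamed modules.
import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;

public class AddOpensDemo {

  public static void main(String[] args) throws Exception {
    final ByteBuffer buf = ByteBuffer.allocateDirect(16);

    // "address" is a JDK-internal field on java.nio.Buffer; libraries like Arrow
    // (which snowflake-jdbc pulls in) reach for internals in roughly this way.
    final Field address = Buffer.class.getDeclaredField("address");
    address.setAccessible(true); // this is the call that --add-opens permits

    System.out.println("direct buffer address = " + address.getLong(buf));
  }
}
```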
version bumps for airbytehq#8562
What
It doesn't make much sense for destination-jdbc to hold S3Config, or for destination-s3 to depend on destination-jdbc. This is a prereq for #8550.
Also delete the unnecessary JDBC acceptance test - it doesn't even work (b/c destination-jdbc doesn't even declare a main class).
Also update the S3 docs to reflect a past spec.json update.
How
Replace references to S3Config with S3DestinationConfig, then update module dependencies. Most importantly, remove s3's dependency on jdbc.
Note that a few class members got moved around to accommodate this:
- S3StreamCopier#DEFAULT_PART_SIZE now lives inside S3DestinationConfig; this was the only place it was referenced anyway.
- S3StreamCopier's attemptS3WriteAndDelete and getAmazonS3 methods now live on S3Destination (see the sketch after this list). This necessitated updates to the JDBC, Redshift, and Snowflake destinations, which now list destination-s3 as a dependency.

Also had a few formatting changes for some reason, which IntelliJ insists on making. IntelliJ is set up according to this, so not quite sure what's happening.
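As referenced in the list above, here is a rough sketch of what the relocated write-and-delete check does. It is simplified and hypothetical; the real method now on S3Destination may differ:

```java
// Simplified illustration of a write-then-delete permission check: if either call
// fails, the destination's "check" operation surfaces a credentials/permissions
// problem before any sync runs.
import com.amazonaws.services.s3.AmazonS3;
import java.util.UUID;

public class S3WriteAndDeleteCheckSketch {

  public static void attemptS3WriteAndDelete(final AmazonS3 s3, final String bucket) {
    final String testKey = "_airbyte_connection_test_" + UUID.randomUUID();
    s3.putObject(bucket, testKey, "check-content"); // throws if we can't write to the bucket
    s3.deleteObject(bucket, testKey);               // throws if we can't delete what we wrote
  }
}
```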
Recommended reading order
I'd actually recommend reviewing the two commits separately; they're only in one PR to keep this change atomic.
The first commit moves S3Config into destination-s3, and makes any module that depended on destination-jdbc now depend on destination-s3.
S3Config.java
S3Destination.java
S3StreamCopier.java
The second commit moves all S3Config references to use S3DestinationConfig instead.
S3DestinationConfig.java - all of the changes are just copying in code from the old S3Config
🚨 User Impact 🚨
If anyone had code that depended on S3Config/S3StreamCopier (i.e. outside of this repo) then that will break. Otherwise, this should have no visible impact.

Pre-merge Checklist
Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter
- airbyte_secret
- ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- README.md
- bootstrap.md. See description and examples
- docs/integrations/<source or destination>/<name>.md including changelog. See changelog example

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing.
- /publish command described here