🎉 Destination Redshift (copy): accept bucket path for staging data #8607
@sherifnada do we ever do minor version bumps on connectors? I want to have destination-s3 go from 0.1.16 to 0.2.0 to reflect the change in filename convention; is that reasonable or is this still considered just a patch-level change? i.e. this changelog entry fed6c36#diff-69de65b90934b0e7e8f813cfa8298d11e6cc1104e9f64e3a010eee0441b81db9R226
@edgao doing a minor version bump is totally fine. Once we push connectors to GA we'll also start doing honest-to-god semver.
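For reference, the difference between the two kinds of bump discussed here can be sketched as below. This is only an illustration of the MAJOR.MINOR.PATCH convention; the `VersionBump` class and its methods are hypothetical and are not part of the Airbyte codebase.

```java
/** Hypothetical helper illustrating minor vs. patch bumps on a MAJOR.MINOR.PATCH string. */
final class VersionBump {

  private VersionBump() {}

  private static int[] parse(final String version) {
    final String[] parts = version.split("\\.");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH but got: " + version);
    }
    return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])};
  }

  /** Increments the minor component and resets the patch component to 0. */
  static String bumpMinor(final String version) {
    final int[] v = parse(version);
    return v[0] + "." + (v[1] + 1) + ".0";
  }

  /** Increments only the patch component. */
  static String bumpPatch(final String version) {
    final int[] v = parse(version);
    return v[0] + "." + v[1] + "." + (v[2] + 1);
  }
}
```

With these definitions, `VersionBump.bumpMinor("0.1.16")` gives the proposed `0.2.0`, while a patch-level change would give `0.1.17`.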
   * @return The path within the bucket that this writer will create. For example, if we wrote to "s3://yourBucket/some/path/to/file.csv", this method
   *         would return "some/path/to/file.csv".
   */
  String getObjectPath();
Would `getOutputPath` be accurate?
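To make the Javadoc's example concrete, here is a minimal sketch of the relationship between a full `s3://` URI and the path the method is documented to return. The `S3Paths` class and `objectPath` method are hypothetical names used for illustration only; the real writers build the path from their own config rather than parsing a URI.

```java
/** Hypothetical helper; shows which part of an s3:// URI the documented return value corresponds to. */
final class S3Paths {

  private static final String SCHEME = "s3://";

  private S3Paths() {}

  /** Returns everything after the bucket name in an "s3://bucket/key" URI. */
  static String objectPath(final String s3Uri) {
    if (!s3Uri.startsWith(SCHEME)) {
      throw new IllegalArgumentException("Not an s3:// URI: " + s3Uri);
    }
    final String withoutScheme = s3Uri.substring(SCHEME.length());
    final int firstSlash = withoutScheme.indexOf('/');
    // No slash after the bucket name means the URI names a bucket but no object.
    return firstSlash < 0 ? "" : withoutScheme.substring(firstSlash + 1);
  }
}
```

For the URI in the Javadoc, `S3Paths.objectPath("s3://yourBucket/some/path/to/file.csv")` yields `some/path/to/file.csv`.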
        outputStreams.get(0).toString(StandardCharsets.UTF_8));
  }

  private static void checkObjectName(final String objectName) {
Feels like we really just want to test the name generator, and that the name generator is being used. This is fine for now IMO; it may be worth leaving the context you provided as a comment, and saying that the output path generator should probably be DI'd so we can just test the generator itself? In any case this is fine.
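A rough sketch of what the DI suggestion could look like, assuming the generator is modelled as a plain `Function<String, String>` from stream name to object name. `SketchStreamCopier` and its methods are hypothetical stand-ins, not the real `S3StreamCopier` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a stream copier whose object-name generator is injected. */
final class SketchStreamCopier {

  private final Function<String, String> objectNameGenerator;
  private final List<String> createdObjects = new ArrayList<>();

  SketchStreamCopier(final Function<String, String> objectNameGenerator) {
    this.objectNameGenerator = objectNameGenerator;
  }

  /** Pretends to open a new staging file and returns the object name it would use. */
  String prepareStagingFile(final String streamName) {
    final String objectName = objectNameGenerator.apply(streamName);
    createdObjects.add(objectName);
    return objectName;
  }

  /** Returns a copy of the object names generated so far. */
  List<String> getCreatedObjects() {
    return new ArrayList<>(createdObjects);
  }
}
```

With this shape, a copier test can inject a fixed generator such as `stream -> "fixed/" + stream + ".csv"` and only assert that the generator is used, while the real naming logic gets its own unit test.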
Co-authored-by: Sherif A. Nada <[email protected]>
/test connector=connectors/destination-snowflake
/publish connector=connectors/destination-redshift
/publish connector=connectors/destination-s3
/publish connector=connectors/destination-redshift
What
Create a new S3StreamCopier that utilizes the same code as destination-s3, rather than rolling its own S3 upload logic. Addresses #8550
Also, incidentally allows users to specify the bucket path to store staging data in.
How
- [...] `@Deprecate` them

Misc. notes:
- [...] `s3://bucket/randomUuid/schema/stream-12345`, where `12345` is the number of the file (i.e. starts at `00000` and increments for each file). Note that there was no `.csv` extension at all.
- [...] `s3://bucket/bucketPath/namespace/stream/timestamp_randomUuid.csv`.
- [...] `bucketPath` option.
- `timestamp` is set to when the `RedshiftStreamCopier` is instantiated. I'm pretty sure this is basically the time at which the `AirbyteMessageConsumer` is created, i.e. pretty close to the start of the upload. But technically NOT the same.
- `S3Writer#getObjectKey` is unimplemented in most classes, because I didn't want to dig into all of their behaviors.
- `mockito:mockito-inline` dependencies are needed for the mocked constructors.

Recommended reading order
- `S3CsvWriter.java` + test + `StagingDatabaseCsvSheetGenerator.java` + `BaseS3Writer`
- `S3StreamCopier.java` + test + factory
- `LegacyS3StreamCopierTest.java` - mostly there to demonstrate identical behavior with `S3StreamCopierTest.java`
- `RedshiftStreamCopier.java` + test + factory
- `spec.json` (adding parsing for this new option happened in https://github.com/airbytehq/airbyte/pull/8562/files#diff-234d0545ef4c8a9abd756f519d85f20cefa71acacdb93a9a8f4ce2a6f88482f3R74-R77 )
- `S3DestinationConfig`, `S3CsvFormatConfig`
- `build.gradle` (new mockito-inline dep), the various `XyzWriter.java` (autoformatting + stub `getObjectKey` implementation)

This commit (1b4f41f#diff-5a76fbf0219c39138b2a18b074b17dde604966094df637542b142b08cd4358b9) could be interesting; it's the diff between the legacy and new behavior.
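As an illustration of the two staging-file layouts from the misc. notes, the sketch below builds one key in each format. The `StagingObjectKeys` class and its method names are hypothetical, and rendering the timestamp as epoch milliseconds is an assumption; the notes only say that the timestamp is fixed when the `RedshiftStreamCopier` is instantiated.

```java
import java.time.Instant;
import java.util.UUID;

/** Hypothetical helper contrasting the legacy and new staging-object layouts. */
final class StagingObjectKeys {

  private StagingObjectKeys() {}

  /** Legacy layout: randomUuid/schema/stream-NNNNN (zero-padded file number, no extension). */
  static String legacyKey(final UUID randomUuid, final String schema, final String stream, final int fileNumber) {
    return String.format("%s/%s/%s-%05d", randomUuid, schema, stream, fileNumber);
  }

  /** New layout: bucketPath/namespace/stream/timestamp_randomUuid.csv. */
  static String newKey(final String bucketPath,
                       final String namespace,
                       final String stream,
                       final Instant timestamp,
                       final UUID randomUuid) {
    // Assumption: the timestamp is rendered as epoch milliseconds; the exact format is not stated in the PR.
    return String.format("%s/%s/%s/%d_%s.csv", bucketPath, namespace, stream, timestamp.toEpochMilli(), randomUuid);
  }
}
```

The legacy key starts with a random UUID, whereas the new key starts with the user-supplied bucket path, so the staging objects end up under a different prefix.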
🚨 User Impact 🚨
None. The temporary files will be named differently, but they will still be created+deleted the same way.
They will be in a different directory. Hopefully nobody was allowlisting a service account to only have permission to write into `s3://bucket/anyValidUuid/*`.

Pre-merge Checklist
Updating a connector
Community member or Airbyter
- [...] `airbyte_secret`
- [...] `./gradlew :airbyte-integrations:connectors:<name>:integrationTest`.
- [...] `README.md`
- [...] `bootstrap.md`. See description and examples
- `docs/integrations/<source or destination>/<name>.md` including changelog. See changelog example

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- `/test connector=connectors/<name>` command is passing.
- [...] `/publish` command described here