AWS S3 Staging for Snowflake Destinations not working correctly #3935
Comments
@sherifnada this is not a problem with the connector. I set up a Postgres source with CDC and the error happens there too.
Setting a new source with CDC, then visiting the settings page after setting the source:

curl -X POST "http://localhost:8001/api/v1/sources/get" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"sourceId":"353715d2-f303-402f-b572-9b2e47e7bca7"}'

Response:

{
"sourceDefinitionId": "decd338e-5647-4c0b-adf4-da0e75f5a750",
"sourceId": "353715d2-f303-402f-b572-9b2e47e7bca7",
"workspaceId": "5ae6b09b-fdec-41af-aaf7-7d94cfc33ef6",
"connectionConfiguration": {
"replication_method": {
"replication_slot": "slot1",
"publication": "pub1"
},
"ssl": false,
"password": "**********",
"username": "postgres",
"database": "postgres",
"port": 2000,
"host": "localhost"
},
"name": "pg-source-with-cdc",
"sourceName": "Postgres"
}
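To make the point concrete: the connectionConfiguration returned above carries no constant field identifying which replication_method option was selected, so a client reading the config back cannot tell the CDC variant apart from the standard one. A minimal Python sketch based on the response above (the "method" key checked here is the hypothetical discriminator discussed later in this thread):

import json

# Relevant fragment of the /api/v1/sources/get response shown above.
config = json.loads("""
{
  "replication_method": {
    "replication_slot": "slot1",
    "publication": "pub1"
  }
}
""")

# No constant field (e.g. a "method" value) identifies the selected oneOf
# variant, so the UI falls back to the first option defined in the spec.
print(config["replication_method"].get("method"))  # -> None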
Based on the File connector, I added one property common to all options.
@marcosmarxm I don't follow the fix you found -- are you saying you changed something in the spec?
Yes, I added the required field:

{
...
"loading_method": {
"type": "object",
"title": "Loading Method",
"description": "Loading method used to send data to Snowflake.",
"order": 7,
"oneOf": [
{
"title": "Standard Inserts",
"additionalProperties": false,
"description": "Uses <pre>INSERT</pre> statements to send batches of records to Snowflake. Easiest (no setup) but not recommended for large production workloads due to slow speed.",
"required": ["method"],
"properties": {
"method": {
"type": "string",
"enum": ["Standard"],
"default": "Standard"
}
}
},
{
"title": "AWS S3 Staging",
"additionalProperties": false,
"description": "Writes large batches of records to a file, uploads the file to S3, then uses <pre>COPY INTO table</pre> to upload the file. Recommended for large production workloads for better speed and scalability.",
"required": [
"method",
"s3_bucket_name",
"access_key_id",
"secret_access_key"
],
"properties": {
"method": {
"type": "string",
"enum": ["S3"],
"default": "S3",
"order": 0
},
"s3_bucket_name": { "required": ["method"],
"properties": {
"method": {
"type": "string",
"enum": ["Standard"],
"default": "Standard"
}
"title": "S3 Bucket Name",
"type": "string",
"description": "The name of the staging S3 bucket. Airbyte will write files to this bucket and read them via <pre>COPY</pre> statements on Snowflake.",
"examples": ["airbyte.staging"],
"order": 1
},
"s3_bucket_region": {
"title": "S3 Bucket Region",
"type": "string",
"default": "",
"description": "The region of the S3 staging bucket to use if utilising a copy strategy.",
"enum": [
"",
"us-east-1",
"us-east-2",
"us-west-1",
"us-west-2",
"af-south-1",
"ap-east-1",
"ap-south-1",
"ap-northeast-1",
"ap-northeast-2",
"ap-northeast-3",
"ap-southeast-1",
"ap-southeast-2",
"ca-central-1",
"cn-north-1",
"cn-northwest-1",
"eu-west-1",
"eu-west-2",
"eu-west-3",
"eu-south-1",
"eu-north-1",
"sa-east-1",
"me-south-1"
],
"order": 2
},
"access_key_id": {
"type": "string",
"description": "The Access Key Id granting allow one to access the above S3 staging bucket. Airbyte requires Read and Write permissions to the given bucket.",
"title": "S3 Key Id",
"airbyte_secret": true,
"order": 3
},
"secret_access_key": {
"type": "string",
"description": "The corresponding secret to the above access key id.",
"title": "S3 Access Key",
"airbyte_secret": true,
"order": 4
}
}
},
{
"title": "GCS Staging",
"additionalProperties": false,
"description": "Writes large batches of records to a file, uploads the file to GCS, then uses <pre>COPY INTO table</pre> to upload the file. Recommended for large production workloads for better speed and scalability.",
"required": ["method", "project_id", "bucket_name", "credentials_json"],
"properties": { "required": ["method"],
"properties": {
"method": {
"type": "string",
"enum": ["Standard"],
"default": "Standard"
}
"method": {
"type": "string",
"enum": ["GCS"],
"default": "GCS",
"order": 0
},
"project_id": {
"title": "GCP Project ID",
"type": "string",
"description": "The name of the GCP project ID for your credentials.",
"examples": ["my-project"],
"order": 1
},
"bucket_name": {
"title": "GCS Bucket Name",
"type": "string",
"description": "The name of the staging GCS bucket. Airbyte will write files to this bucket and read them via <pre>COPY</pre> statements on Snowflake.",
"examples": ["airbyte-staging"],
"order": 2
},
"credentials_json": {
"title": "Google Application Credentials",
"type": "string",
"description": "The contents of the JSON key file that has read/write permissions to the staging GCS bucket. You will separately need to grant bucket access to your Snowflake GCP service account. See the <a href=\"https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys\">GCP docs</a> for more information on how to generate a JSON key for your service account.",
"airbyte_secret": true,
"multiline": true,
"order": 3
}
}
}
]
},
"basic_normalization": {
"type": "boolean",
"default": true,
"description": "Whether or not to normalize the data in the destination. See <a href=\"https://docs.airbyte.io/architecture/basic-normalization\">basic normalization</a> for more details.",
"title": "Basic Normalization",
"examples": [true, false],
"order": 8
}
}
}
}
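A minimal sketch of why the added required method field helps, assuming the Python jsonschema package (the schema below is a trimmed-down copy of the loading_method oneOf above, not the full spec):

from jsonschema import Draft7Validator

# Trimmed-down copy of the amended loading_method oneOf; only the pieces
# needed to show the discriminator are kept.
loading_method_schema = {
    "oneOf": [
        {
            "title": "Standard Inserts",
            "additionalProperties": False,
            "required": ["method"],
            "properties": {"method": {"type": "string", "enum": ["Standard"]}},
        },
        {
            "title": "AWS S3 Staging",
            "additionalProperties": False,
            "required": ["method", "s3_bucket_name", "access_key_id", "secret_access_key"],
            "properties": {
                "method": {"type": "string", "enum": ["S3"]},
                "s3_bucket_name": {"type": "string"},
                "s3_bucket_region": {"type": "string"},
                "access_key_id": {"type": "string"},
                "secret_access_key": {"type": "string"},
            },
        },
    ]
}

# A saved S3 staging configuration now carries a constant "method" value...
saved_config = {
    "method": "S3",
    "s3_bucket_name": "airbyte.staging",
    "access_key_id": "xxx",
    "secret_access_key": "yyy",
}

# ...so it validates against exactly one branch of the oneOf, and a client
# can map it back to the "AWS S3 Staging" option by reading "method".
print(Draft7Validator(loading_method_schema).is_valid(saved_config))  # True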
@marcosmarxm this is intended behaviour from the UI point of view, due to the way we handle oneOf. You can also find more info in this discussion: #962
@marcosmarxm @jamakase I'm going to try to write down my understanding of what's going on here based on reading these comments.
Fixes

Option 1: Add a method field to snowflake like we do in other places (example, example, example) so that the UI can tell these apart.
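As a purely hypothetical sketch (not Airbyte UI code), this is how a client could use such a method field to map a saved configuration back to one of the oneOf options:

# Hypothetical helper: pick the oneOf option whose constant "method"
# value matches the saved configuration.
def find_selected_option(one_of_options, saved_config):
    for option in one_of_options:
        allowed = option.get("properties", {}).get("method", {}).get("enum", [])
        if saved_config.get("method") in allowed:
            return option.get("title")
    # No saved discriminator: the options cannot be told apart, so the UI
    # ends up showing the first one ("Standard Inserts").
    return None

# Reusing loading_method_schema and saved_config from the sketch above:
# find_selected_option(loading_method_schema["oneOf"], saved_config)
# -> "AWS S3 Staging"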
Expected Behavior
I expect that the connector would first stage the ingested data in an S3 location and then copy it into the Snowflake destination when complete.
Current Behavior
When running data ingestion (from MySQL), the data is loaded as if the "Standard Inserts" configuration were in use. Also, selecting the "AWS S3 Staging" configuration doesn't seem to be saved.
More Information
AWS S3 Staging for the Snowflake Destination connector is not currently working. The proper information is added and all connection tests pass, but when running ingestion, instead of the files being staged into S3 first and then copied into Snowflake, the regular "Standard Inserts" method is used. It seems like selecting "AWS S3 Staging" just doesn't stick: if I select it, test the connection (it passes), then leave and go back into the Snowflake destination settings, "Standard Inserts" is back as the default again.
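One way to confirm what was actually persisted is to read the destination configuration back through the API, analogous to the /api/v1/sources/get call earlier in this thread. A sketch assuming the analogous destinations/get endpoint and the Python requests package; the destinationId is a placeholder:

import requests

# Mirrors the /api/v1/sources/get call shown earlier, but for the Snowflake
# destination. Endpoint shape and destinationId are assumptions; substitute
# the values for your own instance.
resp = requests.post(
    "http://localhost:8001/api/v1/destinations/get",
    json={"destinationId": "<your-destination-id>"},
    headers={"Accept": "application/json"},
)
resp.raise_for_status()
loading_method = resp.json()["connectionConfiguration"].get("loading_method")

# If "AWS S3 Staging" was saved, the S3 fields (and, after the spec fix, a
# "method": "S3" value) should appear here; otherwise the configuration
# silently reflects "Standard Inserts".
print(loading_method)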
Steps to Reproduce
Severity of the bug for you
Medium
Airbyte Version
Airbyte version: 0.24.4-alpha
airbyte/destination-snowflake:0.3.7
airbyte/source-mysql:0.3.5