
AWS S3 Staging for Snowflake Destinations not working correctly #3935

Closed
tylerdelange opened this issue Jun 7, 2021 · 7 comments · Fixed by #3960

Comments

@tylerdelange

Expected Behavior

I expect the connector to first stage the ingested data in an S3 location and then copy it into the Snowflake destination when complete.

Current Behavior

When running data ingestion (from MySQL), the data is loaded as if the destination were in the "Standard Inserts" configuration. Selecting the "AWS S3 Staging" configuration also doesn't seem to be saved.

More Information

AWS S3 Staging for the Snowflake destination connector is not currently working. The proper information is entered and all connection tests pass, but when running ingestion, instead of the files being staged into S3 first and then copied into Snowflake, the regular "Standard Inserts" method is used. It seems like selecting "AWS S3 Staging" just doesn't stick: if I select it, test the connection (it passes), then leave and go back into the Snowflake destination settings, "Standard Inserts" is shown as the default again.
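One way to check whether the backend actually kept the S3 settings is to query the config API directly. This is only a sketch: it assumes the API exposes a destinations/get endpoint analogous to the sources/get call shown later in this thread, and the destinationId is a placeholder.

(async () => {
  // Ask the Airbyte config API what it actually stored for this destination.
  const response = await fetch("http://localhost:8001/api/v1/destinations/get", {
    method: "POST",
    headers: { "Accept": "application/json", "Content-Type": "application/json" },
    body: JSON.stringify({ destinationId: "<your-destination-id>" }),
  });
  const destination = await response.json();
  // If loading_method still contains the S3 fields, the backend kept the
  // configuration and only the UI display is wrong.
  console.log(JSON.stringify(destination.connectionConfiguration.loading_method, null, 2));
})();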

Steps to Reproduce

  1. Set Up Snowflake Destination
  2. Select AWS S3 Staging
  3. Save configuration and exit
  4. Go back into Snowflake Destination Settings
  5. Notice that "AWS S3 Staging" has reverted back to "Standard Inserts"

Severity of the bug for you

Medium

Airbyte Version

Airbyte version: 0.24.4-alpha
airbyte/destination-snowflake:0.3.7
airbyte/source-mysql:0.3.5

@tylerdelange tylerdelange added the type/bug Something isn't working label Jun 7, 2021
@sherifnada sherifnada added the area/connectors Connector related issues label Jun 7, 2021
@marcosmarxm
Member

@sherifnada this is not a problem with the connector. I set up a Postgres source with CDC and the error happens there too.
@tylerdelange the source/destination is created correctly, but the UI cannot load the component correctly.
@jamakase fyi

@marcosmarxm
Member

Setting up a new source with CDC:

[screenshot]

Visiting the settings page after setting up the source:

[screenshot]

curl -X POST "http://localhost:8001/api/v1/sources/get" \
 -H "Accept: application/json" \
 -H "Content-Type: application/json" \
 -d '{"sourceId":"353715d2-f303-402f-b572-9b2e47e7bca7"}' 
{
  "sourceDefinitionId": "decd338e-5647-4c0b-adf4-da0e75f5a750",
  "sourceId": "353715d2-f303-402f-b572-9b2e47e7bca7",
  "workspaceId": "5ae6b09b-fdec-41af-aaf7-7d94cfc33ef6",
  "connectionConfiguration": {
    "replication_method": {
      "replication_slot": "slot1",
      "publication": "pub1"
    },
    "ssl": false,
    "password": "**********",
    "username": "postgres",
    "database": "postgres",
    "port": 2000,
    "host": "localhost"
  },
  "name": "pg-source-with-cdc",
  "sourceName": "Postgres"
}

@marcosmarxm marcosmarxm added area/frontend and removed area/connectors Connector related issues labels Jun 8, 2021
@marcosmarxm
Member

marcosmarxm commented Jun 8, 2021

Based on the File connector, I added a property called method that is common to all oneOf options. This solves the problem, but I don't know if it is the best way to solve it. @sherifnada wdyt?
[screenshots]

@sherifnada
Contributor

@marcosmarxm I don't follow the fix you found -- are you saying you changed something in the spec.json that made this work?

@marcosmarxm
Member

marcosmarxm commented Jun 8, 2021

yes, I added the required field method to each one.

{
...
      "loading_method": {
        "type": "object",
        "title": "Loading Method",
        "description": "Loading method used to send data to Snowflake.",
        "order": 7,
        "oneOf": [
          {
            "title": "Standard Inserts",
            "additionalProperties": false,
            "description": "Uses <pre>INSERT</pre> statements to send batches of records to Snowflake. Easiest (no setup) but not recommended for large production workloads due to slow speed.",
            "required": ["method"],
            "properties": {
              "method": {
                "type": "string",
                "enum": ["Standard"],
                "default": "Standard"
              }
            }
          },
          {
            "title": "AWS S3 Staging",
            "additionalProperties": false,
            "description": "Writes large batches of records to a file, uploads the file to S3, then uses <pre>COPY INTO table</pre> to upload the file. Recommended for large production workloads for better speed and scalability.",
            "required": [
              "method",
              "s3_bucket_name",
              "access_key_id",
              "secret_access_key"
            ],
            "properties": {
              "method": {
                "type": "string",
                "enum": ["S3"],
                "default": "S3",
                "order": 0
              },
              "s3_bucket_name": { "required": ["method"],
            "properties": {
              "method": {
                "type": "string",
                "enum": ["Standard"],
                "default": "Standard"
              }
                "title": "S3 Bucket Name",
                "type": "string",
                "description": "The name of the staging S3 bucket. Airbyte will write files to this bucket and read them via <pre>COPY</pre> statements on Snowflake.",
                "examples": ["airbyte.staging"],
                "order": 1
              },
              "s3_bucket_region": {
                "title": "S3 Bucket Region",
                "type": "string",
                "default": "",
                "description": "The region of the S3 staging bucket to use if utilising a copy strategy.",
                "enum": [
                  "",
                  "us-east-1",
                  "us-east-2",
                  "us-west-1",
                  "us-west-2",
                  "af-south-1",
                  "ap-east-1",
                  "ap-south-1",
                  "ap-northeast-1",
                  "ap-northeast-2",
                  "ap-northeast-3",
                  "ap-southeast-1",
                  "ap-southeast-2",
                  "ca-central-1",
                  "cn-north-1",
                  "cn-northwest-1",
                  "eu-west-1",
                  "eu-west-2",
                  "eu-west-3",
                  "eu-south-1",
                  "eu-north-1",
                  "sa-east-1",
                  "me-south-1"
                ],
                "order": 2
              },
              "access_key_id": {
                "type": "string",
                "description": "The Access Key Id granting allow one to access the above S3 staging bucket. Airbyte requires Read and Write permissions to the given bucket.",
                "title": "S3 Key Id",
                "airbyte_secret": true,
                "order": 3
              },
              "secret_access_key": {
                "type": "string",
                "description": "The corresponding secret to the above access key id.",
                "title": "S3 Access Key",
                "airbyte_secret": true,
                "order": 4
              }
            }
          },
          {
            "title": "GCS Staging",
            "additionalProperties": false,
            "description": "Writes large batches of records to a file, uploads the file to GCS, then uses <pre>COPY INTO table</pre> to upload the file. Recommended for large production workloads for better speed and scalability.",
            "required": ["method", "project_id", "bucket_name", "credentials_json"],
            "properties": { "required": ["method"],
            "properties": {
              "method": {
                "type": "string",
                "enum": ["Standard"],
                "default": "Standard"
              }
              "method": {
                "type": "string",
                "enum": ["GCS"],
                "default": "GCS",
                "order": 0
              },
              "project_id": {
                "title": "GCP Project ID",
                "type": "string",
                "description": "The name of the GCP project ID for your credentials.",
                "examples": ["my-project"],
                "order": 1
              },
              "bucket_name": {
                "title": "GCS Bucket Name",
                "type": "string",
                "description": "The name of the staging GCS bucket. Airbyte will write files to this bucket and read them via <pre>COPY</pre> statements on Snowflake.",
                "examples": ["airbyte-staging"],
                "order": 2
              },
              "credentials_json": {
                "title": "Google Application Credentials",
                "type": "string",
                "description": "The contents of the JSON key file that has read/write permissions to the staging GCS bucket. You will separately need to grant bucket access to your Snowflake GCP service account. See the <a href=\"https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys\">GCP docs</a> for more information on how to generate a JSON key for your service account.",
                "airbyte_secret": true,
                "multiline": true,
                "order": 3
              }
            }
          }
        ]
      },
      "basic_normalization": {
        "type": "boolean",
        "default": true,
        "description": "Whether or not to normalize the data in the destination. See <a href=\"https://docs.airbyte.io/architecture/basic-normalization\">basic normalization</a> for more details.",
        "title": "Basic Normalization",
        "examples": [true, false],
        "order": 8
      }
    }
  }
}

@jamakase
Contributor

jamakase commented Jun 8, 2021

@marcosmarxm this is intended behaviour from the UI's point of view, due to the way we handle oneOf. Because the backend doesn't return which of the options was selected, we have to guess the selected path on the frontend: we traverse all the oneOf json_schemas until we find one that matches.

              case "formGroup":
                const selectedValues = get(formValues, subConditionItems.path);

                const subPathSchema = buildYupFormForJsonSchema({
                  type: "object",
                  ...subConditionItems.jsonSchema,
                });

                if (subPathSchema.isValidSync(selectedValues)) {
                  return key;
                }

Because the path with Standard replication is actually just an empty object with no properties, it matches the yup validator. We should probably add a check for unknown fields (yup's noUnknown) to the validation, so that the empty schema only matches a truly empty object.
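A minimal sketch (not the actual frontend code; the config object is made up) of why the empty Standard schema matches an S3 configuration, and how yup's noUnknown combined with strict mode would prevent that:

import * as yup from "yup";

// A saved config for the S3 option, which has no "method" discriminator yet.
const s3Config = { s3_bucket_name: "airbyte.staging", access_key_id: "xxx" };

// The "Standard Inserts" option has no properties, so its schema is an empty object.
const standardSchema = yup.object().shape({});
console.log(standardSchema.isValidSync(s3Config)); // true: any object matches

// Rejecting unknown fields (strict mode keeps them from being stripped during cast)
// makes the empty schema match only a truly empty object.
const strictStandardSchema = yup.object().shape({}).noUnknown().strict(true);
console.log(strictStandardSchema.isValidSync(s3Config)); // false
console.log(strictStandardSchema.isValidSync({})); // true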

You can also find more info in this discussion #962

@cgardens
Contributor

cgardens commented Jun 8, 2021

@marcosmarxm @jamakase I'm going to try to write down my understanding of what's going on here based on reading these comments.

  1. If a user configures snowflake to use S3 upload and saves that configuration, it works successfully.
  2. If they then reload the page, the UI will make it look like that connector is configured for Standard (not S3 upload). The UI in this case is just not reflecting the truth. The connector is configured as expected, but the UI makes it look like it is not.

Fixes

Option 1

Add a method field to the Snowflake spec like we do in other places (example, example, example) so that the UI can tell these apart.

Option 2

Set additionalProperties: false for the standard section

Option 3

On the frontend, the yup check should not match if there are any unknown fields. @jamakase is this easy to do?

Is that all true? Or am I missing something? Assuming this is correct, @marcosmarxm can you do option 1 and @jamakase can you hit option 3? Seems like both things make sense to knock out.
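For illustration only (the values are made up), a saved configuration carrying the method discriminator from option 1 lets the UI select the matching oneOf path directly instead of guessing by validation:

// Hypothetical connectionConfiguration returned by the API once every oneOf
// option in the spec carries a "method" discriminator (option 1).
const connectionConfiguration = {
  loading_method: {
    method: "S3", // matches the enum of exactly one oneOf option
    s3_bucket_name: "airbyte.staging",
    s3_bucket_region: "us-east-1",
    access_key_id: "**********",
    secret_access_key: "**********",
  },
};

// The frontend can now pick the oneOf path whose "method" enum contains this
// value rather than validating the object against every sub-schema.
const selectedPath = connectionConfiguration.loading_method.method; // "S3"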

As a follow-up, this seems related to #3385. I will bump the priority of resolving that issue, which should get at the root cause. In the meantime we should figure out how to fix this instance (and identify any others that might have the same problem).
