Destination BigQuery (normalization): Investigate "pickling client objects is not supported" error #17327
Comments
Hi, I am getting this exact error while trying to connect HubSpot to BigQuery for the first time. Specifically, I get errors that look like this for any tables that I try to transfer, whether contacts, campaigns, emails, etc.
When we sync with raw JSON instead of tabular normalization, the records do transfer to BigQuery as JSON entries in the resulting table. I am new to Airbyte, so any suggestions in the right direction are welcome.
Can you post the full log file from the sync? I was looking into this recently, and it looked like the source actually failed, which led to some
Hi @edgao, please find attached the error log: huspot_bq_errorlog.txt
Interesting - so it sounds like e.g. That's certainly different behavior from what I ran into. Doesn't seem like a dataset ID mismatch either (destination and normalization both referenced the
We're getting the same error on two very different connections: PostgreSQL CDC -> BigQuery and Stripe -> BigQuery. What's interesting is that identical connections run just fine when not using the GCS staging area. Previously we hadn't used the staging area at all, so it might be related to the staging bucket or service account configuration. While I won't be comfortable sharing the entire log, I could share the parts of it that are relevant, if investigation is needed. For example, I've seen this happening a lot in the log (but not with all streams):
^ Update, it was indeed a misconfiguration. There were two service accounts involved:
The destination was configured to use the default account (no JSON uploaded) and the HMAC key of the bucket account. I figure this means the data could be written to the bucket but couldn't be loaded into BigQuery. I fixed this by adding the bucket account's JSON to the destination and updating the permissions of that account to also have access to BigQuery. So, in this case, the documentation for the connector was a bit ambiguous and should probably be more explicit about generating HMAC keys in different scenarios. My gut feeling was not to add more permissions to the default account, so I created a new one.
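For anyone hitting the same split, here is a rough sketch (the bucket name and HMAC values below are placeholders, not from this thread) of checking just the staging-bucket half of the setup. The HMAC key only authenticates against GCS's S3-compatible interoperability endpoint, so a successful write there says nothing about whether the account running the BigQuery load has access.

```python
# Rough sketch: verify the HMAC key can write to the staging bucket.
# Bucket name and HMAC credentials below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",  # GCS S3-compatible interop endpoint
    aws_access_key_id="GOOG1E_EXAMPLE_ACCESS_ID",   # HMAC access ID of the bucket account
    aws_secret_access_key="EXAMPLE_HMAC_SECRET",    # HMAC secret of the bucket account
)
s3.put_object(Bucket="airbyte-staging-bucket", Key="airbyte-probe.txt", Body=b"ok")
print("staging bucket write ok")

# A write succeeding here only proves the HMAC half of the config; the account
# that runs the BigQuery load job still needs permission to read this object
# and to write to the target dataset.
```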
@killthekitten nice debugging, and thanks for the detailed report! There's also an issue open to improve this. Just to be super explicit - once you solved the permissions problems, your syncs were able to run successfully?
@edgao that's right, the sync completed correctly and the errors are no longer present in the log. Thanks for referencing the check discussion; that one actually helped me diagnose the issue, I just forgot to mention it.
Got it. I think the general statement is that the service account needs to have both BigQuery write and GCS read permissions - if GCP didn't require GCS read permissions, then malicious actors could circumvent GCS permissioning by running a BigQuery load operation. I'll update our docs to make that more explicit.
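To make that concrete, here is a hedged sketch (bucket, object, dataset, and table names are placeholders, and the Avro format is only illustrative) that exercises both halves with a single service-account JSON: reading the staged object from GCS and loading it into BigQuery under the same credentials. If either permission is missing, the corresponding step fails.

```python
# Sketch: check that one service account can both read the staged file from
# GCS and load it into BigQuery. All names below are placeholders.
from google.cloud import bigquery, storage
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file("airbyte-sa.json")

# 1. GCS read: can this account see the staged object?
gcs = storage.Client(credentials=creds, project=creds.project_id)
blob = gcs.bucket("airbyte-staging-bucket").blob("staging/records.avro")
print("GCS read ok:", blob.exists())

# 2. BigQuery write: can the same account load that object into the dataset?
bq = bigquery.Client(credentials=creds, project=creds.project_id)
job = bq.load_table_from_uri(
    "gs://airbyte-staging-bucket/staging/records.avro",
    "my_dataset.my_table",
    job_config=bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO),
)
job.result()  # raises if BigQuery write or GCS read permission is missing
print("BigQuery load ok")
```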
@edgao agreed! The confusing part is that service accounts are mentioned in multiple places, in different contexts. In the prerequisites section:
In the storage bucket guide:
And in the connector setup guide:
The bit from the storage bucket guide is especially confusing (and probably dangerous), because when I followed the Create HMAC key and access ID guide, I actually created a second service account at the end of the GCS interoperability settings flow. It didn't occur to me that I could specify the service account already used by Airbyte (we run our instance on GCP). I'd say the doc needs a bit of restructuring and deserves a separate section about service accounts and their possible combinations. The reader might have either one or two service accounts configured, and whether or not they've attached the JSON in the destination settings defines how that combination behaves.
@edgao I did create the dataset ID, but not the tables. I assumed that Airbyte would create the required tables from the JSON files. Is this assumption wrong?
Just wanted to add another data point here. Was working with a user that ran into this issue as well. Their logs are attached. Logs:
@edgao I was able to resolve my issue with the tabular normalization today. I am not sure what really changed, but I created a new destination connection to BigQuery and made sure that my destination settings in BQ and in Airbyte were the same (for example, both have the location set to us-central1); just picking US in Airbyte is going to throw a destination error. Also, the GCS bucket name and bucket path were confusing for me in the beginning, but I ended up not creating a folder in the bucket, so the bucket name and the bucket path are the same this time around. Finally, I got a new service account key JSON for the bucket project I created. I had done all of this in the past, so I cannot point out specifically what solved the issue. I started out with just 2 streams (raw JSON), and when that sync succeeded, I then tried the normalization and it worked. If I figure out what caused the sync to work this time around, I will be sure to return and share my findings. Thank you @edgao for taking the time to read the error log. Regards,
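Side note on the location mismatch mentioned above: a quick sketch (project ID, dataset ID, and expected region are placeholders) for checking whether the dataset's actual location matches what the Airbyte destination is configured to use.

```python
# Sketch: compare the BigQuery dataset's real location with the region the
# Airbyte destination expects (e.g. "us-central1" vs the "US" multi-region).
# Project and dataset IDs are placeholders.
from google.cloud import bigquery

EXPECTED_LOCATION = "us-central1"  # what the Airbyte destination is set to

client = bigquery.Client(project="my-project")
dataset = client.get_dataset("my_dataset")
print("actual dataset location:", dataset.location)

if dataset.location.lower() != EXPECTED_LOCATION.lower():
    print(f"Mismatch: destination expects {EXPECTED_LOCATION}, "
          f"but the dataset lives in {dataset.location}")
```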
I get the same error, "Pickling client objects is explicitly not supported.", moving data from PostgreSQL to BigQuery in Standard Mode.
+1
Using
We've released BigQuery under Destinations V2, which replaces normalization. DV2 doesn't use dbt under the hood, so we'll never see this error again. Closing this issue.
@edgao I'm on the latest BigQuery destination connector (2.3.16) and still get the "pickling client objects is not supported" error on quite a few of my source connectors... Is it still in the process of being released, or does it only work with some sources? (I'm using Airbyte Open Source, btw.)
That's surprising - normalization shouldn't launch at all with BigQuery 2.x. What version of the platform are you using? DV2 destinations require platform version 0.50.24+.
I just upgraded to 0.50.34
@edgao Here are some example logs from Postgres to BigQuery, both on the latest versions of their connectors too:
I realize now that the reason this was failing is that the "Normalized Tabular Data" option was checked. Switching to "Raw data (JSON)" fixes the issue AND results in normalized tables as needed, as long as I delete the schema and do a hard reset (even though the settings are a bit confusing). So, user error on my part.
Interesting - I'd thought we turned off the ability to use normalization with BigQuery 2.x. Glad you figured it out, but I've got some more investigation to do now :) (So... user error, sort of, but ideally we wouldn't give you the option at all.)
see https://github.com/airbytehq/oncall/issues/417
Normalization fails with this error:
Enabling GCS staging on the BigQuery destination config apparently helps, but it's unclear why that would change anything. Possibly it's actually a permissions problem, since they also switched the dataset ID.
my original theorizing: (https://github.com/airbytehq/oncall/issues/417#issuecomment-1226588642)
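For background on the message itself: it originates in the google-cloud-bigquery Python client (which dbt's BigQuery adapter builds on), whose base client class refuses to be pickled by design. A minimal sketch of how the error surfaces, assuming that library and using placeholder credentials:

```python
# Minimal sketch of where "Pickling client objects is explicitly not supported."
# comes from: the google-cloud client base class raises PicklingError when any
# code path (e.g. handing work to another process) tries to serialize a Client.
import pickle

from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery

# Anonymous credentials and a placeholder project ID, just so the client can be
# constructed without real GCP auth.
client = bigquery.Client(project="my-project", credentials=AnonymousCredentials())

try:
    pickle.dumps(client)
except pickle.PicklingError as exc:
    print(exc)  # "Pickling client objects is explicitly not supported. ..."
```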