`Client.insert_rows_json()`: add option to disable best-effort deduplication #720
Comments
I'll double-check why this is the case, but at a glance this request sounds reasonable. 👍
I recall that you can explicitly pass in a list of `None` values as the row IDs. It might be useful to have a more discoverable way to do this, though.
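The workaround above would presumably look something like the following sketch. The `row_ids` parameter name and the commented-out client call are assumptions about the API shape, not verified against a specific library version:

```python
# Sketch: suppress insert-ID generation by passing an explicit None per row.
# Assumes insert_rows_json() accepts a row_ids sequence with one entry per row.
rows = [
    {"name": "alice", "score": 10},
    {"name": "bob", "score": 20},
]
row_ids = [None] * len(rows)  # one None per row -> no insertId attached

# The actual call would need real credentials and a real table:
# from google.cloud import bigquery
# client = bigquery.Client()
# errors = client.insert_rows_json("project.dataset.table", rows, row_ids=row_ids)
```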
It seems to work. However, I don't see the point in allocating memory just to generate a list of `None`s.
A memory-efficient alternative would be to provide a fake sequence-ish object that returns `None` for any index:

```python
class NoneItems:
    def __getitem__(self, index):
        return None
```

```python
>>> insert_ids = NoneItems()
>>> assert insert_ids[0] is None
>>> assert insert_ids[1] is None
>>> assert insert_ids[42] is None
```

A bit hacky, but it should be a good enough workaround until more user-friendly support is added. :)
Update: This was confirmed; we'll add support for this in a more user-friendly way.

Amazing, thanks!
Currently, the `Client.insert_rows_json()` method for streaming inserts always attaches an `insertId` unique identifier to each row provided. This row identifier can be user-provided; if the user doesn't provide any identifiers, the library automatically fills in the row IDs using UUID4.
Here's the code:
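The referenced snippet is not reproduced here. As a rough sketch only (not the library's actual implementation), the UUID4 auto-fill behavior described above could look like:

```python
import uuid


def fill_insert_ids(json_rows, row_ids=None):
    """Sketch of the described behavior: if the caller supplies no row IDs,
    generate a UUID4 string for every row. Hypothetical helper, not the
    google-cloud-bigquery source code."""
    if row_ids is None:
        row_ids = [str(uuid.uuid4()) for _ in json_rows]
    return row_ids


ids = fill_insert_ids([{"a": 1}, {"a": 2}])
```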
However, insert IDs are entirely optional, and there are valid use cases for not using them. From the BigQuery documentation:
The BigQuery Python client library provides no way of omitting the `insertId`s. It would be nice to have a parameter for that.