Insert to Partition #2729

Closed
stealthcode opened this issue Nov 12, 2016 · 2 comments
Labels
api: bigquery Issues related to the BigQuery API. priority: p2 Moderately-important priority. Fix may not be included in next release.

Comments

@stealthcode

Could you add a parameter to the Table.insert_data method that accepts a partition label?

I'm trying to write a Python app that copies data from one partition of a table to another. My BigQuery schema has a timestamp field that I want to query on, but a row's partition does not always match the value in this field: some entries with _PARTITIONTIME 2016-11-11 actually have a timestamp of 2016-11-10. This is usually caused by application delay, latency, or BigQuery outages such as the one on Tuesday.
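To illustrate the mismatch described above, here is a minimal sketch that groups rows by the partition their timestamp field implies. The helper name `group_by_partition`, the tuple row layout, and the `ts_index` parameter are hypothetical and not part of any library; the sketch also assumes that naive datetimes are already in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_by_partition(rows, ts_index=0):
    """Group rows by the YYYYMMDD partition suffix implied by their timestamp.

    Hypothetical helper: `rows` is a list of tuples and `ts_index` is the
    position of the datetime value in each tuple. Naive datetimes are
    assumed to already be in UTC.
    """
    groups = defaultdict(list)
    for row in rows:
        ts = row[ts_index]
        if ts.tzinfo is not None:
            # Normalize timezone-aware values to UTC before taking the date.
            ts = ts.astimezone(timezone.utc)
        groups[ts.strftime('%Y%m%d')].append(row)
    return dict(groups)


rows = [
    (datetime(2016, 11, 10, 23, 59), 'late arrival'),
    (datetime(2016, 11, 11, 0, 1), 'on time'),
]
by_partition = group_by_partition(rows)
```

Each key of `by_partition` is a suffix that could follow the `$` in a partition decorator, so rows that landed in the wrong day's partition can be identified before they are copied.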

Using the bq command-line tool, I can use table decorators to target a partition directly. In the shell example below, --destination_table targets a partition, and the query limits its SELECT using a similar partition decorator.

bq --project foo-dev query --allow_large_results --destination_table 'my_ds.table_name$20161110' --noflatten_results --append_table 'SELECT * FROM [my_ds.table_name$20161111] WHERE timestamp BETWEEN TIMESTAMP("2016-11-10") AND TIMESTAMP("2016-11-11")'

However, in this API it's strange that I have to create two table instances: one to address the (unpartitioned) table and one to insert into a partition. See below.

table = dataset.table(TABLE_NAME)
assert not table.exists()
table.create(client=client)
table.insert_data(rows, client=client)  # will only insert to the current partition
# Second Table instance whose name carries the partition decorator
partition = dataset.table(TABLE_NAME + '$20161111')
partition.insert_data(rows=rows, client=client)  # should insert to the 20161111 partition

The Table.insert_data source code seems to just pass the table name to the underlying REST API. So, assuming the REST API accepts and honors the table partition decorator, the code above should insert as I expect. The API would be easier to use for my use case if insert_data handled a partition label when one is provided at the time of insert.
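As a rough sketch of what handling a partition label could look like on the caller's side, the helper below builds a decorated table name from a base name and a date. The name `partition_table_id` is hypothetical and is not part of google-cloud-python; whether the resulting name is honored on insert depends on the assumption stated above, that the REST API accepts the decorator.

```python
from datetime import date


def partition_table_id(table_name, day):
    """Return `table_name` with a `$YYYYMMDD` partition decorator appended.

    Hypothetical helper, not a library function. `day` is a datetime.date
    (or datetime.datetime) identifying the target partition.
    """
    if '$' in table_name:
        # Refuse names that already carry a decorator.
        raise ValueError('table name already has a partition decorator')
    return '{}${:%Y%m%d}'.format(table_name, day)


decorated = partition_table_id('table_name', date(2016, 11, 11))
```

The result could then be passed to `dataset.table(...)` in place of the hand-built string in the example above, which keeps the date formatting in one place.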

@dhermes dhermes added the api: bigquery Issues related to the BigQuery API. label Nov 12, 2016
@viktort

viktort commented Mar 5, 2017

Where is this at? Sounds like an awesome addition to the Python SDK.

@lukesneeringer lukesneeringer added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Apr 19, 2017
@lukesneeringer
Contributor

Hello,
One of the challenges of maintaining a large open source project is that sometimes, you can bite off more than you can chew. As the lead maintainer of google-cloud-python, I can definitely say that I have let the issues here pile up.

As part of trying to get things under control (as well as to empower us to provide better customer service in the future), I am declaring a "bankruptcy" of sorts on many of the old issues, especially those likely to have been addressed or made obsolete by more recent updates.

My goal is to close stale issues whose relevance or solution is no longer immediately evident, and which appear to be of lower importance. I believe in good faith that this is one of those issues, but I am scanning quickly and may occasionally be wrong. If this is an issue of high importance, please comment here and we will reconsider. If this is an issue whose solution is trivial, please consider providing a pull request.

Thank you!
