Add functions equivalent to create_rows and create_rows_json that create a table for you using a load job #4553

Closed
nanodan opened this issue Dec 7, 2017 · 15 comments · Fixed by #9076

nanodan commented Dec 7, 2017

  1. BigQuery API
  2. Linux, Ubuntu 16.04.3 LTS running Google Cloud Datalab 1.0.1
  3. Python 2.7.12
  4. Google cloud v 0.31.0
  5. //
  6. Create a dataframe or dict list and try to send it to an existing table. The first request raises no errors, but nothing is appended to the table. Running the same command again, the rows are added.

This will not add rows:

from google.cloud import bigquery as bq
import time

client = bq.Client(project='project-name')
dataset = bq.DatasetReference('project-name', 'dataset-name')
tableref = bq.table.TableReference(dataset, 'users')
schema = [
    bq.SchemaField('email', 'STRING'),
    bq.SchemaField('id', 'STRING'),
    bq.SchemaField('added', 'TIMESTAMP'),
]
table = bq.Table(tableref, schema=schema)
create = client.create_table(table)

public_members = [{'email': '[email protected]', 'id': '1234', 'added': '2017-12-01 13:13:13 UTC'}]
# Returns without error, but no row ever appears in the table.
client.create_rows_json(table, public_members)

This will add exactly one row:

from google.cloud import bigquery as bq
import time

client = bq.Client(project='project-name')
dataset = bq.DatasetReference('project-name', 'dataset-name')
tableref = bq.table.TableReference(dataset, 'users')
schema = [
    bq.SchemaField('email', 'STRING'),
    bq.SchemaField('id', 'STRING'),
    bq.SchemaField('added', 'TIMESTAMP'),
]
table = bq.Table(tableref, schema=schema)
create = client.create_table(table)

public_members = [{'email': '[email protected]', 'id': '1234', 'added': '2017-12-01 13:13:13 UTC'}]
# First request, made right after table creation: silently dropped.
client.create_rows_json(table, public_members)
time.sleep(15)
# Second request after a short wait: this one adds the row.
client.create_rows_json(table, public_members)
@theacodes theacodes added the api: bigquery and type: bug labels Dec 7, 2017
theacodes commented:

@tswast any ideas on this one?


tswast commented Dec 7, 2017

I believe this is a backend issue: the streaming buffer takes a little while to become available after the table create call.

According to https://stackoverflow.com/a/41446002/101923

If you delete or create a table, you must wait at least 2 minutes to start streaming data on it.

This may be due to the fact that the streaming buffer has a cache of table information, which gets refreshed every 60 seconds.

Closing since this isn't a client library issue.

@tswast tswast closed this as completed Dec 7, 2017
theacodes commented:

@tswast is the recommended alternative to just create tables ahead of time?


tswast commented Dec 7, 2017

If you know what rows you want to insert at table creation time, I recommend using the Client.load_table_from_file() method to insert the rows using a StringIO object as the file.

For example:

from google.cloud.bigquery import LoadJobConfig
from six import StringIO

destination_table = client.dataset(dataset_id).table(table_id)
job_config = LoadJobConfig()
job_config.write_disposition = 'WRITE_APPEND'
job_config.source_format = 'NEWLINE_DELIMITED_JSON'
rows = []

# Serialize each DataFrame row to a JSON line.
for _, row in maybe_a_dataframe.iterrows():
    row_json = row.to_json(force_ascii=False, date_unit='s', date_format='iso')
    rows.append(row_json)

body = StringIO('{}\n'.format('\n'.join(rows)))

client.load_table_from_file(
    body,
    destination_table,
    job_config=job_config).result()

@tswast tswast reopened this Dec 7, 2017

tswast commented Dec 7, 2017

Reopening, because we could probably help out in this case by adding something like create_rows that also creates the table for you, using a load job like the example above.
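
For illustration, a rough sketch of what such a helper could look like, built on load_table_from_file(); the function name and signature here are hypothetical, not part of the library:

import json

from google.cloud import bigquery
from six import StringIO


def insert_rows_json_via_load(client, destination, rows, schema=None):
    # Hypothetical helper: append JSON rows with a load job, creating the table if needed.
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = 'NEWLINE_DELIMITED_JSON'
    job_config.write_disposition = 'WRITE_APPEND'
    if schema is not None:
        # The load job creates the destination table from this schema if it does not exist.
        job_config.schema = schema
    else:
        # Otherwise let BigQuery infer the schema from the JSON rows.
        job_config.autodetect = True
    body = StringIO('\n'.join(json.dumps(row) for row in rows) + '\n')
    load_job = client.load_table_from_file(body, destination, job_config=job_config)
    return load_job.result()  # Block until the load job finishes.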

@tswast tswast changed the title from "create_rows and create_rows_json only work if I send the request twice with a delay" to "Add functions equivalent to create_rows and create_rows_json that create a table for you using a load job" Dec 7, 2017
@tswast tswast added the type: feature request and priority: p2 labels and removed the type: bug label Dec 7, 2017

tswast commented Dec 7, 2017

If we do implement such a method, we should probably encode as Avro, since Avro is supposedly 10x faster to load than JSON or CSV.
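
For reference, a rough sketch of what an Avro-encoded load might look like, using the third-party fastavro package (an assumption here, not something this library requires) and string columns only; client and destination_table are assumed to be set up as in the earlier example:

import io

from fastavro import parse_schema, writer
from google.cloud import bigquery

# Avro schema for a simple two-column table (illustrative only).
avro_schema = parse_schema({
    'type': 'record',
    'name': 'User',
    'fields': [
        {'name': 'email', 'type': 'string'},
        {'name': 'id', 'type': 'string'},
    ],
})
records = [{'email': 'user@example.com', 'id': '1234'}]

# Serialize the rows to an in-memory Avro file.
body = io.BytesIO()
writer(body, avro_schema, records)
body.seek(0)

job_config = bigquery.LoadJobConfig()
job_config.source_format = 'AVRO'
job_config.write_disposition = 'WRITE_APPEND'

client.load_table_from_file(body, destination_table, job_config=job_config).result()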


nanodan commented Dec 7, 2017

So I dropped the immediate push-after-create step to see whether the "error" would reproduce after a wait, as suggested.

I created the table (and verified that it was indeed created), then slept for 4 minutes. After that I sent the request and waited another 4 minutes before checking the table; there was a row in the streaming buffer, so it seems you are correct that this is not a bug.

This would still be a nice feature, though, as I sometimes have to programmatically create tables on the fly from large JSON files (user lists and the like). But now that I know I have to wait, I can simply add a delay in the code.
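
For anyone else landing here, a minimal sketch of that delay workaround, using the create_rows_json API from this version of the client; the 2-minute wait follows the StackOverflow answer quoted above, and the project, dataset, and row values are placeholders:

import time

from google.cloud import bigquery as bq

client = bq.Client(project='project-name')
dataset = bq.DatasetReference('project-name', 'dataset-name')
table = bq.Table(
    bq.table.TableReference(dataset, 'users'),
    schema=[
        bq.SchemaField('email', 'STRING'),
        bq.SchemaField('id', 'STRING'),
        bq.SchemaField('added', 'TIMESTAMP'),
    ],
)
client.create_table(table)

# Give the new table's streaming buffer time to become available.
time.sleep(120)

rows = [{'email': 'user@example.com', 'id': '1234', 'added': '2017-12-01 13:13:13 UTC'}]
errors = client.create_rows_json(table, rows)
assert not errors  # create_rows_json returns a list of per-row insert errors.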


tswast commented Dec 7, 2017

It's a little clunky, but if you use client.load_table_from_file() with a StringIO object, you can avoid having to wait.


tseaver commented Jun 19, 2018

@tswast Does this issue need to remain open?


tswast commented Jun 19, 2018

Yeah, we haven't completed this feature request. It is on @alixhami's list of OKRs to tackle.


tswast commented Jun 19, 2018

Actually, @alixhami implemented load_table_from_dataframe(), which covers a similar use case, but it uploads a pandas DataFrame rather than JSON or JSON-like rows.
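
For completeness, a rough usage sketch of that method; it assumes pandas (and, depending on the client version, pyarrow) is installed, and that client and destination_table are defined as in the earlier example:

import pandas

# Placeholder data; in practice this comes from whatever produced your rows.
dataframe = pandas.DataFrame([{'email': 'user@example.com', 'id': '1234'}])

# Runs a load job; CREATE_IF_NEEDED is the default, so the destination table
# is created automatically if it does not already exist.
load_job = client.load_table_from_dataframe(dataframe, destination_table)
load_job.result()  # Wait for the load job to finish.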

@tseaver tseaver removed the "🚨 This issue needs some love." and priority: p2 labels Jul 20, 2018

tseaver commented Oct 11, 2018

@tswast Should this item remain open here, or are we tracking it somewhere in a feature backlog?


tswast commented Oct 11, 2018

@tseaver We can add it to a feature request backlog.


plamut commented Aug 15, 2019

Posting for better visibility - when this is implemented, use it instead of _add_rows(), a similar helper method in system tests - #8992 (comment)

smarquezs commented:

(quoting tswast's load_table_from_file() example above)

It worked fine for me, thanks :)
