BigQuery: Challenged inserting data into BQ tables #4709

Closed
DazWilkin opened this issue Jan 5, 2018 · 11 comments

@DazWilkin commented Jan 5, 2018

I'm writing a script with the Python library and need to programmatically populate tables with data. I'm reviewing the documentation but am unable to get the examples to work:

https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#tables

python --version
Python 2.7.13

pip --version
pip 9.0.1

virtualenv --version
15.1.0

requirements.txt:

google-cloud-bigquery==0.28.0
six==1.11.0

I'm running in a virtualenv.

I'm able to connect a client to a project, enumerate datasets, set dataset expiration, create/enumerate/delete tables and set table expiry. I'm unable to insert data into the tables.
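
For context, a minimal sketch of the kind of calls that do work for me (project, dataset, table, and schema names below are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client(project='my-project')  # connect a client to a project

    for dataset in client.list_datasets():          # enumerate datasets
        print(dataset.dataset_id)

    dataset_ref = client.dataset('my_dataset')
    table_ref = dataset_ref.table('my_table')
    schema = [
        bigquery.SchemaField('full_name', 'STRING'),
        bigquery.SchemaField('age', 'INTEGER'),
    ]
    table = client.create_table(bigquery.Table(table_ref, schema=schema))  # create a table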

1. client.insert_rows

Trying the code from the docs does not work for me:

    ROW_DATA = [
        (u'Phred Phlyntstone', 32),
        (u'Wylma Phlyntstone', 29),
    ]

    errors = client.insert_rows(table, ROW_DATA)  # API request

I get

errors = client.insert_rows(table, ROW_DATA)
AttributeError: 'Client' object has no attribute 'insert_rows'

From:

def insert_test_data_old(dataset_name, table_name):
    dataset_ref = client.dataset(dataset_name)
    table_ref = dataset_ref.table(table_name)
    table = client.get_table(table_ref)
    errors = client.insert_rows(table, ROW_DATA)

2. client.load_table_from_file

So I then tried load_table_from_file, and this works for the first table only. I've tried iterating both outside and inside the job, but I receive:

resumable_media/_upload.py", line 410, in _prepare_initiate_request
    raise ValueError(u'Stream must be at beginning.')
ValueError: Stream must be at beginning.

I tried:

def insert_test_data(dataset_name, table_names):
    dataset = client.dataset(dataset_name)
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = "CSV"
    job_config.skip_leading_rows = 1
    for table_name in table_names:
        table = dataset.table(table_name)
        job = client.load_table_from_file(
            CSV_FILE,
            table,
            job_config=job_config
        )
    job.result()

and

def insert_test_data(dataset_name, table_name):
    dataset = client.dataset(dataset_name)
    table = dataset.table(table_name)
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = "CSV"
    job_config.skip_leading_rows = 1
    job = client.load_table_from_file(
        CSV_FILE,
        table,
        job_config=job_config
    )
    job.result()
@dhermes (Contributor) commented Jan 5, 2018

The

ValueError: Stream must be at beginning.

comes from google-resumable-media and may be caused by

job_config.skip_leading_rows = 1

@tswast Can you confirm?

As for insert_rows, that is in the not-yet-released 0.29.0 (we pushed a tag but for some reason it never triggered a build and no one followed up).
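
For reference, once 0.29.0 is installed, the snippet from the question should work roughly like this (dataset and table names are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client()

    ROW_DATA = [
        (u'Phred Phlyntstone', 32),
        (u'Wylma Phlyntstone', 29),
    ]

    table_ref = client.dataset('my_dataset').table('my_table')
    table = client.get_table(table_ref)           # fetch the table and its schema
    errors = client.insert_rows(table, ROW_DATA)  # streaming insert; returns per-row errors
    assert errors == []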

@tswast (Contributor) commented Jan 6, 2018

job_config.skip_leading_rows = 1 is a server-side configuration. It shouldn't affect google-resumable-media. More likely, the file object is being reused and hasn't been seeked back to the beginning.

Note: You don't actually have to create a physical file to use a load job. You can just as easily use StringIO. See: #4553 (comment)

We have #4553 open for making this easier.
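
A minimal sketch of both suggestions, assuming CSV_FILE was a single file object reused across tables: either rewind it with seek(0) before each load, or build a fresh in-memory stream per load job (the CSV contents below are placeholders):

    import io

    from google.cloud import bigquery

    client = bigquery.Client()
    CSV_DATA = u'full_name,age\nPhred Phlyntstone,32\nWylma Phlyntstone,29\n'

    def insert_test_data(dataset_name, table_names):
        dataset = client.dataset(dataset_name)
        job_config = bigquery.LoadJobConfig()
        job_config.source_format = "CSV"
        job_config.skip_leading_rows = 1
        for table_name in table_names:
            table = dataset.table(table_name)
            # A fresh stream per load job starts at position 0, which avoids
            # "ValueError: Stream must be at beginning." (equivalently, call
            # CSV_FILE.seek(0) before each load_table_from_file call).
            stream = io.BytesIO(CSV_DATA.encode('utf-8'))
            job = client.load_table_from_file(stream, table, job_config=job_config)
            job.result()  # wait for each load to finish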

@DazWilkin (Author)

Thank you!

I'll try with StringIO.

@tseaver (Contributor) commented Jan 8, 2018

As for insert_rows, that is in the not-yet-released 0.29.0 (we pushed a tag but for some reason it never triggered a build and no one followed up).

I just retriggered the post-tag build, which failed due to error_reporting (??)

@dhermes (Contributor) commented Jan 8, 2018

@tseaver The push to PyPI will only happen on a tag build, the build you linked to is a regular master (i.e. "push") build.

@tseaver (Contributor) commented Jan 8, 2018

@dhermes Hmmm. I don't see a tag build at all.

@dhermes (Contributor) commented Jan 8, 2018

That's correct

we pushed a tag but for some reason it never triggered a build and no one followed up

@tseaver (Contributor) commented Jan 8, 2018

I've got no clue how to force CircleCI to run the build -- delete and re-push the tag?

@tseaver (Contributor) commented Jan 8, 2018

Or I can just push the release to PyPI manually.

@dhermes (Contributor) commented Jan 8, 2018

I think a manual push to PyPI is fine.

@tseaver (Contributor) commented Jan 8, 2018

OK, 0.29.0 release pushed manually.

@chemelnucfin added the 'api: bigquery' and 'type: question' labels on Jan 9, 2018
@tswast closed this as completed on Jan 16, 2018