BigQuery: Challenged inserting data into BQ tables #4709

Closed
DazWilkin opened this issue Jan 5, 2018 · 11 comments

@DazWilkin commented Jan 5, 2018

I'm writing a script with the Python library and need to programmatically populate tables with data. I'm reviewing the documentation but am unable to get the examples to work:

https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#tables

python --version
Python 2.7.13

pip --version
pip 9.0.1

virtualenv --version
15.1.0

requirements.txt:

google-cloud-bigquery==0.28.0
six==1.11.0

I'm running in a virtualenv.

I'm able to connect a client to a project, enumerate datasets, set dataset expiration, create/enumerate/delete tables and set table expiry. I'm unable to insert data into the tables.
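
For context, a minimal sketch of the kind of calls that do work for me (project, dataset, table, and schema names below are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client(project='my-project')  # connect a client to a project

    for dataset in client.list_datasets():          # enumerate datasets
        print(dataset.dataset_id)

    dataset_ref = client.dataset('my_dataset')
    table_ref = dataset_ref.table('my_table')
    schema = [
        bigquery.SchemaField('full_name', 'STRING'),
        bigquery.SchemaField('age', 'INTEGER'),
    ]
    table = client.create_table(bigquery.Table(table_ref, schema=schema))  # create a table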

1. client.insert_rows

Trying the code from the docs does not work for me:

    ROW_DATA = [
        (u'Phred Phlyntstone', 32),
        (u'Wylma Phlyntstone', 29),
    ]

    errors = client.insert_rows(table, ROW_DATA)  # API request

I get

errors = client.insert_rows(table, ROW_DATA)
AttributeError: 'Client' object has no attribute 'insert_rows'

From:

def insert_test_data_old(dataset_name, table_name):
    dataset_ref = client.dataset(dataset_name)
    table_ref = dataset_ref.table(table_name)
    table = client.get_table(table_ref)
    errors = client.insert_rows(table, ROW_DATA)

2. client.load_table_from_file

So I then tried load_table_from_file, and this works for the first table only. I've tried iterating both outside and inside the job, but I receive:

resumable_media/_upload.py", line 410, in _prepare_initiate_request
    raise ValueError(u'Stream must be at beginning.')
ValueError: Stream must be at beginning.

I tried:

def insert_test_data(dataset_name, table_names):
    dataset = client.dataset(dataset_name)
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = "CSV"
    job_config.skip_leading_rows = 1
    for table_name in table_names:
        table = dataset.table(table_name)
        job = client.load_table_from_file(
            CSV_FILE,
            table,
            job_config=job_config
        )
    job.result()

and

def insert_test_data(dataset_name, table_name):
    dataset = client.dataset(dataset_name)
    table = dataset.table(table_name)
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = "CSV"
    job_config.skip_leading_rows = 1
    job = client.load_table_from_file(
        CSV_FILE,
        table,
        job_config=job_config
    )
    job.result()
@dhermes (Contributor) commented Jan 5, 2018

The

ValueError: Stream must be at beginning.

comes from google-resumable-media and may be caused by

job_config.skip_leading_rows = 1

@tswast Can you confirm?

As for insert_rows, that is in the not-yet-released 0.29.0 (we pushed a tag but for some reason it never triggered a build and no one followed up).
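
For reference, once 0.29.0 is installed, the snippet from the question should work roughly like this (dataset and table names are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client()

    ROW_DATA = [
        (u'Phred Phlyntstone', 32),
        (u'Wylma Phlyntstone', 29),
    ]

    table_ref = client.dataset('my_dataset').table('my_table')
    table = client.get_table(table_ref)           # fetch the table and its schema
    errors = client.insert_rows(table, ROW_DATA)  # streaming insert; returns per-row errors
    assert errors == []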

@tswast (Contributor) commented Jan 6, 2018

job_config.skip_leading_rows = 1 is a server-side configuration. It shouldn't affect google-resumable-media. More likely, the file object is being reused and hasn't been seeked back to the beginning.

Note: You don't actually have to create a physical file to use a load job. You can just as easily use StringIO. See: #4553 (comment)

We have #4553 open for making this easier.
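
A minimal sketch of both suggestions, assuming CSV_FILE was a single file object reused across tables: either rewind it with seek(0) before each load, or build a fresh in-memory stream per load job (the CSV contents below are placeholders):

    import io

    from google.cloud import bigquery

    client = bigquery.Client()
    CSV_DATA = u'full_name,age\nPhred Phlyntstone,32\nWylma Phlyntstone,29\n'

    def insert_test_data(dataset_name, table_names):
        dataset = client.dataset(dataset_name)
        job_config = bigquery.LoadJobConfig()
        job_config.source_format = "CSV"
        job_config.skip_leading_rows = 1
        for table_name in table_names:
            table = dataset.table(table_name)
            # A fresh stream per load job starts at position 0, which avoids
            # "ValueError: Stream must be at beginning." (equivalently, call
            # CSV_FILE.seek(0) before each load_table_from_file call).
            stream = io.BytesIO(CSV_DATA.encode('utf-8'))
            job = client.load_table_from_file(stream, table, job_config=job_config)
            job.result()  # wait for each load to finish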

@DazWilkin (Author)

Thank you!

I'll try with StringIO.

@tseaver (Contributor) commented Jan 8, 2018

As for insert_rows, that is in the not-yet-released 0.29.0 (we pushed a tag but for some reason it never triggered a build and no one followed up).

I just retriggered the post-tag build, which failed due to error_reporting (??)

@dhermes (Contributor) commented Jan 8, 2018

@tseaver The push to PyPI will only happen on a tag build, the build you linked to is a regular master (i.e. "push") build.

@tseaver (Contributor) commented Jan 8, 2018

@dhermes Hmmm. I don't see a tag build at all.

@dhermes (Contributor) commented Jan 8, 2018

That's correct

we pushed a tag but for some reason it never triggered a build and no one followed up

@tseaver (Contributor) commented Jan 8, 2018

I've got no clue how to force CircleCI to run the build -- delete and re-push the tag?

@tseaver (Contributor) commented Jan 8, 2018

Or I can just push the release to PyPI manually.

@dhermes (Contributor) commented Jan 8, 2018

I think a manual push to PyPI is fine.

@tseaver (Contributor) commented Jan 8, 2018

OK, 0.29.0 release pushed manually.

@chemelnucfin added the 'api: bigquery' and 'type: question' labels on Jan 9, 2018
@tswast closed this as completed on Jan 16, 2018