KeyError in pandas.io.gbq.read_gbq when no DataFrame should be returned #45

Closed
madhav-datt opened this issue Jun 1, 2017 · 4 comments


@madhav-datt

Problem description

I am trying to use the read_gbq function from pandas.io.gbq to execute INSERT or UPDATE queries on BigQuery. These queries return no rows, so there is nothing for pandas to retrieve and return to my program. I would expect no DataFrame to be returned in this case (ideally read_gbq would return None). Instead, a KeyError is raised, even though the INSERT or UPDATE statement executed successfully.

Code Sample

from pandas.io import gbq

insert_or_update_statement = r"""
INSERT INTO `MyBigQueryTable` (ColumnA, ColumnB, ColumnC) (
    SELECT ColumnA, ColumnB, ColumnC
    FROM `AnotherBigQueryTable`
    WHERE ColumnD = 4
        AND ColumnC = ColumnE + 1
)"""
gbq.read_gbq(insert_or_update_statement, project_id='my-project', dialect='standard')

Actual vs Expected Output

I expected None to be returned for INSERT or UPDATE statements, which have no results to retrieve. The actual output is as follows:

    453             self._print('Retrieving results...')
    454 
--> 455         total_rows = int(query_reply['totalRows'])
    456         result_pages = list()
    457         seen_page_tokens = list()

KeyError: 'totalRows'

Current Workaround

from pandas.io import gbq

# The statement executes successfully even though a KeyError is raised.
try:
    gbq.read_gbq(insert_or_update_statement, project_id='my-project', dialect='standard')
except KeyError:
    pass
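
A slightly narrower variant of this workaround, as a minimal sketch: the bare `except KeyError` above would also hide unrelated KeyErrors, so the hypothetical helper below (the name `run_ignoring_missing_total_rows` is made up for illustration) swallows only the 'totalRows' case and re-raises anything else. It takes the query function as an argument, so it assumes nothing about the library beyond the KeyError shown in the traceback.

```python
def run_ignoring_missing_total_rows(run_query, *args, **kwargs):
    """Call run_query and return its result.

    Returns None if the call fails only because the reply has no
    'totalRows' key; any other KeyError is re-raised unchanged.
    """
    try:
        return run_query(*args, **kwargs)
    except KeyError as exc:
        # A failed dict lookup stores the missing key in exc.args.
        if exc.args != ('totalRows',):
            raise
        return None

# Usage with the statement defined earlier:
# run_ignoring_missing_total_rows(
#     gbq.read_gbq, insert_or_update_statement,
#     project_id='my-project', dialect='standard')
```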

Proposed Solution

This could be fixed fairly simply by catching the KeyError at that line, skipping the row retrieval, and returning None. I would be happy to create a pull request to resolve this.
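
Roughly what I have in mind, as a sketch only and not the actual pandas code: the helper name `total_rows_or_none` is hypothetical, and `query_reply` is assumed to be the dict-like reply seen in the traceback above.

```python
def total_rows_or_none(query_reply):
    """Return the row count from a query reply, or None when there is none.

    Assumption: statements without a result set (such as the INSERT above)
    produce a reply with no 'totalRows' key, as the traceback suggests.
    """
    try:
        return int(query_reply['totalRows'])
    except KeyError:
        return None


def rows_to_fetch(query_reply):
    """Sketch of the calling side: skip retrieval when there are no rows."""
    total_rows = total_rows_or_none(query_reply)
    if total_rows is None:
        return None  # nothing to retrieve, so no DataFrame is built
    return total_rows
```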

@jreback
Contributor

jreback commented Jun 1, 2017

I suppose this would be ok. Most SQL interfaces expose something like .execute_sql() to do this (which might be ok/better).

cc @parthea
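
For comparison, here is how that split looks in Python's built-in sqlite3 module, where statements without a result set go through `execute()` and rows are fetched separately. This is only an analogy using an in-memory database; it does not imply that pandas has or will get an `.execute_sql()` method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")

# Statement with no result set: run it and look at the affected row count.
cursor = conn.execute("INSERT INTO t (a, b) VALUES (?, ?)", (1, "x"))
conn.commit()
affected = cursor.rowcount

# Query with a result set: rows are fetched explicitly.
rows = conn.execute("SELECT a, b FROM t").fetchall()
conn.close()
```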

@max-sixty
Contributor

Though why use pandas if you only want to send a query? Why not use the BigQuery Python library?
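
Something along these lines, as a rough sketch: it assumes the google-cloud-bigquery package is installed and credentials are configured, and the helper name `run_dml` is made up for illustration. The client is passed in so the function itself has no import-time dependency.

```python
def run_dml(client, sql):
    """Run a statement that returns no rows and wait for it to finish.

    `client` is assumed to be a google.cloud.bigquery.Client, or any
    object whose `query` method returns a compatible job.
    """
    job = client.query(sql)  # start the query job
    job.result()             # block until the job completes; raises on failure
    return job.num_dml_affected_rows

# Usage (needs the google-cloud-bigquery package and credentials):
# from google.cloud import bigquery
# client = bigquery.Client(project='my-project')
# run_dml(client, insert_or_update_statement)
```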

@madhav-datt
Author

@MaximilianR That is definitely a very good point. I am working in a development environment where it is much harder to set up and use the BigQuery Python library directly, but I also think it would be useful to be able to execute INSERT and UPDATE statements from pandas.

@jreback Thanks so much for the help! Please do let me know what would be the best way to go ahead on this.

@max-sixty
Contributor

max-sixty commented Aug 21, 2018

I tested this locally; it does work, though with the issue at #102

Closing this in favor of #102
