
ConversionError: Could not convert DataFrame to Parquet. | After upgrade to 0.16.0 #421

Closed
londoso opened this issue Nov 11, 2021 · 5 comments · Fixed by #423
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments


londoso commented Nov 11, 2021

Environment details

  • OS type and version: Windows 10 x64
  • Python version: 3.8.5
  • pip version: 20.2.4
  • pandas-gbq version: 0.16.0

Steps to reproduce

This code executed successfully under the previous version of pandas-gbq (0.15.0).

Code example

import pandas as pd
import pandas_gbq as gbq

table_schema = [
    {"name": "id", "type": "INTEGER"},
    {"name": "nombre", "type": "STRING"},
    {"name": "precio", "type": "NUMERIC"},
    {"name": "fecha", "type": "DATE"},
]

data = pd.DataFrame({
    "id": [123],
    "nombre": ["Anderson"],
    "precio": [1.25],
    "fecha": ["2021-12-12"],
})

project_id = "proyecto111"
table_name = "prueba.clientes"

gbq.to_gbq(data, table_name, project_id, if_exists="append", table_schema=table_schema)

Stack trace

---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_parquet(client, dataframe, destination_table_ref, location, schema)
     74     try:
---> 75         client.load_table_from_dataframe(
     76             dataframe, destination_table_ref, job_config=job_config, location=location,

~\anaconda3\lib\site-packages\google\cloud\bigquery\client.py in load_table_from_dataframe(self, dataframe, destination, num_retries, job_id, job_id_prefix, location, project, job_config, parquet_compression, timeout)
   2650 
-> 2651                     _pandas_helpers.dataframe_to_parquet(
   2652                         dataframe,

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in dataframe_to_parquet(dataframe, bq_schema, filepath, parquet_compression, parquet_use_compliant_nested_type)
    585     bq_schema = schema._to_schema_fields(bq_schema)
--> 586     arrow_table = dataframe_to_arrow(dataframe, bq_schema)
    587     pyarrow.parquet.write_table(

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in dataframe_to_arrow(dataframe, bq_schema)
    528         arrow_arrays.append(
--> 529             bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
    530         )

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in bq_to_arrow_array(series, bq_field)
    289         return pyarrow.StructArray.from_pandas(series, type=arrow_type)
--> 290     return pyarrow.Array.from_pandas(series, type=arrow_type)
    291 

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.Array.from_pandas()

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.array()

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib._ndarray_to_array()

~\anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Got bytestring of length 8 (expected 16)

The above exception was the direct cause of the following exception:

ConversionError                           Traceback (most recent call last)
<ipython-input-5-49cafceeee1e> in <module>
     24 table_name = 'prueba.clientes'
     25 
---> 26 gbq.to_gbq(data, table_name, project_id, if_exists='append', table_schema = table_schema)

~\anaconda3\lib\site-packages\pandas_gbq\gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, reauth, if_exists, auth_local_webserver, table_schema, location, progress_bar, credentials, api_method, verbose, private_key)
   1093         return
   1094 
-> 1095     connector.load_data(
   1096         dataframe,
   1097         destination_table_ref,

~\anaconda3\lib\site-packages\pandas_gbq\gbq.py in load_data(self, dataframe, destination_table_ref, chunksize, schema, progress_bar, api_method)
    544 
    545         try:
--> 546             chunks = load.load_chunks(
    547                 self.client,
    548                 dataframe,

~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_chunks(client, dataframe, destination_table_ref, chunksize, schema, location, api_method)
    164 ):
    165     if api_method == "load_parquet":
--> 166         load_parquet(client, dataframe, destination_table_ref, location, schema)
    167         # TODO: yield progress depending on result() with timeout
    168         return [0]

~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_parquet(client, dataframe, destination_table_ref, location, schema)
     77         ).result()
     78     except pyarrow.lib.ArrowInvalid as exc:
---> 79         raise exceptions.ConversionError(
     80             "Could not convert DataFrame to Parquet."
     81         ) from exc

ConversionError: Could not convert DataFrame to Parquet.
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Nov 11, 2021
@tswast tswast added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Nov 11, 2021
@tswast tswast self-assigned this Nov 11, 2021
@tswast
Collaborator

tswast commented Nov 11, 2021

Thanks for the report! I wonder which data type it's struggling with? Perhaps NUMERIC? Note that floating point values aren't expected for a NUMERIC column, since the whole point of that data type is to behave like Python's decimal type.
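If NUMERIC is indeed the culprit, that would match the ArrowInvalid message in the trace: Python floats serialize as 8-byte doubles, while BigQuery NUMERIC maps to Arrow's 16-byte decimal128 type. A minimal sketch of the idea (column name taken from the report above; this is an illustration, not the pandas-gbq fix itself):

```python
from decimal import Decimal

import pandas as pd

# A float column serializes as 8-byte doubles; BigQuery NUMERIC expects
# Arrow decimal128 (16 bytes), hence "Got bytestring of length 8 (expected 16)".
# Converting the values to decimal.Decimal avoids the mismatch.
data = pd.DataFrame({"precio": [1.25]})
data["precio"] = data["precio"].map(lambda v: Decimal(str(v)))
```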

@tswast
Collaborator

tswast commented Nov 11, 2021

As a workaround, you can specify api_method='load_csv' to use the 0.15.0 behavior.

gbq.to_gbq(data, table_name, project_id, if_exists="append", table_schema=table_schema, api_method="load_csv")

@tswast
Collaborator

tswast commented Nov 11, 2021

There are actually two problems discovered while investigating this issue:

  • Can't write floating point to NUMERIC column.
  • Can't write string to DATE column.
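Given that diagnosis, another workaround on the DataFrame side is to convert the offending columns before calling to_gbq: decimals for the NUMERIC column and date objects for the DATE column. A sketch using the sample row from the report above (whether this remains necessary depends on the fix in #423):

```python
import datetime
from decimal import Decimal

import pandas as pd

# Same sample row as in the report above.
data = pd.DataFrame({
    "id": [123],
    "nombre": ["Anderson"],
    "precio": [1.25],
    "fecha": ["2021-12-12"],
})

# NUMERIC columns: pass decimal.Decimal values rather than floats.
data["precio"] = data["precio"].map(lambda v: Decimal(str(v)))
# DATE columns: pass datetime.date objects rather than strings.
data["fecha"] = pd.to_datetime(data["fecha"]).dt.date
```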

@londoso
Author

londoso commented Nov 12, 2021

Hi Tim, as you said, I'm trying to write a float into a NUMERIC BigQuery data type. Using the argument api_method="load_csv" works fine for me.

Thank you.

@tswast tswast added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Nov 17, 2021
@tswast
Collaborator

tswast commented Nov 17, 2021

Lowering the priority since there's a workaround of api_method="load_csv".
