
ConversionError: Could not convert DataFrame to Parquet. | After upgrade to 0.16.0 #421

Closed
londoso opened this issue Nov 11, 2021 · 5 comments · Fixed by #423
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments


londoso commented Nov 11, 2021

Environment details

  • OS type and version: Windows 10 x64
  • Python version: 3.8.5
  • pip version: 20.2.4
  • pandas-gbq version: 0.16.0

Steps to reproduce

This code executed successfully under the previous version of pandas-gbq (0.15.0).

Code example

import pandas as pd
import pandas_gbq as gbq

table_schema = [
    {"name": "id", "type": "INTEGER"},
    {"name": "nombre", "type": "STRING"},
    {"name": "precio", "type": "NUMERIC"},
    {"name": "fecha", "type": "DATE"},
]

data = pd.DataFrame({
    "id": [123],
    "nombre": ["Anderson"],
    "precio": [1.25],
    "fecha": ["2021-12-12"],
})

project_id = "proyecto111"
table_name = "prueba.clientes"

gbq.to_gbq(data, table_name, project_id, if_exists="append", table_schema=table_schema)

Stack trace

---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_parquet(client, dataframe, destination_table_ref, location, schema)
     74     try:
---> 75         client.load_table_from_dataframe(
     76             dataframe, destination_table_ref, job_config=job_config, location=location,

~\anaconda3\lib\site-packages\google\cloud\bigquery\client.py in load_table_from_dataframe(self, dataframe, destination, num_retries, job_id, job_id_prefix, location, project, job_config, parquet_compression, timeout)
   2650 
-> 2651                     _pandas_helpers.dataframe_to_parquet(
   2652                         dataframe,

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in dataframe_to_parquet(dataframe, bq_schema, filepath, parquet_compression, parquet_use_compliant_nested_type)
    585     bq_schema = schema._to_schema_fields(bq_schema)
--> 586     arrow_table = dataframe_to_arrow(dataframe, bq_schema)
    587     pyarrow.parquet.write_table(

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in dataframe_to_arrow(dataframe, bq_schema)
    528         arrow_arrays.append(
--> 529             bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
    530         )

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in bq_to_arrow_array(series, bq_field)
    289         return pyarrow.StructArray.from_pandas(series, type=arrow_type)
--> 290     return pyarrow.Array.from_pandas(series, type=arrow_type)
    291 

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.Array.from_pandas()

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.array()

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib._ndarray_to_array()

~\anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Got bytestring of length 8 (expected 16)

The above exception was the direct cause of the following exception:

ConversionError                           Traceback (most recent call last)
<ipython-input-5-49cafceeee1e> in <module>
     24 table_name = 'prueba.clientes'
     25 
---> 26 gbq.to_gbq(data, table_name, project_id, if_exists='append', table_schema = table_schema)

~\anaconda3\lib\site-packages\pandas_gbq\gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, reauth, if_exists, auth_local_webserver, table_schema, location, progress_bar, credentials, api_method, verbose, private_key)
   1093         return
   1094 
-> 1095     connector.load_data(
   1096         dataframe,
   1097         destination_table_ref,

~\anaconda3\lib\site-packages\pandas_gbq\gbq.py in load_data(self, dataframe, destination_table_ref, chunksize, schema, progress_bar, api_method)
    544 
    545         try:
--> 546             chunks = load.load_chunks(
    547                 self.client,
    548                 dataframe,

~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_chunks(client, dataframe, destination_table_ref, chunksize, schema, location, api_method)
    164 ):
    165     if api_method == "load_parquet":
--> 166         load_parquet(client, dataframe, destination_table_ref, location, schema)
    167         # TODO: yield progress depending on result() with timeout
    168         return [0]

~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_parquet(client, dataframe, destination_table_ref, location, schema)
     77         ).result()
     78     except pyarrow.lib.ArrowInvalid as exc:
---> 79         raise exceptions.ConversionError(
     80             "Could not convert DataFrame to Parquet."
     81         ) from exc

ConversionError: Could not convert DataFrame to Parquet.
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Nov 11, 2021
@tswast tswast added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Nov 11, 2021
@tswast tswast self-assigned this Nov 11, 2021
@tswast
Collaborator

tswast commented Nov 11, 2021

Thanks for the report! I wonder which data type it's struggling with? Perhaps NUMERIC? Note that floating point values aren't expected for a NUMERIC column, since the whole point of that data type is to behave like Python's decimal type.
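If NUMERIC is indeed the culprit, that would match the ArrowInvalid message in the trace: Python floats serialize as 8-byte doubles, while BigQuery NUMERIC maps to Arrow's 16-byte decimal128 type. A minimal sketch of the idea (column name taken from the report above; this is an illustration, not the pandas-gbq fix itself):

```python
from decimal import Decimal

import pandas as pd

# A float column serializes as 8-byte doubles; BigQuery NUMERIC expects
# Arrow decimal128 (16 bytes), hence "Got bytestring of length 8 (expected 16)".
# Converting the values to decimal.Decimal avoids the mismatch.
data = pd.DataFrame({"precio": [1.25]})
data["precio"] = data["precio"].map(lambda v: Decimal(str(v)))
```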

@tswast
Collaborator

tswast commented Nov 11, 2021

As a workaround, you can specify api_method='load_csv' to use the 0.15.0 behavior.

gbq.to_gbq(data, table_name, project_id, if_exists="append", table_schema=table_schema, api_method="load_csv")

@tswast
Collaborator

tswast commented Nov 11, 2021

There are actually two problems discovered while investigating this issue:

  • Can't write floating point to NUMERIC column.
  • Can't write string to DATE column.
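Given that diagnosis, another workaround on the DataFrame side is to convert the offending columns before calling to_gbq: decimals for the NUMERIC column and date objects for the DATE column. A sketch using the sample row from the report above (whether this remains necessary depends on the fix in #423):

```python
import datetime
from decimal import Decimal

import pandas as pd

# Same sample row as in the report above.
data = pd.DataFrame({
    "id": [123],
    "nombre": ["Anderson"],
    "precio": [1.25],
    "fecha": ["2021-12-12"],
})

# NUMERIC columns: pass decimal.Decimal values rather than floats.
data["precio"] = data["precio"].map(lambda v: Decimal(str(v)))
# DATE columns: pass datetime.date objects rather than strings.
data["fecha"] = pd.to_datetime(data["fecha"]).dt.date
```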

@londoso
Author

londoso commented Nov 12, 2021

Hi Tim, as you said, I'm trying to write a float into a NUMERIC BigQuery data type. Using the argument api_method="load_csv" works fine for me.

Thank you.

@tswast tswast added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Nov 17, 2021
@tswast
Collaborator

tswast commented Nov 17, 2021

Lowering the priority since there's a workaround of api_method="load_csv".
