import dataframe using set_dataframe, TypeError when column dtype: 'Int64' (with <NA> and integer) #405

keyapi · 2020-02-20T16:12:23Z

In pandas dataframe, dtype of a column with NaN+integer would be changed into 'float', in cases of must-keep-integer, it's useful to replace dtype:'float' with dtype:'Int64', see: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html

I use wks.set_dataframe(df, (1,1), nan="") to transfer a dataframe to a gsheet worksheet, but the pandas column with dtype:'float' (mixture of NaN and integer) could only be transfered as 'float' in a gsheet, but what i wanted for this column is only ""+integer, not float.

I also tried changing the dtype of the column first into 'Int64', e.g. df.column = df.column.astype('Int64') and then set_dataframe(), but got a "TypeError: <U1 cannot be converted to an IntegerDtype"

solution 1:
new parameter in set_dataframe() to import float as int

solution 2:
new parameter in set_dataframe() to deal with dytpe: "Int64", take < > like a "NaN"

nithinmurali · 2020-02-21T16:44:11Z

Can you please provide a small example to reproduce this?

keyapi · 2020-02-23T13:16:15Z

Can you please provide a small example to reproduce this?

example snippet, set_dataframe() cloumn with 2 dtype: float / Int64, nullable integer

https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html

import numpy as np
import pandas as pd
import pygsheets

pygsheets authorize and worksheet

gc = pygsheets.authorize()
wks = gc.open_by_key().sheet1

try 1: a dataframe with 2 columns, 'col2' with a NaN

d = {'col1': [1, 2], 'col2': [3, np.nan]}
df = pd.DataFrame(d)
print(df.dtypes)
col1 int64
col2 float64

print(df)

	col1	col2
0	1	3.0
1	2	NaN

wks.set_dataframe(df, (1,1), nan="")

worksheet, 'col2' will keep float and "", "" is good, but 3.0 is not what i want

	col1	col2
0	1	3.0
1	2

try 2: make a copy and change dtype of 'col2' to 'Int64' (not int64)

df_Int64 = df.copy()
df_Int64['col2'] = df_Int64['col2'].astype('Int64')
print(df_Int64.dtypes)
col1 int64
col2 Int64

print(df_Int64) # 'col2' will keep integer 3 and NaN, both are what i want

	col1	col2
0	1	3
1	2	NaN

wks.set_dataframe(df_Int64, (1,1), nan="")

TypeError: <U1 cannot be converted to an IntegerDtype

gosuto-inzasheru · 2020-02-26T09:15:21Z

Workaround for this is to call set_dataframe() with nan=pd.NA. It will prevent to attempt to cast the pd.NA values to string.

gosuto-inzasheru · 2020-02-26T09:34:04Z

Problem is that pandas v1.0.0 introduced a new dtype; Int64 (opposed to int64). This dtype allows for <NA> in an integer column whereas previously this was only possible with dtype object.

Calling .fillna() on this new dtype is what is causing said error; an Int64 column cannot contain the NaN string that is provided by the kwargs by default.

pygsheets/pygsheets/worksheet.py

Line 1303 in 8a74911

df = df.fillna(nan)

My suggestion would be to drop the nan= feature completely and let .astype('unicode') handle the casting.

If you want I can pick up this issue and submit a PR.

keyapi · 2020-02-27T11:38:19Z

Workaround for this is to call set_dataframe() with nan=pd.NA. It will prevent to attempt to cast the pd.NA values to string.

do you mean wks.set_dataframe(df_Int64, (1,1), nan=np.nan) ?
i tried, got no Error, gesheet shows

col1	col2
1	3
2	nan

but nan is not what i expected for this column on gsheet, better kept as empty like below.

col1	col2
1	3
2

Problem is that pandas v1.0.0 introduced a new dtype; Int64

That dtype Int64 should be since pandas 0.24.0 (Jan 2019), i tried it last year

pygsheets/pygsheets/worksheet.py

Line 1303 in 8a74911

df = df.fillna(nan)

i think the gerneral problem: .fillna() + Int64 lies in pandas.

My suggestion would be to drop the nan= feature completely and let .astype('unicode') handle the casting.

i think the option nan= for .set_dataframe() is still convenient for simple user like me, who just transfers DataFrame to gsheets.

your sugesstion .astype('unicode) is a good option, would be better just to change the column with dtype:'Int64', but i tried pandas 0.25 and 1.0.1, both can still not distinguish dtypes int64 and 'Int64'
print(df_Int64.dtypes) will give 'col1' as int64 and col2 as Int64(not int64)
but print([col for col in list(df_Int64.columns) if df_Int64[col].dtypes=='Int64']) will give a list ['col1', 'col2'], wtf... still pandas problem

ldsalomone · 2020-03-06T03:48:27Z

This isn't a problem within pygsheets, but instead a problem with pandas: pandas-dev/pandas#25288

A fix can be done by calling astype('object') before .fillna().

d = {'col1': [1, 2], 'col2': [3, np.nan]}
df = pd.DataFrame(d)

df.col2 = df.col2.astype('Int64')
print(df)
print(df.col2.dtype)
print(type(df.col2[0]))
print(type(df.col2[1]))

df = df.astype('object').fillna('')
print(df)
print(df.col2.dtype)
print(type(df.col2[0]))
print(type(df.col2[1]))

This is also a non-issue because the first version of the function worked (without the Int62 dtype) and once uploaded to google sheets, just tell google sheets to only display 0 decimal point for your whole table.

gosuto-inzasheru · 2020-04-03T20:18:05Z

@keyapi are you still experiencing problems? Latest version should have it fixed.

nithinmurali · 2020-11-15T19:50:46Z

Fixed

keyapi changed the title ~~import dataframe using set_dataframe, TypeError when column dtype: 'Int64' (with NaN and integer)~~ import dataframe using set_dataframe, TypeError when column dtype: 'Int64' (with <NA> and integer) Feb 20, 2020

nithinmurali added the Info needed label Feb 21, 2020

gosuto-inzasheru mentioned this issue Feb 27, 2020

prevent error message when trying to fill strings into an Int64 column #409

Merged

nithinmurali removed the Info needed label Mar 7, 2020

nithinmurali closed this as completed Nov 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

import dataframe using set_dataframe, TypeError when column dtype: 'Int64' (with <NA> and integer) #405

import dataframe using set_dataframe, TypeError when column dtype: 'Int64' (with <NA> and integer) #405

keyapi commented Feb 20, 2020 •

edited

Loading

nithinmurali commented Feb 21, 2020

keyapi commented Feb 23, 2020 •

edited

Loading

gosuto-inzasheru commented Feb 26, 2020 •

edited

Loading

gosuto-inzasheru commented Feb 26, 2020

keyapi commented Feb 27, 2020 •

edited

Loading

ldsalomone commented Mar 6, 2020 •

edited

Loading

gosuto-inzasheru commented Apr 3, 2020

nithinmurali commented Nov 15, 2020

import dataframe using set_dataframe, TypeError when column dtype: 'Int64' (with <NA> and integer) #405

import dataframe using set_dataframe, TypeError when column dtype: 'Int64' (with <NA> and integer) #405

Comments

keyapi commented Feb 20, 2020 • edited Loading

nithinmurali commented Feb 21, 2020

keyapi commented Feb 23, 2020 • edited Loading

example snippet, set_dataframe() cloumn with 2 dtype: float / Int64, nullable integer

https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html

pygsheets authorize and worksheet

try 1: a dataframe with 2 columns, 'col2' with a NaN

worksheet, 'col2' will keep float and "", "" is good, but 3.0 is not what i want

try 2: make a copy and change dtype of 'col2' to 'Int64' (not int64)

TypeError: <U1 cannot be converted to an IntegerDtype

gosuto-inzasheru commented Feb 26, 2020 • edited Loading

gosuto-inzasheru commented Feb 26, 2020

keyapi commented Feb 27, 2020 • edited Loading

ldsalomone commented Mar 6, 2020 • edited Loading

gosuto-inzasheru commented Apr 3, 2020

nithinmurali commented Nov 15, 2020

keyapi commented Feb 20, 2020 •

edited

Loading

keyapi commented Feb 23, 2020 •

edited

Loading

gosuto-inzasheru commented Feb 26, 2020 •

edited

Loading

keyapi commented Feb 27, 2020 •

edited

Loading

ldsalomone commented Mar 6, 2020 •

edited

Loading