passing string value to pandas.DataFrame.fillna() & pandas.PivotTable(fill_value) 'breaks' pandas.DataFrame.style.highlight_* inside jupyter notebook #28358

randerse10 · 2019-09-09T08:15:27Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                          "bar", "bar", "bar", "bar"],
                    "B": ["one", "one", "one", "two", "two",
                          "one", "one", "two", "two"],
                    "C": ["small", "large", "large", "small",
                          "small", "large", "small", "small",
                          "large"],
                    "D": [1, 2, 2, 3, 3, 4, None, 6, 7],
                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
table = pd.pivot_table(df, values='D', index=['A','B'], columns=['C'], aggfunc=np.sum, fill_value='-')

df.fillna('-').style.highlight_max()
table.style.highlight_max()

Problem description

pandas.DataFrame.style.highlight_* does not work on a column where NaN has been replaced by a string, either with pandas.DataFrame.fillna() or with the fill_value argument of pandas.pivot_table().

Expected Output

I expected highlighting to still work on a column when fillna or fill_value is given a string.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.0.0-27-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.1
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : 5.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 2.6.3
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : None


TomAugspurger · 2019-09-09T14:26:40Z

When you fill with a string, you convert to object dtype. The string '-' can't be compared to the numeric values.
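The dtype change described above can be reproduced with a small sketch. The Series below is a made-up stand-in for column D of the pivot table, not the reporter's exact data, and the comparison uses Python's built-in `max` to show why a string cannot be ranked against numbers.

```python
import pandas as pd

# Hypothetical stand-in for a numeric column with one missing value.
s = pd.Series([1, 2, None, 4], name="D")

# Filling with a string forces the column to hold mixed Python objects.
filled = s.fillna("-")

print(s.dtype)       # numeric column, NaN marks the gap
print(filled.dtype)  # object column mixing floats and the string "-"

# Comparing a str with a float is not defined in Python 3,
# so finding a maximum over the mixed values fails.
try:
    max(filled.tolist())
    comparable = True
except TypeError:
    comparable = False

print(comparable)
```

Keeping the column numeric and changing only how missing values are displayed avoids this problem.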

I think what you want is a way to control how NA values are printed in the output, and an option for highlight_max to skip NA values. Not sure if those are possible today or not.

randerse10 · 2019-09-09T14:58:23Z

> I think what you want is a way to control how NA values are printed in the output, and an option for highlight_max to skip NA values.

Yes, that is what I'm looking for. I did not know how to properly describe the problem.

immaxchen · 2019-10-19T10:44:29Z

Actually, highlight_max already skips NaN values, and you can pass a custom function to format to control how NaN values are printed in the output.

However, I believe formatting NaN values in the output is common enough to be included in the "built-in" functions of the Styler. I would like to resolve this issue by adding a formatna function that works something like df.style.highlight_max().formatna('-'). Is that OK?
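A minimal sketch of the workaround mentioned in this comment, assuming a custom formatter passed to `Styler.format`. The names `format_missing` and `styled_report` are hypothetical helpers for illustration, not part of pandas, and the sample frame is made up.

```python
import pandas as pd


def format_missing(value, na_rep="-"):
    """Return na_rep for a missing value, otherwise the value as text."""
    return na_rep if pd.isna(value) else str(value)


def styled_report(df):
    # The underlying data stays numeric, so highlight_max can still
    # compare values; only the displayed text of missing cells changes.
    return df.style.highlight_max().format(format_missing)


# Hypothetical data with one missing value in column D.
df = pd.DataFrame({"D": [1.0, 2.0, None, 7.0], "E": [2, 4, 5, 9]})
```

In a notebook, evaluating `styled_report(df)` as the last expression of a cell should render the table with the maximum highlighted and the missing cell shown as `-`. Styler needs jinja2 to be installed.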

TomAugspurger · 2019-10-19T13:20:51Z

Seems reasonable. I think we have similar keywords elsewhere in the library to control the NA format. Check to make sure we match those names.


…ing values As described in GH pandas-dev#28358, a user who wants to control how NA values are printed while applying styles to the output has to implement their own formatter (so that the underlying data does not change and can still be used for styling). Since this is common in styling (for reports etc.), suggest adding this shortcut function so that users can easily format their NA values as something like '--' or 'Not Available'. Example usage: `df.style.highlight_max().format_null('--')`

immaxchen · 2019-10-20T19:03:01Z

The keyword to control NA format is na_rep and can be found in to_csv and to_excel etc. In the context of Styler, the most relevant method would be highlight_null, so I thought method name format_null and parameter name na_rep might be appropriate.
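For reference, this is how the existing `na_rep` keyword behaves in `to_csv`. The sketch uses a made-up single-column frame and only shows that the replacement text affects the output, not the stored data.

```python
import pandas as pd

# Hypothetical frame with one missing value.
df = pd.DataFrame({"D": [1.0, None, 3.0]})

# na_rep changes only the text written for missing values.
csv_text = df.to_csv(index=False, na_rep="-")
print(csv_text)

# The column itself is untouched and remains numeric.
print(df["D"].dtype)
```

A Styler option with the same parameter name would keep the naming consistent across output methods.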

…r missing values (#29118) * Add built-in function for Styler to format the text displayed for missing values As described in GH #28358, a user who wants to control how NA values are printed while applying styles to the output has to implement their own formatter (so that the underlying data does not change and can still be used for styling).

attack68 · 2021-02-25T18:44:53Z

I believe this is resolved by the merged PR and the added tests.

TomAugspurger added the Code Style, Output-Formatting, and IO HTML labels and removed the Code Style label Sep 9, 2019

immaxchen mentioned this issue Oct 20, 2019

ENH: Add built-in function for Styler to format the text displayed for missing values #29118

Merged

5 tasks

mroeschke added the Enhancement label May 7, 2020

attack68 closed this as completed Feb 25, 2021
