passing string value to pandas.DataFrame.fillna() & pandas.PivotTable(fill_value) 'breaks' pandas.DataFrame.style.highlight_* inside jupyter notebook #28358

Closed
randerse10 opened this issue Sep 9, 2019 · 6 comments
Labels
Enhancement · IO HTML (read_html, to_html, Styler.apply, Styler.applymap) · Output-Formatting (__repr__ of pandas objects, to_string)

Comments

@randerse10

randerse10 commented Sep 9, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                          "bar", "bar", "bar", "bar"],
                    "B": ["one", "one", "one", "two", "two",
                          "one", "one", "two", "two"],
                    "C": ["small", "large", "large", "small",
                          "small", "large", "small", "small",
                          "large"],
                    "D": [1, 2, 2, 3, 3, 4, None, 6, 7],
                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
table = pd.pivot_table(df, values='D', index=['A','B'], columns=['C'], aggfunc=np.sum, fill_value='-')

df.fillna('-').style.highlight_max()
table.style.highlight_max()

Screenshot from 2019-09-09 11-07-48

Screenshot from 2019-09-09 11-07-23

Problem description

pandas.DataFrame.style.highlight_* does not work on a column where NaN has been replaced by a string, whether through pandas.DataFrame.fillna() or pandas.pivot_table(fill_value=...).

Expected Output

Highlighting should still work on a column when fillna or fill_value is given a string.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.0.0-27-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.1
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : 5.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 2.6.3
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : None

@TomAugspurger
Contributor

When you fill with a string, you convert to object dtype. The string '-' can't be compared to the numeric values.

I think what you want is a way to control how NA values are printed in the output, and an option for highlight_max to skip NA values. Not sure if those are possible today or not.
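
To make the dtype point concrete, here is a minimal sketch. The Series values are made up for illustration and are not the data from the report; the point is only that filling NaN with a string turns a float column into an object column, and that floats and strings cannot be ordered against each other.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the issue.
numbers = pd.Series([1.0, 2.0, np.nan, 4.0])
filled = numbers.fillna("-")

print(numbers.dtype)  # float64
print(filled.dtype)   # object: the column now mixes floats and a str

# NaN is skipped by default, so the numeric column still has a maximum.
print(numbers.max())  # 4.0

# Ordering a float against a str is not defined in Python,
# which is why a max over the filled column cannot be computed.
try:
    1.0 > "-"
except TypeError as exc:
    print(f"comparison failed: {exc}")
```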

@TomAugspurger TomAugspurger added Code Style Code style, linting, code_checks Output-Formatting __repr__ of pandas objects, to_string IO HTML read_html, to_html, Styler.apply, Styler.applymap and removed Code Style Code style, linting, code_checks labels Sep 9, 2019
@randerse10
Author

I think what you want is a way to control how NA values are printed in the output, and an option for highlight_max to skip NA values.

Yes, that is what I'm looking for. I did not know how to properly describe the problem.

@immaxchen
Contributor

Actually, highlight_max skips nan values already, and you can pass your custom function into format to control how nan values are printed in the output.

However, I believe formatting NaN values in the output is common enough to be included among the "built-in" functions of the Styler. I would like to resolve this issue by adding a formatna function that works something like df.style.highlight_max().formatna('-'). Is that OK?

@TomAugspurger
Contributor

TomAugspurger commented Oct 19, 2019 via email

immaxchen added a commit to immaxchen/pandas that referenced this issue Oct 20, 2019
…ing values

As described in GH pandas-dev#28358, user who wants to control how NA values are printed
while applying styles to the output will have to implement their own formatter.
(so that the underlying data will not change and can be used for styling)

Since the behavior is common in styling (for reports etc.), suggest to add this
shortcut function to enable users format their NA values as something like '--'
or 'Not Available' easily.

example usage: `df.style.highlight_max().format_null('--')`
@immaxchen
Contributor

immaxchen commented Oct 20, 2019

The keyword to control NA format is na_rep and can be found in to_csv and to_excel etc. In the context of Styler, the most relevant method would be highlight_null, so I thought method name format_null and parameter name na_rep might be appropriate.

TomAugspurger pushed a commit that referenced this issue Nov 25, 2019
…r missing values (#29118)

* Add built-in function for Styler to format the text displayed for missing values

As described in GH #28358, user who wants to control how NA values are printed
while applying styles to the output will have to implement their own formatter.
(so that the underlying data will not change and can be used for styling)
proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019
…r missing values (pandas-dev#29118)

* Add built-in function for Styler to format the text displayed for missing values

As described in GH pandas-dev#28358, user who wants to control how NA values are printed
while applying styles to the output will have to implement their own formatter.
(so that the underlying data will not change and can be used for styling)
@attack68
Contributor

I believe this is closed by the merge and the added tests.
