ENH: Add built-in function for Styler to format the text displayed for missing values #29118

Merged
Changes from 2 commits
1 change: 1 addition & 0 deletions doc/source/reference/style.rst
@@ -52,6 +52,7 @@ Builtin styles
Styler.highlight_max
Styler.highlight_min
Styler.highlight_null
Styler.format_null
Styler.background_gradient
Styler.bar

16 changes: 16 additions & 0 deletions doc/source/user_guide/style.ipynb
@@ -492,6 +492,22 @@
"df.style.highlight_max(axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can format the text displayed for missing values with `.format_null`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.style.highlight_max(axis=0).format_null(na_rep='-')"
]
},
{
"cell_type": "markdown",
"metadata": {},
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -110,6 +110,7 @@ Other enhancements
- :meth:`DataFrame.to_json` now accepts an ``indent`` integer argument to enable pretty printing of JSON output (:issue:`12004`)
- :meth:`read_stata` can read Stata 119 dta files. (:issue:`28250`)
- Added ``encoding`` argument to :func:`DataFrame.to_html` for non-ascii text (:issue:`28663`)
- Added :meth:`Styler.format_null` to the built-in styles to help format missing values (:issue:`28358`)

Contributor

Can you add this to the user guide as well?

Contributor

I'd prefer the name format_nans, for consistency with fillna, hasnans, etc.

@jreback?

Contributor Author

Sure, I'll be glad to!


Build Changes
^^^^^^^^^^^^^
19 changes: 19 additions & 0 deletions pandas/io/formats/style.py
@@ -930,6 +930,25 @@ def hide_columns(self, subset):
    # A collection of "builtin" styles
    # -----------------------------------------------------------------------

    def format_null(self, na_rep="-"):
        """
        Format the text displayed for missing values.

        .. versionadded:: 1.0.0

        Parameters
        ----------
        na_rep : str

        Returns
        -------
        self : Styler
        """
        self.format(

Contributor

This looks like it will overwrite the formatting of a previously applied formatter for non-NA values. Something like

df.style.format("hi-{}".format).format_null()

is that the case?
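A minimal sketch of this concern, assuming a simplified model of the Styler: `format(callable)` stores the callable as the display function of every cell in a `defaultdict`, and `format_null` is written as in this diff, falling back to `_display_funcs.default_factory()` for non-NA values. `MiniStyler`, `default_display` and `is_na` are hypothetical stand-ins, not pandas API.

```python
import math
from collections import defaultdict


def default_display(x):
    # Hypothetical stand-in for the default formatter: plain str() of the value.
    return str(x)


def is_na(x):
    # Hypothetical NA check for this sketch: None or float NaN.
    return x is None or (isinstance(x, float) and math.isnan(x))


class MiniStyler:
    """Hypothetical, simplified 1-D stand-in for pandas' Styler."""

    def __init__(self, values):
        self.values = list(values)
        # One display function per cell index, defaulting to default_display.
        self._display_funcs = defaultdict(lambda: default_display)

    def format(self, formatter):
        # Assumed behaviour: a callable replaces the display function of every cell.
        for i in range(len(self.values)):
            self._display_funcs[i] = formatter
        return self

    def format_null(self, na_rep="-"):
        # Mirrors the diff: NA -> na_rep, everything else -> the *default* formatter.
        self.format(
            lambda x: na_rep if is_na(x) else self._display_funcs.default_factory()(x)
        )
        return self

    def render(self):
        return [self._display_funcs[i](v) for i, v in enumerate(self.values)]


styled = MiniStyler([0, float("nan")]).format("hi-{}".format).format_null()
print(styled.render())  # ['0', '-']: the "hi-" formatting of non-NA cells is lost
```

Under these assumptions the answer is yes: the second `format` call replaces the `"hi-{}".format` function for every cell, including the non-NA ones.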

Contributor Author

Thanks @TomAugspurger, I like the name format_na! And yes, I intended the implementation to overwrite. I had actually considered a set_* approach like yours, but it seems confusing in this case:

df.style.format('{:.2%}').set_na_format('-') # got 'nan%' instead of '-'

I have a new idea: how about an interface like this?

.format_na('-', subset=['col1','col2'])
.format('{:.2%}', na_rep='-', subset=['col3','col4'])

And rephrase the docstring for format_na to:

Format the displayed text using the default formatter, but represent NaN as `na_rep`.
For more advanced formatting, use Styler.format() with your custom formatter.
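To make the `'nan%'` case concrete: a custom format string is applied to NaN like any other float, so an NA representation has to be checked before the formatter runs. A minimal sketch of what an `na_rep` argument on `format` could do, using a hypothetical `with_na_rep` helper (not pandas API):

```python
import math


def is_na(x):
    # Hypothetical NA check for this sketch: None or float NaN.
    return x is None or (isinstance(x, float) and math.isnan(x))


def with_na_rep(formatter, na_rep=None):
    """Hypothetical helper: build a display function from a format string or
    callable, rendering missing values as na_rep when one is given."""
    func = formatter.format if isinstance(formatter, str) else formatter
    if na_rep is None:
        return func  # no special NA handling: the formatter sees NaN itself
    # Check for NA first so the custom formatter never receives a missing value.
    return lambda x: na_rep if is_na(x) else func(x)


nan = float("nan")
print(with_na_rep("{:.2%}")(nan))         # prints nan%
print(with_na_rep("{:.2%}", "-")(nan))    # prints -
print(with_na_rep("{:.2%}", "-")(0.125))  # prints 12.50%
```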

Contributor

I may not have been clear about my concern. It's fine that na_format overwrites the formatting for NA values. I'm concerned that it overwrites the formatting for non-NA values. In my .format("hi-{}".format).format_na('NA') example, the NA values should be formatted as 'NA' and the non-NA values should be formatted as hi-<value>. But I suspect that right now the non-NA formatting is lost (though perhaps it's not).
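One way to get the behaviour described here is to wrap each cell's existing display function instead of replacing it with the default one. A minimal sketch, assuming the same simplified model (one display function per cell in a `defaultdict`); `MiniStyler`, `format_na` and `is_na` are hypothetical names used only for illustration:

```python
import math
from collections import defaultdict


def is_na(x):
    # Hypothetical NA check for this sketch: None or float NaN.
    return x is None or (isinstance(x, float) and math.isnan(x))


class MiniStyler:
    """Hypothetical, simplified 1-D stand-in for pandas' Styler."""

    def __init__(self, values):
        self.values = list(values)
        # Cells without an explicit formatter use str().
        self._display_funcs = defaultdict(lambda: str)

    def format(self, formatter):
        for i in range(len(self.values)):
            self._display_funcs[i] = formatter
        return self

    def format_na(self, na_rep):
        for i in range(len(self.values)):
            previous = self._display_funcs[i]
            # Keep the cell's current formatter for non-NA values; bind it via a
            # default argument so each lambda holds its own cell's formatter.
            self._display_funcs[i] = lambda x, f=previous: na_rep if is_na(x) else f(x)
        return self

    def render(self):
        return [self._display_funcs[i](v) for i, v in enumerate(self.values)]


styled = MiniStyler([0, float("nan")]).format("hi-{}".format).format_na("NA")
print(styled.render())  # ['hi-0', 'NA']
```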

Contributor

Not sure about adding an na_rep to the .format function... That's probably fine. I think it'd still be useful for users to have a way to control the default NA formatting at the table level.

But if we add an na_rep to format, then we wouldn't need a new format_na method, right?

Contributor Author

@immaxchen, Oct 22, 2019

Sounds good, so there would be a table-level setting, self.na_rep,
and def format(self, formatter, subset=None): becomes
def format(self, formatter=None, subset=None, na_rep=None):
We would drop .format_na('-') and use .format(na_rep='-') instead, right?

Contributor

I think that sounds correct. I'm not sure what the default should be, but probably just None (no special formatting for NA values).
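A minimal sketch of the interface agreed on in this thread, `format(formatter=None, subset=None, na_rep=None)` plus a table-level `na_rep` that defaults to `None` (no special NA formatting). It is a simplified 1-D model: `MiniStyler`, `is_na`, `with_na_rep`, and treating `subset` as a list of cell indices are assumptions for illustration, not the pandas implementation.

```python
import math
from collections import defaultdict


def is_na(x):
    # Hypothetical NA check for this sketch: None or float NaN.
    return x is None or (isinstance(x, float) and math.isnan(x))


def with_na_rep(func, na_rep):
    # Wrap a display function so that missing values render as na_rep.
    return lambda x: na_rep if is_na(x) else func(x)


class MiniStyler:
    """Hypothetical 1-D stand-in for the proposed Styler behaviour."""

    def __init__(self, values, na_rep=None):
        self.values = list(values)
        self.na_rep = na_rep  # table-level default; None means no special NA formatting
        self._display_funcs = defaultdict(lambda: str)

    def format(self, formatter=None, subset=None, na_rep=None):
        # formatter=None keeps the default formatter; a str is treated as a format string.
        if formatter is None:
            func = str
        elif isinstance(formatter, str):
            func = formatter.format
        else:
            func = formatter
        # A per-call na_rep takes precedence over the table-level self.na_rep.
        rep = na_rep if na_rep is not None else self.na_rep
        if rep is not None:
            func = with_na_rep(func, rep)
        indices = range(len(self.values)) if subset is None else subset
        for i in indices:
            self._display_funcs[i] = func
        return self

    def render(self):
        # Cells never passed to format() fall back to the default str formatter.
        return [self._display_funcs[i](v) for i, v in enumerate(self.values)]


nan = float("nan")
s = MiniStyler([0.5, nan, 1.5, nan])
s.format("{:.2%}", subset=[0, 1], na_rep="-")  # per-call na_rep
s.format("{:.1f}", subset=[2, 3])              # no na_rep anywhere: NaN stays 'nan'
print(s.render())  # ['50.00%', '-', '1.5', 'nan']

t = MiniStyler([1, nan], na_rep="n/a").format()  # table-level na_rep, default formatter
print(t.render())  # ['1', 'n/a']
```

In this sketch the table-level `na_rep` is only consulted when `format` is called; a real implementation might also apply it to cells that were never formatted.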

            lambda x: na_rep if pd.isna(x) else self._display_funcs.default_factory()(x)
        )
        return self

    @staticmethod
    def _highlight_null(v, null_color):
        return (
8 changes: 8 additions & 0 deletions pandas/tests/io/formats/test_style.py
@@ -990,6 +990,14 @@ def test_bar_bad_align_raises(self):
        with pytest.raises(ValueError):
            df.style.bar(align="poorly", color=["#d65f5f", "#5fba7d"])

    def test_format_null(self, na_rep="-"):
        # GH 28358
        df = pd.DataFrame({"A": [0, np.nan]})
        ctx = df.style.format_null()._translate()
        result = ctx["body"][1][1]["display_value"]
        expected = "-"
        assert result == expected

    def test_highlight_null(self, null_color="red"):
        df = pd.DataFrame({"A": [0, np.nan]})
        result = df.style.highlight_null()._compute().ctx