BUG-17280 to_html follows display.precision for column numbers in notebooks #25914

Merged
merged 9 commits into from
Apr 4, 2019

JustinZhengBC
Contributor

When printing column labels, check whether they are floats and, if so, round them according to the display.precision option.
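
To make the idea concrete, here is a minimal sketch of the rounding step in plain Python. The helper name `round_float_labels` is hypothetical and is not part of pandas or of this PR's diff; it only illustrates "if the label is a float, render it with `precision` decimals".

```python
def round_float_labels(labels, precision):
    """Hypothetical helper: render float labels with a fixed number of decimals.

    Non-float labels (strings, ints, ...) are passed through str() unchanged.
    """
    formatted = []
    for label in labels:
        if isinstance(label, float):
            # Fixed-point formatting with `precision` digits after the point.
            formatted.append(f"{label:.{precision}f}")
        else:
            formatted.append(str(label))
    return formatted


print(round_float_labels([0.55555, "name", 3], precision=3))
```

With a precision of 3, the float label 0.55555 is rendered as 0.556, while the string and integer labels are left alone.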

@codecov

codecov bot commented Mar 28, 2019

Codecov Report

Merging #25914 into master will decrease coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #25914      +/-   ##
==========================================
- Coverage   91.77%   91.77%   -0.01%     
==========================================
  Files         175      175              
  Lines       52607    52610       +3     
==========================================
- Hits        48282    48281       -1     
- Misses       4325     4329       +4
Flag Coverage Δ
#multiple 90.32% <100%> (ø) ⬆️
#single 41.9% <0%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/html.py 99.36% <100%> (ø) ⬆️
pandas/io/gbq.py 75% <0%> (-12.5%) ⬇️
pandas/core/frame.py 96.79% <0%> (-0.12%) ⬇️
pandas/io/formats/css.py 100% <0%> (ø) ⬆️
pandas/io/formats/excel.py 97.4% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b9f933...9040b98. Read the comment docs.

@codecov

codecov bot commented Mar 28, 2019

Codecov Report

Merging #25914 into master will decrease coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #25914      +/-   ##
==========================================
- Coverage   91.84%   91.83%   -0.01%     
==========================================
  Files         175      175              
  Lines       52550    52554       +4     
==========================================
- Hits        48266    48265       -1     
- Misses       4284     4289       +5
Flag Coverage Δ
#multiple 90.39% <100%> (ø) ⬆️
#single 41.89% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/html.py 99.36% <100%> (ø) ⬆️
pandas/io/gbq.py 75% <0%> (-12.5%) ⬇️
pandas/core/frame.py 96.79% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.61% <0%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4814a28...ebcc8f8. Read the comment docs.

@simonjayhawkins
Member

As this is a display option, shouldn't it only affect the notebook display and leave the to_html untouched?

@JustinZhengBC
Contributor Author

I have tested this fix in a notebook and it works as expected. As for whether this is the appropriate location for a fix, NotebookFormatter is the class responsible for displaying data in Jupyter notebooks, and it inherits from HTMLFormatter (and the bug was present in to_html(notebook=False) as well).

@simonjayhawkins
Member

and the bug was present in to_html(notebook=False) as well

I would argue it is not a bug here, since the display options should not affect the to_html() output. Compare with the display options max_rows, max_columns, show_dimensions, max_colwidth, etc., which only apply to the notebook display.

@pep8speaks

pep8speaks commented Mar 29, 2019

Hello @JustinZhengBC! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-04-03 15:23:23 UTC

@JustinZhengBC
Contributor Author

I agree. I have altered the fix so it only applies to NotebookFormatter and not HTMLFormatter, since to_html is used for display purposes in a notebook.

@JustinZhengBC changed the title from "BUG-17280 to_html follows display.precision for column numbers" to "BUG-17280 to_html follows display.precision for column numbers in notebooks" on Mar 29, 2019


def test_to_html_round_column_headers():
    df = DataFrame([1], columns=[0.55555])
Contributor

add the issue number

@@ -491,6 +491,15 @@ class NotebookFormatter(HTMLFormatter):
    DataFrame._repr_html_() and DataFrame.to_html(notebook=True)
    """

    def __init__(self, formatter, classes=None, border=None):
Contributor

Does this really need to be in __init__? I would rather have this call a method in the base class which is overridden here, e.g.

self.columns = self._get_columns_formatted_values()
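
The pattern being suggested can be sketched with two toy classes. This is an illustration only: `ToyHTMLFormatter` and `ToyNotebookFormatter` are simplified stand-ins, not the real pandas classes, and only the method name `_get_columns_formatted_values` is taken from the comment above. The constructor signature and the `precision` argument are assumptions made for the example.

```python
class ToyHTMLFormatter:
    """Toy stand-in for the base formatter: column labels are emitted as-is."""

    def __init__(self, columns, precision=6):
        self.columns = list(columns)
        self.precision = precision  # assumed to mirror display.precision

    def _get_columns_formatted_values(self):
        # Base behaviour: no display options applied.
        return [str(c) for c in self.columns]

    def header_row(self):
        cells = self._get_columns_formatted_values()
        return "".join(f"<th>{cell}</th>" for cell in cells)


class ToyNotebookFormatter(ToyHTMLFormatter):
    """Toy stand-in for the notebook formatter: float labels are rounded."""

    def _get_columns_formatted_values(self):
        # Override: apply the precision only in the notebook subclass.
        return [
            f"{c:.{self.precision}f}" if isinstance(c, float) else str(c)
            for c in self.columns
        ]


print(ToyHTMLFormatter([0.55555], precision=3).header_row())
print(ToyNotebookFormatter([0.55555], precision=3).header_row())
```

Because `header_row` only ever calls the hook method, the subclass changes the output without touching `__init__` or the markup code.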

Member

rather have this call a method in the base class which is overridden here

In #24651, I separated out the notebook functionality from HTMLFormatter into NotebookFormatter, using HTMLFormatter as the base class. The base class of HTMLFormatter is TableFormatter, which is also the base class of DataFrameFormatter (used for to_string and to_latex) and LatexFormatter.

I envisaged that at some point in the development cycle it may become desirable to also create a ToHTMLFormatter using HTMLFormatter as the base.

IMO HTMLFormatter and NotebookFormatter are not the appropriate location for non-markup related formatting issues that are common across output-formatting methods. I think this issue is in that category and should ideally be in DataFrameFormatter or TableFormatter.

However, if the fix that closes the open issue ends up somewhere in io/formats/html.py, then I think a TODO to remove the code at a later date should be added.

Contributor Author

and the bug was present in to_html(notebook=False) as well

I would argue it is not a bug here, since the display options should not affect the to_html() output. Compare with the display options max_rows, max_columns, show_dimensions, max_colwidth, etc., which only apply to the notebook display.

I think I'm getting a bit confused here. Doesn't this mean that this is not an issue that is common across formatters? Because to_html should only check display preferences when used for display purposes in a notebook and not when generating HTML for a web page?

Member

Yeah, I can understand why you're confused; there is a slight problem with the current class hierarchy. The workaround for max_colwidth was to use `with option_context` to ignore the display option for to_html, see #24841.

Member

I think the ideal solution would be to have, say, a use_display_options attribute in DataFrameFormatter; then in HTMLFormatter we would just have self.fmt.use_display_options = False, and in NotebookFormatter self.fmt.use_display_options = True, so that all the display formatting could be handled in io/formats/format.py.

But this is way outside the scope of this PR. So a with option_context work-around would be fine for now IMO.
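
A rough sketch of that flag-based design, using toy classes rather than the real pandas ones. `use_display_options` is the hypothetical attribute proposed in the comment above and does not exist in pandas; the class names, the `precision` argument, and `formatted_columns` are likewise assumptions made for illustration.

```python
class ToyDataFrameFormatter:
    """Toy stand-in for the shared formatter that owns all label formatting."""

    def __init__(self, columns, precision):
        self.columns = list(columns)
        self.precision = precision        # assumed to mirror display.precision
        self.use_display_options = True   # hypothetical flag from the proposal

    def formatted_columns(self):
        if not self.use_display_options:
            # Plain export: ignore display options entirely.
            return [str(c) for c in self.columns]
        return [
            f"{c:.{self.precision}f}" if isinstance(c, float) else str(c)
            for c in self.columns
        ]


class ToyToHTML:
    """Markup-only writer for plain to_html-style output."""

    use_display_options = False

    def __init__(self, fmt):
        self.fmt = fmt
        # The writer only flips the flag; formatting stays in the shared class.
        self.fmt.use_display_options = self.use_display_options

    def header_row(self):
        return "".join(f"<th>{c}</th>" for c in self.fmt.formatted_columns())


class ToyNotebook(ToyToHTML):
    """Markup-only writer for notebook display."""

    use_display_options = True


plain = ToyToHTML(ToyDataFrameFormatter([0.55555], precision=3))
notebook = ToyNotebook(ToyDataFrameFormatter([0.55555], precision=3))
print(plain.header_row())
print(notebook.header_row())
```

The point of the design is that the HTML classes contain no formatting logic of their own; they differ only in the value of one flag.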

@jreback jreback added the Output-Formatting __repr__ of pandas objects, to_string label Mar 29, 2019
@simonjayhawkins
Member

As for whether this is the appropriate location for a fix

I've opened separate issues for float values in index names (#25917), object indexes (#25919) and float-like values (#25920). So this is probably not the appropriate location, but I guess as a bug fix that would be OK.

@simonjayhawkins
Member

I think if, for now, you just change

row.extend(self.columns)

to

row.extend(self.columns.format())

it'll fix the open issue by making the behavior for single-level column labels consistent with the index labels and the MultiIndex column labels.

If the behavior should be different in to_html() output than in the notebook display, then this would probably need to include changing the index labels and the MultiIndex column labels, so it is outside the scope of this PR.

Then in the future we probably need a _get_formatted_column_labels method for code parity with DataFrameFormatter.

@JustinZhengBC
Contributor Author

@simonjayhawkins your fix works. It also changes columns with None to NaN as a side effect. Is that okay? If so we can go with that.

@simonjayhawkins
Member

It also changes columns with None to NaN as a side effect. Is that okay? If so we can go with that.

For None, I'm getting nan changed to NaN, which is consistent with the index and the MultiIndex column cases.

import pandas as pd
import numpy as np
pd.options.display.precision = 3
labels = np.random.rand(10)
labels[3] = None
p = pd.DataFrame(np.random.randn(10,10),index=labels,columns=labels)
p
0.04785803831057456 0.6401016774531172 0.9164407229270924 nan 0.6926779069059682 0.4610792740713505 0.9600160671031376 0.35015957890745153 0.14431674279365037 0.6268220237028823
0.048 -1.836 -0.269 1.081 0.267 -0.195 -0.586 0.916 2.162 0.572 0.198
0.640 0.757 0.656 1.502 -0.705 0.379 1.254 -0.393 -0.478 0.940 1.735
0.916 0.298 0.816 0.239 -0.166 -0.503 -0.261 2.781 -1.195 -0.613 1.763
NaN -0.792 0.622 -0.738 0.489 -1.947 -0.804 0.578 -0.086 -1.157 0.471
0.693 -0.414 -0.122 -1.248 -0.410 -1.132 0.315 -1.116 -0.746 -0.977 2.093
0.461 2.213 -0.615 0.594 -0.262 1.749 0.994 -0.539 1.187 0.156 0.673
0.960 -1.578 0.199 0.903 -0.578 0.859 0.969 -0.030 -1.371 -0.330 -0.141
0.350 0.023 1.053 0.902 1.489 1.278 -1.473 0.910 0.197 -0.461 0.714
0.144 1.797 0.233 -0.672 -0.288 -0.455 -0.872 -1.535 -2.370 1.545 -0.962
0.627 -2.079 -0.473 -2.075 0.391 -0.228 0.751 0.155 0.648 -0.679 -1.043
p.columns.format()
['0.249',
 '0.833',
 '0.730',
 'NaN  ',
 '0.520',
 '0.545',
 '0.960',
 '0.233',
 '0.247',
 '0.990']
multi = pd.MultiIndex.from_arrays([labels,labels])
p = pd.DataFrame(np.random.randn(10,10),index=multi,columns=multi)
p
0.249 0.833 0.730 NaN 0.520 0.545 0.960 0.233 0.247 0.990
0.249 0.833 0.730 NaN 0.520 0.545 0.960 0.233 0.247 0.990
0.249 0.249 0.419 0.043 -0.141 0.408 -0.040 1.376 -0.041 -1.439 0.192 -1.223
0.833 0.833 0.880 -1.105 -0.302 -0.652 -1.237 0.851 -0.481 1.156 -0.787 1.032
0.730 0.730 -0.939 -0.504 -0.397 0.647 -0.898 -0.219 -0.285 -1.120 -1.268 -0.277
NaN NaN -0.072 0.020 1.558 0.463 -1.322 0.388 0.373 0.107 -1.913 0.370
0.520 0.520 -0.343 -0.121 2.011 0.068 0.409 -0.326 -0.485 1.271 1.244 -0.313
0.545 0.545 -0.977 -0.755 0.145 0.154 0.293 1.243 1.441 -0.198 -0.318 0.339
0.960 0.960 -1.041 1.479 -1.120 0.101 -1.302 1.436 0.572 -0.806 1.306 -1.249
0.233 0.233 -0.021 -0.742 -0.394 0.185 0.028 -0.999 1.373 -1.598 1.140 -0.386
0.247 0.247 0.280 -0.246 0.222 -0.767 -1.433 0.144 -0.466 0.920 0.941 1.195
0.990 0.990 0.633 -1.737 -0.649 0.396 -0.007 -0.897 0.635 1.555 -2.001 0.047
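
The side effect discussed above (a missing label shown as NaN once labels go through a formatting step) can be mimicked in plain Python. This is only a sketch: `format_labels` and its `na_rep` parameter are hypothetical names, it is not how `Index.format()` is implemented, and it does not reproduce the padding visible in the `'NaN  '` entry of the output above.

```python
import math


def format_labels(labels, precision, na_rep="NaN"):
    """Hypothetical helper: round float labels and show missing ones as na_rep."""
    out = []
    for label in labels:
        is_nan = isinstance(label, float) and math.isnan(label)
        if label is None or is_nan:
            # Missing labels get one uniform representation.
            out.append(na_rep)
        elif isinstance(label, float):
            out.append(f"{label:.{precision}f}")
        else:
            out.append(str(label))
    return out


print(format_labels([0.04785803831057456, float("nan"), None, 0.96], precision=3))
```

Both the float nan and None end up as the same string, which matches the consistency argument made in the comment.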

@simonjayhawkins
Member

@JustinZhengBC : lgtm. #25914 (comment) still to do?

@JustinZhengBC
Contributor Author

@simonjayhawkins sorry about that, thanks for catching it.

@jreback
Contributor

jreback commented Apr 4, 2019

@simonjayhawkins this is orthogonal to your other PR? #25983

@simonjayhawkins
Member

@simonjayhawkins this is orthogonal to your other PR? #25983

Yes, this is OK. There will probably be a merge conflict on the test, but it is just an "accept both".

@jreback jreback added this to the 0.25.0 milestone Apr 4, 2019
@jreback jreback merged commit 013f4b4 into pandas-dev:master Apr 4, 2019
@jreback
Contributor

jreback commented Apr 4, 2019

thanks @JustinZhengBC

Labels
Output-Formatting __repr__ of pandas objects, to_string

Successfully merging this pull request may close these issues.

display.precision not honored for column headers
4 participants