Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding show() option missingstring to change the string that prints for missing values #1688

Closed
wants to merge 11 commits into from

Conversation

IanButterworth
Copy link

@IanButterworth IanButterworth commented Jan 20, 2019

I've found that it's sometimes visually helpful to print a smaller string than "missing" for missing values. This provides a way to change the string during show(), if desired.

i.e. show(df,missingstring="-")

Further work is needed for non-default outputs (HTML, LaTeX, CSV and TSV)

> df = DataFrame(a=1:3,b=[1,missing,3])
3×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Int64⍰  │
├─────┼───────┼─────────┤
│ 1   │ 1     │ 1       │
│ 2   │ 2     │ missing │
│ 3   │ 3     │ 3       │

> show(df,missingstring="-")
3×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Int64⍰  │
├─────┼───────┼─────────┤
│ 1   │ 1     │ 1       │
│ 2   │ 2     │ -       │
│ 3   │ 3     │ 3       │
> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),b = repeat([2, 1], outer=[4]),c = rand([randn(),missing],8));
> gd = groupby(df, :a);
> show(gd,missingstring="-")
GroupedDataFrame with 4 groups based on key: a
First Group (2 rows): a = 1
│ Row │ a     │ b     │ c         │
│     │ Int64 │ Int64 │ Float64⍰  │
├─────┼───────┼───────┼───────────┤
│ 1   │ 1     │ 2     │ 0.0727342 │
│ 2   │ 1     │ 2     │ -         │
⋮
Last Group (2 rows): a = 4
│ Row │ a     │ b     │ c         │
│     │ Int64 │ Int64 │ Float64⍰  │
├─────┼───────┼───────┼───────────┤
│ 1   │ 4     │ 1     │ -         │
│ 2   │ 4     │ 1     │ 0.0727342 │

Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the proposal. Here are some general comments that could be included in it (apart from the detailed recommendations I have left inline):

  • even in this limited scope we should also add methods in dataframerow/show;
  • handling CSV and TSV should be easy as printtable already handles this keyword argument;
  • I would not leave HTML and LaTeX for later as we tended to do it in the past and in the long term this leads to inconsistency in the framework that is hard to clean-up as these are small details;
  • all functionality we add should have test cases added

@IanButterworth
Copy link
Author

Thanks @bkamins. I've believe I've added all you listed

@IanButterworth
Copy link
Author

Note that the default for latex is missingstring="" given that this was the default functionality beforehand

@bkamins
Copy link
Member

bkamins commented Jan 21, 2019

Thank you. Please let me know when the tests pass and I will review the changes.

Regarding LaTeX - please change the default to "missing" - this is one of these small inconsistencies we have that need to be cleaned up.

The reason to change it to "missing" is that now nothing and missing print the same in LaTeX which is not desirable and moreover HTML and LaTeX are inconsistent (and the most common use of LaTeX is PDF export from Jupyter notebook so they should be the same).

@IanButterworth
Copy link
Author

@bkamins The latex default is now "missing" and the tests now pass, however I had to make changes to the MIME types that need review

@@ -92,11 +92,11 @@ function html_escape(cell::AbstractString)
return cell
end

Base.show(io::IO, mime::MIME"text/html", df::AbstractDataFrame; summary::Bool=true) =
_show(io, mime, df, summary=summary)
Base.show(io::IO, mime::MIME"text/html", df::AbstractDataFrame; summary::Bool=true, missingstring::AbstractString="missing") =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please correctly align the indentation of the code here and in all places below/

@bkamins
Copy link
Member

bkamins commented Jan 22, 2019

Looks good to me. I have three comments:

MIME

however I had to make changes to the MIME types that need review

I think we should provide methods performing auto-conversion like in Base where we hace (@nalimilan - do you have the same opinion here?):

show(io::IO, m::AbstractString, x) = show(io, MIME(m), x)

(of course with proper type restrictions; one has to be careful here to check that method ambiguities were not introduced)

In particular the PR should be checked on Jupyter notebook + HTML and LaTeX/PDF export to make sure that nothing fails (I can do it if would be a problem for you - please let me know) - the reason is that in the current state the PR is very likely to fail there (but I have not checked; however, it breaks backward compatibility without the fix I propose so there is a risk of failure)

Doumentation

It would be excellent to add a small note in the documentation about the newly introduced features

Rebasing

The test files require rebasing because today a major test code clean-up was performed. The major change was removal of indentation in module block. If you prefer that I do the rebasing then please let me know (in general you have to use the current upstream master and reapply only the changes you have introduced in the PR).

Thank you for your contribution (and sorry that the seemingly simple PR became more work than expected but we are trying hard to hit DataFrames.jl 1.0 release with a clean package).

@IanButterworth
Copy link
Author

Thanks for the guidance @bkamins. I would prefer to not do the rebase, so I graciously accept the offer! :) thanks

test/io.jl Outdated
end

@testset "Huge LaTeX export" begin
df = DataFrame(a=1:1000)
ioc = IOContext(IOBuffer(), :displaysize => (10, 10), :limit => false)
show(ioc, "text/latex", df)
show(ioc, MIME(Symbol("text/latex")), df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbol isn't needed AFAICT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that it should be just "text/latex" if we fix the signatures.

@nalimilan
Copy link
Member

I think we should provide methods performing auto-conversion like in Base where we hace (@nalimilan - do you have the same opinion here?):

show(io::IO, m::AbstractString, x) = show(io, MIME(m), x)

Indeed the fallback method doesn't support keyword arguments. Something like this shouldn't introduce ambiguities:

show(io::IO, m::AbstractString, x::AbstractDataFrame; kwargs...) = show(io, MIME(m), x; kwargs...)

But we should file an issue or PR against Base to improve this.

@bkamins
Copy link
Member

bkamins commented Jan 23, 2019

But we should file an issue or PR against Base to improve this.

I am not sure methods in base are assumed to accept keyword arguments. AFAICT the intended way to do it is to pass the arguments in IOContext object. I did not push on it here because we have this non-standard keyword arguments approach in show anyway. We might consider cleaning it up, but this is another PR and a big change.

My thinking was to define the method also you have proposed - at least short term and decide about IOContext issue later.

@bkamins
Copy link
Member

bkamins commented Jan 23, 2019

@ianshmean I start rebasing it so please do not push anything to this PR in the meantime 😄.

@bkamins
Copy link
Member

bkamins commented Jan 23, 2019

Done. A small remark - it is safer to make a PR in a branch not on master (nothing big, but GitHub complains).

I have also re-introduced repr test, as you have removed it, but I think we should keep it. Now I see that most probably it should be a bit up in the code (before the second test with missingstring="" call) but let us wait for the CI to tell that 😄.

@nalimilan
Copy link
Member

It's not clear whether all of these options should go through IOContext. I think the idea is that it should only be used for settings which should apply recursively to sub-elements, which isn't the case here. Anyway the best way to find out is to discuss this in Base.

@bkamins
Copy link
Member

bkamins commented Jan 23, 2019

I am also not 100% clear 😄 - but I have checked that out of 270 methods for show in Base only one accepts one keyword argument - I do not remember now which, but it was not something typical.

@bkamins
Copy link
Member

bkamins commented Jul 24, 2019

@ianshmean + @nalimilan any opinion what we should do with this PR (note that it requires a serious rebasing since we did a heavy cleanup of show in the meantime)

@nalimilan
Copy link
Member

If @ianshmean can rebase it I think we should add the show method accepting keyword arguments.

@bkamins bkamins added the non-breaking The proposed change is not breaking label Feb 12, 2020
@bkamins
Copy link
Member

bkamins commented Feb 12, 2020

I mark it non-breaking as I understand what is proposed here can be added at any time.

@bkamins
Copy link
Member

bkamins commented Aug 20, 2020

@ianshmean - do you think you would find time to work on this PR to try finalizing it?
@ronisbr - does PettyTables.jl have this option (if yes then probably we can close this PR and recommend PrettyTables.jl)

@IanButterworth
Copy link
Author

@bkamins Sorry, I don't have time to complete this now

@bkamins
Copy link
Member

bkamins commented Aug 20, 2020

OK - thank you for the response and submitting the PR. If PrettyTables.jl has this possibility I will close this PR then.

@ronisbr
Copy link
Member

ronisbr commented Aug 20, 2020

In PrettyTables, missing is shown as “missing”. However it is trivial to use a data formatted to change missing to “-“, for example.

julia> df = DataFrame(a=1:3,b=[1,missing,3]);

julia> pretty_table(df, formatters = ((v,i,j)->ismissing(v) ? "-" : v,))
┌───────┬───────────────────────┐
│     a │                     b │
│ Int64 │ Union{Missing, Int64} │
├───────┼───────────────────────┤
│     11 │
│     2- │
│     33 │
└───────┴───────────────────────┘

@bkamins
Copy link
Member

bkamins commented Aug 20, 2020

Thank you for responding. This is what I have assumed, in this case I close it.

As a reference:

(v,i,j)->ismissing(v) ? "-" : v

can be written as:

(v,i,j) -> coalesce(v, "-")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
non-breaking The proposed change is not breaking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants