-
-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Histogram support addition to distplot.py #47
Conversation
@OriolAbril I had to dig a bit deeper into understanding how the plotting actually works as coded, but I think I've managed to get step 1 (computing the histogram) roughly done- I'll probably have to combine these DataArrays into a Dataset and then create the visual element for mapping. Also, I'll remove the Also, I was going through the flow of how density was computed for KDE as an example (in Arviz-Stats). I saw bandwidth computing was done for KDE- but that won't be necessary for plotting histograms, right? |
Changed the plot_axis coords to 'x' and 'y' in place of 'bin_midpoint' and 'bin_height': Also added a visual element function, backend interface and matplotlib implementation with The bar widths are different due to the different scales of the x-axis. I'm not sure how this can be fixed yet if keeping the width a consistent value is desired though. Also, the number of bins is set to 20 by default. Some bins have too few values to be visible as their respective bars are too low in height compared to the taller ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is looking very good already. The computation of the histogram is a bit trickier and would not be part of the gsoc project, but I think it will be helpful to get you more familiar with xarray.
I have tried to focus on the plotting and contributing workflow with the comments. Let me know if you have any doubts related to testing.
src/arviz_plots/backend/__init__.py
Outdated
@@ -99,6 +99,29 @@ def create_plotting_grid( | |||
|
|||
|
|||
# "geoms" | |||
def hist(x, height, target, *, width=1.6, bottom=None, align="center", color=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given this is a histogram instead of a bar plot, I would require both left and right bin edges and use those for the bars, therefore removing both width and align parameter. We might consider something like rwidth
like matplotlib (relative width) with a default to 1 in case we didn't want bins to be draws with their actual width, but I would leave that for later (if at all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will require some changes to the data restructure utility function but I am not too worried about this, at some point we'll have a function in arviz-stats to take care of all this.
src/arviz_plots/plots/distplot.py
Outdated
|
||
# number of bins is set to 20 by default | ||
hist = histogram( | ||
da=var_data, dims=density_dims, bins=20, **stats_kwargs.get("density", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default number of bins should be taken care of by histogram
function. Moreover, if we believed a different default were needed, it would have to happen via stats_kwargs
and setdefault
so a user could modify that value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just modified this in the latest commit to do so
Now by default, with no bins provided (azp.plot_dist(schools, kind='hist')
) and the same example dataset, this is produced:
And with bins=40 provided by a user (azp.plot_dist(schools, kind='hist', stats_kwargs={"density": {"bins":40}})
), this is produced:
Will look at your other comments and make changes for those too now
src/arviz_plots/backend/__init__.py
Outdated
def hist(x, height, target, *, width=1.6, bottom=None, align="center", color=None): | ||
"""Interface for a hist plot. | ||
|
||
Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't been completely consistent adding the docstrings to this module, but the idea is for the documentation here so define the behaviour all backends have to comply with. It also isn't a general plotting backend interface but a common interface for our own plots to use which means we can be quite restrictive about the features.
For example, I would only accept array_like for x, height, width... I don't think the extra edge case of floats would be used anywhere. It also means the descriptions should not depend on any backend.
src/arviz_plots/backend/__init__.py
Outdated
color : color or array-like, optional | ||
The colors of the bar faces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might start with the same color for all rectangles, mostly because I am not sure how easy it will be to support the different color rectangles in all backends. If we then see it is feasible and of interest we can add that extra case.
There should also be a facecolor and an edgecolor arguments given there is a color argument and the generated form is "fillable" to match the expectations from https://arviz-plots.readthedocs.io/en/latest/api/backend/index.html#backend-interface-arguments
Thank you, I will take a look at your reviews and advice! |
@OriolAbril Latest commit has changes incorporating advice from your comments. Modified data restructure utility function to keep left and right edges info now until the matplotlib backend calculates midpoints from these at the end to create the bars, and I've updated the docstring and color/facecolor/edgecolor interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment on how to use matplotlib bar function to generate histogram looking plot. Let me know if you need any help to generate the bokeh equivalent
By the way, I commented on the ess and mcse issues but forgot to comment here. There is a histogram function available in |
Altered the code to make use of this function. Also, it seems like point estimate and credible intervals are being plotted so close to the x axis for histogram plots because the default 'y' aesthetic value/s returned by plot collection are pretty low in relation. The maximum y-axis values for the KDE/ECDF plots are way lower so it's not a problem for them. I initially thought there was some issue with this aesthetic mapping when type hist is picked, but it works fine- it's just really small values compared to the histogram heights. Should we mention this in the documentation, asking users to define custom 'y' aesthetic values (in For the plot_dist example where one subplot is created for each variable:
Point estimates when switching from kind="kde" to kind="hist":
This is easily fixable with editing the
We could maybe add this in the documentation so users know to keep this in mind. |
density = distribution.azstats.histogram( | ||
dims=density_dims, **stats_kwargs.get("density", {}) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should pass density=True
as default here. This will normalize the histogram so its area is 1, and will therefore generate histograms with heights similar to those of the kde. That will should take care of the non-reusability of the overlayed example when changing hist/kde. For ecdf/dotplots it will probably break too, but I don't think it matters, ecdf always goes 0-1 whereas dotplots are particularly strange, they should not be expected to act as drop-in replacements of either kde nor hist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah adding that seems to work for the overlay example, without having to alter the np.linspace
parameters.
98e5d1f
to
8f43767
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As always, go over all comments before starting to address any single one of them
src/arviz_plots/plots/distplot.py
Outdated
@@ -97,6 +99,7 @@ def plot_dist( | |||
|
|||
* "kde" -> passed to :func:`~arviz_plots.visuals.line_xy` | |||
* "ecdf" -> passed to :func:`~arviz_plots.visuals.ecdf_line` | |||
* "hist" -> passed to :func: `~WIP` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update this and add the new visual function to https://github.com/arviz-devs/arviz-plots/blob/main/docs/source/api/visuals.rst
src/arviz_plots/plots/distplot.py
Outdated
@@ -250,6 +253,16 @@ def plot_dist( | |||
**density_kwargs, | |||
) | |||
|
|||
elif kind == "hist": | |||
density = distribution.azstats.histogram( | |||
dims=density_dims, density=True, **stats_kwargs.get("density", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
density=True
should be added to stats_kwargs
with setdefault
. Then, if a user wanted to they can set it to False, keep the axis and use the y axis to look at counts for example. The task itself isn't really important, what is is that it should be possible
src/arviz_plots/plots/distplot.py
Outdated
@@ -352,7 +374,7 @@ def plot_dist( | |||
labeller=labeller, | |||
**title_kwargs, | |||
) | |||
if (kind == "kde") and (plot_kwargs.get("remove_axis", True) is not False): | |||
if (kind in ("kde", "hist")) and (plot_kwargs.get("remove_axis", True) is not False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would modify this logic. Instead, I would set the value of the remove_axis
depending on kind
at the top, right after copying plot_kwargs
.
if kind in ("hist", "ecdf"):
plot_kwargs.setdefault("remove_axis", False)
I think as a default we'll want to remove the axis for kde and quantile dotplot but keep it for the others.
src/arviz_plots/visuals/__init__.py
Outdated
@@ -13,6 +13,21 @@ | |||
from arviz_base.labels import BaseLabeller | |||
|
|||
|
|||
def plot_hist(da, target, backend, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use hist
only to "reserve" the plot_xyz
for batteries included functions, otherwise it might get confusing.
src/arviz_plots/backend/__init__.py
Outdated
@@ -99,6 +99,33 @@ def create_plotting_grid( | |||
|
|||
|
|||
# "geoms" | |||
def hist(y, l_e, r_e, target, *, bottom=None, color=None, facecolor=None, edgecolor=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went over all potential keywords in https://arviz-plots.readthedocs.io/en/latest/api/backend/index.html#backend-interface-arguments and alpha
should be added here too.
Just pushed changes per last review |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #47 +/- ##
==========================================
- Coverage 85.85% 84.84% -1.02%
==========================================
Files 17 17
Lines 1951 2012 +61
==========================================
+ Hits 1675 1707 +32
- Misses 276 305 +29 ☔ View full report in Codecov by Sentry. |
**artist_kws, | ||
): | ||
"""Interface to Bokeh for a histogram bar plot.""" | ||
artist_kws.setdefault("level", "glyph") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this necessary?
src/arviz_plots/plots/utils.py
Outdated
|
||
|
||
# WIP: dropping bin_midpoints calculation and keeping bin_edges | ||
def restructure_hist_data(hist_dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be removed
This will need to be rebased and double checked to make it compatible with plotly. Take care of these last two comments left and after that I'll rebase and add more commits to the PR so it works with plotly |
…visual element function, backend interface and matplotlib backend function for plotting histogram
…ta and modified docstrings for hist backend interface
…ted hypothesis time limit to 2 seconds and modified backend hist plotting function
…ious commits and updated restructure_hist_data() docstring
…ual element slightly for new returned hist density data structure
…by default for histograms
…ization from test_plot_dist_models
…ing functions, renamed 'plot_hist' to 'hist', modified `remove_axis` logic slightly and set density=True as default in stats_kwargs
It needs a fix in arviz-stats to work
…ith updates fromrootogram plot
2bd3519
to
f7a56c6
Compare
This PR aims to add support for histograms in
distplot.py
(Issue #24 ), allowing marginal densities to be visualized in this new form, apart from preexisting KDE and ECDF forms📚 Documentation preview 📚: https://arviz-plots--47.org.readthedocs.build/en/47/