density contours stroke & fill #948
(branch force-pushed from 3484ec8 to 774191f)
rebased on main
I want to have 1-d KDE, but it's not in the scope of this PR, which in my view is complete.
src/marks/density.js (outdated)
.bandwidth(bandwidth)
.thresholds(thresholds);

// First pass: seek the maximum density across all facets and series; memoize for performance.
“Memoizing” refers to a caching technique where, given a function that computes a consistent return value for its given arguments, the return values are cached by their arguments so as to avoid invoking the function repeatedly for the same arguments. This isn’t really memoizing here; there’s not a function that represents the computational task. I haven’t quite followed what this code is doing yet but this is probably something like “cache the initial set of contours to avoid recomputing them in the second pass”. I need to read this code more thoroughly (I’m making an edit pass now) but I’ll add more comments here.
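For reference, a minimal sketch of memoization in the sense described above (illustrative JavaScript, not code from this PR):

```js
// Memoization: cache a pure function's return values, keyed by argument,
// so repeated calls with the same argument skip recomputation.
function memoize(f) {
  const cache = new Map();
  return (arg) => {
    if (!cache.has(arg)) cache.set(arg, f(arg));
    return cache.get(arg);
  };
}

const square = memoize((n) => n * n); // stand-in for an expensive computation
square(4); // computes 16 and caches it
square(4); // returns the cached 16
```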
Yes, cache the contours and reuse those that ended up with the same threshold selection.
In the future, please don’t delete my PR before I’ve had a chance to review your contributions. I understand the desire to rebase to keep things up-to-date, but it’s now harder for me to review this code because I can’t easily see the changes you made on top of the previous PR I put up. Thank you. 🙏
(branch force-pushed from 56a0e32 to 8dad905)
(branch force-pushed from 8286adb to 22e478c)
(branch force-pushed from 8dad905 to d327591)
Okay, I was able to rebase my PR to main, then this PR to my PR, and we’re back to me being able to review all the changes here. 😅 Thanks for being patient with me.
src/marks/density.js (outdated)
.thresholds(thresholds)
(index))
.call(g => g.selectAll()
.data(Array.from(index, i => [i]))
If we use applyChannelStyles instead of applyGroupedChannelStyles, we won’t need to do this. (I’ll make this change shortly.)
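Roughly, the change to the data join would look like this (a sketch only; applyChannelStyles and applyGroupedChannelStyles are Plot internals whose exact signatures are not shown here):

```js
import {create} from "d3-selection";

const index = [0, 1, 2]; // indexes of the computed contours
const g = create("svg:g");

// Grouped channel styles resolve values per group of indexes, so each
// contour must be wrapped in a one-element index array:
g.selectAll("path").data(Array.from(index, (i) => [i])); // [[0], [1], [2]]

// Per-element channel styles can bind the indexes directly:
g.selectAll("path").data(index); // [0, 1, 2]
```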
src/marks/density.js (outdated)
const W = channels.weight?.value;
const Z = channels.z?.value;
const {z} = this;
const [cx, cy] = applyFrameAnchor(options, dimensions);
applyFrameAnchor takes this as the first argument, not options. (I’ll fix this.)
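Applied to the quoted line above, the corrected call would read:

```js
const [cx, cy] = applyFrameAnchor(this, dimensions); // the mark, not options
```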
We might also want to support fillOpacity = density and strokeOpacity = density? Could be left as a future improvement though; definitely not urgent. (I’ll consider implementing this now.)
src/marks/density.js (outdated)
// normalize colors to a thresholds scale
const m = max * (maxn + 1) / maxn;
if (f) channels.fill.value = channels.fill.value.map(v => v / m);
if (s) channels.stroke.value = channels.stroke.value.map(v => v / m);
I’m thinking maybe we don’t do this, and just leave the units as “points per square pixel” or “value per square pixel”, or perhaps “points per 1,000 square pixels” as a more human-readable value. Even though the value is arbitrary and resolution-dependent, it’s still probably more useful than “0 = 0, 1 = maximum observed density”, because it allows you to have a consistent scale across charts.
I guess it depends on whether you want to compare distributions or actual quantities, but I'm OK with reverting this. If needed, we could add the normalization as an option, but I'm not convinced it's that useful—I was just trying to make sense of the values.
100 square pixels is probably a better choice for a unit than 1,000, since one can picture a 10×10 square hosting one or a few points.
I'm not sure fillOpacity would be useful: since the shapes are not exclusive bands, the areas are overlaid, and a constant fillOpacity = 0.1 is what produces the "transparent to opaque" display. (But I'm not opposed to adding it for consistency.)
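For concreteness, a sketch of the unit being discussed, assuming (as elsewhere in this thread) that d3.contourDensity reports density in points per square pixel:

```js
import {contourDensity} from "d3-contour";

// Points in pixel space; contourDensity's default accessors read [x, y].
const points = [[10, 10], [12, 11], [40, 42]];
const density = contourDensity().size([100, 100]).bandwidth(8);

for (const contour of density(points)) {
  // contour.value is a density in points per square pixel; multiplying
  // by 100 expresses it as points per 100 square pixels, i.e. roughly
  // how many points a 10x10-pixel square hosts at that density.
  console.log(contour.value * 100);
}
```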
src/marks/density.js (outdated)
// Second pass: generate contours with the thresholds derived above
density.thresholds(thresholds);
for (const {facetIndex, index, c: memoc, top} of memo) {
  const c = top < max ? density(index) : memoc;
I’m not sure how much this caching is helping; I think most commonly, only one series (or facet-series) will be cacheable as having the max threshold. All the others will need to be recomputed. I guess that’s still helpful in the common case of only a single series and single facet, but I suspect we could make this significantly faster if d3.contourDensity had a way to compute and store the blurred grid, and then compute individual contours for a particular threshold afterward. That way, we wouldn’t have to recompute the underlying grid when computing new contours, and we could cache individual contours rather than sets.
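This is essentially what density.contours provides in d3-contour ≥3 (the v3.0.2 release is adopted later in this PR; see the commit list below); a sketch:

```js
import {contourDensity} from "d3-contour";

const points = [[10, 10], [12, 11], [40, 42]]; // [x, y] in pixel space
const density = contourDensity().size([100, 100]).bandwidth(8);

// Compute (and retain) the blurred density grid once…
const contours = density.contours(points);

// …then derive individual contours for arbitrary thresholds without
// recomputing the grid.
const a = contours(0.0005);
const b = contours(0.001);
```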
(branch force-pushed from dcadd28 to abfd86e)
I made some edits in abfd86e. To summarize (a usage sketch of the resulting mark follows this list):
- For options that are only needed by the initializer, I moved the defaults and coercion into the initializer.
- I fixed a bug so that if the fill is specified as “density”, the default stroke is now “none” (not “currentColor”).
- I removed this.path in favor of creating a path in render.
- I added a comment explaining why mark.filter is overridden. I wonder if we also want to change how the channels are grouped: rather than picking the value associated with the first index for each facet-series, we should pick the first defined value? But this is probably fine as-is.
- I adopted applyChannelStyles instead of applyGroupedChannelStyles to avoid constructing wrapper index arrays for each contour.
- I added z to the set of channels that are not automatically grouped (since z is not needed to render the computed contours), and adopted a Set (since the explicit enumeration of channel names in the code was feeling a bit unwieldy with four named channels).
- I adopted the coerceNumbers and identity technique used by the hexbin transform for the x, y, and weight channels.
- I fixed the first argument to applyFrameAnchor.
- I tweaked how the newChannels are constructed: it’s an object rather than an array, and I handle the fill and stroke = “density” cases separately.
- I removed the use of constant(cx) and constant(cy), since d3-contour does that for us.
- I removed the normalization of the color scale values (in the color = density case) and instead scaled by a fixed factor of 100, mostly because this is simpler and faster and I’m not really sure what we should do here.
- I removed the caching logic in favor of skipping the threshold-finding pass when there’s only one facet-series. If we want to make this faster, we should instead consider making changes to the d3-contour API to avoid recomputing the blurred grid.
- I forced the color scale to include zero in the color = density case, rather than the default which is that the color range will start at the smallest contour value.
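A minimal usage sketch of the mark after these edits (the `penguins` dataset and its field names are hypothetical; the options are the ones discussed in this PR):

```js
import * as Plot from "@observablehq/plot";

// `penguins` stands in for any tabular dataset; the field names are
// illustrative only.
const chart = Plot.plot({
  marks: [
    Plot.density(penguins, {
      x: "flipper_length_mm",
      y: "culmen_length_mm",
      fill: "density", // color each contour by its density value
      bandwidth: 20,
      thresholds: 20
    })
  ]
});
```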
Squashed commits:
* density mark
* density contours stroke & fill (#948)
* density contours as an initializer: first pass
* density by z, carrying styles and channels (no reducer, only "first")
* fill: "density"
* density weight
* consistent thresholds across facets and series (note: we have to work around this bug in d3-contour: d3/d3-contour#57)
* allow initializer composition; move initializer to the class; clean up
* density weight example
* error if x or y is undefined (in the future we might route those to 1-dimensional transforms—KDE)
* document
* reduce img
* don't apply the scales if we are already in pixel space (e.g. when composing with the hexbin transform)
* avoids a crash when there is no contour
* 1d density contours with frameAnchor
* adopt [email protected] for https://github.com/d3/d3-contour/releases/tag/v3.0.2 (not entirely sure if it's good practice to have all the yarn.lock changes in the PR, beyond those that are relevant)
* replace example image
* mike’s edits
* fix image dimensions
* isDensity
* distinct
* tweak tests

Co-authored-by: Mike Bostock <[email protected]>
Co-authored-by: Philippe Rivière <[email protected]>
Superb. Thank you for the detailed list of changes!
As a complement to #943: the density values are not interpretable by the reader, so the legend is not recommended—or maybe we need to normalize (values are normalized by the maximum density across all facets and series).
todo: