
raster mark #1196

Merged: mbostock merged 137 commits into main from mbostock/image-data on Jan 11, 2023

Conversation

@mbostock (Member) commented Dec 21, 2022

This is an alternative to the pixel mark #1185; the raster mark takes a set of discrete {x, y, fill} samples and produces a corresponding raster grid (image) using putImageData. The canvas is then converted to a URL for use as an svg:image, which is positioned using {x1, y1, x2, y2} abstract coordinates likewise bound to the x and y scales.
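For illustration, a minimal sketch of that form (the sample data and field names below are made up for this sketch; the option names follow the examples later in this thread):

// A minimal sketch of the form described above: discrete {x, y, fill} samples
// rasterized into a width × height grid spanning the abstract extent
// [x1, x2] × [y1, y2]. The sample data is illustrative only.
const samples = [
  {x: 0.5, y: 0.5, v: 1},
  {x: 1.5, y: 0.5, v: 2},
  {x: 0.5, y: 1.5, v: 3},
  {x: 1.5, y: 1.5, v: 4}
];

Plot.plot({
  marks: [
    Plot.raster(samples, {
      x: "x", // sample centroids in abstract coordinates
      y: "y",
      fill: "v",
      width: 2, // raster grid resolution, in grid pixels
      height: 2,
      x1: 0, y1: 0, x2: 2, y2: 2 // abstract extent of the grid
    }),
    Plot.frame()
  ]
})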

@mbostock mbostock requested a review from Fil December 21, 2022 21:44
@mbostock (Member, Author)

I’m not sure why CI is failing; presumably it’s a non-visible difference in the result of canvas.toDataURL. 😞

@Fil (Contributor) commented Dec 22, 2022

I'm not sure this mark works for me: I don't see the use case (besides copying a raster with the same dimensions?), and I don't see how it fits with the general approach of Plot's scales and so on. To help clarify what we want to cover, I've started a list of use cases in this notebook: https://observablehq.com/@observablehq/pixel-or-imagedata--dev

Two unrelated remarks:

  • Re: opacity, I think we might have to compose it with the A of the rgbA color channel? (See the sketch after this list.)
  • I still want to pass the facet information (the actual value of fx, fy) somewhere in the call to mark.render (render API #501); this would help make this mark facet-dependent. I'm using "facet-dependent" and not "faceted", in the sense that they would not have a partition index, but could act differently depending on the facet they're drawing.
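
Re the first point, a hypothetical sketch of what that composition could look like when writing RGBA bytes into the ImageData buffer (the function and its parameters are illustrative, not the actual implementation):

// Hypothetical sketch: compose the color's own alpha (the "A" of rgbA) with the
// fillOpacity channel when writing RGBA bytes into an ImageData buffer.
// writePixel and its parameters are illustrative names, not the PR's code.
function writePixel(data, k, color, fillOpacity = 1) {
  const {r, g, b, opacity = 1} = d3.rgb(color); // d3.rgb exposes alpha as "opacity"
  data[k + 0] = r;
  data[k + 1] = g;
  data[k + 2] = b;
  data[k + 3] = 255 * opacity * fillOpacity; // color alpha × fillOpacity
}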

@mbostock mbostock changed the title image data mark raster mark Dec 23, 2022
@mbostock (Member, Author)

Okay, here’s my third attempt… I hope you like it better. 🙏 It feels very convenient for the continuous f(x, y) case, and supports all (continuous) scale types. And it’s still convenient for the simple case of rendering an existing raster grid (e.g., volcano) with linear scales. I would like to implement a contour mark with the same type of x/y/value specification… but I’m not sure how much work I’ll do over Christmas.
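
For instance, a minimal sketch of that continuous case (assuming, per the discussion below, that data may be omitted when fill is given as a function of x and y; the sampled function itself is made up):

// A minimal sketch of the continuous f(x, y) case: fill is a function of the
// abstract x and y coordinates, sampled over the extent [x1, x2] × [y1, y2].
Plot.plot({
  color: {type: "diverging"},
  marks: [
    Plot.raster({
      x1: -3, x2: 3,
      y1: -3, y2: 3,
      fill: (x, y) => Math.sin(x) * Math.cos(y)
    }),
    Plot.frame()
  ]
})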

(Review thread on a code excerpt from one of the test plots:)

type: "diverging"
},
marks: [
Plot.raster(d3.range(width * height), {
@Fil (Contributor):

would this be easier if the signature for fill was f({x,y}, i)?

@mbostock (Member, Author):

I don’t think this test needs to be “easy”—the easy form is shown above, and this test isn’t supposed to be representative of recommended usage. The point here is to test the explicit form where fill is a function of data.

@mbostock added the "enhancement" (New feature or request) label Dec 30, 2022
@Fil (Contributor) left a review:

I really like where this is going. As mentioned in the comments, there's more we could do (in terms of performance, compatibility with opacity, faceting, projections…), but those could be done in follow-ups.

Note that Safari currently doesn't support image-rendering: pixelated on svg images, which makes it a bit awkward for us. Should we expect WebKit to fix itself, or look for a workaround (foreignObject+img)?

I'm adding a bit of documentation. I must say that I'm still struggling to understand/explain how the x and y channels work in conjunction with width, x1 and x2 (?). I've been trying as an exercise to create the image of the volcano with x, y, fill as channels, but not very successfully (seems like I need to specify all of x1, x2, width…?).

if (xi < 0 || xi >= width) continue;
const yi = Math.floor((Y[i] - y2) * ky);
if (yi < 0 || yi >= height) continue;
const {r, g, b} = rgb(F[i]);
@Fil (Contributor) suggested change:

-const {r, g, b} = rgb(F[i]);
+// TODO: memoize for performance? We'll usually have a maximum of 128 different shades, but 100x more pixels.
+const {r, g, b} = rgb(F[i]);

@mbostock (Member, Author) replied Jan 3, 2023:

I think it would be better to do the optimization earlier when the scale is applied, like how Yuri does here:

// Convenience function to create a cached color interpolator that
// returns cached rgb objects, avoiding color string parsing.
cacheInterpolator = (interpolator, n = 250) =>
  d3.scaleQuantize(d3.quantize(pc => d3.rgb(interpolator(pc)), n))

https://observablehq.com/@twitter/density-plot@4159

Like maybe there’s some hint that the mark wants {r, g, b} objects instead of color strings and can instruct Plot to materialize those efficiently when applying the color scale. But in any case I think we should do optimizations as follow-up, as you say.
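
For context, a hypothetical usage of that cached interpolator (the names come from Yuri's notebook quoted above, not from this PR):

// Hypothetical usage of the cacheInterpolator helper quoted above: lookups
// return pre-parsed {r, g, b} objects rather than color strings, so a hot
// per-pixel loop avoids color-string parsing entirely.
const color = cacheInterpolator(d3.interpolateTurbo);
const {r, g, b} = color(0.42); // one of ~250 cached d3.rgb objects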

@mbostock (Member, Author) commented Jan 2, 2023

> I've been trying as an exercise to create the image of the volcano with x, y, fill as channels, but not very successfully (seems like I need to specify all of x1, x2, width…?).

Yes, you must specify all those options. Like this:

(screenshot: the rendered volcano raster)

Plot.plot({
  marks: [
    Plot.raster(volcano.values, {
      width: volcano.width,
      height: volcano.height,
      x1: 0,
      y1: 0,
      x2: volcano.width,
      y2: volcano.height,
      x: (_, i) => (i % volcano.width) + 0.5,
      y: (_, i) => Math.floor(i / volcano.width) + 0.5,
      fill: (d) => d
    }),
    Plot.frame()
  ]
})

Alternatively, if you want the samples at integer locations, you have to offset the bounds by 0.5:

(screenshot: the rendered volcano raster, with samples at integer locations)

Plot.plot({
  marks: [
    Plot.raster(volcano.values, {
      width: volcano.width,
      height: volcano.height,
      x1: -0.5,
      y1: -0.5,
      x2: volcano.width - 0.5,
      y2: volcano.height - 0.5,
      x: (_, i) => (i % volcano.width),
      y: (_, i) => Math.floor(i / volcano.width),
      fill: (d) => d
    }),
    Plot.frame()
  ]
})

The x and y channels specify the centroids of the pixels, so you need x1, y1, x2, and y2 to specify the extent of the pixel grid; otherwise the raster mark cannot know how wide a pixel is in abstract coordinates. And you need to specify width and height because this is the size of the raster grid (in grid coordinates, i.e., pixel indexes) which might be a different resolution than the samples.

@mbostock (Member, Author) commented Jan 2, 2023

If you would prefer that the samples are taken at integer locations by default (rather than offset by 0.5) then we could apply this patch:

% git diff -p
diff --git a/src/marks/raster.js b/src/marks/raster.js
index 71a0af1c..52864d10 100644
--- a/src/marks/raster.js
+++ b/src/marks/raster.js
@@ -19,10 +19,10 @@ export class Raster extends Mark {
       // If X and Y are not given, we assume that F is a dense array of samples
       // covering the entire grid in row-major order. These defaults allow
       // further shorthand where x and y represent grid column and row index.
-      x1 = x == null ? 0 : undefined,
-      y1 = y == null ? 0 : undefined,
-      x2 = x == null ? width : undefined,
-      y2 = y == null ? height : undefined,
+      x1 = x == null ? -0.5 : undefined,
+      y1 = y == null ? -0.5 : undefined,
+      x2 = x == null ? width - 0.5 : undefined,
+      y2 = y == null ? height - 0.5 : undefined,
       imageRendering,
       pixelRatio = 1,
       fill,
@@ -156,7 +156,6 @@ function sampleFill({fill, fillOpacity, pixelRatio = 1, ...options} = {}) {
     if (h === undefined) h = Math.round(Math.abs(y2 - y1) / pixelRatio);
     const kx = (x2 - x1) / w;
     const ky = (y1 - y2) / h;
-    (x1 += kx / 2), (y2 += ky / 2);
     let F, FO;
     if (fill) {
       F = new Array(w * h);

Do you have a preference? I think it looks better to have the grid start and end on integer boundaries.

@Fil (Contributor) commented Jan 2, 2023

Integer boundaries look better, I agree.

My difficulty is in precisely articulating how the various options must be combined, or on the contrary are "not compatible". For example, if you try to replace width: volcano.width with a "finger in the air" pixelRatio in the example above, you will get banding (unless pixelRatio=580 / 87). To document this, we need to mention that there is rounding, and that the pixelRatio is imputed from the width of the frame divided by the width of the data.
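
To make that relation concrete, a rough sketch of the arithmetic (the 580px frame width is an assumption; the rounding follows the patch earlier in this thread):

// Rough arithmetic behind "pixelRatio = 580 / 87": the grid width is derived by
// rounding frameWidth / pixelRatio, so this pixelRatio yields exactly one raster
// column per volcano column and hence no banding.
const frameWidth = 580; // assumed frame width in screen pixels
const pixelRatio = 580 / 87; // ≈ 6.67
const gridWidth = Math.round(frameWidth / pixelRatio); // 87 = volcano.width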

@mbostock (Member, Author) commented Jan 2, 2023

> My difficulty is in precisely articulating how the various options must be combined, or on the contrary are "not compatible".

It’s best to think about this as multiple coordinate systems.

First, there is a discrete set of samples in abstract coordinates x and y, with fill or fillOpacity (or both). These correspond to the raster mark’s data. (The samples are imputed when fill or fillOpacity is specified as a function and data is null.) These samples are typically on an axis-aligned grid, but not necessarily so; arbitrary sample positions will be more useful in the future if and when the raster mark supports different methods of interpolation.

Second, there is a raster grid (a.k.a. canvas) with its own pixel coordinates. The aforementioned samples are mapped to a canvas that is width pixels by height pixels, with the origin [0, 0] in the top-left corner. The extent of the canvas in pixel coordinates [0, 0, width, height] corresponds to an abstract extent [x1, y1, x2, y2] in the same abstract coordinate system as the samples. (In some cases x1 and x2 can be flipped, and likewise y1 and y2.) During rendering, the raster mark assigns (bins using Math.floor) each sample in x and y to a pixel.

Lastly, there are screen coordinates (really Plot frame coordinates, in the range of the x and y scales). These are needed to place the svg:image in the correct position within the Plot frame.
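
To make the second mapping concrete, a rough sketch of the binning step (not the exact source; the function and its conversion factors are defined here for the sketch):

// Rough sketch of assigning an abstract sample (x, y) to a raster pixel.
// y1 and y2 are swapped because the canvas y-axis grows downward.
function sampleToPixel(x, y, {x1, y1, x2, y2, width, height}) {
  const kx = width / (x2 - x1);
  const ky = height / (y1 - y2);
  const xi = Math.floor((x - x1) * kx); // column index in the raster grid
  const yi = Math.floor((y - y2) * ky); // row index
  return xi >= 0 && xi < width && yi >= 0 && yi < height ? [xi, yi] : null; // null = outside the grid
}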

If you specify an incorrect width in the volcano example, then you expect to see banding because you no longer have exactly one sample per pixel in the raster grid. For example, here three columns are missing samples:

(screenshot: the volcano raster with three columns missing samples)

Plot.plot({
  marks: [
    Plot.raster(volcano.values, {
      width: volcano.width + 3,
      height: volcano.height,
      x1: 0,
      y1: 0,
      x2: volcano.width,
      y2: volcano.height,
      x: (_, i) => (i % volcano.width) + 0.5,
      y: (_, i) => Math.floor(i / volcano.width) + 0.5,
      fill: (d) => d
    }),
    Plot.frame()
  ]
})

(They should really be transparent though… I think that’s a regression I introduced when adding support for fillOpacity. Fixed.) If and when the raster mark supports better interpolation—something smarter than just binning the samples into rectangular pixels—then we could e.g. fill those gaps with the closest sample, or a blend of nearby samples.

@mbostock mbostock marked this pull request as ready for review January 2, 2023 22:04
@mbostock (Member, Author)

Last step here is to figure out faceting, or at least document that it doesn’t work yet.

@mbostock mbostock requested a review from Fil January 11, 2023 01:12
@mbostock (Member, Author)

@Fil The latest changes are in 1e22d3d...a745cc3; documentation aside, this should be ready to go! 🚢

@Fil (Contributor) commented Jan 11, 2023

I'm tempted to say "ship", but this morning I tried to plug in https://observablehq.com/@jobleonard/pseudo-blue-noise instead of randomLcg(42), and the improvement in quality is really remarkable:

  • walk-on-spheres (before, after)
  • barycentric (before, after)
  • contours with random walk (before, after)

@mbostock (Member, Author)

The pseudo blue noise looks nice but I would like to focus on documenting and releasing this and returning to the axis mark. I’d be willing to expose the built-in spatial interpolation methods as functions so that you can pass in a custom RNG.

@mbostock (Member, Author)

The latest two commits expose the built-in spatial interpolation methods:

  • Plot.interpolateNone
  • Plot.interpolateNearest
  • Plot.interpolatorBarycentric({random})
  • Plot.interpolatorRandomWalk({random, minDistance, maxSteps})

So you can now provide your own RNG, e.g. one based on pseudo blue noise.
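
For example, a sketch of plugging a custom generator into one of these (the data, channel names, and myBlueNoise generator are placeholders; the interpolate option name follows the raster documentation added in this PR):

// Sketch: replace the default seeded RNG mentioned above (randomLcg(42)) with a
// custom generator. myBlueNoise is a placeholder for a pseudo-blue-noise RNG
// returning numbers in [0, 1); data, "x", "y" and "value" are illustrative.
Plot.raster(data, {
  x: "x",
  y: "y",
  fill: "value",
  interpolate: Plot.interpolatorRandomWalk({random: myBlueNoise})
})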

@mbostock mbostock merged commit 44b4a1d into main Jan 11, 2023
@mbostock mbostock deleted the mbostock/image-data branch January 11, 2023 16:53
chaichontat pushed a commit to chaichontat/plot that referenced this pull request on Jan 14, 2024. The commit message, listing this pull request's commits, follows:
* image data mark

* PreTtiER

* handle invalid data; stride, offset

* handle flipped images

* archive test failure artifacts

* skip image data tests, for now

* PreTtiER

* only ignore generated images in CI

* only ignore large generated images

* fillOpacity

* tweak

* fix formula

* PreTtiER

* volcano

* more idiomatic heatmap

* fill as f(x, y)

* pixel midpoints

* PreTtiER

* not pixelated, again

* PreTtiER

* raster

* pixelRatio

* fix aria-label; comments

* Goldstein–Price

* tentative documentation for Plot.raster

* fix partial coverage of sample fill

* raster fillOpacity

* require x1, y1, x2, y2

* validate width, height

* fix for sparse samples

* better error on missing scales

* document

* floor rounded (or floored?)

* exploration for a "nearest" raster interpolate method

* barycentric interpolation
see https://observablehq.com/@visionscarto/igrf-90

* raster tuple shorthand

* barycentric interpolate and extrapolate

* only maybeTuple if isTuples

* allow marks to apply scales selectively (like we do with projections)

* interpolate on values

* 3 interpolation methods for the nearest neighbor: voronoi renderCell, quadtree.find, delaunay.find. This is completely gratuitous since they all run in less than 1ms… It's even hard to know which one is the fastest, because if I loop on 100s of them the browser starts to thrash (allocating so much memory for images it immediately discards, I guess…)

* barycentric walmart

* fold mark.project into mark.scale

* fix barycentric extrapolation

* materialize fewer arrays

* use channel names

* don’t pass {r, g, b, a}

* don’t overload x & y channels

* fix inverted x or y; simplify example

* simpler

* fix grid orientation

* only stroke if opaque

* optional x1, y1, x2, y2

* shorten

* fix order

* const

* rasterize

* The performance measurements I had done were just rubbish (I forgot to await on the promises!).
Measuring the three methods on the ca55 dataset I see this order: voronoi renderCell (180ms), delaunay.find (220ms), quadtree.find (500ms).

* rasterize

* tolerance for points that are on a triangle's edge

* use a symbol for values that need extrapolation, simplify and fix a few issues, use a mixing function for categorical interpolation

* rasterize with walk on spheres

* document rasterize

* pixelSize

* default to full frame

* remove ignored options

* reformat options

* fix the ca55 tests (the coordinates represent a planar projection)

* caveat about webkit/safari

* remove console.log

* more built-in rasterizers

* fix walk-on-spheres implementation; remove blur

* port fixes to wos

* adaptive extrapolation

* fillOpacity fixes

* renames walk-on-spheres to random-walk; documents the rasterize option

rationale for the renaming: "random-walk" is more commonly known, and expresses well enough what's happening. Walk on spheres converges much faster than a basic random walk would, and makes it feasible, but it is a question of implementation.

* a constant fillOpacity informs the opacity property on the g element, not the opacity of each pixel

* fix bug with projection clip in indirectStyles

* performance optimizations for random-walk:
1. use rasterizeNull to bootstrap; if we have more samples (and a costlier delaunay), at least we have fewer pixels to impute.
2. cache the result of delaunay.find more aggressively: at the beginning of each line, for each pixel, and for each step of the walk.
In actual tests it can be up to 2x faster.

* sample pixel centroids

* fix handling of undefined values

* use transform for equirectangular coordinates

* don’t delete

* stroke if constant fillOpacity

* fix test snapshots

* fix typo in test name

* note potential bias caused by stroke

* rename tests

* don’t bootstrap random-walk with none

* terminate walk when minimum distance is reached

* comment re. opacity

* comment re. none order bias

* contour mark

* dense grid contours

* consolidate code

* more code consolidation

* cleaner

* cleaner deferred channels

* interpolate, not rasterize

* blur

* cleaner

* use typed array when possible

* optimize barycentric interpolation

* nicer contours for ca55 with barycentric+blur 3; support raster blur

Contour blurring is unchanged, and blurs the abstract data (with a linear interpolation).
Raster blurring is made with d3.blurImage. Two consequences:
* we can now blur “categorical” colors, if we want to smooth out the image and give it a polished look in the higher variance regions. (This works very well when we have two colors, but with more categories there is a risk of hiding the components of a color, making the image more difficult to understand. Anyway, it’s available as an option to play with.)
* for quantitative data, and with a color scale with continuous scheme and linear transform, this is very close to linear interpolation; but if the underlying data is better rendered with a log color scale, the color interpolation takes this into account (which IMO is better).

* ignore negative blur

* cleaner tests

* for contours, filter points with missing X and Y before calling the interpolate function, and ignore x and y filters on geometries

* fix barycentric interpolate for filtered points

note: the penguins dataset is full of surprises since some points are occluded by others of a different species…

* contour shorthands

* fix contour filtering

* filter value, too

* materialize x and y when needed

* default to nearest

* comment

* remove obsolete opacity trick

* better contour thresholds; fix test

* nullish instead of undefined

* renderBounds

* fix circular import

* a hand-written Peters projection seemed more fun than the sqrt scale; tests the same thing

* update raster documentation with interpolate; document contour

* document Plot.identity

* peters axes

* symmetric Peters

* style tweak

* NaN instead of null

* avoid error when empty quantile domain

* faceted sampler raster

* fix test snapshot

* faceted contour; fix dense faceted raster

… and fix default contour thresholds

* expose spatial interpolators

* pass x, y, step

* error when data undefined, but not null

* d3 7.8.1

Co-authored-by: Philippe Rivière <[email protected]>