Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shift pixel coordinates of tiles using proper affine transformation #150

Merged
merged 6 commits into from
Jun 7, 2019

Conversation

weiji14
Copy link
Owner

@weiji14 weiji14 commented Jun 3, 2019

Details matter, even if it's only half a pixel (especially if that pixel is ~250m in length)! This ambitious pull request deals with the mess that goes with converting between xarray's centre-based pixel coordinates and rasterio's corner-based pixel coordinates.

Gridline (corner-based) vs Pixel (centre-based) registration of pixel values

The problem sort of surfaced itself in #88, but I thought it was a bug introduced from rasterio 1.0.13 to 1.0.14 (which I have tried several times unsuccessfully to pin down and fix behind the scenes). Turns out that the unit test for data_prep.get_window_bounds code here was wrong all along! One of the bounds' output maxy (36.5) was greater than the y-range (1-33)!!

Also notice that the extra 0.5 in 36.5 is actually not 'correct' for rasterio's corner-based pixel coordinate system. A minx of 4.5 and maxx of 36.5 means there is a half-pixel shift to the right of where we want our bounding box to lie. This extra 0.5 is because we are reading the netcdf raster using xarray, that uses a centre-based pixel coordinate system, and then passing the task off to rasterio...

So the raster's origin pixel reference needs to be updated when we switch between xarray and rasterio, which can be done by setting the correct affine transformation. Ideally, xarray/netcdf grids would have some way of storing projection information, but since it's a complicated work-in-progress matter, we'll use salem in the meantime to do the xarray-to-rasterio coordinate shift.

Fixes #84 (and #88, #90, #98, #102, #117...).
Might help with resolving #58.

References:

Extends xarray with a special accessor for automated projection parsing! Package repository at https://github.com/fmaussion/salem. Also added required dependency joblib.
@weiji14 weiji14 added bug 🪲 Something isn't working enhancement ✨ New feature or request data 🗃️ Pull requests that update input datasets labels Jun 3, 2019
@weiji14 weiji14 added this to the v0.9.0 milestone Jun 3, 2019
@weiji14 weiji14 self-assigned this Jun 3, 2019
weiji14 added 3 commits June 3, 2019 14:33
Patches 863d3be with a relatively easy fix in data_prep.ascii_to_xyz. [pyproj](https://github.com/pyproj4/pyproj) v2.2.0 now defines a CRS from string and does reprojection a bit differently than before in v2.1.3. See also:
- [Release notes](https://github.com/pyproj4/pyproj/releases/tag/v2.2.0rel)
- [Commits]pyproj4/pyproj@v2.1.3rel...v2.2.0rel)
Bumps [rasterio](https://github.com/mapbox/rasterio) from 1.0.13 to 1.0.23. Supersedes #117. Also show (a part of) failing tests in test_ipynb.ipynb that needs to be resolved in #150.
- [Release notes](https://github.com/mapbox/rasterio/releases)
- [Changelog](https://github.com/mapbox/rasterio/blob/master/CHANGES.txt)
- [Commits](rasterio/rasterio@1.0.13...1.0.23)
Adjusting the bounding box positions of (some) tiles by half a pixel (roughly 500m). Fixed a poorly written unit test for the data_prep.get_window_bounds function introduced in ab0295e right before release v0.4.0, and refactored the function to make it work correctly with newer libraries. Strangely enough, after recreating the geojson tiles, the coordinate changes are only for the Operation Ice Bridge/CRESIS Basler and TO grids, and we still have exactly 2347 tiles!

Specifically, we've switched from using xarray.open_rasterio to xarray.open_dataarray as xarray's rasterio backend fails to read NetCDF georeferencing information properly with rasterio >= 1.0.14 (see #84). The projection information is monkey-patched into the xarray grid using salem. Most important bit is to ensure that we go from xarray's centre-based pixel coordinate system to rasterio's corner-based pixel coordinate system (accomplished with dataset.salem.grid.corner_grid) via a manually set Affine transformation.
To make the data_prep.selective_tile more understandable, I've 'simplified' the doctest with a diagonal array, and actually used more geographically correct (corner-based) pixel bounds. Also updated test_data_prep.py to use the v0.7.0 release grid instead of v0.4.0.

Issue with rasterio.open not having proper affine transformation on netcdf files solved via the most ludricrous method of all - importing xarray before rasterio... Will submit a bug report after this, but in the meantime, the code can stay mostly intact, phew! Next step is to possibly remove salem, and refactor the selective_tile function a bit more actually, by using xarray.open_rasterio and the xarray.DataArray.sel by xy slice method.
Fragments appear to be the same with 63b3ef1?!! Just reuploading to quilt (new hash is 4d3b5a17f63b35212b5a0a210d9b88059a9ff88ad2d5b2de15d9535f48002e13 at https://quiltdata.com/package/weiji14/deepbedmap), not sure if there might be some minor changes. Gave up on refactoring data_prep.get_window_bounds (salem still seems to be needed...) and data_prep.selective_tile (lazy to figure out resampling of xarray grids). Anyways, time to get on with some neural network model training!
@weiji14 weiji14 merged commit eaa3a16 into refactor_grdtrack Jun 7, 2019
weiji14 added a commit that referenced this pull request Jun 7, 2019
Closes #150 Shift pixel coordinates of tiles using proper affine transformation.
@weiji14 weiji14 deleted the shifting_tiles branch June 7, 2019 08:47
weiji14 added a commit that referenced this pull request Jun 13, 2019
Oh the two weeks of background work just to come to this, pygmt.grdtrack and whatnot! Our srgan_train.get_deepbedmap_test_result function now evaluates the trained neural network model's predicted grid directly in memory instead of on a file! Lots of temporary file related boilerplate code removed, woohoo! There's been plenty of background work to get this in-memory grid to be georeferenced correctly (see #150, 7fd3345, 4a074d9), but it all ends up making the evaluation more accurate, and hopefully faster to run and cleaner to scale up.

Note also that the deepbedmap.get_image_and_bounds function has been refactored and renamed to get_image_with_bounds. The previous bounds was using xarray's centre-based pixel coordinates when it should have returned rasterio-style corner-based pixel coordinates used by data_prep.selective_tile. This is resolved using good ol' salem, and there's some extra code to handle getting the bounds for multiple inputs (instead of relying on salem's .extent function). The bounds themselves are now stored as an attribute inside the groundtruth xarray grid. As we are returning an xarray.DataArray grid instead of a numpy.array, we can use xarray's .plot() method to plot merged groundtruth grids (that are not on a regular grid) in a less funny way. There is also an 'indexers' parameter introduced to enable manually getting bounding boxes exactly divisible by 4, a quirky requirement of DeepBedMap...

The hardcoded bounding box view of Antarctica used in our deepbedmap.feature integration test has been updated in alignment with all the changes above (pixel offsets, manual crops, etc). New 'weiji14/deebedmap/model/test' tiles have been uploaded to quilt, and the new quilt hash to use is df0d28b24283c642f5dbe1a9baa22b605d8ae02ec1875c2edd067a614e99e5a4. Also patching 4a074d9 to fix data_prep.selective_tile's masking not handling fp16 NaN conversions as the Synthetic HighRes geotiff was acting up, and remove NaN checks after gapfilling as it trips up the big DeepBedMap tiller. What else? Phew!
weiji14 added a commit that referenced this pull request Jun 13, 2019
Oh the two weeks of background work just to come to this, pygmt.grdtrack and whatnot! Our srgan_train.get_deepbedmap_test_result function now evaluates the trained neural network model's predicted grid directly in memory instead of on a file! Lots of temporary file related boilerplate code removed, woohoo! There's been plenty of background work to get this in-memory grid to be georeferenced correctly (see #150, 7fd3345, 4a074d9), but it all ends up making the evaluation more accurate, and hopefully faster to run and cleaner to scale up.

Note also that the deepbedmap.get_image_and_bounds function has been refactored and renamed to get_image_with_bounds. The previous bounds was using xarray's centre-based pixel coordinates when it should have returned rasterio-style corner-based pixel coordinates used by data_prep.selective_tile. This is resolved using good ol' salem, and there's some extra code to handle getting the bounds for multiple inputs (instead of relying on salem's .extent function). The bounds themselves are now stored as an attribute inside the groundtruth xarray grid. As we are returning an xarray.DataArray grid instead of a numpy.array, we can use xarray's .plot() method to plot merged groundtruth grids (that are not on a regular grid) in a less funny way. There is also an 'indexers' parameter introduced to enable manually getting bounding boxes exactly divisible by 4, a quirky requirement of DeepBedMap...

The hardcoded bounding box view of Antarctica used in our deepbedmap.feature integration test has been updated in alignment with all the changes above (pixel offsets, manual crops, etc). New 'weiji14/deebedmap/model/test' tiles have been uploaded to quilt, and the new quilt hash to use is df0d28b24283c642f5dbe1a9baa22b605d8ae02ec1875c2edd067a614e99e5a4. Also patching 4a074d9 to fix data_prep.selective_tile's masking not handling fp16 NaN conversions as the Synthetic HighRes geotiff was acting up, and remove NaN checks after gapfilling as it trips up the big DeepBedMap tiller. What else? Phew!
weiji14 added a commit that referenced this pull request Sep 2, 2019
Towards even better pixel registration at the expense of more computing cost! Patches #150 properly. The xyz_to_grid function has been giving us gridline node registered grids since forever, because that's `gmt surface`'s default setting. Now converting it to pixel node registration to align better with rasterio and other geospatial packages. Also using simple slicing for our high resolution tiles to avoid interpolating to NaNs at the edges! Total tile counts have increased from 3379 to 4028, O.O, mostly from the Basler grid around the Siple Coast region. New training tiles updated on Quilt with hash af86cf135ffe5ed9f78fc65231e3aa0bfc90f45e33b6bccda9f08f392c090113.

Though `surface` does have an `-r` setting to set pixel node registration directly, I looked at it and it gave strange diagonal strips for 2007tx.nc, so nope, we'll use `grdsample` instead, the only fault being we lose data lineage/provenance when checking the grid using `gmt grdinfo`. Hopefully the half pixel ghost doesn't haunt us ever again. Also note to self, to write up wrappers for `gmt blockmedian` and `gmt grdsample` to reduce boilerplate code in xyz_to_grid function.

Reason for doing this was because the more precise selective_tile script (since e7936db) was so exact, that it realized our bounding boxes was outside of the (gridline registered) groundtruth grids and gave us NULL values! This problem didn't surface itself until I was trying to train the neural network, and was strangely getting NaN values in the discriminator loss after just one epoch, no matter what hyperparameters I used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🪲 Something isn't working data 🗃️ Pull requests that update input datasets enhancement ✨ New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant