Add RasterIO backend #1260
Merged
Changes from all commits (42 commits)
1643ce5  rasterio checkin
067dedb  temp fixes
NicWayand 7906cfd  update rasterio reader, no lazy loading, no decoding of coords
2e1b528  keep band dim even for single band. Fix longitude typo
NicWayand 532c5d3  Fix lat/lon decoding. Remove requirment comment
8bc3da3  Attr error suppression. DataArray to Variable objects. CI requirment …
77cc0ca  remove >= requirment
776cbd9  added conda-forge channel to CI check
eb739de  add scipy requirement
3a394ae  roll back ci requirements. Rename vars
061b8fd  roll back
c0962fa  fixed coord spacing bug where x and y were +1 dim than raster. Uses n…
NicWayand 7275ffa  change test env to py36
fmaussion 51a60af  first tests
fmaussion 5155634  other tests
fmaussion 09196ee  fix test
fmaussion e2b6786  get the order right
fmaussion 228a5a3  some progress with indexing
fmaussion 4b57d1c  cosmetic changes
fmaussion 2d21b4b  Conflicts
fmaussion c2cb927  More rebase
fmaussion f86507e  looking good now. Testing
fmaussion e1a5b31  docs
fmaussion 7cb8baf  whats new
fmaussion 48c7268  fix test
fmaussion 70bd03a  reviews
fmaussion 4d49195  Merge branch 'master' into feature-rasterio
fmaussion 3f18144  docs
fmaussion 3e3e6fb  Merge remote-tracking branch 'upstream/master' into feature-rasterio
fmaussion 1ae2d9b  more reviews
fmaussion 955f6b9  chunking and caching
fmaussion 7d8fe4d  Merge remote-tracking branch 'upstream/master' into feature-rasterio
fmaussion 223ce0c  Final tweaks
fmaussion 6cf2ce9  Lock-doc tweaks
fmaussion 2cd0386  Merge branch 'master' into feature-rasterio
fmaussion 9193a2b  Add rasterio to other test suites
fmaussion 4bf4b6a  Merge remote-tracking branch 'origin/feature-rasterio' into feature-r…
fmaussion c778948  use context managers in tests for windows
fmaussion 4299957  Change example to use an accessor
fmaussion fcdd894  Reviews
fmaussion 1ca6e38  Merge branch 'master' into feature-rasterio
fmaussion d5c964e  typo
@@ -18,6 +18,7 @@ dependencies:
   - scipy
   - seaborn
   - toolz
+  - rasterio
   - pip:
     - coveralls
     - pytest-cov

@@ -15,3 +15,4 @@ dependencies:
   - scipy
   - seaborn
   - toolz
+  - rasterio

@@ -15,6 +15,7 @@ dependencies:
   - scipy
   - seaborn
   - toolz
+  - rasterio
   - pip:
     - coveralls
     - pytest-cov

@@ -15,3 +15,4 @@ dependencies:
   - scipy
   - seaborn
   - toolz
+  - rasterio

@@ -15,6 +15,7 @@ dependencies:
   - scipy
   - seaborn
   - toolz
+  - rasterio
   - pip:
     - coveralls
     - pytest-cov

@@ -0,0 +1,58 @@
# -*- coding: utf-8 -*-
"""
.. _recipes.rasterio:

=================================
Parsing rasterio's geocoordinates
=================================

Converting a projection's cartesian coordinates into 2D longitudes and
latitudes.

These new coordinates might be handy for plotting and indexing, but it should
be kept in mind that a grid which is regular in projection coordinates will
likely be irregular in lon/lat. It is often recommended to work in the data's
original map projection.
"""

import os
import urllib.request
import numpy as np
import xarray as xr
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
from rasterio.warp import transform


# Download the file from rasterio's repository
url = 'https://github.com/mapbox/rasterio/raw/master/tests/data/RGB.byte.tif'
urllib.request.urlretrieve(url, 'RGB.byte.tif')

# Read the data
da = xr.open_rasterio('RGB.byte.tif')

# Compute the lon/lat coordinates with rasterio.warp.transform
ny, nx = len(da['y']), len(da['x'])
x, y = np.meshgrid(da['x'], da['y'])

# Rasterio works with 1D arrays
lon, lat = transform(da.crs, {'init': 'EPSG:4326'},
                     x.flatten(), y.flatten())
lon = np.asarray(lon).reshape((ny, nx))
lat = np.asarray(lat).reshape((ny, nx))
da.coords['lon'] = (('y', 'x'), lon)
da.coords['lat'] = (('y', 'x'), lat)

# Compute a greyscale out of the rgb image
greyscale = da.mean(dim='band')

# Plot on a map
ax = plt.subplot(projection=ccrs.PlateCarree())
greyscale.plot(ax=ax, x='lon', y='lat', transform=ccrs.PlateCarree(),
               cmap='Greys_r', add_colorbar=False)
ax.coastlines('10m', color='r')
plt.show()

# Delete the file
os.remove('RGB.byte.tif')
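
A small follow-on sketch (not part of the recipe file): because the x and y coordinates built by open_rasterio are one-dimensional and regularly spaced, a subset can be taken in the original projection coordinates before computing lon/lat. It assumes it runs before the cleanup step above; the index ranges are arbitrary illustration values.

# Subset in the projection's native x/y space first, then apply the same
# transform-based recipe to the smaller array (illustrative ranges only)
sub = da.isel(x=slice(0, 200), y=slice(0, 200))

# Band coordinates come from rasterio's 1-based band indexes
first_band = da.sel(band=1)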
@@ -0,0 +1,170 @@
import os
from collections import OrderedDict
from distutils.version import LooseVersion
import numpy as np

from .. import DataArray
from ..core.utils import DunderArrayMixin, NdimSizeLenMixin, is_scalar
from ..core import indexing
try:
    from dask.utils import SerializableLock as Lock
except ImportError:
    from threading import Lock

RASTERIO_LOCK = Lock()

_ERROR_MSG = ('The kind of indexing operation you are trying to do is not '
              'valid on rasterio files. Try to load your data with ds.load() '
              'first.')


class RasterioArrayWrapper(NdimSizeLenMixin, DunderArrayMixin):
    """A wrapper around rasterio dataset objects"""
    def __init__(self, rasterio_ds):
        self.rasterio_ds = rasterio_ds
        self._shape = (rasterio_ds.count, rasterio_ds.height,
                       rasterio_ds.width)
        self._ndims = len(self.shape)

    @property
    def dtype(self):
        dtypes = self.rasterio_ds.dtypes
        if not np.all(np.asarray(dtypes) == dtypes[0]):
            raise ValueError('All bands should have the same dtype')
        return np.dtype(dtypes[0])

    @property
    def shape(self):
        return self._shape

    def __getitem__(self, key):

        # make our job a bit easier
        key = indexing.canonicalize_indexer(key, self._ndims)

        # bands cannot be windowed but they can be listed
        band_key = key[0]
        n_bands = self.shape[0]
        if isinstance(band_key, slice):
            start, stop, step = band_key.indices(n_bands)
            if step is not None and step != 1:
                raise IndexError(_ERROR_MSG)
            band_key = np.arange(start, stop)
        # be sure we give out a list
        band_key = (np.asarray(band_key) + 1).tolist()

        # but other dims can only be windowed
        window = []
        squeeze_axis = []
        for i, (k, n) in enumerate(zip(key[1:], self.shape[1:])):
            if isinstance(k, slice):
                start, stop, step = k.indices(n)
                if step is not None and step != 1:
                    raise IndexError(_ERROR_MSG)
            elif is_scalar(k):
                # windowed operations will always return an array
                # we will have to squeeze it later
                squeeze_axis.append(i+1)
                start = k
                stop = k+1
            else:
                k = np.asarray(k)
                start = k[0]
                stop = k[-1] + 1
                ids = np.arange(start, stop)
                if not ((k.shape == ids.shape) and np.all(k == ids)):
                    raise IndexError(_ERROR_MSG)
            window.append((start, stop))

        out = self.rasterio_ds.read(band_key, window=window)
        if squeeze_axis:
            out = np.squeeze(out, axis=squeeze_axis)
        return out


def open_rasterio(filename, chunks=None, cache=None, lock=None):
    """Open a file with rasterio (experimental).

    This should work with any file that rasterio can open (most often:
    geoTIFF). The x and y coordinates are generated automatically from the
    file's geoinformation.

    Parameters
    ----------
    filename : str
        Path to the file to open.
    chunks : int, tuple or dict, optional
        Chunk sizes along each dimension, e.g., ``5``, ``(5, 5)`` or
        ``{'x': 5, 'y': 5}``. If chunks is provided, it is used to load the
        new DataArray into a dask array.
    cache : bool, optional
        If True, cache data loaded from the underlying datastore in memory as
        NumPy arrays when accessed to avoid reading from the underlying
        datastore multiple times. Defaults to True unless you specify the
        `chunks` argument to use dask, in which case it defaults to False.
    lock : False, True or threading.Lock, optional
        If chunks is provided, this argument is passed on to
        :py:func:`dask.array.from_array`. By default, a global lock is used
        to avoid issues with concurrent access to the same file when using
        dask's multithreaded backend.

    Returns
    -------
    data : DataArray
        The newly created DataArray.
    """

    import rasterio
    riods = rasterio.open(filename, mode='r')

    if cache is None:
        cache = chunks is None

    coords = OrderedDict()

    # Get bands
    if riods.count < 1:
        raise ValueError('Unknown dims')
    coords['band'] = np.asarray(riods.indexes)

    # Get geo coords
    nx, ny = riods.width, riods.height
    dx, dy = riods.res[0], -riods.res[1]
    x0 = riods.bounds.right if dx < 0 else riods.bounds.left
    y0 = riods.bounds.top if dy < 0 else riods.bounds.bottom
    coords['y'] = np.linspace(start=y0, num=ny, stop=(y0 + (ny - 1) * dy))
    coords['x'] = np.linspace(start=x0, num=nx, stop=(x0 + (nx - 1) * dx))

    # Attributes
    attrs = {}
    if hasattr(riods, 'crs'):
        # CRS is a dict-like object specific to rasterio
        # We convert it back to a PROJ4 string using rasterio itself
        attrs['crs'] = riods.crs.to_string()
    # Maybe we'd like to parse other attributes here (for later)

    data = indexing.LazilyIndexedArray(RasterioArrayWrapper(riods))

    # this lets you write arrays loaded with rasterio
    data = indexing.CopyOnWriteArray(data)
    if cache and (chunks is None):
        data = indexing.MemoryCachedArray(data)

    result = DataArray(data=data, dims=('band', 'y', 'x'),
                       coords=coords, attrs=attrs)

    if chunks is not None:
        from dask.base import tokenize
        # augment the token with the file modification time
        mtime = os.path.getmtime(filename)
        token = tokenize(filename, mtime, chunks)
        name_prefix = 'open_rasterio-%s' % token
        if lock is None:
            lock = RASTERIO_LOCK
        result = result.chunk(chunks, name_prefix=name_prefix, token=token,
                              lock=lock)

    # Make the file closeable
    result._file_obj = riods

    return result
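
A minimal usage sketch of the new backend (not part of the diff; it assumes the RGB.byte.tif file from the recipe above is present and, for the chunked case, that dask is installed):

import xarray as xr

# Data is wrapped lazily; the band, y and x coordinates come from the file itself
da = xr.open_rasterio('RGB.byte.tif')
print(da.dims)           # ('band', 'y', 'x')
print(da.attrs['crs'])   # PROJ4 string parsed from the file's geoinformation

# With chunks, the data is loaded into a dask array and the memory cache is disabled
da_lazy = xr.open_rasterio('RGB.byte.tif', chunks={'x': 256, 'y': 256})
print(da_lazy.mean(dim='band').compute())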
As I mentioned in #1260, consider adding caching here. At the least, I think we want to use copy on write.
I don't understand what you mean here, sorry :(
GitHub seems to lose the specific comment link -- if you click edit on my comment here you'll find it.
To be more concrete, after this line I would suggest adding:
cache would be an optional boolean argument, like it is on open_dataset. This also argues for adding a chunks argument, which, if set, uses dask to chunk the data and disables the cache. This should probably create a token for dask similar to how we do it for open_dataset:
xarray/xarray/backends/api.py, Lines 244 to 254 in f517be7
Implementing cache went well (apart from this bug). For chunking I have a question: DataArray.chunk() doesn't have the name_prefix, token and lock keywords. Any reason for this?
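
For reference, a condensed, hypothetical sketch of the chunking path discussed in this thread, mirroring the backend code above; the helper name is made up here, and it assumes DataArray.chunk accepts the name_prefix, token and lock keywords that this PR relies on:

import os
from dask.base import tokenize

def _chunk_with_token(result, filename, chunks, lock):
    # The token changes whenever the file on disk changes, so dask graphs
    # built from an updated file get a new name
    mtime = os.path.getmtime(filename)
    token = tokenize(filename, mtime, chunks)
    name_prefix = 'open_rasterio-%s' % token
    # name_prefix, token and lock are forwarded to dask.array.from_array
    return result.chunk(chunks, name_prefix=name_prefix, token=token,
                        lock=lock)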