Automatic chunking of arrays ? #4055
Comments
so Lines 1736 to 1737 in bd84186
A PR would be very welcome if you have the time, @AndrewWilliams3142 |
Oh ok I didn't know about this, I'll take a look and read the contribution docs tomorrow ! It'll be my first PR so may need a bit of hand-holding when it comes to tests. Willing to try though! |
Awesome! Please see https://xarray.pydata.org/en/stable/contributing.html for docs on contributing |
Agreed, this would be very welcome!
|
Cheers! Just had a look, is it as simple as just changing this line to the following, @dcherian ?
    if isinstance(chunks, Number) or chunks == 'auto':
        chunks = dict.fromkeys(self.dims, chunks)
This seems to work fine in a lot of cases, except automatic chunking isn't implemented for One option is to automatically use |
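The proposed change can be tried out in isolation. Below is a minimal sketch, not xarray's actual code: `normalize_chunks_arg` is a hypothetical helper name, and the extra `isinstance(chunks, str)` guard is an assumption added here so that non-string inputs are never compared against `'auto'`.

```python
from numbers import Number


def normalize_chunks_arg(chunks, dims):
    """Expand a single number or the string 'auto' into a per-dimension dict.

    Hypothetical helper for illustration only; it mirrors the two-line
    change proposed in the comment above, outside of any xarray class.
    """
    is_auto = isinstance(chunks, str) and chunks == "auto"
    if isinstance(chunks, Number) or is_auto:
        # Every dimension receives the same chunk specification.
        return dict.fromkeys(dims, chunks)
    # Anything else (for example a dict) is passed through unchanged.
    return chunks


dims = ("time", "lat", "lon")
print(normalize_chunks_arg(100, dims))
print(normalize_chunks_arg("auto", dims))
print(normalize_chunks_arg({"time": 10}, dims))
```

With this shape, `'auto'` is simply forwarded per dimension, so the actual chunk-size decision would still be left to dask.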
Also, the contributing docs have been super clear so far! Thanks! :) |
Can we catch this error and re-raise specifying "automatic chunking fails for object arrays. These include cftime DataArrays" or something like that? |
Nice, that's neater! Would this work, in the
    def maybe_chunk(name, var, chunks):
        chunks = selkeys(chunks, var.dims)
        if not chunks:
            chunks = None
        if var.ndim > 0:
            # when rechunking by different amounts, make sure dask names change
            # by providing chunks as an input to tokenize.
            # subtle bugs result otherwise. see GH3350
            token2 = tokenize(name, token if token else var._data, chunks)
            name2 = f"{name_prefix}{name}-{token2}"
            try:
                return var.chunk(chunks, name=name2, lock=lock)
            except NotImplementedError as err:
                raise Exception("Automatic chunking fails for object arrays. "
                                + "These include cftime DataArrays.")
        else:
            return var |
The error message from dask is already pretty descriptive: I don't think we have much to add on top of that? |
I also thought that, after the dask error message it's pretty easy to then look at the In general though, is that the type of layout you'd suggest for catching and re-raising errors? Using |
If we think we can improve an error message by adding additional context, the right solution is to use On the other hand, if xarray doesn't have anything more to add on top of the original error message, it is best not to add any wrapper at all. Users will just see the original error from dask. |
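For reference, one standard Python mechanism for adding context while keeping the original error visible is exception chaining with `raise ... from ...`. The sketch below is only an illustration of that general mechanism; the function names and messages are hypothetical, and it is not a claim about what the comment above specifically recommended.

```python
def chunk_object_array():
    # Hypothetical stand-in for a lower-level call that is not implemented
    # for some input (the message here is made up for illustration).
    raise NotImplementedError("cannot estimate chunk size for this dtype")


def chunk_with_context():
    try:
        return chunk_object_array()
    except NotImplementedError as err:
        # "from err" stores the original exception on __cause__, so the
        # traceback shows both the added context and the underlying error.
        raise ValueError("automatic chunking failed for this variable") from err


try:
    chunk_with_context()
except ValueError as wrapped:
    print(repr(wrapped))
    print(repr(wrapped.__cause__))
```

If the wrapper has nothing useful to add, dropping the `try`/`except` entirely lets the original error propagate untouched.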
Hi there,
Hopefully this turns out to be a basic issue, but I was wondering why the chunks='auto' that dask seems to provide (https://docs.dask.org/en/latest/array-chunks.html#automatic-chunking) isn't an option for xarray? I'm not 100% sure of how dask decides how to automatically chunk its arrays, so maybe there's a compatibility issue? I get the impression that the dask method automatically tries to prevent the issues of "too many chunks" or "too few chunks" which can sometimes happen when choosing chunk sizes automatically. If so, it would maybe be a useful thing to include in future versions?
Happy to be corrected if I've misunderstood something here though, still getting my head around how the dask/xarray compatibility really works...
Cheers!