-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid auth error when serializing files across processes #276
Conversation
Hey James, thanks for another valuable contribution! I'm not familiar enough yet to review this, but I think it'll get capable eyes on soon. We're happy to see so much engagement from the community, and we don't have any firm plans yet, but there's interest in bringing on community maintainers and leaders to make this project more sustainable as it continues to grow. Would you be interested, or know anyone who would be? |
@MattF-NSIDC yeah, I'd be happy to help out where I can. My main OSS activity is as a maintainer of Dask. Though I also interact regularly with other projects that interface with Dask like Xarray, Zarr, S3Fs, etc. |
Wonderful! ❤️ FWIW both Luis and I had seen you around GitHub before you started contributing here, and we're very excited to have you helping out :) |
@jrbourbeau we just had a really productive team discussion about bringing on community maintainers. We have team agreement that it would be amazing to bring you on as a maintainer! Welcome to the team ❤️ For now, we're envisioning a simple governance model where This is not something NSIDC has institutional experience with, and @betolink and I have been pushing for developing that experience. We hope to use |
@jrbourbeau is there anything left to do on this one? I think we can merge just had one question about the execution of |
I think this one should be good to go (I've been working off this branch for a while without issue)
I think I missed this -- what's your question? |
@jrbourbeau nice! the question was about if this code is meant to run only in Dask workers. If we are not using Dask we won't hit this right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, this should avoid auth issues when we use distributed workloads
# Attempt to re-authenticate | ||
if not earthaccess.__auth__.authenticated: | ||
earthaccess.__auth__ = auth | ||
earthaccess.login() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we create the store in the scheduler why make_instance()
has to execute a login()
again? is this code running only on the scheduler or in the workers? The idea is that we create the store only once and since it's authenticated we send them to the workers right?
Yeah, that's mostly correct. However, this also helps when running |
With #259, when serializing files, we started re-checking the filesystem type (HTTPS vs. S3) and, if we are on AWS but not using S3 files, we now re-open the file using an
s3fs
filesystem.This is good for performance, but when running on a remote Dask cluster, users now need to have all nodes in their cluster (i.e. client, scheduler, and workers) authenticated with
earthaccess
. For example, if only the client session is authenticated, like in this example:users encounter the following error
Full traceback:
Because other nodes, in this case the scheduler, are attempting to check if they're on AWS, but aren't authed with
earthaccess
, so things error.While it's possible to authenticate the other nodes in the cluster with
earthaccess
with some additional client-side code, most users won't want to / know how to handle this additional authentication step.This PR proposes we handle this for them by forwarding the existing client-side
earthaccess
auth object alongside data files, and then re-auth (only if needed) if we need to re-open the file.This seems sensible to me, but is just one possible approach. I'd welcome any thoughts others may have.
cc @betolink @MattF-NSIDC