Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can KubernetesRun.image and DaskExecutor...image differ from Prefect Agent image? #40

Open
cisaacstern opened this issue Feb 17, 2022 · 3 comments

Comments

@cisaacstern
Copy link
Member

cisaacstern commented Feb 17, 2022

Dynamically selecting an versioned Docker image for a given recipe execution seems quite important. We currently have multiple released versions of pangeo-forge-recipes, and even if current recipe developers only use the latest version, that doesn't solve the problem of released feedstocks having been built with specific older versions. We always need to be able re-run all released feedstocks.

As a concrete example, our released NOAA OISST feedstock was released with pangeo-forge-recipes==0.5.0, which is already many versions behind the bakery image (pinned to pangeo-forge-recipes==0.6.1) that was used to deploy our current bakery:

BAKERY_IMAGE="pangeo/pangeo-forge-bakery-images:pangeonotebook-2021.07.17_prefect-0.14.22_pangeoforgerecipes-0.6.1"

My question for @tracetechnical and @sharkinsspatial is, should we expect that, if at the time of registration we change the value of image in the KubernetesRun and DaskExecutor...

flow.run_config = KubernetesRun(
job_template=job_template,
image=os.environ["BAKERY_IMAGE"],

flow.executor = DaskExecutor(
cluster_class="dask_kubernetes.KubeCluster",
cluster_kwargs={
"pod_template": make_pod_spec(
image=os.environ["BAKERY_IMAGE"],

...that this will create workers which can execute recipes with a different environment from the Prefect Agent?

As far as I can tell, the only place that the BAKERY_IMAGE used to deploy the bakery is actually used is in the Prefect Agent deployment spec

...so dynamically setting the image for the KubernetesRun and DaskExecutor should be possible? I've been experimenting with this for ... a few hours now 🙃 ... and not quite able to get it to work yet, so just wanted to check that I wasn't overlooking something obvious.

cc @rabernat @jhamman

@sharkinsspatial
Copy link
Contributor

sharkinsspatial commented Feb 17, 2022

@cisaacstern I have been searching for the issue which details this but can't seem to locate it. In our discussions with the Prefect team there are 2 versioning compatibility issues to account for. The first is the Python version used for flow serialization, this must match the Python version used on the container where the Prefect agent is running for flow deserialization to function properly. The second is the Prefect Agent version. According to the Prefect support team this should be backward compatible so the Prefect Agent will support flows created with any =< version.

I'd like to find a canonical link to Prefect documentation or an issue discussion which confirms this so we can reference it here.

Dynamically setting the image for KubernetesRun and the DaskExecutor is what happens in pangeo-forge-prefect currently for AKS bakeries so I can confirm this should be working, what is the error you are seeing in your environment. See here and here for examples of this dynamic registration.

@sharkinsspatial
Copy link
Contributor

@cisaacstern For reference this is a previous comment where I discussed some of our image versioning concerns pangeo-forge/roadmap#18 (comment)

@cisaacstern
Copy link
Member Author

cisaacstern commented Feb 17, 2022

Thanks for weighing in, @sharkinsspatial. I now have a version of this idea working in https://github.com/pangeo-forge/registrar/pull/12.

what is the error you are seeing in your environment

I was seeing an version incompatibility problem in the worker container (the specifics might not matter, but it was an s3fs + aiobotocore mismatch). Upgrading fsspec/s3fs/gcsfs in the 0.5.0 worker image seems to have solved this; xref pangeo-forge/pangeo-forge-bakery-images#28.

I am now unsure if this was actually an image incompatibility problem, or rather that the prior release of the 0.5.0 image was itself broken, and required an fsspec/s3fs/gcsfs upgrade just to work at all. These two scenarios are hard to disambiguate now that I have the following three images

graph LR;
    registrar -->  agent --> worker;
Loading

all using fsspec/s3fs/gcsfs == 2011.11.0.

What I'm not clear on, however, is what would happen if one or multiple of the above images had different fsspec/s3fs/gcsfs versions than the others. We will encounter this scenario quite soon, though, because recent releases of pangeo-forge-recipes use later versions of fsspec et al than 2011.11.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants