Memoization Storage #3587
Comments
I think we can use the Artifact drivers to store the caching metadata.
Interesting idea. We just need storage, and this is a good option.
This is required for caching large outputs, because etcd places a limit on the maximum size of a ConfigMap. Piggybacking on the artifact storage sounds like it should be feasible to me.
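To make the size constraint concrete: Kubernetes documents that the data stored in a single ConfigMap cannot exceed 1 MiB, and all entries of one memoization cache live in the same ConfigMap. The following is a minimal sketch, assuming a hypothetical JSON entry shape (`nodeID`, `outputs`) and approximating the stored size as the UTF-8 length of keys plus values. It is not how Argo or the API server actually measures or enforces the limit.

```python
import json

# Kubernetes caps the data held in one ConfigMap at 1 MiB.
CONFIGMAP_MAX_BYTES = 1024 * 1024


def configmap_size(data):
    """Approximate stored size: UTF-8 bytes of every key plus every value."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in data.items())


def fits_in_configmap(data, new_key, new_entry):
    """Return True if adding new_entry (serialized as JSON) keeps the ConfigMap under the cap.

    The entry layout is a hypothetical example, not Argo's exact schema.
    """
    candidate = dict(data)
    candidate[new_key] = json.dumps(new_entry)
    return configmap_size(candidate) <= CONFIGMAP_MAX_BYTES
```

A small output parameter fits easily, but a single entry carrying a ~2 MiB output value can never be stored this way, which is the case that motivates moving entries into artifact storage.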
Hi, is anybody working on this issue?
I'm interested in this. I want to try it.
I would like to contribute to this project for GSoC 2022. Can you please give me some more details on this?
Here is the current memoization implementation document: https://github.com/argoproj/argo-workflows/blob/master/docs/memoization.md
Okay. Thank you.
If you'd like to do this as part of GSoC, you'll need to sign up here: https://summerofcode.withgoogle.com GSoC does not start for several months, so if you're instead looking to make an impact today and don't need the benefits of GSoC (see their website for the details), then mentoring might be the right approach for you.
@alexec The applications for participants will open in April, right? I'm actually a bit new to all of this.
@A-Muk Please take a look at the links available in https://github.com/argoproj/argo-workflows/blob/master/docs/mentoring.md#how-to-participate-google-summer-of-code
Thank you @terrytangyuan
How can I apply for this idea for GSoC? Is there any communication channel with the mentors?
@alexec Hey, can you please help me understand how to get started with mentoring? I've been working with Argo Workflows for one year.
Has there been any progress on making a DB/storage the output location for steps and enabling caching of those steps? If I have a step that outputs artifacts to S3 as its final step, and the same step is used in another workflow, or the same workflow is rerun, does the current ConfigMap-based cache implementation recognize that the step has already run, skip it, and use the cache to continue to the next step? If so, a simple solution could be to write to an S3 location with a user-supplied identifier for the cache version, and use the same S3 path as the input to the next dependent step to emulate caching. Am I right or wrong?
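The workaround in the last comment, deriving a stable S3 location from a user-chosen cache version, can be sketched as follows. The function name, bucket layout, and hashing scheme are hypothetical illustrations, not an Argo feature; the idea is that canonicalizing the inputs (JSON with sorted keys) makes the path deterministic, so a rerun with identical inputs and the same version resolves to the same location.

```python
import hashlib
import json


def cache_path(bucket, template, params, version="v1"):
    """Build a deterministic, hypothetical S3 path for a step's cached outputs.

    - params are serialized with sorted keys so dict ordering does not matter
    - bumping `version` invalidates all previous entries for this template
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"s3://{bucket}/cache/{version}/{template}/{digest}"
```

A producing step would write its artifacts under this path and a dependent step would read from it; deciding whether to skip the producer (for example, by checking whether the object already exists) is left to the workflow author.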
Summary
Memoization is a feature that allows users to run workflows faster by avoiding repeating work that has already been done.
Currently, memoization uses a Kubernetes ConfigMap for storage. This does not scale to a large number of entries, and it requires elevated RBAC permissions. Instead, we should provide the option to use an alternative database to store these entries.
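One way to read "provide the option" is a pluggable storage backend behind a small interface. The sketch below is hypothetical: `CacheStore`, `ObjectStore`, `lookup`, and the entry fields `created`/`outputs` are illustrative names, and a plain dict stands in for an artifact repository such as S3. It shows one object per cache key (so no shared size cap) and an age check in the spirit of memoization's `maxAge`.

```python
import json
import time
from abc import ABC, abstractmethod
from typing import Optional


class CacheStore(ABC):
    """Hypothetical pluggable backend for memoization entries."""

    @abstractmethod
    def load(self, key: str) -> Optional[dict]: ...

    @abstractmethod
    def save(self, key: str, entry: dict) -> None: ...


class ObjectStore(CacheStore):
    """Simulated artifact repository: one object per key, no shared size limit."""

    def __init__(self, prefix: str = "memoization/"):
        self.prefix = prefix
        self.objects = {}  # stands in for an S3/GCS bucket

    def load(self, key):
        raw = self.objects.get(self.prefix + key)
        return json.loads(raw) if raw is not None else None

    def save(self, key, entry):
        self.objects[self.prefix + key] = json.dumps(entry).encode("utf-8")


def lookup(store, key, max_age_seconds, now=None):
    """Return cached outputs if the entry exists and is not older than max_age_seconds."""
    now = time.time() if now is None else now
    entry = store.load(key)
    if entry is None or now - entry["created"] > max_age_seconds:
        return None
    return entry["outputs"]
```

Keeping the interface this narrow (load and save by key) is what would let a ConfigMap, a database, or an artifact driver sit behind the same lookup logic.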
Motivation
Large workflows.
Proposal
Options:
See #944
Message from the maintainers:
If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.