Memoization Storage #3587

alexec · 2020-07-24T16:38:40Z

Summary

Memoization is a feature that allows users to run workflows faster by avoiding repeating work that has already been done.

Currently, memoization uses a Kubernetes ConfigMap for storage. This will not scale to a large number of entries, and it requires elevated RBAC. Instead, we should provide the option to use an alternative database to store these entries.

Motivation

Large workflows.

Proposal

Options:

Use the database.
Use any artifact storage.

See #944
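As a rough illustration of the options above, the sketch below places memoization storage behind a small interface so that a ConfigMap, a database, or artifact storage could be swapped in. All names here (`CacheStore`, `InMemoryStore`, `run_memoized`) are hypothetical and are not part of Argo Workflows; the in-memory dict only stands in for a real backend.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class CacheStore(ABC):
    """Hypothetical storage interface for memoization entries."""

    @abstractmethod
    def load(self, key: str) -> Optional[str]:
        """Return the cached outputs for key, or None on a miss."""

    @abstractmethod
    def save(self, key: str, outputs: str) -> None:
        """Persist outputs under key."""


class InMemoryStore(CacheStore):
    """Stand-in backend; a real one would talk to a database or object store."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def load(self, key: str) -> Optional[str]:
        return self._entries.get(key)

    def save(self, key: str, outputs: str) -> None:
        self._entries[key] = outputs


def run_memoized(store: CacheStore, key: str, step: Callable[[], str]) -> str:
    """Run step only on a cache miss; otherwise reuse the stored outputs."""
    cached = store.load(key)
    if cached is not None:
        return cached
    outputs = step()
    store.save(key, outputs)
    return outputs
```

With this shape, choosing between a database and artifact storage becomes a matter of which `CacheStore` implementation is configured, and the calling code stays the same.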

Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

Ark-kun · 2020-09-27T02:00:58Z

I think we can use the Artifact drivers to store the caching metadata.
We can store the cache entry as an artifact using the same artifact location configuration. For example, in s3://<some_bucket>/artifacts/<cache_key>/cache_entries.yaml.
P.S. There are some benefits to allowing multiple entries for the same cache_key: even with exactly the same inputs, a volatile component can produce different results, and in some scenarios all of them should be cached.
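A minimal sketch of this idea, assuming a plain dict as a stand-in for the object store and JSON serialization in place of a YAML library. The helper names (`cache_entry_path`, `load_entries`, `append_entry`) are hypothetical and not part of any Argo artifact driver; only the path layout comes from the comment above.

```python
import json
from typing import Dict, List

# Hypothetical stand-in for an object store: maps object path -> file contents.
ObjectStore = Dict[str, str]


def cache_entry_path(bucket: str, cache_key: str) -> str:
    """Build the object path following the layout suggested above."""
    return f"s3://{bucket}/artifacts/{cache_key}/cache_entries.yaml"


def load_entries(store: ObjectStore, bucket: str, cache_key: str) -> List[dict]:
    """Return all cached results recorded for cache_key (empty list on a miss)."""
    raw = store.get(cache_entry_path(bucket, cache_key))
    return json.loads(raw) if raw is not None else []


def append_entry(store: ObjectStore, bucket: str, cache_key: str, outputs: dict) -> None:
    """Add one more result for cache_key instead of overwriting earlier ones."""
    entries = load_entries(store, bucket, cache_key)
    entries.append(outputs)
    store[cache_entry_path(bucket, cache_key)] = json.dumps(entries)
```

Note that the read-modify-write in `append_entry` is not atomic, so a real implementation would need some way to handle concurrent writers to the same cache_key.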

alexec · 2020-09-28T15:54:46Z

Interesting idea. We just need storage and this is a good option.

mkjpryor-stfc · 2020-11-29T22:35:15Z

This is required for caching large outputs because etcd places a limit on the maximum size of a ConfigMap. Piggybacking on the artifact storage sounds feasible to me.
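To make the size concern concrete, here is a small sketch of a check that could decide whether a new cache entry still fits in a ConfigMap. It assumes the 1 MiB cap on ConfigMap data that Kubernetes documents and ignores metadata overhead; the function name `fits_in_configmap` is hypothetical.

```python
# Assumption: Kubernetes documents a 1 MiB cap on the data stored in one ConfigMap.
CONFIGMAP_LIMIT_BYTES = 1024 * 1024


def fits_in_configmap(existing_bytes: int, new_entry: str) -> bool:
    """Check whether adding new_entry keeps the ConfigMap data under the cap."""
    # Sizes are measured in UTF-8 bytes, not characters.
    return existing_bytes + len(new_entry.encode("utf-8")) <= CONFIGMAP_LIMIT_BYTES
```

An entry that fails this check is the kind of output that would have to go to artifact storage or another backend instead.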

lowc1012 · 2021-03-28T04:44:03Z

Hi, is anybody working on this issue?
I'm interested in working on this. Could I take it forward?

leonharetd · 2022-01-05T11:31:22Z

I'm interested in this. I want to try it.

attreyee-muk · 2022-02-13T05:43:49Z

I would like to contribute to this project for GSOC 2022. Can you please give me some more details on this?

sarabala1979 · 2022-02-13T06:05:24Z

> I would like to contribute to this project for GSOC 2022. Can you please give me some more details on this?

Here is the current memoization implementation document: https://github.com/argoproj/argo-workflows/blob/master/docs/memoization.md

attreyee-muk · 2022-02-13T07:42:38Z

Okay. Thank you.

alexec · 2022-02-14T16:26:37Z

If you'd like to do this as part of GSoC, you'll need to sign up here:

https://summerofcode.withgoogle.com

GSoC does not start for several months, so if you're instead looking to make an impact today, and don't need the benefits of GSoC (see their website for the details), then mentoring might be the right approach for you.

attreyee-muk · 2022-02-14T18:12:02Z

@alexec The applications for participants will open in April, right? I'm actually a bit new to all of this.

terrytangyuan · 2022-02-14T18:13:59Z

@A-Muk Please take a look at the links available in https://github.com/argoproj/argo-workflows/blob/master/docs/mentoring.md#how-to-participate-google-summer-of-code

attreyee-muk · 2022-02-15T13:26:10Z

Thank you @terrytangyuan

Mostafa-wael · 2022-03-18T23:13:17Z

How can I apply for this idea for GSoC? Is there any communication channel with the mentors?

sudhanshu456 · 2022-04-01T20:26:20Z

@alexec Hey, can you please help me understand how I should get into mentoring? I've been working with Argo Workflows for 1 year.

print-sid8 · 2024-10-05T12:29:04Z

Has there been any progress on making DB/Storage the output location for steps and enabling caching of the steps?

If I were to have a step that outputs artifacts to S3 as its final action, and the same step is used in another workflow, or the same workflow is rerun, does the current ConfigMap-based cache implementation understand that this step has run earlier, skip it, and use the cache to continue to the next step?

If so, a simple solution could be to write to an S3 location that includes a user-supplied cache version identifier, and use the same S3 path as input in the next dependent step to emulate caching.

Am I right or wrong?
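The path-based emulation described above could be sketched as follows. This is only an illustration of deriving a stable S3 path from a user-chosen cache version and the step inputs; the function name `cache_path` and the `cache/<step>/<version>/<digest>` layout are assumptions, not anything Argo Workflows defines.

```python
import hashlib
import json


def cache_path(bucket: str, step_name: str, cache_version: str, inputs: dict) -> str:
    """Derive a stable S3 path from the step name, a user-chosen version, and inputs."""
    # sort_keys makes the digest independent of the order of keys in the dict.
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    return f"s3://{bucket}/cache/{step_name}/{cache_version}/{digest}"
```

The producing step would write to this path and the dependent step would read from it. Bumping `cache_version` changes the path and so invalidates earlier results. Skipping the producing step when the object already exists would still need an explicit existence check, which this sketch does not cover.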

alexec added type/feature Feature request epic: controller enhancements labels Jul 24, 2020

alexec mentioned this issue Sep 18, 2020

Introduce LRU cache #4071

Closed

simster7 self-assigned this Sep 30, 2020

whynowy assigned alexec and unassigned simster7 Oct 7, 2020

alexec removed their assignment Oct 14, 2020

terrytangyuan mentioned this issue May 6, 2021

docs: Add FAQs in memoization doc and add link in the err message #5847

Merged

1 task

alexec removed the epic/controller-enhancements label Jun 26, 2021

sarabala1979 added this to the 2022 Q1 milestone Jan 19, 2022

terrytangyuan mentioned this issue Feb 7, 2022

Inconsistent permission behaviour with memoization #7381

Closed

alexec added the area/memoization label Feb 7, 2022

alexec removed this from the 2022 Q1 milestone Feb 8, 2022

alexec added the google-summer-of-code Google Summer of Code contributions label Feb 8, 2022

LaloLoop mentioned this issue Feb 10, 2022

I would like a ~mentor~ GSoC #7849

Closed

terrytangyuan removed the google-summer-of-code Google Summer of Code contributions label Mar 16, 2023
