
chunked: fix reuse of the layers cache #2024

Merged
6 commits merged into containers:main on Sep 19, 2024

Conversation

giuseppe (Member)

The global singleton was never updated, causing the cache to be recreated for every layer.

It is not possible to hold the layersCache mutex for the entire load(), because load() calls into some store APIs; since findDigestInternal() is already called while some store locks are held, doing so would cause a deadlock.

Closes: #2023
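
For context, here is a minimal, self-contained sketch of the bug (not taken from the PR; the real getLayersCacheRef lives in pkg/chunked/cache_linux.go and takes a storage.Store, which is omitted here): declaring the variable with := created a new local that shadowed the package-level singleton, so the global was never set.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-in for the real layersCache type.
type layersCache struct {
	refs int
}

var (
	cacheMutex sync.Mutex
	cache      *layersCache // package-level singleton
)

func getLayersCacheRef() *layersCache {
	cacheMutex.Lock()
	defer cacheMutex.Unlock()
	if cache != nil {
		cache.refs++
		return cache
	}
	// Before the fix, `cache := &layersCache{refs: 1}` declared a new local
	// that shadowed the package-level `cache`, so the singleton stayed nil
	// and the cache was rebuilt on every call.
	cache = &layersCache{refs: 1} // the fix: plain assignment updates the global
	return cache
}

func main() {
	a := getLayersCacheRef()
	b := getLayersCacheRef()
	fmt.Println(a == b, b.refs) // true 2: the second call reuses the singleton
}
```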

openshift-ci bot (Contributor) commented Jul 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rhatdan (Member) commented Jul 13, 2024

LGTM
@mtrmac PTAL

@giuseppe giuseppe marked this pull request as ready for review on July 15, 2024 11:45
@giuseppe giuseppe changed the title from "[WIP] chunked: fix reuse of the layers cache" to "chunked: fix reuse of the layers cache" on Jul 15, 2024
mtrmac (Collaborator) left a comment

Thanks for handling this!

A fairly brief skim for now…


(It would also be convenient to have the locking rules documented in detail — e.g. which fields of layersCache and layers are immutable, and which are protected by a lock (and by which one).

Especially the lock hierarchy of layersCache.mutex vs. the locks of store/layerStore seems fairly non-obvious to me, given how far the code locations are from each other and how dissimilar the situation seems.

But all of that is blocked on having settled on design in the first place.)
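
As an illustration only (not from the PR), the kind of field-by-field locking documentation being requested might look like this; the field names, the lock-ordering note, and the simplified types are assumptions:

```go
package cachedoc

import "sync"

// layer is a stand-in for the real per-layer metadata type.
type layer struct {
	id string
}

// layersCache shows the requested documentation style: every field states
// whether it is immutable or which lock guards it. The field set here is
// illustrative, not the actual layout.
type layersCache struct {
	// storeID is set at construction time and never changes afterwards
	// (immutable, safe to read without any lock).
	storeID string

	// mutex protects all fields below. Example ordering rule: code holding
	// mutex must not call back into store/layerStore APIs that take their
	// own locks, to avoid the deadlock discussed in this PR.
	mutex  sync.Mutex
	refs   int      // guarded by mutex
	layers []*layer // guarded by mutex
}
```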

@@ -111,7 +111,7 @@ func getLayersCacheRef(store storage.Store) *layersCache {
 		cache.refs++
 		return cache
 	}
-	cache := &layersCache{
+	cache = &layersCache{
mtrmac (Collaborator)

(I wonder about making this kind of mistake harder to do … naming the global cacheSingleton, globalCache, or something, might make it less likely to conflict with a local. OTOH that’s very likely an overreaction.)

(outdated review thread on pkg/chunked/cache_linux.go, resolved)
TomSweeneyRedHat (Member)
@giuseppe @mtrmac this too looks like a good one to get in for the vendor dance. Would it be possible to wrap this up in the next day or so, or should we consider it for the 1.55.1 release (1.55.0 is the vendor)?

@TomSweeneyRedHat TomSweeneyRedHat added the 5.2 Wanted for Podman v5.2 label Jul 24, 2024
TomSweeneyRedHat (Member)
If this is merged on or before August 12, 2024, please cherry-pick this to the release-1.55 branch.

cgwalters (Contributor)

> @giuseppe @mtrmac this too looks like a good one to get in for the vendor dance.

This is just my opinion, but basically this is attempting to optimize zstd:chunked somewhat, and it introduces some open questions around concurrency. It doesn't make sense to rush it, and if it is merged, it's the type of change that should only be cherry-picked if there is a known reason to do so.

@cgwalters cgwalters removed the 5.2 Wanted for Podman v5.2 label Jul 29, 2024
@giuseppe giuseppe force-pushed the fix-reuse-of-cache branch 2 times, most recently from c324874 to 2b77871, on September 12, 2024 09:36
giuseppe (Member, Author)

I think the new version is a reasonable solution to the deadlock issues.

I've added a patch to split ApplyDiffWithDiffer so that the heavy I/O part (and everything that calls into the store) is now done without the callers holding any lock on the store itself (see the sketch after this list).

There are some other nice side effects:

  • ApplyDiffWithDiffer can be used in parallel
  • It solves the problem where multiple load() calls run at the same time and do the same work
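
A rough sketch of that split, under assumed names (stageLayerWithoutLocks and commitStagedLayer are hypothetical helpers, not the actual containers/storage API): the heavy I/O happens with no store lock held, and the lock is taken only for the short metadata update at the end.

```go
package main

import (
	"fmt"
	"sync"
)

type store struct {
	mu     sync.Mutex
	layers map[string]string // layer ID -> staged location (simplified)
}

// stageLayerWithoutLocks does the expensive work (fetching and assembling
// the chunked layer) with no store lock held, so several layers can be
// staged concurrently and helpers can call back into the store safely.
func (s *store) stageLayerWithoutLocks(id string) (string, error) {
	stagingDir := "/tmp/staging-" + id // placeholder for the heavy I/O
	return stagingDir, nil
}

// commitStagedLayer takes the store lock only for the brief metadata update
// that records the already-staged layer.
func (s *store) commitStagedLayer(id, stagingDir string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.layers[id] = stagingDir
	return nil
}

func main() {
	s := &store{layers: map[string]string{}}
	dir, err := s.stageLayerWithoutLocks("layer1")
	if err == nil {
		err = s.commitStagedLayer("layer1", dir)
	}
	fmt.Println(s.layers, err)
}
```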

@giuseppe giuseppe force-pushed the fix-reuse-of-cache branch 2 times, most recently from ee8e7f4 to 6dc4f68, on September 12, 2024 12:03
mtrmac (Collaborator) left a comment

I like the approach but I don’t think it works with the current API.

(Start with the ⚠️ comment; the other review comments are ~irrelevant at this point.)

(outdated review threads on store.go and layers.go, resolved)
@giuseppe giuseppe force-pushed the fix-reuse-of-cache branch 3 times, most recently from 918081c to ed6b4c1, on September 16, 2024 07:44
(outdated review threads on store.go and layers.go, resolved)
giuseppe (Member, Author)

@mtrmac thanks for the review. I've addressed the comments in the latest version.

(outdated review thread on store.go, resolved)
@giuseppe giuseppe force-pushed the fix-reuse-of-cache branch 2 times, most recently from 455b8dc to 7fceea5, on September 17, 2024 19:12
mtrmac (Collaborator) left a comment

LGTM. Thanks!

(outdated review thread on store.go, resolved)
Signed-off-by: Giuseppe Scrivano <[email protected]>
it is not clear if it is needed, so simplify it.

Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
the global singleton was never updated, causing the cache to be recreated
for every layer.

It is not possible to hold the layersCache mutex for the entire load(),
because load() calls into some store APIs; since findDigestInternal() is
already called while some store locks are held, doing so would cause a
deadlock.

Another benefit is that now only one goroutine can run load(), which
prevents multiple calls to load() from happening in parallel and doing the
same work.

Closes: containers#2023

Signed-off-by: Giuseppe Scrivano <[email protected]>
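
To illustrate the load() serialization mentioned above (a simplified sketch with assumed types, not the real cache structure): because the heavy work no longer runs while store locks are held, the cache mutex can be held across load(), so concurrent callers wait for the single in-flight load and then reuse its result.

```go
package main

import (
	"fmt"
	"sync"
)

type layersCache struct {
	mu     sync.Mutex
	loaded bool
	layers []string // stand-in for the real per-layer metadata
}

// load is serialized by mu: the first caller does the expensive scan,
// every later caller finds loaded == true and reuses the result.
func (c *layersCache) load() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.loaded {
		return c.layers
	}
	c.layers = []string{"layer-a", "layer-b"} // placeholder for the real scan
	c.loaded = true
	return c.layers
}

func main() {
	c := &layersCache{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.load()
		}()
	}
	wg.Wait()
	fmt.Println(c.load()) // loaded exactly once, reused by every caller
}
```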
mtrmac (Collaborator) left a comment

/lgtm

Thanks again!

@openshift-ci openshift-ci bot added the lgtm label Sep 19, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 1959722 into containers:main Sep 19, 2024
18 checks passed
Development

Successfully merging this pull request may close these issues:

  • chunked layersCache seems to never be actually reused

5 participants