Initial implementation of consistency checks #1569

nalind · 2023-04-13T13:35:42Z

Add initial Check() and Repair() methods to Stores.

Check() checks for inconsistencies between the layers which the lower-level storage driver claims to know about and the ones which we know we're managing. It checks that layers referenced by layers, images, and containers are known to us and that images referenced by containers are known to us. It checks that data which we store alongside layers, images, and containers is still present, and to the extent which we store other information about that data (frequently just the size of the data), verifies that it matches recorded expectations. Lastly, it checks that layers which are part of images (and which we therefore know what they should have in them) have the expected content, and nothing else.

Repair() removes any containers, images, and layers which have any errors associated with them. This is destructive, so its use should be considered and deliberate.

Signed-off-by: Nalin Dahyabhai <[email protected]>

rhatdan · 2023-04-16T11:05:23Z

Nice work.
LGTM
@giuseppe @vrothberg @saschagrunert @flouthoc @umohnani8 @mtrmac PTAL

giuseppe

LGTM

mtrmac

This is a fairly brief single-pass skim; please treat it mostly as unbaked low-priority suggestions.

mtrmac · 2023-04-17T17:55:38Z

types/errors.go

+
+	// ErrLayerUnaccounted describes a layer that is present in the lower-level storage driver,
+	// but which is not known to or managed by the higher-level driver-agnostic logic.
+	ErrLayerUnaccounted = errors.New("layer in lower level storage driver not accounted for")


In many of these cases, I think it would be useful to use an error type (even if that type is currently an empty struct); that would allow later adding more information (like which store or which layer is involved) to the error type as an API.

Once this is an errors.New, adding more fields would be a silent breaking change (users would need to switch from errors.Is to errors.As).

OTOH that’s a fairly generic concern; I can very well accept the argument that no callers are expected to inspect the details of these specific errors in code anyway.

mtrmac · 2023-04-17T18:02:39Z

store.go

+// It returns the return value of fn, or its own error initializing the store.
+func (s *store) readContainerStore(fn func() (bool, error)) (bool, error) {
+	if err := s.containerStore.startReading(); err != nil {
+		return true, err


The pattern of returning done == true looks a bit out of place here; I think we can do without.

Let me dust off a Go 1.18 generics-based update to these helpers, that should make things nicer.
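As a rough idea of what a generics-based helper could look like (this is a sketch, not the update referred to above), the callback can return a value of any type, so the `done` boolean is no longer needed. `fakeStore` and `readStore` are hypothetical stand-ins for the real store types and helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeStore is a hypothetical stand-in for a store that must be locked
// for reading before it is used.
type fakeStore struct {
	failStart bool // when true, startReading fails
	reading   bool // true while the read lock is held
}

func (s *fakeStore) startReading() error {
	if s.failStart {
		return errors.New("cannot lock store for reading")
	}
	s.reading = true
	return nil
}

func (s *fakeStore) stopReading() { s.reading = false }

// readStore is a hypothetical generic helper: it locks the store, runs fn,
// and returns fn's result, or its own error if locking fails.
func readStore[T any](s *fakeStore, fn func() (T, error)) (T, error) {
	var zero T
	if err := s.startReading(); err != nil {
		return zero, err
	}
	defer s.stopReading()
	return fn()
}

func main() {
	s := &fakeStore{}
	count, err := readStore(s, func() (int, error) { return 3, nil })
	fmt.Println(count, err)
}
```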

mtrmac · 2023-04-17T18:04:51Z

cmd/containers-storage/check.go

+			flags.BoolVar(&jsonOutput, []string{"-json", "j"}, jsonOutput, "Prefer JSON output")
+			flags.StringVar(&maximumUnreferencedLayerAge, []string{"-max", "m"}, "24h", "Maximum allowed age for unreferenced layers")


These flags are not documented; should they be? At least the latter one; -json seems not to be documented for most of the commands.

mtrmac · 2023-04-17T18:07:11Z

cmd/containers-storage/check.go

+	if err != nil {
+		return 1, err
+	}
+	outputNonJSON := func(report storage.CheckReport) {


(I’m not sure what’s the purpose of this nested function — it seems to me it could be just open-coded inside that if below, or an external function. But this works just fine.)

mtrmac · 2023-04-17T18:08:32Z

cmd/containers-storage/check.go

+		return 1, err
+	}
+	outputNonJSON := func(report storage.CheckReport) {
+		for id, errs := range report.Layers {


If all the report entries have this regular structure, a helper vaguely like reportPerItemErrors(os.Stdout, "layer", report.Layers) would make things shorter.

mtrmac · 2023-04-17T19:12:52Z

check.go

+	// If the driver can tell us about which layers it knows about, we should have previously
+	// examined all of them.  Any that we didn't are probably just wasted space.
+	// Note: if the driver doesn't support enumerating layers, it returns ErrNotSupported.
+	if err := s.startUsingGraphDriver(); err != nil {


Hum, should the graph lock be held for the lifetime of this function, to ensure a consistent view?

mtrmac · 2023-04-17T19:14:19Z

check.go

+	if err != nil && !errors.Is(err, drivers.ErrNotSupported) {
+		return CheckReport{}, err
+	}
+	if !errors.Is(err, drivers.ErrNotSupported) {


Maybe

switch { case errors.Is(…): case err != nil: default: }

mtrmac · 2023-04-17T19:17:28Z

check.go

+	for id := range report.Layers {
+		layersToDelete = append(layersToDelete, id)
+	}
+	depth := func(id string) int {


(This is O(N^2) in the worst case; building a map[layerId]depth would allow making this O(N). Probably not worth worrying about now…)
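One way to build such a map is to memoize the depth of each layer as it is computed, as in the sketch below. It assumes layers are described by a hypothetical `parents` map from layer ID to parent ID (empty for a base layer) and that parent links never form a cycle.

```go
package main

import (
	"fmt"
	"sort"
)

// layerDepths returns the depth of every layer in parents, where a layer
// with no parent has depth 0. Each layer's depth is computed once and
// cached, so the total work is linear in the number of layers.
// It assumes that the parent links do not form a cycle.
func layerDepths(parents map[string]string) map[string]int {
	depths := make(map[string]int, len(parents))
	var depth func(id string) int
	depth = func(id string) int {
		if d, ok := depths[id]; ok {
			return d
		}
		d := 0
		// A missing or empty parent is treated as "no parent".
		if parent := parents[id]; parent != "" {
			d = depth(parent) + 1
		}
		depths[id] = d
		return d
	}
	for id := range parents {
		depth(id)
	}
	return depths
}

func main() {
	parents := map[string]string{"base": "", "mid": "base", "top": "mid"}
	depths := layerDepths(parents)

	// Order layers so that the deepest ones come first, as one would
	// want when deleting children before their parents.
	ids := []string{"base", "mid", "top"}
	sort.Slice(ids, func(i, j int) bool { return depths[ids[i]] > depths[ids[j]] })
	fmt.Println(ids)
}
```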

mtrmac · 2023-04-17T19:18:33Z

check.go

+		}
+		return d
+	}
+	isUnaccounted := func(errs []error) bool {


This will be called for every single layer at least once, so building a map[layerID]bool would mean it is called O(N) times instead of O(N log N) times.

mtrmac · 2023-04-17T19:21:26Z

check.go

+		if _, ok := deletedLayers[id]; ok {
+			continue
+		}
+		for _, reportedErr := range report.Layers[id] {


This could try to remove the same layer multiple times, is that intentional?

I’d expect a single attempt, at the driver or higher level depending on isUnaccounted.

nalind force-pushed the check branch from dc5091f to 9a2f7b0 Compare April 13, 2023 13:42

nalind force-pushed the check branch from 9a2f7b0 to cabf1b9 Compare April 13, 2023 14:38

nalind changed the title ~~[WIP] Initial implementation of consistency checks~~ Initial implementation of consistency checks Apr 13, 2023

giuseppe approved these changes Apr 17, 2023


giuseppe merged commit a9ace5f into containers:main Apr 17, 2023

nalind deleted the check branch April 17, 2023 13:11

mtrmac reviewed Apr 17, 2023


haircommander mentioned this pull request Jul 27, 2023

make use of c/storage Check() and Repair() functions cri-o/cri-o#7177

Closed

kwilczynski mentioned this pull request Jul 26, 2024

Use custom storage check options for CRI-O internal wipe cri-o/cri-o#8417

Merged
