-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial implementation of consistency checks #1569
Conversation
Add initial Check() and Repair() methods to Stores. Check() checks for inconsistencies between the layers which the lower-level storage driver claims to know about and the ones which we know we're managing. It checks that layers referenced by layers, images, and containers are known to us and that images referenced by containers are known to us. It checks that data which we store alongside layers, images, and containers is still present, and to the extent which we store other information about that data (frequenly just the size of the data), verifies that it matches recorded expectations. Lastly, it checks that layers which are part of images (and which we therefore know what they should have in them) have the expected content, and nothing else. Repair() removes any containers, images, and layers which have any errors associated with them. This is destructive, so its use should be considered and deliberate. Signed-off-by: Nalin Dahyabhai <[email protected]>
Nice work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fairly brief single-pass skim; please treat it mostly as unbaked low-priority suggestions.
|
||
// ErrLayerUnaccounted describes a layer that is present in the lower-level storage driver, | ||
// but which is not known to or managed by the higher-level driver-agnostic logic. | ||
ErrLayerUnaccounted = errors.New("layer in lower level storage driver not accounted for") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In many of these cases, I think it would be useful to use an error type (even if that type is currently an empty struct); that would allow later adding more information (like which store, or which layer) is involved to the error type as an API.
Once this is an errors.New
, adding more fields would be a silent breaking change (users would need to switch from errors.Is
to errors.As
).
OTOH that’s a fairly generic concern; I can very well accept the argument that no callers are expected to inspect the details of these specific errors in code anyway.
// It returns the return value of fn, or its own error initializing the store. | ||
func (s *store) readContainerStore(fn func() (bool, error)) (bool, error) { | ||
if err := s.containerStore.startReading(); err != nil { | ||
return true, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern of returning done == true
looks a bit out of place here; I think we can do without.
Let me dust off a Go 1.18 generics-based update to these helpers, that should make things nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flags.BoolVar(&jsonOutput, []string{"-json", "j"}, jsonOutput, "Prefer JSON output") | ||
flags.StringVar(&maximumUnreferencedLayerAge, []string{"-max", "m"}, "24h", "Maximum allowed age for unreferenced layers") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These flags are not documented; should they be? At least the latter one; -json
seems not to be documented for most of the commands.
if err != nil { | ||
return 1, err | ||
} | ||
outputNonJSON := func(report storage.CheckReport) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I’m not sure what’s the purpose of this nested function — it seems to me it could be just open-coded inside that if
below, or an external function. But this works just fine.)
return 1, err | ||
} | ||
outputNonJSON := func(report storage.CheckReport) { | ||
for id, errs := range report.Layers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all the report entries have this regular structure, a helper vaguely like reportPerItemErrors(os.Stdout, "layer", report.Layers)
would make things shorter.
// If the driver can tell us about which layers it knows about, we should have previously | ||
// examined all of them. Any that we didn't are probably just wasted space. | ||
// Note: if the driver doesn't support enumerating layers, it returns ErrNotSupported. | ||
if err := s.startUsingGraphDriver(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, should the graph lock be held for the lifetime of this function, to ensure a consistent view?
if err != nil && !errors.Is(err, drivers.ErrNotSupported) { | ||
return CheckReport{}, err | ||
} | ||
if !errors.Is(err, drivers.ErrNotSupported) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
switch {
case errors.Is(…):
case err != nil:
default:
}
for id := range report.Layers { | ||
layersToDelete = append(layersToDelete, id) | ||
} | ||
depth := func(id string) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is O(N^2) in the worst case; building a map[layerId]depth
would allow making this O(N). Probably not worth worrying about now…)
} | ||
return d | ||
} | ||
isUnaccounted := func(errs []error) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be called for every single layer at least once — so building a map[layerID]bool
would mean it is called O(N) times instead of O(N log N). times.
if _, ok := deletedLayers[id]; ok { | ||
continue | ||
} | ||
for _, reportedErr := range report.Layers[id] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could try to remove the same layer multiple times, is that intentional?
I’d expect a single attempt, at the driver or higher level depending on isUnaccounted
.
Add initial Check() and Repair() methods to Stores.
Check() checks for inconsistencies between the layers which the lower-level storage driver claims to know about and the ones which we know we're managing. It checks that layers referenced by layers, images, and containers are known to us and that images referenced by containers are known to us. It checks that data which we store alongside layers, images, and containers is still present, and to the extent which we store other information about that data (frequenly just the size of the data), verifies that it matches recorded expectations. Lastly, it checks that layers which are part of images (and which we therefore know what they should have in them) have the expected content, and nothing else.
Repair() removes any containers, images, and layers which have any errors associated with them. This is destructive, so its use should be considered and deliberate.