Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runc update: implement memory.checkBeforeUpdate #3579

Merged
merged 1 commit into from
Nov 3, 2022

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Aug 29, 2022

This is aimed at solving the problem of cgroup v2 memory controller
behavior which is not compatible with that of cgroup v1.

In cgroup v1, if the new memory limit being set is lower than the
current usage, setting the new limit fails.

In cgroup v2, same operation succeeds, and the container is OOM killed.

Introduce a new setting, memory.checkBeforeUpdate, and use it to mimic
cgroup v1 behavior.

Note that this is not 100% reliable because of TOCTOU, but this is the
best we can do.

Fixes: #3509

Currently a draft, pending opencontainers/runtime-spec#1158 merge.

@@ -174,7 +174,11 @@ func unifiedResToSystemdProps(cm *dbusConnManager, res map[string]string) (props
return props, nil
}

func genV2ResourcesProperties(r *configs.Resources, cm *dbusConnManager) ([]systemdDbus.Property, error) {
func genV2ResourcesProperties(dirPath string, r *configs.Resources, cm *dbusConnManager) ([]systemdDbus.Property, error) {
if err := fs2.CheckMemoryUsage(dirPath, r); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this check before setting systemd properties, otherwise the container is OOM-killed and the systemd unit is removed before we get to fsMgr.Set()

@kolyshkin kolyshkin requested review from AkihiroSuda and cyphar and removed request for AkihiroSuda September 9, 2022 20:48
@kolyshkin kolyshkin marked this pull request as ready for review September 9, 2022 20:49
@kolyshkin kolyshkin force-pushed the v2-low-mem branch 3 times, most recently from 862beb6 to 4b05d9a Compare September 11, 2022 22:38
@kolyshkin
Copy link
Contributor Author

The test case was failing:

not ok 167 update memory vs CheckBeforeUpdate
....
# runc update test_update --memory 1024 (status=1):
# time="2022-09-10T02:16:40Z" level=warning msg="Setting back cgroup configs failed due to error: unable to set unit properties: Unit runc-cgroups-integration-test-32127.scope not found., your state.json and actual configs might be inconsistent."
# time="2022-09-10T02:16:40Z" level=error msg="openat2 /sys/fs/cgroup/user.slice/user-2000.slice/[email protected]/machine.slice/runc-cgroups-integration-test-32127.scope/memory.max: no such file or directory"

This is where we set memory limit too low without using checkBeforeUpdate. In this case, the container is OOM-killed (and its cgroup and systemd unit are removed) in the midst of runc update. Depending on timing, runc might succeed or fail, since it's a race between runc, systemd, and the kernel.

Modified the test case to not check the error code from this run.

@kolyshkin
Copy link
Contributor Author

Rebased; PTAL @opencontainers/runc-maintainers

@kolyshkin
Copy link
Contributor Author

PTAL @opencontainers/runc-maintainers

@kolyshkin kolyshkin added this to the 1.2.0 milestone Nov 2, 2022
@@ -699,3 +699,43 @@ EOF
runc resume test_update
[ "$status" -eq 0 ]
}

@test "update memory vs CheckBeforeUpdate" {
requires cgroups_v2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen on cgroups_v1 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This knob is for cgroup v2, it is ignored (not used) on v1. Since we're testing the knob here, and not the kernel cgroup behavior, the test requires cgroup v2.

I don't know how to write a test to ensure the knob is ignored. In general, runc ignores all the parameters it's not aware of, and cgroup v1 code is not aware of CheckBeforeUpdate.

This is aimed at solving the problem of cgroup v2 memory controller
behavior which is not compatible with that of cgroup v1.

In cgroup v1, if the new memory limit being set is lower than the
current usage, setting the new limit fails.

In cgroup v2, same operation succeeds, and the container is OOM killed.

Introduce a new setting, memory.checkBeforeUpdate, and use it to mimic
cgroup v1 behavior.

Note that this is not 100% reliable because of TOCTOU, but this is the
best we can do.

Add some test cases.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin merged commit 2a8e5d3 into opencontainers:main Nov 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cgroup v1/v2 compatibility issue when setting memory below the current usage
3 participants