Carry #781: Add support for cgroup namespace #1184
Conversation
The reason why I didn't unshare(CLONE_NEWCGROUP) in nsexec.c is that unshare() makes your current cgroups the root of your new cgroup namespace. If I unshare the child process into a new cgroup ns before the parent process has put it in the proper cgroup subdirectories (in p.manager.Apply(p.pid())), then we would see partial cgroup paths (not / as expected) inside the container.
The correct solution for this is to make sure that p.manager.Apply executes before we hit the unshare code. That's not really nice to deal with, but it can be done with a simple pipe that the nsexec.c code will block on until we've set up cgroups.
But yeah, the solution is not to do Unshare in Go code. That's just not going to end well.
Aside from that the code looks okay (I'll review it again after you make the above fix). Thanks for carrying this for me 😸.
libcontainer/standard_init_linux.go (outdated)

```go
// Use Unshare() to enter a new cgroup_namespace before InitializeMountNamespace()
if l.config.Config.Namespaces.Contains(configs.NEWCGROUP) && l.config.Config.Namespaces.PathOf(configs.NEWCGROUP) == "" {
	if err := syscall.Unshare(configs.Syscall_CLONE_NEWCGROUP); err != nil {
```
This is seriously not safe. Do not ever call Unshare from inside Go code. The Unshare interfaces do not work in a multithreaded process (and Go programs are multithreaded). This is why nsexec.c exists in the first place.
OK, I had also considered using a pipe to block the nsexec.c code, but I got stuck somewhere along the way, so I just sent this implementation for discussion. I'll try to work it out, though.
I'd be happy to give you a hand if you have any questions, I didn't mean to scare you off. 😸
Haha, yeah, I know. I'm going to play around with it myself first. I would turn to you for help without hesitation if I have any questions. Thanks in advance 😜
libcontainer/specconv/spec_linux.go (outdated)

```go
@@ -190,6 +195,17 @@ func CreateLibcontainerConfig(opts *CreateOpts) (*configs.Config, error) {
		if config.Namespaces.Contains(t) {
			return nil, fmt.Errorf("malformed spec file: duplicated ns %q", ns)
		}
		// if this container is going to have shared cgroup namespace with host,
		// we just skip the following .Add()
```
Why?
I have found that docker just uses delNamespace() to delete a namespace shared with the host. If a user runs a container without adding the cgroup ns in config.json, then that container will share the cgroup ns with the host and its cgroups are bind-mounted. I think that's the same as the scenario where we set up the cgroup ns in config.json but set the cgroup path to /proc/1/ns/cgroup. So I chose not to add the cgroup ns into config.Namespaces; then its cgroups will be bind-mounted without adding more code.
My point is that you're adding more code to special-case something that I don't understand why you need to special-case. If you do a setns(...) for a namespace that you've already joined then it doesn't matter anyway.
But I also don't understand why you're comparing the link to /proc/1/ns/cgroup -- how do you know that runC is inside the same cgroup namespace as the init of its current PID namespace (that doesn't need to be true)? From what I can see you're going to skip joining cgroup namespaces that people explicitly asked you to join.
Originally, I added this "skipping" logic because I thought a container's cgroups should be bind-mounted if it shares a cgroup ns with the host, but now it seems I went down the wrong path. 😥
So, you mean that it makes no difference whether a container shares a cgroup ns with the host or not? And we also need to mount fresh cgroups (not bind-mount them) in a container which shares a cgroup ns with the host?
Force-pushed from 16f09da to 6c4f233.
@cyphar updated! 😉
Oops, it seems that our …
@yummypeng Yeah, I would make the tests conditional (do …
See my comments. My main gripes are just stylistic, but there is one thing about the whole "skipping" logic that I don't like.
```go
@@ -8,13 +8,17 @@ func (n *Namespace) Syscall() int {
	return namespaceInfo[n.Type]
}

// This is not yet in the Go stdlib.
const Syscall_CLONE_NEWCGROUP = (1 << 25)
```
This shouldn't be exported.
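For reference, the unexported form being asked for is what a later revision in this PR ends up using:

```go
// This is not yet in the Go stdlib.
const syscall_CLONE_NEWCGROUP = (1 << 25)
```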
```go
	defer remove(rootfs)

	l, err := os.Readlink("/proc/1/ns/cgroup")
	ok(t, err)
```
As I mentioned above, this should be a t.Skip if os.IsNotExist(err).
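A minimal sketch of that guard, reusing the `ok` helper from the test above (the skip message is only illustrative):

```go
l, err := os.Readlink("/proc/1/ns/cgroup")
if os.IsNotExist(err) {
	// Kernels without cgroup namespace support (pre-4.6) have no such file;
	// skip the test instead of failing it.
	t.Skip("cgroup namespaces are not supported by this kernel")
}
ok(t, err)
```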
libcontainer/nsenter/nsexec.c (outdated)

```c
@@ -38,6 +38,9 @@ enum sync_t {
	SYNC_ERR = 0xFF, /* Fatal error, no turning back. The error code follows. */
};

/* Synchronisation value for cgroup namespace setup */
#define SYNC_PRIVATE_CGNS 0x80
```
Since this synchronisation is different to the previous ones, we should consider naming it something distinctive. I actually didn't catch that this was a different type of synchronisation to the previous one on my first reading of this.
libcontainer/nsenter/nsexec.c (outdated)

```c
@@ -46,6 +49,9 @@ enum sync_t {
/* JSON buffer. */
#define JSON_MAX 4096

/* buffer for synchronisation value */
#define SYNC_BUF_LEN 10
```
Since SYNC_PRIVATE_CGNS is a single byte, can you just follow the example of all of the other synchronisation code and not use a buffer (use a single-character local variable)?
libcontainer/nsenter/nsexec.c (outdated)

```c
	len = read(pipenum, buf, SYNC_BUF_LEN);
	if (len < 0)
		bail("read synchronisation value failed");
	value = strtol(buf, &endptr, 10);
```
Yeah, this shouldn't be necessary if you just write a single byte. Also, why is value not local to this block?
libcontainer/process_linux.go (outdated)

```go
@@ -19,6 +19,9 @@ import (
	"github.com/opencontainers/runc/libcontainer/utils"
)

// Synchronisation value for cgroup namespace setup.
const privateCgroupns int = (1 << 7)
```
I'm a bit mixed on this. On the one hand I think this should be implemented with an #include using cgo, because I don't really want to duplicate this (especially since it's not really possible to grep for both usages of this constant). But then cgo is bad to use for something like that. 😉
If we have to leave it here then please name it something similar to SYNC_PRIVATE_CGNS (after you rename that as per my comments above). Then at least it will be possible to grep for this constant.
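Something along these lines would keep the two definitions greppable (the Go name here is only a suggestion, not part of the PR):

```go
// syncPrivateCGNS must be kept in sync with SYNC_PRIVATE_CGNS in
// libcontainer/nsenter/nsexec.c.
const syncPrivateCGNS = 0x80 // (1 << 7)
```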
libcontainer/nsenter/nsexec.c (outdated)

```c
	if (*endptr != '\0')
		bail("unable to parse synchronisation value");
	if (value == SYNC_PRIVATE_CGNS) {
		if (unshare(CLONE_NEWCGROUP) < 0)
```
I think that RHEL might have issues with this (WDYT @mrunalp). Also this is a bit ugly, but I'm guessing you can't combine this with the rest of the unshare code because we're currently sending the PID of JUMP_INIT, not JUMP_CHILD, to the .Apply code.
Yeah, as far as I can see, unshare(CLONE_NEWCGROUP) must be done in the final child process. Or do you have any better suggestions? 😁
Well, you /could/ do it by passing two different PIDs to the parent. But that's just a bad idea.
This is fine, don't worry about my gripes.
libcontainer/process_linux.go (outdated)

```go
@@ -257,6 +260,13 @@ func (p *initProcess) start() error {
	if err := p.manager.Apply(p.pid()); err != nil {
		return newSystemErrorWithCause(err, "applying cgroup configuration for process")
	}
	// Now it's time to setup cgroup namesapce
	if p.config.Config.Namespaces.Contains(configs.NEWCGROUP) && p.config.Config.Namespaces.PathOf(configs.NEWCGROUP) == "" {
		if _, err := p.parentPipe.WriteString(strconv.Itoa(privateCgroupns)); err != nil {
```
As I mentioned this should be a single byte write.
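A sketch of the single-byte variant, using the suggested constant name from the comment above (not the final implementation):

```go
if p.config.Config.Namespaces.Contains(configs.NEWCGROUP) && p.config.Config.Namespaces.PathOf(configs.NEWCGROUP) == "" {
	// A single byte is enough; nsexec.c can read it into a plain char
	// instead of parsing a decimal string out of a buffer.
	if _, err := p.parentPipe.Write([]byte{syncPrivateCGNS}); err != nil {
		return newSystemErrorWithCause(err, "writing cgroupns sync byte to init")
	}
}
```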
libcontainer/specconv/spec_linux.go (outdated)

```go
@@ -190,6 +195,17 @@ func CreateLibcontainerConfig(opts *CreateOpts) (*configs.Config, error) {
		if config.Namespaces.Contains(t) {
			return nil, fmt.Errorf("malformed spec file: duplicated ns %q", ns)
		}
		// if this container is going to have shared cgroup namespace with host,
		// we just skip the following .Add()
```
When you do a setns(2) to a cgroup namespace, you're going to share …
So if setns(2) doesn't fail then mount -t cgroup shouldn't fail either …
So the only time we should be bindmounting is when we haven't created a …
I could actually check the kernel code if it made you feel better (the …
P.S.: GitHub is horrible, I can't load your comment in the web-view so I …

Aleksa Sarai
If I run a container with the cgroup ns path set as …

```
root@ubuntu:~/tools# runc run test
root@runc:/# cat /proc/self/cgroup
13:name=systemd:/user/0.user/9.session/test
12:pids:/test
11:hugetlb:/user/0.user/9.session/test
10:net_prio:/user/0.user/9.session/test
9:perf_event:/user/0.user/9.session/test
8:net_cls:/user/0.user/9.session/test
7:freezer:/user/0.user/9.session/test
6:devices:/user/0.user/9.session/test
5:memory:/user/0.user/9.session/test
4:blkio:/user/0.user/9.session/test
3:cpuacct:/user/0.user/9.session/test
2:cpu:/user/0.user/9.session/test
1:cpuset:/test
```

And since its cgroup scoping is shared with the host, its …

```
root@runc:~# ls /sys/fs/cgroup/pids/
cgroup.clone_children  cgroup.procs  cgroup.sane_behavior  docker  notify_on_release  release_agent  tasks  test
```

Here, …
I feel like if someone intentionally decides to …
But even if we decide that we want to always make …
I feel like the cgroup namespace …
Force-pushed from 6c4f233 to 558404f.
@cyphar Once again updated! I just removed the whole "skipping" logic this time.
@yummypeng If it's ready for review, you can remove …
Force-pushed from 558404f to 4a06d79.
Sorry for the late update. Ping @opencontainers/runc-maintainers, any comments? Thank you :)
Force-pushed from 4a06d79 to 8c6d4d6.
This is a very simple implementation because it doesn't require any configuration unlike the other namespaces, and in its current state it only masks paths. This feature is available in Linux 4.6+ and is enabled by default for kernels compiled with CONFIG_CGROUP=y.

Signed-off-by: Aleksa Sarai <[email protected]>
Cgroup namespace can be configured in `config.json` as other namespaces. Here is an example:

```
"namespaces": [
    { "type": "pid" },
    { "type": "network" },
    { "type": "ipc" },
    { "type": "uts" },
    { "type": "mount" },
    { "type": "cgroup" }
],
```

Note that if you want to run a container which has a shared cgroup ns with another container, then it's strongly recommended that you set a proper `CgroupsPath` for both containers (the second container's cgroup path must be a subdirectory of the first one). Or there might be some unexpected results.

Signed-off-by: Yuanhong Peng <[email protected]>
Force-pushed from 8c6d4d6 to d3516db.
Add support in port library for detecting if we are running in a docker container or not. This is required for cgroup related APIs in port library, such as omrsysinfo_cgroup_get_memlimit, to work properly.

This is because when running in a docker container, the cgroups listed in /proc/self/cgroup do not match the cgroups of the process in the container. /proc/self/cgroup actually shows the cgroups as seen from the host. E.g. if a process is running in a docker container, its /proc/PID/cgroup on docker will show something like:

```
$ cat /proc/8364/cgroup
11:cpuset:/docker/<container id>
10:perf_event:/docker/<container id>
9:memory:/docker/<container id>
...
```

but the actual cgroup of the process would be "/". Once we figure out that we are running in a docker container, the cgroup APIs in port library use "/" as the cgroup name for reading resource limits.

Note that as and when docker containers start supporting cgroup namespaces, the above-mentioned mismatch would also be fixed. The following issues and PRs are related to adding support for cgroup namespaces in docker containers: opencontainers/runc#1184, opencontainers/runtime-spec#62, opencontainers/runtime-spec#66.

There is no direct mechanism to detect if we are running in a docker container. The most reliable mechanism that I have come across is checking the cgroup for PID 1 (by reading the /proc/1/cgroup file) for all subsystems. If all cgroups are "/", then it indicates the process is running on the host system; else we conclude it is running in a docker container. Note that this mechanism will break when docker containers start supporting cgroup namespaces.

- Also added a couple of APIs to return container type and name.

Signed-off-by: Ashutosh Mehra <[email protected]>
Yikes, this has been around for a while. I can re-carry this PR for @yummypeng.
```go
@@ -8,13 +8,17 @@ func (n *Namespace) Syscall() int {
	return namespaceInfo[n.Type]
}

// This is not yet in the Go stdlib.
const syscall_CLONE_NEWCGROUP = (1 << 25)
```
This has been added to the unix library.
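For what it's worth, the constant is now available from golang.org/x/sys/unix, so a rebase could presumably drop the local definition:

```go
import "golang.org/x/sys/unix"

// unix.CLONE_NEWCGROUP == 0x2000000, i.e. (1 << 25).
var _ = unix.CLONE_NEWCGROUP
```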
Thanks @cyphar 😜
I'm rebasing and carrying this :)
Heh, thanks. I started rebasing it and it was pretty ugly from memory.
Carrying #781. Thanks for @cyphar's work. 😄
Cgroup namespace can be configured in config.json as other namespaces. Here is an example:
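(This is the same example as in the commit message above.)

```
"namespaces": [
    { "type": "pid" },
    { "type": "network" },
    { "type": "ipc" },
    { "type": "uts" },
    { "type": "mount" },
    { "type": "cgroup" }
],
```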
Note that if you want to run a container which has a shared cgroup ns with another container, then it's strongly recommended that you set a proper CgroupsPath for both containers (the second container's cgroup path must be a subdirectory of the first one). Otherwise there might be some unexpected results.
Signed-off-by: Yuanhong Peng [email protected]