Hard links losing ownership information when run with user namespaces #1257

Closed
jdieter opened this issue Jun 12, 2022 · 7 comments · Fixed by #1258
@jdieter

jdieter commented Jun 12, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

When running containers with user namespaces (either using --uidmap or --userns=auto), ownership isn't set correctly for some hardlinks. The original file has the correct ownership, but any links belong to nobody.nobody.

Steps to reproduce the issue:

  1. Run
[root@f36 freeipa-container]# docker run -ti --userns=auto freeipa/freeipa-server:fedora-36 /bin/bash
[root@3b4e9a7eda14 /]# ls -l /data-template/etc/
total 12
drwxr-xr-x. 1 root   root     26 Jun 10 04:47 authselect
drwxr-xr-x. 1 root   root      0 Jun 10 04:46 certmonger
drwxr-xr-x. 1 root   root     24 May 30 07:47 dirsrv
drwxr-xr-x. 1 root   root      0 Jun 10 04:46 gssproxy
drwxr-xr-x. 1 root   root     30 Jun 10 04:47 httpd
drwxr-xr-x. 1 root   root     62 Jun 10 04:46 ipa
-rw-r--r--. 4 nobody nobody  880 Apr  5 19:23 krb5.conf
drwxr-xr-x. 1 root   root      0 Apr  5 19:28 krb5.conf.d
-rw-r--r--. 3 nobody nobody    0 May  6 10:11 machine-id
drwxr-xr-x. 1 root   root      0 Jun 10 04:47 named
-rw-r-----. 4 nobody nobody 1722 Jun 10 04:46 named.conf
lrwxrwxrwx. 1 root   root     29 Jun 10 04:47 nsswitch.conf -> /etc/authselect/nsswitch.conf
drwxr-xr-x. 1 root   root     22 Jun 10 04:46 openldap
drwxr-xr-x. 1 root   root    126 Jun 10 04:47 pam.d
drwxr-xr-x. 1 root   root     14 Jun 10 04:47 pkcs11
drwxr-xr-x. 1 root   root     26 Jun 10 04:47 pki
drwxr-xr-x. 1 root   root      0 Jun 10 04:46 samba
drwx------. 1 root   root     18 Jun 12 18:55 sssd
drwxr-xr-x. 1 root   root     50 Jun 10 04:46 sysconfig
drwxr-xr-x. 1 root   root     12 Jun 10 04:47 systemd
drwxr-xr-x. 1 root   root      0 Jun  3 08:23 tmpfiles.d
  2. Run
[root@f36 freeipa-container]# docker run -ti freeipa/freeipa-server:fedora-36 /bin/bash
[root@1c7d1779bf38 /]# ls -l /data-template/etc/
total 12
drwxr-xr-x. 1 root root    26 Jun 10 04:47 authselect
drwxr-xr-x. 1 root root    30 Jun 10 04:46 certmonger
drwxr-xr-x. 1 root root    24 May 30 07:47 dirsrv
drwxr-xr-x. 1 root root    98 Jun 10 04:46 gssproxy
drwxr-xr-x. 1 root root    30 Jun 10 04:47 httpd
drwxr-xr-x. 1 root root    62 Jun 10 04:46 ipa
-rw-r--r--. 4 root root   880 Apr  5 19:23 krb5.conf
drwxr-xr-x. 1 root root    30 Apr  5 19:28 krb5.conf.d
-rw-r--r--. 3 root root     0 May  6 10:11 machine-id
drwxr-xr-x. 1 root root     0 Jun 10 04:47 named
-rw-r-----. 4 root named 1722 Jun 10 04:46 named.conf
lrwxrwxrwx. 1 root root    29 Jun 10 04:47 nsswitch.conf -> /etc/authselect/nsswitch.conf
drwxr-xr-x. 1 root root    40 Jun 10 04:46 openldap
drwxr-xr-x. 1 root root   292 Jun 10 04:47 pam.d
drwxr-xr-x. 1 root root    14 Jun 10 04:47 pkcs11
drwxr-xr-x. 1 root root    26 Jun 10 04:47 pki
drwxr-xr-x. 1 root root    62 Jun 10 04:46 samba
drwx------. 1 root root    18 Jun  2 12:05 sssd
drwxr-xr-x. 1 root root   270 Jun 10 04:46 sysconfig
drwxr-xr-x. 1 root root    12 Jun 10 04:47 systemd
drwxr-xr-x. 1 root root     0 Jun  3 08:23 tmpfiles.d

Describe the results you received:
Notice that a number of the files are owned by nobody.nobody when running with --userns=auto, but those same files are owned by root.root or root.named when run without user namespaces.

Describe the results you expected:
Ownership should be the same (root.root or root.named) whether running with or without user namespaces.

Additional information you deem important (e.g. issue happens only occasionally):
This does seem to be tied to hard links in the containers. freeipa does some interesting things with hardlinks, and you end up with 3 or 4 hard links of these configuration files. It's only the files with hard links that seem to be having this problem.

In this example, the hard links are created as part of the build process, not in an entrypoint.
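To check which files are affected, one option is to list every regular file with a link count above 1 together with its owner. The following is only a minimal diagnostic sketch in Python, not part of podman or containers/storage; the function name `find_hardlink_groups` is made up for illustration, and `/data-template` is simply the directory from the listing above.

```python
import os
import stat
from collections import defaultdict


def find_hardlink_groups(root):
    """Map (st_dev, st_ino) to a list of (path, uid, gid) for files with nlink > 1."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect the entry itself, not a symlink target
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append((path, st.st_uid, st.st_gid))
    return dict(groups)


if __name__ == "__main__":
    # Print each inode with all of its names under the tree and their numeric owners.
    for (dev, ino), entries in sorted(find_hardlink_groups("/data-template").items()):
        print(f"dev={dev} ino={ino}")
        for path, uid, gid in sorted(entries):
            print(f"  {uid}:{gid} {path}")
```

Run inside the container, names that still share an inode but show the overflow uid/gid (nobody) would be the ones that were never chowned.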

Output of podman version:

Client:       Podman Engine
Version:      4.1.0
API Version:  4.1.0
Go Version:   go1.18
Built:        Fri May  6 17:15:54 2022
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpuUtilization:
    idlePercent: 75.26
    systemPercent: 12.77
    userPercent: 11.97
  cpus: 2
  distribution:
    distribution: fedora
    variant: workstation
    version: "36"
  eventLogger: journald
  hostname: mmbox01.local.jdieter.net
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.17.12-300.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 441036800
  memTotal: 8206266368
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.5-1.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.5
      commit: c381048530aa750495cf502ddb7181f2ded5b400
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 7322841088
  swapTotal: 8206151680
  uptime: 47h 46m 34.64s (Approximately 1.96 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 5
    paused: 0
    running: 5
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 498387124224
  graphRootUsed: 31283048448
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 73
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.1.0
  Built: 1651853754
  BuiltTime: Fri May  6 17:15:54 2022
  GitCommit: ""
  GoVersion: go1.18
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.0

Package info (e.g. output of rpm -q podman or apt list podman):

podman-4.1.0-1.fc36.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Physical machine running Fedora 36

@vrothberg
Member

Thanks for reaching out, @jdieter!

@giuseppe PTAL

giuseppe transferred this issue from containers/podman Jun 13, 2022
@giuseppe
Member

When we create a container based on a remapped image, we lose all the data that overlay stores internally in the $WORK/work/index directory, which effectively voids the effect of index=on.
Unfortunately, just copying these files does not seem to be enough, since AFAICS they embed references to the previous upper dir.

@rhvgoyal how can we reuse as a lower dir an overlay mount that uses index=on?

What we would like to achieve is to create a copy of an image (where we chown files according to the user namespace configuration) and use that as a base for containers using that mapping.

@nalind PTAL

@nalind
Member

nalind commented Jun 13, 2022

My first guess would have been that this was #1143, but that version of podman would have included a version of this package that included #1144, which fixed it. I don't think I understand how we're breaking the kernel's index=on behavior. I'm not sure we're even turning it on.

@giuseppe
Member

My mistake, index=on is not turned on by default; I had enabled it only locally.

In this case I think it is expected behavior and it behaves as documented for the overlay file system:

# sudo podman run -it --rm freeipa/freeipa-server:fedora-36 /bin/bash

# stat /data-template/.configfiles/etc/named.conf /data-template/etc/named.conf
  File: /data-template/.configfiles/etc/named.conf
  Size: 1722      	Blocks: 8          IO Block: 4096   regular file
Device: 0,96	Inode: 275654411   Links: 4
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (   25/   named)
Access: 2022-06-10 04:46:30.000000000 +0000
Modify: 2022-06-10 04:46:30.000000000 +0000
Change: 2022-06-13 12:21:45.277620905 +0000
 Birth: 2022-06-13 12:21:45.269620863 +0000
  File: /data-template/etc/named.conf
  Size: 1722      	Blocks: 8          IO Block: 4096   regular file
Device: 0,96	Inode: 275654411   Links: 4
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (   25/   named)
Access: 2022-06-10 04:46:30.000000000 +0000
Modify: 2022-06-10 04:46:30.000000000 +0000
Change: 2022-06-13 12:21:45.277620905 +0000
 Birth: 2022-06-13 12:21:45.269620863 +0000

# chown 1:1 /data-template/.configfiles/etc/named.conf

# stat /data-template/.configfiles/etc/named.conf /data-template/etc/named.conf
  File: /data-template/.configfiles/etc/named.conf
  Size: 1722      	Blocks: 8          IO Block: 4096   regular file
Device: 0,96	Inode: 833778342   Links: 1
Access: (0640/-rw-r-----)  Uid: (    1/     bin)   Gid: (    1/     bin)
Access: 2022-06-10 04:46:30.000000000 +0000
Modify: 2022-06-10 04:46:30.000000000 +0000
Change: 2022-06-13 15:54:28.352748266 +0000
 Birth: 2022-06-13 15:54:28.351748260 +0000
  File: /data-template/etc/named.conf
  Size: 1722      	Blocks: 8          IO Block: 4096   regular file
Device: 0,96	Inode: 275654411   Links: 4
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (   25/   named)
Access: 2022-06-10 04:46:30.000000000 +0000
Modify: 2022-06-10 04:46:30.000000000 +0000
Change: 2022-06-13 12:21:45.277620905 +0000
 Birth: 2022-06-13 12:21:45.269620863 +0000

The chown breaks the hard link, so I need to fix up #1144 to handle this case.
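The same before/after comparison can be scripted. This is a rough Python sketch, assuming two paths that start out as hard links to one file; `chown_keeps_link` is a hypothetical helper name, not anything from containers/storage. On an ordinary filesystem it should return True, while on the overlay mount shown in the stat output it would be expected to return False, because the copy-up gives the chowned name a new inode.

```python
import os


def chown_keeps_link(path, other, uid, gid):
    """chown `path`, then report whether it still shares an inode with `other`."""
    if not os.path.samefile(path, other):
        raise ValueError("paths are not hard links to the same file")
    os.chown(path, uid, gid)
    # samefile compares st_dev and st_ino, the same fields stat prints above.
    return os.path.samefile(path, other)
```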

giuseppe added a commit to giuseppe/storage that referenced this issue Jun 13, 2022
Add an additional check before deciding whether the chown must be
skipped.  If the underlying storage (as it could be overlay with
index=off) breaks the hard link on copy up then we cannot skip the
chown even if the dev/ino was already encountered.

containers#1144 added the initial
check.

Closes: containers#1257

Signed-off-by: Giuseppe Scrivano <[email protected]>
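One way to read the check described in this commit message, as a simplified Python sketch rather than the actual Go change: remember a dev/ino pair as "done" only if the file still has the same inode after the chown. The name `chown_tree_checked` and the re-stat strategy are assumptions made for illustration, and the sketch handles only non-directory entries.

```python
import os


def chown_tree_checked(root, uid, gid):
    """chown files under root once per inode, unless the chown broke the link.

    Returns the number of lchown calls made.
    """
    done = set()  # (st_dev, st_ino) keys whose remaining names may be skipped
    calls = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            before = os.lstat(path)
            key = (before.st_dev, before.st_ino)
            if key in done:
                continue  # another name of this inode was chowned and the link held
            os.lchown(path, uid, gid)
            calls += 1
            after = os.lstat(path)
            if (after.st_dev, after.st_ino) == key:
                # Same inode after chown: the change is visible through every link.
                done.add(key)
            # Otherwise the storage copied the file up to a new inode. The key is
            # left unrecorded so the remaining names get chowned as well.
    return calls
```

With this approach ownership ends up correct everywhere, but on storage that breaks links on copy up each name becomes a separate file.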
@giuseppe
Member

PR here: #1258

It doesn't fix the case where index=on. I am not sure how we should deal with it

@jdieter
Author

jdieter commented Jun 13, 2022

I've tested the patch at #1258 in a scratch build for F36 available at https://koji.fedoraproject.org/koji/taskinfo?taskID=88236616, and it has fixed the issue for me!

giuseppe added a commit to giuseppe/storage that referenced this issue Jun 13, 2022
If the inode was already encountered and chowned, use link(2) instead
of chown(2).

This is needed when the underlying storage (as it could be overlay
with index=off) breaks the hard link on copy up.

containers#1144 added the initial
check.

Closes: containers#1257

Signed-off-by: Giuseppe Scrivano <[email protected]>
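The link(2) idea from this commit message can be sketched as follows. This is an illustrative Python version under stated assumptions, not the Go code from the PR: `chown_tree_relink` is a made-up name, only non-directory entries are handled, and the unlink-then-link step is not atomic.

```python
import os


def chown_tree_relink(root, uid, gid):
    """chown each inode once; recreate its other names as links to the chowned file.

    Returns the list of paths that were re-linked.
    """
    first = {}  # (st_dev, st_ino) seen before chown -> path that was chowned
    relinked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            key = (st.st_dev, st.st_ino)
            target = first.get(key)
            if target is None:
                # First name seen for this inode: chown it and remember it.
                os.lchown(path, uid, gid)
                first[key] = path
                continue
            # The inode was already chowned through `target`. Replace this name
            # with a hard link to it, so the link survives even if the chown
            # moved `target` to a new inode (copy up with index=off).
            os.unlink(path)
            os.link(target, path)
            relinked.append(path)
    return relinked
```

Keying the map on the dev/ino pair read before the chown is what makes this work when the copy up changes the inode of the first name: the remaining names still report the old pair and are matched to the already chowned path.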
@rhvgoyal
Contributor

@giuseppe I am not sure I understand what the problem with index=on is. My understanding is that you will download the image layers, create a union using overlayfs (with index=on), and then do the chown. Hard links will not be broken, as index=on has been used.

Won't you then use this overlay as the lower for the next overlay mount for the container? I think all the index metadata is in the workdir/index dir, and the underlying overlayfs will understand that and report link counts, inode numbers, etc. correctly.

If you are trying to get rid of the lower overlay and move to reusing upper/ as a lower dir for the container overlayfs, that will probably be a problem.
