
fsgroup is not set on volume provided by Vmware CSI #370

Closed
Anil-YadavK8s opened this issue Sep 18, 2020 · 26 comments
Assignees
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@Anil-YadavK8s

Is this a BUG REPORT or FEATURE REQUEST?:
When deploying a Pod with a non-root user and an fsGroup security context to access a VMware volume (PV/PVC),
fsGroup fails to apply the setgid group ownership to files on the volume.

Uncomment only one, leave it on its own line:

/kind bug

What happened:
A Pod presented with a VMware CSI PV/PVC is unable to apply fsGroup on the data volume.

What you expected to happen:

VMware CSI's PV/PVC should support fsGroup for less-privileged Pods.

How to reproduce it (as minimally and precisely as possible):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alpine-privileged
  labels:
    app: alpine-privileged
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alpine-privileged
  template:
    metadata:
      labels:
        app: alpine-privileged
    spec:
      serviceAccountName: test-sa-psp
      securityContext:
        runAsUser: 1000
        fsGroup: 2000
      containers:
        - name: alpine-privileged
          image: alpine:3.9
          command: ["sleep", "1800"]
          volumeMounts:
            - name: data
              mountPath: /data
          securityContext:
            readOnlyRootFilesystem: false
  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: test
        volumeMode: Filesystem

ccdadmin@seliicbl01481-ns2-testbed2-m1:~> kubectl -n=test-1 exec alpine-privileged-0 -it -- /bin/sh
/ $ ls -lrth
total 8
drwxr-xr-x 11 root root 125 Apr 23 13:10 var
drwxr-xr-x 7 root root 66 Apr 23 13:10 usr
drwxrwxrwt 2 root root 6 Apr 23 13:10 tmp
drwxr-xr-x 2 root root 6 Apr 23 13:10 srv
drwx------ 2 root root 6 Apr 23 13:10 root
drwxr-xr-x 2 root root 6 Apr 23 13:10 opt
drwxr-xr-x 2 root root 6 Apr 23 13:10 mnt
drwxr-xr-x 5 root root 44 Apr 23 13:10 media
drwxr-xr-x 5 root root 185 Apr 23 13:10 lib
drwxr-xr-x 2 root root 6 Apr 23 13:10 home
drwxr-xr-x 2 root root 4.0K Apr 23 13:10 sbin
drwxr-xr-x 2 root root 4.0K Apr 23 13:10 bin
dr-xr-xr-x 13 root root 0 Sep 14 11:21 sys
drwxr-xr-x 1 root root 21 Sep 18 09:21 run
dr-xr-xr-x 587 root root 0 Sep 18 09:21 proc
drwxr-xr-x 1 root root 66 Sep 18 09:21 etc
drwxr-xr-x 5 root root 360 Sep 18 09:21 dev
drwxr-xr-x 3 root root 18 Sep 18 09:21 data
/ $ cd data/
/data $ ls
demo
/data $ ls -lrth
total 4
drwxr-xr-x 3 root root 4.0K Sep 18 09:21 demo
/data $
/data $
/data $ mkdir test
mkdir: can't create directory 'test': Permission denied
/data $
/data $

Anything else we need to know?:

Environment:

  • csi-vsphere version: vmware/vsphere-block-csi-driver:v2.0.0

  • vsphere-cloud-controller-manager version: gcr.io/cloud-provider-vsphere/cpi/release/manager:latest

  • Kubernetes version: 1.17.3

  • vSphere version: 6.7U3

  • OS (e.g. from /etc/os-release): SUSE Linux Enterprise Server 15 SP1

  • Kernel (e.g. uname -a): Linux master-node 4.12.14-197.45-default #1 SMP Thu Jun 4 11:06:04 UTC 2020 (2b6c749) x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

  • Others:

@Anil-YadavK8s
Author

@RaunakShah @divyenpatel Any insight on the above issue?

@Anil-YadavK8s
Author

@RaunakShah On analysis, we could see an issue with the image quay.io/k8scsi/csi-provisioner:v2.0.0-rc2. This image resolved our volumes going into the Released state, but it is not honoring fsGroup.

@RaunakShah
Contributor

@Anil-YadavK8s can you try doing this without CSI involved? Basically, exec into the Pod and try mounting manually.

@Anil-YadavK8s
Author

@RaunakShah Yes, with emptyDir/hostPath volumes, fsGroup is working.

@RaunakShah
Contributor

@Anil-YadavK8s Can you share all the YAML files (maybe as a GitHub gist)? We can try to reproduce this issue on a local setup and get back to you.

@dickeyf

dickeyf commented Oct 21, 2020

I also encountered this issue.

Here's the statefulset I used:
stateful-fsgroup.txt

And I ran a shell with kubectl exec -it sh and went to the mount path:
cd /usr/sw/adb
touch test (this fails)
mkdir test (this fails)

Provisioner used: csi.vsphere.vmware.com

(Note that I am not getting this issue with Portworx, Ceph, EBS, etc. when I apply the exact same StatefulSet YAML.)

It is expected that an unprivileged Pod running as a non-root UID can access/delete/create files and directories in the mounted PVCs when fsGroup is specified in the Pod's security context. Yet with "csi.vsphere.vmware.com" this is not the case.

@RaunakShah
Contributor

@dickeyf thanks for the YAML. Will get back to you shortly.

@RobbieJVMW

Seeing this too - sample code here:
https://github.com/RobbieJVMW/Kubernetes-PV-Test

@dickeyf

dickeyf commented Oct 22, 2020

Since 1.19 Kubernetes does a check here to verify whether a CSI Driver supports fsGroup:
https://github.com/kubernetes/kubernetes/blob/dd466bccde8176bd390fcf712c0752ae94444742/pkg/volume/csi/csi_mounter.go#L374

The field it checks ultimately comes from the CSIDriver object's spec (https://kubernetes-csi.github.io/docs/csi-driver-object.html, see the fsGroupPolicy field). However, the default value seems OK and, according to the source, retains the old behavior.

Could this be related to it? We were using K8S 1.19 when testing this.

@Anil-YadavK8s Do you remember what version you used?

@dickeyf

dickeyf commented Oct 22, 2020

This is the CSIDriver object we had when testing:

kind: CSIDriver
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: csi.vsphere.vmware.com
spec:
  attachRequired: true
  podInfoOnMount: false
  volumeLifecycleModes:
    - Persistent

I did confirm this version would have the proper defaulting:
https://github.com/kubernetes/kubernetes/blob/96c057ab48a367270bf6e34b8595809dc87f00da/pkg/apis/storage/v1beta1/defaults.go#L43

	if obj.Spec.FSGroupPolicy == nil && utilfeature.DefaultFeatureGate.Enabled(features.CSIVolumeFSGroupPolicy) {
		obj.Spec.FSGroupPolicy = new(storagev1beta1.FSGroupPolicy)
		*obj.Spec.FSGroupPolicy = storagev1beta1.ReadWriteOnceWithFSTypeFSGroupPolicy
	}
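
For completeness: on versions where the field is served (it was added in 1.19 behind the CSIVolumeFSGroupPolicy feature gate), the policy can also be set explicitly on the CSIDriver object instead of relying on defaulting. This is only a sketch — setting fsGroupPolicy to File is an assumption about how one would force the behavior, not something tested in this thread:

```yaml
apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: csi.vsphere.vmware.com
spec:
  attachRequired: true
  podInfoOnMount: false
  # File = kubelet always applies fsGroup ownership, regardless of fsType.
  # The default, ReadWriteOnceWithFSType, only applies it when an fsType is known.
  fsGroupPolicy: File
  volumeLifecycleModes:
    - Persistent
```

Note that ReadWriteOnceWithFSType (the defaulted value shown in the snippet above) only applies fsGroup when an fsType is present, which is exactly the code path being skipped here.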

@RaunakShah
Contributor

RaunakShah commented Oct 22, 2020

@dickeyf @RobbieJVMW @Anil-YadavK8s thanks for all the updates.

I was able to reproduce this issue locally. I used the sts yaml provided in #370 (comment) on a default storage class on my setup.

This is the original storage class spec i used:

root@k8s-master-18:~# kubectl describe sc example-vanilla-block-sc
Name:                  example-vanilla-block-sc
IsDefaultClass:        Yes
Annotations:           storageclass.kubernetes.io/is-default-class=true
Provisioner:           csi.vsphere.vmware.com
Parameters:            <none>
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

As you can see, the parameters section is empty.
Since no fsType is provided in the sc, the CSI driver applies a default fsType of ext4 and formats the filesystem. Ref code: https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/v2.0.0/pkg/csi/service/common/util.go#L112

However, from Kubelet's point of view no fsType is specified in the volume spec, so it skips applying the fsGroup (irrespective of k8s version)!
Here's a log:

Oct 22 10:42:51 k8s-node-0900 kubelet[16339]: I1022 10:42:51.396454   16339 csi_mounter.go:383] kubernetes.io/csi: mounter.SetupAt WARNING: skipping fsGroup, fsType not provided

And here's the reference code - https://github.com/kubernetes/kubernetes/blob/v1.18.9/pkg/volume/csi/csi_mounter.go#L383

I fixed the issue locally, by specifying the fsType in my storage class and retrying the original steps. Here's an example storage class, where you have to specify csi.storage.k8s.io/fstype: "ext4" as a parameter - https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/v2.0.0/example/vanilla-k8s-block-driver/example-sc.yaml#L11

Once I do that, fsGroup is applied by Kubelet!

Oct 22 12:13:41 k8s-node-0900 kubelet[16339]: I1022 12:13:41.793198   16339 csi_mounter.go:407] kubernetes.io/csi: mounter.SetupAt fsGroup [10001] applied successfully to 57022f15-7f65-45af-9ee4-97389764caca 

And I can touch files at the mount point:

root@k8s-master-18:~# kubectl exec test-pod-0 -it sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
/ $ cd /usr/sw/adb
/usr/sw/adb $ touch test
/usr/sw/adb $ mkdir file
/usr/sw/adb $ ls
file  test
/usr/sw/adb $ ls -al
total 12
drwxrwsr-x    3 root     10001         4096 Oct 22 19:15 .
drwxr-xr-x    6 root     root          4096 Oct 22 19:13 ..
drwxr-sr-x    2 10002    10001         4096 Oct 22 19:15 file
-rw-r--r--    1 10002    10001            0 Oct 22 19:15 test

Can you try applying csi.storage.k8s.io/fstype to your storage classes and let me know if that works?
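
For anyone following along, here is a minimal sketch of a StorageClass carrying that parameter. The name and default-class annotation mirror the storage class described above but are illustrative; adjust them for your setup:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-vanilla-block-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
parameters:
  # Without an explicit fstype, kubelet logs
  # "skipping fsGroup, fsType not provided" and never chowns the volume.
  csi.storage.k8s.io/fstype: "ext4"
reclaimPolicy: Delete
volumeBindingMode: Immediate
```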

@RaunakShah
Contributor

Here's another link that's informative - https://github.com/kubernetes-csi/external-provisioner/blob/8b0707649212d770624008edbd127f312121aff9/cmd/csi-provisioner/csi-provisioner.go#L77

If the external-provisioner fsType isn't set, and the SC fsType isn't set, then none is assumed.
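
In deployment terms, that flag lands on the csi-provisioner sidecar container in the CSI controller manifest. A sketch of the relevant container spec — the image tag, neighboring args, and socket path are illustrative; --default-fstype is the flag defined at the link above:

```yaml
- name: csi-provisioner
  image: quay.io/k8scsi/csi-provisioner:v2.0.0
  args:
    - "--csi-address=$(ADDRESS)"
    # Fallback fsType used when the StorageClass omits csi.storage.k8s.io/fstype,
    # so kubelet still sees an fsType and applies fsGroup.
    - "--default-fstype=ext4"
  env:
    - name: ADDRESS
      value: /csi/csi.sock
```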

@dickeyf

dickeyf commented Oct 22, 2020

Will try that, thanks a lot! I believe this was our issue all along.

If external-provisioner fsType isn't set, and SC fsType isn't set, then none is assumed.

So if either one is set it would fix the issue?

@RaunakShah
Contributor

RaunakShah commented Oct 22, 2020

Yes, it would. Setting it on the external-provisioner helps avoid setting it on each individual storage class. Setting it on a storage class supersedes whatever is set on the external-provisioner. Note that you need to be using external-provisioner v2.0.0 for this feature.

I've verified both options, they work. Can you give it a try on your setup too and let me know how it goes?

@RaunakShah
Contributor

@dickeyf @Anil-YadavK8s can you guys also tell me what external-provisioner version you are using?

@RobbieJVMW

I've tested this against my vSAN lab on TKG 1.2 and it's working successfully when you provide fsType. @dickeyf this should fix the issue we have been having.

@RobbieJVMW

If you don't provide this param in the storageclass definition, should we be generating a 'permissions' error from inside the container, or surfacing the issue before we reach that point? Or maybe simply highlighting it in the docs?
I'm asking what the desired behaviour should be here.

@RobbieJVMW

/bug

@RaunakShah
Contributor

Updated the YAMLs with default fs type for now, which is the short term fix suggested by the community.
/close

@k8s-ci-robot
Contributor

@RaunakShah: Closing this issue.

In response to this:

Updated the YAMLs with default fs type for now, which is the short term fix suggested by the community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kingnarmer

I am having the same issue, without any luck applying the workaround.
Still getting the error after adding the parameter csi.storage.k8s.io/fstype=ext4 and adding a security context on the StatefulSet side.
This is a new install with driver 2.1 on Rancher 2.5.7 running Kubernetes 1.20.5.

Appreciate feedback.
Thanks

@RaunakShah
Contributor

@true64gurus can you paste the sts yaml that you are attempting to deploy?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 24, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 23, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
