
Mounting static provisioned PV with nfsvers in mount options causes mount to fail #1692

Closed
cailyoung opened this issue Feb 12, 2024 · 10 comments

@cailyoung

What happened:
After an AKS-initiated upgrade of the driver from 1.29.1 to 1.29.2, we restarted a pod that mounts a premium NFS file share with these options:

  mountOptions:                                                                                                                       
  - nfsvers=4.1                                                                                                                       
  - nconnect=4                                                                                                                        
  - lookupcache=positive

The pod stayed stuck in Pending/Init with these errors:

MountVolume.MountDevice failed for volume "redacted" : rpc error: code = Internal desc = volume(redacted-prod#redacted#redacted###redacted) mount redacted.file.core.windows.net:/redacted/redacted on /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/f2aabae68a23cc910581f5b902983e3790adc3dadd5a621efae2b0f45bbbefef/globalmount failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o lookupcache=positive,nconnect=4,nfsvers=4.1,vers=4,minorversion=1,sec=sys,noresvport,actimeo=30 redacted.file.core.windows.net:/redacted/redacted /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/f2aabae68a23cc910581f5b902983e3790adc3dadd5a621efae2b0f45bbbefef/globalmount
Output: mount.nfs: multiple version options not permitted

What you expected to happen:
The volume should mount as it did with the older CSI image version.

How to reproduce it:

  • Create Premium file share
  • Create custom storage class without mount options, binding mode immediate
  • Create static persistent volume referring to the existing file share, mount options as above (see the PV/PVC sketch after this list)
  • Create PVC with ReadWriteMany referring to the static PV
  • Launch a deployment mounting the PVC
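
A minimal sketch of the static PV and PVC from the steps above, assuming the azurefile-csi-driver static provisioning layout; the names, share, volumeHandle, and storage size are placeholders, and only the mountOptions block matters for reproducing the error:

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: pv-azurefile-nfs              # placeholder name
  spec:
    capacity:
      storage: 100Gi                    # placeholder size
    accessModes:
      - ReadWriteMany
    persistentVolumeReclaimPolicy: Retain
    storageClassName: my-azurefile-nfs  # the custom class from step 2 (no mountOptions)
    mountOptions:                       # same options as in the report above
      - nfsvers=4.1                     # conflicts with the driver's default vers=4,minorversion=1
      - nconnect=4
      - lookupcache=positive
    csi:
      driver: file.csi.azure.com
      volumeHandle: unique-volume-id    # placeholder; must be unique in the cluster
      volumeAttributes:
        resourceGroup: my-resource-group   # placeholder
        storageAccount: mystorageaccount   # placeholder
        shareName: myshare                 # placeholder
        protocol: nfs
  ---
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: pvc-azurefile-nfs
  spec:
    accessModes:
      - ReadWriteMany
    storageClassName: my-azurefile-nfs
    volumeName: pv-azurefile-nfs
    resources:
      requests:
        storage: 100Gi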

Anything else we need to know?:

Environment:

  • CSI Driver version: 1.29.1
  • Kubernetes version (use kubectl version): 1.28.3
  • OS (e.g. from /etc/os-release): Ubuntu 22.04 in AKS
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
JoeyC-Dev commented Feb 12, 2024

Disclaimer: I am not a k8s developer.

This issue is happening because of #1594.

I tried using vers=4 instead of nfsvers=4.1; it did not overwrite the default vers setting and produced the same error. The resulting mount arguments include something like ...,vers=4,vers=4,... (duplicated arguments with the same value).

Affected versions: v1.28.7 on AKS 1.26 (Kubernetes v1.26.10) and AKS 1.27; v1.29.2 on AKS 1.28.

Workarounds:

  1. Since Azure file shares only support NFS v4.1, you can simply drop the "vers" / "nfsvers" mount option.
    Sources:
    a. "NFS file shares currently only support most features from the 4.1 protocol specification": https://learn.microsoft.com/en-us/azure/storage/files/files-nfs-protocol
    b. "Azure Files only supports NFS v4.1": https://learn.microsoft.com/en-us/azure/storage/files/storage-files-how-to-mount-nfs-shares?tabs=portal#mount-options

  2. Pin the driver image and redeploy your Pod quickly:
    a. Patch the image version: kubectl patch daemonset csi-azurefile-node -n kube-system -p '{"spec":{"template":{"spec":{"containers":[{"name":"azurefile","image":"mcr.microsoft.com/oss/kubernetes-csi/azurefile-csi:v1.29.1"}]}}}}'
    b. Run the patch command from step (a) in a bash loop every 3 seconds (a sketch follows this list), because the AKS control plane rolls the image back very quickly
    c. Redeploy your Pod while the patched image is still in place
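
One possible shape of the loop described in step 2(b), reusing the exact patch command from step 2(a):

  # keep re-applying the pinned image every 3 seconds while you redeploy your Pod
  while true; do
    kubectl patch daemonset csi-azurefile-node -n kube-system \
      -p '{"spec":{"template":{"spec":{"containers":[{"name":"azurefile","image":"mcr.microsoft.com/oss/kubernetes-csi/azurefile-csi:v1.29.1"}]}}}}'
    sleep 3
  done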

Additional: Error log

kind: Event
apiVersion: v1
metadata:
  name: statefulset-azurefile-3-0.17b3107e8b4aa08b
  namespace: default
  uid: cca48778-fa71-40cc-8f26-df8bc29ee7da
  resourceVersion: '20219'
  creationTimestamp: '2024-02-12T08:33:39Z'
  managedFields:
    - manager: kubelet
      operation: Update
      apiVersion: v1
      time: '2024-02-12T08:33:55Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:count: {}
        f:firstTimestamp: {}
        f:involvedObject: {}
        f:lastTimestamp: {}
        f:message: {}
        f:reason: {}
        f:reportingComponent: {}
        f:reportingInstance: {}
        f:source:
          f:component: {}
          f:host: {}
        f:type: {}
involvedObject:
  kind: Pod
  namespace: default
  name: statefulset-azurefile-3-0
  uid: a8f404f0-afb6-4ae2-8e03-d55cc8b06d36
  apiVersion: v1
  resourceVersion: '20129'
reason: FailedMount
message: >-
  MountVolume.MountDevice failed for volume
  "pvc-8231b898-f6de-4794-9b5a-34736f2e5799" : rpc error: code = Internal desc =
  volume(joeylab_csibug#csibugpremiumfs#pvcn-8231b898-f6de-4794-9b5a-34736f2e5799###default)
  mount
  csibugpremiumfs.file.core.windows.net:/csibugpremiumfs/pvcn-8231b898-f6de-4794-9b5a-34736f2e5799
  on
  /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/b058d8a66ec7830e101daa58feac721f184cf783d7e4b4dc3cf6c480c9040eb7/globalmount
  failed with mount failed: exit status 32

  Mounting command: mount

  Mounting arguments: -t nfs -o
  lookupcache=positive,nconnect=4,vers=4,vers=4,minorversion=1,sec=sys
  csibugpremiumfs.file.core.windows.net:/csibugpremiumfs/pvcn-8231b898-f6de-4794-9b5a-34736f2e5799
  /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/b058d8a66ec7830e101daa58feac721f184cf783d7e4b4dc3cf6c480c9040eb7/globalmount

  Output: mount.nfs: multiple version options not permitted


  Please refer to http://aka.ms/filemounterror for possible causes and solutions
  for mount errors.
source:
  component: kubelet
  host: aks-agentpool-27411117-vmss000001
firstTimestamp: '2024-02-12T08:33:39Z'
lastTimestamp: '2024-02-12T08:33:55Z'
count: 6
type: Warning
eventTime: null
reportingComponent: kubelet
reportingInstance: aks-agentpool-27411117-vmss000001

andyzhangx (Member) commented

@cailyoung the azure file driver already appends the vers=4,minorversion=1 mount options by default, so it's no longer necessary to add nfsvers=4.1. Could you remove that mount option from the PV and trigger a remount?
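
For reference, a sketch of the PV mountOptions from the original report with nfsvers dropped; the other options are unchanged, since the driver appends the version options itself:

  mountOptions:
  - nconnect=4
  - lookupcache=positive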

JoeyC-Dev commented Feb 12, 2024

@andyzhangx But even when set to vers=4, it still did not overwrite the default option.
As documented in https://learn.microsoft.com/en-us/azure/aks/azure-files-csi#create-nfs-file-share-storage-class:

Create a file named nfs-sc.yaml and copy the manifest below. For a list of supported mountOptions, see NFS mount options

The link at the end of that sentence points to "Mount options", which leads users to believe that vers is something they can set. Does the current behavior, where setting vers produces an error, make sense? The current documentation makes vers look like something configurable.

andyzhangx (Member) commented

@JoeyC-Dev vers is not configurable in the azure file storage class; it's a fixed setting that the azure file driver already applies.

JoeyC-Dev commented Feb 12, 2024

mountOptions = util.JoinMountOptions(mountFlags, []string{"vers=4,minorversion=1,sec=sys"})

Understood, so this is by design. Thanks.

cailyoung (Author) commented

@andyzhangx we removed the nfsvers mount option from the PV and this is resolved for us. The docs update that @JoeyC-Dev has suggested will help others in case they run into this.

Some background: We are only in this position because this was originally a simple nfs volume, which we then migrated to CSI in order to use storage expansion. So it's unlikely that many customers will run into it, but it's worth documenting anyway.

JoeyC-Dev commented Feb 13, 2024

@cailyoung The AKS content developers are on a different team; the relevant team will look into this. I have submitted a PR (MicrosoftDocs/azure-docs#119780) and I believe the reviewer will see it soon.

andyzhangx (Member) commented

@JoeyC-Dev can you open a doc PR directly against https://github.com/MicrosoftDocs/azure-docs-pr/blob/main/articles/aks/azure-files-csi.md? Then I could approve that PR, thanks.

JoeyC-Dev commented

@andyzhangx I am not a MSFT FTE, so I have no access. But based on past experience, @MGoedtel is a very diligent and good person, so I believe this PR will be merged soon.

andyzhangx (Member) commented

thanks folks!
