rbd NodeUnpublishVolume: does not detach #86

Closed
pohly opened this issue Oct 10, 2018 · 15 comments

@pohly
Contributor

pohly commented Oct 10, 2018

I'm running on a Kubernetes 1.11 cluster, Linux kernel 4.18.

While running the E2E test (= create PVC, use it first in one pod, then another) I noticed that the second pod got stuck in NodePublishVolume because the image was still in use (= had watchers) by the node on which the first pod ran.

After adding a bit more debug output I traced it down to:

	devicePath, cnt, err := mount.GetDeviceNameFromMount(mounter, targetPath)
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}

	// Unmounting the image
	err = mounter.Unmount(targetPath)
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}

	cnt--
	if cnt != 0 {
		return &csi.NodeUnpublishVolumeResponse{}, nil
	}

In my case, cnt is 1 in that if check and thus the image remains attached to /dev/rbd0.

What exactly is the logic behind this if check? NodePublishVolume always attaches the image, doesn't it? IMHO NodeUnpublishVolume then must also always detach, because once it returns a success status without detaching, there won't be another NodeUnpublishVolume call and the image will remain attached forever.

I've tentatively commented out the if check and now the test passes, i.e. detachRBDDevice succeeds and the image can be used in the second pod.

Note that this only seems to be a problem when the second pod gets scheduled to a different node than the first one. When trying this out manually, I could see that /dev/rbd0 was still present after deleting the first pod. When starting the same pod again on the same node, /dev/rbd0 got mounted and the pod started fine.

@rootfs
Member

rootfs commented Oct 10, 2018

If cnt != 0 then there is another mount point for this rbd device, most likely used by other pods. In that case we cannot detach the device (that would cause data loss). Only when the last mount point is removed can we safely detach it.

Do you see what the other mountpoint is when cnt > 0?
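
For illustration, a minimal sketch of that intended flow; this is not necessarily the driver's exact code, and the assumption here is that detachRBDDevice (mentioned above) takes the device path:

	// Sketch: unmount the target path, then unmap the rbd device only when
	// no other mount point on this node still references it.
	devicePath, cnt, err := mount.GetDeviceNameFromMount(mounter, targetPath)
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	if err := mounter.Unmount(targetPath); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	cnt-- // account for the mount we just removed
	if cnt > 0 {
		// another mount point still uses the device; keep it attached
		return &csi.NodeUnpublishVolumeResponse{}, nil
	}
	// this was the last mount point: safe to unmap /dev/rbdX now
	if err := detachRBDDevice(devicePath); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	return &csi.NodeUnpublishVolumeResponse{}, nil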

@pohly
Contributor Author

pohly commented Oct 10, 2018 via email

@rootfs
Member

rootfs commented Oct 10, 2018

can you add a debug message around here to print out the target path and device when there is a refcount?
https://github.com/ceph/ceph-csi/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount.go#L239

@rootfs
Member

rootfs commented Oct 10, 2018

Is that just an e2e problem, or do you see the same issue in a manual test? I couldn't reproduce it in my local test.

@pohly
Contributor Author

pohly commented Oct 10, 2018 via email

@rootfs
Member

rootfs commented Oct 10, 2018

Mountpoint cnt > 0 only happens on the same node; this is different from rbd status, which detects multiple attachers.

@pohly
Contributor Author

pohly commented Oct 11, 2018

Here's the debug output from GetDeviceNameFromMount:

oim-ceph-rbd-xzk2m/rbdplugin: refCount 0, mps {Device:/dev/rbd0 Path:/var/lib/kubelet/pods/362c59f6-cd1d-11e8-a828-deadbeef0100/volumes/kubernetes.io~csi/pvc-27c2c568-cd1d-11e8-a828-deadbeef0100/mount Type:ext4 Opts:[rw relatime stripe=1024] Freq:0 Pass:0} matches
oim-ceph-rbd-xzk2m/rbdplugin: refCount 1, mps {Device:/dev/rbd0 Path:/host/var/lib/kubelet/pods/362c59f6-cd1d-11e8-a828-deadbeef0100/volumes/kubernetes.io~csi/pvc-27c2c568-cd1d-11e8-a828-deadbeef0100/mount Type:ext4 Opts:[rw relatime stripe=1024] Freq:0 Pass:0} matches

It's basically the same mount, just appearing under different paths inside the container, because both host and pods-mount-dir bind-mount the same directory:

- name: pods-mount-dir
  mountPath: /var/lib/kubelet/pods
  mountPropagation: "Bidirectional"
- mountPath: /dev
  name: host-dev
- mountPath: /host
  name: host-rootfs
- mountPath: /sys

I'm still not sure whether it works for you. Have you tried this on a multi-node cluster and then used the same PV first on one node, then on another?

I think this confusion is a result of doing the attach and mount both inside NodePublishVolume. It might be cleaner to do the attach+mkfs in NodeStageVolume and only bind-mount in NodePublishVolume.
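
For illustration, a rough sketch of that split. The nodeServer type and the helpers attachRBDImage and formatAndMount are assumptions, not the driver's actual code; only the CSI call shape is meant to be accurate:

	// NodeStageVolume: attach the rbd image once per node, create the
	// filesystem if needed, and mount it at the staging path.
	func (ns *nodeServer) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
		devicePath, err := attachRBDImage(req.GetVolumeId()) // maps the image to /dev/rbdX
		if err != nil {
			return nil, status.Error(codes.Internal, err.Error())
		}
		if err := formatAndMount(devicePath, req.GetStagingTargetPath()); err != nil {
			return nil, status.Error(codes.Internal, err.Error())
		}
		return &csi.NodeStageVolumeResponse{}, nil
	}

	// NodePublishVolume: only bind-mount the staged filesystem into the
	// pod's target path; no attach happens here.
	func (ns *nodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
		if err := ns.mounter.Mount(req.GetStagingTargetPath(), req.GetTargetPath(), "", []string{"bind"}); err != nil {
			return nil, status.Error(codes.Internal, err.Error())
		}
		return &csi.NodePublishVolumeResponse{}, nil
	}

With that split, NodeUnstageVolume becomes the natural place to detach the device, after the last NodeUnpublishVolume has removed its bind mount.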

@rootfs
Member

rootfs commented Oct 11, 2018

how did you run your kubelet? inside a container?

@rootfs
Member

rootfs commented Oct 11, 2018

I only have one mountpoint. Can you show your output using the following command?

# kubectl exec -ti csi-rbdplugin-4xpl7 -c csi-rbdplugin -- sh -c "grep rbd0 /proc/1/mountinfo"
885 1543 251:0 / /var/lib/kubelet/pods/1cf68e71-cd4f-11e8-a50e-00259003b6e8/volumes/kubernetes.io~csi/pvc-1cf554bccd4f11e8/mount rw,relatime shared:163 - ext4 /dev/rbd0 rw,stripe=1024,data=ordered

@rootfs
Member

rootfs commented Oct 11, 2018

In any case, the code looks buggy to me, for two reasons:

mounter

The mounter shouldn't blindly be mount.New("") here.

kubelet starts up in a more robust way:

        mounter := mount.New(s.ExperimentalMounterPath)
        var pluginRunner = exec.New()
        if s.Containerized {
                glog.V(2).Info("Running kubelet in containerized mode")
                ne, err := nsenter.NewNsenter(nsenter.DefaultHostRootFsPath, exec.New())
                if err != nil {
                        return nil, err
                }
                mounter = mount.NewNsenterMounter(s.RootDirectory, ne)
                // an exec interface which can use nsenter for flex plugin calls
                pluginRunner = nsenter.NewNsenterExecutor(nsenter.DefaultHostRootFsPath, exec.New())
        }

host path

The host path should be bind-mounted to the container's /rootfs instead of just /host; that is the default path nsenter uses to list the host's mountpoints. https://github.com/kubernetes/kubernetes/blob/master/pkg/util/nsenter/nsenter.go#L37
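
For the plugin side, a minimal sketch of the same idea; the containerized flag, the rootDir parameter, and the newMounter helper are assumptions rather than the driver's actual code, and the call signatures follow the kubelet snippet quoted above:

	// newMounter picks the nsenter-based mounter when the plugin runs inside
	// a container with the host root bind-mounted at
	// nsenter.DefaultHostRootFsPath (/rootfs), so mount operations are
	// resolved against the host's mount table; otherwise it falls back to a
	// plain mounter.
	func newMounter(containerized bool, rootDir string) (mount.Interface, error) {
		if !containerized {
			return mount.New(""), nil
		}
		ne, err := nsenter.NewNsenter(nsenter.DefaultHostRootFsPath, exec.New())
		if err != nil {
			return nil, err
		}
		return mount.NewNsenterMounter(rootDir, ne), nil
	}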

@pohly
Contributor Author

pohly commented Oct 11, 2018 via email

@pohly
Contributor Author

pohly commented Oct 11, 2018 via email

@pohly
Contributor Author

pohly commented Oct 17, 2018

I know, a bit late, but I can also confirm that this fixed the issue :-) Any chance of publishing a new version? Or has the v0.3.0 image been updated?

I'm still wondering whether it wouldn't be better to use NodeStageVolume.

@rootfs
Member

rootfs commented Oct 17, 2018

Yes, the new container image is published, please try it out, thanks!

@pohly
Contributor Author

pohly commented Oct 17, 2018 via email

openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ceph-csi that referenced this issue Jun 20, 2022
Bug 2060720: rbd: return unimplemented error for block-mode reclaimspace req