rbd NodeUnpublishVolume: does not detach #86

Closed
pohly opened this issue Oct 10, 2018 · 15 comments

@pohly
Contributor

pohly commented Oct 10, 2018

I'm running on a Kubernetes 1.11 cluster, Linux kernel 4.18.

While running the E2E test (= create PVC, use it first in one pod, then another) I noticed that the second pod got stuck in NodePublishVolume because the image was still in use (= had watchers) by the node on which the first pod ran.

After adding a bit more debug output I traced it down to:

	devicePath, cnt, err := mount.GetDeviceNameFromMount(mounter, targetPath)
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}

	// Unmounting the image
	err = mounter.Unmount(targetPath)
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}

	cnt--
	if cnt != 0 {
		return &csi.NodeUnpublishVolumeResponse{}, nil
	}

In my case, cnt is 1 in that if check and thus the image remains attached to /dev/rbd0.

What exactly is the logic behind this if check? NodePublishVolume always attaches the image, doesn't it? IMHO NodeUnpublishVolume then must also always detach, because once it returns a success status without detaching, there won't be another NodeUnpublishVolume call and the image will remain attached forever.

I've tentatively commented out the if check and now the test passes, i.e. detachRBDDevice succeeds and the image can be used in the second pod.

Note that this only seems to be a problem when the second pod gets scheduled to a different node than the first one. When trying this out manually, I could see that /dev/rbd0 was still present after deleting the first pod. When starting the same pod again on the same node, /dev/rbd0 got mounted and the pod started fine.

@rootfs
Member

rootfs commented Oct 10, 2018

If cnt != 0 then there is another mount point for this rbd device, most likely used by other pods. In that case we cannot detach the device (that would cause data loss). Only when the last mount point is removed can we safely detach it.

Do you see what the other mountpoint is when cnt > 0?
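
For illustration, a minimal sketch of that intended flow; this is not necessarily the driver's exact code, and the assumption here is that detachRBDDevice (mentioned above) takes the device path:

	// Sketch: unmount the target path, then unmap the rbd device only when
	// no other mount point on this node still references it.
	devicePath, cnt, err := mount.GetDeviceNameFromMount(mounter, targetPath)
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	if err := mounter.Unmount(targetPath); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	cnt-- // account for the mount we just removed
	if cnt > 0 {
		// another mount point still uses the device; keep it attached
		return &csi.NodeUnpublishVolumeResponse{}, nil
	}
	// this was the last mount point: safe to unmap /dev/rbdX now
	if err := detachRBDDevice(devicePath); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	return &csi.NodeUnpublishVolumeResponse{}, nil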

@pohly
Contributor Author

pohly commented Oct 10, 2018 via email

@rootfs
Member

rootfs commented Oct 10, 2018

can you add a debug message around here to print out the target path and device when there is a refcount?
https://github.com/ceph/ceph-csi/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount.go#L239

@rootfs
Member

rootfs commented Oct 10, 2018

Is that just an e2e problem, or do you see the same issue in a manual test? I couldn't reproduce it in my local test.

@pohly
Contributor Author

pohly commented Oct 10, 2018 via email

@rootfs
Member

rootfs commented Oct 10, 2018

Mountpoint cnt > 0 only happens on the same node; this is different from rbd status, which detects multiple attachers.

@pohly
Contributor Author

pohly commented Oct 11, 2018

Here's the debug output from GetDeviceNameFromMount:

oim-ceph-rbd-xzk2m/rbdplugin: refCount 0, mps {Device:/dev/rbd0 Path:/var/lib/kubelet/pods/362c59f6-cd1d-11e8-a828-deadbeef0100/volumes/kubernetes.io~csi/pvc-27c2c568-cd1d-11e8-a828-deadbeef0100/mount Type:ext4 Opts:[rw relatime stripe=1024] Freq:0 Pass:0} matches
oim-ceph-rbd-xzk2m/rbdplugin: refCount 1, mps {Device:/dev/rbd0 Path:/host/var/lib/kubelet/pods/362c59f6-cd1d-11e8-a828-deadbeef0100/volumes/kubernetes.io~csi/pvc-27c2c568-cd1d-11e8-a828-deadbeef0100/mount Type:ext4 Opts:[rw relatime stripe=1024] Freq:0 Pass:0} matches

It's basically the same mount, just appearing under different paths inside the container, because both host and pods-mount-dir bind-mount the same directory:

- name: pods-mount-dir
  mountPath: /var/lib/kubelet/pods
  mountPropagation: "Bidirectional"
- mountPath: /dev
  name: host-dev
- mountPath: /host
  name: host-rootfs
- mountPath: /sys

I'm still not sure whether it works for you. Have you tried this on a multi-node cluster and then used the same PV first on one node, then on another?

I think this confusion is a result of doing the attach and mount both inside NodePublishVolume. It might be cleaner to do the attach+mkfs in NodeStageVolume and only bind-mount in NodePublishVolume.
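
For illustration, a rough sketch of that split. The nodeServer type and the helpers attachRBDImage and formatAndMount are assumptions, not the driver's actual code; only the CSI call shape is meant to be accurate:

	// NodeStageVolume: attach the rbd image once per node, create the
	// filesystem if needed, and mount it at the staging path.
	func (ns *nodeServer) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
		devicePath, err := attachRBDImage(req.GetVolumeId()) // maps the image to /dev/rbdX
		if err != nil {
			return nil, status.Error(codes.Internal, err.Error())
		}
		if err := formatAndMount(devicePath, req.GetStagingTargetPath()); err != nil {
			return nil, status.Error(codes.Internal, err.Error())
		}
		return &csi.NodeStageVolumeResponse{}, nil
	}

	// NodePublishVolume: only bind-mount the staged filesystem into the
	// pod's target path; no attach happens here.
	func (ns *nodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
		if err := ns.mounter.Mount(req.GetStagingTargetPath(), req.GetTargetPath(), "", []string{"bind"}); err != nil {
			return nil, status.Error(codes.Internal, err.Error())
		}
		return &csi.NodePublishVolumeResponse{}, nil
	}

With that split, NodeUnstageVolume becomes the natural place to detach the device, after the last NodeUnpublishVolume has removed its bind mount.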

@rootfs
Member

rootfs commented Oct 11, 2018

how did you run your kubelet? inside a container?

@rootfs
Member

rootfs commented Oct 11, 2018

I only have one mountpoint. Can you show your output using the following command?

# kubectl exec -ti csi-rbdplugin-4xpl7 -c csi-rbdplugin -- sh -c "grep rbd0 /proc/1/mountinfo"
885 1543 251:0 / /var/lib/kubelet/pods/1cf68e71-cd4f-11e8-a50e-00259003b6e8/volumes/kubernetes.io~csi/pvc-1cf554bccd4f11e8/mount rw,relatime shared:163 - ext4 /dev/rbd0 rw,stripe=1024,data=ordered

@rootfs
Member

rootfs commented Oct 11, 2018

In any case, the code looks buggy to me, for two reasons:

mounter

The mounter shouldn't blindly be mount.New("") here.

kubelet starts up in a more robust way:

        mounter := mount.New(s.ExperimentalMounterPath)
        var pluginRunner = exec.New()
        if s.Containerized {
                glog.V(2).Info("Running kubelet in containerized mode")
                ne, err := nsenter.NewNsenter(nsenter.DefaultHostRootFsPath, exec.New())
                if err != nil {
                        return nil, err
                }
                mounter = mount.NewNsenterMounter(s.RootDirectory, ne)
                // an exec interface which can use nsenter for flex plugin calls
                pluginRunner = nsenter.NewNsenterExecutor(nsenter.DefaultHostRootFsPath, exec.New())
        }

host path

The host path should be bind-mounted to the container's /rootfs instead of just /host; that is the default path nsenter uses to list the host's mountpoints. https://github.com/kubernetes/kubernetes/blob/master/pkg/util/nsenter/nsenter.go#L37
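
For the plugin side, a minimal sketch of the same idea; the containerized flag, the rootDir parameter, and the newMounter helper are assumptions rather than the driver's actual code, and the call signatures follow the kubelet snippet quoted above:

	// newMounter picks the nsenter-based mounter when the plugin runs inside
	// a container with the host root bind-mounted at
	// nsenter.DefaultHostRootFsPath (/rootfs), so mount operations are
	// resolved against the host's mount table; otherwise it falls back to a
	// plain mounter.
	func newMounter(containerized bool, rootDir string) (mount.Interface, error) {
		if !containerized {
			return mount.New(""), nil
		}
		ne, err := nsenter.NewNsenter(nsenter.DefaultHostRootFsPath, exec.New())
		if err != nil {
			return nil, err
		}
		return mount.NewNsenterMounter(rootDir, ne), nil
	}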

@pohly
Contributor Author

pohly commented Oct 11, 2018 via email

@pohly
Contributor Author

pohly commented Oct 11, 2018 via email

@pohly
Contributor Author

pohly commented Oct 17, 2018

I know, a bit late, but I can also confirm that this fixed the issue :-) Any chance of publishing a new version? Or has the v0.3.0 image been updated?

I'm still wondering whether it wouldn't be better to use NodeStageVolume.

@rootfs
Member

rootfs commented Oct 17, 2018

Yes, the new container image is published, please try it out, thanks!

@pohly
Contributor Author

pohly commented Oct 17, 2018 via email

openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ceph-csi that referenced this issue Jun 20, 2022
Bug 2060720: rbd: return unimplemented error for block-mode reclaimspace req