rbd node service: flatten image when it has references to parent #1543

Closed
pkalever opened this issue Sep 29, 2020 · 11 comments
Labels: component/rbd (Issues related to RBD), wontfix (This will not be worked on)

@pkalever

Describe the bug

Currently, as part of the node service, we add an rbd flatten task for newly created PVCs. Ideally, a flatten task should be added only for snapshot-backed or cloned PVCs, and only when required.


[0] pkalever 😎 rbd✨ kubectl describe pods csi-rbd-demo-pod
[...]
Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               10m                  default-scheduler        Successfully assigned default/csi-rbd-demo-pod to minikube
  Normal   SuccessfulAttachVolume  10m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-cec90cb9-8273-4eb1-b2fa-32d3206a1f7d"
  Warning  FailedMount             113s (x12 over 10m)  kubelet, minikube        MountVolume.MountDevice failed for volume "pvc-cec90cb9-8273-4eb1-b2fa-32d3206a1f7d" : rpc error: code = Internal desc = an error (exit status 2) occurred while running ceph args: [rbd task add flatten rbd-pool/csi-vol-1dae0d96-0238-11eb-93fe-0242ac110004 --id admin --keyfile=***stripped*** -m 192.168.121.136]
  Warning  FailedMount             93s (x4 over 8m21s)  kubelet, minikube        Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc default-token-2xr4n]: timed out waiting for the condition
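
For reference, a quick way to confirm whether a given image actually has a parent, and therefore would ever need a flatten task, is to inspect it directly. A minimal sketch using the pool/image name from the error above, assuming admin credentials are already configured and jq is available (recent rbd releases only include a parent field in the JSON output when the image is a clone):

# Prints the parent of the image if it is a clone; for a freshly provisioned
# PVC-backed image this prints null, so no flatten task is needed.
rbd info rbd-pool/csi-vol-1dae0d96-0238-11eb-93fe-0242ac110004 --format json | jq '.parent'

# The PARENT column of `rbd ls -l` is likewise empty for non-cloned images.
rbd ls -l rbd-pool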

Environment details

[root@minikube /]# cephcsi --version
Cephcsi Version: canary
Git Commit: fd4328cd5333f4275be52c604c30801fc612fa75
Go Version: go1.15
Compiler: gc
Platform: linux/amd64
Kernel: 4.19.114
[root@minikube /]# 

[root@ceph-node1 ~]# rados lspools
rbd-pool
cephfs-datapool
cephfs-metapool
[root@ceph-node1 ~]# rbd ls -l rbd-pool
NAME                                         SIZE  PARENT FMT PROT LOCK 
csi-vol-e15f6413-023f-11eb-93fe-0242ac110004 1 GiB          2           
image1                                       1 GiB          2           
[root@ceph-node1 ~]# rbd info rbd-pool/csi-vol-e15f6413-023f-11eb-93fe-0242ac110004
rbd image 'csi-vol-e15f6413-023f-11eb-93fe-0242ac110004':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 10d75105603c
        block_name_prefix: rbd_data.10d75105603c
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features: 
        flags: 
        create_timestamp: Tue Sep 29 10:38:15 2020
        access_timestamp: Tue Sep 29 10:38:15 2020
        modify_timestamp: Tue Sep 29 10:38:15 2020
[root@ceph-node1 ~]# 

Steps to reproduce

[0] pkalever 😎 rbd✨ git diff 
index d4305b58e..2bab278f0 100644
--- a/examples/rbd/storageclass.yaml
+++ b/examples/rbd/storageclass.yaml
[...]
    # (optional) RBD image features, CSI creates image with image-format 2
    # CSI RBD currently supports only `layering` feature.
-   imageFeatures: layering
+   # imageFeatures: layering


[0] pkalever 😎 rbd✨ kubectl create -f storageclass.yaml
storageclass.storage.k8s.io/csi-rbd-sc created

[0] pkalever 😎 rbd✨ kubectl create -f pvc.yaml
persistentvolumeclaim/rbd-pvc created

[0] pkalever 😎 rbd✨ kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
rbd-pvc   Bound    pvc-cec90cb9-8273-4eb1-b2fa-32d3206a1f7d   1Gi        RWO            csi-rbd-sc     5s

[0] pkalever 😎 rbd✨ kubectl create -f pod.yaml 
pod/csi-rbd-demo-pod created                     

[0] pkalever 😎 rbd✨ kubectl get pods
NAME                                         READY   STATUS              RESTARTS   AGE
csi-rbd-demo-pod                             0/1     ContainerCreating   0          10m
csi-rbdplugin-hbpq5                          3/3     Running             0          17m
csi-rbdplugin-provisioner-75485f85db-5frvk   6/6     Running             0          17m
csi-rbdplugin-provisioner-75485f85db-sfkl4   6/6     Running             0          17m
csi-rbdplugin-provisioner-75485f85db-zwsfp   6/6     Running             0          17m
vault-867cf4b4d4-qqdt2                       1/1     Running             0          17m
vault-init-job-96d9q                         0/1     Completed           0          17m


[0] pkalever 😎 rbd✨ kubectl describe pods csi-rbd-demo-pod
[...]
Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               10m                  default-scheduler        Successfully assigned default/csi-rbd-demo-pod to minikube
  Normal   SuccessfulAttachVolume  10m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-cec90cb9-8273-4eb1-b2fa-32d3206a1f7d"
  Warning  FailedMount             113s (x12 over 10m)  kubelet, minikube        MountVolume.MountDevice failed for volume "pvc-cec90cb9-8273-4eb1-b2fa-32d3206a1f7d" : rpc error: code = Internal desc = an error (exit status 2) occurred while running ceph args: [rbd task add flatten rbd-pool/csi-vol-1dae0d96-0238-11eb-93fe-0242ac110004 --id admin --keyfile=***stripped*** -m 192.168.121.136]
  Warning  FailedMount             93s (x4 over 8m21s)  kubelet, minikube        Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc default-token-2xr4n]: timed out waiting for the condition

Actual results

A flatten task is added for a newly created PVC.

Expected behavior

No flatten task should be added for a new PVC that has no parent.

pkalever pushed a commit to pkalever/ceph-csi that referenced this issue Sep 29, 2020
Currently as part of node service we add rbd flatten task for new PVC
creates. Ideally we should add flatten task only for snapshots/cloned
PVCs as required.

Fixes: ceph#1543
Signed-off-by: Prasanna Kumar Kalever <[email protected]>
pkalever pushed a commit to pkalever/ceph-csi that referenced this issue Oct 1, 2020
Currently as part of node service we add rbd flatten task for new PVC
creates. Ideally we should add flatten task only for snapshots/cloned
PVCs as required.

Fixes: ceph#1543
Signed-off-by: Prasanna Kumar Kalever <[email protected]>
@nixpanic added the component/rbd (Issues related to RBD) label Oct 2, 2020
pkalever pushed a commit to pkalever/ceph-csi that referenced this issue Oct 13, 2020
Currently as part of node service we add rbd flatten task for new PVC
creates. Ideally we should add flatten task only for snapshots/cloned
PVCs as required.

Fixes: ceph#1543
Signed-off-by: Prasanna Kumar Kalever <[email protected]>
Madhu-1 pushed a commit to pkalever/ceph-csi that referenced this issue Oct 20, 2020
Currently as part of node service we add rbd flatten task for new PVC
creates. Ideally we should add flatten task only for snapshots/cloned
PVCs as required.

Fixes: ceph#1543
Signed-off-by: Prasanna Kumar Kalever <[email protected]>
@Madhu-1 (Collaborator) commented Oct 27, 2020

@pkalever I tried to reproduce this on Ceph Octopus but was not able to.

i.imagename:csi-vol-aee99548-1809-11eb-a07a-826f7defc52c csi.volname:pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1])
I1027 04:05:43.332769       1 rbd_journal.go:435] ID: 17 Req-ID: pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1 generated Volume ID (0001-0009-rook-ceph-0000000000000002-aee99548-1809-11eb-a07a-826f7defc52c) and image name (csi-vol-aee99548-1809-11eb-a07a-826f7defc52c) for request name (pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1)
I1027 04:05:43.332861       1 rbd_util.go:200] ID: 17 Req-ID: pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1 rbd: create replicapool/csi-vol-aee99548-1809-11eb-a07a-826f7defc52c size 1024M (features: []) using mon 10.107.158.84:6789
I1027 04:05:43.353568       1 controllerserver.go:465] ID: 17 Req-ID: pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1 created volume pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1 backed by image csi-vol-aee99548-1809-11eb-a07a-826f7defc52c
I1027 04:05:43.375052       1 omap.go:136] ID: 17 Req-ID: pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1 set omap keys (pool="replicapool", namespace="", name="csi.volume.aee99548-1809-11eb-a07a-826f7defc52c"): map[csi.imageid:113e70f8d035])
sh-4.4# rbd info csi-vol-aee99548-1809-11eb-a07a-826f7defc52c --pool=replicapool
rbd image 'csi-vol-aee99548-1809-11eb-a07a-826f7defc52c':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: 113e70f8d035
	block_name_prefix: rbd_data.113e70f8d035
	format: 2
	features: layering
	op_features: 
	flags: 
	create_timestamp: Tue Oct 27 04:05:43 2020
	access_timestamp: Tue Oct 27 04:05:43 2020
	modify_timestamp: Tue Oct 27 04:05:43 2020
sh-4.4# ceph version
ceph version 15.2.5 (2c93eff00150f0cc5f106a559557a58d3d7b6f1f) octopus (stable)

This is with a cephcsi canary image. Let me know if you are still able to reproduce it; I would like to check a few things.

@cjheppell

Is there any update on this?

I discussed seeing flattening happening in #1800 but it was mentioned that none of the operations there would cause flattening.

As it stands right now, if I create a PVC, then snapshot that PVC, then create a clone of the snapshot and try to mount it, I'm getting this error from a kubectl describe of a pod trying to use that cloned PVC:

Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Normal   Scheduled               73s                default-scheduler        Successfully assigned rook/busybox-sleep to minikube
  Normal   SuccessfulAttachVolume  74s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-d9e9bde4-ec56-4f6b-8d0c-2928b66df5d7"
  Warning  FailedMount             35s (x6 over 58s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-d9e9bde4-ec56-4f6b-8d0c-2928b66df5d7" : rpc error: code = Internal desc = flatten in progress: flatten is in progress for image csi-vol-b3f39645-709a-11eb-8f85-0242ac110010

It is eventually able to enter a running state, but this is due to the flatten operation completing.

I don't want any flattening to occur, as it defeats the point of me using cloning altogether 😞
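
(For reference, the snapshot-then-clone sequence described above can be reproduced with objects along these lines. This is only a sketch: the class names are taken from the ceph-csi examples and the VolumeSnapshot API version depends on the snapshot CRDs installed in the cluster.)

# Snapshot an existing PVC, then restore a new PVC from that snapshot;
# mounting the restored PVC in a pod triggers the behaviour described above.
cat <<'EOF' | kubectl create -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: rbd-pvc-snapshot
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: rbd-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc-restore
spec:
  storageClassName: csi-rbd-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  dataSource:
    name: rbd-pvc-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
EOF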

I did see this in the output of my csi-rbdplugin logs though:

E0216 20:11:48.159792    6417 util.go:232] kernel 4.19.157 does not support required features
E0216 20:11:48.753455    6417 utils.go:136] ID: 274 Req-ID: 0001-0005-rook-0000000000000002-2e26c0b6-7093-11eb-be63-0242ac110010 GRPC error: rpc error: code = Internal desc = flatten in progress: flatten is in progress for image csi-vol-2e26c0b6-7093-11eb-be63-0242ac110010

Is this flattening happening because I'm running a kernel that doesn't support deep flatten? 🤔

@cjheppell

Ah, it seems this comment suggests you must have kernel 5.1+ to avoid a full flatten: #693 (comment)

Presumably this is the problem, as minikube is using 4.19?

@Madhu-1 (Collaborator) commented Feb 17, 2021

@cjheppell Kernels older than 5.1 do not support mapping rbd images that have the deep-flatten image feature, so we need to flatten the image first and then map it on the node.
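
(A quick way to check both sides of that condition on a node, using only standard tooling; replace <pool>/<image> with the clone in question:)

# Kernel 5.1 or newer is needed to map images that carry the deep-flatten feature.
uname -r

# If deep-flatten appears in the image features and the node kernel is older
# than 5.1, the nodeplugin flattens the image before mapping it.
rbd info <pool>/<image> | grep features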

@cjheppell

Was this a change between v2.1.x and v3?

As described in #1800, when I performed the same actions on v2.1.2 I didn't see this flattening behaviour.

@Madhu-1 (Collaborator) commented Feb 17, 2021

Yes, this is a change in v3.x as we reworked the rbd snapshot and clone implementation.

@cjheppell

Presumably that's what the "Snapshot Alpha is no longer supported" in the v3.0.0 release notes is referring to? https://github.com/ceph/ceph-csi/releases/tag/v3.0.0

I must admit, this is very surprising and completely unexpected behaviour as a user.

It seems that unless I'm on a kernel 5.1+ then cloning from snapshots is fundamentally not performing the copy-on-write behaviour that Ceph claims to offer. Even moreso, that's very hidden from me as from glancing at the behaviour in Kubernetes it appears that cloning is working. But it's only when I mount the clone that the flatten is revealed to me.

If that snapshot contains hundreds of gigabytes of data, then that operation is likely to take a very long time.

Even moreso, the only way I was able to determine that I needed a 5.1+ kernel is by digging through issues and pull request comments.

Could this perhaps be documented more clearly somewhere? It would've saved me an awful lot of time from digging through the lines of code and various pull requests associated with this behaviour.

@Madhu-1 (Collaborator) commented Feb 17, 2021

> Presumably that's what the "Snapshot Alpha is no longer supported" in the v3.0.0 release notes is referring to? https://github.com/ceph/ceph-csi/releases/tag/v3.0.0
>
> I must admit, this is very surprising and completely unexpected behaviour as a user.
>
> It seems that unless I'm on a kernel 5.1+ then cloning from snapshots is fundamentally not performing the copy-on-write behaviour that Ceph claims to offer. Even moreso, that's very hidden from me as from glancing at the behaviour in Kubernetes it appears that cloning is working. But it's only when I mount the clone that the flatten is revealed to me.

In Kubernetes, a snapshot and a PVC are independent objects, and the new design (v3.x+) handles that: an rbd clone is created when a user requests a Kubernetes snapshot.

> If that snapshot contains hundreds of gigabytes of data, then that operation is likely to take a very long time.
>
> Even moreso, the only way I was able to determine that I needed a 5.1+ kernel is by digging through issues and pull request comments.

Yes. As the clones are created with the deep-flatten feature, if the kernel version is less than 5.1 the nodeplugin tries to flatten the image and then maps it. You also have the option to flatten the image during the snapshot create operation itself; rbdsoftmaxclonedepth needs to be set to 1 for that.
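
(A rough sketch of how to apply that; the flag name is taken from the comment above, and the namespace, deployment and container names are the usual ceph-csi defaults, so verify them against your deployment first.)

# Confirm the exact clone-depth flag name supported by the deployed cephcsi binary.
kubectl -n <namespace> exec deploy/csi-rbdplugin-provisioner -c csi-rbdplugin -- cephcsi --help 2>&1 | grep -i clonedepth

# Then add --rbdsoftmaxclonedepth=1 to the csi-rbdplugin container args in the
# provisioner deployment, so clones are flattened at snapshot-create time
# rather than at first mount of the restored PVC.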

> Could this perhaps be documented more clearly somewhere? It would've saved me an awful lot of time from digging through the lines of code and various pull requests associated with this behaviour.

Yes, we will update the documentation with the minimum kernel version required to support snapshot and clone.

@cjheppell

> In Kubernetes, a snapshot and a PVC are independent objects, and the new design (v3.x+) handles that: an rbd clone is created when a user requests a Kubernetes snapshot.

Quite right, but given I'm using a Ceph driver to fulfil the Kubernetes snapshot/clone operations, I'd still expect the behaviour to match what is documented in Ceph's own snapshot/clone semantics. It appears this is true for kernels 5.1+ on v3.x.x, and it was true for kernels <5.1 on v2.1.x releases, but it is no longer the case for kernels <5.1 on v3.x.x releases.

My point is that as a user, one of the important features Ceph offers is unavailable to me unless some prerequisites are met, and those prerequisites aren't clear.

Perhaps this behaviour could also be opt-in? I'm aware that Kubernetes presents the relationship between snapshot and PVC as independent, but if I consciously acknowledge that the hidden relationship is present, then we could avoid the need to flatten for kernels <5.1 on v3.x.x releases?

> Yes, we will update the documentation with the minimum kernel version required to support snapshot and clone.

Many thanks. That will be very helpful.

stale bot commented Jul 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix (This will not be worked on) label Jul 21, 2021
github-actions bot commented Sep 4, 2021

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions bot closed this as completed Sep 4, 2021