
rbd: flatten image when it has references to parent #1544

Closed

Conversation

pkalever

Currently, as part of the node service we add an rbd flatten task for every newly created PVC.
Ideally, we should add the flatten task only for snapshot-restored/cloned
PVCs, as required.

Fixes: #1543
Signed-off-by: Prasanna Kumar Kalever [email protected]
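
To make the intent concrete, here is a minimal sketch in Go of the desired behaviour: schedule a flatten task only when the image still references a parent. The names below (rbdImage, HasParent, addFlattenTask, maybeFlatten) are hypothetical stand-ins for illustration, not the actual ceph-csi code.

package main

import "fmt"

// rbdImage is a hypothetical stand-in for the image/volume type used in ceph-csi.
type rbdImage struct {
    Name      string
    HasParent bool // true only for snapshot-restored or cloned images
}

// addFlattenTask is a placeholder for scheduling an asynchronous flatten
// (for example a background task); it only logs here.
func addFlattenTask(img rbdImage) error {
    fmt.Printf("scheduling flatten task for %s\n", img.Name)
    return nil
}

// maybeFlatten schedules a flatten task only when the image actually keeps a
// reference to a parent; plain newly created PVCs are skipped.
func maybeFlatten(img rbdImage) error {
    if !img.HasParent {
        return nil // nothing to flatten for a freshly created image
    }
    return addFlattenTask(img)
}

func main() {
    _ = maybeFlatten(rbdImage{Name: "csi-vol-new", HasParent: false})     // no task
    _ = maybeFlatten(rbdImage{Name: "csi-vol-restored", HasParent: true}) // task scheduled
}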

@pkalever
Author

/retest all

@nixpanic nixpanic added the component/rbd Issues related to RBD label Sep 30, 2020
@nixpanic
Member

Not sure what causes the problem, but there seems to be a component missing in the PV-created-from-snapshot:

Sep 30 05:26:36.941: INFO: At 2020-09-30 05:16:53 +0100 BST - event for csi-rbd-restore-demo-pod: {kubelet minikube} FailedMount: MountVolume.MountDevice failed for volume "pvc-8b836dd9-8b2d-49ea-91e8-a2fb02a8df83" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 6) occurred while running rbd args: [--id csi-rbd-node -m rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789 --keyfile=***stripped*** map replicapool/e2e-ns/csi-vol-ba2559b8-02d3-11eb-994a-0242ac110015 --device-type krbd], rbd error output: rbd: sysfs write failed
rbd: map failed: (6) No such device or address

@pkalever
Author

Not sure what causes the problem, but there seems to be a component missing in the PV-created-from-snapshot:

Yes Niels, and for me that looked irrelevant to this change (or maybe I am missing something).
I'm yet to look at the other logs though.

@nixpanic
Member

for me, that looked irrelevant to this change

For me too, but I doubt it is a coincidence that the rbd e2e tests fail whereas cephfs works.

@nixpanic
Member

@Mergifyio rebase

@mergify
Contributor

mergify bot commented Oct 20, 2020

Command rebase: success

Branch has been successfully rebased

@nixpanic
Member

/retest all

@Madhu-1 (Collaborator) left a comment


Marking as request changes to get an analysis.

                return transaction, err
            }

            if feature && (depth != 0) {
                err = volOptions.flattenRbdImage(ctx, cr, true, rbdHardMaxCloneDepth, rbdSoftMaxCloneDepth)
Collaborator


@pkalever flattenRbdImage already handles this error; can you please check why it's not working?

Author


Yes Madhu, I remember this is on me, will get back.

Author


@Madhu-1 it looks like this is well handled in flattenRbdImage, but from my understanding it is skipping the getCloneDepth check because we are calling flattenRbdImage with forceFlatten=true.

Is there any specific reason why stageTransaction should call flattenRbdImage with forceFlatten=true?

If not, we should rather call flattenRbdImage with forceFlatten=false, which should fix it all.
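
To illustrate the point, a rough sketch of how a forceFlatten flag bypasses the clone-depth check; flattenNeeded and its parameters are made-up names modelled on the discussion, not the real flattenRbdImage:

package main

import "fmt"

// flattenNeeded sketches the decision with made-up names: with
// forceFlatten=true the clone-depth check is bypassed entirely, which is why
// even a freshly created image (depth 0, no parent) gets a flatten attempt.
func flattenNeeded(forceFlatten bool, cloneDepth, hardLimit, softLimit uint) bool {
    if forceFlatten {
        return true // depth is never inspected
    }
    return cloneDepth >= hardLimit || cloneDepth >= softLimit
}

func main() {
    fmt.Println(flattenNeeded(true, 0, 8, 4))  // true: forced flatten even with no parent
    fmt.Println(flattenNeeded(false, 0, 8, 4)) // false: nothing to flatten
}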

Author


@Madhu-1 here is the change:

diff --git a/internal/rbd/nodeserver.go b/internal/rbd/nodeserver.go
index a9499df95..ed5664aa0 100644
--- a/internal/rbd/nodeserver.go
+++ b/internal/rbd/nodeserver.go
@@ -274,7 +274,7 @@ func (ns *NodeServer) stageTransaction(ctx context.Context, req *csi.NodeStageVo
                                return transaction, err
                        }
                        if feature {
-                               err = volOptions.flattenRbdImage(ctx, cr, true, rbdHardMaxCloneDepth, rbdSoftMaxCloneDepth)
+                               err = volOptions.flattenRbdImage(ctx, cr, false, rbdHardMaxCloneDepth, rbdSoftMaxCloneDepth)
                                if err != nil {
                                        return transaction, err
                                }

which solves the issue:

[0] pkalever 😎 ceph-cluster✨ kubectl describe pod csi-rbd-demo-pod
[...]
Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Normal   Scheduled               67s                default-scheduler        Successfully assigned default/csi-rbd-demo-pod to minikube
  Normal   SuccessfulAttachVolume  67s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-4d0d467d-f98c-4a77-ae34-b6720a45add3"
  Warning  FailedMount             15s (x7 over 55s)  kubelet, minikube        MountVolume.MountDevice failed for volume "pvc-4d0d467d-f98c-4a77-ae34-b6720a45add3" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 6) occurred while running rbd args: [--id admin -m 192.168.121.120,192.168.121.229 --keyfile=***stripped*** map rbd-pool/csi-vol-a24be4f3-1476-11eb-86e1-0242ac110004 --device-type krbd], rbd error output: rbd: sysfs write failed
rbd: map failed: (6) No such device or address

But I'm unsure if this change causes any regressions, thoughts?

Member


I think just calling volOptions.flatten() in the nodeserver is sufficient. The image can be flattened inline, and not through the TaskManager, which is what flattenRbdImage() does.
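
Roughly the difference being discussed, sketched with hypothetical helpers; flattenInline and scheduleFlattenTask below are stand-ins for illustration, not the real volOptions.flatten() or flattenRbdImage():

package main

import "fmt"

// flattenInline stands in for an inline flatten: the call blocks until the
// image has actually been flattened.
func flattenInline(image string) error {
    fmt.Printf("flattening %s synchronously before mapping it\n", image)
    return nil
}

// scheduleFlattenTask stands in for the task-based path: the RPC returns
// quickly and the flatten happens in the background.
func scheduleFlattenTask(image string) error {
    fmt.Printf("queued background flatten for %s\n", image)
    return nil
}

func main() {
    // NodeStage path: flatten inline so the krbd map that follows can succeed.
    _ = flattenInline("replicapool/csi-vol-example")
    // CreateVolume path: defer the work so the RPC is not blocked.
    _ = scheduleFlattenTask("replicapool/csi-vol-example")
}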

Member


Yes, but ideally the task is scheduled when the image gets provisioned (in the controllerserver). If that was not done yet for some reason, doing it inline before mounting might result in more predictable performance during runtime (no flattening while the image is mounted and actively used).

Collaborator


This is only to support kernels which do not support the flattening feature. We don't want to block the RPC call for flattening the image (as we can face other issues if we block RPC calls). As @pkalever mentioned, the time taken for flattening depends on the size of the data. I will check why the current code is not working and will update.

Author


@nixpanic hmm, yes it makes sense to some extent, especially when the PVC create and attach happen one immediately after the other. But in case provisioning of the volume is done much ahead of time, before attaching it to the application pod, I still see that adding it as a task is helpful.

Author


Yes, as mentioned, +1 for the task considering the RPC timeouts too.

Collaborator


@Madhu-1 it looks like this is well handled in flattenRbdImage, but from my understanding it is skipping the getCloneDepth check because we are calling flattenRbdImage with forceFlatten=true.

Is there any specific reason why stageTransaction should call flattenRbdImage with forceFlatten=true?

If not, we should rather call flattenRbdImage with forceFlatten=false, which should fix it all.

Yes, we don't want to check any image depth in the NodeStage request; we want to flatten all images in the NodeStage request (a special check has been added to discard the "no parent" error). This check is added only to support kernels which don't support the deep-flatten image feature.
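
As a rough illustration of that special check; the error message, the forceFlatten helper, and the flatten callback below are made up for illustration and are not the actual ceph-csi code:

package main

import (
    "errors"
    "fmt"
    "strings"
)

// errNoParent mimics the error returned when flattening an image that has no
// parent; the exact message is invented for this sketch.
var errNoParent = errors.New("cannot flatten: image has no parent")

// forceFlatten flattens every image it is given, but treats a "no parent"
// error as success instead of failing the NodeStage request.
func forceFlatten(image string, flatten func(string) error) error {
    err := flatten(image)
    if err != nil && strings.Contains(err.Error(), "no parent") {
        return nil // nothing to flatten, which is fine
    }
    return err
}

func main() {
    err := forceFlatten("rbd-pool/csi-vol-example", func(string) error { return errNoParent })
    fmt.Println("result:", err) // result: <nil>
}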

@stale

stale bot commented Jan 24, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in a month if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jan 24, 2021
Base automatically changed from master to devel March 1, 2021 05:22
@stale stale bot removed the wontfix This will not be worked on label Mar 1, 2021
@mergify mergify bot closed this in #2045 May 7, 2021
Labels
component/rbd Issues related to RBD
3 participants