rbd: flatten image when it has references to parent #1544
/retest all
Not sure what causes the problem, but there seems to be a component missing in the PV-created-from-snapshot:
Yes Niels, and for me, that looked irrelevant to this change. (Or maybe I am missing something.)
For me too, but I doubt it is a coincidence that the rbd e2e tests fail, whereas cephfs works.
b0c17f0 to 37b59ad
37b59ad to 1bc6f25
@Mergifyio rebase
Currently as part of node service we add rbd flatten task for new PVC creates. Ideally we should add flatten task only for snapshots/cloned PVCs as required. Fixes: ceph#1543 Signed-off-by: Prasanna Kumar Kalever <[email protected]>
1bc6f25 to 4eeac9b
/retest all
marking for request changes to get analysis
		return transaction, err
	}

	if feature && (depth != 0) {
		err = volOptions.flattenRbdImage(ctx, cr, true, rbdHardMaxCloneDepth, rbdSoftMaxCloneDepth)
@pkalever `flattenRbdImage` already handles this error. Can you please check why it's not working?
Yes Madhu, I remember this is on me, will get back.
@Madhu-1 it looks like this is handled well in `flattenRbdImage`, but as I understand it, the `getCloneDepth` check is skipped because we are calling `flattenRbdImage` with `forceFlatten=true`.
Is there any specific reason why `stageTransaction` should call `flattenRbdImage` with `forceFlatten=true`?
If not, we should rather call `flattenRbdImage` with `forceFlatten=false`, which should fix it all.
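To make the question above concrete, here is a minimal sketch of how a force flag can bypass a clone-depth check. The helper `shouldScheduleFlatten` and the limit values are hypothetical and only model the behaviour described in this comment; this is not the actual `flattenRbdImage` implementation.

```go
package main

import "fmt"

// shouldScheduleFlatten is a hypothetical helper, not ceph-csi code.
// It models the decision described in the comment: when forceFlatten is
// set, the clone depth is never consulted and a flatten is always scheduled.
func shouldScheduleFlatten(forceFlatten bool, depth, hardLimit, softLimit uint) bool {
	if forceFlatten {
		return true
	}
	// Without the force flag, only images at or beyond a limit qualify.
	return depth >= hardLimit || depth >= softLimit
}

func main() {
	// Illustrative limits only; the real values are defined elsewhere.
	const hardLimit, softLimit = 8, 4

	fmt.Println(shouldScheduleFlatten(true, 0, hardLimit, softLimit))  // forced, depth ignored
	fmt.Println(shouldScheduleFlatten(false, 0, hardLimit, softLimit)) // fresh image, nothing to flatten
	fmt.Println(shouldScheduleFlatten(false, 5, hardLimit, softLimit)) // past the soft limit
}
```

Under this model, a freshly created image (depth 0) gets a flatten task only when the force flag is set, which is the effect the proposed change to `forceFlatten=false` is trying to avoid.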
@Madhu-1 here is the change:
diff --git a/internal/rbd/nodeserver.go b/internal/rbd/nodeserver.go
index a9499df95..ed5664aa0 100644
--- a/internal/rbd/nodeserver.go
+++ b/internal/rbd/nodeserver.go
@@ -274,7 +274,7 @@ func (ns *NodeServer) stageTransaction(ctx context.Context, req *csi.NodeStageVo
 		return transaction, err
 	}
 	if feature {
-		err = volOptions.flattenRbdImage(ctx, cr, true, rbdHardMaxCloneDepth, rbdSoftMaxCloneDepth)
+		err = volOptions.flattenRbdImage(ctx, cr, false, rbdHardMaxCloneDepth, rbdSoftMaxCloneDepth)
 		if err != nil {
 			return transaction, err
 		}
which solves the issue:
[0] pkalever 😎 ceph-cluster✨ kubectl describe pod csi-rbd-demo-pod
[...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 67s default-scheduler Successfully assigned default/csi-rbd-demo-pod to minikube
Normal SuccessfulAttachVolume 67s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-4d0d467d-f98c-4a77-ae34-b6720a45add3"
Warning FailedMount 15s (x7 over 55s) kubelet, minikube MountVolume.MountDevice failed for volume "pvc-4d0d467d-f98c-4a77-ae34-b6720a45add3" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 6) occurred while running rbd args: [--id admin -m 192.168.121.120,192.168.121.229 --keyfile=***stripped*** map rbd-pool/csi-vol-a24be4f3-1476-11eb-86e1-0242ac110004 --device-type krbd], rbd error output: rbd: sysfs write failed
rbd: map failed: (6) No such device or address
But I'm unsure if this change causes any regressions, thoughts?
I think just calling `volOptions.flatten()` in the nodeserver is sufficient. The image can be flattened inline, rather than through the TaskManager, which is what `flattenRbdImage()` does.
Yes, but ideally the task is scheduled when this image got provisioned (in the controllerserver). If it was not done yet for some reason, doing it inline before mounting might result in more predictable performance during runtime (no flattening while the image is mounted and actively used).
This is only to support kernels that do not support the flattening feature. We don't want to block the RPC call while flattening the image, as we can face other issues if we block RPC calls. As @pkalever mentioned, the time taken for flattening depends on the size of the data. I will check why the current code is not working and will update.
@nixpanic hmm, yes, it makes sense to some extent, especially when the PVC create and attach happen immediately one after the other. But if the volume is provisioned well before it is attached to the application pod, I still think adding it as a task is helpful.
Yes, as mentioned, +1 for the task considering the RPC timeouts too.
> @Madhu-1 looks like it is well handled in `flattenRbdImage` but as for my understanding it is skipping `getCloneDepth` check because we are calling `flattenRbdImage` with `forceFlatten=true`. Is there any specific reason why `stageTransaction` should call `flattenRbdImage` with `forceFlatten=true`?
Yes, we don't want to check any image depth in the node stage request. We want to flatten all images in the node stage request (a special check has been added to discard the "no parent" error). This check is added only to support kernels that don't support the deep-flatten image feature.
> If not we should rather call `flattenRbdImage` with `forceFlatten=false`, which should fix it all.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in a month if no further activity occurs. Thank you for your contributions.
Currently, as part of node service we add rbd flatten task for new PVC
creates. Ideally, we should add flatten task only for snapshots/cloned
PVCs as required.
Fixes: #1543
Signed-off-by: Prasanna Kumar Kalever [email protected]
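As a rough illustration of "flatten only for snapshots/cloned PVCs", the sketch below counts how many parents an image has and flattens only when that depth is non-zero, mirroring the `depth != 0` condition in the reviewed change. The `cloneDepth` function and the in-memory `parents` map are hypothetical stand-ins for querying the image chain; the image names are made up.

```go
package main

import "fmt"

// cloneDepth counts how many ancestors an image has by following parent
// links. The parents map (image name to parent name) is a hypothetical
// in-memory stand-in for asking the storage cluster about the image chain.
func cloneDepth(parents map[string]string, image string) uint {
	var depth uint
	seen := map[string]bool{image: true}
	for {
		parent, ok := parents[image]
		if !ok || seen[parent] {
			// No parent recorded (or a cycle in the test data): stop counting.
			return depth
		}
		seen[parent] = true
		depth++
		image = parent
	}
}

func main() {
	// Made-up chain: restored-pvc was created from snapshot-1, which was
	// taken from source-pvc. new-pvc has no parent at all.
	parents := map[string]string{
		"restored-pvc": "snapshot-1",
		"snapshot-1":   "source-pvc",
	}

	for _, img := range []string{"new-pvc", "restored-pvc"} {
		d := cloneDepth(parents, img)
		fmt.Printf("%s: depth=%d flatten=%t\n", img, d, d != 0)
	}
}
```

A newly created PVC has depth 0 and is skipped, while an image restored from a snapshot has a non-zero depth and qualifies for a flatten task.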