Remove child clone volume reference from the parent volume #425

Closed
wants to merge 1 commit into from

Madhu-1
Collaborator

@Madhu-1 Madhu-1 commented Jun 12, 2019

Describe what this PR does

When a cloned image retains a reference to its parent snapshot,
we cannot delete the volume. Flattening removes the reference
so that the volume can be deleted even if it is cloned from a snapshot.

Added a check in delete volume: if the volume contains any snapshots,
volume deletion is not allowed. All associated snapshots must be
deleted before the volume is deleted.

Signed-off-by: Madhu Rajanna [email protected]

Related issues

closes #70, closes #227

@Madhu-1
Collaborator Author

Madhu-1 commented Jun 12, 2019

don't merge this PR till I add snapshot e2e for the current code

@humblec
Collaborator

humblec commented Jun 12, 2019

@Madhu-1 quick review comments from me, PTAL.

}

// remove the reference from volume and snapshot
// this will occuer if flatten image failed during volume clone
Collaborator

typo ..

@@ -550,6 +550,25 @@ func createSnapshot(pOpts *rbdSnapshot, adminID string, credentials map[string]s
return nil
}

// flattenImage remove the reference from the child clone to the parent snapshot
Collaborator

@Madhu-1 it's also good to refer to the documentation here in the source code comment.

@@ -210,6 +211,12 @@ func (cs *ControllerServer) checkSnapshot(req *csi.CreateVolumeRequest, rbdVol *
if err != nil {
return status.Error(codes.Internal, err.Error())
}

err = flattenImage(rbdSnap.Pool, rbdSnap.RbdImageName, rbdVol.Monitors, rbdVol.AdminID, req.GetSecrets())
Collaborator

@Madhu-1 flattenImage can take the rbdSnap struct for better argument handling, can't it?
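A minimal sketch of what the suggested signature could look like. The `rbdSnapshot` type below is a hypothetical, trimmed-down stand-in, and it assumes the struct carries `Monitors` and `AdminID` itself (in the diff above those two values come from `rbdVol`). `flattenArgs` is an illustrative helper name, not a function from the patch.

```go
package main

import "fmt"

// rbdSnapshot is a hypothetical stand-in for the driver's snapshot struct;
// only the fields needed to build the command line are included.
type rbdSnapshot struct {
	Pool         string
	RbdImageName string
	Monitors     string // assumption: carried on the struct, not on rbdVol
	AdminID      string // assumption: carried on the struct, not on rbdVol
}

// flattenArgs builds the argument list for "rbd flatten" from the struct
// instead of taking each value as a separate parameter.
func flattenArgs(snap *rbdSnapshot, key string) []string {
	return []string{
		"flatten",
		"--pool", snap.Pool,
		"--id", snap.AdminID,
		"-m", snap.Monitors,
		"--key=" + key,
		snap.RbdImageName,
	}
}

func main() {
	snap := &rbdSnapshot{
		Pool:         "replicapool",
		RbdImageName: "csi-vol-example",
		Monitors:     "10.0.0.1:6789",
		AdminID:      "admin",
	}
	// Print the arguments that would be handed to the rbd binary.
	fmt.Println(flattenArgs(snap, "secret-key"))
}
```

Passing the struct keeps the call site short, at the cost of coupling the helper to the struct's layout.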

@Madhu-1
Collaborator Author

Madhu-1 commented Jun 12, 2019

@humblec Thanks for the quick review. addressed review comments PTAL


output, err = execCommand("rbd", args)

if err != nil {

This should be expected to fail if the image doesn't have a parent. Additionally, if the image has snapshots and does not have the 'deep-flatten' feature enabled, the snapshots themselves will still be linked to the parent image.
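One way to avoid the expected failure is to inspect the image before calling flatten. The sketch below parses the output of `rbd info --format json`; the JSON field names (`features`, `parent`) are an assumption about that output, and `inspectImage` is a hypothetical helper, not part of the patch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// imageInfo holds the subset of `rbd info --format json` output used here.
// The field names are an assumption about that JSON layout.
type imageInfo struct {
	Features []string `json:"features"`
	Parent   *struct {
		Pool     string `json:"pool"`
		Image    string `json:"image"`
		Snapshot string `json:"snapshot"`
	} `json:"parent"`
}

// inspectImage reports whether the image has a parent (so flatten has work
// to do) and whether the deep-flatten feature is enabled on it.
func inspectImage(infoJSON []byte) (hasParent, deepFlatten bool, err error) {
	var info imageInfo
	if err = json.Unmarshal(infoJSON, &info); err != nil {
		return false, false, err
	}
	for _, f := range info.Features {
		if f == "deep-flatten" {
			deepFlatten = true
		}
	}
	return info.Parent != nil, deepFlatten, nil
}

func main() {
	// Sample input written by hand for illustration, not captured from a cluster.
	sample := []byte(`{"features":["layering"],"parent":{"pool":"rbd","image":"base","snapshot":"snap1"}}`)
	hasParent, deepFlatten, err := inspectImage(sample)
	fmt.Println(hasParent, deepFlatten, err)
}
```

A caller could skip flatten when `hasParent` is false, and treat a missing deep-flatten feature as a sign that existing snapshots of the clone will stay linked to the parent.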

Collaborator Author

@dillaman Thanks for the review.
Do I need to explicitly enable the deep-flatten feature during clone and create, or is it enabled by default?

You cannot enable it on existing images, it needs to have been enabled when the image was created. Support for this in krbd wasn't added until v5.1.

Collaborator Author

@dillaman don't we need to enable this feature during clone? (The cloned image can later be used to snapshot and clone a PVC from it.)

@Madhu-1 negative -- deep-flatten isn't required for cloning, layering is the required feature.

Thinking holistically, we added the "rbd trash mv" tool to support cases where you cannot delete a parent image due to linked clones. In the Nautilus release, this is even automated for you via the rbd_move_parent_to_trash_on_remove config option (requires Mimic-or-later cluster). We are also adding automated background operation handling [1].

From an operator's point-of-view, we should never really expose the RBD implementation details for how snapshots and volume from snapshots (RBD clones) work on the backend. If you attempt to delete an image that has linked clones, move it to the trash for later cleanup. If you attempt to delete a snapshot with a linked clone, w/ clone v2 it just works. To support clone v1 (and v2), perhaps k8s snapshots could create a new RBD cloned image (and then it follows the previous rule about 'move-to-trash' upon delete).

[1] https://trello.com/c/JO5BPkRG

@Madhu-1
Collaborator Author

Madhu-1 commented Jun 14, 2019

@dillaman can you confirm that the steps listed below are correct for cloning an image with different versions of Ceph?

Steps to create an rbd clone

Ceph versions earlier than Mimic

[1] create an image with the `deep-flatten` option
[2] create a snapshot
[3] protect the snapshot
[4] clone a new image from the snapshot
[5] flatten the newly created image
[6] delete the image cloned in [4] (this is an independent step and can also be done last)
[7] unprotect the snapshot
[8] delete the snapshot
[9] delete the image created in step [1] (this image cannot be deleted if it has snapshots)

Ceph version Mimic onwards

[1] create an image with the `deep-flatten` option
[2] create a snapshot
[3] clone a new image from the snapshot
[4] flatten the newly cloned image
[5] delete the cloned image [3] (this is an independent step and can also be done last)
[6] delete the image created in [1] with `rbd rm --rbd_move_to_trash_on_remove=true`
[7] delete the snapshot ([6] and [7] may be interchanged)

Will the image moved to the trash be deleted automatically once the snapshot is deleted?
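The two proposed sequences can be written out as `rbd` argument lists. This is a sketch of the steps as proposed in this comment, not a confirmed procedure: `cloneLifecycle` is a hypothetical helper, the `1G` size is a placeholder, and the Mimic branch deletes the snapshot before the image (the comment says the last two steps may be interchanged). The `--rbd_move_to_trash_on_remove=true` override is copied verbatim from the comment.

```go
package main

import (
	"fmt"
	"strings"
)

// cloneLifecycle returns the rbd argument lists for the proposed sequence.
// cloneV2 selects the "Mimic onwards" variant, which skips snapshot
// protect/unprotect.
func cloneLifecycle(pool, image, snap, clone string, cloneV2 bool) [][]string {
	img := pool + "/" + image
	snapSpec := img + "@" + snap
	cl := pool + "/" + clone

	steps := [][]string{
		// The size is a placeholder chosen for illustration only.
		{"create", "--size", "1G", "--image-feature", "layering", "--image-feature", "deep-flatten", img},
		{"snap", "create", snapSpec},
	}
	if !cloneV2 {
		steps = append(steps, []string{"snap", "protect", snapSpec})
	}
	steps = append(steps,
		[]string{"clone", snapSpec, cl},
		[]string{"flatten", cl},
		[]string{"rm", cl},
	)
	if !cloneV2 {
		steps = append(steps, []string{"snap", "unprotect", snapSpec})
	}
	steps = append(steps, []string{"snap", "rm", snapSpec})
	if cloneV2 {
		// Option copied verbatim from the comment above.
		steps = append(steps, []string{"rm", "--rbd_move_to_trash_on_remove=true", img})
	} else {
		steps = append(steps, []string{"rm", img})
	}
	return steps
}

func main() {
	for _, v2 := range []bool{false, true} {
		fmt.Println("clone v2:", v2)
		for _, args := range cloneLifecycle("rbd", "parent", "snap1", "child", v2) {
			fmt.Println("  rbd " + strings.Join(args, " "))
		}
	}
}
```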

@humblec @j-griffith @ShyamsundarR let me know your thoughts on the queries below

  • what versions of ceph we should support with csi?
  • if version less than Mimic do we need to handle cloning differently
    • use clone v1 for ceph version less than Mimic
    • use clone v2 for Mimic later

Based on the answers I will update this patch (this patch will help finish the smart cloning work).

@Madhu-1
Collaborator Author

Madhu-1 commented Jun 18, 2019

@dillaman @humblec @ShyamsundarR @j-griffith can I get an answer to the above questions?

@dillaman

@Madhu-1 The issue w/ deep-flatten + k8s is that by default and design, the krbd kernel driver is used. This driver is not tied to a Ceph release and doesn't support deep-flatten until v5.1.

Since deep-flatten is only required once a snapshot is created on a cloned image, perhaps you can just defer the flatten operation until the first snapshot is created on the cloned image (since it may never occur for most images and therefore saves disk space) and/or run flatten immediately after cloning if deep-flatten isn't enabled.

One important note is that if the exclusive-lock feature is enabled, you cannot run the flatten operation when krbd is using the image, so it would need to be run immediately after the clone operation.
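One reading of the suggestion above, written as a small decision function. The type and function names are hypothetical, and the rule is an interpretation of this comment rather than the driver's behaviour: flatten right after the clone when exclusive-lock is enabled or deep-flatten is missing, otherwise defer until the first snapshot is taken on the clone.

```go
package main

import "fmt"

// cloneFeatures is a hypothetical summary of the features on a cloned image.
type cloneFeatures struct {
	DeepFlatten   bool
	ExclusiveLock bool
}

type flattenPolicy int

const (
	// flattenImmediately: run flatten right after the clone is created.
	flattenImmediately flattenPolicy = iota
	// flattenOnFirstSnapshot: defer flatten until a snapshot is first
	// requested on the clone, which may never happen.
	flattenOnFirstSnapshot
)

// chooseFlattenPolicy encodes one interpretation of the comment above.
func chooseFlattenPolicy(f cloneFeatures) flattenPolicy {
	// With exclusive-lock, flatten cannot run while krbd is using the image,
	// so the only safe window is right after the clone operation.
	if f.ExclusiveLock {
		return flattenImmediately
	}
	// Without deep-flatten, later snapshots would stay linked to the parent.
	if !f.DeepFlatten {
		return flattenImmediately
	}
	return flattenOnFirstSnapshot
}

func main() {
	p := chooseFlattenPolicy(cloneFeatures{DeepFlatten: true})
	fmt.Println(p == flattenOnFirstSnapshot)
}
```

Deferring saves disk space for clones that are never snapshotted, which is the trade-off the comment points out.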

@j-griffith

j-griffith commented Jun 18, 2019

@Madhu-1

what versions of ceph we should support with csi?

I don't have enough visibility into the field to know the best answer here. My thought was that we would not have a limitation on ceph version in terms of csi support, but provide different capabilities based on version. Keeping in mind that cloning is an optional capability.

if version less than Mimic do we need to handle cloning differently
    use clone v1 for ceph version less than Mimic

Or don't support cloning in the capabilities when the version is < Mimic? I don't know the implications of this in terms of customers in production with Kube; but given that cloning is alpha in 1.15 it's pretty "new" so setting a cut off doesn't seem heavy handed to me.

    use clone v2 for Mimic later

Same note as above, it's up to you though if you want to have two implementations and select based on ceph version that's great; so long as the implementations are clean and easily separated. I would hate to see too much overhead or maintenance cost by trying to support both.

It seems like "upgrade to mimic if you want cloning" might be reasonable. I do have some concerns around this on the Rook side of things but it should be feasible.
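The options discussed here (two implementations selected by version, or no cloning below Mimic) can be sketched as a single selection function. The names are hypothetical, and how the driver would learn the cluster's major release is left out; the only facts relied on are that Mimic is Ceph major release 13 and Nautilus is 14.

```go
package main

import "fmt"

type cloneSupport int

const (
	cloneUnsupported cloneSupport = iota // do not advertise the clone capability
	cloneV1                              // protect/unprotect based cloning
	cloneV2                              // Mimic-or-later cloning
)

// mimicMajor is the Ceph major release number for Mimic.
const mimicMajor = 13

// cloneSupportFor picks a clone implementation from the cluster's major
// release. allowV1 chooses between "support clone v1 on older clusters"
// and "require Mimic for cloning".
func cloneSupportFor(cephMajor int, allowV1 bool) cloneSupport {
	if cephMajor >= mimicMajor {
		return cloneV2
	}
	if allowV1 {
		return cloneV1
	}
	return cloneUnsupported
}

func main() {
	// Luminous (12) with the "upgrade to Mimic if you want cloning" policy.
	fmt.Println(cloneSupportFor(12, false) == cloneUnsupported)
	// Nautilus (14) always gets clone v2.
	fmt.Println(cloneSupportFor(14, false) == cloneV2)
}
```

Keeping the choice in one place is one way to hold the two implementations apart, which addresses the maintenance concern raised above.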

@humblec
Collaborator

humblec commented Jun 20, 2019

what versions of ceph we should support with csi?
if version less than Mimic do we need to handle cloning differently
use clone v1 for ceph version less than Mimic
use clone v2 for Mimic later

@Madhu-1 >v1.0 onwards we will be supporting the Nautilus release. Considering cloning is a net-new addition upstream, it's good enough to declare that we support this feature ONLY for Nautilus.

@Madhu-1
Collaborator Author

Madhu-1 commented Dec 6, 2019

Closing this one, as flatten will be taken care of in the volume cloning PR #693

@Madhu-1 Madhu-1 closed this Dec 6, 2019
Nikhil-Ladha pushed a commit to Nikhil-Ladha/ceph-csi that referenced this pull request Nov 26, 2024
DFBUGS-906: Prevent dataloss due to the concurrent RPC calls (occurrence is very low)
Labels
DNM DO NOT MERGE
Development

Successfully merging this pull request may close these issues.

Unremoved RBD volume after removing PVC with snapshot [csi-rbd] detach cloned images from snapshots
4 participants