
Add support for standalone snapshot creation #467

Closed
Madhu-1 wants to merge 1 commit from the stand-snap branch

Conversation

Madhu-1 (Collaborator) commented Jul 12, 2019

Describe what this PR does

With the current implementation in ceph-csi, it is not possible to delete a
cloned volume while its snapshot is present, because of the parent-child
linking. To remove this dependency, we discussed and came up with an idea to
separate the clone and the snapshot, so that the snapshot and the cloned image
can be deleted in any order.

The steps followed to create an independent snapshot are as follows:

Create a snapshot

  • Create a temporary snapshot from the parent volume
  • Clone a new image from the temporary snapshot with the options
    --rbd-default-clone-format 2 --image-feature layering,deep-flatten
  • Delete the temporary snapshot that was created
  • Create a snapshot with the requested name

Create a new PVC from a snapshot

  • Clone a new image from the snapshot with user-provided options
  • Check the depth of the image, since the number of nested volume clones
    is limited (16 by default, configurable); if the limit is reached,
    flatten the newly cloned image

Delete cloned PVC

  • Delete the cloned image (earlier we removed the image with the rbd rm
    command; with the new design we move the images to the trash).
    The same applies to normal volume deletion.

Delete snapshot

  • Delete the temporary cloned image which was created for a snapshot
  • Delete the snapshot

Example commands:

1) rbd snap create <RBD image for src k8s volume>@<random snap name>
2) rbd clone --rbd-default-clone-format 2 --image-feature
layering,deep-flatten <RBD image for src k8s volume>@<random snap name> <RBD image for temporary snap image>
3) rbd snap rm <RBD image for src k8s volume>@<random snap name>
4) rbd snap create <RBD image for temporary snap image>@<random snap name> <k8s snap name>
5) rbd clone --rbd-default-clone-format 2 --image-feature <k8s dst vol config>
<RBD image for temporary snap image>@<random snap name> <RBD image for k8s dst vol>
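
For illustration only, here is a minimal Go sketch of how steps 1–4 above could be driven from the driver. The execRbd helper and the example image/snapshot names are placeholders, not code from this PR, and real calls would also pass monitor and credential flags.

```go
package main

import (
	"fmt"
	"os/exec"
)

// execRbd is a hypothetical helper that shells out to the rbd CLI.
func execRbd(args ...string) error {
	out, err := exec.Command("rbd", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("rbd %v failed: %v (output: %s)", args, err, out)
	}
	return nil
}

// createStandaloneSnapshot mirrors steps 1-4 above: a temporary snapshot on
// the source image, a clone-format-2 clone of it, removal of the temporary
// snapshot, and finally the user-visible snapshot taken on that clone.
func createStandaloneSnapshot(srcImage, tmpSnap, tmpClone, k8sSnap string) error {
	steps := [][]string{
		{"snap", "create", srcImage + "@" + tmpSnap},
		{"clone", "--rbd-default-clone-format", "2",
			"--image-feature", "layering,deep-flatten",
			srcImage + "@" + tmpSnap, tmpClone},
		{"snap", "rm", srcImage + "@" + tmpSnap},
		{"snap", "create", tmpClone + "@" + k8sSnap},
	}
	for _, args := range steps {
		if err := execRbd(args...); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	err := createStandaloneSnapshot(
		"replicapool/src-image", "tmp-snap", "replicapool/tmp-clone", "k8s-snap")
	if err != nil {
		fmt.Println(err)
	}
}
```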

Signed-off-by: Madhu Rajanna [email protected]

Is there anything that requires special attention

Do you have any questions?

Is the change backward compatible?
No backward compatibility with previously created snapshots

Fixes: #issue_number

Future concerns

List items that are not part of the PR and do not impact its
functionality, but are work items that can be taken up subsequently.

@dillaman @ShyamsundarR @humblec PTAL

Madhu-1 (Collaborator, Author) commented Jul 12, 2019

@dillaman I am facing one issue while mounting the cloned PVC.
Logs:

E0712 05:14:52.497014    7792 utils.go:109] GRPC error: rbd: map failed exit status 6, rbd output: 2019-07-12 05:14:52.390 7f463497cb00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
rbd: sysfs write failed
2019-07-12 05:14:52.432 7f463497cb00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2019-07-12 05:14:52.433 7f463497cb00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2019-07-12 05:14:52.433 7f463497cb00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2019-07-12 05:14:52.439 7f463497cb00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2019-07-12 05:14:52.439 7f463497cb00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2019-07-12 05:14:52.439 7f463497cb00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable replicapool/csi-vol-d8f76c11-a463-11e9-8a0c-c2953b9dc10a".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address

RBD volume info:

sh-4.2# rbd ls --pool=replicapool
csi-vol-c2af1e5a-a463-11e9-8a0c-c2953b9dc10a (Parent volume)
csi-vol-cba8d8ce-a463-11e9-8a0c-c2953b9dc10a (Temp clone volume)
csi-vol-d8f76c11-a463-11e9-8a0c-c2953b9dc10a ( K8s dest volume)
sh-4.2# rbd info csi-vol-c2af1e5a-a463-11e9-8a0c-c2953b9dc10a --pool=replicapool
rbd image 'csi-vol-c2af1e5a-a463-11e9-8a0c-c2953b9dc10a':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 1
	id: 39855c9906c
	block_name_prefix: rbd_data.39855c9906c
	format: 2
	features: layering, operations
	op_features: clone-parent, snap-trash
	flags: 
	create_timestamp: Fri Jul 12 05:13:22 2019
	access_timestamp: Fri Jul 12 05:13:22 2019
	modify_timestamp: Fri Jul 12 05:13:22 2019


sh-4.2# rbd info csi-vol-cba8d8ce-a463-11e9-8a0c-c2953b9dc10a --pool=replicapool
rbd image 'csi-vol-cba8d8ce-a463-11e9-8a0c-c2953b9dc10a':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 1
	id: 39d056ad63b2
	block_name_prefix: rbd_data.39d056ad63b2
	format: 2
	features: layering, deep-flatten, operations
	op_features: clone-parent, clone-child
	flags: 
	create_timestamp: Fri Jul 12 05:13:34 2019
	access_timestamp: Fri Jul 12 05:13:34 2019
	modify_timestamp: Fri Jul 12 05:13:34 2019
	parent: replicapool/csi-vol-c2af1e5a-a463-11e9-8a0c-c2953b9dc10a@219fc232-148d-43ff-bdf5-d970a6d464bf
	overlap: 1 GiB
sh-4.2# rbd info csi-vol-d8f76c11-a463-11e9-8a0c-c2953b9dc10a --pool=replicapool
rbd image 'csi-vol-d8f76c11-a463-11e9-8a0c-c2953b9dc10a':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: 3a36eb833854
	block_name_prefix: rbd_data.3a36eb833854
	format: 2
	features: layering, operations
	op_features: clone-child
	flags: 
	create_timestamp: Fri Jul 12 05:13:57 2019
	access_timestamp: Fri Jul 12 05:13:57 2019
	modify_timestamp: Fri Jul 12 05:13:57 2019
	parent: replicapool/csi-vol-cba8d8ce-a463-11e9-8a0c-c2953b9dc10a@csi-snap-ca6c7cf3-a463-11e9-8a0c-c2953b9dc10a
	overlap: 1 GiB

dillaman commented Jul 16, 2019

@Madhu-1 sorry for the delay, I didn't see the github notification email.

> am facing one issue while mounting the cloned PVC logs

What kernel are you using? v4.16 (and later) should ignore the "operations" feature flag since it's a no-op for krbd. RHEL 8 should be based on at least v4.18 (I believe).

torvalds/linux@e573427

Madhu-1 (Collaborator, Author) commented Jul 16, 2019

@dillaman I am using CentOS Linux release 7.6.1810 (Core); the kernel version is 3.10.0-957.21.3.el7.x86_64.

Also, I need to check what Travis CI is using.

@dillaman

@Madhu-1 OK, so in addition to the Ceph release restriction for PV cloning (and standalone snapshot creation as the intermediary step), we also need a semi-recent kernel (like RHEL 8, or CentOS 7 with kernel-ml).

@dillaman

@Madhu-1 Looks like Travis would be using Xenial [1]. Bionic should be on the v4.15 kernel, and that feature was backported to v4.15 [2], so it might be worth a test.

[1] https://github.com/ceph/ceph-csi/blob/master/.travis.yml#L4
[2] https://lwn.net/Articles/747916/

Madhu-1 (Collaborator, Author) commented Jul 19, 2019

@dillaman Thanks for the hint. I have tested it locally and it works with the 5.2.1 kernel on CentOS (I will test on Travis with Bionic).

Madhu-1 force-pushed the stand-snap branch 2 times, most recently from acb5f43 to 813e0af on July 22, 2019 at 04:54
With the current implementation in ceph-csi, it is not possible to delete a
cloned volume while its snapshot is present, because of the parent-child
linking. To remove this dependency, we discussed and came up with an idea to
separate the clone and the snapshot, so that the snapshot and the cloned image
can be deleted in any order.

The steps followed to create an independent snapshot are as follows:

* Create a temporary snapshot from the parent volume
* Clone a new image from the temporary snapshot with the options
`--rbd-default-clone-format 2 --image-feature layering,deep-flatten`
* Delete the temporary snapshot that was created
* Create a snapshot with the requested name

* Clone a new image from the snapshot with user-provided options
* Check the depth of the image, since the number of nested volume
clones is limited (16 by default, configurable); if the limit is
reached, flatten the newly cloned image

* Delete the cloned image (earlier we removed the image with the `rbd rm`
command; with the new design we move the images to the trash).
The same applies to normal volume deletion.

* Delete the temporary cloned image which was created for a snapshot
* Delete the snapshot

Example commands:
```
1) rbd snap create <RBD image for src k8s volume>@<random snap name>
2) rbd clone --rbd-default-clone-format 2 --image-feature
layering,deep-flatten <RBD image for src k8s volume>@<random snap name> <RBD image for temporary snap image>
3) rbd snap rm <RBD image for src k8s volume>@<random snap name>
4) rbd snap create <RBD image for temporary snap image>@<random snap name> <k8s snap name>
5) rbd clone --rbd-default-clone-format 2 --image-feature <k8s dst vol config>
<RBD image for temporary snap image>@<random snap name> <RBD image for k8s dst vol>
```

Signed-off-by: Madhu Rajanna <[email protected]>
Madhu-1 (Collaborator, Author) commented Jul 29, 2019

Up for initial review.

dillaman left a comment

Question: these changes will require Mimic+ clusters. I am 100% fine w/ requiring that, but I wanted to double-check whether that was the intent or if backward compatibility with creating old-style snapshots needs to be maintained?

@@ -47,7 +47,8 @@ var (
metadataStorage = flag.String("metadatastorage", "", "metadata persistence method [node|k8s_configmap]")

// rbd related flags
containerized = flag.Bool("containerized", true, "whether run as containerized")
rbdMaxCloneDepth = flag.Uint("rbdmaximumclonedepth", 16, "Maximum number of nested volume clones that are taken before a flatten occurs")
dillaman commented Jul 29, 2019


Nit: while 16 is technically the maximum, I think it should be set much lower before a flatten is required since a deeper depth will impact IO speeds. For example, Cinder uses a max depth of 5 [1] so perhaps copy that as a starting point.

Additionally, in the future it might be nice to have a soft vs hard depth limit (i.e. when you hit the soft depth, it doesn't block PV/snapshot creation but kicks off a background ceph rbd task add flatten task, whereas the hard limit does block until the flatten is complete). In that case, can we name this something like rbdhardmaxclonedepth?

[1] https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L83
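
As an illustration of the soft/hard limit idea above, a minimal Go sketch of the decision logic; the names flattenAction and decideFlatten and the example limits are hypothetical, not part of this PR or of Cinder:

```go
package main

import "fmt"

// flattenAction models the outcome of the depth check described above.
type flattenAction int

const (
	flattenNone         flattenAction = iota
	flattenInBackground               // e.g. schedule "ceph rbd task add flatten <image>"
	flattenNow                        // block until "rbd flatten <image>" completes
)

// decideFlatten sketches a soft/hard clone-depth policy: at or past the
// soft limit, flatten in the background but keep serving the request;
// at or past the hard limit, flatten synchronously before continuing.
func decideFlatten(depth, softLimit, hardLimit uint) flattenAction {
	switch {
	case depth >= hardLimit:
		return flattenNow
	case depth >= softLimit:
		return flattenInBackground
	default:
		return flattenNone
	}
}

func main() {
	fmt.Println(decideFlatten(6, 5, 16)) // prints 1 (flatten in background)
}
```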

@@ -88,6 +88,7 @@ spec:
- "--v=5"
- "--drivername=$(DRIVER_NAME)"
- "--containerized=true"
- "--rbdmaximumclonedepth=14"


Nit: leave it at its default and don't override?

@@ -85,6 +85,7 @@ spec:
- "--v=5"
- "--drivername=rbd.csi.ceph.com"
- "--containerized=true"
- "--rbdmaximumclonedepth=14"


Nit: same comment here

this will help us storing the snap info on rados and also helps in cleaning the
stale snaps
*/
func (cj *CSIJournal) GetTmpNamePrefix(sufix string, first, isSnap bool) string {


Nit: suffix instead of sufix

this will help us storing the snap info on rados and also helps in cleaning the
stale snaps
*/
func (cj *CSIJournal) GetTmpNamePrefix(sufix string, first, isSnap bool) string {


Might be better to split into two functions -- one for volumes and the other for snaps. Having to pass the isSnap param just to pick an entirely different logic path seems odd to me.
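
A rough sketch of the suggested split into two functions; the CSIJournal fields and the naming scheme here are simplified placeholders, not the actual ceph-csi journal code:

```go
package main

import "fmt"

// CSIJournal is a simplified stand-in for the journal type under review.
type CSIJournal struct {
	namespace string
}

// GetTmpVolumeNamePrefix builds the temporary-name prefix used for
// intermediate volume clones.
func (cj *CSIJournal) GetTmpVolumeNamePrefix(suffix string) string {
	return fmt.Sprintf("%s-tmp-vol-%s", cj.namespace, suffix)
}

// GetTmpSnapshotNamePrefix builds the temporary-name prefix used for
// intermediate snapshots, replacing the isSnap flag on the single
// combined helper with a dedicated function.
func (cj *CSIJournal) GetTmpSnapshotNamePrefix(suffix string) string {
	return fmt.Sprintf("%s-tmp-snap-%s", cj.namespace, suffix)
}

func main() {
	cj := &CSIJournal{namespace: "csi"}
	fmt.Println(cj.GetTmpVolumeNamePrefix("1234"))
	fmt.Println(cj.GetTmpSnapshotNamePrefix("1234"))
}
```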

return status.Error(codes.Internal, err.Error())
}

volErr = restoreSnapshot(rbdVol, rbdSnap, cr)
dillaman commented Jul 29, 2019


Nit: can we rename restoreSnapshot to clarify that it's actually just performing an "rbd clone"? i.e. cloneSnapshot or cloneImage?

err = cs.doSnapshot(rbdSnap, cr)
// generate temp snap struct and vol struct to create a temprory snapshot
// and to create a new volume which will be used to take new snapshot
tmpSnap := &rbdSnapshot{}


Perhaps just my opinion, but it's confusing that rbdSnapshot and rbdVolume are being re-used to describe internal RBD images / snapshots that don't have associated CSI snapshots and volumes. It might be nice to re-org these structures to a parent struct that is only the RBD data and a derived class that adds CSI request related fields.
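
A rough sketch of the struct re-organization being suggested, with an RBD-only base struct embedded in CSI-facing types; all type and field names here are illustrative, not the PR's actual definitions:

```go
package main

import "fmt"

// rbdImage holds only the RBD-level data, usable for internal/temporary
// images that have no associated CSI object.
type rbdImage struct {
	Pool      string
	ImageName string
	SizeBytes int64
}

// csiVolume embeds rbdImage and adds the CSI-request-related fields.
type csiVolume struct {
	rbdImage
	VolID       string
	RequestName string
}

// csiSnapshot embeds rbdImage as well, for CSI snapshots.
type csiSnapshot struct {
	rbdImage
	SnapID      string
	SourceVolID string
}

func main() {
	tmp := rbdImage{Pool: "replicapool", ImageName: "csi-vol-tmp"}
	vol := csiVolume{rbdImage: tmp, VolID: "0001", RequestName: "pvc-123"}
	fmt.Println(vol.Pool, vol.ImageName, vol.VolID)
}
```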

"failed to delete snapshot: %s/%s with error: %v",
rbdSnap.Pool, rbdSnap.RbdSnapName, err)
}

if found {
// TODO need to delete stale volumes


What are stale volumes?

klog.V(4).Infof("rbd: rm %s using mon %s, pool %s", image, pOpts.Monitors, pOpts.Pool)
args := []string{"rm", image, "--pool", pOpts.Pool, "--id", cr.ID, "-m", pOpts.Monitors,
klog.V(4).Infof("rbd: trash mv %s using mon %s, pool %s", image, pOpts.Monitors, pOpts.Pool)
args := []string{"trash", "mv", image, "--pool", pOpts.Pool, "--id", cr.ID, "-m", pOpts.Monitors,
dillaman commented Jul 29, 2019


This will need to be combined w/ the ceph rbd task add trash remove call otherwise we will just be leaking images to the trash.

Madhu-1 (Collaborator, Author)


@dillaman as you know, we currently run rbd task add remove to delete the image from the backend.

With this implementation, do I need to replace rbd task add remove with rbd trash mv and rbd task add trash remove to delete the image from the backend?

If the user does not have the manager permission, is rbd trash mv enough?

dillaman commented Oct 16, 2019


I think in a perfect world, if ceph rbd task add trash remove fails, the CSI driver should roll back and fail the CO request. Otherwise, you are just leaking images into the trash, and if folks don't know to go looking in the trash, it will just cause a hidden problem down the road.

Madhu-1 (Collaborator, Author)


@dillaman Thanks for your input

I will keep this in mind and replace rbd task add remove with rbd trash mv and rbd task add trash remove.

But what should we do if the user doesn't have permission to execute rbd task add trash remove? Do I need to fall back to rbd trash rm?

@dillaman

You would need to delete the clone and run rbd trash rm
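
For illustration, a minimal Go sketch of the deletion flow discussed in this thread: move the image to the trash, schedule ceph rbd task add trash remove, and fall back to rbd trash rm when the manager call is not permitted. The helper names and the permission-error check are assumptions, not verified behaviour from this PR:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// run executes a CLI command and returns its combined output; it is a
// hypothetical helper standing in for the driver's command wrappers.
func run(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return string(out), err
}

// deleteImage moves the image to the RBD trash, then asks the Ceph manager
// to purge it asynchronously. If that call is rejected (assumed here to be
// a permission problem reported as "access denied"), it falls back to a
// synchronous "rbd trash rm".
func deleteImage(pool, image, imageID string) error {
	if out, err := run("rbd", "trash", "mv", image, "--pool", pool); err != nil {
		return fmt.Errorf("trash mv failed: %v (%s)", err, out)
	}
	out, err := run("ceph", "rbd", "task", "add", "trash", "remove", pool+"/"+imageID)
	if err == nil {
		return nil
	}
	if strings.Contains(out, "access denied") { // assumed error text
		_, err = run("rbd", "trash", "rm", imageID, "--pool", pool)
	}
	return err
}

func main() {
	if err := deleteImage("replicapool", "csi-vol-example", "3a36eb833854"); err != nil {
		fmt.Println(err)
	}
}
```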

Madhu-1 (Collaborator, Author)


Thanks for the quick response @dillaman

if err != nil {
return nil, err
}
err = updateReservedSnap(rbdSnap, cr)


Why do we need to update the metadata? Shouldn't we know all the finalized names before we start the process?

humblec (Collaborator) commented Mar 3, 2020

@Madhu-1 can you revisit this PR?

nixpanic added this to the release-3.0.0 milestone on Apr 9, 2020
mergify bot (Contributor) commented May 21, 2020

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ceph-csi that referenced this pull request Feb 17, 2025
Syncing latest changes from devel for ceph-csi
Labels
DNM DO NOT MERGE

4 participants