rbd: protect against concurrent gRPC calls #92

pohly · 2018-10-17T12:56:44Z

The timeout value in external-provisioner is fairly low. It's not
uncommon that it times out and retries before the rbdplugin is done
with CreateVolume. rbdplugin has to serialize calls and ensure that
they are idempotent to deal with this.
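A minimal sketch of what "serialize calls and make them idempotent" can look like, assuming a hypothetical in-memory `plugin` type in place of the real rbdplugin state. One `sync.Mutex` serializes every CreateVolume call. A retry with the same name and size returns the volume that already exists instead of provisioning a second one, and a size mismatch is reported as an error. The `volume` type, the `vol-N` IDs and the `creates` counter are illustrative only.

```go
package main

import (
	"fmt"
	"sync"
)

// volume is a hypothetical record of a provisioned image.
type volume struct {
	ID        string
	SizeBytes int64
}

// plugin is a hypothetical stand-in for the rbdplugin controller state.
type plugin struct {
	mu      sync.Mutex        // serializes all CreateVolume calls
	byName  map[string]volume // request name -> provisioned volume
	creates int               // how often the (simulated) backend was called
}

func newPlugin() *plugin {
	return &plugin{byName: make(map[string]volume)}
}

// CreateVolume is idempotent: a retry with the same name and size returns
// the existing volume instead of provisioning a second one.
func (p *plugin) CreateVolume(name string, sizeBytes int64) (volume, error) {
	p.mu.Lock()
	defer p.mu.Unlock()

	if v, ok := p.byName[name]; ok {
		if v.SizeBytes != sizeBytes {
			return volume{}, fmt.Errorf("volume %q already exists with size %d", name, v.SizeBytes)
		}
		return v, nil
	}
	// The slow backend call (creating the RBD image) would happen here.
	p.creates++
	v := volume{ID: fmt.Sprintf("vol-%d", p.creates), SizeBytes: sizeBytes}
	p.byName[name] = v
	return v, nil
}

func main() {
	p := newPlugin()
	ids := make([]string, 2)
	var wg sync.WaitGroup
	// Simulate the original request and the provisioner's retry arriving
	// concurrently with the same name.
	for i := range ids {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			v, err := p.CreateVolume("pvc-1234", 1<<30)
			if err != nil {
				panic(err)
			}
			ids[i] = v.ID
		}(i)
	}
	wg.Wait()
	fmt.Println(ids[0] == ids[1], p.creates) // true 1
}
```

Because the single lock is held across the (simulated) slow backend call, this version provisions only one volume at a time.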

/cc @sbezverk

See also PR #29

rootfs · 2018-10-17T13:47:18Z

For less locking contention, please use a key mutex: https://github.com/kubernetes/kubernetes/blob/master/pkg/util/keymutex/keymutex.go
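To illustrate the idea behind a key mutex, here is a hypothetical, self-contained sketch. It is not the code of the linked Kubernetes package. The `LockKey`/`UnlockKey` names are loosely modeled on that package, but the signatures here are simplified (this `UnlockKey` returns nothing), so check the package for its actual interface. Each key (for example a volume name) gets its own lock, so operations on different volumes run in parallel while operations on the same volume are serialized. A reference count removes map entries once no goroutine holds or waits for a key.

```go
package main

import (
	"fmt"
	"sync"
)

// keyMutex hands out one lock per key. Hypothetical sketch only.
type keyMutex struct {
	mu    sync.Mutex // guards locks
	locks map[string]*refLock
}

// refLock is a per-key lock plus the number of goroutines holding or
// waiting for it.
type refLock struct {
	sync.Mutex
	refs int
}

func newKeyMutex() *keyMutex {
	return &keyMutex{locks: make(map[string]*refLock)}
}

// LockKey blocks only while another goroutine holds the same key.
func (k *keyMutex) LockKey(key string) {
	k.mu.Lock()
	l, ok := k.locks[key]
	if !ok {
		l = &refLock{}
		k.locks[key] = l
	}
	l.refs++
	k.mu.Unlock()
	l.Lock()
}

// UnlockKey releases the key and drops its entry once nobody uses it.
func (k *keyMutex) UnlockKey(key string) {
	k.mu.Lock()
	defer k.mu.Unlock()
	l, ok := k.locks[key]
	if !ok {
		panic("unlock of unlocked key " + key)
	}
	l.refs--
	if l.refs == 0 {
		delete(k.locks, key)
	}
	l.Unlock()
}

func main() {
	km := newKeyMutex()
	counters := map[string]*int{"vol-a": new(int), "vol-b": new(int)}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		for key, c := range counters {
			wg.Add(1)
			go func(key string, c *int) {
				defer wg.Done()
				km.LockKey(key)
				defer km.UnlockKey(key)
				*c++ // safe: only one goroutine per key gets here at a time
			}(key, c)
		}
	}
	wg.Wait()
	fmt.Println(*counters["vol-a"], *counters["vol-b"], len(km.locks)) // 100 100 0
}
```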

pohly · 2018-10-26T12:24:33Z

@rootfs sorry, somehow I missed your comment, hence my late reply.

Just to clarify, your concern is probably not strictly lock contention (= many different threads all trying to lock the same mutex for a short period of time) but rather serializing all provisioning operations such that only one volume can be provisioned at a time, right?

pohly · 2018-10-26T13:35:40Z

@rootfs I've updated the PR with more fine-grained locking.

But this again opens up the possibility that some goroutines might conflict with each other, for example when accessing the same global data structures. I don't know the code or your plans for it well enough to do more, so I have created issue #92 as a reminder.
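A sketch of the remaining risk, assuming a hypothetical global `volumeCache` shared by all gRPC handlers. Per-volume locks do not protect such a structure, because two handlers that hold locks for *different* volumes can still write to the same map at the same time. The map therefore needs its own lock (here a `sync.RWMutex`) no matter how fine-grained the per-volume locking is.

```go
package main

import (
	"fmt"
	"sync"
)

// volumeCache is a hypothetical global structure shared by all handlers.
type volumeCache struct {
	mu   sync.RWMutex
	vols map[string]int64 // volume ID -> size in bytes
}

func (c *volumeCache) Put(id string, size int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.vols[id] = size
}

func (c *volumeCache) Get(id string) (int64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.vols[id]
	return s, ok
}

func (c *volumeCache) Len() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.vols)
}

func main() {
	cache := &volumeCache{vols: make(map[string]int64)}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine works on a different volume, as if it held its
			// own per-volume lock; the shared map still needs cache.mu.
			cache.Put(fmt.Sprintf("vol-%d", i), int64(i))
		}(i)
	}
	wg.Wait()
	fmt.Println(cache.Len()) // 50
}
```

Without `cache.mu`, running this under `go run -race` would report concurrent map writes.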

rootfs · 2018-10-26T14:05:19Z

thanks!

Kubernetes 1.13 has been released, so we can use that instead of some pre-1.13 master branch. csi-test 0.3.5 no longer depends on the post-0.3 CSI spec, so plain 0.3 is fine now. ceph/ceph-csi#92 has been merged. We can use the upstream release again. Because the 0.3 image tag is following the master branch (ceph/ceph-csi#96), we get the latest features, which includes support for storing persistent data in a config map (ceph/ceph-csi#113). That mode worked whereas storing on the node failed with an error about not being able to create the file (probably because the directory hadn't been created). Instead of trying to fix that, the new feature is used. Provisioning tests were failing because patching the driver name was (no longer?) done correctly.

Because gRPC executes each call in a separate goroutine, some calls may run in parallel. Each entry point must be aware of that and deal with it by protecting shared data structures against concurrent access. For the OIM CSI driver and the OIM controller, the Kubernetes keymutex is used, similar to e.g. ceph/ceph-csi#92. For the OIM registry, the in-memory database itself prevents concurrent access. A different backend (like etcd) might handle that differently.
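A sketch of how one entry point can cope with parallel gRPC calls, assuming a hypothetical `nodeServer` with requests reduced to plain strings. As a swapped-in simplification instead of the Kubernetes keymutex, per-volume locks come from a `sync.Map` of `*sync.Mutex` (entries are never freed here). The handler takes the volume's lock first, so its check-then-mount sequence cannot interleave with a concurrent call for the same volume. A separate mutex guards the shared map.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeServer is a hypothetical CSI node service. gRPC runs each request
// in its own goroutine, so handlers must guard whatever state they share.
type nodeServer struct {
	volLocks sync.Map // volume ID -> *sync.Mutex; entries are never freed here

	mu        sync.Mutex        // guards published and mounts
	published map[string]string // volume ID -> target path
	mounts    int               // simulated mount operations performed
}

// lockVolume blocks until the caller holds the lock for id and returns
// the matching unlock function.
func (ns *nodeServer) lockVolume(id string) func() {
	m, _ := ns.volLocks.LoadOrStore(id, &sync.Mutex{})
	mu := m.(*sync.Mutex)
	mu.Lock()
	return mu.Unlock
}

func (ns *nodeServer) NodePublishVolume(volumeID, target string) error {
	unlock := ns.lockVolume(volumeID)
	defer unlock()

	ns.mu.Lock()
	existing, ok := ns.published[volumeID]
	ns.mu.Unlock()
	if ok {
		if existing == target {
			return nil // a retry of a request that already succeeded
		}
		return fmt.Errorf("volume %s already published at %s", volumeID, existing)
	}

	// The slow mount would run here: outside ns.mu, so other volumes are
	// not blocked, but still under this volume's lock.
	ns.mu.Lock()
	ns.mounts++
	ns.published[volumeID] = target
	ns.mu.Unlock()
	return nil
}

func main() {
	ns := &nodeServer{published: make(map[string]string)}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		for _, id := range []string{"vol-1", "vol-2"} {
			wg.Add(1)
			go func(id string) {
				defer wg.Done()
				if err := ns.NodePublishVolume(id, "/mnt/"+id); err != nil {
					panic(err)
				}
			}(id)
		}
	}
	wg.Wait()
	fmt.Println(ns.mounts) // 2
}
```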

pohly mentioned this pull request Oct 17, 2018

fixed a race condition in NodePublishVolume/NodeUnpublishVolume #29

Closed

pohly mentioned this pull request Oct 17, 2018

rbd NodeUnpublishVolume: does not detach #86

Closed

pohly mentioned this pull request Oct 26, 2018

protect against concurrent gRPC calls #94

Closed

rbd: protect against concurrent gRPC calls

720ad4a

pohly force-pushed the concurrency branch from 248c59f to 720ad4a October 26, 2018 13:33

rootfs merged commit 47a7b1f into ceph:master Oct 26, 2018

pohly mentioned this pull request Oct 29, 2018

new release #96

Closed

pohly mentioned this pull request Dec 11, 2018

ControllerPublishVolume on Node called twice with same VolumeID intel/pmem-csi#103

Closed

wilmardo pushed a commit to wilmardo/ceph-csi that referenced this pull request Jul 29, 2019

Merge pull request ceph#92 from pohly/concurrency

a4647bc

rbd: protect against concurrent gRPC calls

Rakshith-R referenced this pull request in Rakshith-R/ceph-csi May 26, 2022

Merge pull request #92 from ceph/devel

9d04fba

Sync downstream devel with upstream devel
