-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rbd: protect against concurrent gRPC calls #92
Conversation
for less locking contention, please use a key mutex https://github.com/kubernetes/kubernetes/blob/master/pkg/util/keymutex/keymutex.go |
@rootfs sorry, somehow I missed your comment, hence my late reply. Just to clarify, your concern is probably not strictly lock contention (= many different threads all trying to lock the same mutex for a short period of time) but rather serializing all provisioning operations such that only one volume can be provisioned at a time, right? |
The timeout value in external-provisioner is fairly low. It's not uncommon that it times out and retries before the rbdplugin is done with CreateVolume. rbdplugin has to serialize calls and ensure that they are idempotent to deal with this.
@rootfs I've update the PR with more fine-grained locking. But this opens up again the possibility that some goroutines might conflict each other, for example when accessing the same global data structures. I don't know the code well or your plans around it well enough to do more, so I have created issue #92 as a reminder. |
thanks! |
Kubernetes 1.13 has been released, so we can use that instead of some pre-1.13 master branch. csi-test 0.3.5 no longer depends on the post-0.3 CSI spec, so plain 0.3 is fine now. ceph/ceph-csi#92 has been merged. We can use the upstream release again. Because the 0.3 image tag is following the master branch (ceph/ceph-csi#96), we get the latest features, which includes support for storing persistent data in a config map (ceph/ceph-csi#113). That mode worked whereas storing on the node failed with an error about not being able to create the file (probably because the directory hadn't been created). Instead of trying to fix that, the new feature is used. Provisioning tests were failing because patching the driver name was (no longer?) done correctly.
Because gRPC executes each call in a separate goroutine, some calls my run in parallel. Each entry point must be aware of that and deal with it by protecting shared data structures against concurrent access. For OIM CSI driver and OIM controller, the Kubernetes keymutex is used, similar to e.g. ceph/ceph-csi#92 For OIM registry, the in-memory database itself prevents concurrent access. A different backend (like etcd) might handle that differently.
rbd: protect against concurrent gRPC calls
Sync downstream devel with upstream devel
The timeout value in external-provisioner is fairly low. It's not
uncommon that it times out and retries before the rbdplugin is done
with CreateVolume. rbdplugin has to serialize calls and ensure that
they are idempotent to deal with this.
/cc @sbezverk
See also PR #29