Clarify that plugin may return OK for ControllerUnpublish if node or volume not found #375

davidz627 · 2019-08-07T21:42:13Z (description edited by saad-ali)

This clarifies that the plugin is allowed to return OK if, because the node or volume is missing, the plugin is sure that the volume is detached from that node.

Context: kubernetes-csi/external-attacher#165

Fixes #373

Discussion at CSI Community Meeting Notes here: https://docs.google.com/document/d/1-oiNg5V_GtS_JBAEViVBhZ3BYVFlbSz70hreyaD7c5Y/edit#heading=h.z24362cngqjs

/assign @saad-ali @jdef @julian-hj @jieyu
/cc @gnufied @jsafrane @msau42

Clarifies that the plugin is allowed to return `OK` for `ControllerUnpublishResponse` if the specified node or volume no longer exists and the plugin is sure that the volume is detached from that node.
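To make the clarified rule concrete, here is a minimal Go sketch of the plugin-side decision. It is not the CSI Go bindings or any real driver: `Code`, `Backend`, `DeleteImpliesDetach`, and `ControllerUnpublish` are hypothetical names, and the sketch assumes the backend can guarantee that deleting a volume or terminating a node tears down its attachments. Only under that assumed guarantee is the plugin "sure" the volume is detached and returns `OK`; otherwise it falls back to `NOT_FOUND`, matching the updated error table in spec.md.

```go
package main

import "fmt"

// Code stands in for the gRPC status codes named in the spec table
// (0 OK, 5 NOT_FOUND). Hypothetical: a real plugin returns gRPC status errors.
type Code int

const (
	OK       Code = 0
	NotFound Code = 5
)

// Backend is a hypothetical, in-memory model of a storage provider.
type Backend struct {
	Volumes  map[string]bool   // volume_id -> exists
	Nodes    map[string]bool   // node_id -> exists
	Attached map[string]string // volume_id -> node_id it is attached to
	// DeleteImpliesDetach is an assumed backend guarantee: deleting a volume
	// or terminating a node always tears down its attachments.
	DeleteImpliesDetach bool
}

// ControllerUnpublish sketches the clarified behavior: if the volume or node
// is missing but the plugin is sure the volume is detached from that node,
// return OK; if it cannot be sure, return NOT_FOUND.
func (b *Backend) ControllerUnpublish(volumeID, nodeID string) Code {
	if !b.Volumes[volumeID] || !b.Nodes[nodeID] {
		if b.DeleteImpliesDetach {
			return OK // volume is effectively ControllerUnpublished from the node
		}
		return NotFound // volume not assumed ControllerUnpublished from node
	}
	// Both exist: detach if attached to this node. Unpublishing an already
	// detached volume also returns OK, keeping the call idempotent.
	if b.Attached[volumeID] == nodeID {
		delete(b.Attached, volumeID)
	}
	return OK
}

func main() {
	b := &Backend{
		Volumes:             map[string]bool{"vol-1": true},
		Nodes:               map[string]bool{"node-a": true},
		Attached:            map[string]string{"vol-1": "node-a"},
		DeleteImpliesDetach: true,
	}
	fmt.Println(b.ControllerUnpublish("vol-1", "node-a"))    // 0: detached normally
	fmt.Println(b.ControllerUnpublish("vol-1", "node-gone")) // 0: node gone, detach guaranteed
	b.DeleteImpliesDetach = false
	fmt.Println(b.ControllerUnpublish("vol-gone", "node-a")) // 5: plugin cannot be sure
}
```

A real plugin would report these outcomes through `google.golang.org/grpc/status` errors rather than a bare integer, and "sure" would come from whatever guarantees its own storage backend actually provides.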

saad-ali · 2019-08-07T21:48:18Z

CC @julian-hj @ddebroy

davidz627 · 2019-08-13T19:03:47Z

@saad-ali @jieyu @julian-hj resolved all comments, PTAL

davidz627 · 2019-08-14T21:53:52Z

@jdef added a release note to the PR description

saad-ali

Outside of the comment below (which can be addressed in follow up PR), LGTM.

saad-ali · 2019-08-14T22:48:24Z

spec.md

-| Volume does not exist | 5 NOT_FOUND | Indicates that a volume corresponding to the specified `volume_id` does not exist. | Caller MUST verify that the `volume_id` is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. |
-| Node does not exist | 5 NOT_FOUND | Indicates that a node corresponding to the specified `node_id` does not exist. | Caller MUST verify that the `node_id` is correct and that the node is available and has not been terminated or deleted before retrying with exponential backoff. |
+| Volume does not exist and volume not assumed ControllerUnpublished from node | 5 NOT_FOUND | Indicates that a volume corresponding to the specified `volume_id` does not exist and is not assumed to be ControllerUnpublished from node corresponding to the specified `node_id`. | Caller MUST verify that the `volume_id` is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. |
+| Node does not exist and volume not assumed ControllerUnpublished from node  | 5 NOT_FOUND | Indicates that a node corresponding to the specified `node_id` does not exist and the volume corresponding to the specified `volume_id` is not assumed to be ControllerUnpublished from node. | Caller MUST verify that the `node_id` is correct and that the node is available and has not been terminated or deleted before retrying with exponential backoff. |


Part of the reason for this change is that the current Kubernetes handling of this case is wrong, and this change accommodates that. However, even after this change, the spec only says the plugin SHOULD return `0 OK`, which means the SP could still return `NOT_FOUND`. Per that error code's Recovery Behavior, the Caller MUST verify that the `..._id` is correct... before retrying with exponential backoff, which the caller cannot do because it has no way to verify whether a given node or volume still exists. Therefore, we should either loosen the Recovery Behavior to "Caller SHOULD verify..." or, really, fix this in CSI 2.0: `NOT_FOUND` should not be allowed as an error code, and the SP MUST return `OK` if the volume or node is gone and the volume is effectively detached.

saad-ali · 2019-08-14T22:51:42Z

Merging

jdef · 2019-08-15T03:11:34Z

Should we file an issue to track the suggested 2.0 change?

davidz627 · 2019-08-15T16:55:44Z

@jdef #382

davidz627 force-pushed the fix/unpublish branch from bc156b2 to 5cf78c1 on August 7, 2019 21:43

saad-ali reviewed Aug 7, 2019

saad-ali assigned jdef Aug 7, 2019

davidz627 force-pushed the fix/unpublish branch from 5cf78c1 to e9147b7 on August 7, 2019 21:51

jieyu reviewed Aug 7, 2019

davidz627 force-pushed the fix/unpublish branch 2 times, most recently from 707b766 to 11e3af6 on August 8, 2019 18:23

davidz627 mentioned this pull request Aug 8, 2019

Travis CI failing with undefined proto errors for trivial text change PR's #377

Closed

misterikkit reviewed Aug 8, 2019

Clarify that plugin may return OK for ControllerUnpublish if node or volume not found (938e4ca)

davidz627 force-pushed the fix/unpublish branch from 11e3af6 to 938e4ca on August 13, 2019 19:00

jieyu approved these changes Aug 14, 2019

jdef added the needs-release-note label Aug 14, 2019

jdef approved these changes Aug 14, 2019

saad-ali approved these changes Aug 14, 2019

saad-ali merged commit 375efea into container-storage-interface:master Aug 14, 2019

davidz627 deleted the fix/unpublish branch August 14, 2019 23:01

davidz627 mentioned this pull request Aug 15, 2019

Change recovery behavior of NOT_FOUND in ControllerUnpublishVolume to a SHOULD #383

Merged

davidz627 mentioned this pull request Aug 15, 2019

ControllerUnpublishVolume should require "OK" when node/volume unpublished and get rid of "recoverable" NOT FOUND errors #382

Open

msau42 mentioned this pull request Apr 24, 2020

Remove missing volume test case for NodeUnpublishVolume kubernetes-csi/csi-test#258

Merged

jsafrane mentioned this pull request May 5, 2020

CSI ephemeral volumes: idempotent NodeUnpublishVolume kubernetes/kubernetes#90752

Open

timoreimann mentioned this pull request May 20, 2020

Allow plugin to return OK if NodeUnpublishVolume cannot find a volume #433

Open
