-
Notifications
You must be signed in to change notification settings - Fork 26
Conversation
Signed-off-by: 27149chen <[email protected]>
Signed-off-by: 27149chen <[email protected]>
# Conflicts: # resources/resources.go
Signed-off-by: 27149chen <[email protected]>
Signed-off-by: 27149chen <[email protected]>
Signed-off-by: 27149chen <[email protected]>
Signed-off-by: 27149chen <[email protected]>
Signed-off-by: 27149chen <[email protected]>
Signed-off-by: 27149chen <[email protected]>
Signed-off-by: 27149chen <[email protected]>
Signed-off-by: 27149chen <[email protected]>
Signed-off-by: 27149chen <[email protected]>
Signed-off-by: 27149chen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Thanks for updating the description of the PR with the original order vs the new one.
About the order: can you explain why you need to run "Rescan iscsi session if it is a iscsi host" as the first step of the unmount order? -
Is there any new idempotent cases by the new order?
-
you mentioned that no rescan -r in the unmount. but do we still use rescan in the mount phase? then i wonder what happned if create pod with PV1 happened in the same time of delete PV2 in the same host. is there any new risk here?
-
is this branch passed staging on A9000 and SVC?
Reviewable status: 0 of 17 files reviewed, 14 unresolved discussions (waiting on @27149chen and @shay-berman)
remote/mounter/block_device_utils/block_device_utils.go, line 48 at r2 (raw file):
iscsiConnector = connectors.NewISCSIConnectorWithExecutor(executor) } else { fcConnector = conns[0]
not sure i like this setting
how u ensure that [0] is fc and 1 is iscsi? can u check it by the object it self? and what if you get only 1 conns? you may get error on run time.
r u sure its the right thing to do?
remote/mounter/block_device_utils/rescan.go, line 30 at r2 (raw file):
const rescanIscsiTimeout = 1 * 60 * 1000 const rescanScsiTimeout = 2 * 60 * 1000 const ISCSIADM = "iscsiadm"
try to keep camel case, since capital is not standard in go even for const.
remote/mounter/block_device_utils/rescan.go, line 55 at r2 (raw file):
return b.CleanupSCSIDevices(volumeMountProperties) case ISCSI: return b.CleanupISCSIDevices(volumeMountProperties)
why u need the volumemountproperties here?
remote/mounter/initiator/linuxscsi.go, line 15 at r2 (raw file):
const multipathCmd = "multipath" const FlushTimeout = 10 * 1000 const FlushRetries = 3
if u can please state why 3 time was choosen.
also please add in the PR description that you decided to increase the rescan (and if u can please say why)
remote/mounter/initiator/connectors/fibre_channel.go, line 36 at r2 (raw file):
func newFibreChannelConnectorWithExecutorAndLogger(executor utils.Executor, logger logs.Logger) *fibreChannelConnector { initi := initiator.NewLinuxFibreChannelWithExecutor(executor) return &fibreChannelConnector{&scsiConnector{logger: logger, exec: executor, initi: initi}}
what is initi?
could you please provide self-descriptive name for the arg and also for the key in the scsiConnector. maybe connector object? not sure from the code.
remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):
// ConnectVolume attach the volume to host by rescaning all the active FC HBAs. func (c *fibreChannelConnector) ConnectVolume(volumeMountProperties *resources.VolumeMountProperties) error { hbas := c.initi.GetHBAs()
is there any difference to get the HBA if u r scsi vs iscsi?
remote/mounter/initiator/connectors/iscsi.go, line 11 at r2 (raw file):
type iscsiConnector struct { *scsiConnector
its a bit confusing that the struct is iscsiconnector and the attribute inside is scsi without I.
remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):
It("should call multipath and remove all the scsi devices", func() { err := iscsiConnector.DisconnectVolume(volumeMountProperties)
- a bit confusing the title is scsi and then u use iscsi. is there any differance between them?
- in the PR you mentioned u just to iscsi rescan before unmount flow - is this the only change regarding to iscsi?
- u test here only that the physical devices removed, but what about the mpath device? do u need to check it as well?
remote/mounter/initiator/connectors/scsi.go, line 27 at r2 (raw file):
_, devMapper, devNames, err := utils.GetMultipathOutputAndDeviceMapperAndDevice(volumeMountProperties.WWN, c.exec) if err != nil { return c.logger.ErrorRet(err, "Failed to get multipath output before disconnecting volume")
is there any idempotent case here that should be handled? not sure, just asking.
remote/mounter/initiator/connectors/scsi.go, line 30 at r2 (raw file):
} if devMapper == "" { // The device is already removed
so we actually assume that if no mpath found , then there is no physical devices for it, right?
remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):
// flush multipath device c.logger.Info("Flush multipath device", logs.Args{{"name", devMapper}}) c.initi.FlushMultipath(devMapper)
please handle error if such happned inside the function
remote/mounter/initiator/connectors/scsi.go, line 46 at r2 (raw file):
err = c.removeDevices(devices) if err != nil { return c.logger.ErrorRet(err, "Failed to remove devices")
so actually if it fails here for some reason it will never try to remove it again, because u already removed the mpath device, right?
remote/mounter/initiator/connectors/scsi.go, line 51 at r2 (raw file):
// If flushing the multipath failed before, try now after we have removed the devices. c.logger.Info("Flush multipath device again after removing the devices", logs.Args{{"name", devMapper}}) c.initi.FlushMultipath(devMapper)
- please handle errors
- and maybe trigger it only if it was failed few lines before
- any specific reason why you trigger it again? why the mpath removal will fail in the first place? and why you think that second time removal will address it? is it just from experience?
utils/mpath.go, line 77 at r2 (raw file):
index := bodyRegex.FindStringIndex(text) trimedText := text[index[0]:] deviceName := strings.Split(trimedText, " ")[1]
make sure u have unit tests with faulty devices as well.
what happened if its faulty device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- no need, I removed it.
- hope no.
- We now use another way to resan HBAs, I don't think there are any new risks here.
- passed svc iscsi and ds8k fc.
Reviewable status: 0 of 17 files reviewed, 14 unresolved discussions (waiting on @27149chen and @shay-berman)
remote/mounter/block_device_utils/block_device_utils.go, line 48 at r2 (raw file):
Previously, shay-berman wrote…
not sure i like this setting
how u ensure that [0] is fc and 1 is iscsi? can u check it by the object it self? and what if you get only 1 conns? you may get error on run time.
r u sure its the right thing to do?
- It is used only for UT, so I can make sure 0 is fc and 1 is iscsi,
- If only one conn is passed, ut will fail
remote/mounter/block_device_utils/rescan.go, line 30 at r2 (raw file):
Previously, shay-berman wrote…
try to keep camel case, since capital is not standard in go even for const.
Done.
remote/mounter/block_device_utils/rescan.go, line 55 at r2 (raw file):
Previously, shay-berman wrote…
why u need the volumemountproperties here?
- I need the volume wwn
- keep same with FC
remote/mounter/initiator/linuxscsi.go, line 15 at r2 (raw file):
Previously, shay-berman wrote…
if u can please state why 3 time was choosen.
also please add in the PR description that you decided to increase the rescan (and if u can please say why)
no special reason, os-brick retry 3 times
remote/mounter/initiator/connectors/fibre_channel.go, line 36 at r2 (raw file):
Previously, shay-berman wrote…
what is initi?
could you please provide self-descriptive name for the arg and also for the key in the scsiConnector. maybe connector object? not sure from the code.
initi = initiator object, will be a fc or iscsi initiator
remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):
Previously, shay-berman wrote…
is there any difference to get the HBA if u r scsi vs iscsi?
yes, fc will return fc hbas, and iscsi will return iscsi initiator(s) (not implemented yet)
remote/mounter/initiator/connectors/iscsi.go, line 11 at r2 (raw file):
Previously, shay-berman wrote…
its a bit confusing that the struct is iscsiconnector and the attribute inside is scsi without I.
scsiConnector is the Base of iscsiConnector and fcConnector
remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):
Previously, shay-berman wrote…
- a bit confusing the title is scsi and then u use iscsi. is there any differance between them?
- in the PR you mentioned u just to iscsi rescan before unmount flow - is this the only change regarding to iscsi?
- u test here only that the physical devices removed, but what about the mpath device? do u need to check it as well?
- changed to iscsi
- for now, yes.
- good point, I think it'll be better if we can check and retry, but it is a big code change, i'll do it next release.
remote/mounter/initiator/connectors/scsi.go, line 27 at r2 (raw file):
Previously, shay-berman wrote…
is there any idempotent case here that should be handled? not sure, just asking.
I guess no
remote/mounter/initiator/connectors/scsi.go, line 30 at r2 (raw file):
Previously, shay-berman wrote…
so we actually assume that if no mpath found , then there is no physical devices for it, right?
yes, because we have no way to find it, will improve it in next release, we should not rely on multipath
remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):
Previously, shay-berman wrote…
please handle error if such happned inside the function
there is no error to handle for the "multipath -f"
remote/mounter/initiator/connectors/scsi.go, line 46 at r2 (raw file):
Previously, shay-berman wrote…
so actually if it fails here for some reason it will never try to remove it again, because u already removed the mpath device, right?
yes
remote/mounter/initiator/connectors/scsi.go, line 51 at r2 (raw file):
Previously, shay-berman wrote…
- please handle errors
- and maybe trigger it only if it was failed few lines before
- any specific reason why you trigger it again? why the mpath removal will fail in the first place? and why you think that second time removal will address it? is it just from experience?
- I think no error need to be handle about "multipath -f"
- multipath -f will return directly if device is not found
- from os-brick code logic
utils/mpath.go, line 77 at r2 (raw file):
Previously, shay-berman wrote…
make sure u have unit tests with faulty devices as well.
what happened if its faulty device?
faulty device output is similar with regular one, just different status in same location
Signed-off-by: 27149chen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ok cool
- ack
- ack
- ack
Reviewable status: 0 of 18 files reviewed, 2 unresolved discussions (waiting on @27149chen)
remote/mounter/block_device_utils/block_device_utils.go, line 48 at r2 (raw file):
Previously, 27149chen wrote…
- It is used only for UT, so I can make sure 0 is fc and 1 is iscsi,
- If only one conn is passed, ut will fail
ok
remote/mounter/initiator/connectors/fibre_channel.go, line 36 at r2 (raw file):
Previously, 27149chen wrote…
initi = initiator object, will be a fc or iscsi initiator
ok
must say that the naming is confusing, because it looks like that the Initiator is actually object that do rescan.
rename it only if u have time and u think it can help to clarify the code.
but anyway try to keep full name rather then initi.
remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):
Previously, 27149chen wrote…
yes, fc will return fc hbas, and iscsi will return iscsi initiator(s) (not implemented yet)
ok, i probably didn't see implementation so i asked for the difference.
r u going to implement iscsi it as part of this PR?
remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):
Previously, 27149chen wrote…
- changed to iscsi
- for now, yes.
- good point, I think it'll be better if we can check and retry, but it is a big code change, i'll do it next release.
- ack
- i think you mentioned that you going to remove the "iscsi rescan before unmount flow". So I guess there is nothing special for iscsi in this PR. is it corrent?
- ok, if u think the code is not covered enough then try to add at least 1 UT to cover it.
remote/mounter/initiator/connectors/scsi.go, line 30 at r2 (raw file):
Previously, 27149chen wrote…
yes, because we have no way to find it, will improve it in next release, we should not rely on multipath
if u think its very important fix, then open jira ticket for it.
remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):
Previously, 27149chen wrote…
there is no error to handle for the "multipath -f"
sound strange but ok.
i asked it because you have a comment that says "If flushing the multipath failed before, try now after we have removed the devices."
so i guessed that the command line failed with exit code.
remote/mounter/initiator/connectors/scsi.go, line 46 at r2 (raw file):
Previously, 27149chen wrote…
yes
ok sounds like a potential faulty devices left behind (rare case).
is this related to the issue you mentioned before to avoid using multipath -ll ? if so then please specify it inside the new jira ticket.
utils/mpath.go, line 77 at r2 (raw file):
Previously, 27149chen wrote…
faulty device output is similar with regular one, just different status in same location
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 18 files reviewed, 2 unresolved discussions (waiting on @27149chen and @shay-berman)
remote/mounter/initiator/connectors/fibre_channel.go, line 36 at r2 (raw file):
Previously, shay-berman wrote…
ok
must say that the naming is confusing, because it looks like that the Initiator is actually object that do rescan.
rename it only if u have time and u think it can help to clarify the code.but anyway try to keep full name rather then initi.
ok, updated.
remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):
Previously, shay-berman wrote…
ok, i probably didn't see implementation so i asked for the difference.
r u going to implement iscsi it as part of this PR?
no
remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):
Previously, shay-berman wrote…
- ack
- i think you mentioned that you going to remove the "iscsi rescan before unmount flow". So I guess there is nothing special for iscsi in this PR. is it corrent?
- ok, if u think the code is not covered enough then try to add at least 1 UT to cover it.
- yes
- I will add it when check-and-retry is implemented, currently, I think nothing to test.
remote/mounter/initiator/connectors/scsi.go, line 30 at r2 (raw file):
Previously, shay-berman wrote…
if u think its very important fix, then open jira ticket for it.
ok, created. UB-1953
remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):
Previously, shay-berman wrote…
sound strange but ok.
i asked it because you have a comment that says "If flushing the multipath failed before, try now after we have removed the devices."
so i guessed that the command line failed with exit code.
For some rare cases first multipath flush will fail and return device in use, but after io flush and device removal, next multipath flush will succeed.
remote/mounter/initiator/connectors/scsi.go, line 46 at r2 (raw file):
Previously, shay-berman wrote…
ok sounds like a potential faulty devices left behind (rare case).
is this related to the issue you mentioned before to avoid using multipath -ll ? if so then please specify it inside the new jira ticket.
yes, if we can avoid using multipath output, we will retry after failure.
Signed-off-by: 27149chen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 18 files reviewed, all discussions resolved (waiting on @27149chen)
remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):
Previously, 27149chen wrote…
no
ok, does it means after this PR the iscsi env in CI will be broken? if so i would suggest to do it in this PR. if not then LGTM
remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):
Previously, 27149chen wrote…
- yes
- I will add it when check-and-retry is implemented, currently, I think nothing to test.
- if u think its important make sure u have jira ticket for it.
its ok
remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):
Previously, 27149chen wrote…
For some rare cases first multipath flush will fail and return device in use, but after io flush and device removal, next multipath flush will succeed.
ok
if u can add a comment line about your insight.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 18 files reviewed, all discussions resolved (waiting on @27149chen)
remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):
Previously, shay-berman wrote…
ok, does it means after this PR the iscsi env in CI will be broken? if so i would suggest to do it in this PR. if not then LGTM
won't be broken
remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):
Previously, shay-berman wrote…
- if u think its important make sure u have jira ticket for it.
its ok
ticket is created
remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):
Previously, shay-berman wrote…
ok
if u can add a comment line about your insight.
lgtm
comment added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 18 files reviewed, all discussions resolved (waiting on @27149chen)
Description of the issue
There is an issue with the new rescan method introduced in PR #294 that the unmount didn't clean up the physical devices.
More detail : In the new cleanup proceduce, we get the multipath devices belonging to the same volume from multipath -ll output, but it happens in ActionAfterDetach, before that, a multipath -f is called, which will remove the volume record from the multipath -ll output. So that we can't get what we want, and the command "echo 1 > /sys/block/sdx/device/delete" won't be called. It will lead to stale info (stale paths in /dev/disk/by-path/) on host.
Description of the optimal solution
Origin order of flex unmount is:
Change the order of the flex unmount as follow:
This change is