Skip to content
This repository was archived by the owner on Jul 22, 2024. It is now read-only.

Feature/ub 1941 unmount fix #299

Merged
merged 16 commits into from
Mar 26, 2019
Merged

Feature/ub 1941 unmount fix #299

merged 16 commits into from
Mar 26, 2019

Conversation

27149chen
Copy link
Contributor

@27149chen 27149chen commented Mar 19, 2019

Description of the issue
There is an issue with the new rescan method introduced in PR #294 that the unmount didn't clean up the physical devices.
More detail : In the new cleanup proceduce, we get the multipath devices belonging to the same volume from multipath -ll output, but it happens in ActionAfterDetach, before that, a multipath -f is called, which will remove the volume record from the multipath -ll output. So that we can't get what we want, and the command "echo 1 > /sys/block/sdx/device/delete" won't be called. It will lead to stale info (stale paths in /dev/disk/by-path/) on host.

Description of the optimal solution
Origin order of flex unmount is:

  1. Flush the multipth device: multipath -f mpathx
  2. Unmap the volume from storage
  3. Rescan iscsi session
  4. resan-scsi-bus.sh -r

Change the order of the flex unmount as follow:

  1. Rescan iscsi session if it is a iscsi host
  2. Flush the multipth device: multipath -f mpathx
  3. Remove the devices: echo 1 > /sys/block/sdx/device/delete
  4. Unmap the volume from storage

This change is Reviewable

@27149chen 27149chen requested a review from shay-berman March 19, 2019 10:13
@coveralls
Copy link

coveralls commented Mar 19, 2019

Coverage Status

Coverage remained the same at 66.484% when pulling 25032d0 on feature/UB-1941_unmount_fix into 8e7f868 on dev.

@27149chen 27149chen self-assigned this Mar 19, 2019
@27149chen 27149chen added this to the Helm chart support milestone Mar 19, 2019
Copy link
Contributor

@shay-berman shay-berman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Thanks for updating the description of the PR with the original order vs the new one.
    About the order: can you explain why you need to run "Rescan iscsi session if it is a iscsi host" as the first step of the unmount order?

  2. Is there any new idempotent cases by the new order?

  3. you mentioned that no rescan -r in the unmount. but do we still use rescan in the mount phase? then i wonder what happned if create pod with PV1 happened in the same time of delete PV2 in the same host. is there any new risk here?

  4. is this branch passed staging on A9000 and SVC?

Reviewable status: 0 of 17 files reviewed, 14 unresolved discussions (waiting on @27149chen and @shay-berman)


remote/mounter/block_device_utils/block_device_utils.go, line 48 at r2 (raw file):

		iscsiConnector = connectors.NewISCSIConnectorWithExecutor(executor)
	} else {
		fcConnector = conns[0]

not sure i like this setting
how u ensure that [0] is fc and 1 is iscsi? can u check it by the object it self? and what if you get only 1 conns? you may get error on run time.
r u sure its the right thing to do?


remote/mounter/block_device_utils/rescan.go, line 30 at r2 (raw file):

const rescanIscsiTimeout = 1 * 60 * 1000
const rescanScsiTimeout = 2 * 60 * 1000
const ISCSIADM = "iscsiadm"

try to keep camel case, since capital is not standard in go even for const.


remote/mounter/block_device_utils/rescan.go, line 55 at r2 (raw file):

		return b.CleanupSCSIDevices(volumeMountProperties)
	case ISCSI:
		return b.CleanupISCSIDevices(volumeMountProperties)

why u need the volumemountproperties here?


remote/mounter/initiator/linuxscsi.go, line 15 at r2 (raw file):

const multipathCmd = "multipath"
const FlushTimeout = 10 * 1000
const FlushRetries = 3

if u can please state why 3 time was choosen.
also please add in the PR description that you decided to increase the rescan (and if u can please say why)


remote/mounter/initiator/connectors/fibre_channel.go, line 36 at r2 (raw file):

func newFibreChannelConnectorWithExecutorAndLogger(executor utils.Executor, logger logs.Logger) *fibreChannelConnector {
	initi := initiator.NewLinuxFibreChannelWithExecutor(executor)
	return &fibreChannelConnector{&scsiConnector{logger: logger, exec: executor, initi: initi}}

what is initi?
could you please provide self-descriptive name for the arg and also for the key in the scsiConnector. maybe connector object? not sure from the code.


remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):

// ConnectVolume attach the volume to host by rescaning all the active FC HBAs.
func (c *fibreChannelConnector) ConnectVolume(volumeMountProperties *resources.VolumeMountProperties) error {
	hbas := c.initi.GetHBAs()

is there any difference to get the HBA if u r scsi vs iscsi?


remote/mounter/initiator/connectors/iscsi.go, line 11 at r2 (raw file):

type iscsiConnector struct {
	*scsiConnector

its a bit confusing that the struct is iscsiconnector and the attribute inside is scsi without I.


remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):

		It("should call multipath and remove all the scsi devices", func() {
			err := iscsiConnector.DisconnectVolume(volumeMountProperties)
  1. a bit confusing the title is scsi and then u use iscsi. is there any differance between them?
  2. in the PR you mentioned u just to iscsi rescan before unmount flow - is this the only change regarding to iscsi?
  3. u test here only that the physical devices removed, but what about the mpath device? do u need to check it as well?

remote/mounter/initiator/connectors/scsi.go, line 27 at r2 (raw file):

	_, devMapper, devNames, err := utils.GetMultipathOutputAndDeviceMapperAndDevice(volumeMountProperties.WWN, c.exec)
	if err != nil {
		return c.logger.ErrorRet(err, "Failed to get multipath output before disconnecting volume")

is there any idempotent case here that should be handled? not sure, just asking.


remote/mounter/initiator/connectors/scsi.go, line 30 at r2 (raw file):

	}
	if devMapper == "" {
		// The device is already removed

so we actually assume that if no mpath found , then there is no physical devices for it, right?


remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):

	// flush multipath device
	c.logger.Info("Flush multipath device", logs.Args{{"name", devMapper}})
	c.initi.FlushMultipath(devMapper)

please handle error if such happned inside the function


remote/mounter/initiator/connectors/scsi.go, line 46 at r2 (raw file):

	err = c.removeDevices(devices)
	if err != nil {
		return c.logger.ErrorRet(err, "Failed to remove devices")

so actually if it fails here for some reason it will never try to remove it again, because u already removed the mpath device, right?


remote/mounter/initiator/connectors/scsi.go, line 51 at r2 (raw file):

	// If flushing the multipath failed before, try now after we have removed the devices.
	c.logger.Info("Flush multipath device again after removing the devices", logs.Args{{"name", devMapper}})
	c.initi.FlushMultipath(devMapper)
  1. please handle errors
  2. and maybe trigger it only if it was failed few lines before
  3. any specific reason why you trigger it again? why the mpath removal will fail in the first place? and why you think that second time removal will address it? is it just from experience?

utils/mpath.go, line 77 at r2 (raw file):

				index := bodyRegex.FindStringIndex(text)
				trimedText := text[index[0]:]
				deviceName := strings.Split(trimedText, " ")[1]

make sure u have unit tests with faulty devices as well.
what happened if its faulty device?

Copy link
Contributor Author

@27149chen 27149chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. no need, I removed it.
  2. hope no.
  3. We now use another way to resan HBAs, I don't think there are any new risks here.
  4. passed svc iscsi and ds8k fc.

Reviewable status: 0 of 17 files reviewed, 14 unresolved discussions (waiting on @27149chen and @shay-berman)


remote/mounter/block_device_utils/block_device_utils.go, line 48 at r2 (raw file):

Previously, shay-berman wrote…

not sure i like this setting
how u ensure that [0] is fc and 1 is iscsi? can u check it by the object it self? and what if you get only 1 conns? you may get error on run time.
r u sure its the right thing to do?

  1. It is used only for UT, so I can make sure 0 is fc and 1 is iscsi,
  2. If only one conn is passed, ut will fail

remote/mounter/block_device_utils/rescan.go, line 30 at r2 (raw file):

Previously, shay-berman wrote…

try to keep camel case, since capital is not standard in go even for const.

Done.


remote/mounter/block_device_utils/rescan.go, line 55 at r2 (raw file):

Previously, shay-berman wrote…

why u need the volumemountproperties here?

  1. I need the volume wwn
  2. keep same with FC

remote/mounter/initiator/linuxscsi.go, line 15 at r2 (raw file):

Previously, shay-berman wrote…

if u can please state why 3 time was choosen.
also please add in the PR description that you decided to increase the rescan (and if u can please say why)

no special reason, os-brick retry 3 times


remote/mounter/initiator/connectors/fibre_channel.go, line 36 at r2 (raw file):

Previously, shay-berman wrote…

what is initi?
could you please provide self-descriptive name for the arg and also for the key in the scsiConnector. maybe connector object? not sure from the code.

initi = initiator object, will be a fc or iscsi initiator


remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):

Previously, shay-berman wrote…

is there any difference to get the HBA if u r scsi vs iscsi?

yes, fc will return fc hbas, and iscsi will return iscsi initiator(s) (not implemented yet)


remote/mounter/initiator/connectors/iscsi.go, line 11 at r2 (raw file):

Previously, shay-berman wrote…

its a bit confusing that the struct is iscsiconnector and the attribute inside is scsi without I.

scsiConnector is the Base of iscsiConnector and fcConnector


remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):

Previously, shay-berman wrote…
  1. a bit confusing the title is scsi and then u use iscsi. is there any differance between them?
  2. in the PR you mentioned u just to iscsi rescan before unmount flow - is this the only change regarding to iscsi?
  3. u test here only that the physical devices removed, but what about the mpath device? do u need to check it as well?
  1. changed to iscsi
  2. for now, yes.
  3. good point, I think it'll be better if we can check and retry, but it is a big code change, i'll do it next release.

remote/mounter/initiator/connectors/scsi.go, line 27 at r2 (raw file):

Previously, shay-berman wrote…

is there any idempotent case here that should be handled? not sure, just asking.

I guess no


remote/mounter/initiator/connectors/scsi.go, line 30 at r2 (raw file):

Previously, shay-berman wrote…

so we actually assume that if no mpath found , then there is no physical devices for it, right?

yes, because we have no way to find it, will improve it in next release, we should not rely on multipath


remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):

Previously, shay-berman wrote…

please handle error if such happned inside the function

there is no error to handle for the "multipath -f"


remote/mounter/initiator/connectors/scsi.go, line 46 at r2 (raw file):

Previously, shay-berman wrote…

so actually if it fails here for some reason it will never try to remove it again, because u already removed the mpath device, right?

yes


remote/mounter/initiator/connectors/scsi.go, line 51 at r2 (raw file):

Previously, shay-berman wrote…
  1. please handle errors
  2. and maybe trigger it only if it was failed few lines before
  3. any specific reason why you trigger it again? why the mpath removal will fail in the first place? and why you think that second time removal will address it? is it just from experience?
  1. I think no error need to be handle about "multipath -f"
  2. multipath -f will return directly if device is not found
  3. from os-brick code logic

utils/mpath.go, line 77 at r2 (raw file):

Previously, shay-berman wrote…

make sure u have unit tests with faulty devices as well.
what happened if its faulty device?

faulty device output is similar with regular one, just different status in same location

Signed-off-by: 27149chen <[email protected]>
Copy link
Contributor

@shay-berman shay-berman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. ok cool
  2. ack
  3. ack
  4. ack

Reviewable status: 0 of 18 files reviewed, 2 unresolved discussions (waiting on @27149chen)


remote/mounter/block_device_utils/block_device_utils.go, line 48 at r2 (raw file):

Previously, 27149chen wrote…
  1. It is used only for UT, so I can make sure 0 is fc and 1 is iscsi,
  2. If only one conn is passed, ut will fail

ok


remote/mounter/initiator/connectors/fibre_channel.go, line 36 at r2 (raw file):

Previously, 27149chen wrote…

initi = initiator object, will be a fc or iscsi initiator

ok
must say that the naming is confusing, because it looks like that the Initiator is actually object that do rescan.
rename it only if u have time and u think it can help to clarify the code.

but anyway try to keep full name rather then initi.


remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):

Previously, 27149chen wrote…

yes, fc will return fc hbas, and iscsi will return iscsi initiator(s) (not implemented yet)

ok, i probably didn't see implementation so i asked for the difference.
r u going to implement iscsi it as part of this PR?


remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):

Previously, 27149chen wrote…
  1. changed to iscsi
  2. for now, yes.
  3. good point, I think it'll be better if we can check and retry, but it is a big code change, i'll do it next release.
  1. ack
  2. i think you mentioned that you going to remove the "iscsi rescan before unmount flow". So I guess there is nothing special for iscsi in this PR. is it corrent?
  3. ok, if u think the code is not covered enough then try to add at least 1 UT to cover it.

remote/mounter/initiator/connectors/scsi.go, line 30 at r2 (raw file):

Previously, 27149chen wrote…

yes, because we have no way to find it, will improve it in next release, we should not rely on multipath

if u think its very important fix, then open jira ticket for it.


remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):

Previously, 27149chen wrote…

there is no error to handle for the "multipath -f"

sound strange but ok.
i asked it because you have a comment that says "If flushing the multipath failed before, try now after we have removed the devices."
so i guessed that the command line failed with exit code.


remote/mounter/initiator/connectors/scsi.go, line 46 at r2 (raw file):

Previously, 27149chen wrote…

yes

ok sounds like a potential faulty devices left behind (rare case).
is this related to the issue you mentioned before to avoid using multipath -ll ? if so then please specify it inside the new jira ticket.


utils/mpath.go, line 77 at r2 (raw file):

Previously, 27149chen wrote…

faulty device output is similar with regular one, just different status in same location

ok

Copy link
Contributor Author

@27149chen 27149chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 18 files reviewed, 2 unresolved discussions (waiting on @27149chen and @shay-berman)


remote/mounter/initiator/connectors/fibre_channel.go, line 36 at r2 (raw file):

Previously, shay-berman wrote…

ok
must say that the naming is confusing, because it looks like that the Initiator is actually object that do rescan.
rename it only if u have time and u think it can help to clarify the code.

but anyway try to keep full name rather then initi.

ok, updated.


remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):

Previously, shay-berman wrote…

ok, i probably didn't see implementation so i asked for the difference.
r u going to implement iscsi it as part of this PR?

no


remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):

Previously, shay-berman wrote…
  1. ack
  2. i think you mentioned that you going to remove the "iscsi rescan before unmount flow". So I guess there is nothing special for iscsi in this PR. is it corrent?
  3. ok, if u think the code is not covered enough then try to add at least 1 UT to cover it.
  1. yes
  2. I will add it when check-and-retry is implemented, currently, I think nothing to test.

remote/mounter/initiator/connectors/scsi.go, line 30 at r2 (raw file):

Previously, shay-berman wrote…

if u think its very important fix, then open jira ticket for it.

ok, created. UB-1953


remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):

Previously, shay-berman wrote…

sound strange but ok.
i asked it because you have a comment that says "If flushing the multipath failed before, try now after we have removed the devices."
so i guessed that the command line failed with exit code.

For some rare cases first multipath flush will fail and return device in use, but after io flush and device removal, next multipath flush will succeed.


remote/mounter/initiator/connectors/scsi.go, line 46 at r2 (raw file):

Previously, shay-berman wrote…

ok sounds like a potential faulty devices left behind (rare case).
is this related to the issue you mentioned before to avoid using multipath -ll ? if so then please specify it inside the new jira ticket.

yes, if we can avoid using multipath output, we will retry after failure.

Signed-off-by: 27149chen <[email protected]>
Copy link
Contributor

@shay-berman shay-berman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 18 files reviewed, all discussions resolved (waiting on @27149chen)


remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):

Previously, 27149chen wrote…

no

ok, does it means after this PR the iscsi env in CI will be broken? if so i would suggest to do it in this PR. if not then LGTM


remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):

Previously, 27149chen wrote…
  1. yes
  2. I will add it when check-and-retry is implemented, currently, I think nothing to test.
  1. if u think its important make sure u have jira ticket for it.
    its ok

remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):

Previously, 27149chen wrote…

For some rare cases first multipath flush will fail and return device in use, but after io flush and device removal, next multipath flush will succeed.

ok
if u can add a comment line about your insight.
lgtm

Copy link
Contributor Author

@27149chen 27149chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 18 files reviewed, all discussions resolved (waiting on @27149chen)


remote/mounter/initiator/connectors/fibre_channel.go, line 41 at r2 (raw file):

Previously, shay-berman wrote…

ok, does it means after this PR the iscsi env in CI will be broken? if so i would suggest to do it in this PR. if not then LGTM

won't be broken


remote/mounter/initiator/connectors/iscsi_test.go, line 35 at r2 (raw file):

Previously, shay-berman wrote…
  1. if u think its important make sure u have jira ticket for it.
    its ok

ticket is created


remote/mounter/initiator/connectors/scsi.go, line 36 at r2 (raw file):

Previously, shay-berman wrote…

ok
if u can add a comment line about your insight.
lgtm

comment added

Copy link
Contributor

@shay-berman shay-berman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 18 files reviewed, all discussions resolved (waiting on @27149chen)

@27149chen 27149chen merged commit f01ccd3 into dev Mar 26, 2019
@27149chen 27149chen deleted the feature/UB-1941_unmount_fix branch March 26, 2019 13:30
@shay-berman shay-berman modified the milestone: Helm chart support Mar 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants