Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close Open disk file handles #126

Closed
wants to merge 1 commit into from

Conversation

wongma7
Copy link
Contributor

@wongma7 wongma7 commented Apr 14, 2021

What type of PR is this?
/kind bug

What this PR does / why we need it:

If csi-proxy keeps these files open it blocks EBS detach from succeeding.

I tested this fix and it works for me.

Full story:

While debugging my EBS CSI implementation I found everything worked fine except for volume Detach. EBS support said OS was reporting something holding on to the disk (despite me trying removing all accesspoints, taking disk offline, etc.). Eventually after much guessing and testing, narrowed it to csi-proxy, i.e. as soon as I stopped it Detach would proceed. (I'm no Windows expert so possibly there was an easier way to find out what was holding onto device, like via process explorer, but yeah...)

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fix GetDiskNumberByName and ListDiskIDs keeping file handles open on disks

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 14, 2021
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 14, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @wongma7. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 14, 2021
@ddebroy
Copy link
Contributor

ddebroy commented Apr 19, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 19, 2021
@jingxu97
Copy link
Contributor

@wongma7 could you address the comment?

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 21, 2021
@jingxu97
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2021
@wongma7
Copy link
Contributor Author

wongma7 commented May 21, 2021

/assign @msau42

please approve! 2 liner PR! many thanks!

@@ -254,6 +254,7 @@ func (imp APIImplementor) GetDiskNumberWithID(page83ID string) (uint32, error) {

for i := range disks {
h, err := syscall.Open(disks[i].Path, syscall.O_RDONLY, 0)
defer syscall.Close(h)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the maximum number of disks a node can have? Would it be worth enclosing this loop in a code section so that it gets closed at the end of each loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for ec2 depends on instance type but typically like 20-30
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/volume_limits.html#instance-type-volume-limits (EKS only supports windows nitro instances so the "AWS PV" driver in the other section don't apply)

do you mean to wrap it like this? I don't think it's worth the loss in readability

func() (uint32, error) {
   ...
   syscall.Open
   defer syscall.Close(h)
}()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a separate function could also work.

The default fd limit per process is 1024 for Linux. I don't know about Windows. Does Windows also have a maximum disk limit? cc @ddebroy @jingxu97

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did some refactoring so we always open and close 1 handle at a time

@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 27, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wongma7
To complete the pull request process, please ask for approval from jingxu97 after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label May 27, 2021
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 27, 2021
@wongma7
Copy link
Contributor Author

wongma7 commented May 27, 2021

/hold

need to retest on my setup

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2021
@jingxu97
Copy link
Contributor

@wongma7 thanks for this PR. Pls let me know after you test it.

@wongma7
Copy link
Contributor Author

wongma7 commented Jun 1, 2021

/hold cancel

together with #145 I tested that attach, mount, unmount, detach works with ebs

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 1, 2021
}
}

return 0, fmt.Errorf("Could not find disk with Page83 ID %s", page83ID)
}

var ErrNotFound = errors.New("not found")

func (imp APIImplementor) getDiskNumberByID(path string, page83ID string) (uint32, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am kind of confused by these two functions getDiskNumberAndIDByPath and getDiskNumberByID.

Is there a way to kind of combine those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, let me combine.

(the PR started as 2 lines, bit by bit it has ballooned, so I am getting confused too 😂)

@jingxu97
Copy link
Contributor

jingxu97 commented Jun 1, 2021

@mauriciopoppe is working on disk API part #140
wondering whether you want to wait for that PR merge since there might be some name changes?

@wongma7
Copy link
Contributor Author

wongma7 commented Jun 1, 2021

@jingxu97 I'm going to revert to the original PR where I just call syscall.Close(h) at end of the loop. I don't want to refactor and then rebase if beta3 API will make it obsolete

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 1, 2021
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 1, 2021
@wongma7 wongma7 closed this Jun 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants