Revert "Make nodepool concurrent ops scale better (#12488)" #12916

Merged: 2 commits, Jan 30, 2025

Conversation

Xylosma
Contributor

@Xylosma Xylosma commented Jan 29, 2025

This reverts commit 1dbed42.

Release Note Template for Downstream PRs (will be copied)

container: reverted locking behavior that caused a regression (a spike in operation apply time) starting in `v6.15`

@github-actions github-actions bot requested a review from roaks3 January 29, 2025 22:15

Hello! I am a robot. Tests will require approval from a repository maintainer to run.

@roaks3, a repository maintainer, has been assigned to review your changes. If you have not received review feedback within 2 business days, please leave a comment on this PR asking them to take a look.

You can help make sure that review is quick by doing a self-review and by running impacted tests locally.

@modular-magician modular-magician added the awaiting-approval Pull requests that need reviewer's approval to run presubmit tests label Jan 29, 2025
Member

@KatrinaHoffert KatrinaHoffert left a comment

LGTM from GKE's side.

We might wanna tweak the proposed release note, though? Mark it as type bug (https://googlecloudplatform.github.io/magic-modules/code-review/release-notes/#type-specific-guidelines-and-examples). Probably don't use the word "commit" and mention when the issue started (v6.15). There doesn't seem to be a lot of past precedent for how to word it, but I found one similar case that we may want to copy from:

vpcaccess: reverted new behaviour introduced by resource google_vpc_access_connector in 4.75.0. min_throughput and max_throughput fields lost their default value, and customers could not make deployment due to that change.

@Xylosma
Contributor Author

Xylosma commented Jan 29, 2025

> LGTM from GKE's side.
>
> We might wanna tweak the proposed release note, though? Mark it as type bug (https://googlecloudplatform.github.io/magic-modules/code-review/release-notes/#type-specific-guidelines-and-examples). Probably don't use the word "commit" and mention when the issue started (v6.15). There doesn't seem to be a lot of past precedent for how to word it, but I found one similar case that we may want to copy from:
>
> vpcaccess: reverted new behaviour introduced by resource google_vpc_access_connector in 4.75.0. min_throughput and max_throughput fields lost their default value, and customers could not make deployment due to that change.

Added the impacted version.
The release note was already marked as type bug. Is there anywhere else that I should mark it?

Contributor

@roaks3 roaks3 left a comment

For reference, here was the original PR #12488

@Xylosma could you just confirm that other than the apply time, there is no other impact here? I want to be sure this isn't a breaking change, since the previous change is already out in the wild.

@modular-magician modular-magician added service/container and removed awaiting-approval Pull requests that need reviewer's approval to run presubmit tests labels Jan 30, 2025
@modular-magician
Collaborator

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

google provider: Diff (2 files changed, 175 insertions(+), 147 deletions(-))
google-beta provider: Diff (2 files changed, 175 insertions(+), 147 deletions(-))

@modular-magician
Collaborator

Tests analytics

Total tests: 225
Passed tests: 214
Skipped tests: 11
Affected tests: 0

Affected service packages:
  • container

🟢 All tests passed!

@KatrinaHoffert
Member

This should not be a breaking change, at least not in the typical sense. The previous PR unintentionally removed most parallelism, since the cluster write lock could never be acquired while we were waiting for a node pool to be created, updated, or deleted (except during a brief race window when a node pool request has just finished but we have not yet acquired the read lock that is held while waiting for the operation). This revert restores parallelism as it previously worked.
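
To make the locking interaction concrete, here is a minimal sketch using Go's standard `sync.RWMutex`. The `clusterLock` type and its methods are hypothetical stand-ins, not the provider's actual code; the sketch only assumes that a node pool operation keeps a cluster read lock for the whole time it waits, and that some other request needs the cluster write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// clusterLock is a hypothetical stand-in for the per-cluster lock discussed
// above. It is not the provider's real locking helper.
type clusterLock struct {
	mu sync.RWMutex
}

// beginNodePoolWait models a node pool operation that keeps the cluster read
// lock for as long as it waits on its long-running operation.
func (c *clusterLock) beginNodePoolWait() { c.mu.RLock() }

// endNodePoolWait models that operation finishing and releasing the read lock.
func (c *clusterLock) endNodePoolWait() { c.mu.RUnlock() }

// tryClusterWrite reports whether a request that needs the cluster write lock
// could start right now, without blocking.
func (c *clusterLock) tryClusterWrite() bool {
	if !c.mu.TryLock() { // TryLock requires Go 1.18 or newer
		return false
	}
	c.mu.Unlock()
	return true
}

func main() {
	var c clusterLock

	c.beginNodePoolWait()
	fmt.Println("write lock available during wait:", c.tryClusterWrite())

	c.endNodePoolWait()
	fmt.Println("write lock available after wait:", c.tryClusterWrite())
}
```

In this model the writer can only proceed in the gap between one reader releasing and the next acquiring, which is why any request that needs the write lock ends up queued behind every in-flight node pool wait.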

If anything, the previous PR could be viewed as somewhat breaking, because the lack of parallelism could cause TF apply to time out due to taking longer. For any users impacted in that way, temporarily reducing parallelism until this revert is live should fix that. That works because the timeouts keep counting down even while waiting for these locks. Such a workaround only addresses timeouts caused by this issue; this PR is still required to fix the apply time itself.

Contributor

@roaks3 roaks3 left a comment

Thanks, that makes sense!
