-
Notifications
You must be signed in to change notification settings - Fork 911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add special rate limiter for namespace replication inducing APIs #4455
Add special rate limiter for namespace replication inducing APIs #4455
Conversation
common/dynamicconfig/constants.go
Outdated
// The limit is evenly distributed among available internal-frontend service instances. If this is set, it | ||
// overwrites the per instance limit configured with "internal-frontend.namespaceRPS.namespaceReplicationInducingAPIs". | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
InternalFrontendGlobalNamespaceNamespaceReplicationInducingAPIsRPS = "internal-frontend.globalNamespaceRPS.namespaceReplicationInducingAPIs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this, do we? can we just set it to unlimited or 1000 rps? the worker role does do UpdateWorkerBuildIdCompatibility, but it's limited by the rate of the scavenger. and for UpdateNamespace we probably don't want to slow it down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to put some value there, might as well have this dynamic config option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, each option adds more mental overhead. if we agree it doesn't make sense I'd prefer not to have an option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll make a function that returns 0 instead of making it configurable.
// FrontendMaxNamespaceNamespaceReplicationInducingAPIsRPSPerInstance is a per host/per namespace RPS limit for | ||
// namespace replication inducing APIs (e.g. UpdateNamespace, UpdateWorkerBuildIdCompatibility). | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
FrontendMaxNamespaceNamespaceReplicationInducingAPIsRPSPerInstance = "frontend.namespaceRPS.namespaceReplicationInducingAPIs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the overall plan is to deprecate the non-global ones and keep only the global ones, since they're easier to configure (don't have to think about how many instances you have, they divide the rate as it gets scaled). so for new keys, we could just do a global one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I thought the global one was still experimental.
Let's leave it for now and remove once global becomes the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or we could do it now and have less work to do later? there's no backwards compatibility to worry about
// FrontendNamespaceReplicationInducingAPIsRPS limits the per second request rate for namespace replication inducing | ||
// APIs (e.g. UpdateNamespace, UpdateWorkerBuildIdCompatibility). | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
FrontendNamespaceReplicationInducingAPIsRPS = "frontend.rps.namespaceReplicationInducingAPIs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems not that useful. the point is it's a global resource. so we really want one "global" total limit (automatically divided across frontends) and one global per-namespace limit that's set lower (to prevent any one ns from hogging the entire global limit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I'll remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but now there's no overall limit, there's only per-namespace limits. we need both if we want this to be effective.
(the naming is bad.. in this context "global" means "across all frontends", not "across all namespaces". we want an "overall global" i.e. "across all namespaces + across all frontends" limit.)
// The replication queue is used to propagate critical failover messages and this mapping prevents flooding the | ||
// queue and delaying failover. | ||
NamespaceReplicationInducingAPIToPriority = map[string]int{ | ||
"UpdateNamespace": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about RegisterNamespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't care about that one, it's called once per namespace so it can't flood the per namespace replication queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? there is no per-namespace queue, there's only one queue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was wrong.
00aa856
to
5658607
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the intention is to limit the call rate of e.g. update namespace,
why not simply rate limit register / update namespace to rate / burst: 0.00166666666 & 1?
this PR seems to special handle the workflow versioning story with namespace
Have you consider use a separate queue? |
Many times but decided not to for the time being. |
If this is the case, why not do it now? Then we don't need to worry about namespace replication being flooded? |
This reverts commit 5658607.
…on-inducing-rate-limiter
// guessedSetId := hashBuildId(buildId) | ||
// return guessedSetId, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is intentional?
Added special rate limiting for APIs that may insert replication tasks into a namespace replication queue.
The replication queue is used to propagate critical failover messages and this mapping prevents flooding the
queue and delaying failover.
Closes #4240