-
Notifications
You must be signed in to change notification settings - Fork 911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add special rate limiter for namespace replication inducing APIs #4455
Changes from all commits
ade0c9b
b4438b1
5658607
d8c8213
bbd6781
991cf1a
42b8b1d
2af1af1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -217,6 +217,10 @@ const ( | |
FrontendHistoryMaxPageSize = "frontend.historyMaxPageSize" | ||
// FrontendRPS is workflow rate limit per second | ||
FrontendRPS = "frontend.rps" | ||
// FrontendNamespaceReplicationInducingAPIsRPS limits the per second request rate for namespace replication inducing | ||
// APIs (e.g. UpdateNamespace, UpdateWorkerBuildIdCompatibility). | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
FrontendNamespaceReplicationInducingAPIsRPS = "frontend.rps.namespaceReplicationInducingAPIs" | ||
// FrontendMaxNamespaceRPSPerInstance is workflow namespace rate limit per second | ||
FrontendMaxNamespaceRPSPerInstance = "frontend.namespaceRPS" | ||
// FrontendMaxNamespaceBurstPerInstance is workflow namespace burst limit | ||
|
@@ -226,9 +230,17 @@ const ( | |
// FrontendMaxNamespaceVisibilityRPSPerInstance is namespace rate limit per second for visibility APIs. | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
FrontendMaxNamespaceVisibilityRPSPerInstance = "frontend.namespaceRPS.visibility" | ||
// FrontendMaxNamespaceNamespaceReplicationInducingAPIsRPSPerInstance is a per host/per namespace RPS limit for | ||
// namespace replication inducing APIs (e.g. UpdateNamespace, UpdateWorkerBuildIdCompatibility). | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
FrontendMaxNamespaceNamespaceReplicationInducingAPIsRPSPerInstance = "frontend.namespaceRPS.namespaceReplicationInducingAPIs" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the overall plan is to deprecate the non-global ones and keep only the global ones, since they're easier to configure (don't have to think about how many instances you have, they divide the rate as it gets scaled). so for new keys, we could just do a global one There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, I thought the global one was still experimental. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. or we could do it now and have less work to do later? there's no backwards compatibility to worry about |
||
// FrontendMaxNamespaceVisibilityBurstPerInstance is namespace burst limit for visibility APIs. | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
FrontendMaxNamespaceVisibilityBurstPerInstance = "frontend.namespaceBurst.visibility" | ||
// FrontendMaxNamespaceNamespaceReplicationInducingAPIsBurstPerInstance is a per host/per namespace burst limit for | ||
// namespace replication inducing APIs (e.g. UpdateNamespace, UpdateWorkerBuildIdCompatibility). | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
FrontendMaxNamespaceNamespaceReplicationInducingAPIsBurstPerInstance = "frontend.namespaceBurst.namespaceReplicationInducingAPIs" | ||
// FrontendGlobalNamespaceRPS is workflow namespace rate limit per second for the whole cluster. | ||
// The limit is evenly distributed among available frontend service instances. | ||
// If this is set, it overwrites per instance limit "frontend.namespaceRPS". | ||
|
@@ -241,6 +253,13 @@ const ( | |
// If this is set, it overwrites per instance limit "frontend.namespaceRPS.visibility". | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
FrontendGlobalNamespaceVisibilityRPS = "frontend.globalNamespaceRPS.visibility" | ||
// FrontendGlobalNamespaceNamespaceReplicationInducingAPIsRPS is a cluster global, per namespace RPS limit for | ||
// namespace replication inducing APIs (e.g. UpdateNamespace, UpdateWorkerBuildIdCompatibility). | ||
// The limit is evenly distributed among available frontend service instances. | ||
// If this is set, it overwrites the per instance limit configured with | ||
// "frontend.namespaceRPS.namespaceReplicationInducingAPIs". | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
FrontendGlobalNamespaceNamespaceReplicationInducingAPIsRPS = "frontend.globalNamespaceRPS.namespaceReplicationInducingAPIs" | ||
// InternalFrontendGlobalNamespaceVisibilityRPS is workflow namespace rate limit per second | ||
// across all internal-frontends. | ||
// This config is EXPERIMENTAL and may be changed or removed in a later release. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -72,7 +72,6 @@ var ( | |
"PollActivityTaskQueue": 2, | ||
"GetWorkflowExecutionHistoryReverse": 2, | ||
"GetWorkerBuildIdCompatibility": 2, | ||
"UpdateWorkerBuildIdCompatibility": 2, | ||
"GetWorkerTaskReachability": 2, | ||
"DeleteWorkflowExecution": 2, | ||
|
||
|
@@ -95,13 +94,22 @@ var ( | |
|
||
VisibilityAPIPrioritiesOrdered = []int{0} | ||
|
||
// Special rate limiting for APIs that may insert replication tasks into a namespace replication queue. | ||
// The replication queue is used to propagate critical failover messages and this mapping prevents flooding the | ||
// queue and delaying failover. | ||
NamespaceReplicationInducingAPIToPriority = map[string]int{ | ||
"UpdateNamespace": 0, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what about RegisterNamespace? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't care about that one, it's called once per namespace so it can't flood the per namespace replication queue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ? there is no per-namespace queue, there's only one queue There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I was wrong. |
||
"UpdateWorkerBuildIdCompatibility": 1, | ||
} | ||
|
||
NamespaceReplicationInducingAPIPrioritiesOrdered = []int{0, 1} | ||
|
||
OtherAPIToPriority = map[string]int{ | ||
"GetClusterInfo": 0, | ||
"GetSystemInfo": 0, | ||
"GetSearchAttributes": 0, | ||
|
||
"RegisterNamespace": 0, | ||
"UpdateNamespace": 0, | ||
"DescribeNamespace": 0, | ||
"ListNamespaces": 0, | ||
"DeprecateNamespace": 0, | ||
|
@@ -157,12 +165,14 @@ func (c *NamespaceRateBurstImpl) Burst() int { | |
func NewRequestToRateLimiter( | ||
executionRateBurstFn quotas.RateBurst, | ||
visibilityRateBurstFn quotas.RateBurst, | ||
namespaceReplicationInducingRateBurstFn quotas.RateBurst, | ||
otherRateBurstFn quotas.RateBurst, | ||
) quotas.RequestRateLimiter { | ||
mapping := make(map[string]quotas.RequestRateLimiter) | ||
|
||
executionRateLimiter := NewExecutionPriorityRateLimiter(executionRateBurstFn) | ||
visibilityRateLimiter := NewVisibilityPriorityRateLimiter(visibilityRateBurstFn) | ||
namespaceReplicationInducingRateLimiter := NewNamespaceReplicationInducingAPIPriorityRateLimiter(namespaceReplicationInducingRateBurstFn) | ||
otherRateLimiter := NewOtherAPIPriorityRateLimiter(otherRateBurstFn) | ||
|
||
for api := range ExecutionAPIToPriority { | ||
|
@@ -171,6 +181,9 @@ func NewRequestToRateLimiter( | |
for api := range VisibilityAPIToPriority { | ||
mapping[api] = visibilityRateLimiter | ||
} | ||
for api := range NamespaceReplicationInducingAPIToPriority { | ||
mapping[api] = namespaceReplicationInducingRateLimiter | ||
} | ||
for api := range OtherAPIToPriority { | ||
mapping[api] = otherRateLimiter | ||
} | ||
|
@@ -208,6 +221,21 @@ func NewVisibilityPriorityRateLimiter( | |
}, rateLimiters) | ||
} | ||
|
||
func NewNamespaceReplicationInducingAPIPriorityRateLimiter( | ||
rateBurstFn quotas.RateBurst, | ||
) quotas.RequestRateLimiter { | ||
rateLimiters := make(map[int]quotas.RequestRateLimiter) | ||
for priority := range NamespaceReplicationInducingAPIPrioritiesOrdered { | ||
rateLimiters[priority] = quotas.NewRequestRateLimiterAdapter(quotas.NewDynamicRateLimiter(rateBurstFn, time.Minute)) | ||
} | ||
return quotas.NewPriorityRateLimiter(func(req quotas.Request) int { | ||
if priority, ok := NamespaceReplicationInducingAPIToPriority[req.API]; ok { | ||
return priority | ||
} | ||
return NamespaceReplicationInducingAPIPrioritiesOrdered[len(NamespaceReplicationInducingAPIPrioritiesOrdered)-1] | ||
}, rateLimiters) | ||
} | ||
|
||
func NewOtherAPIPriorityRateLimiter( | ||
rateBurstFn quotas.RateBurst, | ||
) quotas.RequestRateLimiter { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems not that useful. the point is it's a global resource. so we really want one "global" total limit (automatically divided across frontends) and one global per-namespace limit that's set lower (to prevent any one ns from hogging the entire global limit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I'll remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but now there's no overall limit, there's only per-namespace limits. we need both if we want this to be effective.
(the naming is bad.. in this context "global" means "across all frontends", not "across all namespaces". we want an "overall global" i.e. "across all namespaces + across all frontends" limit.)