
Add special rate limiter for namespace replication inducing APIs #4455

Conversation

bergundy
Member

@bergundy bergundy commented Jun 7, 2023

Added special rate limiting for APIs that may insert replication tasks into the namespace replication queue.
The replication queue is used to propagate critical failover messages, and this limit prevents flooding the
queue and delaying failovers.

Closes #4240
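
As a minimal sketch of the idea (the two API names below are the ones discussed in this PR; the limiter shape, rates, and all other names are illustrative assumptions, not the PR's actual code): replication-inducing calls get their own small token bucket, so they can neither exhaust nor be starved by regular traffic.

package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

// replicationInducingAPIs mirrors the set of APIs this PR singles out.
var replicationInducingAPIs = map[string]bool{
	"UpdateNamespace":                  true,
	"UpdateWorkerBuildIdCompatibility": true,
}

// apiLimiter keeps a generous default bucket plus a deliberately tight
// bucket for replication-inducing calls (both rates are placeholders).
type apiLimiter struct {
	defaultLimiter     *rate.Limiter
	replicationLimiter *rate.Limiter
}

// Allow picks a bucket by API name.
func (l *apiLimiter) Allow(api string) bool {
	if replicationInducingAPIs[api] {
		return l.replicationLimiter.Allow()
	}
	return l.defaultLimiter.Allow()
}

func main() {
	l := &apiLimiter{
		defaultLimiter:     rate.NewLimiter(rate.Limit(2400), 2400),
		replicationLimiter: rate.NewLimiter(rate.Limit(20), 20),
	}
	fmt.Println(l.Allow("UpdateNamespace"))        // tight bucket
	fmt.Println(l.Allow("StartWorkflowExecution")) // default bucket
}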

@bergundy bergundy requested a review from a team as a code owner June 7, 2023 22:41
// The limit is evenly distributed among available internal-frontend service instances. If this is set, it
// overwrites the per instance limit configured with "internal-frontend.namespaceRPS.namespaceReplicationInducingAPIs".
// This config is EXPERIMENTAL and may be changed or removed in a later release.
InternalFrontendGlobalNamespaceNamespaceReplicationInducingAPIsRPS = "internal-frontend.globalNamespaceRPS.namespaceReplicationInducingAPIs"
Member

I don't think we need this, do we? Can we just set it to unlimited or 1000 RPS? The worker role does call UpdateWorkerBuildIdCompatibility, but that's limited by the rate of the scavenger, and for UpdateNamespace we probably don't want to slow things down.

Member Author

We have to put some value there, so we might as well have this dynamic config option.

Member

Eh, each option adds more mental overhead. If we agree it doesn't make sense, I'd prefer not to have an option.

Member Author

We'll make a function that returns 0 instead of making it configurable.
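
(For illustration only, since the actual helper isn't shown here: dynamic config values are typically consumed through getter functions, so a hard-coded getter can stand in for a config key that was deliberately not added. All names below are hypothetical.)

package dynamicconfigsketch

// IntPropertyFn is the usual shape of a dynamic config getter.
type IntPropertyFn func() int

// ConstantIntFn builds a getter that always yields v, with no config lookup.
func ConstantIntFn(v int) IntPropertyFn {
	return func() int { return v }
}

// A constant getter takes the place of the dropped internal-frontend key.
var internalFrontendReplicationInducingRPS = ConstantIntFn(0)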

// FrontendMaxNamespaceNamespaceReplicationInducingAPIsRPSPerInstance is a per host/per namespace RPS limit for
// namespace replication inducing APIs (e.g. UpdateNamespace, UpdateWorkerBuildIdCompatibility).
// This config is EXPERIMENTAL and may be changed or removed in a later release.
FrontendMaxNamespaceNamespaceReplicationInducingAPIsRPSPerInstance = "frontend.namespaceRPS.namespaceReplicationInducingAPIs"
Member

I think the overall plan is to deprecate the non-global ones and keep only the global ones, since they're easier to configure (you don't have to think about how many instances you have, and the rate is divided automatically as the cluster scales). So for new keys, we could just add a global one.

Member Author

Sure, I thought the global one was still experimental.
Let's leave this one for now and remove it once global becomes the default.

Member

Or we could do it now and have less work later? There's no backwards compatibility to worry about.
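
(Illustrative sketch of the operational win: the operator sets a single cluster-wide RPS and each frontend derives its local share from cluster membership. The division scheme below is an assumption, not a quote of the implementation.)

package main

import "fmt"

// perInstanceRPS shares a cluster-wide limit across frontend instances.
func perInstanceRPS(globalRPS float64, frontendInstances int) float64 {
	if frontendInstances < 1 {
		frontendInstances = 1 // membership can briefly report zero instances
	}
	return globalRPS / float64(frontendInstances)
}

func main() {
	// Scaling from 3 to 6 frontends halves each instance's share; the
	// operator never retunes the config.
	fmt.Println(perInstanceRPS(1200, 3)) // 400
	fmt.Println(perInstanceRPS(1200, 6)) // 200
}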

// FrontendNamespaceReplicationInducingAPIsRPS limits the per second request rate for namespace replication inducing
// APIs (e.g. UpdateNamespace, UpdateWorkerBuildIdCompatibility).
// This config is EXPERIMENTAL and may be changed or removed in a later release.
FrontendNamespaceReplicationInducingAPIsRPS = "frontend.rps.namespaceReplicationInducingAPIs"
Member

This seems not that useful. The point is that the replication queue is a global resource, so we really want one "global" total limit (automatically divided across frontends) and one global per-namespace limit that's set lower (to prevent any one namespace from hogging the entire global limit).

Member Author

Agreed, I'll remove this.

Member

But now there's no overall limit, only per-namespace limits. We need both if we want this to be effective.

(The naming is confusing: in this context "global" means "across all frontends", not "across all namespaces". We want an "overall global" limit, i.e. one that applies across all namespaces and across all frontends.)
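
(A minimal sketch of the two layers being asked for, with hypothetical types and rates: a call must clear both its namespace's bucket and the shared overall bucket, so no single namespace can hog the queue and the combined rate stays capped.)

package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

// layeredLimiter sketches the "both limits" idea: overall is shared across
// all namespaces and all frontends; each perNamespace entry is set lower.
type layeredLimiter struct {
	overall      *rate.Limiter
	perNamespace map[string]*rate.Limiter
}

func (l *layeredLimiter) Allow(namespace string) bool {
	ns, ok := l.perNamespace[namespace]
	if !ok {
		return false // real code would create per-namespace limiters lazily
	}
	// Caveat: if the overall bucket rejects, the per-namespace token is
	// already spent; a production limiter would reserve and cancel instead.
	return ns.Allow() && l.overall.Allow()
}

func main() {
	l := &layeredLimiter{
		overall: rate.NewLimiter(rate.Limit(10), 10),
		perNamespace: map[string]*rate.Limiter{
			"ns1": rate.NewLimiter(rate.Limit(2), 2),
		},
	}
	fmt.Println(l.Allow("ns1")) // true while both buckets have tokens
}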

// The replication queue is used to propagate critical failover messages and this mapping prevents flooding the
// queue and delaying failover.
NamespaceReplicationInducingAPIToPriority = map[string]int{
"UpdateNamespace": 0,
Member

What about RegisterNamespace?

Member Author

We don't care about that one; it's called once per namespace, so it can't flood the per-namespace replication queue.

Member

? There is no per-namespace queue; there's only one queue.

Member Author

Yes, I was wrong.

@bergundy bergundy force-pushed the namespace-replication-inducing-rate-limiter branch from 00aa856 to 5658607 on June 8, 2023 18:59
Contributor

@wxing1292 wxing1292 left a comment

If the intention is to limit the call rate of, e.g., UpdateNamespace, why not simply rate limit RegisterNamespace / UpdateNamespace to a rate/burst of 0.00166666666 (one request per ~10 minutes) and 1?

This PR seems to give special handling to the workflow versioning story along with the namespace APIs.
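
(For concreteness, those numbers in golang.org/x/time/rate terms, purely illustrative: 0.00166666666 tokens per second is one call roughly every 600 seconds, and burst 1 means no unused capacity is banked.)

package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	// 1 / 0.00166666666 ≈ 600, i.e. one token every ~10 minutes, with at
	// most one token stored.
	l := rate.NewLimiter(rate.Limit(0.00166666666), 1)
	fmt.Println(l.Allow()) // true: consumes the single burst token
	fmt.Println(l.Allow()) // false: next token is ~10 minutes away
}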

@yux0
Contributor

yux0 commented Jun 9, 2023

Have you considered using a separate queue?

@bergundy
Member Author

bergundy commented Jun 9, 2023

Have you considered using a separate queue?

Many times, but we decided not to for the time being.
I believe we'll end up there eventually.

@yux0
Contributor

yux0 commented Jun 9, 2023

Have you considered using a separate queue?

Many times, but we decided not to for the time being. I believe we'll end up there eventually.

If that's the case, why not do it now? Then we wouldn't need to worry about the namespace replication queue being flooded.

Comment on lines +459 to +460
// guessedSetId := hashBuildId(buildId)
// return guessedSetId, nil
Member

Is this intentional?

@bergundy bergundy merged commit d44aa1e into temporalio:master Jun 10, 2023
@bergundy bergundy deleted the namespace-replication-inducing-rate-limiter branch June 10, 2023 03:36
Closes issue: Add rate limiting to UpdateWorkerBuildIdCompatibility