tenantrate: switch to a single token bucket #62783
Conversation
Now that it's back to being one dimension, I can see an argument to just bend the quotapool.RateLimiter to your will. I have had a hard time striking the right abstraction level for that library. It's currently got these very generic hooks plus simple reference implementations which are pretty powerful on their own. It probably ought to be split out into a few packages.
A problem with that is you'd need to either early-bind the mapping to resources or add an AcquireFunc equivalent to the quotapool.IntPool's to read some configuration, and you'd need to add support for the non-blocking subtraction. I feel like there's probably a way to work it such that your library here gets control of its Request and the settings, and the abstraction there owns the rate and burst.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @tbg)
pkg/kv/kvserver/tenantrate/limiter.go, line 99 at r1 (raw file):
Quoted 4 lines of code…
// TODO(radu): find a way to omit these atomic operations in the case when we
// don't have to wait.
rl.metrics.currentBlocked.Inc(1)
defer rl.metrics.currentBlocked.Dec(1)
Feel free to add the hooks to the quotapool library. We have an OnAcquisition hook that gets called regardless of whether or not you hit the fast path. It seems totally reasonable to add hooks for when you don't get the fast path and then when you finish waiting. The setup would go in config.go. The calls would be here:
cockroach/pkg/util/quotapool/quotapool.go
Line 234 in aec84ca
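For concreteness, a rough sketch of the kind of slow-path hooks being discussed; the hook names and wiring below are made up for illustration and are not the existing quotapool API:

```go
// Sketch only: the hook shape is an assumption, not existing quotapool API.
// It mimics how "missed the fast path" / "finished waiting" callbacks could
// drive the currentBlocked gauge from the limiter's config instead of
// wrapping every Wait call with Inc/Dec.
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// hooks stands in for fields a quotapool config could carry.
type hooks struct {
	onWaitStart  func()              // called when an acquisition misses the fast path
	onWaitFinish func(time.Duration) // called once the acquisition stops waiting
}

// waitSlow simulates the slow acquisition path invoking the hooks.
func waitSlow(h hooks, simulatedWait time.Duration) {
	if h.onWaitStart != nil {
		h.onWaitStart()
	}
	start := time.Now()
	time.Sleep(simulatedWait) // stand-in for blocking on the pool
	if h.onWaitFinish != nil {
		h.onWaitFinish(time.Since(start))
	}
}

func main() {
	var currentBlocked int64 // stand-in for the currentBlocked metric
	h := hooks{
		onWaitStart: func() { atomic.AddInt64(&currentBlocked, 1) },
		onWaitFinish: func(d time.Duration) {
			atomic.AddInt64(&currentBlocked, -1)
			fmt.Println("waited", d)
		},
	}
	waitSlow(h, 10*time.Millisecond)
	fmt.Println("blocked now:", atomic.LoadInt64(&currentBlocked)) // 0
}
```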
pkg/kv/kvserver/tenantrate/limiter.go, line 206 at r1 (raw file):
Quoted 5 lines of code…
// TODO(ajwerner): It seems possible that when adding or reducing the burst
// values that we might want to remove those values from the token bucket.
// It's not obvious that we want to add tokens when increasing the burst as
// that might lead to a big spike in load immediately upon increasing this
// limit.
The code very much does this in the quotapool rate limiter and I think it's worthwhile:
cockroach/pkg/util/quotapool/int_rate.go
Lines 122 to 127 in aec84ca
shouldNotify = r.burst < v.burst || r.rate < v.rate
burstDelta := v.burst - r.burst
r.limitConfig = *v
r.cur += float64(burstDelta)
r.update(r.p.qp.TimeSource().Now())
return shouldNotify
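A minimal sketch of applying the same idea to the tenantrate bucket; the type and field names below are illustrative stand-ins, not the actual limiter code:

```go
// Illustrative only: tokenBucket/limitConfig here are stand-ins, not the
// actual tenantrate types. The logic mirrors the quotapool snippet above:
// fold the burst delta into the current tokens (clawing tokens back when
// burst shrinks) and notify waiters only when the limits were raised.
package main

import "fmt"

type limitConfig struct {
	Rate  float64 // tokens per second
	Burst float64 // maximum tokens
}

type tokenBucket struct {
	config limitConfig
	tokens float64 // currently available tokens
}

// updateConfig applies a new config and reports whether waiting requests
// should be re-notified (only useful when more quota may now be available).
func (tb *tokenBucket) updateConfig(newCfg limitConfig) (shouldNotify bool) {
	shouldNotify = tb.config.Burst < newCfg.Burst || tb.config.Rate < newCfg.Rate
	burstDelta := newCfg.Burst - tb.config.Burst
	tb.config = newCfg
	tb.tokens += burstDelta // grants extra burst, or removes tokens if burst shrank
	return shouldNotify
}

func main() {
	tb := tokenBucket{config: limitConfig{Rate: 100, Burst: 200}, tokens: 150}
	notify := tb.updateConfig(limitConfig{Rate: 100, Burst: 100})
	fmt.Println(tb.tokens, notify) // 50 false: tokens reduced by the burst delta
}
```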
Can we take a step back and go through what the plan here is? When I filed that issue I still had the distributed token bucket and "protecting KV" mixed up in my head. The KV-side rate limiter is strictly for "protecting KV", right? As such, is there any reason to bake compute units into it? Won't this per-tenant limiter go away and be replaced by a per-node limiter, and would we then limit on "request units" at the node level, which doesn't necessarily seem that useful? Or are we going to keep the per-tenant limiters around on top of that and simply set much higher limits than today (i.e. allow a tenant to exhaust, say, 30% of the node assuming it has capacity)? I don't think there is a single place that really spells out what the plan is (right?), which worries me.
Actually, there's a 3rd concern: limit each tenant to some % of the KV machine. We never want to allow any single tenant to monopolize KV resources, even if there are no other tenants running on that machine. The reason is that this will lead to severe "noisy neighbor" syndrome, where tenants will see amazing performance when they happen to be the only ones running on a KV node, followed by sharp dips in performance if just 1 or 2 other tenants begin running. If instead we limit each tenant to a maximum of say 10% of a KV node (and I expect these nodes to be big), then adding more than 10 tenants leads to a much more incremental drop in performance.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @RaduBerinde, and @tbg)
We still want to limit the amount of resources used by any one free tier user, regardless of whether the KV nodes are being overloaded or not. We will add a separate mechanism for limiting per-KV-node load, and this tenantrate mechanism will evolve to perform distributed rate-limiting (where the limit will apply per cluster, not per KV node).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @RaduBerinde, and @tbg)
Now that it's back to being one dimension, I can see an argument to just bend the quotapool.RateLimiter to your will [...]
Thanks, I will take a look - this is my first attempt at working with this code. Some things did feel awkward to me, especially the plumbing around readBytesResource when all we really need is to modify the tenantBucket under the quotapool's lock (but I didn't have any epiphany on how to make it simpler).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @RaduBerinde, and @tbg)
Thanks for explaining, Andy and Radu. And the argument for using resource units is what? The motivation in #55114 stemmed from the fact that I thought these would be surfaced externally, but isn't the per-tenant limiter internal to KV? Does it make sense to compile these constants we've chosen into CRDB, where we can't change them on a per-tenant basis?
We want the limits to be proportional to the actual resource usage (and thus how much we pay for the KV nodes), as much as possible. Under the current model, a mixed workload can get twice as many resources as a read-heavy or a write-heavy workload, which doesn't make sense. It is also a big advantage to use a single token bucket because the next step will be implementing it in a distributed fashion and I don't want 4x the complication and overhead. Per-tenant tunability is orthogonal; we don't have that today but we can easily add it with either approach. I would also turn it around on you and ask why you think the four token buckets are better?
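To make the cost model concrete, here is a minimal sketch. Only the read-side numbers (0.7 KVCU base cost, 10 KVCU per MB) appear elsewhere in this PR; the write-side constants are placeholders purely for illustration:

```go
// Sketch of the single-bucket cost model: every request is priced in one
// common unit ("KV Compute Units") via a base cost plus a per-byte cost, and
// all requests draw from the same bucket. Write-side constants are made up.
package main

import "fmt"

const (
	readRequestCost  = 0.7   // KVCU per read request (value discussed in this PR)
	readCostPerMB    = 10.0  // KVCU per MB read (value discussed in this PR)
	writeRequestCost = 1.0   // placeholder, not from the PR
	writeCostPerMB   = 400.0 // placeholder, not from the PR
	bytesPerMB       = float64(1 << 20)
)

// readCost prices a read in KV Compute Units: base cost plus per-byte cost.
func readCost(bytes int64) float64 {
	return readRequestCost + float64(bytes)/bytesPerMB*readCostPerMB
}

// writeCost prices a write the same way, using the placeholder constants.
func writeCost(bytes int64) float64 {
	return writeRequestCost + float64(bytes)/bytesPerMB*writeCostPerMB
}

func main() {
	// Because reads and writes draw from the same bucket, a mixed workload can
	// no longer get roughly twice the resources of a read-only or write-only one.
	fmt.Printf("4KB read:  %.3f KVCU\n", readCost(4096))
	fmt.Printf("4KB write: %.3f KVCU\n", writeCost(4096))
}
```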
Oh, sorry for the noise - your response finally helped me page in that the salient difference in this change is allowing reads/writes/etc. to be converted against each other. That was an obvious miss on my part; this makes sense.
Reframing with my renewed understanding: I was concerned about the hard-coding of the conversion constants between the various dimensions now drawing from a single bucket. You could imagine that there are workloads where, for a given request quota throughput, it's too slow on writes but too fast on reads, etc.
Thanks! Yeah. We will need some kind of per-tenant control in any case - we will need to differentiate between free tier users and serverless users who haven't exceeded their budget.
Force-pushed from 566ec93 to 5a432af (compare)
Made some updates. Please ignore the first commit, it's from a separate PR that I've been trying to merge all day.
It should be interesting to look at the diff in estimate_iops; it will show how various workload profiles would perform now vs before.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/tenantrate/limiter.go, line 206 at r1 (raw file):
Previously, ajwerner wrote…
// TODO(ajwerner): It seems possible that when adding or reducing the burst
// values that we might want to remove those values from the token bucket.
// It's not obvious that we want to add tokens when increasing the burst as
// that might lead to a big spike in load immediately upon increasing this
// limit.
The code very much does this in the quotapool rate limiter and I think it's worthwhile:
cockroach/pkg/util/quotapool/int_rate.go
Lines 122 to 127 in aec84ca
shouldNotify = r.burst < v.burst || r.rate < v.rate
burstDelta := v.burst - r.burst
r.limitConfig = *v
r.cur += float64(burstDelta)
r.update(r.p.qp.TimeSource().Now())
return shouldNotify
Done.
One thing this discussion highlights: do we need the allocator to try and spread out ranges for a given tenant across as many nodes as possible? Otherwise, if a tenant's ranges all happen to be on the same node, they will hit their per-tenant throttling limit earlier than if the ranges are more spread out. And the activity from that one tenant may not be enough to trigger the allocator to move ranges around either, so it will just stay in a permanent state of throttling.
That will not matter once we have the distributed rate limiting, correct? It sounds like a good idea to me in general though - spreading things as much as possible reduces the magnitude of the noisy neighbor problem on any one node.
Force-pushed from 5a432af to 1edc55e (compare)
I think it will still be a problem even with distributed rate limiting. The per-tenant KV rate limit would throttle a given tenant's pods on a particular machine, even if the distributed rate limiter has plenty of available capacity for that tenant.
Hm, in my mind we would implement only distributed per-tenant rate limiting, where we control the total rate across all KV nodes. There wouldn't also be a per-node per-tenant rate limit. We would have a separate per-node aggregate tenant limit to avoid overload (and for the purpose of avoiding overload, I agree that spreading out would help).
I see. My worry with that is that then tenants would see a different problem. If they happened to be the only tenant running on a KV node, they'd get fabulous throughput. Then if even one other tenant starts running on the same KV node, the 1st tenant would see their throughput cut in half. So there could be fairly severe "noisy neighbor" effects. I was assuming the per-node limit would be per-tenant.
A tenant would never do more than the distributed per-tenant rate limit, which I assume won't be too fabulous (probably a very small multiple of the current per-node limit, right?). The per-node aggregate limit would be much higher than what a few tenants would be able to push through (the goal would be for this limit to rarely come into play - we would try to provision the cluster so that KV nodes don't routinely get overloaded).
I'm thinking about paid tenants that are not rate limited, because they're paying for whatever they use. They want to get as much performance as possible, but with a reasonable amount of predictability.
Ah, I see. I suspect we may still want some limiting there too (at a much higher rate). But in any case, noisy paid neighbors will be problematic.
Force-pushed from 1edc55e to 67a3ea5 (compare)
It's not completely clear to me how the code artifacts here will extend into a distributed rate limiting setting. Do you expect to reuse limiter, tokenBucket, etc. in a distributed setting by having a local "controller" fill the tokens based on what it has pre-allocated from a global coordinator?
If instead we limit each tenant to a maximum of say 10% of a KV node ...
do we need the allocator to try and spread out ranges for a given tenant across as many nodes as possible? Otherwise, if a tenant's ranges all happen to be on the same node, they will hit their per-tenant throttling limit earlier than if the pages are more spread out.
Regarding the noisy neighbor and per-tenant per-node limits discussion above, there are some interesting points being made. I've typically not seen systems actively spread a tenant's ranges across more nodes even if there is no need to based on current load -- one reason being that isolation failures do happen (and even worse, sometimes there are queries of death). Actively spreading each tenant around increases the blast radius. The corollary is that there aren't any limits on what a tenant can use of the resources in a node (or maybe something like 75%), and if another tenant wakes up and starts to consume more resources, there is a performance decline until rebalancing kicks in. It would be interesting to think through this and make improvements once we can observe real serverless clusters with significant tenant activity -- do our current serverless cluster(s) have enough utilization to make it worthwhile to look at them?
Reviewed 2 of 10 files at r1, 4 of 11 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, @RaduBerinde, and @tbg)
pkg/kv/kvserver/tenantrate/limiter.go, line 136 at r2 (raw file):
// updateConfig is used by the factory to inform the limiter of a new
// configuration.
is this a new configuration, or a delta change to the configuration? It looks like a delta, based on the Add in the implementation.
pkg/kv/kvserver/tenantrate/limiter.go, line 225 at r2 (raw file):
} }
This comment is not specific to this PR, but there seem to be many interfaces, structs and layers of abstraction in the rate limiting code, which makes it hard for me to wrap my head around the code. There is even interface{} in the method below, accompanied with a type switch.
Is there a high-level summary describing and motivating the need for all these code abstractions, for what to me naively boils down to a straightforward token bucket mechanism?
It's not completely clear to me how the code artifacts here will extend into a distributed rate limiting setting. Do you expect to reuse limiter, tokenBucket etc. in a distributed setting by having a local "controller" fill the tokens based on what it has pre-allocated from a global coordinator?
I have some ideas that I hope to put in an RFC soon. I haven't yet thought far enough to know which parts of the code I'll reuse.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, @RaduBerinde, @sumeerbhola, and @tbg)
pkg/kv/kvserver/tenantrate/limiter.go, line 136 at r2 (raw file):
Previously, sumeerbhola wrote…
is this a new configuration, or a delta change to the configuration? It looks like a delta, based on the Add in the implementation.
It's a pretty non-intuitive API: Add just means that we will get a call to Merge with the config argument.
I would like to close on this soon. My main hesitation is with the significantly higher read bytes allowance. In the spreadsheet, @andy-kimball points out that the numbers may be skewed if the test dataset fits in memory. Should we set a more conservative value (in-between the current value and the one suggested by the experiments)?
I'm fine with the logic here. I worry the read unit is too high.
From a higher level, I'm worried we're missing a library at the right level of abstraction. The logic here is so similar to the quotapool.RateLimiter. The main difference is just where the synchronization happens on the configuration and the ability to suck quota out after the reads. It feels like if you just added a simple method to deal with the reads and then you did the conversion before queuing rather than after, then you could avoid all of this mucking with the bad quotapool.QuotaPool API.
The only reason I can think to not do the above suggestion is because of an eye towards future changes, but I don't currently see any clear vision in that direction.
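A rough sketch of that alternative, under stated assumptions: the KVCU conversion happens before queuing, and a small non-blocking adjust method charges the bytes actually read afterwards. The limiter below is a local stand-in, not the real quotapool.RateLimiter API:

```go
// Sketch only: rateLimiter, waitN and adjust are illustrative stand-ins, not
// the quotapool API. The point is that the limiter only ever sees a float
// amount (the conversion to KV Compute Units happens before queuing), plus
// one small method to charge read bytes after the response size is known.
package main

import (
	"context"
	"fmt"
)

type rateLimiter struct{ tokens float64 } // stand-in for a token-bucket limiter

// waitN would block until n tokens are available (trivially granted here).
func (rl *rateLimiter) waitN(_ context.Context, n float64) error {
	rl.tokens -= n
	return nil
}

// adjust removes (negative delta) or returns tokens without blocking; this is
// the "simple method to deal with the reads" charged after the fact.
func (rl *rateLimiter) adjust(delta float64) { rl.tokens += delta }

const (
	readRequestCost = 0.7              // KVCU per read request
	readCostPerByte = 10.0 / (1 << 20) // KVCU per byte read (10 per MB)
)

func main() {
	rl := &rateLimiter{tokens: 100}
	// Convert to KV Compute Units up front, then queue on a plain number.
	_ = rl.waitN(context.Background(), readRequestCost)
	// After the read completes, charge for the bytes actually returned.
	rl.adjust(-float64(4096) * readCostPerByte)
	fmt.Printf("remaining tokens: %.3f\n", rl.tokens)
}
```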
Reviewed 2 of 11 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, @RaduBerinde, @sumeerbhola, and @tbg)
pkg/kv/kvserver/tenantrate/limiter.go, line 136 at r2 (raw file):
Previously, RaduBerinde wrote…
It's a pretty non-intuitive API: Add just means that we will get a call to Merge with the config argument.
I really struggled with how to frame this API. I agree it's extremely non-intuitive. I wanted to make a generic data structure that clients could leverage but really struggled to make something good. Add is a low-level primitive to muck with the state (Resource) that the QuotaPool is mediating access to on behalf of Requests. In the first pass of all of this, it was just about providing an API to put resource back into the pool. In that world it was not so bad. Over time it got worse. I feel like the abstraction being offered here is high leverage but I don't quite know how to package up the concepts properly.
pkg/kv/kvserver/tenantrate/limiter.go, line 203 at r2 (raw file):
}
delta := needed - tb.tokens
tryAgainAfter = time.Duration((delta * float64(time.Second)) / tb.config.Rate)
There are cases where the rates are really high where having a fallback can matter. May as well productionize this code while we're here. Could see adding a test case.
cockroach/pkg/util/quotapool/int_rate.go
Lines 193 to 200 in c8331c0
// Compute the time it will take for r.cur to get to the needed capacity.
timeDelta := time.Duration((delta * float64(time.Second)) / float64(r.rate))
// Deal with the exceedingly edge case that timeDelta, as a floating point
// number, is less than 1ns by returning 1ns and looping back around.
if timeDelta == 0 {
	timeDelta++
}
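A self-contained sketch of the suggested fallback applied to the tryAgainAfter computation (names are illustrative, mirroring the snippet quoted at line 203 above):

```go
// Sketch only: compute how long until `needed` tokens accumulate at `rate`,
// with the same 1ns floor as the quotapool code so a very high rate can't
// produce a zero wait and a retry loop that never actually sleeps.
package main

import (
	"fmt"
	"time"
)

func tryAgainAfter(needed, tokens, rate float64) time.Duration {
	delta := needed - tokens
	d := time.Duration((delta * float64(time.Second)) / rate)
	if d == 0 {
		// Floating-point rounding can yield a sub-nanosecond wait; round it up
		// to 1ns so the caller sleeps and loops back around.
		d = 1
	}
	return d
}

func main() {
	// With an extremely high rate, the raw computation truncates to 0.
	fmt.Println(tryAgainAfter(1, 0, 1e10)) // prints 1ns instead of 0s
}
```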
pkg/kv/kvserver/tenantrate/limiter.go, line 225 at r2 (raw file):
Previously, sumeerbhola wrote…
This comment is not specific to this PR, but there seem to be many interfaces, structs and layers of abstraction in the rate limiting code, which makes it hard for me to wrap my head around the code. There is even interface{} in the method below, accompanied with a type switch.
Is there a high-level summary describing and motivating the need for all these code abstractions, for what to me naively boils down to a straightforward token bucket mechanism?
I can take this.
The basic idea is that a quotapool.QuotaPool should be used to implement some form of queue+limit logic. It is not intended for direct consumption. The QuotaPool protects some Resource which is handed out to Requests. The Request implementation gets handed the Resource and can modify it and report whether it is satisfied. In order to add rate-limiting functionality, the request can also indicate when it should get notified to try again. The other way the Resource can be modified is with the Add() method, which passes an arbitrary object to the Resource instance under the QuotaPool's mutex and decides whether the head of the queue should now be notified to try again. It's all somewhat abstract.
This code sort of grew, as code does. This code is used to implement both an integer-resource quotapool as well as a token bucket rate-limiter. Any thoughts on API improvements would be appreciated. The API surface area of quotapool.IntPool and quotapool.RateLimiter are both sane but the lower-level quotapool.QuotaPool is opaque and non-intuitive. Also, the interfaces push towards a sync.Pool to avoid allocation, which is a bummer but seemingly unavoidable in Go when one wants both abstraction and performance. I have hopes that generics may make expressing some of these concepts better but I don't have a concrete vision.
The entry point for the code is likely
cockroach/pkg/util/quotapool/quotapool.go
Lines 32 to 69 in c8331c0
// Resource is an interface that represents a quantity which is being
// pooled and allocated. It is any quantity that can be subdivided and
// combined.
//
// This library does not provide any concrete implementations of Resource but
// internally the *IntAlloc is used as a resource.
type Resource interface {
	// Merge combines val into the current resource.
	// After val is passed to Merge, the QuotaPool will never use
	// that Resource again. This behavior allows clients to pool instances of
	// Resources by creating Resource during Acquisition and destroying them in
	// Merge.
	Merge(val interface{}) (shouldNotify bool)
}

// Request is an interface used to acquire quota from the pool.
// Request is responsible for subdividing a resource into the portion which is
// retained when the Request is fulfilled and the remainder.
type Request interface {
	// Acquire decides whether a Request can be fulfilled by a given quantity of
	// Resource.
	//
	// If it is not fulfilled it must not modify or retain the passed alloc.
	// If it is fulfilled, it should modify the Resource value accordingly.
	//
	// If tryAgainAfter is positive, acquisition will be attempted again after
	// the specified duration. This is critical for the implementation of
	// rate limiters on top of this package.
	Acquire(context.Context, Resource) (fulfilled bool, tryAgainAfter time.Duration)
	// ShouldWait indicates whether this request should be queued. If this method
	// returns false and there is insufficient capacity in the pool when the
	// request is queued then ErrNotEnoughQuota will be returned from calls to
	// Acquire.
	ShouldWait() bool
}
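To illustrate the contract, here is a toy, self-contained implementation of the two interfaces above; the concrete types are made up, and the pool's queueing, locking, and sync.Pool machinery are omitted:

```go
// Toy illustration of the Resource/Request contract quoted above. The
// interface shapes are copied from the quote; everything else (intRes,
// fixedRequest) is invented for illustration. Note how the Request
// type-asserts the concrete Resource, which is the coupling the review
// comments point at.
package main

import (
	"context"
	"fmt"
	"time"
)

type Resource interface {
	Merge(val interface{}) (shouldNotify bool)
}

type Request interface {
	Acquire(context.Context, Resource) (fulfilled bool, tryAgainAfter time.Duration)
	ShouldWait() bool
}

// intRes is a trivial integer Resource.
type intRes struct{ n int64 }

func (r *intRes) Merge(val interface{}) (shouldNotify bool) {
	add := val.(int64) // quota being returned or added
	r.n += add
	return add > 0 // wake waiters only if quota was actually added
}

// fixedRequest wants a fixed amount of the integer resource.
type fixedRequest struct{ want int64 }

func (q *fixedRequest) Acquire(
	_ context.Context, res Resource,
) (fulfilled bool, tryAgainAfter time.Duration) {
	r := res.(*intRes) // the Request knows the concrete Resource type
	if r.n < q.want {
		return false, time.Millisecond // ask the pool to retry us shortly
	}
	r.n -= q.want
	return true, 0
}

func (q *fixedRequest) ShouldWait() bool { return true }

func main() {
	res := &intRes{n: 5}
	var req Request = &fixedRequest{want: 3}
	ok, retry := req.Acquire(context.Background(), res)
	fmt.Println(ok, retry, res.n) // true 0s 2
	res.Merge(int64(3))           // something hands quota back to the pool
	fmt.Println(res.n)            // 5
}
```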
pkg/kv/kvserver/tenantrate/settings.go, line 39 at r2 (raw file):
Burst float64
// ReadRequestUnits is the baseline cost of a read, in KV Compute Units.
This comment could be more informative. The choice of value here tends to not be all that important but the fact that there is a value is very important. The explanation of the "courtesy" unit deserves to be somewhere and this feels like the spot.
pkg/kv/kvserver/tenantrate/settings.go, line 78 at r2 (raw file):
"kv.tenant_rate_limiter.read_request_cost", "base cost of a read request in KV Compute Units", 0.7,
This feels high.
The logic here is so similar to the quotapool.RateLimiter.
I didn't notice earlier that int_rate.go had its own token bucket implementation. It would indeed be nice if we could reduce duplication.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @RaduBerinde)
pkg/kv/kvserver/tenantrate/limiter.go, line 136 at r2 (raw file):
Previously, ajwerner wrote…
I really struggled with how to frame this API. I agree it's extremely non-intuitive. I wanted to make a generic data structure that clients could leverage but really struggled to make something good. Add is a low-level primitive to muck with the state (Resource) that the QuotaPool is mediating access to on behalf of Requests. In the first pass of all of this, it was just about providing an API to put resource back into the pool. In that world it was not so bad. Over time it got worse. I feel like the abstraction being offered here is high leverage but I don't quite know how to package up the concepts properly.
I think this is what tripped me up when trying to understand these abstractions, which your other explanation has helped with.
The abstraction is not really for abstracting the details of Request and Resource from each other, since they know the specifics of the other. But their interaction is mediated by the QuotaPool, which is handling synchronization and queueing, and which causes the type assertions (like waitRequest asserting that it is interacting with tokenBuckets).
The commentary around Resource is also quite confusing, e.g. "It is any quantity that can be subdivided and combined", which is not true for the token buckets. There is also "After val is passed to Merge, the QuotaPool will never use that Resource again. This behavior allows clients to pool instances of Resources by creating Resource during Acquisition and destroying them in Merge." This doesn't seem right, though maybe it refers to the val parameter for intAlloc, which is also an intAlloc.
I can see why generics would improve the situation, but the Merge method would still be clunky -- it is trying to do many things: config changes, unilateral resource acquisition, returning of resources for non-token-bucket. Perhaps separating them into different methods would aid understanding, though I'm not so sure that is needed, especially given your other comment about QuotaPool being for implementation sharing and not for "direct consumption".
I don't really have any radical ideas for restructuring, but code comment improvements would probably help the readability.
pkg/kv/kvserver/tenantrate/limiter.go, line 225 at r2 (raw file):
Previously, ajwerner wrote…
I can take this.
The basic idea is that a quotapool.QuotaPool should be used to implement some form of queue+limit logic. It is not intended for direct consumption. The QuotaPool protects some Resource which is handed out to Requests. The Request implementation gets handed the Resource and can modify it and report whether it is satisfied. In order to add rate-limiting functionality, the request can also indicate when it should get notified to try again. The other way the Resource can be modified is with the Add() method, which passes an arbitrary object to the Resource instance under the QuotaPool's mutex and decides whether the head of the queue should now be notified to try again. It's all somewhat abstract.
This code sort of grew, as code does. This code is used to implement both an integer-resource quotapool as well as a token bucket rate-limiter. Any thoughts on API improvements would be appreciated. The API surface area of quotapool.IntPool and quotapool.RateLimiter are both sane but the lower-level quotapool.QuotaPool is opaque and non-intuitive. Also, the interfaces push towards a sync.Pool to avoid allocation, which is a bummer but seemingly unavoidable in Go when one wants both abstraction and performance. I have hopes that generics may make expressing some of these concepts better but I don't have a concrete vision.
The entry point for the code is likely cockroach/pkg/util/quotapool/quotapool.go, lines 32 to 69 in c8331c0 (quoted above).
Thanks, this is very helpful.
I will try to modify RateLimiter. I was trying to limit the blast radius of my change, given that we want to backport this. I will need to change the burst to a float64 (not sure why it is an int when the rate is a float?) and add some more methods.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @RaduBerinde)
pkg/kv/kvserver/tenantrate/settings.go, line 39 at r2 (raw file):
Previously, ajwerner wrote…
This comment could be more informative. The choice of value here tends to not be all that important but the fact that there is a value is very important. The explanation of the "courtesy" unit deserves to be somewhere and this feels like the spot.
The units are all explained in the comment above; what is it missing?
pkg/kv/kvserver/tenantrate/settings.go, line 78 at r2 (raw file):
Previously, ajwerner wrote…
This feels high.
Can you expand? What would you suggest and why?
If I were to stick close to the previous code, I would have it at 1.0. I lowered it to 0.7, which is the value derived from experiments (column B6 times 1000: https://docs.google.com/spreadsheets/d/1PPlIcKnusOqWtBoOZVd9xBEMPe5Ss1FgrJlYFgpQZaM/edit#gid=735409177).
Ack on the rate cost of a read request. I wasn't thinking about it properly. I was caught up thinking that most of the cost we wanted to calculate was on the data read rather than on the request itself, and that this quantity was closer to the old "courtesy" byte concept that was here previously.
The only other user of the rate limiter is pretty straightforward. The units are integers because this code replaced some usages of https://pkg.go.dev/golang.org/x/time/rate which only allows integer quantities. I don't think floating point is going to be a problem.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, @RaduBerinde, and @sumeerbhola)
pkg/kv/kvserver/tenantrate/limiter.go, line 136 at r2 (raw file):
It seems that a lot of the commentary rotted, especially the commentary on Merge. The token-bucket functionality was added later. I needed a token bucket and wanted some other properties I already had here and realized that it was a rather minor code change. I'm going to take a stab at fixing up the commentary.
I can see why generics would improve the situation, but the Merge method would still be clunky -- it is trying to do many things: config changes, unilateral resource acquisition, returning of resources for non-token-bucket.
Ack, yeah, really it should be the structure doing the updating to the resource rather than the other way around. I made #63227 to try to clean that up.
For what it's worth, I can see an argument against doing anything generic here. I found it interesting and valuable when hacking around with different policies for queuing. It may be under-motivated. If we get rid of this usage of the abstract pool then there won't be any users and perhaps we should just unexport it. However, I did use it to good effect in this Friday commit to block instead of explode when failing to allocate memory in a long-running changefeed: #63050. In that case the resource was a memory monitor which was dynamically growing and shrinking as opposed to some sort of constant.
@ajwerner I'm working on using RateLimiter. How do you recommend handling the config? When admitting a request, I need to use the parameters in the Config to calculate the amount. I don't want to take another lock (or use atomics) just for that.
Huh, good question. I was thinking a RWMutex or an atomic on a pointer to a config. If you feel that just using the abstract pool makes it more straightforward and is less convoluted with the refactor, I'm fine with that. It also, as you noted, makes the floating point requirement a less invasive backport.
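A minimal sketch of the atomic-pointer-to-config idea; the Config fields are illustrative, not the actual tenantrate settings struct:

```go
// Sketch of "an atomic on a pointer to a config": readers price a request
// from a lock-free snapshot, and a settings change swaps the whole pointer.
// Config fields here are made up for illustration.
package main

import (
	"fmt"
	"sync/atomic"
)

type Config struct {
	ReadRequestCost float64 // KVCU per read request
	ReadCostPerByte float64 // KVCU per byte read
}

var cfg atomic.Value // holds a *Config

func setConfig(c *Config) { cfg.Store(c) }

// readCost loads a consistent config snapshot without a lock and prices a read.
func readCost(bytes int64) float64 {
	c := cfg.Load().(*Config)
	return c.ReadRequestCost + float64(bytes)*c.ReadCostPerByte
}

func main() {
	setConfig(&Config{ReadRequestCost: 0.7, ReadCostPerByte: 10.0 / (1 << 20)})
	fmt.Printf("%.4f KVCU\n", readCost(4096))
	// A settings change swaps the pointer; in-flight readers keep their snapshot.
	setConfig(&Config{ReadRequestCost: 1.0, ReadCostPerByte: 10.0 / (1 << 20)})
	fmt.Printf("%.4f KVCU\n", readCost(4096))
}
```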
I'll tinker more with it. It's unfortunate; it was much cleaner to just use the RateLimiter.
Force-pushed from cd32669 to 17bb6f7 (compare)
Ok, so I think the main problem was not the use of the abstract pool per se but the duplication of all the token bucket accounting logic. So I extracted that into a separate TokenBucket type and reused it.
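For readers following along, a rough sketch of the shape such an extracted type can take; the method names and fields below are illustrative rather than copied from the PR, and synchronization is intentionally left to the caller so the type can sit under either pool's existing lock:

```go
// Rough sketch of an extracted token-bucket accounting type that both the
// quotapool rate limiter and the tenantrate limiter could share. Names and
// signatures are assumptions, not the PR's actual TokenBucket.
package main

import (
	"fmt"
	"time"
)

type TokenBucket struct {
	rate    float64 // tokens added per second
	burst   float64 // maximum tokens
	current float64
	lastUpd time.Time
}

func (tb *TokenBucket) Init(rate, burst float64, now time.Time) {
	tb.rate, tb.burst, tb.current, tb.lastUpd = rate, burst, burst, now
}

// update accrues tokens for the elapsed time, capped at burst.
func (tb *TokenBucket) update(now time.Time) {
	tb.current += tb.rate * now.Sub(tb.lastUpd).Seconds()
	if tb.current > tb.burst {
		tb.current = tb.burst
	}
	tb.lastUpd = now
}

// TryToFulfill takes `amount` tokens if available; otherwise it reports how
// long to wait before trying again (with a 1ns floor).
func (tb *TokenBucket) TryToFulfill(now time.Time, amount float64) (ok bool, tryAgainAfter time.Duration) {
	tb.update(now)
	if tb.current >= amount {
		tb.current -= amount
		return true, 0
	}
	d := time.Duration((amount - tb.current) * float64(time.Second) / tb.rate)
	if d == 0 {
		d = 1
	}
	return false, d
}

func main() {
	var tb TokenBucket
	now := time.Now()
	tb.Init(100, 200, now)
	fmt.Println(tb.TryToFulfill(now, 150)) // true 0s
	fmt.Println(tb.TryToFulfill(now, 100)) // false ~500ms
}
```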
Reviewed 1 of 19 files at r3, 2 of 16 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, @RaduBerinde, and @sumeerbhola)
This commit extracts the token bucket accounting logic into a separate, public type. The same code will be used by the tenantrate limiter.

Release note: None
Tenant rate limiting currently uses four separate token buckets, each with its own rate and burst limits. In this commit we switch to a single shared token bucket (see cockroachdb#55114 for motivation).

We use a "cost model" to map read and write requests to "KV Compute Units". Requests have a base cost plus a per-byte cost. The details are documented in settings.go. The values were chosen based on experiments run by Nathan: https://docs.google.com/spreadsheets/d/1PPlIcKnusOqWtBoOZVd9xBEMPe5Ss1FgrJlYFgpQZaM/edit#gid=735409177

The rate was chosen so that it maps to 20% of 1 KV vCPU. This keeps the rate limit on small requests roughly the same as before (especially on mixed workloads).

The largest departure from the previous limits is that we allow much more read bytes (the per-byte cost of reads is small in the cost model). If we were to keep close to the previous limits, the value of kv.tenant_rate_limiter.read_cost_per_megabyte would be 200 instead of 10. Perhaps we want to be more conservative here and make this value somewhere in-between?

Fixes cockroachdb#55114.

Release note: None
Force-pushed from 17bb6f7 to 5c560bd (compare)
Reviewed 3 of 19 files at r3, 4 of 16 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @nvanbenschoten, and @sumeerbhola)
bors r+
TFTRs!
Build succeeded:
Tenant rate limiting currently uses four separate token buckets, each
with its own rate and burst limits. In this commit we switch to a
single shared token bucket (see #55114 for motivation).
We use a "cost model" to map read and write requests to "KV Compute
Units". Requests have a base cost plus a per-byte cost. The details
are documented in settings.go. The values were chosen based on
experiments run by Nathan:
https://docs.google.com/spreadsheets/d/1PPlIcKnusOqWtBoOZVd9xBEMPe5Ss1FgrJlYFgpQZaM/edit#gid=735409177
The rate was chosen so that it maps to 20% of 1 KV vCPU. This keeps
the rate limit on small requests roughly the same as before
(especially on mixed workloads).
The largest departure from the previous limits is that we allow much
more read bytes (the per-byte cost of reads is small in the cost
model). If we were to keep close to the previous limits, the value of
kv.tenant_rate_limiter.read_cost_per_megabyte would be 200 instead of
10. Perhaps we want to be more conservative here and make this
value somewhere in-between?
Fixes #55114.
Release note: None
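For scale, a back-of-envelope comparison of the read allowance; the only numbers taken from this commit message are the 10 vs. 200 KVCU/MB costs, and the refill rate is an illustrative placeholder:

```go
// Back-of-envelope only: the two per-MB costs come from the commit message;
// the refill rate is illustrative, not a real setting value.
package main

import "fmt"

func main() {
	const rateKVCU = 1000.0                   // illustrative refill rate, KVCU/sec
	const newCost, oldEquivCost = 10.0, 200.0 // KVCU per MB read

	// Sustained read bandwidth is bounded by rate / cost-per-MB.
	fmt.Printf("new setting:    %.0f MB/s\n", rateKVCU/newCost)      // 100 MB/s
	fmt.Printf("old-equivalent: %.0f MB/s\n", rateKVCU/oldEquivCost) // 5 MB/s
	// The new setting therefore allows 200/10 = 20x more read bytes per second.
}
```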