Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Global ratelimiter helper: usage-tracking fallback-capable rate.Limiter #6028
Global ratelimiter helper: usage-tracking fallback-capable rate.Limiter #6028
Changes from all commits
8873565
6369bc2
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps, it's worth adding some metrics tracking when we fallback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I'm planning on metrics. that'll be outside these objects though, and it's why
Collect()
returns the "use fallback" bool.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a transition from non-fallback to fallback or vice versa, the counter will be inaccurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you mean the limiting behavior will be inaccurate, e.g. it'll allow / deny more requests than it should:
yes, but we don't have any great way to resolve that.
*rate.Limiter
doesn't allow directly manipulating its tokens, and there isn't even a true "correct" behavior if this happens because it implies we don't know the correct behavior - we haven't been able to sync with other instances.it'll only be incorrect for one "time to fill the ratelimiter's burstable tokens" period though, which is currently 1 second (limit == burst in basically all of our uses). generally speaking that'd mean ~2x overage at worst, assuming all hosts switched at the same time.
I could try to empty the tokens when switching, which would be safer for us because it wouldn't allow exceeding either limit, but we don't have an efficient API for that - ideally it'd get tokens and try to e.g. binary-search
Allow(n)
to empty most or all of them, since it's racing with other calls. Right now it'd need around thousands of calls per limiter to drain, which is definitely excessive.The #6030 wrapper could do this relatively precisely and efficiently, as it can lock everything while draining. If we think 1 second of overage is worth preventing, I suspect it'd be relatively easy to add a "drain the limiter" API for this kind of purpose. I'm... ~50% "yes, worth it"? Can definitely be convinced to build that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tl;dr from IRL, from clarifying some of the surroundings here:
a layer above this, which holds all these individual tracking-and-fallback-able limiters, will be periodically calling
Collect()
and it'll emit a counter for "using fallback" so we can monitor how often fallbacks are used.but "uses fallback" should only occur in practice:
so the sustained existence of fallback-using things when we expect the global ratelimiter to be working is a concern, but not individual "failed update" events or temporary use.
the exact metrics emitted are going to be in a later diff, that all happens outside of this object.