
Add global ingestion rate limiter to distributors #1766

Merged

Conversation

Contributor

@pracucci pracucci commented Oct 29, 2019

Following up on this design doc, in this PR I'm proposing to introduce a global ingestion rate limiter, implemented in each distributor as a local rate limiter configured with limit / N, where N is the current number of distributors.

Fixes #1090
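
To make the mechanism concrete, here is a minimal, self-contained sketch of the idea (names are illustrative and don't match the actual code in this PR): each distributor reads the number of healthy instances from the ring and enforces its share of the global limit locally.

package main

import "fmt"

// instanceCounter abstracts whatever tells us how many healthy distributors
// are currently registered (in this PR, the distributors ring).
type instanceCounter interface {
	HealthyInstancesCount() int
}

// globalRateStrategy computes the per-distributor share of the global
// ingestion rate limit: limit / N.
type globalRateStrategy struct {
	globalLimit float64 // global ingestion rate limit for a tenant (samples/sec)
	ring        instanceCounter
}

func (s *globalRateStrategy) Limit() float64 {
	n := s.ring.HealthyInstancesCount()
	if n < 1 {
		// Defensive fallback: if the ring looks empty (e.g. during startup),
		// enforce the whole global limit locally.
		n = 1
	}
	return s.globalLimit / float64(n)
}

// staticRing is a fake ring, just for this example.
type staticRing struct{ count int }

func (r staticRing) HealthyInstancesCount() int { return r.count }

func main() {
	s := &globalRateStrategy{globalLimit: 100000, ring: staticRing{count: 4}}
	fmt.Println(s.Limit()) // 25000: each of the 4 distributors enforces a quarter of the global limit
}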

Notes about the distributors ring

We currently don't have a ring of distributors, so I had to introduce one.

For this purpose, in this PR I'm proposing to introduce a simple, generic service discovery mechanism based on the KVStore, instead of reusing the ring itself. It's an opinionated change: if there's no consensus, I can roll back and reuse the ring implementation for distributors too.

From my perspective, a generic, lightweight internal service discovery/registry would be easier to plug in than the current ring (see the sketch after the list below), because:

  • No need to have a real ring (i.e. no need for tokens)
  • No need to deal with IngesterState
  • No need for a FlushTransferer
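
Roughly, the kind of interface I have in mind (a sketch only; names are illustrative and not what's in this PR):

// ServiceRegistry sketches a minimal, token-less registry backed by the
// KVStore: each instance heartbeats itself under a key, and consumers only
// need to know how many instances are currently healthy.
type ServiceRegistry interface {
	// Register starts heartbeating this instance into the KVStore.
	Register(ctx context.Context, instanceID string) error

	// HealthyInstancesCount returns the number of instances whose last
	// heartbeat is within the configured timeout.
	HealthyInstancesCount() int

	// Unregister removes this instance from the KVStore and stops heartbeating.
	Unregister(ctx context.Context) error
}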

Notes about alternatives

  • I've excluded using the Kubernetes API to figure out the current number of distributor replicas, to avoid introducing Kubernetes as a hard dependency.

@pracucci pracucci force-pushed the add-global-rate-limiter-to-distributors branch from 6da7da1 to 3b3dbb3 Compare October 29, 2019 17:03
Contributor

gouthamve commented Oct 30, 2019

For this purpose, in this PR I'm proposing to introduce a simple, generic service discovery mechanism based on the KVStore, instead of reusing the ring itself. It's an opinionated change: if there's no consensus, I can roll back and reuse the ring implementation for distributors too.

I don't want to maintain another way of doing service discovery, tbh. I'd just use the current one again. There is no proof that the current method is limiting us in any way, and my guess is that people would be inclined to use the gossip backend for rate-limiting SD, which should mean even less IO.

If we find that the ring-based implementation is too IO intensive or becomes a bottleneck, we can add a new one then.

Contributor

@pstibrany pstibrany left a comment

Nice work. This is a straightforward extension of the ring idea. I like that.

pkg/distributor/distributor.go (resolved)
pkg/util/servicediscovery/registry.proto (outdated, resolved)
pkg/util/servicediscovery/registry.proto (outdated, resolved)
Contributor

pstibrany commented Oct 30, 2019

I don't want to maintain another way of doing service discovery, tbh. I'd just use the current one again. There is no proof that the current method is limiting us in any way, and my guess is that people would be inclined to use the gossip backend for rate-limiting SD, which should mean even less IO.

This code is reusing the existing KVStore, but storing a different structure in it, under a different key. I think that makes sense, instead of making the ring structure bigger. It's written in a generic way (I don't see a need for that, for now, but it is harmless otherwise) and called "service discovery", but the idea is the same as in the ring (without tokens).

@pracucci pracucci force-pushed the add-global-rate-limiter-to-distributors branch 3 times, most recently from a005da6 to 714be9f Compare October 31, 2019 09:29
@pracucci
Contributor Author

@pstibrany and @gouthamve thanks for your comments. I've addressed the feedback:

  1. Removed the service discovery in favour of the ring
  2. The FlushTransferer in the ring is no longer required (i.e. distributors don't need it)
  3. I've generalized the ring.Lifecycler logs, removing any reference to "ingester" or "consul", because they were misleading (it's not necessarily a ring of ingesters, or one backed by Consul)
  4. I've masked the LifecyclerConfig behind a custom ring config for the distributors, so that we only expose the config options / CLI flags which make sense for the distributor (see distributor_ring.go and the sketch below)
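
To give an idea of what (4) looks like, here is a rough sketch (illustrative only; the actual code is in distributor_ring.go and may expose a different set of options):

// RingConfig masks ring.LifecyclerConfig, exposing only the ring options
// which make sense for the distributor.
type RingConfig struct {
	KVStore          kv.Config
	HeartbeatPeriod  time.Duration
	HeartbeatTimeout time.Duration
}

// RegisterFlags registers the distributor ring flags under their own prefix
// (KVStore flag registration omitted for brevity).
func (cfg *RingConfig) RegisterFlags(f *flag.FlagSet) {
	f.DurationVar(&cfg.HeartbeatPeriod, "distributor.ring.heartbeat-period", 5*time.Second, "Period at which to heartbeat to the ring.")
	f.DurationVar(&cfg.HeartbeatTimeout, "distributor.ring.heartbeat-timeout", time.Minute, "The heartbeat timeout after which distributors are considered unhealthy within the ring.")
}

// ToLifecyclerConfig maps the reduced config onto the full ring.LifecyclerConfig,
// hardcoding everything which doesn't make sense for the distributor
// (e.g. a single token, since distributors don't own any data).
func (cfg *RingConfig) ToLifecyclerConfig() ring.LifecyclerConfig {
	lc := ring.LifecyclerConfig{}
	lc.RingConfig.KVStore = cfg.KVStore
	lc.RingConfig.HeartbeatTimeout = cfg.HeartbeatTimeout
	lc.HeartbeatPeriod = cfg.HeartbeatPeriod
	lc.NumTokens = 1
	return lc
}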

Could you do another round of review, please?

Contributor

@pstibrany pstibrany left a comment

Looks good! Nice work. (I like fixed error messages 👍 )

@pracucci pracucci force-pushed the add-global-rate-limiter-to-distributors branch 2 times, most recently from 142945c to 15350c1 Compare November 1, 2019 11:44
Contributor

@pstibrany pstibrany left a comment

Still looks good to me :-)

pkg/ring/lifecycler.go (outdated, resolved)
@pracucci pracucci force-pushed the add-global-rate-limiter-to-distributors branch 2 times, most recently from ca48988 to 7b42491 Compare November 5, 2019 09:22
@pracucci pracucci force-pushed the add-global-rate-limiter-to-distributors branch from 7b42491 to 0dd49d3 Compare November 11, 2019 17:42
@tomwilkie tomwilkie requested a review from gouthamve November 14, 2019 15:45
@pracucci pracucci force-pushed the add-global-rate-limiter-to-distributors branch from 0dd49d3 to c8648ee Compare November 15, 2019 08:50
@tomwilkie
Contributor

Ping @gouthamve?

Contributor

@jtlisi jtlisi left a comment

LGTM
One nit and a suggestion, if you have a chance: it would be nice to reuse the NoopFlushTransferer in the ruler, since the ruler currently basically implements a NoopFlushTransferer itself.

pkg/distributor/distributor.go (resolved)
}

// RegisterFlags adds the flags required to config this to the given FlagSet
func (cfg *RingConfig) RegisterFlags(f *flag.FlagSet) {
Contributor

@jtlisi jtlisi Nov 25, 2019

Something similar would be nice for the ruler, the main difference being the number of tokens. However, that can probably be figured out at another time. Maybe the ring package could use a refactor to make it friendlier to simpler use cases.

Contributor Author

Right. I would suggest keeping the ruler's refactoring as a separate PR, given it's not related to this work. I will add it to my backlog.

// be used in cases we don't need one
type NoopFlushTransferer struct{}

// NewNoopFlushTransferer makes a new NoopFlushTransferer
Contributor

The ruler has identical functionality in pkg/ruler/lifecycle.go; it would be cool to delete that and reuse this instead.
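
For reference, the no-op implementation being discussed is roughly the following (a sketch only; the exact method set is whatever the ring's FlushTransferer interface defines):

// NoopFlushTransferer satisfies the ring's FlushTransferer interface without
// doing anything, for components (like the distributor) which have no data
// to flush or transfer on shutdown.
type NoopFlushTransferer struct{}

// NewNoopFlushTransferer makes a new NoopFlushTransferer.
func NewNoopFlushTransferer() *NoopFlushTransferer { return &NoopFlushTransferer{} }

func (t *NoopFlushTransferer) StopIncomingRequests()                 {}
func (t *NoopFlushTransferer) Flush()                                {}
func (t *NoopFlushTransferer) TransferOut(ctx context.Context) error { return nil }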

Contributor Author

The ruler's Flush() is not a no-op. Currently the NoopFlushTransferer adoption is all or nothing: how would you see reusing it in the ruler?

Contributor

True, I forgot that the change to make the ruler flush has not been merged yet:
https://github.com/cortexproject/cortex/pull/1571/files#diff-95b5a43683ab83429e93fef5c5daf87fL26

Contributor Author

No problem. We can use the NoopFlushTransferer in the ruler as soon as PR #1571 is merged.

@pracucci pracucci force-pushed the add-global-rate-limiter-to-distributors branch from c8648ee to b018af5 Compare November 26, 2019 10:29
@owen-d
Contributor

owen-d commented Nov 29, 2019

Nice work -- I like the ring re-use.

Contributor

@gouthamve gouthamve left a comment

Looks good, but I have some minor nits!

docs/arguments.md (outdated, resolved)
docs/arguments.md (outdated, resolved)
pkg/distributor/distributor.go (resolved)
if d.cfg.LimiterReloadPeriod == 0 {
	return
}
if !canJoinRing {
Contributor

Can be simplified to:

	if !canJoinRing {
		ingestionRateStrategy = newLocalIngestionRateStrategy(limits)
	} else if limits.IngestionRateStrategy() == validation.GlobalIngestionRateStrategy {
		distributorsRing, err = ring.NewLifecycler(cfg.DistributorRing.ToLifecyclerConfig(), nil, "distributor", ring.DistributorRingKey)
		if err != nil {
			return nil, err
		}

		distributorsRing.Start()

		ingestionRateStrategy = newGlobalIngestionRateStrategy(limits, distributorsRing)
	}

Contributor

I think this means that it's never going to be infinite, maybe double the logic once?

Contributor Author

Not sure. I think the current logic is safer and the intention is clearer (when the distributor is an internal dependency, do not configure any rate limit at all).
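
For context, the case I'm referring to looks roughly like this (an illustrative sketch, not necessarily the exact code in the PR): when the distributor is instantiated as an internal dependency and never joins the ring, the strategy applies no limit at all.

import "math"

// infiniteRateStrategy is an illustrative "no rate limit" strategy used when
// the distributor is an internal dependency and never joins the ring.
type infiniteRateStrategy struct{}

// Limit returns +Inf, so the limiter never rejects a push.
func (s *infiniteRateStrategy) Limit() float64 { return math.Inf(1) }

// Burst returns the largest burst we can express.
func (s *infiniteRateStrategy) Burst() int { return math.MaxInt32 }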

CHANGELOG.md (outdated, resolved)
@pracucci pracucci force-pushed the add-global-rate-limiter-to-distributors branch from 241aaf4 to f209bfc Compare December 10, 2019 14:26
@pracucci pracucci requested a review from gouthamve December 10, 2019 14:49
@pracucci pracucci force-pushed the add-global-rate-limiter-to-distributors branch from bc5f1d0 to 738b5d7 Compare December 11, 2019 13:42
@gouthamve gouthamve merged commit d1529ff into cortexproject:master Dec 11, 2019
cfg.DistributorRing.HeartbeatPeriod = 100 * time.Millisecond
cfg.DistributorRing.InstanceID = strconv.Itoa(rand.Int())
cfg.DistributorRing.KVStore.Mock = kvStore
cfg.DistributorRing.InstanceInterfaceNames = []string{"eth0", "en0", "lo0"}
Contributor

TestDistributor_PushIngestionRateLimiter fails on my local machine because none of these interfaces match :/ I will open a PR to get them from the system so that the tests pass on all machines.
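
A rough sketch of what that follow-up could look like (not this PR's code): derive the interface names from the system with net.Interfaces(), so the test doesn't depend on eth0/en0/lo0 existing.

import "net"

// systemInterfaceNames returns the names of all network interfaces on the
// current machine, so tests can use whatever is actually available.
func systemInterfaceNames() ([]string, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(ifaces))
	for _, iface := range ifaces {
		names = append(names, iface.Name)
	}
	return names, nil
}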
