Extend ShuffleSharding on READONLY ingesters #6517

Merged

@danielblando danielblando commented Jan 16, 2025

What this PR does:
While testing the new READONLY state, we found an issue when a tenant's ingestion_tenant_shard_size is lower than the number of ACTIVE ingesters, or when the ring has a high number of READONLY ingesters.

E.g.:
Failed push
Let's assume we have a ring with:
10 ACTIVE ingesters
50 READONLY ingesters
tenantA with an ingestion_tenant_shard_size of 20

The current subRing of this tenant can be created with only READONLY ingesters. In this case, DoBatch will fail, as there will be no healthy ingesters to send data to.

Early throttle
Let's assume we have a ring with:
80 ACTIVE ingesters
20 READONLY ingesters
tenantA with an ingestion_tenant_shard_size of 20

The current subRing can be created as a mix of ACTIVE and READONLY ingesters. This can produce a subRing of size 20 with, say, only 15 ACTIVE ingesters. The localLimit for each ingester is calculated with 20 as the shard size, but only 15 ingesters receive all the data, so the tenant is throttled early. With this change, the subRing is created over just the 80 ACTIVE ingesters.
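
The arithmetic behind the early throttle can be sketched as follows. This is only an illustration: it assumes the per-ingester limit is derived as the global limit divided by the shard size and multiplied by the replication factor, and the global limit of 1,500,000 and replication factor of 3 are made-up values, not taken from this PR.

```go
package main

import "fmt"

// localLimit converts a tenant-wide limit into a per-ingester limit.
// Assumption: limit / shardSize * replicationFactor.
func localLimit(globalLimit, shardSize, replicationFactor int) int {
	return globalLimit / shardSize * replicationFactor
}

// effectiveGlobalLimit estimates how much the tenant can actually ingest
// before hitting the per-ingester limit when only `writable` ingesters
// of the shard receive data.
func effectiveGlobalLimit(local, writable, replicationFactor int) int {
	return local * writable / replicationFactor
}

func main() {
	// Hypothetical numbers: 1.5M global limit, shard size 20, RF 3.
	local := localLimit(1_500_000, 20, 3)
	fmt.Println(local) // 225000

	// Only 15 of the 20 shard members are ACTIVE and receive writes.
	fmt.Println(effectiveGlobalLimit(local, 15, 3)) // 1125000
}
```

Under these assumptions the tenant would be throttled at 75% of its configured limit, because the limit is spread over 20 ingesters while only 15 take writes.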

This PR introduces extension over READONLY instances in ShuffleShard. It works similarly to lookback. On the write path, since we don't use READONLY instances and extend the shard instead, requests will not be sent to READONLY ingesters on RemoteWrite. On the read path, it will return a shard greater than expected, as happens with lookback.
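
A minimal sketch of the extension idea, under simplifying assumptions: candidates are walked in a fixed order, whereas the real ring selects instances by tokens and zones. The names `Instance`, `extendShard`, and `writable` are hypothetical and do not come from pkg/ring.

```go
package main

import "fmt"

type State int

const (
	Active State = iota
	ReadOnly
)

type Instance struct {
	ID    string
	State State
}

// extendShard walks the candidates in order and adds each one to the shard.
// Only ACTIVE instances count toward shardSize, so every READONLY instance
// met along the way extends the shard by one.
func extendShard(candidates []Instance, shardSize int) []Instance {
	var shard []Instance
	active := 0
	for _, inst := range candidates {
		if active == shardSize {
			break
		}
		shard = append(shard, inst)
		if inst.State == Active {
			active++
		}
	}
	return shard
}

// writable returns the shard members that may receive writes.
func writable(shard []Instance) []Instance {
	var out []Instance
	for _, inst := range shard {
		if inst.State == Active {
			out = append(out, inst)
		}
	}
	return out
}

func main() {
	candidates := []Instance{
		{"a", Active}, {"b", ReadOnly}, {"c", Active}, {"d", ReadOnly}, {"e", Active},
	}
	shard := extendShard(candidates, 2)
	fmt.Println(len(shard), len(writable(shard))) // 3 2
}
```

In this sketch the read path would query all three shard members, while the write path only targets the two ACTIVE ones, so the shard size used for writes matches the number of ingesters that receive data.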

We are also changing the registered timestamp of ingesters that return from the READONLY state. From a write perspective, these ingesters are treated as if they had entered the ring again. This makes the lookback on read extend over them.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@danielblando danielblando force-pushed the filter-ro-ingester-sharding branch from 4e75055 to d344999 Compare January 16, 2025 21:46
@danielblando danielblando marked this pull request as ready for review January 17, 2025 18:20

@yeya24 yeya24 left a comment

If we extend the replicas when the current instance is in read-only status, does that work?

@danielblando

It does, @yeya24.
I have moved to that approach. The filter approach causes problems for read operations, as we could miss some ingesters that need to be queried.
By extending, we use logic similar to lookback.

@danielblando danielblando changed the title Add operation on ShuffleSharding to filter READONLY ingesters Extend ShuffleSharding on READONLY ingesters Jan 22, 2025
@danielblando danielblando force-pushed the filter-ro-ingester-sharding branch from ef12664 to 4006b10 Compare January 23, 2025 01:07
@CharlieTLe

Hello @danielblando, thank you for opening this PR.

There is a release in progress. As such, please rebase your CHANGELOG entry on top of the master branch and move the CHANGELOG entry to the top under ## master / unreleased.

Thanks,
Charlie

@danielblando danielblando force-pushed the filter-ro-ingester-sharding branch from 4006b10 to 1c55b56 Compare January 23, 2025 18:57
@@ -1005,6 +1005,12 @@ func (i *Lifecycler) changeState(ctx context.Context, state InstanceState) error

level.Info(i.logger).Log("msg", "changing instance state from", "old_state", currState, "new_state", state, "ring", i.RingName)
i.setState(state)

//The instances is rejoining the ring. It should reset its registered time.

Question, what happens if we don't reset the registered time?

@danielblando danielblando Jan 28, 2025

This is to catch cases where the ingester goes to READONLY and back to ACTIVE.
The query still needs to extend in these cases. The change to registeredTimestamp ensures that queries will continue to extend requests over these ingesters.

"same number of instances, prioritize readOnly than timestamp changes": {
r1: &Desc{Ingesters: map[string]InstanceDesc{"ing1": {Addr: "addr1", State: ACTIVE, Timestamp: 123456}}},
r2: &Desc{Ingesters: map[string]InstanceDesc{"ing1": {Addr: "addr1", State: READONLY, Timestamp: 789012}}},
expected: EqualButReadOnly,

I am a bit confused about the name. I understand the prioritization, but it is weird to call it equal when the timestamps are different.

Hm, I get what you mean, but I don't have a better name. I think this is OK, as ReadOnly is less restrictive than EqualButTimestamporState.

@yeya24 yeya24 left a comment

Thanks. LGTM

Signed-off-by: Daniel Deluiggi <[email protected]>
@danielblando danielblando force-pushed the filter-ro-ingester-sharding branch from 1c55b56 to 8ad97a8 Compare January 30, 2025 21:56
@danielblando danielblando merged commit b48f93b into cortexproject:master Jan 30, 2025
17 checks passed
@danielblando danielblando deleted the filter-ro-ingester-sharding branch January 30, 2025 22:37
alexqyle pushed a commit to alexqyle/cortex that referenced this pull request Jan 31, 2025
* Filter readOnly ingesters when sharding

Signed-off-by: Daniel Deluiggi <[email protected]>

* Extend shard on READONLY

Signed-off-by: Daniel Deluiggi <[email protected]>

* Remove old code

Signed-off-by: Daniel Deluiggi <[email protected]>

* Fix test

Signed-off-by: Daniel Deluiggi <[email protected]>

* update changelog

Signed-off-by: Daniel Deluiggi <[email protected]>

---------

Signed-off-by: Daniel Deluiggi <[email protected]>
Signed-off-by: Alex Le <[email protected]>