-
Notifications
You must be signed in to change notification settings - Fork 812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend ShuffleSharding on READONLY ingesters #6517
Extend ShuffleSharding on READONLY ingesters #6517
Conversation
4e75055
to
d344999
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we extend replicas if the current instance is read only status, does that work?
It does @yeya24. |
ef12664
to
4006b10
Compare
Hello @danielblando, thank you for opening this PR. There is a release in progress. As such, please rebase your CHANGELOG entry on top of the master branch and move the CHANGELOG entry to the top under Thanks, |
4006b10
to
1c55b56
Compare
@@ -1005,6 +1005,12 @@ func (i *Lifecycler) changeState(ctx context.Context, state InstanceState) error | |||
|
|||
level.Info(i.logger).Log("msg", "changing instance state from", "old_state", currState, "new_state", state, "ring", i.RingName) | |||
i.setState(state) | |||
|
|||
//The instances is rejoining the ring. It should reset its registered time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question, what happens if we don't reset the registered time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to catch cases the ingester goes to READONLY and back to ACTIVE.
The query still need to extend on these cases. The change on registeredTimestamp enforce that query will continue to extend requests on these ingesters.
"same number of instances, prioritize readOnly than timestamp changes": { | ||
r1: &Desc{Ingesters: map[string]InstanceDesc{"ing1": {Addr: "addr1", State: ACTIVE, Timestamp: 123456}}}, | ||
r2: &Desc{Ingesters: map[string]InstanceDesc{"ing1": {Addr: "addr1", State: READONLY, Timestamp: 789012}}}, | ||
expected: EqualButReadOnly, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused about the name. I understand the prioritization but it is weird to call it equal
when you have timestamp different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I get what you mean, but i dont have a better naming. I think this is ok as ReadOnly is less restrictive than EqualButTimestamporState
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM
Signed-off-by: Daniel Deluiggi <[email protected]>
Signed-off-by: Daniel Deluiggi <[email protected]>
Signed-off-by: Daniel Deluiggi <[email protected]>
Signed-off-by: Daniel Deluiggi <[email protected]>
Signed-off-by: Daniel Deluiggi <[email protected]>
1c55b56
to
8ad97a8
Compare
* Filter readOnly ingesters when sharding Signed-off-by: Daniel Deluiggi <[email protected]> * Extend shard on READONLY Signed-off-by: Daniel Deluiggi <[email protected]> * Remove old code Signed-off-by: Daniel Deluiggi <[email protected]> * Fix test Signed-off-by: Daniel Deluiggi <[email protected]> * update changelog Signed-off-by: Daniel Deluiggi <[email protected]> --------- Signed-off-by: Daniel Deluiggi <[email protected]> Signed-off-by: Alex Le <[email protected]>
What this PR does:
We found an issue when a tenant has
ingestion_tenant_shard_size
lower than the number of ACTIVE ingesters or high number of READONLY ingesters when testing the new status of READONLY.Eg:
Failed Push
Lets assume we have a ring with
10 ACTIVE ingesters
50 READONLY ingesters
tenantA ingestion_tenant_shard_size of 20
The current subRing of this tenant can be created with only READONLY ingesters. In this case, DoBatch will fail as there will be no health ingesters to send data.
Early throttle
Lets assume we have a ring with
80 ACTIVE ingesters
20 READONLY ingesters
tenantA ingestion_tenant_shard_size of 20
The current subRing can be created as a mix of ACTIVE and READONLY ingesters. This will cause a subRing of size 20 but only 15 ACTIVE ingesters supposedly. The localLimit for each ingesters will be calculated over 20 as the shard size but only 15 ingester are receiving all data. The new scenario will create a subRing over just the 80 ACTIVE ingesters.
This PR introduce extension to READONLY instances on ShuffleShard. It works similar to lookback. On the write path as we dont use READONLY and extend instead to write, it will not send requests to REANDONLY on RemoteWrite. On read path, it will return a shard greather than expected as it happens with lookback.
We are also changing the registered timestamp of ingesters which returns from READONLY state. These ingesters are as they entered the ring again for a Write perspective. This will make the lookback on read extend on them
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]