-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proposal for store instance high availability #404
Conversation
The format of this proposal mirrors an existing draft proposal: thanos-io#387
|
||
## Open issues | ||
|
||
### Querying from multiple buckets or object stores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I personally prefer this approach, would allow us to run queries that span multiple buckets.
Example - we collect info from two tenants each of which gets uploaded to their own bucket tenantA
and tenantB
if we wanted to query the metrics for both adding the a Identifier
to the InfoResponse
we could have the bucket name on there and it would be valid to wait for both responses.
A further use case of this could be to have buckets that are historical / archive data and some that are within x days and would mean that you could spin up store nodes quicker for the recent bucket as would require less overhead at start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree here with @domgreen. Maybe slight change in gateway: bool
into something like gateway string
with bucket name would allow us for basic functional sharding of stores in addition HA. What do you think @mattbostock ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it makes sense having a string instead of bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I think using a string identifier is the way forward.
The Protobuf definition could be:
bool deduplicated = 4
string store_group_id = 5
The deduplicated
boolean would determine how the query instance handles retrieving the data (i.e. whether to wait for all results). I think I prefer deduplicated
over gateway
as it's more specific to the functionality we care about in this case. Sidecar instances would have deduplicated
set to false. Bucket stores would have deduplicated
set to true.
It'd be up to the specific implementation to determine what the store_group_id
string is. For bucket stores, I think concatenating the object store URL and bucket name would make sense. For all other stores, I'd leave it empty for now until we have a good reason to use it.
The logic would work like this:
if deduplicated:
query_and_use_fastest_response_per_store_group
else
query_and_wait_for_all_responses
Stores would be grouped by store_group_id
. Stores having an empty store_group_id
field would be considered all part of the same group.
Since boolean fields are false by default in Protobufs, deduplicated
would be false and existing instances would be treated as before for backwards compatibility (i.e. the query instance will wait for all responses, subject to timeouts).
If you agree, I'll update the proposal to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need a redundant deduplicated
option? Isn't it enough having specified store_group_id
as a object store+bucket_name as identifier? Thus, we don't have to wait for all responses if stores have the same store_group_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can account only non-zero string in store_group_id
for the deduplication, hence older version would work as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think relying on an empty string is too implicit as it alters the behaviour of how responses are treated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this's a disadvantage, if you describe the behaviour in the documentation and the help. Otherwise you have to bear in mind both options to be working as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the massive delay in reviews and thanks for this (:
store_group_id
makes sense to me. deduplicated
is tricky. I don't like it because both store gateways with query_and_use_fastest_response_per_store_group
and sidecars with query_and_wait_for_all_responses
logic are kind of duplicated, so we can work on naming definitely.
However, what about another idea:
In Info
let's add:
store_group_id
peer_type
(like it is in gossipcluster.PeerState.Type
).
I am proposing this because I am trying to figure out how to make deduplication flow consistent across ALL components. Including not yet solved: Ruler
deduplication!
Basically, with this we can deduct correct behavior everywhere:
- Store gateway:
store_group_id
would be as @mattbostock suggested and peerType=store so query can performquery_and_use_fastest_response_per_store_group
logic. - Sidecars:
store_group_id
would be all external labels as it is ID of those. peerType=source so query can assumequery_and_wait_for_all_responses
flow with somereplica
label deduplication. - Ruler: The tricky part is that we might want these in HA, but it's hard to define external labels for these, so having some arbitrary
store_group_id
would be perfect. Plus ruler ispeerType=source
so same replica deduplication might apply here as well.
I am not saying we want solve ruler with this proposal, just that we might want to reuse same thing and move ALL to store_group_id
for deduplication logic for consistency.
Tricky part here:
4 Rules (in 2 different groups) - it would be not trivial to implement this, because the query_and_wait_for_all_responses
+ dedup
is done truly by removing (or not) replica label. But that is something not for proposal anyway.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments all.
@Bplotka: I think your proposal makes sense, I'll update this PR to use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for it!
failed but similar data was recorded by another Prometheus server in the same | ||
failure domain. To support deduplication, Thanos must wait for all Thanos | ||
sidecar servers to return their data (subject to timeouts) before returning a | ||
response to a client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is worth to mention, that we DO not ignore these timeouts, even with the HA sidecar scenario we treat one replica being inaccessible as an "error case" - which might be wrong or confusing and definitely different to store gateway case (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good shout, I'll make that explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thoughts, I'm not sure if we need to expand on this as I want to avoid confusing the reader by giving too much detail. Maybe I should remove the 'subject to timeouts' part?
that do not explicitly set a value for `gateway` will not be affected. | ||
|
||
If a store instance is a gateway, query instances will treat each store | ||
instance in a label group as having access to the same data. Query instances |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to tell if stores are from same label groups if they do not put any labels in current logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is related to the alternative mentioned below. will comment there
|
||
## Open issues | ||
|
||
### Querying from multiple buckets or object stores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree here with @domgreen. Maybe slight change in gateway: bool
into something like gateway string
with bucket name would allow us for basic functional sharding of stores in addition HA. What do you think @mattbostock ?
Updated based on feedback, comments welcome. |
@mattbostock I don't know why we did not ask about this, but.... For HA, is that not enough to just put multiple replicas of store gateways behind some loadbalancer (for example 2 replica statefulset behind k8s service)? (: This will not work for gossip (as in gossip pods talks to pods), but wil work for static |
Thanks for your work here, but I think we need to merge it as However this proposal might be a good base for other features in soon future:
|
Merged, just to move it to rejected state in next PR and have it for future reference. |
Bumps [webpack](https://github.com/webpack/webpack) from 5.91.0 to 5.96.1. - [Release notes](https://github.com/webpack/webpack/releases) - [Commits](webpack/webpack@v5.91.0...v5.96.1) --- updated-dependencies: - dependency-name: webpack dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The format of this proposal mirrors an existing draft proposal:
#387
We should to arrive at consensus on the open issues described in the proposal (under the 'Open issues' section) before merging.
/cc @xjewer @fabxc @Bplotka