-
Notifications
You must be signed in to change notification settings - Fork 548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distributor: don't return errors when discarding samples with duplicated timestamps #10430
Conversation
…ted timestamps Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Taylor C <[email protected]>
pkg/distributor/distributor.go
Outdated
@@ -899,7 +899,7 @@ func (d *Distributor) validateSeries(nowt time.Time, ts *mimirpb.PreallocTimeser | |||
if deduplicatedSamplesAndHistograms > 0 { | |||
d.sampleValidationMetrics.duplicateTimestamp.WithLabelValues(userID, group).Add(float64(deduplicatedSamplesAndHistograms)) | |||
unsafeMetricName, _ := extract.UnsafeMetricNameFromLabelAdapters(ts.Labels) | |||
return false, fmt.Errorf(duplicateTimestampMsgFormat, deduplicatedSamplesAndHistograms, unsafeMetricName) | |||
level.Warn(d.log).Log("msg", "samples with duplicated timestamps have been discarded", "series", fmt.Sprintf("'%.200s'", unsafeMetricName), "samples discarded", deduplicatedSamplesAndHistograms) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned we may log too much. This is per-series. In the worst case, this would be logged for every series in every write request, killing distributors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pracucci ,
We could then either remove logs or try to sample them, like we already do with ingesters, where in some cases we log only 10% of actual logs. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... we actually don't sample logs but errors. So the sampling proposal cannot be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed to remove logs for the time being, and to introduce rate limited logs in a follow-up PR.
Signed-off-by: Yuri Nikolic <[email protected]>
…ted timestamps (#10430) (#10434) * Distributor: don't return errors when discarding samples with duplicated timestamps Signed-off-by: Yuri Nikolic <[email protected]> * Update CHANGELOG.md Signed-off-by: Yuri Nikolic <[email protected]> * Update CHANGELOG.md Co-authored-by: Taylor C <[email protected]> * Remove logging for the time being Signed-off-by: Yuri Nikolic <[email protected]> --------- Signed-off-by: Yuri Nikolic <[email protected]> Co-authored-by: Taylor C <[email protected]> (cherry picked from commit 78e0d3a) Co-authored-by: Đurica Yuri Nikolić <[email protected]>
…ted timestamps (#10430) (#10437) * Distributor: don't return errors when discarding samples with duplicated timestamps Signed-off-by: Yuri Nikolic <[email protected]> * Update CHANGELOG.md Signed-off-by: Yuri Nikolic <[email protected]> * Update CHANGELOG.md Co-authored-by: Taylor C <[email protected]> * Remove logging for the time being Signed-off-by: Yuri Nikolic <[email protected]> --------- Signed-off-by: Yuri Nikolic <[email protected]> Co-authored-by: Taylor C <[email protected]> (cherry picked from commit 78e0d3a) Co-authored-by: Đurica Yuri Nikolić <[email protected]>
What this PR does
In #10145 we introduced a request validation at distributor level that, for each object of type
mimirpb.PreallocTimeseries
in a request, iterates all its float and histogram samples, and as soon as a sample with a timestamp that has already been visited is found, discards the corresponding sample from the timeseries. Once the validation is completed, the request without discarded samples is forwarded to ingesters. Discarded samples are tracked by thecortex_discarded_samples_total
metrics with reasonsample_duplicate_timestamp
.One of the problems introduced by #10145 is that if a distributor discards at least one sample due to a duplicated timestamp, it also returns an error reporting the name of the timeseries, and the number of discarded samples. This behaviour introduces a confusion since
In order to remove this ambiguity, we decided to:
mimirpb.PreallocTimeseries
object on distributor side.cortex_discarded_samples_total
metrics with reasonsample_duplicate_timestamp
.Which issue(s) this PR fixes or relates to
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.