Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributor: don't return errors when discarding samples with duplicated timestamps #10430

Merged
merged 4 commits into from
Jan 14, 2025

Conversation

duricanikolic
Copy link
Contributor

@duricanikolic duricanikolic commented Jan 13, 2025

What this PR does

In #10145 we introduced a request validation at distributor level that, for each object of type mimirpb.PreallocTimeseries in a request, iterates all its float and histogram samples, and as soon as a sample with a timestamp that has already been visited is found, discards the corresponding sample from the timeseries. Once the validation is completed, the request without discarded samples is forwarded to ingesters. Discarded samples are tracked by the cortex_discarded_samples_total metrics with reason sample_duplicate_timestamp.
One of the problems introduced by #10145 is that if a distributor discards at least one sample due to a duplicated timestamp, it also returns an error reporting the name of the timeseries, and the number of discarded samples. This behaviour introduces a confusion since

  • on the one hand, the first sample with a certain timestamp that reaches a distributor is propagated to ingesters for further processing, and might be successfully ingested,
  • while on the other hand, all other samples belonging to the same timeseries and with the same timestamp as the first one will be rejected, and will return an error, making the request both successful and unsuccessful.

In order to remove this ambiguity, we decided to:

  • Keep discarding samples with duplicated timestamps within the same mimirpb.PreallocTimeseries object on distributor side.
  • Keep tracing the discarded samples in the cortex_discarded_samples_total metrics with reason sample_duplicate_timestamp.
  • Not to return errors reporting the timeseries name and the number of discarded samples.

Which issue(s) this PR fixes or relates to

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@duricanikolic duricanikolic self-assigned this Jan 13, 2025
@duricanikolic duricanikolic requested a review from a team as a code owner January 13, 2025 18:23
Signed-off-by: Yuri Nikolic <[email protected]>
Copy link
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CHANGELOG.md Outdated Show resolved Hide resolved
Co-authored-by: Taylor C <[email protected]>
@@ -899,7 +899,7 @@ func (d *Distributor) validateSeries(nowt time.Time, ts *mimirpb.PreallocTimeser
if deduplicatedSamplesAndHistograms > 0 {
d.sampleValidationMetrics.duplicateTimestamp.WithLabelValues(userID, group).Add(float64(deduplicatedSamplesAndHistograms))
unsafeMetricName, _ := extract.UnsafeMetricNameFromLabelAdapters(ts.Labels)
return false, fmt.Errorf(duplicateTimestampMsgFormat, deduplicatedSamplesAndHistograms, unsafeMetricName)
level.Warn(d.log).Log("msg", "samples with duplicated timestamps have been discarded", "series", fmt.Sprintf("'%.200s'", unsafeMetricName), "samples discarded", deduplicatedSamplesAndHistograms)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned we may log too much. This is per-series. In the worst case, this would be logged for every series in every write request, killing distributors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pracucci ,
We could then either remove logs or try to sample them, like we already do with ingesters, where in some cases we log only 10% of actual logs. WDYT?

Copy link
Contributor Author

@duricanikolic duricanikolic Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... we actually don't sample logs but errors. So the sampling proposal cannot be done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed to remove logs for the time being, and to introduce rate limited logs in a follow-up PR.

@duricanikolic duricanikolic merged commit 78e0d3a into main Jan 14, 2025
29 checks passed
@duricanikolic duricanikolic deleted the yuri/distributor-deduplicate-timestamps branch January 14, 2025 11:04
duricanikolic added a commit that referenced this pull request Jan 14, 2025
…ted timestamps (#10430) (#10434)

* Distributor: don't return errors when discarding samples with duplicated timestamps

Signed-off-by: Yuri Nikolic <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Yuri Nikolic <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Taylor C <[email protected]>

* Remove logging for the time being

Signed-off-by: Yuri Nikolic <[email protected]>

---------

Signed-off-by: Yuri Nikolic <[email protected]>
Co-authored-by: Taylor C <[email protected]>
(cherry picked from commit 78e0d3a)

Co-authored-by: Đurica Yuri Nikolić <[email protected]>
duricanikolic added a commit that referenced this pull request Jan 14, 2025
…ted timestamps (#10430) (#10437)

* Distributor: don't return errors when discarding samples with duplicated timestamps

Signed-off-by: Yuri Nikolic <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Yuri Nikolic <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Taylor C <[email protected]>

* Remove logging for the time being

Signed-off-by: Yuri Nikolic <[email protected]>

---------

Signed-off-by: Yuri Nikolic <[email protected]>
Co-authored-by: Taylor C <[email protected]>
(cherry picked from commit 78e0d3a)

Co-authored-by: Đurica Yuri Nikolić <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants