Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compact: native histogram support for downsampling #6350

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

rabenhorst
Copy link
Contributor

@rabenhorst rabenhorst commented May 9, 2023

This PR adds native histogram support for downsampling in compact and is part of #5907.

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Add native histogram support for downsampling

Verification

  • Add unit tests for native histograms in downsampling

@rabenhorst rabenhorst changed the title Native histogram support for downsampling (w.i.p.) compact: native histogram support for downsampling (w.i.p.) May 9, 2023
@rabenhorst rabenhorst mentioned this pull request May 9, 2023
7 tasks
@rabenhorst rabenhorst force-pushed the native-histogram-support-downsampling branch 4 times, most recently from 161bfb0 to 669417a Compare May 10, 2023 13:39
@rabenhorst rabenhorst changed the title compact: native histogram support for downsampling (w.i.p.) compact: native histogram support for downsampling May 10, 2023
@rabenhorst rabenhorst marked this pull request as ready for review May 10, 2023 15:54
@rabenhorst rabenhorst force-pushed the native-histogram-support-downsampling branch from dc62091 to f049b1c Compare May 16, 2023 11:25
@rabenhorst rabenhorst force-pushed the native-histogram-support-downsampling branch 4 times, most recently from ed2aa80 to bdf8e80 Compare June 13, 2023 12:55
@rabenhorst rabenhorst force-pushed the native-histogram-support-downsampling branch from bdf8e80 to fd82b1a Compare July 10, 2023 18:33
}

if h.total > 0 {
if fh.DetectReset(h.previous) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The increase of schema is treated like a reset:

A decrease of the ZeroThreshold or an increase of the Schema (i.e. an increase of resolution) can only happen together with a reset. Thus, the method returns true in either case.

If we do fh.CopyToSchema(h.schema) previously then this case is not caught AFAICT. Shouldn't we do the same reset logic in the preceding if?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to check that, but not sure if this will be still relevant. I proposed a change to how we can do downsampling: #6554

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added downsampling support for mixed chunk series and this PR is still open for review since #6554 will be a longer term task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do fh.CopyToSchema(h.schema) previously then this case is not caught AFAICT. Shouldn't we do the same reset logic in the preceding if?

I think you are right and will look into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do fh.CopyToSchema(h.schema) previously then this case is not caught AFAICT. Shouldn't we do the same reset logic in the preceding if?

h.schema is the precalculated minimal schema from all samples of the series, which we need to know, otherwise we might run in the situation where we can't add a sample to aggregates. For the reset we need to compare current and previous histogram, so I suggest to keep the original histogram and use that for reset detection: d7d2224

From what I understand the head appender sets reset hints properly, so even without the fix it should work.

@rabenhorst rabenhorst force-pushed the native-histogram-support-downsampling branch from 619d806 to 0415cc4 Compare July 24, 2023 09:40
@rabenhorst rabenhorst force-pushed the native-histogram-support-downsampling branch 2 times, most recently from 3b58955 to 97a8dd6 Compare August 1, 2023 10:50
@rabenhorst rabenhorst force-pushed the native-histogram-support-downsampling branch from 55ae8d5 to c0cfcc6 Compare August 19, 2023 17:44
Signed-off-by: Sebastian Rabenhorst <[email protected]>

Fixed imports

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Added native histogram support for downsampling

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Improved comment

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Removed unnecessary if

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Fixed native histogram proto conversion in remote engine

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Added downsampling support for native histograms

Reverted files from base PR

Added changelog

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Reverted error in bucket.go

Signed-off-by: Sebastian Rabenhorst <[email protected]>

trigger tests

Signed-off-by: Sebastian Rabenhorst <[email protected]>
Signed-off-by: Sebastian Rabenhorst <[email protected]>

Fixed tests, removed unused code

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Fixed tests, removed unused code

Signed-off-by: Sebastian Rabenhorst <[email protected]>
Signed-off-by: Sebastian Rabenhorst <[email protected]>
Signed-off-by: Sebastian Rabenhorst <[email protected]>

Improved mixed type chunks and native histogram downsampling tests

Signed-off-by: Sebastian Rabenhorst <[email protected]>
Signed-off-by: Sebastian Rabenhorst <[email protected]>
Signed-off-by: Sebastian Rabenhorst <[email protected]>
@rabenhorst rabenhorst force-pushed the native-histogram-support-downsampling branch from c0cfcc6 to c402aaa Compare August 19, 2023 17:45
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get this feedback consistently that Thanos is hard to understand and we are adding quite new functionality here thus could you please add a paragraph to docs/components/compact.md about how native histogram downsampling works?

@@ -85,6 +85,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6163](https://github.com/thanos-io/thanos/pull/6163) Receiver: changed default max backoff from 30s to 5s for forwarding requests. Can be configured with `--receive-forward-max-backoff`.
- [#6327](https://github.com/thanos-io/thanos/pull/6327) *: *breaking :warning:* Use histograms instead of summaries for instrumented handlers.
- [#6322](https://github.com/thanos-io/thanos/pull/6322) Logging: Avoid expensive log.Valuer evaluation for disallowed levels.
- [#6350] https://github.com/thanos-io/thanos/pull/6350 Compact: Add native histogram support for downsampling.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [#6350] https://github.com/thanos-io/thanos/pull/6350 Compact: Add native histogram support for downsampling.
- [#6350](https://github.com/thanos-io/thanos/pull/6350) Compact: Add native histogram support for downsampling.

@@ -53,6 +53,7 @@ type HeadGenOptions struct {
ScrapeInterval time.Duration

WithWAL bool
AppendLabels labels.Labels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

// Native histograms only have aggregates for COUNT, COUNTER and SUM.
aggrsTypes := []storepb.Aggr{storepb.Aggr_COUNT, storepb.Aggr_COUNTER, storepb.Aggr_SUM}
for _, concurrency := range []int{1, 2, 4, 8, 16, 32} {
b.Run(fmt.Sprintf("aggregates: %v, concurrency: %d", aggrs, concurrency), func(b *testing.B) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably meant to use aggrsTypes here? aggrs is empty 😄

@@ -179,23 +192,44 @@ func Downsample(
}
}
}

}
if i > 0 && previousIsHistogram != isHistogram(ac) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe downsampleAggr could handle this interleaving of different chunk types? For example, we could loop through the given data and perform downsampling individually for different sets of chunk types. It's getting hard to understand this loop. The same could probably be said about DownsampleRaw but there are fewer if branches there so it's understandable.

}

if h.sum == nil {
if h.resetDetected {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to reset this to false?

} else {
h.sum = fh.Copy()
}
// This needs to be h gauge histogram, otherwise reset detection will be triggered
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This needs to be h gauge histogram, otherwise reset detection will be triggered
// This needs to be a gauge histogram, otherwise reset detection will be triggered

}
// This needs to be h gauge histogram, otherwise reset detection will be triggered
// when appending the aggregated chunk and histogram.count < appender.count.
h.sum.CounterResetHint = histogram.GaugeType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the result that this is always set when adding multiple histograms? Maybe we can move this out of the if?

b.chunks[AggrCount] = chunkenc.NewXORChunk()

// Sum aggregate always needs to be gauge type, otherwise
// append can fail when sum_window_n > sum_window_n+1 (.i.e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// append can fail when sum_window_n > sum_window_n+1 (.i.e
// append can fail when sum_window_n > sum_window_n+1 (i.e.

switch fh.CounterResetHint {
case histogram.GaugeType:
if app != nil {
var ()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var ()

@rabenhorst
Copy link
Contributor Author

We get this feedback consistently that Thanos is hard to understand and we are adding quite new functionality here thus could you please add a paragraph to docs/components/compact.md about how native histogram downsampling works?

Agreed. I will ad documentation about it.

@GiedriusS
Copy link
Member

Any update on this? Would be cool to merge it.

@Matroxt
Copy link

Matroxt commented Jun 5, 2024

This seems to be the last piece missing to fully support native histograms in Thanos.
Can we help in any way so this can get merged?

@GiedriusS
Copy link
Member

This seems to be the last piece missing to fully support native histograms in Thanos. Can we help in any way so this can get merged?

You can open up a separate PR using commits from this branch and then add extra fixes on top

@rabenhorst
Copy link
Contributor Author

This seems to be the last piece missing to fully support native histograms in Thanos. Can we help in any way so this can get merged?

I don't had/have the time continue working on this. As @GiedriusS wrote, feel free to jut copy my branch. If you have question you can put them here or find me in CNCF slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants