-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MQE: Fix native histogram bugs #9145
Conversation
f133c46
to
e1363e2
Compare
This has a very slight negative impact to the benchmark due to copying the span values. However it appears acceptable to me:
|
…tHistogram This copies the spans and buckets so that they are not reused between native histograms.
e1363e2
to
6b64584
Compare
What I believe to be the upstream fix if we want to avoid this work around is here: prometheus/prometheus#14771 I haven't checked, but I suspect the extra copying in the upstream case will have a similar overhead to this change. (In other words I think the changes are equitable, and it more comes down to whether prometheus want such a change). |
Nice find. I'd prefer to use the upstream fix as that will ensure there aren't any other places that could be affected by this issue (eg. range vector selectors). |
Yep, I agree. However, I think we should merge this for now and change it back once upstream is fixed. I've added a TODO to note this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is sufficient: there's also a similar problem that occurs when successive different histograms are returned due to lookback.
For example, in the test case in 8bae026, the underlying histograms are at T=30s and T=1m30s, but the query is running with a 1m step and so performs lookback at T=1m and T=2m. In this case, the histograms returned by PeekPrev
share spans due to the bug fixed in prometheus/prometheus#14771 (PeekPrev
uses a histogram created with AtFloatHistogram(nil)
here), and so the test fails.
I think the safest thing to do until prometheus/prometheus#14771 is fixed is to just always copy the first histogram for a point in sum
- I'll put together a PR for this now.
Once that bug is fixed, I also think we should extend requireNotSame
in TestInstantVectorSelector_NativeHistogramPointerHandling
to include the require.NotSame(t, &points[0].H.PositiveSpans[0], &points[1].H.PositiveSpans[0], "must not point to the same underlying array")
assertion you've introduced here, and similar assertions for the other slices as well. (edit: I've done this in 52f9c1f)
Good catch with the PeekPrev. An alternate workaround would be to copy the span slices after using I'd be more inclined to do that so we can keep both of these test cases and then remove the workaround once upstream is fixed. It also means we don't need to Copy as many histograms in Sum, or remember to do it in other places (like Avg once done). |
Good idea, I'll do that instead. |
Thanks. Feel free to push onto my branch. |
Done 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving all of your changes @jhesketh - could you take a look at mine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just some nits. Thanks for making the changes :-)
@@ -231,3 +226,13 @@ func TestInstantVectorSelector_NativeHistogramPointerHandling(t *testing.T) { | |||
}) | |||
} | |||
} | |||
|
|||
func requireNotSameSlices[T any](t *testing.T, s1, s2 []T, description string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) the inline comment could be a doc-string for this
requireNotSame(t, points[0].H, points[1].H) | ||
}, | ||
}, | ||
"successive histograms returned due to lookback, but refer to different histograms": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit)
"successive histograms returned due to lookback, but refer to different histograms": { | |
"successive histograms returned due to lookback, but create different histograms at each point": { |
* Add test the checks for NH compaction on sum * Use chunkIterator.AtFloatHistogram instead of memoizedIterator.AtFloatHistogram This copies the spans and buckets so that they are not reused between native histograms. * Update CHANGELOG * Add TODO to remove workaround * Add failing test case * Apply same assertions to all test cases * Apply workaround to lookback case as well * Address PR feedback --------- Co-authored-by: Charles Korn <[email protected]> (cherry picked from commit b1da9ec)
* Add test the checks for NH compaction on sum * Use chunkIterator.AtFloatHistogram instead of memoizedIterator.AtFloatHistogram This copies the spans and buckets so that they are not reused between native histograms. * Update CHANGELOG * Add TODO to remove workaround * Add failing test case * Apply same assertions to all test cases * Apply workaround to lookback case as well * Address PR feedback --------- Co-authored-by: Charles Korn <[email protected]> (cherry picked from commit b1da9ec) Co-authored-by: Joshua Hesketh <[email protected]>
What this PR does
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.