Improve entry deduplication. #2302
This PR removes the mostcommon and sort insert functions in the heap iterator. While working on grafana#2293, I discovered that they are not actually helping, since we dedupe those lines anyway. There were no tests checking that deduping worked correctly, so I added some.

As a bonus, deduping will run faster and the code is less complex. The only side effect concerns the order of entries that share the same timestamp: before, the most common entry would appear first; now we keep the same order as we stored them, which I think is better.

I also changed the label ordering. Whether we are going forward or backward, I think we should keep the same alphabetical label ordering; I'm not sure why direction was altering this before.

Signed-off-by: Cyril Tovena <[email protected]>
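To illustrate the ordering change described above, here is a minimal sketch of deduplicating a batch of entries while preserving the order in which they were stored. The `Entry` type and the `dedupe` function are hypothetical stand-ins for illustration only; they are not the types or functions used by the heap iterator in this PR.

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for a log entry (not Loki's actual type).
type Entry struct {
	Timestamp int64
	Line      string
}

// dedupe drops entries whose timestamp and line were already seen,
// keeping the first occurrence and the original (stored) order.
// This is an assumed illustration, not the PR's implementation.
func dedupe(entries []Entry) []Entry {
	seen := make(map[Entry]struct{}, len(entries))
	out := make([]Entry, 0, len(entries))
	for _, e := range entries {
		if _, ok := seen[e]; ok {
			continue // duplicate: same timestamp and same line
		}
		seen[e] = struct{}{}
		out = append(out, e)
	}
	return out
}

func main() {
	batch := []Entry{
		{Timestamp: 1, Line: "a"},
		{Timestamp: 1, Line: "b"},
		{Timestamp: 1, Line: "a"}, // duplicate of the first entry
		{Timestamp: 2, Line: "a"},
	}
	for _, e := range dedupe(batch) {
		fmt.Println(e.Timestamp, e.Line)
	}
}
```

Because duplicates are dropped regardless of how often they occur, ranking entries by frequency before deduplication would not change which entries survive, only their relative order.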
nit about a test, then LGTM.
pkg/iter/iterator_test.go
NewStreamIterator(foo),
}, logproto.BACKWARD)
// first reverse streams, they should already be correctly ordered for the heap iterator to work.
for i, j := 0, len(foo.Entries)-1; i < j; i, j = i+1, j-1 {
This is actually mutating the internal entries used by the stream iter, as they reference the same slice. I think you'll want a function which creates the same []stream to guarantee we don't mutate the same underlying data. Basically, the testware and the HeapIter should use identical but separate copies of the same data.
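A minimal sketch of the suggestion above, assuming simplified `Stream` and `Entry` types (hypothetical stand-ins, not the real logproto types) and a hypothetical `cloneStreams` helper. It shows that reversing a copied entries slice leaves the original untouched, whereas reversing the original slice in place would also change whatever the iterator reads.

```go
package main

import "fmt"

// Entry and Stream are simplified, hypothetical stand-ins for the real types.
type Entry struct {
	Timestamp int64
	Line      string
}

type Stream struct {
	Labels  string
	Entries []Entry
}

// cloneStreams returns streams with the same content but separate backing
// arrays for Entries, so mutating the copy cannot affect the original.
func cloneStreams(in []Stream) []Stream {
	out := make([]Stream, len(in))
	for i, s := range in {
		out[i] = Stream{
			Labels:  s.Labels,
			Entries: append([]Entry(nil), s.Entries...),
		}
	}
	return out
}

// reverseEntries reverses a slice in place, like the loop in the test.
func reverseEntries(entries []Entry) {
	for i, j := 0, len(entries)-1; i < j; i, j = i+1, j-1 {
		entries[i], entries[j] = entries[j], entries[i]
	}
}

func main() {
	original := []Stream{{
		Labels:  `{app="foo"}`,
		Entries: []Entry{{1, "a"}, {2, "b"}, {3, "c"}},
	}}

	expected := cloneStreams(original)
	reverseEntries(expected[0].Entries)

	// The original keeps its order; only the copy is reversed.
	fmt.Println(original[0].Entries[0].Line, expected[0].Entries[0].Line) // a c
}
```

With this shape, the test can build its expected output from one copy and hand the other to the iterator under test.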
Signed-off-by: Cyril Tovena <[email protected]>
Codecov Report
@@            Coverage Diff             @@
##           master    #2302      +/-   ##
==========================================
+ Coverage   61.03%   61.04%   +0.01%
==========================================
  Files         158      158
  Lines       12778    12751      -27
==========================================
- Hits         7799     7784      -15
+ Misses       4394     4379      -15
- Partials      585      588       +3