Improve entry deduplication. #2302

cyriltovena · 2020-07-06T19:59:41Z

This PR removes the mostcommon and sort-insert functions from the heap iterator. While working on #2293 I discovered that they don't actually help, since we dedupe those lines anyway. There were no tests checking that deduping worked correctly, so I added some.

As a bonus, deduping runs faster and the code is less complex. The only side effect concerns the order of entries that share the same timestamp: previously the most common entry appeared first; now we keep the order in which they were stored, which I think is better.

I also changed the label ordering. Whether we iterate forward or backward, I think we should keep the same alphabetical label ordering; I'm not sure why direction was altering this before.
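The label-ordering change can be pictured as a comparison function in which direction only affects the timestamp comparison, while ties always fall back to ascending labels. This is a sketch under assumptions: `item`, `direction`, and `less` are hypothetical names, and timestamps are simplified to `int64`.

```go
package main

import (
	"fmt"
	"sort"
)

// direction is a hypothetical stand-in for the query direction.
type direction int

const (
	forward direction = iota
	backward
)

// item is a hypothetical heap element: a timestamp and a label string.
type item struct {
	ts     int64
	labels string
}

// less orders by timestamp according to the direction. Entries with equal
// timestamps are always ordered by labels ascending, whatever the direction.
func less(a, b item, d direction) bool {
	if a.ts == b.ts {
		return a.labels < b.labels
	}
	if d == backward {
		return a.ts > b.ts
	}
	return a.ts < b.ts
}

func main() {
	items := []item{{1, `{app="b"}`}, {2, `{app="a"}`}, {1, `{app="a"}`}}
	for _, d := range []direction{forward, backward} {
		sorted := append([]item(nil), items...) // sort a copy, keep items intact
		sort.SliceStable(sorted, func(i, j int) bool {
			return less(sorted[i], sorted[j], d)
		})
		fmt.Println(sorted)
	}
}
```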

Signed-off-by: Cyril Tovena [email protected]

owen-d

nit about a test, then LGTM.

owen-d · 2020-07-08T12:47:19Z

pkg/iter/iterator_test.go

+		NewStreamIterator(foo),
+	}, logproto.BACKWARD)
+	// first reverse streams, they should already be correctly ordered for the heap iterator to work.
+	for i, j := 0, len(foo.Entries)-1; i < j; i, j = i+1, j-1 {


This is actually mutating the internal entries used by the stream iter as they reference the same slice. I think you'll want a function which creates the same []stream to guarantee we don't mutate the same underlying data. Basically the testware and the HeapIter should use identical but separate copies of the same data.
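The aliasing issue raised here can be reproduced in isolation. The sketch below uses plain string slices instead of the real stream types, and `newLines` is a hypothetical fixture constructor; it only shows why building the input and the expected data from separate calls avoids the shared backing array.

```go
package main

import "fmt"

// reverse reverses s in place.
func reverse(s []string) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

// newLines is a hypothetical fixture constructor. Each call returns a fresh
// slice with its own backing array, so callers never share data.
func newLines() []string {
	return []string{"a", "b", "c"}
}

func main() {
	// Problem: both variables point at the same backing array, so reversing
	// one also reverses what the other sees.
	shared := newLines()
	alias := shared
	reverse(alias)
	fmt.Println(shared) // [c b a]

	// Fix: identical but separate copies for the code under test and the
	// expected values.
	input := newLines()
	expected := newLines()
	reverse(expected)
	fmt.Println(input, expected) // [a b c] [c b a]
}
```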

codecov-commenter · 2020-07-08T13:23:56Z

Codecov Report

Merging #2302 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2302      +/-   ##
==========================================
+ Coverage   61.03%   61.04%   +0.01%     
==========================================
  Files         158      158              
  Lines       12778    12751      -27     
==========================================
- Hits         7799     7784      -15     
+ Misses       4394     4379      -15     
- Partials      585      588       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/iter/iterator.go | `66.85% <100.00%> (-1.31%)` | ⬇️ |
| pkg/promtail/targets/tailer.go | `73.86% <0.00%> (-2.28%)` | ⬇️ |
| pkg/promtail/targets/filetarget.go | `68.67% <0.00%> (-1.81%)` | ⬇️ |
| pkg/logql/evaluator.go | `92.13% <0.00%> (-0.41%)` | ⬇️ |
| pkg/promtail/positions/positions.go | `60.71% <0.00%> (+13.39%)` | ⬆️ |

pull-request-size bot added the size/L label Jul 6, 2020

cyriltovena force-pushed the dedupe-entry branch from 2e634d3 to c9298bc Compare July 6, 2020 21:13

Merge remote-tracking branch 'upstream/master' into dedupe-entry

08fdb7e

owen-d reviewed Jul 8, 2020

Improve heap iterator backward test.

1eb8b5a

Signed-off-by: Cyril Tovena <[email protected]>

owen-d approved these changes Jul 8, 2020

owen-d merged commit cd74043 into grafana:master Jul 8, 2020
