fix: Send correct batch stats when SendBatchMaxSize is set #5385

Merged: 1 commit merged into open-telemetry:main on Jun 2, 2022

Conversation

@njvrzm (Contributor) commented May 18, 2022

Description:
This fixes a bug with the batch processor's batch_send_size and batch_send_size_bytes metrics. Their values were being calculated before SendBatchMaxSize was applied.

We observed this issue during performance testing. With SendBatchMaxSize set to a small enough value that it almost always took effect, graphs of batch_send_size showed an odd sawtooth pattern, and the totals were far in excess of the number of items actually sent. Looking at individual measurements made the issue clear: with a SendBatchMaxSize of 100, for instance, we'd see batch_send_size values like 1000, 900, 800, 700..., because the full size of the pending batch queue was recorded on each send even though only 100 items actually went out each time.
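To make the failure mode concrete, here is a small self-contained sketch (not code from the collector) that prints the values the metric used to report versus what it should report, for 1000 queued spans and a SendBatchMaxSize of 100:

```go
package main

import "fmt"

func main() {
	queued, maxSize := 1000, 100
	for queued > 0 {
		sent := queued
		if sent > maxSize {
			sent = maxSize
		}
		// Old behaviour: batch_send_size recorded the pre-split queue length
		// (1000, 900, 800, ...). New behaviour: it records what was actually
		// sent (100 on every iteration).
		fmt.Printf("old batch_send_size=%d  new batch_send_size=%d\n", queued, sent)
		queued -= sent
	}
}
```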

With this change, each export method reports the actual count of items sent (and the byte size sent, when requested), and the sendItems method records those values.
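A minimal sketch of the recording side, using OpenCensus stats (the stats.Record mentioned in the review thread below); the measure definitions here are illustrative stand-ins for the processor's real batch_send_size and batch_send_size_bytes measures, not code from this PR:

```go
package batchsketch

import (
	"context"

	"go.opencensus.io/stats"
)

// Illustrative measures; the batch processor defines its own equivalents.
var (
	statBatchSendSize      = stats.Int64("batch_send_size", "Number of items in the batch that was sent", stats.UnitDimensionless)
	statBatchSendSizeBytes = stats.Int64("batch_send_size_bytes", "Number of bytes in the batch that was sent", stats.UnitBytes)
)

// recordSendStats mirrors the shape of the fix: the recorded values come from
// export's return values (what was actually sent), not from the size of the
// pending queue before SendBatchMaxSize was applied.
func recordSendStats(ctx context.Context, sent, bytes int, detailed bool) {
	stats.Record(ctx, statBatchSendSize.M(int64(sent)))
	if detailed {
		stats.Record(ctx, statBatchSendSizeBytes.M(int64(bytes)))
	}
}
```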

Testing:
I added a test called TestBatchProcessorSentBySize_withMaxSize, based on TestBatchProcessorSentBySize but with SendBatchMaxSize set and with all spans delivered in a single request so that the batch size is predictable. It does not attempt to validate the batch_send_size_bytes metric - the splitting of batches makes the method used in the original test fail due to different amounts of overhead.

(Incidentally, TestBatchProcessorSentBySize itself is rather brittle. Changing either sendBatchSize or spansPerRequest so that the former is not a multiple of the latter makes the test fail in several ways.)

@njvrzm njvrzm requested review from a team and bogdandrutu May 18, 2022 03:58
linux-foundation-easycla bot commented May 18, 2022

CLA Signed. The following committers are authorized under a signed CLA:
  • ✅ login: njvrzm / name: Nathan Vērzemnieks (7ef957c)

@@ -244,17 +245,24 @@ func (bt *batchTraces) add(item interface{}) {
 	td.ResourceSpans().MoveAndAppendTo(bt.traceData.ResourceSpans())
 }
 
-func (bt *batchTraces) export(ctx context.Context, sendBatchMaxSize int) error {
+func (bt *batchTraces) export(ctx context.Context, sendBatchMaxSize int, returnBytes bool) (int, int, error) {
Contributor:

question: should we do the stats.Record call within export rather than returning it? That way we don't need to make any function signature changes?

Contributor:

I can see why that would be annoying, because then, instead of a single stats record call, you have to make one in each batch processor. What do you think?

Contributor Author (njvrzm):

Yeah, I'd rather change the signature, especially since it's only used in one place, than duplicate the stats recording code.
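For context on that trade-off: the per-signal batches (traces, metrics, logs) all satisfy a small internal batch interface, so widening export once keeps the stats recording at a single call site. A rough sketch of the widened interface, reconstructed from the traces hunk above rather than quoted from the PR:

```go
package batchsketch

import "context"

// Approximate shape only; the real definition lives in batch_processor.go.
type batch interface {
	// export sends up to sendBatchMaxSize items and reports how many items
	// (and, if returnBytes is set, how many bytes) actually went out.
	export(ctx context.Context, sendBatchMaxSize int, returnBytes bool) (sentItems int, sentBytes int, err error)
	itemCount() int
	add(item interface{})
}
```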

 }
 
-	if err := bp.batch.export(bp.exportCtx, bp.sendBatchMaxSize); err != nil {
+	detailed := bp.telemetryLevel == configtelemetry.LevelDetailed
Contributor:

question: should we call out that we may want to remove this in a future PR, given it's barely used and almost always desired?

Contributor Author (njvrzm):

It looks like the default for this setting is LevelBasic, so eliminating the check here would be a behavior change.
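Put differently, byte-size accounting stays opt-in: the caller only asks export to compute sizes when the configured telemetry level will actually record batch_send_size_bytes. A minimal sketch of that call site, assuming the batchProcessor fields visible in the hunk above:

```go
// detailed is false under the default LevelBasic, so export can skip the
// extra work of measuring the request's byte size when nobody will record it.
detailed := bp.telemetryLevel == configtelemetry.LevelDetailed
sent, bytes, err := bp.batch.export(bp.exportCtx, bp.sendBatchMaxSize, detailed)
```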

codecov bot commented May 19, 2022

Codecov Report

Merging #5385 (3a85f7a) into main (528fd56) will decrease coverage by 0.00%.
The diff coverage is 91.66%.

❗ Current head 3a85f7a differs from pull request most recent head 18aebcb. Consider uploading reports for the commit 18aebcb to get more accurate results

@@            Coverage Diff             @@
##             main    #5385      +/-   ##
==========================================
- Coverage   90.89%   90.88%   -0.01%     
==========================================
  Files         191      190       -1     
  Lines       11421    11446      +25     
==========================================
+ Hits        10381    10403      +22     
- Misses        819      822       +3     
  Partials      221      221              
| Impacted Files | Coverage Δ |
|---|---|
| processor/batchprocessor/batch_processor.go | 88.94% <91.66%> (-2.59%) ⬇️ |
| service/service.go | 41.79% <0.00%> (-4.64%) ⬇️ |
| service/zpages.go | 70.08% <0.00%> (-1.69%) ⬇️ |
| pdata/internal/common.go | 94.61% <0.00%> (-0.77%) ⬇️ |
| service/host.go | 100.00% <0.00%> (ø) |
| config/common.go | 100.00% <0.00%> (ø) |
| config/exporter.go | 90.90% <0.00%> (ø) |
| config/receiver.go | 90.90% <0.00%> (ø) |
| config/extension.go | 90.90% <0.00%> (ø) |

... and 25 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@codeboten (Contributor) left a comment

Thanks for catching and fixing this. Please add a changelog entry. Also, can you confirm whether or not this addresses #3262?

@njvrzm (Contributor Author) commented May 19, 2022

Thanks for having a look, @codeboten!

@codeboten (Contributor) left a comment

@njvrzm please rebase and push again; there was a change that broke the build for the collector. We can then get this merged.

The stat was getting sent before the max batch size was taken into account.
@codeboten codeboten merged commit 65b7b1b into open-telemetry:main Jun 2, 2022
@njvrzm njvrzm deleted the njvrzm/fix_batch_processor_stats_with_max_size branch June 3, 2022 16:37