fix: Send correct batch stats when SendBatchMaxSize is set #5385
Conversation
```diff
@@ -244,17 +245,24 @@ func (bt *batchTraces) add(item interface{}) {
 	td.ResourceSpans().MoveAndAppendTo(bt.traceData.ResourceSpans())
 }

-func (bt *batchTraces) export(ctx context.Context, sendBatchMaxSize int) error {
+func (bt *batchTraces) export(ctx context.Context, sendBatchMaxSize int, returnBytes bool) (int, int, error) {
```
question: should we do the `stats.Record` call within `export` rather than returning it? That way we don't need to make any function signature changes?
I can see why that would be annoying, because then instead of a single stats record call, you have to make one in each batch processor. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd rather change the signature, especially since it's only used in one place, than duplicate the stats recording code.
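A minimal sketch of the design being agreed on here (the types and names below are illustrative stand-ins, not the collector's actual implementation): `export` reports the item and byte counts it actually sent instead of recording stats itself, so a single caller can do the one `stats.Record`-style call for every batch type.

```go
package main

import "fmt"

// batch is a stand-in for the processor's pending-items buffer;
// the real types (batchTraces etc.) are more involved.
type batch struct{ items []int }

// export sends at most sendBatchMaxSize items and reports how many
// items (and, optionally, bytes) were actually sent, instead of
// recording stats itself. A zero max size means "no limit".
func (b *batch) export(sendBatchMaxSize int, returnBytes bool) (sent, bytes int, err error) {
	sent = len(b.items)
	if sendBatchMaxSize > 0 && sent > sendBatchMaxSize {
		sent = sendBatchMaxSize
	}
	if returnBytes {
		bytes = sent * 8 // placeholder for a real size calculation
	}
	b.items = b.items[sent:]
	return sent, bytes, nil
}

// sendItems is the single place that records the stats, so each
// concrete batch type does not need its own recording call.
func sendItems(b *batch, maxSize int) {
	sent, bytes, err := b.export(maxSize, true)
	if err != nil {
		return
	}
	fmt.Printf("recording batch_send_size=%d batch_send_size_bytes=%d\n", sent, bytes)
}

func main() {
	b := &batch{items: make([]int, 250)}
	for len(b.items) > 0 {
		sendItems(b, 100) // records 100, 100, then 50
	}
}
```

The trade-off discussed above is visible here: keeping the recording in `sendItems` means one signature change but only one recording site.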
```diff
 }

-	if err := bp.batch.export(bp.exportCtx, bp.sendBatchMaxSize); err != nil {
+	detailed := bp.telemetryLevel == configtelemetry.LevelDetailed
```
question: should we call out that we may want to remove this check in a future PR, given it's barely used and almost always desired?
It looks like the default for this setting is `LevelBasic`, so eliminating the check here would be a behavior change.
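To illustrate the behavior-change concern, here is a self-contained sketch (level names and the size calculation are illustrative, mirroring the idea of `configtelemetry.LevelBasic` / `LevelDetailed` rather than reproducing the collector's code): the byte-size computation is only performed at the detailed level, and since the default is basic, dropping the check would start doing that work by default.

```go
package main

import "fmt"

// telemetry levels, modeled loosely on configtelemetry's
// LevelBasic / LevelDetailed (values here are illustrative).
type level int

const (
	levelBasic level = iota // the default level
	levelDetailed
)

// exportStats always reports the item count, but only computes the
// (potentially costly) byte size when detailed telemetry is enabled.
func exportStats(items []string, lvl level) (count, bytes int) {
	count = len(items)
	if lvl == levelDetailed {
		for _, it := range items {
			bytes += len(it) // placeholder for a real serialized-size calculation
		}
	}
	return count, bytes
}

func main() {
	items := []string{"span-a", "span-bb"}
	c, bz := exportStats(items, levelBasic)
	fmt.Println(c, bz) // 2 0 — byte size skipped at the basic level
	c, bz = exportStats(items, levelDetailed)
	fmt.Println(c, bz) // 2 13
}
```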
Codecov Report
```diff
@@            Coverage Diff             @@
##             main    #5385      +/-   ##
==========================================
- Coverage   90.89%   90.88%   -0.01%
==========================================
  Files         191      190       -1
  Lines       11421    11446      +25
==========================================
+ Hits        10381    10403      +22
- Misses        819      822       +3
  Partials      221      221
```
Continue to review full report at Codecov.
Thanks for catching and fixing this. Please add a changelog entry. Also, can you confirm whether or not this addresses #3262?
Thanks for having a look, @codeboten!
@njvrzm please rebase and push again; there was a change that broke the build for the collector. We can then get this merged.
The stat was getting sent before the max batch size was taken into account.
Description:

This fixes a bug with the batch processor's `batch_send_size` and `batch_send_size_bytes` metrics. Their values were being calculated before `SendBatchMaxSize` was applied.

We observed this issue during performance testing. With `SendBatchMaxSize` set to a small enough value that it almost always took effect, graphs of `batch_send_size` showed an odd sawtooth pattern, and the total values were far in excess of the number of items actually sent. Looking at some individual metrics the issue was clear: with a `SendBatchMaxSize` of 100, for instance, we'd see `batch_send_size` metrics looking like 1000, 900, 800, 700... as the full size of the batch queue was recorded, but not sent, each time.

With this change the various `export` methods report the actual count of items sent (and the byte size sent, if requested), and the `sendItems` method records those values.
method records those values.Testing:
I added a test called
TestBatchProcessorSentBySize_withMaxSize
, based onTestBatchProcessorSentBySize
but withSendBatchMaxSize
set and with all spans delivered in a single request so that the batch size is predictable. It does not attempt to validate thebatch_send_size_bytes
metric - the splitting of batches makes the method used in the original test fail due to different amounts of overhead.(Incidentally,
TestBatchProcessorSentBySize
itself is rather brittle. Changing eithersendBatchSize
orspansPerRequest
so that the former is not a multiple of the latter makes the test fail in several ways.)