-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bundler to exporter #39
Add bundler to exporter #39
Conversation
@stevencl1013 Please run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, but a few bugs and nits. See review comments.
exporter/trace/trace.go
Outdated
if o.BundleDelayThreshold > 0 { | ||
b.DelayThreshold = o.BundleDelayThreshold | ||
} else { | ||
b.DelayThreshold = 2 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you extract constants to the top of the file, and give them descriptive names, please?
exporter/trace/trace.go
Outdated
b.BundleCountThreshold = 50 | ||
} | ||
if o.NumberOfWorkers > 0 { | ||
b.HandlerLimit = o.NumberOfWorkers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the description https://github.com/googleapis/google-api-go-client/blob/cfc62336a21b9af45e2830a9898569d8aac1c5cc/support/bundler/bundler.go#L80-L82, the config parameter should be called MaxNumberOfWorkers
rather than just NumberOfWorkers
, or Just BundleHandlerLimit
to directly refer to the bundler parameter.
exporter/trace/trace.go
Outdated
b.HandlerLimit = o.NumberOfWorkers | ||
} | ||
// The measured "bytes" are not really bytes, see exportReceiver. | ||
b.BundleByteThreshold = b.BundleCountThreshold * 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done right after setting BundleCountThreshold
, to make sure they are set together.
Also, there is no exportReceiver
in this repo?
exporter/trace/trace.go
Outdated
} | ||
// The measured "bytes" are not really bytes, see exportReceiver. | ||
b.BundleByteThreshold = b.BundleCountThreshold * 200 | ||
b.BundleByteLimit = b.BundleCountThreshold * 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move next to setting BundleCountThreshold
Looks like some changes aren't committed, please run |
@@ -29,11 +34,19 @@ import ( | |||
type traceExporter struct { | |||
o *options | |||
projectID string | |||
bundler *bundler.Bundler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rghetia As far as I remember, there's some issues on using bundler and that's the reason why I avoided using bundler and used BatchWriteSpans in the first implementation. Could you share the background for the record?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ymotongpoo I kind of remember some issue with bundler but I don't remember the details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For record: at least the report in OC pointed the possibility of memory leak.
census-ecosystem/opencensus-go-exporter-ocagent#71
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From that issue, looks like the memory leak issue only appears when the backend is not reachable. Should be relatively easy to reproduce the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ymotongpoo I don't really understand that issue in the ocagent repo. From the issue:
as long as there is a valid connection to the collector, the “handler” function, i.e. uploadTraces, will be called on the bundle to offload the bundled traces to the collector.
Isn't the handler function called regardless of whether there is a connection? That connection check in the issue is already inside the uploadTraces
function. And in this exporter, there is no such check for a connection, the upload function just calls BatchWriteSpans
.
Also, since the bundler's BufferedByteLimit is set here, the bundler will not keep more than that many bytes in memory.
exporter/trace/cloudtrace.go
Outdated
// TraceSpansBufferMaxBytes is the maximum size (in bytes) of spans that | ||
// will be buffered in memory before being dropped. | ||
// | ||
// If unset, a default of 8MB will be used. | ||
// TraceSpansBufferMaxBytes int | ||
TraceSpansBufferMaxBytes int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be just BufferMaxBytes similar to other option names (which are without TraceSpans).
This could potentially be refactored and shared with metrics.
exporter/trace/cloudtrace.go
Outdated
// BundleDelayThreshold determines the max amount of time | ||
// the exporter can wait before uploading view data or trace spans to | ||
// the backend. | ||
// Optional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please specify the default value.
exporter/trace/cloudtrace.go
Outdated
|
||
// BundleCountThreshold determines how many view data events or trace spans | ||
// can be buffered before batch uploading them to the backend. | ||
// Optional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here.
exporter/trace/cloudtrace.go
Outdated
// for now. The minimum number of workers is 1. | ||
NumberOfWorkers int | ||
// MaxNumberOfWorkers sets the maximum number of go rountines that send requests | ||
// to Stackdriver Trace. The minimum number of workers is 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace Stackdriver with 'Cloud Trace'. Here and on line 91.
exporter/trace/cloudtrace.go
Outdated
// UploadFunction is the function used to upload spans to Cloud Trace. Used for | ||
// testing. Defaults to uploadSpans in trace.go. | ||
// Optional. | ||
UploadFunction func(ctx context.Context, spans []*tracepb.Span) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use mock server instead of exporting Test-only interface.
exporter/trace/trace.go
Outdated
case o.accum == 0: | ||
o.pause = false | ||
case o.accum == 1: | ||
log.Println("OpenCensus Stackdriver exporter: failed to upload span: buffer full") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/OpenCensus Stackdriver/OpenTelemetry Cloud Trace/
exporter/trace/trace.go
Outdated
o.accum = 0 | ||
o.delay() | ||
default: | ||
log.Printf("OpenCensus Stackdriver exporter: failed to upload %d spans: buffer full", o.accum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above.
exporter/trace/trace.go
Outdated
o.mu.Lock() | ||
defer o.mu.Unlock() | ||
if !o.pause { | ||
log.Println("OpenCensus Stackdriver exporter: failed to upload span: buffer full") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but leaving up to @ymotongpoo whether a potential memory leak is a blocking issue for merging this.
Thank you @nilebox for asking me comments. As far as I have heard OC exporter users' experiences, I seldom heard the issues that originate to memory leak. However, still the issue is known, and haven't solved yet. So my request here is for this PR to have at least either (preferably both) of the following:
I think we need at least 1. |
@ymotongpoo I don't understand how that issue relates to this exporter. I've looked into it, and I mentioned what I found in this comment: #39 (comment) |
Sorry to come to this one very late. Yesterday I noticed that there are two different versions of the
When setting up your Tracing pipeline, you can configure either of these, by choosing using So, I wanted to make sure you'd taken a look at that & considered what do we gain by doing our own bundling in the exporter as opposed to just making use of the work that's already done when using the Batch Span Processor? |
Follow-up to @james-bebbington 's comment: currently we basically have 2 identical implementations of opentelemetry-operations-go/exporter/trace/trace.go Lines 51 to 64 in 8c1bbc5
What if we remove the FWIW there is already a precedent of this approach:
|
Thanks for the info @james-bebbington and @nilebox, I wasn't aware of the difference between I read in internal documentation that OpenTelemetry exporters should use both batching and async writes, so I still believe this PR implements those functionalities for this exporter when using the
If we were to only support |
No worries, we also weren't aware of it until recently. There is probably some general OpenTelemetry documentation missing with recommendations around how to go about implementing exporters.
Yea we definitely want to support batched & parallel exporting. At the same time, we want to make sure we don't make this confusing for users:
Note we could enforce which setup the user has to use by only implementing one of those interfaces (and optionally also creating a wrapper initialization function to set up the pipeline for them). I'm not really sure what the right thing is to do here though. Any thoughts @rghetia @ymotongpoo @nilebox? (overall this PR is probably okay as is but I just want to make sure we've thought this through) Also note even the Go SDK is not consistent:
As another point of reference, the Python SDK only has a single |
exporter/trace/cloudtrace_test.go
Outdated
@@ -127,14 +134,19 @@ func TestExporter_Timeout(t *testing.T) { | |||
mockTrace.spansUploaded = nil | |||
mockTrace.delay = 20 * time.Millisecond | |||
var exportErrors []error | |||
var wg sync.WaitGroup | |||
wg.Add(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be more idiomatic to use a channel for this I think
@stevencl1013 Can we try addressing this issue in the SDK batcher then instead? That would benefit all users of Go SDK rather than just Stackdriver. I'm not sure what to suggest that would allow using bundler without making it confusing tbh:
|
@stevencl1013 Sorry for delayed response. Here's the reason why I mentioned that issue in OC exporter:
Anyway, as @nilebox and @james-bebbington suggested, now we have |
@stevencl1013 I had a offline discussion with @rghetia about the use of bundler. In short, as you confirmed first, we don't check the connectivity and won't in this exporter, so it's safe to use bundler here. Background: |
Apologies for jumping in in the middle of the discussion. As per OTel specification (https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#built-in-span-processors): The OTel standard SDK MUST implement the simple processor (i.e., sync case). With current simple process, a user will experience very high delay for each RPC. This change handles the simple processor case, by brining down latency from ~10sec to ~200 milliseconds. |
For reference, given the way the SDK is set up, we decided there isn't any ideal solution, but for now decided to:
i.e. we will implement this in the same way as the Jaeger exporter (see the Otel Go SDK). FYI @nilebox I know you had some reservations around the explicitness of the word @stevencl1013 - in the next PR it'd also be good to add/update the |
Bundles spans together and uploads them in separate worker threads