Update collector code to use [Trace|Metrics]Data. #431
I think we should change
```diff
@@ -80,15 +80,11 @@ func NewJaegerThriftHTTPSender(
 }

 // ProcessSpans sends the received data to the configured Jaeger Thrift end-point.
-func (s *JaegerThriftHTTPSender) ProcessSpans(batch *agenttracepb.ExportTraceServiceRequest, spanFormat string) (uint64, error) {
+func (s *JaegerThriftHTTPSender) ProcessSpans(td data.TraceData, spanFormat string) (uint64, error) {
```
Q: why pass `td` by value? In the bigger scheme of things this shouldn't make much difference either way, but I'd like to know the reasoning behind the choice.
Multiple reasons to pass by value:
- Size is small because we embed pointers.
- Avoid heap allocation of this object.
- Passing by value means that the next processor cannot modify it; for example, if you have a multi-processor that fans this out, you don't want the first processor to modify the object.
I don't know if I really see the benefit of point 3. Since we are carrying pointers to the underlying objects, any change to those objects will be reflected in all other processors, so the first processor can effectively modify the object. I guess `td`'s pointers could not be changed to point at new objects, but I think it's very unlikely that someone would try to do that instead of just modifying the objects they point to.
That said, I do not have a problem with passing `td` by value, since it's a very small object, as you said, and it does avoid the heap allocation of a very pointer-filled object.
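The point about shared pointees can be shown with a minimal sketch, assuming a simplified hypothetical `TraceData` that holds only a slice of span pointers. A callee that receives the struct by value can still mutate the spans the caller sees; only reassigning the copy's own fields stays local.

```go
package main

import "fmt"

// Hypothetical, simplified types; not the repository's actual definitions.
type Span struct{ Name string }

type TraceData struct {
	Spans []*Span
}

// renameFirst mutates the span that td.Spans[0] points to. Even though td is
// received by value, the pointee is shared with the caller's copy.
func renameFirst(td TraceData, name string) {
	td.Spans[0].Name = name
}

// replaceSpans reassigns the copy's slice field; the caller's TraceData still
// refers to the original slice, so this change is invisible to the caller.
func replaceSpans(td TraceData) {
	td.Spans = []*Span{{Name: "replacement"}}
}

func main() {
	td := TraceData{Spans: []*Span{{Name: "original"}}}

	renameFirst(td, "renamed")
	fmt.Println(td.Spans[0].Name) // renamed: the shared *Span was modified

	replaceSpans(td)
	fmt.Println(td.Spans[0].Name) // renamed: only the copy's field was reassigned
}
```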
Yeah, I'm not worried about #1, the size of the structure versus the pointer on the call (it is a small structure); that only becomes a concern in rare cases.
For #2, when we receive the data, the large chunk of it is already on the heap; more to the point, passing the pointer down the pipe shouldn't cause an extra allocation. Anyway, the difference here is probably marginal either way.
For #3, some processors can change the data, e.g. by adding global tags; since the data contains maps, that's not really avoidable without deep copies. I think having it as a pointer makes it clear that changes will be seen by others. I see the problem with parallel stages on the pipe modifying the data, but I don't think passing by value protects us against that.
We can always change back to a pointer if we find any problem, but this is also compatible with https://github.com/census-instrumentation/opencensus-service/blob/master/processor/processor.go, so I want to make your processor the same as that one and simply remove your own processor.
Agreed, we can change it later if we decide to do so.
- Avoid heap allocation of this object
The heap-vs-stack allocation argument from other languages doesn't carry over to Go. A value can be allocated on the heap regardless of whether it is a pointer or not; that depends only on escape analysis, which is performed at compile time. For example, given a non-pointer value used in two different contexts:
```go
package main

import "fmt"

func foo() {
	x := "abcdef"
	fmt.Println(x)
}

func bar() {
	x := "abcdef"
	println(x)
}
```

when compiled with

```
go tool compile -m main.go
```

produces

```
main.go:5:6: can inline foo
main.go:7:13: inlining call to fmt.Println
main.go:10:6: can inline bar
main.go:7:13: x escapes to heap
main.go:7:13: io.Writer(os.Stdout) escapes to heap
main.go:7:13: foo []interface {} literal does not escape
<autogenerated>:1: os.(*File).close .this does not escape
```
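In other words, the same non-pointer string escapes to the heap in `foo`, where it is passed to `fmt.Println` as an interface value, but not in `bar`, which uses the builtin `println`. The only way to safely ensure it doesn't escape is to use the `//go:noescape` pragma directive, but that's quite unnecessary and would riddle this code with too much magic.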
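Applied to this PR, the same reasoning means a `*TraceData` parameter does not by itself force a heap allocation; what matters is whether the pointer outlives the call. A minimal sketch, assuming a simplified hypothetical `TraceData` and illustrative helper names: building it with `go build -gcflags=-m` should report `moved to heap: td` only for `heapCase`, although the exact diagnostics vary between Go versions.

```go
package main

import "fmt"

// Hypothetical, simplified types; not the repository's definitions.
type Span struct{ Name string }

type TraceData struct {
	Spans []*Span
}

// countByValue and countByPointer only read their argument and do not retain it.
func countByValue(td TraceData) int { return len(td.Spans) }

func countByPointer(td *TraceData) int { return len(td.Spans) }

// kept retains a pointer beyond the call that produced it.
var kept *TraceData

func keep(td *TraceData) { kept = td }

// stackCase passes &td to a function that does not retain it, so escape
// analysis can keep td on the stack even though a pointer is used.
func stackCase() int {
	td := TraceData{Spans: []*Span{{Name: "a"}}}
	return countByPointer(&td)
}

// heapCase stores &td in a package-level variable, so td must outlive the
// call and is moved to the heap.
func heapCase() int {
	td := TraceData{Spans: []*Span{{Name: "a"}, {Name: "b"}}}
	keep(&td)
	return len(kept.Spans)
}

func main() {
	fmt.Println(countByValue(TraceData{}), stackCase(), heapCase()) // 0 1 2
}
```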
- Passing by value means that the next processor cannot modify it
Passing by value always creates a copy of the variable, so let's just be aware in the future of how much memory gets copied around. When passing by pointer, let's make sure that no processor starts modifying its arguments. I'd be shocked if any of the Go code in this repository (or in the OpenCensus repositories) modified its arguments.
I personally wasn't enthusiastic about the change to use MetricsData and TraceData as non-pointers, but I let it slide because I found that the change had already been merged; that's alright, as it was minimal.
Overall though, sure, I think this change should help unify things and should go in. We can switch to using pointers later on if needed, but I just wanted to clarify that we should be careful about assumptions on the benefits of pointers vs non-pointers, as those don't necessarily apply in Go. Anyway, since no one seems to be importing these packages, we can switch back as warranted.
Thanks @odeke-em for the explanation. I agree that we can change this, since we don't make any guarantees about interface stability for the moment.
Branch updated from 1c5e141 to d34eb7a.
@sjkaris I disagree with the statement. If you add a new field in the proto, you still need to change all the files that should use the new field; if that field is not important for all the processors, you change only the ones that need it. The only extra change we need right now, without embedding the main proto, is adding the new field to TraceData.
@bogdandrutu Hmm, if a field is added to the proto request and the proto lib is then updated, we will pass the new proto value along for free if we embed the request struct, with no code changes needed (for the OC-to-OC flow). If we don't, we have to update our data model to include the new field. On the flip side, what is the downside of using the full object rather than just its fields?
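The embedding argument can be sketched with hypothetical types: `ExportTraceServiceRequest` below is a simplified stand-in for the generated proto (fields reduced to strings), and `NewField` plays the role of a field added by a proto-lib update. Embedding promotes the new field automatically, while the explicit struct drops it until both the data model and the conversion code are updated.

```go
package main

import "fmt"

// ExportTraceServiceRequest is a hypothetical, simplified stand-in for the
// generated proto type; imagine NewField was added by a proto-lib update.
type ExportTraceServiceRequest struct {
	Node     string
	Resource string
	Spans    []string
	NewField string
}

// EmbeddedTraceData embeds the request, so every field, including ones added
// later, is promoted and carried along unchanged.
type EmbeddedTraceData struct {
	*ExportTraceServiceRequest
}

// ExplicitTraceData lists its fields explicitly; NewField is lost unless it is
// added here and to the conversion below.
type ExplicitTraceData struct {
	Node     string
	Resource string
	Spans    []string
}

func fromRequestExplicit(req *ExportTraceServiceRequest) ExplicitTraceData {
	return ExplicitTraceData{Node: req.Node, Resource: req.Resource, Spans: req.Spans}
}

func main() {
	req := &ExportTraceServiceRequest{Node: "n", Spans: []string{"s"}, NewField: "x"}
	embedded := EmbeddedTraceData{req}
	explicit := fromRequestExplicit(req)
	fmt.Println(embedded.NewField)   // x: promoted from the embedded request
	fmt.Println(len(explicit.Spans)) // 1
}
```

The trade-off raised in the reply is that the embedded type also inherits the wire format's quirks, such as fields omitted when unchanged.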
@sjkaris Who populates that field? If the proto comes from the wire, then we also need to update the OCTrace receiver, I agree. The downside is that the request proto is designed to be "network friendly": for example, the Resource is not present if it is the same as in the previous message, and things like that, which I think we already take care of in the receiver. It probably makes sense for the OCTrace receiver to populate the Resource in each span; then TraceData would contain only the Spans.
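A minimal sketch of the receiver-side normalization described here, under stated assumptions: `Request` is a hypothetical simplified wire message whose `Resource` is nil when unchanged from the previous message on the stream, and the receiver fills it in so processors always see a complete `TraceData`. For brevity the sketch attaches the Resource to the batch rather than to each span.

```go
package main

import "fmt"

// Request is a hypothetical, simplified wire message: Resource is nil when it
// is the same as in the previous message on the stream.
type Request struct {
	Resource *string
	Spans    []string
}

// TraceData is the hypothetical normalized form handed to processors.
type TraceData struct {
	Resource *string
	Spans    []string
}

// streamState remembers the last Resource seen on a stream.
type streamState struct {
	lastResource *string
}

// normalize fills in an omitted Resource from the previous message.
func (st *streamState) normalize(req Request) TraceData {
	if req.Resource != nil {
		st.lastResource = req.Resource
	}
	return TraceData{Resource: st.lastResource, Spans: req.Spans}
}

func main() {
	res := "k8s"
	var st streamState
	first := st.normalize(Request{Resource: &res, Spans: []string{"a"}})
	second := st.normalize(Request{Spans: []string{"b"}}) // Resource omitted on the wire
	fmt.Println(*first.Resource, *second.Resource)        // k8s k8s
}
```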
Branch updated from d34eb7a to 8387c2a.
Branch updated from 8387c2a to f76032b.
Ready for final review.
Codecov Report
```diff
@@           Coverage Diff            @@
##           master    #431     +/-   ##
========================================
- Coverage    63.6%   63.4%   -0.2%
========================================
  Files          41      42      +1
  Lines        3363    3375     +12
========================================
+ Hits         2139    2140      +1
- Misses       1069    1081     +12
+ Partials      155     154      -1
```
No description provided.