Skip to content
This repository has been archived by the owner on Nov 7, 2022. It is now read-only.

Update collector code to use [Trace|Metrics]Data. #431

Merged

Conversation

bogdandrutu
Copy link

@bogdandrutu bogdandrutu commented Feb 22, 2019

No description provided.

@bogdandrutu bogdandrutu requested a review from a team as a code owner February 22, 2019 02:46
@bogdandrutu bogdandrutu requested a review from songy23 February 22, 2019 02:46
@sjkaris
Copy link

sjkaris commented Feb 22, 2019

I think we should change data.TraceData to contain an ExportTraceServiceRequest instead of the objects it contains, so that we don't have to change all these files in the event that something is added to the proto request object.

@flands flands added this to the 0.2.0 milestone Feb 22, 2019
@@ -80,15 +80,11 @@ func NewJaegerThriftHTTPSender(
}

// ProcessSpans sends the received data to the configured Jaeger Thrift end-point.
func (s *JaegerThriftHTTPSender) ProcessSpans(batch *agenttracepb.ExportTraceServiceRequest, spanFormat string) (uint64, error) {
func (s *JaegerThriftHTTPSender) ProcessSpans(td data.TraceData, spanFormat string) (uint64, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q.: why passing td by value? in the bigger scheme of things this shouldn't make much difference either way but I am to know what it the reasoning behind the choice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple reasons to use value:

  1. Size is small because we embed pointers.
  2. Avoid heap allocation of this object
  3. Passing by value means that the next processor cannot modify it, for example if you have a multi processor than fans this out you don't want the first processor to modify the object.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I really see the benefit of point 3. Since we are carrying pointers to the underlying objects, if those objects are changed, those changes will be reflected in all other processors, so the first processor could effectively modify the object. td's pointers could not be changed to point at new objects i guess, but i think its very unlikely someone tries to do that instead of just modifying the objects it points to.

That said, I do not have a problem with passing td by value since its a very small object as you said, and it does avoid the heap allocation of a very pointer-filled object.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not worrying about #1 the size of the structure versus the pointer on the call (it is a small structure), this only becomes a concern on rare cases.

For #2 when we receive the large chunk of the data is on the heap but more to the point passing the pointer down the pipe shouldn't cause extra allocation, anyway, probably the difference here is marginal either way.

For #3 some processors can change the data, e.g.: adding global-tags, since the data contain maps that's not really avoidable without deep copies. I think that having it as a pointer make clear that changes will be seen by others. I see the problem with parallel stages on the pipe modifying the data but I don't think that this protects us against that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always change back to pointer if we determine any problem, but this is also compatible with the https://github.com/census-instrumentation/opencensus-service/blob/master/processor/processor.go so I want to make your processor be the same as that one and simply remove your own processor.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we can change it later if we decide to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Avoid heap allocation of this object

The heap vs stack allocation argument in other languages doesn't work for Go. A value could be allocated on the heap regardless of if it is a pointer or not, and it is only dependent upon escape analysis that can be determined at compile time. For example, given a non-pointer value used in two different contexts

package main

import "fmt"

func foo() {
	x := "abcdef"
	fmt.Println(x)
}

func bar() {
	x := "abcdef"
	println(x)
}

when compiled with

go tool compile -m main.go

produces

main.go:5:6: can inline foo
main.go:7:13: inlining call to fmt.Println
main.go:10:6: can inline bar
main.go:7:13: x escapes to heap
main.go:7:13: io.Writer(os.Stdout) escapes to heap
main.go:7:13: foo []interface {} literal does not escape
<autogenerated>:1: os.(*File).close .this does not escape

The only way to safely ensure it doesn't escape is to use the //go:noescape pragma directive but that's quite unnecessary and will riddle this code with too much magic.

  1. Passing by value means that the next processor cannot modify it

Passing by value is going to always create a copy of the variable, so let's just be ware of that for the future about how much memory is going to be copied around. I think let's ensure that no processor starts modifying its arguments when passing by pointers. I'd be shocked if any of the Go code in this repository (or OpenCensus repositories) modifies arguments.

I personally wasn't enthusiastic about the change to use MetricData and TraceData as non-pointers but I let it slide as I found the change was already merged but that's alright as it was minimal.

Overall though, sure I think this change should help unify things and should go in. We can switch to using pointers later on if need but I just wanted to clarify that we should be careful about the assumptions on benefits of pointer vs non-pointer as those don't apply necessarily in Go. Anyways since no one seems to be importing these packages, we can switch back as warranted.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @odeke-em for the explanation. I agree that we can change this since we don't make any guarantee on the interfaces stability for the moment.

@bogdandrutu
Copy link
Author

@sjkaris I disagree with the statement. If you add a new field in the proto you still need to change all the files to use the new field if necessary, if that field is not important for all the processors you change only the once that need to use the new field. The extra change that we need right now without embedding the main proto is that we have to add the new field in the TraceData.

@sjkaris
Copy link

sjkaris commented Feb 23, 2019

@bogdandrutu Hmm, if a field is added to the proto request, and then the proto-lib is updated, we will pass the new proto value along for free if we embed the request struct, with no code changes needed (for the oc to oc flow). If we don't do this, it is required that we update our data model to include this new field. On the flip, what is the downside of using the full object rather than just its fields?

@bogdandrutu
Copy link
Author

@sjkaris who is populated that field? If the proto comes from the wire then we need to also update the OCTrace receiver, I agree.

The downside is that the request proto is design to be "network friendly" for example the Resource is not present if it is the same as previous message, or things like that, which I think we already take care in the receiver. Probably it makes sense in the OCTrace receiver to actually populate the Resource in each span then TraceData will be only the Spans.

@bogdandrutu
Copy link
Author

Ready for final review.

@codecov
Copy link

codecov bot commented Feb 23, 2019

Codecov Report

Merging #431 into master will decrease coverage by 0.19%.
The diff coverage is 65.06%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #431     +/-   ##
========================================
- Coverage    63.6%   63.4%   -0.2%     
========================================
  Files          41      42      +1     
  Lines        3363    3375     +12     
========================================
+ Hits         2139    2140      +1     
- Misses       1069    1081     +12     
+ Partials      155     154      -1
Impacted Files Coverage Δ
internal/collector/processor/exporter_processor.go 0% <0%> (ø) ⬆️
processor/processortest/concurrent_sinks.go 0% <0%> (ø)
internal/collector/processor/processor_to_sink.go 0% <0%> (ø) ⬆️
internal/collector/processor/processor.go 0% <0%> (ø) ⬆️
internal/collector/processor/multi_processor.go 81.63% <100%> (ø) ⬆️
...anslator/trace/jaeger/protospan_to_jaegerthrift.go 82.08% <100%> (+0.6%) ⬆️
translator/trace/zipkin/zipkinv1_to_protospan.go 89.16% <100%> (ø) ⬆️
receiver/jaegerreceiver/trace_receiver.go 80.74% <100%> (ø) ⬆️
...lator/trace/zipkin/zipkinv1_thrift_to_protospan.go 74.32% <100%> (ø) ⬆️
...al/collector/processor/nodebatcher/node_batcher.go 78.18% <100%> (+0.45%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bad985a...75f7dbc. Read the comment docs.

@bogdandrutu bogdandrutu merged commit c54ee82 into census-instrumentation:master Feb 25, 2019
@bogdandrutu bogdandrutu deleted the updatecollector branch February 25, 2019 01:02
fivesheep pushed a commit to fivesheep/opencensus-service that referenced this pull request Jun 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants