-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tracing: Support 128bit trace id with zipkin tracer #3386
Conversation
Signed-off-by: Gary Brown <[email protected]>
Was trying to think about best way to handle migrating to the 128bit trace ids - whether a new config option should be added that was disabled until all of the proxies had been updated to the appropriate version and then coordinate the config update. However in reality the only reason 128bit ids would be used is if: So one way to control the rollout of the change is to ensure the ingress proxy is the last to be upgrade. |
@adriancole do you mind taking a quick Zipkin sanity pass on this? |
@mattklein123 While waiting for sanity check - would it be worth discussing the implications for rolling this out? As not clear what the best option might be. |
ya sorry usually I'm quicker, but holiday beckoned. taking a look now
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one bug, two adjustments, one nit
otherwise looks good!
|
||
/** | ||
* Constructor that creates a context object from the supplied trace, span and | ||
* parent ids, and sampled flag. | ||
* | ||
* @param trace_id The trace id. | ||
* @param trace_id The low 64 bits of the trace id. | ||
* @param trace_id_high The high 64 bits of the trace id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: usually we put the high bits before the low bits when ordering
ex trace_id_high then trace_id
/** | ||
* @return whether using 128 bit trace id. | ||
*/ | ||
bool is128BitTraceId() const { return trace_id_high_ > 0; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean != 0
zero is an invalid set of high bits, negative is fine
@@ -33,6 +33,7 @@ SpanPtr Tracer::startSpan(const Tracing::Config& config, const std::string& span | |||
uint64_t random_number = random_generator_.random(); | |||
span_ptr->setId(random_number); | |||
span_ptr->setTraceId(random_number); | |||
span_ptr->setTraceIdHigh(random_generator_.random()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this always uses 128-bit? Probably fine by now, but might break some folks. Usually, we make this optional setting. ex https://github.com/openzipkin/brave/blob/master/brave/src/main/java/brave/Tracing.java#L270
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS this comment is about the implications of rollout. As-is, an old downstream service might break if they aren't prepared to consume 128-bit trace IDs. usually the behavior is to restart the trace, but not everything is consistent. That said, many libraries handle big IDs by now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @mattklein123 on this one..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adriancole Originally wasn't sure whether to make it configurable, as seems like 128 is fairly standard now - but there is potential for it causing disruption in a mesh if using different versions of envoy. So I will look to make it configurable, with default of 64 bit - but atleast the newer version will be able to handle a 128bit id even if it doesn't use it for new trace instances by default.
@@ -479,7 +479,10 @@ class Span : public ZipkinBase { | |||
/** | |||
* @return the span's trace id as a hexadecimal string. | |||
*/ | |||
const std::string traceIdAsHexString() const { return Hex::uint64ToHex(trace_id_); } | |||
const std::string traceIdAsHexString() const { | |||
return (trace_id_high_.has_value() ? Hex::uint64ToHex(trace_id_high_.value()) : "") + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know cpp, but guessing the conditional would be quicker made in a way that doesn't always cat values?
ex.
return trace_id_high_.has_value()
? Hex::uint64ToHex(trace_id_high_.value()) + Hex::uint64ToHex(trace_id_)
: Hex::uint64ToHex(trace_id_);
TEST_F(ZipkinDriverTest, ZipkinSpanContextFromB3Headers128TraceIdTest) { | ||
setupValidDriver(); | ||
|
||
const std::string trace_id = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually test this.. below you aren't really testing this literally. For example, test that the high bits are in the right spot (so the input left hex characters are trace_id_high and right ones are trace_id)
What's the concern? |
Signed-off-by: Gary Brown <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good! thanks
@adriancole Thanks for the review - I'll add the configuration option as I think it will avoid any upgrade issues for users. |
…oid impacting users Signed-off-by: Gary Brown <[email protected]>
Signed-off-by: Gary Brown <[email protected]>
Signed-off-by: Gary Brown <[email protected]>
docs/root/api-v1/tracing.rst
Outdated
@@ -55,7 +55,8 @@ Zipkin driver | |||
"type": "zipkin", | |||
"config": { | |||
"collector_cluster": "...", | |||
"collector_endpoint": "..." | |||
"collector_endpoint": "...", | |||
"trace_id_128bit": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not add any new v1 config options. For v1 configs just hard code it to off during the translation? Do you mind unwinding these changes and the schema changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated - let me know if I have missed anything or misunderstood.
Signed-off-by: Gary Brown <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM, just some minor comments. Thank you!
/** | ||
* @return the sampled flag. | ||
*/ | ||
bool sampled() const { return sampled_; } | ||
|
||
private: | ||
uint64_t trace_id_high_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this var be const? Same for the others?
private: | ||
const std::string service_name_; | ||
Network::Address::InstanceConstSharedPtr address_; | ||
ReporterPtr reporter_; | ||
Runtime::RandomGenerator& random_generator_; | ||
bool trace_id_128bit_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't const due to the setter. Would you prefer this was added as a constructor parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general that would be clear if it fits in the code. Is it easy to do and get rid of the setter? Not a big deal if it's not possible.
@@ -40,6 +40,7 @@ class ZipkinCoreConstantValues { | |||
const std::string NOT_SAMPLED = "0"; | |||
|
|||
const std::string DEFAULT_COLLECTOR_ENDPOINT = "/api/v1/spans"; | |||
const bool DEFAULT_TRACE_ID_128BIT = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed (see below)
@@ -68,10 +68,14 @@ Driver::Driver(const Json::Object& config, Upstream::ClusterManager& cluster_man | |||
const std::string collector_endpoint = | |||
config.getString("collector_endpoint", ZipkinCoreConstants::get().DEFAULT_COLLECTOR_ENDPOINT); | |||
|
|||
tls_->set([this, collector_endpoint, &random_generator]( | |||
const bool trace_id_128bit = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is always going to return false. You can just remove these lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused, as when I test with the jaegertracing example updating the front-proxy yaml to include the 'trace_id_128bit: true', it uses a 128bit trace id - so this must mean that the value in the yaml is being converted to json and accessed via this condition?
If that is not how the v2 config is working, is there some docs that explain how the v2 and v1 configs are being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From further examination of the code it looks like many components have a factory that can take the v1 json or v2 proto message - however the tracer extension doesn't seem to have been updated to support the v2 proto config directly - so assume it is json conversion from the proto message? If so, then may be better to leave this in for now until proper separation between the v1 and v2 config for the tracer extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v2 json isn't exactly the same as v2 proto (for example in proto we use byte arrays for IDs)
https://github.com/openzipkin/zipkin-api/blob/master/zipkin2-api.yaml#L298
https://github.com/openzipkin/zipkin-api/blob/master/zipkin.proto
Note the proto format is quite new, so requires a very recent version of zipkin-server or clone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adriancole Sorry Adrian, I wasn't clear - I am referring to the envoy v1/v2 APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@objectiser hmm. Sorry. I guess we don't have direct proto factory functions for the tracers yet. Sorry, I thought we had already added this. This will need to be added. I will open a follow up issue for this. Yes don't worry about it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Extract trace id - which can either be 128 or 64 bit. For 128 bit, | ||
// it needs to be divided into two 64 bit numbers (high and low). | ||
if (request_headers.XB3TraceId()->value().size() == 32) { | ||
std::string tid = request_headers.XB3TraceId()->value().c_str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const for all these local vars
@@ -20,7 +20,8 @@ TEST(ZipkinTracerConfigTest, ZipkinHttpTracer) { | |||
std::string valid_config = R"EOF( | |||
{ | |||
"collector_cluster": "fake_cluster", | |||
"collector_endpoint": "/api/v1/spans" | |||
"collector_endpoint": "/api/v1/spans", | |||
"trace_id_128bit": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert, and make a new v2 style YAML test to exercise the new feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have reverted the change. May need discussion on the type of v2 test, based on the outcome of the v1/v2 config discussion above.
@@ -106,7 +106,8 @@ TEST_F(ZipkinDriverTest, InitializeDriver) { | |||
std::string valid_config = R"EOF( | |||
{ | |||
"collector_cluster": "fake_cluster", | |||
"collector_endpoint": "/api/v1/spans" | |||
"collector_endpoint": "/api/v1/spans", | |||
"trace_id_128bit": false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per other comments these will need to switch to v2 YAML tests.
Signed-off-by: Gary Brown <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, LMK if you want to change the setter thing or not. Thank you!
I can make the change but will have to do it later or tomorrow as out today. |
Signed-off-by: Gary Brown <[email protected]>
Signed-off-by: Gary Brown <[email protected]>
@mattklein123 Hopefully should be ok now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks!
Thanks a lot @objectiser and @mattklein123 for the review! |
Description:
Support 128 bit
X-B3-TraceId
. Currently if a 128bit id is provided by an external application, it results in aNullSpan
so the activity performed in the proxy is not reported to the backend tracing system. New trace instances created by the proxy will now also use a 128bit trace id.NOTE: If a service mesh includes proxies with a mix of versions, some with and without this change, an older proxy will no longer trace activity. It is the same as if (with the current version) an application outside the mesh provides a trace context using the 128bit trace id.
Risk Level: Low
Testing:
Test added to confirm support for 128bit id. The
Span
already had support for the high part of the trace id - just was never used - but tests were already included.Manual testing using the
zipkin-tracing
andjaeger-tracing
examples, supplying no trace id (so generating 128bit id in proxy), and supplying both 64bit and 128bit trace ids to check works with both.Docs Changes:
Don't believe any documentation changes are required, as the format of the trace id is not discussed (I believe).
Release Notes:
TO BE DONE - once it is decided how best to deal with the impact.
May address issue #3361
Signed-off-by: Gary Brown [email protected]