Added Python3 support #43

xdralex · 2017-05-02T21:46:20Z

Preamble

I tried to use the Jaeger client from a Python 3 application, which didn't work because of incompatibilities, and switching back to Py2 was not really an option.

This PR ensures that all tests pass in Python 2.7, and that all but two (described below) pass in Python 3.6, when running the make bootstrap and make test goals. The sample code below, which sends spans to a Jaeger server, seems to work fine as well.

import time

from jaeger_client import Config
config = Config(config={'sampler': {'type': 'const', 'param': True}}, service_name='hello')
tracer = config.initialize_tracer()
for n in range(0, 10):
    with tracer.start_span('why') as span:
        with tracer.start_span('is', child_of=span):
            time.sleep(3)
        with tracer.start_span('my', child_of=span):
            time.sleep(4)
        time.sleep(1)
        span.log_event('wifi', payload={'not working': 42})
    print('YEAH')
tracer.close()

Summary of the changes

The Thrift version in the Makefile was changed to 0.10.0 and the thrift_gen files were regenerated with the make thrift goal. This should make the generated Thrift files Python 3 compatible.
Files in the crossdock, crossdock/server, jaeger_client, and tests directories were updated using the futurize script, which comes from the Python future library. The script took care of syntax incompatibilities such as 0L number literals, dict.iteritems(), integer division, and so on.
```
futurize --stage1 -w *.py
futurize --stage2 -w *.py
```

In Python 2, string literals defined as 'xyz' are pretty much equivalent to bytes and can be alternatively written as b'xyz'. In Python 3, 'xyz' is a unicode string, and to make it bytes a b prefix is mandatory: b'xyz'. As far as I understand, it's desired to support unicode strings on the client side, but BinaryAnnotation supports only bytes.

Moreover, in Python 3 certain tests were failing with (iter13 is a BinaryAnnotation):

tests/test_thrift.py:107: in test_large_ids
    serialize(trace_id)
tests/test_thrift.py:104: in serialize
    _marshall_span(span)
tests/test_thrift.py:90: in _marshall_span
    args.write(prot)
jaeger_client/thrift_gen/agent/Agent.py:160: in write
    iter6.write(oprot)
jaeger_client/thrift_gen/zipkincore/ttypes.py:530: in write
    iter13.write(oprot)
jaeger_client/thrift_gen/zipkincore/ttypes.py:326: in write
    oprot.writeBinary(self.value)
env3/lib/python3.6/site-packages/thrift/protocol/TCompactProtocol.py:42: in nested
    return func(self, *args, **kwargs)
env3/lib/python3.6/site-packages/thrift/protocol/TCompactProtocol.py:272: in __writeBinary
    self.trans.write(s)
env3/lib/python3.6/site-packages/thrift/transport/TTransport.py:231: in write
    self._buffer.write(buf)
E   TypeError: a bytes-like object is required, not 'str'

To fix that, methods like thrift.make_string_tag were changed to accept unicode strings (in Python 3) and 8-bit strings (in Python 2) and convert them into a UTF-8 bytes representation suitable for BinaryAnnotation.
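A minimal sketch of the kind of conversion described above. The helper name to_binary is hypothetical and only for illustration; it is not necessarily what the PR's code looks like. It assumes text should be encoded as UTF-8 and that values which are already bytes should pass through untouched.

```python
def to_binary(value):
    """Return a bytes value suitable for a binary Thrift field.

    Hypothetical helper, for illustration only.
    """
    if isinstance(value, bytes):
        # Python 2 str and Python 3 bytes are already 8-bit strings.
        return value
    # Python 3 str (or Python 2 unicode): encode as UTF-8.
    return value.encode('utf-8')


print(to_binary('hello'))
print(to_binary(u'привет'))
```

Checking for bytes first keeps the helper safe to call twice on the same value.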

I have checked utf8 strings in spans from Python 3, and they seem to be working fine with the Jaeger server:

config = Config(config={'sampler': {'type': 'const', 'param': True}}, service_name='hello привет')
...
    with tracer.start_span('why почему') as span:

What's missing

There are still two tests failing in Python 3:

For the first one, it seems that opentracing_instrumentation package itself is not Python 3 compatible.

tests/test_crossdock.py:29: in <module>
    from crossdock.server import server
crossdock/server/server.py:15: in <module>
    from opentracing_instrumentation.client_hooks import tornado_http
env3/lib/python3.6/site-packages/opentracing_instrumentation/client_hooks/tornado_http.py:25: in <module>
    import urlparse
E   ModuleNotFoundError: No module named 'urlparse'

Regarding the second, I'm not entirely sure what to do – seems like MagicMock doesn't overload comparison operations correctly in Python 3.

tests/test_sampler.py:340: in test_remotely_controlled_sampler
    sampler._delayed_polling()
jaeger_client/sampler.py:400: in _delayed_polling
    periodic.start()  # start the periodic cycle
env3/lib/python3.6/site-packages/tornado/ioloop.py:1006: in start
    self._schedule_next()
env3/lib/python3.6/site-packages/tornado/ioloop.py:1036: in _schedule_next
    if self._next_timeout <= current_time:
E   TypeError: '<=' not supported between instances of 'MagicMock' and 'MagicMock'
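For reference, a small Python 3 sketch of the behaviour behind this error and two possible workarounds. It is not taken from the test suite; the variable names are made up for illustration. By default the ordering methods of a MagicMock return NotImplemented, so the test would either have to configure the comparison or have the mocked clock return real numbers.

```python
from unittest.mock import MagicMock  # on Python 2 this lives in the `mock` package

a, b = MagicMock(), MagicMock()

try:
    a <= b
    ordering_supported = True
except TypeError:
    # Python 3: __le__ on a MagicMock returns NotImplemented by default.
    ordering_supported = False

# Option 1: configure the comparison explicitly on the mock.
a.__le__.return_value = True
configured_result = a <= b

# Option 2: make the mocked clock return real numbers instead of mocks.
fake_time = MagicMock(return_value=100.0)
next_timeout = 99.0
numeric_result = next_timeout <= fake_time()
```

Option 2 is probably closer to what tornado expects, since the timeout arithmetic then runs on plain floats.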

Curious to hear your thoughts!

CLAassistant · 2017-05-02T21:46:30Z

All committers have signed the CLA.

coveralls · 2017-05-02T22:08:23Z

Coverage increased (+0.2%) to 94.565% when pulling cfe390c on stitchfix:python3 into 46bfb9e on uber:master.

yurishkuro

this is looking like a scary change to me. There are some perf concerns, and compatibility concerns, and given our Python footprint we cannot afford releasing a breaking change to this library.

re opentracing_instrumentation, I think it would also need to be upgraded to 3.x before we can merge this.

yurishkuro · 2017-05-03T01:26:10Z

crossdock/server/endtoend.py

@@ -99,7 +101,7 @@ def generate_traces(self, request, response_writer):
        tracer = self.tracers[sampler_type]
        for _ in range(req.get('count', 0)):
            span = tracer.start_span(req['operation'])
-            for k, v in req.get('tags', {}).iteritems():
+            for k, v in req.get('tags', {}).items():


this is not an equivalent change. iteritems() in 2.7 returns an iterator, while items() creates a full copy.

Yeah, I agree. I was assuming that there would not be many tags per span and that spans would be sampled (i.e. 1 span per 100 or 1000 operation runs), so performance was not a huge concern from my point of view.

To keep performance on the same level, something like this helper function should probably work better:

import sys

def iteritems(d):
    if sys.version_info[0] == 2:
        return d.iteritems()
    else:
        return iter(d.items())

Well, it's already a part of the six package – I'll go ahead and change that.

Ok addressed this in 2612c73
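As a side note on why items() is only a concern on 2.7: on Python 3 it returns a view over the dict rather than a copied list. A short sketch (Python 3 only, names are illustrative):

```python
tags = {'a': 1}

# On Python 3, items() returns a view object, not a list snapshot.
view = tags.items()
tags['b'] = 2

# The view reflects the later insertion, so nothing was copied up front.
seen = sorted(view)
is_list = isinstance(view, list)
print(seen, is_list)
```

On Python 2.7 the same code would produce a list that does not contain the later insertion, which is the allocation the review comment is about.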

yurishkuro · 2017-05-03T01:26:37Z

crossdock/server/serializer.py

@@ -93,7 +94,7 @@ def traced_service_object_to_json(obj):


 def set_traced_service_object_values(obj, values, downstream_func):
-    for k in values.iterkeys():
+    for k in values.keys():


I assume similar issue here, unnecessary allocation in 2.7

yurishkuro · 2017-05-03T01:30:46Z

jaeger_client/codecs.py

-        parent_id = long(parts[2], 16)
+        trace_id = int(parts[0], 16)
+        span_id = int(parts[1], 16)
+        parent_id = int(parts[2], 16)


why is int equivalent to long? isn't it dependent on the architecture?

As far as I understand int and long types were pretty much unified as of Python 2.4: https://www.python.org/dev/peps/pep-0237/. In Python 3 any further distinction was erased: there is no long() function and no L postfix for long integer types.

Here is an output from my 2.7 console:

Python 2.7.12 (default, Oct 11 2016, 05:20:59)
>>> int("1000000000000000000", 16)
4722366482869645213696L
>>> long("1000000000000000000", 16)
4722366482869645213696L
>>> long("10", 16)
16L
>>> int("10", 16)
16

This also means that replacing _max_unsigned_id = (1L << 64) with _max_unsigned_id = (1 << 64) somewhere else in this PR is also probably safe:

Python 2.7.12 (default, Oct 11 2016, 05:20:59)
>>> 1L << 64
18446744073709551616L
>>> 1 << 64
18446744073709551616L
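The same point as a small runnable sketch. The parse_ids function is hypothetical and only mirrors the int(parts[n], 16) calls from the diff; it assumes a colon-separated header value of hex fields and is not the codec's actual implementation.

```python
_max_unsigned_id = (1 << 64)  # no L suffix needed; int widens automatically


def parse_ids(header_value):
    """Hypothetical parser: split 'trace:span:parent:flags' hex fields."""
    parts = header_value.split(':')
    trace_id = int(parts[0], 16)
    span_id = int(parts[1], 16)
    parent_id = int(parts[2], 16)
    for value in (trace_id, span_id, parent_id):
        # Reject anything that does not fit in an unsigned 64-bit integer.
        if not 0 <= value < _max_unsigned_id:
            raise ValueError('id out of 64-bit range: %r' % header_value)
    return trace_id, span_id, parent_id


ids = parse_ids('ffffffffffffffff:1:0:1')
print(ids)
```

The code behaves the same on 2.7 and 3.x because int() promotes to an arbitrary-precision integer whenever the value needs it.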

yurishkuro · 2017-05-03T01:32:20Z

jaeger_client/constants.py

@@ -33,10 +33,10 @@
 DEFAULT_FLUSH_INTERVAL = 1

 # Name of the HTTP header used to encode trace ID
-TRACE_ID_HEADER = b'uber-trace-id'


I remember we explicitly ran into an issue with this string being Unicode in some instrumentation of urllib2. Why can we not keep this as b?

I believe in Python 2.7 string and bytes types are essentially equivalent:

Python 2.7.12 (default, Oct 11 2016, 05:20:59)
>>> a = 'helloпривет'
>>> b = b'helloпривет'
>>> a
'hello\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
>>> b
'hello\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
>>> type(a)
<type 'str'>
>>> type(b)
<type 'str'>

They are different in Python 3 though:

>>> a='helloпривет'
>>> b=b'helloпривет'
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
>>> b=b'hello'
>>> a
'helloпривет'
>>> b
b'hello'
>>> type(a)
<class 'str'>
>>> type(b)
<class 'bytes'>

So, with this string being explicitly marked as bytes, I had the following error when running tests in Python3:

tests/test_tracer.py:178: in test_tracer_tags_hostname
    t = Tracer(service_name='x', reporter=reporter, sampler=sampler)
jaeger_client/tracer.py:62: in __init__
    debug_id_header=debug_id_header,
jaeger_client/codecs.py:56: in __init__
    self.trace_id_header = trace_id_header.lower().replace('_', '-')
E   TypeError: a bytes-like object is required, not 'str'

Which makes sense, because '_' is a string object, not bytes, in Python 3. Changing trace_id_header.lower().replace('_', '-') to trace_id_header.lower().replace(b'_', b'-') fixed this test but broke a few others that used a trace ID header string without the b prefix:

def test_context_from_readable_headers(self):
    # provide headers all the way through Config object
    config = Config(
        service_name='test',
        config={
            'trace_id_header': 'Trace_ID',
            'baggage_header_prefix': 'Trace-Attr-',
        })
    tracer = config.create_tracer(
...
tests/test_codecs.py:170: in test_context_from_readable_headers
    sampler=ConstSampler(True),
jaeger_client/config.py:279: in create_tracer
    debug_id_header=self.debug_id_header,
jaeger_client/tracer.py:62: in __init__
    debug_id_header=debug_id_header,
jaeger_client/codecs.py:56: in __init__
    self.trace_id_header = trace_id_header.lower().replace(b'_', b'-')
E   TypeError: replace() argument 1 must be str, not bytes

So, weighing the options of changing every affected test versus changing just the single line TRACE_ID_HEADER = b'uber-trace-id', I chose the one with the fewest changes.

As for the urllib2 issue, I'm not sure what could have caused it, given that bytes and strings are the same in Py2. Maybe you have an example?
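A third option, shown here only as a sketch, would be to make the normalization tolerate both types. The helper name normalize_header_name is hypothetical and not part of the PR.

```python
def normalize_header_name(name):
    """Hypothetical helper: lower-case a header name and use dashes."""
    if isinstance(name, bytes) and not isinstance(name, str):
        # Python 3 bytes: pass bytes arguments so replace() does not raise.
        return name.lower().replace(b'_', b'-')
    # Python 3 str, or Python 2 str (where bytes is the same type as str).
    return name.lower().replace('_', '-')


print(normalize_header_name('Trace_ID'))
print(normalize_header_name(b'Trace_ID'))
```

The return type follows the input type, so callers that compare header names would still need to agree on one of the two.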

I found these comments in the commit when we changed headers to b:

Force plain (non-unicode) strings
Summary:
image upload was going through multipart form submission code path previously untested with Jaeger, and was failing with 'utf8' codec can't decode byte 0xff in position 152: invalid start byte. Turns out it was due to httplib getting confused on unicode strings used as Jaeger headers. This change forces those headers to be plain strings.

And this was the error stack trace

File "opentracing_instrumentation/client_hooks/urllib2.py", line 97, in https_open
    return self.do_open(req, httplib.HTTPSConnection)
File "opentracing_instrumentation/client_hooks/urllib2.py", line 54, in do_open
    resp = urllib2.AbstractHTTPHandler.do_open(self, conn, req)
File "python2.7/urllib2.py", line 1174, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
File "python2.7/httplib.py", line 966, in request
    self._send_request(method, url, body, headers)
File "python2.7/httplib.py", line 1000, in _send_request
    self.endheaders(body)
File "python2.7/httplib.py", line 962, in endheaders
    self._send_output(message_body)
File "python2.7/httplib.py", line 820, in _send_output
    msg += message_body

The += is what was failing when the headers were defined as unicode strings.

I guess I should've written a test for that (facepalm)

@yurishkuro Seems like I can't reproduce the error, so a test case would be beneficial...

I'm also not sure how switching 'uber-trace-id' to b'uber-trace-id' alone would help; to my knowledge these two should be equivalent in Python 2.

Here is what I've tried to do (with some variations):

# coding=utf-8
import urllib2

headers = {
    'User-Agent': 'Mozilla/5.0',
    'header_хидер': (u'value_значение').encode('utf-8'),
    'header_klüft_skräms_große': (u'À quelle fréquence envoyez-vous des données étranges?').encode('utf-8')
}
body = 'uber-trace-id'

r = urllib2.Request('http://localhost:11111', data=body, headers=headers)
response = urllib2.urlopen(r)
data = response.read()

print(response)
print('\n\n----------------------\n\n')
print(data)

So here's the scenario in Python 2.7, which I believe reflects what was happening:

>>> x=b'x'
>>> x
'x'
>>> y=u'y'
>>> y
u'y'
>>> b=bytes(chr(255))
>>> b
'\xff'
>>> x+b
'x\xff'
>>> y+b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

The code in httplib.py looks like this

def _send_output(self, message_body=None):
    . . .
    if isinstance(message_body, str):
        msg += message_body

Now, I don't know why message_body contained a byte sequence that wasn't valid Unicode, but it did happen in production code (as I mentioned, in the image upload, fwiw). It's my understanding that normally the HTTP request's buffer was composed of non-Unicode strings, which the example shows to be OK when concatenating a non-Unicode sequence (x+b). However, when tracing headers were added and those headers were defined without the b prefix, they were automatically converted into u strings. When they were appended to the buffer, they turned the buffer into u, e.g.

>>> x+y
u'xy'

And then later when the body with invalid seq is appended, Python tries to parse it as Unicode and blows up.

For this reason I had to declare the headers as b, non-Unicode strings. It may not be an issue in Python3, but will definitely be an issue in Python 2.7. So perhaps these assignments can be made conditional using six.PY3
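A sketch of what such a conditional assignment might look like. To stay free of third-party imports it derives PY3 from sys.version_info; six.PY3 is the equivalent constant. This is only an illustration of the idea, not the change that was eventually made.

```python
import sys

PY3 = sys.version_info[0] == 3  # six.PY3 is the equivalent constant

if PY3:
    # Python 3: header names handled as text.
    TRACE_ID_HEADER = 'uber-trace-id'
else:
    # Python 2: force a plain (non-unicode) str so httplib buffers stay bytes.
    TRACE_ID_HEADER = b'uber-trace-id'

print(type(TRACE_ID_HEADER))
```

Either way the constant ends up as the native str type of the running interpreter.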

Trying to fix this in #109

yurishkuro · 2017-05-03T01:37:28Z

setup.py

@@ -33,12 +33,13 @@
        'License :: OSI Approved :: MIT License',
        'Natural Language :: English',
        'Programming Language :: Python :: 2.7',
+        'Programming Language :: Python :: 3.6'


should also add to the matrix in .travis.yml

yurishkuro · 2017-05-03T01:38:38Z

setup.py

        'futures',
        'threadloop>=1,<2',
-        # we want thrift>=0.9.2.post1,<0.9.3, but we let the users pin to that
-        'thrift',
+        'thrift>=0.10.0',


many apps at uber are using 0.9.2 or 0.9.3, pinning the dependency like this is going to break them. Can we just document that with python 3 people should use >=0.10?

I am also not sure how the generated files will work with thrift < 0.10.

Internally we use tox to run unit tests with several versions of dependencies, e.g. tornado. I would like to do that here with thrift

It seems that generated files require thrift==0.10.0 to work correctly. With forcefully installed thrift==0.9.3, I'm receiving ImportError: cannot import name 'TFrozenDict' exception in jaeger_client/thrift_gen/zipkincore/ZipkinCollector.py, which makes sense: TFrozenDict appears only in 0.10.0 version.

Given this, I'm not sure how to go with Python 3 support:

stubs generated by thrift 0.9 don't support Python 3

stubs generated by thrift 0.10 do support Python 3, but they require thrift 0.10 package

Maybe there should be two sets of stubs generated (thrift_gen_0_10_0 and thrift_gen_0_9_3) and the code would switch between them with:

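import sys
from importlib import import_module

if sys.version_info[0] == 2:  # or should it be a check for thrift version?
    from jaeger_client import thrift_gen_0_9_3 as thrift_gen
else:
    from jaeger_client import thrift_gen_0_10_0 as thrift_gen

# a plain "import thrift_gen.zipkincore..." would not see the alias above
zipkin_collector = import_module(thrift_gen.__name__ + '.zipkincore.ZipkinCollector')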

@yurishkuro what do you think?

coveralls · 2017-05-05T20:41:05Z

Coverage increased (+0.2%) to 94.587% when pulling 2612c73 on stitchfix:python3 into 46bfb9e on uber:master.

yurishkuro · 2017-05-05T21:05:40Z

crossdock/setup_crossdock.py

@@ -20,6 +20,7 @@
        ]
    },
    install_requires=[
-        # all dependencies are included in tchannel already
+        # most of dependencies are included in tchannel already
+        'six'


should not be needed since it's already imported in the main setup.py

Got it, will fix.
I guess I just don't entirely understand what this crossdock thing is all about :)

coveralls · 2017-05-05T21:18:34Z

Coverage increased (+0.2%) to 94.587% when pulling af5b703 on stitchfix:python3 into 46bfb9e on uber:master.

yurishkuro · 2017-05-13T21:27:04Z

@xdralex sorry for the delay, I was at OSCON this week. Will review this next week. We need to do some testing internally to make sure there is no regression in 2.7.

xdralex · 2017-05-30T17:41:10Z

@yurishkuro just curious if you have had a chance to take another look :)
I can try to update opentracing_instrumentation as well if it helps...

yurishkuro · 2017-05-30T23:14:49Z

Hi, sorry I was traveling again last week, will try to get to it this week.

yurishkuro · 2017-06-15T23:14:50Z

jaeger_client/sampler.py

@@ -45,7 +49,7 @@
 SAMPLER_TYPE_TAG_KEY = 'sampler.type'
 SAMPLER_PARAM_TAG_KEY = 'sampler.param'
 DEFAULT_SAMPLING_PROBABILITY = 0.001
-DEFAULT_LOWER_BOUND = 1.0 / (10.0 * 60.0)  # sample once every 10 minutes
+DEFAULT_LOWER_BOUND = old_div(1.0, (10.0 * 60.0))  # sample once every 10 minutes


What's the reason for this change? Doesn't Python3 work with 1.0 / (10.0 * 60.0)?

I'd like to have comments in the code for non-trivial decisions.
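For context, a sketch of why the wrapper makes no difference here. The function old_div_like below is a stand-in written for illustration that roughly mimics what the future library's old_div does (floor division only when both operands are integers); it is not the library's code.

```python
import numbers


def old_div_like(a, b):
    """Stand-in approximating Python 2 `/`: floor division only for two ints."""
    if isinstance(a, numbers.Integral) and isinstance(b, numbers.Integral):
        return a // b
    return a / b


# With float operands the plain expression and the wrapped one agree.
plain = 1.0 / (10.0 * 60.0)
wrapped = old_div_like(1.0, (10.0 * 60.0))
print(plain == wrapped)
```

The wrapper only changes results for integer operands, so a float expression like this one can keep the plain `/` on both Python 2 and 3.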

yurishkuro · 2017-06-15T23:17:17Z

jaeger_client/thrift.py

+
+
+def str_to_binary(value):
+    return value if sys.version_info[0] == 2 else value.encode('utf-8')


if this is checking the python version, I would rather use six.PY2 constant.

yurishkuro · 2017-06-15T23:23:28Z

setup.py

        'threadloop>=1,<2',
-        # we want thrift>=0.9.2.post1,<0.9.3, but we let the users pin to that
-        'thrift',
+        'thrift>=0.10.0',


we need to keep the import without pinning to higher version, as it will make the library incompatible with many existing installations that may be using thrift <0.10. What's the reason to require 0.10, because PY3 doesn't work with 0.9? Is there a way to make this dependency sensitive to the Python version, i.e. only require 0.10 on Py3?
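One possible way to do this is with environment markers in the requirement strings, sketched below. This assumes a setuptools version recent enough to understand markers in install_requires, and it does not address the separate question of which generated stubs to ship.

```python
# Sketch of an install_requires list using PEP 508 environment markers.
install_requires = [
    'futures',
    'threadloop>=1,<2',
    # Python 2 installations keep the unpinned requirement...
    'thrift; python_version < "3"',
    # ...while Python 3 asks for a release with Python 3 support.
    'thrift>=0.10.0; python_version >= "3"',
]

print(install_requires)
```

With this layout existing Python 2.7 users would keep whatever thrift version they already pin.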

yurishkuro · 2017-06-15T23:52:41Z

I think this change is too large for a single commit. What if we do this upgrade in stages? First I would run the futurize only and do a PR, without introducing Py3. Then some other (manual) changes. Finally, we need to figure out what to do about the Thrift upgrade.

I've also done some simple benchmarks about iteritems using six. The mean when using six is only 2% higher.

--------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------
Name (time in ns)         Min                    Max                  Mean                StdDev                Median                 IQR            Outliers(*)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_py2_iteritems     0.0000 (1.0)      37,670.1355 (1.0)      3,686.1449 (1.0)      1,045.8132 (1.15)     3,337.8601 (1.0)      317.8914 (1.0)       5976;12658   99865           3
test_six_iteritems     0.0000 (1.0)      40,690.1042 (1.08)     3,762.2817 (1.02)       908.0106 (1.0)      3,655.7515 (1.10)     317.8914 (1.0)       6834;10482   89241           3
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

yurishkuro · 2018-04-16T16:36:18Z

Py3 compatibility has been achieved with other PRs.

Alexander added 4 commits May 1, 2017 12:01

Switched Thrift version to 0.10.0 which supports Python 3

771bcb0

Applied futurize script from futures

1373dcd

Fixed most tests

05bcb4a

setup.py housekeeping

cfe390c

xdralex mentioned this pull request May 2, 2017

SyntaxError in Python 3 #41

Closed

yurishkuro reviewed May 3, 2017

Replaced Py3-style dict.items() with six.iteritems(dict)

2612c73

six.iteritems(dict) behaves like dict.iteritems() in Python 2 and like iter(dict.items()) in Python 3. This should keep performance on the same level for both Py2/3 environments.

yurishkuro reviewed May 5, 2017

Fixed setup_crossdock.py

af5b703

Updated travis.yml

d1740b0

xdralex mentioned this pull request Jun 15, 2017

SURVEY: Who is using Jaeger jaegertracing/jaeger#207

Open

yurishkuro reviewed Jun 15, 2017

MrSaints mentioned this pull request Jul 8, 2017

Futurize for Py3 #57

Merged

yurishkuro mentioned this pull request Jul 9, 2017

Support Python 3.x #59

Closed

yurishkuro closed this Apr 16, 2018
