Properly handle unicode and byte streams with pantsd for Python 3 #7130

Merged
merged 22 commits into from
Jan 26, 2019

Conversation

Eric-Arellano
Contributor

@Eric-Arellano Eric-Arellano commented Jan 23, 2019

Problem

Any test with @ensure_daemon fails when run with ./pants3. This is due to unicode issues.

For example, ./pants3 test tests/python/pants_test/base:exiter_integration will fail with:

Exception caught: (builtins.TypeError)
  File "/home/eric/pants/src/python/pants/bin/pants_loader.py", line 89, in <module>
    main()
  File "/home/eric/pants/src/python/pants/bin/pants_loader.py", line 85, in main
    PantsLoader.run()
  File "/home/eric/pants/src/python/pants/bin/pants_loader.py", line 81, in run
    cls.load_and_execute(entrypoint)
  File "/home/eric/pants/src/python/pants/bin/pants_loader.py", line 74, in load_and_execute
    entrypoint_main()
  File "/home/eric/pants/src/python/pants/bin/pants_exe.py", line 39, in main
    PantsRunner(exiter, start_time=start_time).run()
  File "/home/eric/pants/src/python/pants/bin/pants_runner.py", line 48, in run
    return RemotePantsRunner(self._exiter, self._args, self._env, options_bootstrapper).run()
  File "/home/eric/pants/src/python/pants/bin/remote_pants_runner.py", line 190, in run
    self._run_pants_with_retry(pantsd_handle)
  File "/home/eric/pants/src/python/pants/bin/remote_pants_runner.py", line 114, in _run_pants_with_retry
    return self._connect_and_execute(pantsd_handle.port)
  File "/home/eric/pants/src/python/pants/bin/remote_pants_runner.py", line 155, in _connect_and_execute
    result = client.execute(self.PANTS_COMMAND, *self._args, **modified_env)
  File "/home/eric/pants/src/python/pants/java/nailgun_client.py", line 269, in execute
    return self._session.execute(cwd, main_class, *args, **environment)
  File "/home/eric/pants/src/python/pants/java/nailgun_client.py", line 109, in execute
    return self._process_session()
  File "/home/eric/pants/src/python/pants/java/nailgun_client.py", line 80, in _process_session
    self._write_flush(self._stdout, payload)
  File "/home/eric/pants/src/python/pants/java/nailgun_client.py", line 63, in _write_flush
    fd.write(payload)

Exception message: write() argument must be str, not bytes

Solution

Use sys.std{out,err}.buffer with Py3.
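As a rough illustration of the idea (not the actual Pants code), the sketch below picks the bytes interface of a stream when one exists. The helper names `binary_stream` and `write_flush` are hypothetical, and an in-memory `io.TextIOWrapper` stands in for `sys.stdout` so the example can run anywhere.

```python
import io


def binary_stream(stream):
    """Return the bytes interface of `stream`, if it has one.

    On Python 3, sys.stdout and sys.stderr are text streams whose `.buffer`
    attribute accepts bytes. On Python 2 they accept bytes directly, so the
    stream itself is returned. (Hypothetical helper, for illustration only.)
    """
    return getattr(stream, "buffer", stream)


def write_flush(stream, payload):
    """Write a bytes payload to `stream` and flush it."""
    out = binary_stream(stream)
    out.write(payload)
    out.flush()


# An in-memory text stream standing in for sys.stdout.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="utf-8")

# Writing bytes to fake_stdout directly would raise TypeError on Python 3;
# going through `.buffer` avoids that.
write_flush(fake_stdout, b"hello from pantsd\n")
captured = raw.getvalue()
```

With the real streams, the call would look like `write_flush(sys.stdout, payload)`.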

We reaffirmed in #7073 a prior decision that the Exiter-related code should use a bytes interface. However, we did not fix the pantsd-related code at the time because doing so was causing regressions. We now fix these usages.

Note that in pants_daemon.py, we override sys.stdout to our own custom _LoggerStream object. To ensure Python 3 support, we add a buffer property.
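A minimal sketch of what such a stream could look like, assuming `buffer` simply returns the stream itself and that bytes are decoded as UTF-8 before logging. The class name `LoggerStream` and the demo handler are hypothetical; the real `_LoggerStream` in pants_daemon.py may differ.

```python
import logging


class LoggerStream(object):
    """Hypothetical sys.std{out,err} replacement that pipes output to a logger."""

    def __init__(self, logger, log_level):
        self._logger = logger
        self._log_level = log_level

    def write(self, msg):
        # logging.Logger expects text, so decode bytes before logging.
        # UTF-8 is an assumption made for this sketch.
        if isinstance(msg, bytes):
            msg = msg.decode("utf-8", errors="replace")
        for line in msg.rstrip().splitlines():
            self._logger.log(self._log_level, line.rstrip())

    def flush(self):
        # Nothing is buffered in this object, so there is nothing to flush.
        return

    @property
    def buffer(self):
        # Callers that expect a bytes interface (sys.stdout.buffer on
        # Python 3) get this same object; write() handles the decoding.
        return self


# Demo: collect log messages in a list instead of printing them.
records = []


class _ListHandler(logging.Handler):
    def emit(self, record):
        records.append(record.getMessage())


demo_logger = logging.getLogger("logger_stream_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.addHandler(_ListHandler())

stream = LoggerStream(demo_logger, logging.INFO)
stream.write("text line\n")
stream.buffer.write(b"bytes line\n")
```

Both calls end up as text log records, which is the trade-off discussed in the review comments: the `buffer` of this stream is not a true bytes sink.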

Issues running on macOS

tests/python/pants_test/base:exiter_integration will still fail on macOS, due to an upstream macOS library that is not fork-safe (see https://bugs.python.org/issue28342). However, this bug also affects Python 2.

We use bytes for the local pants runner, so we should also do so with pantsd.
In Py2, we had to add code to decode all files from watchman's `event["files"]` (see pantsbuild#3951). This was necessary to fix a unicode bug.

However, in Py3, it appears that this value is already unicode, as we get an error saying `str` does not have an attribute `decode()`.

So, we conditionally check if we need to decode or not based off of the interpreter.
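The interpreter check could be sketched roughly as follows. The helper name `decode_watchman_files`, the UTF-8 encoding, and the sample event are assumptions made for illustration, not the code from this PR.

```python
import sys

PY3 = sys.version_info[0] >= 3


def decode_watchman_files(files):
    """Return the file names from a watchman event as unicode text.

    Hypothetical helper: assumes Python 2 delivers byte strings and
    Python 3 already delivers `str`, as described above.
    """
    if PY3:
        # Already unicode; calling .decode() here would raise AttributeError.
        return list(files)
    return [f.decode("utf-8") for f in files]


# Made-up event payload for demonstration.
event = {"files": ["src/python/pants/BUILD", "3rdparty/python/requirements.txt"]}
files = decode_watchman_files(event["files"])
```

An alternative would be to check each entry with `isinstance(f, bytes)` rather than branching on the interpreter, which would not depend on what the watchman client returns under each version.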
This was causing an issue with daemon_pants_runner when it tried to use sys.stdout.buffer. Now the two correspond.

Note this could be completely wrong.
…ts w/ py3 constraints

Curious to see if @pants_daemon is still failing
It looks like this PR will fix a couple failing targets! (Which I wasn't expecting because they timeout locally...)

Multiple targets are still failing for other reasons, which we leave off.
@Eric-Arellano
Contributor Author

I've fixed the unicode issues, but it leads to a new issue. Any test with @ensure_daemon will always time out. When removing the timeout logic from process_manager.py, we got no output, suggesting the process never terminates.

To reproduce, pull this PR and run ./pants3 test tests/python/pants_test/base:exiter_integration. (Be sure to run ./pants3 clean-all first).

I haven't yet been able to find what might be causing this. Note that the daemon does work as intended with Python 2.

@OniOni
Contributor

OniOni commented Jan 24, 2019

Going to take a stab at this one.

@Eric-Arellano Eric-Arellano requested review from stuhood, illicitonion, jsirois and cosmicexplorer and removed request for stuhood and illicitonion January 25, 2019 21:37
@Eric-Arellano Eric-Arellano changed the title WIP: properly handle unicode and bytes streams with pantsd Properly handle unicode and byte streams with pantsd Jan 25, 2019
@Eric-Arellano
Contributor Author

Question for reviewers: can you please pull down this PR and run the command ./pants3 clean-all test tests/python/pants_test/engine/legacy:owners_integration - do the tests finish for you, or do you get a timeout error?

On macOS with Python 3.7, the third test never terminates for me, even when removing the timeout functionality with:

diff --git a/src/python/pants/pantsd/process_manager.py b/src/python/pants/pantsd/process_manager.py
index e42cbfa6e..e01629420 100644
--- a/src/python/pants/pantsd/process_manager.py
+++ b/src/python/pants/pantsd/process_manager.py
@@ -131,8 +131,8 @@ class ProcessMetadataManager(object):
         return True

       now = time.time()
-      if now > deadline:
-        raise cls.Timeout('exceeded timeout of {} seconds while waiting for {}'.format(timeout, action_msg))
+      # if now > deadline:
+      #   raise cls.Timeout('exceeded timeout of {} seconds while waiting for {}'.format(timeout, action_msg))

       if now > info_deadline:
         logger.info('waiting for {}...'.format(action_msg))

However, Mathieu cannot reproduce this with 3.6 on either macOS or Ubuntu. I also can't reproduce it with 3.6 on Ubuntu, and CI (which runs 3.6 on Ubuntu) passes. I'm trying to figure out whether this is just weirdness with my Mac's setup or a 3.7 issue.

@Eric-Arellano Eric-Arellano changed the title Properly handle unicode and byte streams with pantsd Properly handle unicode and byte streams with pantsd for Python 3 Jan 25, 2019
@Eric-Arellano
Contributor Author

Update: we discovered the timeout issue is indeed a problem with Python 3.7... Opened #7160 to track. No need to run the tests locally.

I think we should not let this Python 3.7 issue block this PR. It does fix 3.6 behavior, which is the primary interpreter we're targeting, and the fixes are also likely useful for 3.7. Before we announce "Full Python 3 support", I think we need to close #7160, but this gets us closer in the meantime to "Experimental Python 3 support".

Contrib tests were failing because they did not include the recent commit that modifies contrib/go.
Member

@stuhood stuhood left a comment

Thanks a lot for digging in here!

Contributor

@cosmicexplorer cosmicexplorer left a comment

The implementation of .buffer that slyly changes things to unicode feels surprising to me -- what if we wanted to write something with an unknown encoding to the logger stream? Making a subclass as suggested feels easy. I'm warming up to the thought, but making sys.stdout.buffer (the way this stream is accessed) behave the same as not using .buffer feels surprising to me.

-"""A sys.{stdout,stderr} replacement that pipes output to a logger."""
+"""A sys.std{out,err} replacement that pipes output to a logger.
+
+N.B. the Logger object expects unicode.
Contributor

I can't figure out how to insert a multi-line suggestion, but I wouldn't leave this line alone from the rest, and would fill it to just:

  N.B. the Logger object expects unicode. However, most of our outstream logic, such as in
  `Exiter.py`, will use `sys.std{out,err}.buffer` and thus a bytes interface when running with
  Python 3. So, we must provide a `buffer` property, and change the semantics of the buffer to
  always convert the message to unicode.

@@ -135,7 +136,7 @@ def _process_event_queue(self):
try:
subscription, is_initial_event, files = (event['subscription'],
event['is_fresh_instance'],
Contributor

I don't know if others would find this appropriate, but fixing the indentation of this line sounds like a good idea and a small change.

Contributor

@cosmicexplorer cosmicexplorer left a comment

After slack convo, this is fantastic if the bit about logging.Logger requiring unicode is added to the docstring!

Thanks Danny for running through a couple iterations!
@Eric-Arellano Eric-Arellano merged commit c5d4ea1 into pantsbuild:master Jan 26, 2019
@Eric-Arellano Eric-Arellano deleted the pantsd-bytes branch January 26, 2019 05:59
@Eric-Arellano
Contributor Author

Enormous thank you to @OniOni for spending two days diving into this and figuring it all out! Down to six targets 🎉
