Check stderr first before stdout on VCS Install #9234

Closed
wants to merge 6 commits into from

@Mikuana Mikuana commented Dec 6, 2020

When working with large amounts of data, Git reports on stderr instead of
stdout. For some reason, on Git for Windows (I have not been able to reproduce
this on Linux), this can cause the subprocess to completely stall while asking
for a return from the stdout. In the context of a pip install git+https://,
this results in the Clone step freezing, without providing any errors or
context about what's happening (or what has gone wrong).

This fix circumvents that by first checking stderr for output, and then checking
stdout (if none is found).
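
One plausible mechanism for this kind of stall, consistent with the observation later in this thread that stderr was piped but never read, is a full pipe buffer: the child blocks while writing to stderr, and the parent blocks while reading stdout. The sketch below reproduces that pattern with a stand-in child process rather than Git. `CHILD_CODE` and `stalls_when_stderr_unread` are hypothetical names, and the 4 MiB payload is only an assumption chosen to exceed typical pipe buffer sizes.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for a chatty VCS command:
# a large amount of stderr output, then a single stdout line.
CHILD_CODE = (
    "import sys\n"
    "sys.stderr.write('x' * (4 * 1024 * 1024))\n"
    "sys.stderr.flush()\n"
    "print('done')\n"
)


def stalls_when_stderr_unread(timeout=2.0):
    """Return True if reading stdout is still blocked after `timeout` seconds."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # piped, but deliberately never read below
    )
    lines = []
    # Read stdout in a helper thread so the demonstration itself cannot hang.
    reader = threading.Thread(
        target=lambda: lines.extend(proc.stdout), daemon=True
    )
    reader.start()
    reader.join(timeout)
    stalled = reader.is_alive()
    # Clean up: killing the child closes its pipe ends, which unblocks the reader.
    proc.kill()
    reader.join()
    proc.stdout.close()
    proc.stderr.close()
    proc.wait()
    return stalled
```

If the function returns `True`, the child never reached its `print('done')` because it was stuck writing to the unread stderr pipe. Whether this is exactly what happens with Git for Windows is not established here.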

@uranusjr
Member

uranusjr commented Dec 7, 2020

Can this be done simpler with Popen(..., stderr=subprocess.STDOUT)?
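
For readers unfamiliar with the option, here is a minimal sketch of what `stderr=subprocess.STDOUT` does: the child's stderr is routed into the same pipe as stdout, so there is only one pipe to drain. The child command and the `run_merged` helper are hypothetical illustrations, not pip code.

```python
import subprocess
import sys

# Hypothetical child: one warning on stderr, one "result" line on stdout.
CHILD_CODE = (
    "import sys\n"
    "print('warning: something odd', file=sys.stderr)\n"
    "print('abc123')\n"
)


def run_merged(cmd):
    """Run cmd with stderr folded into stdout; return (returncode, lines)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # child's stderr shares the stdout pipe
        text=True,
    )
    # A single pipe carries everything, so nothing is left unread.
    lines = [line.rstrip("\n") for line in proc.stdout]
    proc.stdout.close()
    return proc.wait(), lines
```

Calling `run_merged([sys.executable, "-c", CHILD_CODE])` yields both lines in one list. The caller can no longer tell which line was a warning, and the relative order of the two depends on how the child buffers its streams.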

@uranusjr uranusjr added C: vcs pip's interaction with version control systems like git, svn and bzr type: bugfix labels Dec 7, 2020
@Mikuana
Author

Mikuana commented Dec 8, 2020

> Can this be done simpler with Popen(..., stderr=subprocess.STDOUT)?

I'm not sure if that would address the Git issue, as it may effectively resolve to the same thing as the current solution. I'll try it though, as it would definitely make for a cleaner change.

Mikuana and others added 2 commits December 7, 2020 20:00
@Mikuana
Author

Mikuana commented Dec 8, 2020

> Can this be done simpler with Popen(..., stderr=subprocess.STDOUT)?

That was a good suggestion. It works on my Windows environment with a Git install. Updated pull request with the change.

@Mikuana
Author

Mikuana commented Dec 8, 2020

FYI - automated checks are failing, but they also fail when I pull down master and run them locally.

@uranusjr
Member

uranusjr commented Dec 8, 2020

Hmm, reading the code again, stderr=subprocess.STDOUT may not be a good idea after all, since it affects showing the subprocess error afterwards (content of the all_output variable).

@Mikuana
Author

Mikuana commented Dec 9, 2020

@uranusjr I have to admit that I don't have enough experience with subprocess to know whether or not this is a good idea. If we reverted to my first suggestion, and used readlines on stderr and then stdout for each printed line, would this avoid the later error display?

@uranusjr
Member

So in the original implementation, only contents from stdout would be added to all_output, but in both the initial and current implementation in this PR, both stdout and stderr are added to all_output.

But reading the original implementation yet again, stderr was piped but never read in any way before being closed. So I’m going to assume it’s actually a bug, and this reflects the original author’s intention. We can always fix it if we’re wrong 🙂

news/8876.bugfix.rst (outdated, resolved)
Co-authored-by: Tzu-ping Chung <[email protected]>
@Mikuana
Author

Mikuana commented Dec 10, 2020

> So in the original implementation, only contents from stdout would be added to all_output, but in both the initial and current implementation in this PR, both stdout and stderr are added to all_output.
>
> But reading the original implementation yet again, stderr was piped but never read in any way before being closed. So I’m going to assume it’s actually a bug, and this reflects the original author’s intention. We can always fix it if we’re wrong

Oh gotcha. So even if we capture details that we don't intend to from stderr, it won't go anywhere anyways, so there's no practical effect. In that case, I'll keep the current PR version since it is a cleaner change.

@@ -121,7 +121,7 @@ def call_subprocess(
 # Convert HiddenText objects to the underlying str.
 reveal_command_args(cmd),
 stdout=subprocess.PIPE,
-stderr=subprocess.PIPE,
+stderr=subprocess.STDOUT,
Member

I'm not sure. This bug was introduced in #7969, which was intended to not merge stdout and stderr, because some VCS commands log warnings on stderr and we don't want to capture them in the command output. So I'd say we should leave stderr alone to be printed on the console for the user to see.

Actually I was wondering this week why I was seeing pip failing on git exit codes while not showing the error details. That is probably it.

Member

@Mikuana could you check whether simply removing this `stderr=subprocess.STDOUT,` line works?

Author

@sbidoul that worked as well. I've updated this PR to remove that line instead of piping it to stdout.

Member

Unfortunately tests are red. It's because some calls (one actually, via get_repository_root) need to capture stderr.

Author

What would your recommendation be? Should we modify the tests, or should we trust the test and instead redirect the stderr to stdout as I had it previously?

Sorry if that seems like a silly question.

Member

Redirecting stderr to stdout won't work because that would reopen #7545 and #7968, where the VCS logs warnings that get mixed with the stdout we want to extract and parse.

If we let stderr go to the console, this will create unwanted noise on the console (which is why the tests fail), and bypass the pip logging and verbosity control mechanisms.

So what can we do? Not a silly question indeed.

I see two approaches.

1/ The easy one is to use Popen.communicate(), which has a safe (multithreaded) mechanism to capture stderr and stdout separately. There are two logging-related drawbacks to this: a) in debug mode it would not display the process output until it has terminated; b) stdout and stderr could only be shown one after the other instead of in the natural line order produced by the subprocess.

2/ The hard one is to reimplement a variant of communicate to both debug log and capture (a kind of tee)...
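
A rough idea of what approach 2/ might look like, offered only as a sketch under stated assumptions: one reader thread per pipe, each logging lines as they arrive and keeping a copy. The names `run_and_tee`, `_pump`, and the `tee_sketch` logger are hypothetical and do not come from pip.

```python
import logging
import subprocess
import sys
import threading

logger = logging.getLogger("tee_sketch")  # hypothetical logger name


def _pump(stream, sink, label):
    """Log each line as it arrives and keep a copy in sink."""
    for line in stream:
        line = line.rstrip("\n")
        logger.debug("%s: %s", label, line)
        sink.append(line)
    stream.close()


def run_and_tee(cmd):
    """Return (returncode, stdout_lines, stderr_lines), logging lines live."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    out, err = [], []
    # One thread per pipe, so neither pipe can fill up while the other is read.
    pumps = [
        threading.Thread(target=_pump, args=(proc.stdout, out, "stdout")),
        threading.Thread(target=_pump, args=(proc.stderr, err, "stderr")),
    ]
    for thread in pumps:
        thread.start()
    for thread in pumps:
        thread.join()
    return proc.wait(), out, err
```

Because both pipes are drained concurrently, a large volume of stderr cannot block the child. The interleaving of the two streams in the debug log is only approximate, since each thread logs independently.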

Member

So I went for approach 1/ in #9327
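
For illustration, approach 1/ can be sketched roughly as below. This is not the code from #9327; `run_capture_separately` is a hypothetical helper that only shows how `Popen.communicate()` collects both streams without either pipe filling up.

```python
import subprocess
import sys


def run_capture_separately(cmd):
    """Return (returncode, stdout_text, stderr_text) once the child has exited."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() drains both pipes until EOF and then waits for the child,
    # so large stderr output cannot stall the read of stdout.
    out, err = proc.communicate()
    return proc.returncode, out, err
```

The trade-offs are the ones listed above: nothing is available until the child exits, and the two streams come back as separate blocks rather than interleaved.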

Mikuana and others added 2 commits December 17, 2020 12:52
This feature may have been the root cause in the introduction of hanging
git installs in 20.2.
This was referenced Dec 18, 2020
@Mikuana
Author

Mikuana commented Dec 20, 2020

Closing in favor of #9327

@Mikuana Mikuana closed this Dec 20, 2020
@Mikuana Mikuana deleted the large-git-install-bugfix branch December 20, 2020 17:58
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 5, 2021