-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "Improve diagnostic task runner" #1097
Revert "Improve diagnostic task runner" #1097
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit confusado about this PR - in any case monsieur @bouweandela knows more about its functionality so I'll leave it up to him but do hurry up pls @mattiarighi is waiting on this for the frequency PR testing 🍺
@@ -408,13 +416,10 @@ def _start_diagnostic_script(self, cmd, env): | |||
try: | |||
process = subprocess.Popen( | |||
cmd, | |||
universal_newlines=True, | |||
bufsize=1, | |||
stdout=subprocess.PIPE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont get it - why do we need to remove this (and in general why this PR) - all this does is flush all output to text in new lines 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the discussion in #1059 for the reason behind reverting this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah cheers, looks pretty hairy - check out my comment there, I suspect it's all about the i/o encoding
while returncode is None: | ||
returncode = process.poll() | ||
txt = process.stdout.read() | ||
txt = txt.decode(encoding='utf-8', errors='ignore') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to set the encoding, it's picked up automatically from locale
and if the locale encoding is not UTF-8 it's not safe to set it at decode level
This PR solves my problems with the tool "hanging" when writing the provenance info. Tests with recipe_smpi.yml were successful (again). |
Any decision about this? We need to solve this problem, otherwise we can't run any test on other PRs. |
I would propose to merge this PR now, so we can continue working on the other PRs while trying to find a solution for for the problems caused by the changes we revert with this PR. |
The tool does not hang when writing provenance info, it hangs when executing an NCL diagnostic script that writes a lot of text to stdout or stderr. I suspect that the actual problem is that NCL crashes when the stdout pipe buffer is full (instead of waiting until it can write to it again). This can be merged, but it does not fix the problem, it just makes it less likely that the problem will occur, because it changes the buffer size from 1 line of text to a few thousand bytes. |
Could this be related to the large output from the NetCDF library mentioned here? |
Yes, I suspect that is why we did not see the issue before. |
I don't think this is the problem. At least in my case, the log file contains the output from the "leave message" issued the NCL diagnostic, which shows that the very last line of the NCL script has been executed successfully. There are also no crash or error messages from NCL in the log files (at least for what I tested), so I doubt that NCL is the problem here. |
But wait, @axel-lauer does not have the NetCDF error in his log, since he is using an older environment (right?). Another interesting thing is that commenting out the calls to It looks like an interplay of different issues. |
The problem was made worse by #1075 instead of better, which is why Axel was now even seeing it without the large NetCDF debug messages.
I think this was accidental, because I was also able to run perfmetrics without error occasionally. |
@mattiarighi : I am using an "old" environment and I did not see such netCDF related error messages. |
Reverts #1075