Revert "Improve diagnostic task runner" #1097

Merged

Conversation

mattiarighi
Contributor

Reverts #1075

Contributor

@valeriupredoi valeriupredoi left a comment

A bit confused about this PR - in any case, @bouweandela knows more about its functionality, so I'll leave it up to him. But please do hurry up, @mattiarighi is waiting on this for the frequency PR testing 🍺

@@ -408,13 +416,10 @@ def _start_diagnostic_script(self, cmd, env):
        try:
            process = subprocess.Popen(
                cmd,
                universal_newlines=True,
                bufsize=1,
                stdout=subprocess.PIPE,
Contributor

I don't get it - why do we need to remove this (and why this PR in general)? All this does is flush the output as text, one line at a time 😕

Contributor Author

See the discussion in #1059 for the reason behind reverting this PR.

Contributor

Yeah, cheers - it looks pretty hairy. Check out my comment there; I suspect it's all about the I/O encoding.
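
For context, here is a minimal, self-contained sketch (not the ESMValTool code itself; the command below is purely illustrative) of what the two removed arguments do: with universal_newlines=True and bufsize=1, Popen yields decoded text line by line, so the parent can handle the child's output as it arrives.

```python
import subprocess
import sys

# Purely illustrative child command: prints a couple of lines.
cmd = [sys.executable, "-c", "print('line 1'); print('line 2')"]

# Line-buffered text mode, as added in #1075 and removed by this revert.
process = subprocess.Popen(
    cmd,
    universal_newlines=True,   # stdout is decoded str instead of bytes
    bufsize=1,                 # line buffering (only valid in text mode)
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)
for line in process.stdout:    # yields each line as soon as it is available
    print(line, end="")
process.wait()
```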

            while returncode is None:
                returncode = process.poll()
                txt = process.stdout.read()
                txt = txt.decode(encoding='utf-8', errors='ignore')
Contributor

No need to set the encoding explicitly: it is picked up automatically from the locale, and if the locale encoding is not UTF-8 it is not safe to hard-code UTF-8 at the decode step.
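
A minimal sketch of that point, with illustrative variable names (not taken from the ESMValTool code): the locale's preferred encoding can be queried and used instead of hard-coding UTF-8 when decoding the child's output.

```python
import locale

raw = b"some diagnostic output\n"              # bytes read from process.stdout
encoding = locale.getpreferredencoding(False)  # encoding implied by the locale
txt = raw.decode(encoding, errors="replace")   # do not crash on stray bytes
print(txt, end="")
```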

@axel-lauer
Contributor

This PR solves my problems with the tool "hanging" when writing the provenance info. Tests with recipe_smpi.yml were successful (again).

@mattiarighi
Contributor Author

Any decision about this? We need to solve this problem; otherwise we can't run any tests on other PRs.

@axel-lauer
Contributor

I would propose merging this PR now, so we can continue working on the other PRs while trying to find a solution for the problems caused by the changes reverted here.

@bouweandela
Member

> This PR solves my problems with the tool "hanging" when writing the provenance info. Tests with recipe_smpi.yml were successful (again).

The tool does not hang when writing provenance info; it hangs when executing an NCL diagnostic script that writes a lot of text to stdout or stderr.

I suspect that the actual problem is that NCL crashes when the stdout pipe buffer is full (instead of waiting until it can write to it again). This can be merged, but it does not fix the problem; it just makes it less likely to occur, because it changes the buffer size from one line of text to a few thousand bytes.
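
A hedged sketch of the failure mode described here, using an illustrative child process rather than an actual NCL script: if the parent never drains stdout, a child that writes more than the OS pipe buffer (typically around 64 KiB on Linux) blocks on its next write; reading the pipe continuously until EOF avoids that.

```python
import subprocess
import sys

# Illustrative "chatty" child that writes far more than one pipe buffer.
cmd = [sys.executable, "-c", "print('x' * 200_000)"]

process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

# Drain stdout in chunks until EOF so the child never blocks on a full pipe.
for chunk in iter(lambda: process.stdout.read(4096), b""):
    sys.stdout.write(chunk.decode("utf-8", errors="ignore"))
returncode = process.wait()
```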

@mattiarighi
Contributor Author

Could this be related to the large output from the NetCDF library mentioned here?

@mattiarighi mattiarighi merged commit 6bee39d into version2_development May 29, 2019
@mattiarighi mattiarighi deleted the revert-1075-better_diagnostic_runner branch May 29, 2019 09:30
@bouweandela
Member

Yes, I suspect that is why we did not see the issue before.

@axel-lauer
Contributor

I don't think this is the problem. At least in my case, the log file contains the output from the "leave message" issued by the NCL diagnostic, which shows that the very last line of the NCL script was executed successfully. There are also no crash or error messages from NCL in the log files (at least in the cases I tested), so I doubt that NCL is the problem here.

@mattiarighi
Contributor Author

But wait, @axel-lauer does not have the NetCDF error in his log, since he is using an older environment (right?).

Another interesting thing: after commenting out the calls to log_provenance in perfmetrics, my run was successful.

It looks like an interplay of different issues.

@bouweandela
Member

The problem was made worse, not better, by #1075, which is why Axel was now seeing it even without the large NetCDF debug messages.

> Another interesting thing is that commenting out the calls to log_provenance in perfmetrics my run was successful.

I think this was coincidental, because I was also able to run perfmetrics without error occasionally.

@axel-lauer
Contributor

@mattiarighi: I am using an "old" environment and did not see such netCDF-related error messages.
