SystemError: error return without exception set #264
Comments
Without being able to reproduce it, I'm not sure what I can do. :( If you can manage to reproduce the error outside of Nuke then I'd be able to narrow it down. Alternatively, you could narrow down the scope of the program inside of Nuke, for example by launching a simpler program with threading and trying to catch it with as little code as possible. For example:
import threading

def my_program():
    print("Hello world")

thread = threading.Thread(target=my_program)
thread.start()
And then try adding things to the program until you experience the bug. |
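One way to apply this advice is to wrap the thread target so that any exception raised inside the worker is captured and reported instead of being lost. The following is a minimal sketch in plain Python with no Nuke dependency; `run_in_thread` is an illustrative helper name, not part of pyblish-qml or Nuke.

```python
import threading
import traceback


def run_in_thread(target, *args, **kwargs):
    """Run `target` in a worker thread and return (result, traceback_text).

    `run_in_thread` is a hypothetical helper for narrowing down a bug;
    traceback_text is None when the target finished without raising.
    """
    outcome = {"result": None, "error": None}

    def worker():
        try:
            outcome["result"] = target(*args, **kwargs)
        except Exception:
            # Keep the traceback so a failure inside the thread is visible.
            outcome["error"] = traceback.format_exc()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["result"], outcome["error"]


def my_program():
    return "Hello world"


result, error = run_in_thread(my_program)
print(result, error)
```

Code can then be added to `my_program` step by step, checking after each addition whether `error` is still `None`.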
Maybe other people will experience this eventually, although I suspect it might have something to do with the collectors I'm using in Nuke. I did try running through the collection 100 times and didn't experience any problems. The only error I experienced when investigating was this:
import time
import pyblish_qml

server = pyblish_qml.show()
time.sleep(5)
server.stop()
But I don't think this is the problem. |
Further to the investigation, this is the erroring method in Nuke:
# Functions for parallel threads to run stuff that can only be
# in the main Nuke thread. Formerly in nukescripts.utils
import traceback
import threading
import types
import _nuke

def executeInMainThreadWithResult( call, args = (), kwargs = {}):
    """ Execute the callable 'call' with optional arguments 'args' and named arguments 'kwargs' in
    Nuke's main thread and wait for the result to become available. """
    if type(args) != types.TupleType:
        args = (args,)
    resultEvent = threading.Event()
    id = _nuke.RunInMainThread.request(call, args, kwargs, resultEvent )
    resultEvent.wait()
    try:
        r = _nuke.RunInMainThread.result(id)
    except:
        traceback.print_exc()
        r = None
    return r
It seems like the error comes from the try/except, where the result of running something in the main thread of Nuke produces an error that Python does not recognize. |
To add to the mystery, not only does it not happen consistently but it's different per machine. One machine has been seen to crash in NukeStudio with this message, almost always, but the same project on another machine processes the project without an issue. |
This is the closest I've come to a reproducible case so far:
# Functions for parallel threads to run stuff that can only be
# in the main Nuke thread. Formerly in nukescripts.utils
import traceback
import threading
import types
import _nuke
from pyblish_qml import ipc

def executeInMainThreadWithResult( call, args = (), kwargs = {}):
    """ Execute the callable 'call' with optional arguments 'args' and named arguments 'kwargs' in
    Nuke's main thread and wait for the result to become available. """
    if type(args) != types.TupleType:
        args = (args,)
    resultEvent = threading.Event()
    id = _nuke.RunInMainThread.request(call, args, kwargs, resultEvent )
    resultEvent.wait()
    try:
        r = _nuke.RunInMainThread.result(id)
    except:
        traceback.print_exc()
        r = None
    return r

def call():
    server = None
    try:
        service = ipc.service.Service()
        server = ipc.server.Server(service, targets=[])
    except Exception:
        # If for some reason, the GUI fails to show.
        traceback.print_exc()
    proxy = ipc.server.Proxy(server)

for count in range(0, 1000):
    executeInMainThreadWithResult(call)
This will produce a lot of background processes, to the point where Nuke and the machine become unresponsive.
I don't think this is the exact issue, since people don't get an unresponsive Nuke, but it could be a breadcrumb. |
If it helps, this is the exact point where Nuke is asked to do something by Pyblish QML..
|
I'm trying to find a piece of code that'll show and close the pyblish-qml window multiple times. I'm hoping that stress testing the showing and closing of pyblish-qml will reveal something.
import os
import time
import pyblish.api
import pyblish_qml
from pyblish_qml import api

def on_shown():
    server = api.current_server()
    server.stop()
    time.sleep(5)
    pyblish_qml.show()

pyblish.api.deregister_all_callbacks()
callback = "pyblishQmlShown", on_shown
pyblish.api.register_callback(*callback)
os.environ["PYBLISHPLUGINPATH"] = ""
pyblish_qml.show()
|
Can't be sure, but try adding a |
Found a way:
import os
import time
from Qt import QtCore
import pyblish.api
import pyblish_qml
from pyblish_qml import api, _state

_state["count"] = 0

def on_shown():
    print _state["count"]
    server = api.current_server()
    server.stop()
    if _state["count"] < 5:
        QtCore.QTimer.singleShot(1000, pyblish_qml.show)
    _state["count"] += 1

pyblish.api.deregister_all_callbacks()
callback = "pyblishQmlShown", on_shown
pyblish.api.register_callback(*callback)
os.environ["PYBLISHPLUGINPATH"] = ""
pyblish_qml.show()
Interestingly, when I run this through a lot of iterations, 200 or so, we get a problem with timers. Nuke crashes at this point. This is still not the issue error, but it may be another breadcrumb. |
I've found a replicable way of crashing Nuke with pyblish-qml.
import pyblish_qml
pyblish_qml._state["installed"] = True
pyblish_qml.show() This crashes Nuke with I know we are manipulating the Again not the issue error, but could help to stabilize pyblish-qml. |
I think I have reproducible code for the issue error:
import threading
import _nuke
import traceback
import types
import pyblish_qml

print pyblish_qml._state

def threaded_wrapper(func, *args, **kwargs):
    if type(args) != types.TupleType:
        args = (args,)
    resultEvent = threading.Event()
    id = _nuke.RunInMainThread.request(func, args, kwargs, resultEvent)
    try:
        r = _nuke.RunInMainThread.result(id)
    except:
        traceback.print_exc()
        r = None
    return r

pyblish_qml._state["dispatchWrapper"] = threaded_wrapper
pyblish_qml.show()
The first execution works, but the second and any subsequent execution will error with the issue error. Notice that we are missing the |
What is |
Think I get why it works on the first execution. It's because the rest of the data members: |
It's a built-in module. I'm guessing C code. |
import _nuke
print _nuke
# Result: <module '_nuke' (built-in)> |
A little suspicious that it looks to be a private member (prefixed with |
The public method is what we are using now:
def executeInMainThreadWithResult( call, args = (), kwargs = {}):
    """ Execute the callable 'call' with optional arguments 'args' and named arguments 'kwargs' in
    Nuke's main thread and wait for the result to become available. """
    if type(args) != types.TupleType:
        args = (args,)
    resultEvent = threading.Event()
    id = _nuke.RunInMainThread.request(call, args, kwargs, resultEvent )
    resultEvent.wait()
    try:
        r = _nuke.RunInMainThread.result(id)
    except:
        traceback.print_exc()
        r = None
    return r |
I actually wanted to compare with what Maya has in its
|
Found this from
# Functions for parallel threads to run stuff that can only be
# in the main thread.
__main_thread_lock = threading.Lock()
__main_thread_event = threading.Event()

def executeInMainThreadWithResult(call, *args, **kwargs):
    """ Execute the callable 'call' with optional arguments 'args' and named arguments 'kwargs' in
    the main thread and wait for the result.
    Note that this method expects a single item for args or a tuple, and a dictionary for kwargs.
    It will not accept anything else.

    Examples of how to use this method (that work):

    def someMethod(firstParameter, kwArg0=None, kwArg1=None)
        print firstParameter
        return kwArg1

    result = executeInMainThreadWithResult(someMethod, "First positional parameter")
    result = executeInMainThreadWithResult(someMethod, "First positional parameter", {'kwArg0': "arg0"})
    result = executeInMainThreadWithResult(someMethod, ("First positional parameter", "kwArg0 passed as positional parameter"))
    result = executeInMainThreadWithResult(someMethod, ("First positional parameter", "kwArg0 passed as positional parameter"), {'kwArg1': "arg1 as well"})
    result = executeInMainThreadWithResult(someMethod, "First positional parameter", {'kwArg1': "arg1"})

    An example of what won't work:

    result = executeInMainThreadWithResult(someMethod, "First positional parameter", "Second positional parameter")

    The above fails because the second parameter to executeInMainThread must be a dictionary.
    """
    import types
    if type(args) != types.TupleType:
        args = (args,)
    __main_thread_lock.acquire()
    _fnpython.RunInMainThread.request(call, args, kwargs, __main_thread_event)
    __main_thread_event.wait()
    try:
        r = _fnpython.RunInMainThread.result()
    finally:
        __main_thread_event.clear()
        __main_thread_lock.release()
    return r |
And here is the same method in Houdini:
def executeInMainThreadWithResult(code, *args, **kwargs):
    return _queueDeferred(code, args, kwargs, block=True)

execute_in_main_thread_with_result = executeInMainThreadWithResult

def _queueDeferred(code, args, kwargs, block, num_waits=0):
    """Run the specified Python code in the main thread during the next idle
    event.

    code: Either a string containing a Python expression, a callable object,
        or a code object.
    args, kwargs: Only valid for callable objects.
    block: If True, this function will wait until the code is executed and
        return whatever that code evaluated to.
    """
    # Make sure args and kwargs where only specified for callable objects.
    if not callable(code) and len(args) + len(kwargs) != 0:
        raise ValueError(
            "You cannot pass arguments unless you pass in a callable object")
    _addEventLoopCallback()
    _condition.acquire()
    _queue.append((code, block, num_waits, args, kwargs))
    if block:
        _condition.wait()
        result = _last_result
        exc_info = _last_exc_info
    _condition.release()
    if block:
        if exc_info is None:
            return result
        # TODO: Right now we discard the traceback information. Should we
        # somehow encode it into the exception?
        raise exc_info[1]

_is_running = False

def _addEventLoopCallback():
    """Add the event loop callback if it has not already been added."""
    global _is_running
    if not _is_running:
        hou.ui.addEventLoopCallback(_processDeferred)
        _is_running = True
It seems that the other applications guard the main-thread call with a lock or condition, whereas Nuke's version does not lock and release. |
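To make the comparison concrete, here is a minimal sketch of the lock-plus-event pattern in plain Python. It is not the code of any of these hosts: `MainThreadDispatcher`, `execute_with_result` and `process_one` are hypothetical names, and the "main thread" is simply whichever thread calls `process_one`.

```python
import queue
import threading


class MainThreadDispatcher(object):
    """Hypothetical stand-in for a host's 'run in main thread' facility."""

    def __init__(self):
        self._requests = queue.Queue()
        self._lock = threading.Lock()  # only one blocking request at a time

    def execute_with_result(self, call, *args, **kwargs):
        """Called from a worker thread; blocks until `call` has been run."""
        done = threading.Event()
        box = {}
        with self._lock:
            self._requests.put((call, args, kwargs, box, done))
            done.wait()
        if "error" in box:
            # Re-raise in the caller instead of silently returning None.
            raise box["error"]
        return box["result"]

    def process_one(self, timeout=None):
        """Called from the main thread; runs a single queued request."""
        call, args, kwargs, box, done = self._requests.get(timeout=timeout)
        try:
            box["result"] = call(*args, **kwargs)
        except Exception as exc:
            box["error"] = exc
        finally:
            done.set()


dispatcher = MainThreadDispatcher()
results = []
worker = threading.Thread(
    target=lambda: results.append(dispatcher.execute_with_result(sum, [1, 2, 3])))
worker.start()
dispatcher.process_one(timeout=5)  # the "main thread" serves the request
worker.join()
print(results)
```

Holding the lock for the whole request means a second caller cannot queue anything until the first has its result, which is the behaviour the lock in the listing at the top of this comment appears to provide.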
That is interesting.. normally, anything run in a thread does not block. So, if there really isn't any locking, what you should be seeing is that even though you have a thread running, you should still be able to interact with Nuke. Or at the very least be able to run commands via this It would be interesting to see whether you can call multiple commands with For example, if the first thread started sleeps for 3 seconds, the second thread for 1 second, then you should be seeing the first one finish first after 3 seconds, and the second thread at 4 seconds total. You could print the wall-clock time in each call, to verify that they finish when you'd expect. If the second thread finishes first, then you've found a Nuke bug. |
You mean something like this?
import time
import datetime
import nuke

def sleep_and_print_time(seconds, name):
    print name
    print datetime.datetime.now().time()
    time.sleep(seconds)
    print datetime.datetime.now().time()

nuke.executeInMainThreadWithResult(sleep_and_print_time, args=(3, "thread1"))
nuke.executeInMainThreadWithResult(sleep_and_print_time, args=(1, "thread2"))
|
Yes, then I think there is locking somewhere. It might be inside of that opaque call to |
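The ordering experiment described above only says something about locking when the two calls come from separate threads, since two calls made one after another from the same thread always finish in order. Below is a sketch of the experiment outside of Nuke, with scaled-down sleep times. `host_lock` is an assumption standing in for whatever serialises calls inside the host, and `execute_serialised` is a hypothetical name.

```python
import threading
import time

host_lock = threading.Lock()        # assumed: whatever serialises calls in the host
first_has_lock = threading.Event()  # lets us start the second caller at a known point
finished = []                       # caller names, in completion order


def execute_serialised(name, seconds, on_locked=None):
    """Simulate a blocking main-thread call that occupies the host for `seconds`."""
    with host_lock:
        if on_locked is not None:
            on_locked()
        time.sleep(seconds)
        finished.append(name)


first = threading.Thread(
    target=execute_serialised, args=("thread1", 0.3, first_has_lock.set))
second = threading.Thread(
    target=execute_serialised, args=("thread2", 0.1))

first.start()
first_has_lock.wait()  # only start thread2 once thread1 is being served
second.start()
first.join()
second.join()
print(finished)
```

With serialisation in place, the longer first call finishes before the shorter second one. Without `host_lock`, the second call would finish first.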
I think I have simulated the issue problem with this:
import pyblish_qml

def threaded_wrapper(func, *args, **kwargs):
    return None

pyblish_qml._state["dispatchWrapper"] = threaded_wrapper
pyblish_qml.show()
Visually what happens is that the splash screen appears and then the GUI appears and disappears without any errors. This is what happens when people get the issue error. The reason why I return None in the
# Functions for parallel threads to run stuff that can only be
# in the main Nuke thread. Formerly in nukescripts.utils
import traceback
import threading
import types
import _nuke

def executeInMainThreadWithResult( call, args = (), kwargs = {}):
    """ Execute the callable 'call' with optional arguments 'args' and named arguments 'kwargs' in
    Nuke's main thread and wait for the result to become available. """
    if type(args) != types.TupleType:
        args = (args,)
    resultEvent = threading.Event()
    id = _nuke.RunInMainThread.request(call, args, kwargs, resultEvent )
    resultEvent.wait()
    try:
        r = _nuke.RunInMainThread.result(id)
    except:
        traceback.print_exc()
        r = None
    return r
It seems (to me) that for whatever reason an event does not finish before Nuke is requesting the results, then the try/except handling causes the return value to be Maybe we should handle this case better, instead of letting the GUI appear and disappear, which causes the user to repeatedly try to show the GUI? The fact that the problem can resolve itself by waiting a little while could indicate that the event or the main thread gets reset. The question is how to force a reset? |
Since this issue can resolve itself after a while, we could inform the user to either wait a little or restart Nuke? |
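One possible way to handle this case more explicitly is to make a failed main-thread call distinguishable from a legitimate `None` result, so the GUI can show a message instead of silently closing. The sketch below is an assumption, not pyblish-qml's implementation: `checked_call` and `MainThreadCallError` are hypothetical names, and `execute` stands in for a function with the same signature as `executeInMainThreadWithResult`.

```python
class MainThreadCallError(RuntimeError):
    """Hypothetical error: the main-thread call produced no result."""


def checked_call(execute, func, *args, **kwargs):
    """Run `func` through `execute` and raise if the executor swallowed a failure.

    The result is tagged inside the main-thread call, so a bare None coming
    back from `execute` can only mean that the call itself did not complete.
    """
    def tagged():
        return ("ok", func(*args, **kwargs))

    outcome = execute(tagged)
    if outcome is None:
        raise MainThreadCallError(
            "The host did not return a result; wait a moment or restart the host.")
    return outcome[1]


# Two stand-in executors for demonstration purposes only.
def working_executor(call, args=(), kwargs=None):
    return call(*args, **(kwargs or {}))


def failing_executor(call, args=(), kwargs=None):
    return None  # mimics the except branch that sets r = None


print(checked_call(working_executor, lambda: None))
try:
    checked_call(failing_executor, lambda: 42)
except MainThreadCallError as error:
    print(error)
```

A dispatch wrapper built this way could catch `MainThreadCallError` and tell the user to wait or restart, rather than letting the window appear and disappear.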
This issue happens almost consistently when evaluating a larger script, that is, when launching while Nuke is busy evaluating nodes. Trying to figure out how to replicate this. |
Ooh, that's interesting. You're saying that by triggering Pyblish when Nuke is already busy, it won't wait for Nuke to become ready, but rather start right away? Try putting Pyblish in the Qt queue like this. QtCore.QTimer.singleShot(0, call_pyblish) It'll put the function in a queue to be run only when the Nuke GUI (which I expect should include any processing) is idle. Could either put the On top of that, you should lock the GUI during Pyblish process. What normally happens is that a plug-in is run, control is returned to Nuke, and then another plug-in is run. This could confuse Nuke, especially if it gathers new events to be run during idle, which it may think is happening during those intermissions. Normally you can lock the background GUI by opening a modal dialog; that won't work here, but you could open up a hidden modal dialog, just for the purposes of blocking input. There are probably other ways of blocking Nuke from doing anything until we tell it to. |
Just putting
My test case is a computationally heavy script with a large frame range. I play back the frame range to get Nuke to constantly evaluate the nodes as it caches the frame range. While Nuke is caching the frame range, I show the pyblish-qml window and almost always get the issue error. The interesting part is that it doesn't error out at the same stage every time. Sometimes it errors out immediately, and other times the collection gets through a couple of plugins before erroring. This leads me to suspect that each call to process a plugin could be causing the race condition, so I tried to have the threaded wrapper as a singleshot call with:
def threaded_wrapper(func, *args, **kwargs):
    return QtCore.QTimer.singleShot(
        0, lambda: nuke.executeInMainThreadWithResult(func, args, kwargs)
    )
But this gives me this error:
I then wanted to try with using QThreads instead of the normal threads used here, and importing Qt (
I've now tried with
class YourThreadName(QtCore.QThread):
    def __init__(self, service, popen):
        QtCore.QThread.__init__(self)
        self.service = service
        self.popen = popen

    def __del__(self):
        self.wait()

    def run(self):
        """This runs in a thread"""
        for line in iter(self.popen.stdout.readline, b""):
            if six.PY3:
                line = line.decode("utf8")
            try:
                response = json.loads(line)
            except Exception:
                # This must be a regular message.
                sys.stdout.write(line)
            else:
                if response.get("header") == "pyblish-qml:popen.request":
                    payload = response["payload"]
                    args = payload["args"]
                    wrapper = _state.get("dispatchWrapper",
                                         default_wrapper)
                    func = getattr(self.service, payload["name"])
                    result = wrapper(func, *args)  # block..
                    # Note(marcus): This is where we wait for the host to
                    # finish. Technically, we could kill the GUI at this
                    # point which would make the following commands throw
                    # an exception. However, no host is capable of kill
                    # the GUI whilst running a command. The host is locked
                    # until finished, which means we are guaranteed to
                    # always respond.
                    data = json.dumps({
                        "header": "pyblish-qml:popen.response",
                        "payload": result
                    })
                    if six.PY3:
                        data = data.encode("ascii")
                    self.popen.stdin.write(data + b"\n")
                    self.popen.stdin.flush()
                else:
                    # In the off chance that a message
                    # was successfully decoded as JSON,
                    # but *wasn't* a request, just print it.
                    sys.stdout.write(line)
But that just freezes Nuke. Got any ideas how to proceed from here? |
Ok, those are good tests to rule out, but there's one other main thing we need to rule out. Basically, what's happening at the moment is that What we need is a window of time in which Pyblish is in control, and doesn't let Nuke do anything. The canonical way of doing that is running a function in the main thread that's blocking, until it's done. That doesn't happen here, because QML is sending requests to Nuke "on the fly". When no more requests are sent, it's considered done. This is the function running in a thread, calling Give it a try. Sorry I can't be more helpful, it's difficult when I can't reproduce the problem. But I admire your perseverance in figuring this out! |
Thanks :) Any ideas are very welcome, so I do appreciate you taking the time to think about this. |
I tried changing it to execute the listening method in a non-threaded way:
if not self.listening:
    _listen()
    self.listening = True
That, strangely enough, freezes Nuke in the same way that running it in a QThread did. What I find particularly odd is that when the |
I've got a good feeling about the above. If it works, you may not even need threadedWrapper at all, as it would all be running in the main thread to begin with.
It will appear as though Nuke has frozen, because it's waiting for that function to return. It should return once you close QML, did it?
Try not using the wrapper. |
|
Ok, that's actually not crashing. It's just waiting. That dialog is coming from Windows, in response to any process not responding. Odds are the GUI hasn't been told to show when we call |
Not calling Still trying to wrap my head around the subprocess workflow of pyblish-qml, but I'm guessing the pyblish-qml GUI is unresponsive because its waiting on a message from its parent (Nuke) to do anything? |
Ok, so I've given this a try in Maya, the same principle should apply. Here's what you can do.
Then you can do this. (run line by line)
from pyblish_qml import show
server = show()
# Now the GUI will appear, Nuke should remain unblocked
server.listen()
# Now Nuke is blocked, and QML should come alive
The call to If this works, then we could have a look at an option like |
Correct me if I'm wrong, but we are missing having to change I tried without the above change, and encountered the issue error, while evaluating the Nuke nodes. After the above change, I couldn't replicate the issue error (whoooot!!!), which means we might have a winner. |
Yes, that's true. You need to make that change as well, else the listener will run in the main thread, calling the threaded wrapper even though it doesn't need to. It might even cause more trouble, as it'd be scheduling something for a thread, when it itself is in that thread.
That's great, let me know how it goes! |
This is effectively not using any threading, so if it was a problem with threads, I'd expect this to work. If it does, there are probably things we can do to regain the non-modal behaviour from before. The original work on threading is rather heavy handed; like the fact that it does in fact return control to Nuke inbetween running plug-ins. That's a little unnecessary. It just hasn't been a problem up till now. It also reminds me of when you asked whether QML could run without threading; I said no and I honestly hadn't considered what we're doing now to be possible. If I had, maybe a lot of this could have been avoided, sorry about that! |
I'll open up a PR so we can discuss implementation specifics. When we are happy with the implementation, I'll put it into production here and see how well it solves the problem, but as you said, it circumvents threading, which is the problem area. |
I'm going to say this issue is resolved with #269. Phew! |
Phew indeed! Threading issues are a pain. |
Issue
We quite often get threading issues with pyblish-qml and Nuke, with the following message:
Threading issues are always tough to replicate, but users seem to experience them in bursts. This error makes the pyblish-qml window open and then immediately close, which makes users try again and again in sequence.
It seems the issue can resolve itself by waiting and/or trying multiple times.
For these reasons, maybe it has something to do with a pyblish-qml instance that does not get closed or shut down properly?