'Tutorial with Python' issues running the reaction_diffusion.py example file #307

Open
Rick-Fusion opened this issue Jan 14, 2025 · 4 comments

@Rick-Fusion

Rick-Fusion commented Jan 14, 2025

Hi, I'm Rick, doing my graduation project at Ignition Computing.

I am completely new to MUSCLE3, so I am going through the tutorial on https://muscle3.readthedocs.io/en/latest/tutorial.html.
While running the example file 'reaction_diffusion.py' on the ITER SDCC server, I noticed that no figure appeared.
On closer inspection in the VSC environment of the NoMachine client, I found that an error occurred (and still no figure appeared):

TimeoutError: timed out
ERROR:libmuscle.runner:Component micro crashed, please check the log file for error messages

It also gave the following warning in the muscle3.macro.log file:

/git_repos/muscle3/docs/source/examples/python/reaction_diffusion.py:122: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown

Maarten suggested changing Matplotlib's backend with the command: export MPLBACKEND=TkAgg
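For anyone who prefers to select the backend from Python rather than from the shell, here is a minimal sketch. It assumes an X11 session (such as one opened with `ssh -X`), where a non-empty `DISPLAY` variable suggests that a window can be opened. `pick_backend` and `configure_backend` are hypothetical helper names, not part of Matplotlib or MUSCLE3. `MPLBACKEND` is the same environment variable as in the command above, and it has to be set before Matplotlib is first imported.

```python
import os


def pick_backend(environ) -> str:
    """Return "TkAgg" if an X display seems available, else the headless "Agg"."""
    return "TkAgg" if environ.get("DISPLAY") else "Agg"


def configure_backend(environ=os.environ) -> str:
    """Set MPLBACKEND unless the user already exported one; return the value in effect."""
    # setdefault keeps an explicit `export MPLBACKEND=...` from the shell intact.
    return environ.setdefault("MPLBACKEND", pick_backend(environ))


if __name__ == "__main__":
    # Call this before the first `import matplotlib`.
    print("Matplotlib backend:", configure_backend())
```

With "Agg" the figure cannot be shown in a window, which matches the UserWarning above, so a script running headless would need to save the figure to a file instead.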

Now, when I run the Python file, the figure appears in a separate window for a brief moment before the timeout error kicks in. Then the window disappears.

Trying the same in VSC, logged in to the ITER SDCC server with ssh -X iter_login03 and with MPLBACKEND set to TkAgg, I get the following error when running the file:

(venv) [vanschr@sdcc-login03 python]$ python reaction_diffusion.py
Traceback (most recent call last):
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/reaction_diffusion.py", line 156, in <module>
    run_simulation(configuration, implementations)
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/runner.py", line 332, in run_simulation
    run_instances(instances, controller.manager_location)
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/runner.py", line 289, in run_instances
    raise RuntimeError(msg)
RuntimeError: Instance(s) micro failed to shut down cleanly. Here is the final bit of the output:
 ---------- micro ----------

  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/instance.py", line 1110, in pre_receive
    msg, saved_until = self._communicator.receive_message(port_name, slot)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/communicator.py", line 232, in receive_message
    mpp_message_bytes, profile = client.receive(recv_endpoint.ref())
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/mpp_client.py", line 54, in receive
    return self._transport_client.call(encoded_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/mcp/tcp_transport_client.py", line 88, in call
    length = recv_int64(self._socket)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/mcp/tcp_util.py", line 63, in recv_int64
    buf = recv_all(socket, 8)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/mcp/tcp_util.py", line 25, in recv_all
    received_now = socket.recv_into(
                   ^^^^^^^^^^^^^^^^^
TimeoutError: timed out
ERROR:libmuscle.runner:Component micro crashed, please check the log file for error messages

See muscle3.micro.log for the complete output

The muscle3.micro.log contains:

(venv) [vanschr@sdcc-login03 python]$ cat muscle3.micro.log 
INFO:libmuscle.instance:Registered with the manager
INFO:libmuscle.instance:Received peer locations and base settings
INFO:libmuscle.communicator:Connecting to peer macro at ['tcp:16.1.15.2:41221,10.153.0.128:41221']
Traceback (most recent call last):
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/runner.py", line 150, in implementation_process
    implementation()
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/reaction_diffusion.py", line 19, in reaction
    while instance.reuse_instance():
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/instance.py", line 231, in reuse_instance
    do_reuse = self._decide_reuse_instance()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/instance.py", line 854, in _decide_reuse_instance
    got_f_init_messages = self._pre_receive()
                          ^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/instance.py", line 1064, in _pre_receive
    self.__pre_receive_f_init()
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/instance.py", line 1126, in __pre_receive_f_init
    pre_receive(port_name, None)
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/instance.py", line 1110, in pre_receive
    msg, saved_until = self._communicator.receive_message(port_name, slot)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/communicator.py", line 232, in receive_message
    mpp_message_bytes, profile = client.receive(recv_endpoint.ref())
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/mpp_client.py", line 54, in receive
    return self._transport_client.call(encoded_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/mcp/tcp_transport_client.py", line 88, in call
    length = recv_int64(self._socket)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/mcp/tcp_util.py", line 63, in recv_int64
    buf = recv_all(socket, 8)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/build/venv/lib/python3.11/site-packages/libmuscle/mcp/tcp_util.py", line 25, in recv_all
    received_now = socket.recv_into(
                   ^^^^^^^^^^^^^^^^^
TimeoutError: timed out
ERROR:libmuscle.runner:Component micro crashed, please check the log file for error messages

The muscle3.macro.log file contains:

INFO:libmuscle.instance:Registered with the manager
INFO:libmuscle.instance:Received peer locations and base settings
INFO:libmuscle.communicator:Connecting to peer micro at ['tcp:16.1.15.2:38835,10.153.0.128:38835']

The muscle3_manager.log file contains:

muscle_manager 2025-01-14 13:39:04,028 INFO    libmuscle.manager.profile_store: Overwriting profiling database /home/ITER/vanschr/git_repos/muscle3/docs/source/examples/python/performance.sqlite
muscle_manager 2025-01-14 13:39:04,103 INFO    libmuscle.manager.mmp_server: Registered instance macro
muscle_manager 2025-01-14 13:39:04,106 INFO    libmuscle.manager.mmp_server: Registered instance micro
micro          2025-01-14 13:39:07,272 ERROR   micro: Component micro crashed, please check the log file for error messages

In local VSC, the plot does not appear at all (maybe the run fails so quickly that the window has no time to open before the timeout occurs).

I don't know what the problem is, but I was asked to report this issue. Regards!

@LourensVeen
Contributor

LourensVeen commented Jan 14, 2025

Hi Rick, nice to meet you!

I think this may be related to some changes I made to the networking code in October, in combination with SDCC being slow somehow. (I'm not sure how that would happen, but I've heard similar reports recently from another user on SDCC. Heavy I/O from some other user's job could do it.)

Are you running the latest release, or an in-development version?

If the latter, could you try going to libmuscle/python/libmuscle/mcp/tcp_transport_client.py line 131 and change

try:
    sock.settimeout(20.0 if patient else 3.0)     # seconds
    sock.connect(sockaddr)
except Exception:
    sock.close()
    continue

(if it looks like that) into

try:
    sock.settimeout(20.0 if patient else 3.0)     # seconds
    sock.connect(sockaddr)
    sock.settimeout(60.0)
except Exception:
    sock.close()
    continue

and see if that helps?

The shortened timeout is intended to help skip over inoperable networks more quickly on startup, but it occurs to me that the timeout stays set on the socket after connecting, so any delay of more than three seconds on the working network will then cause a timeout. And your last log messages are just about three seconds apart...
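The "about three seconds" can be checked against the muscle3_manager.log timestamps quoted earlier (registration of micro at 13:39:04,106 and the crash at 13:39:07,272). A small sketch; `seconds_between` is a hypothetical helper, and the format string assumes the timestamp layout shown in that log:

```python
from datetime import datetime

# Timestamp layout as it appears in muscle3_manager.log (comma before milliseconds).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(first: str, second: str) -> float:
    """Return the number of seconds elapsed from `first` to `second`."""
    start = datetime.strptime(first, LOG_TIME_FORMAT)
    end = datetime.strptime(second, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


gap = seconds_between("2025-01-14 13:39:04,106", "2025-01-14 13:39:07,272")
print(f"{gap:.3f} s")  # prints "3.166 s"
```

That is just over the 3.0 second timeout used for the non-patient connection attempt.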

Once we know the network works we can afford to wait a bit longer if the system has a hiccup, so the above code tries to reset the timeout to something more reasonable once we have a connection.
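To illustrate the effect described above outside of MUSCLE3, here is a self-contained sketch using only the standard `socket` module. It is a toy, not the library's code: the loopback server, the function names and the (much shorter) timeout values are all assumptions chosen so that the demo finishes in about a second.

```python
import socket
import threading
import time


def slow_server(listener: socket.socket, delay: float, n_clients: int) -> None:
    """Accept n_clients connections in turn and answer each after `delay` seconds."""
    for _ in range(n_clients):
        conn, _ = listener.accept()
        with conn:
            time.sleep(delay)
            try:
                conn.sendall(b"pong")
            except OSError:
                pass  # the client already gave up and closed its end


def connect_and_receive(address, connect_timeout, receive_timeout=None):
    """Connect with a short timeout; optionally reset it before receiving.

    Returns the received bytes, or None if a timeout occurred.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(connect_timeout)
        sock.connect(address)
        if receive_timeout is not None:
            # Without this line, connect_timeout also applies to recv().
            sock.settimeout(receive_timeout)
        return sock.recv(4)
    except socket.timeout:
        return None
    finally:
        sock.close()


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(2)
    address = listener.getsockname()
    server = threading.Thread(
        target=slow_server, args=(listener, 0.5, 2), daemon=True)
    server.start()

    # The peer needs 0.5 s to answer, but the 0.2 s connect timeout lingers.
    lingering = connect_and_receive(address, connect_timeout=0.2)
    # Same peer, same delay, but the timeout is raised once connected.
    reset = connect_and_receive(address, connect_timeout=0.2, receive_timeout=5.0)

    server.join(timeout=5.0)
    listener.close()
    return lingering, reset


if __name__ == "__main__":
    print(demo())
```

The first call times out even though the connection itself was fine, while the second one receives the reply. This is the same pattern as adding `sock.settimeout(60.0)` after `sock.connect(sockaddr)` in the snippet above.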

@LourensVeen LourensVeen self-assigned this Jan 14, 2025
@Rick-Fusion
Author

Hi Lourens,

Thanks for your quick message.

While checking in the SDCC modules which version of MUSCLE3 was loaded, I noticed that I had made a big mistake: it wasn't there at all!
I had previously added the module to my .bashrc file, but apparently I forgot to save it and never checked again.

I cloned the MUSCLE3 repository in the SDCC environment and went from there, assuming that the MUSCLE3 module was loaded.

I have now added it (properly...) and the file runs perfectly.

Apologies, there is no issue at all now.

Thanks again!

@LourensVeen
Contributor

Hi Rick,

No, you did in fact find an issue. From the paths in the backtrace above, you were running from the Git repository, probably the current develop version. That has the new networking code in there, with the too-short timeout.

The module on SDCC contains the latest released version, which is older and does not have the shortened timeout code yet. So by switching to it, you've now got the examples running, but the problem is still there on the develop branch.
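A quick way to tell which installation a Python session actually picks up, instead of reading it off the traceback paths, is to ask Python itself. This is a minimal sketch: `describe_install` is a hypothetical helper, and the names `muscle3` (distribution) and `libmuscle` (module) are taken from the PyPI package and the paths in the backtrace, so treat them as assumptions for other setups.

```python
import importlib.util
from importlib import metadata
from typing import Optional, Tuple


def describe_install(
        dist_name: str, module_name: str
        ) -> Tuple[Optional[str], Optional[str]]:
    """Return (installed version, module file location); None where not found."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None

    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    location = spec.origin if spec is not None else None
    return version, location


if __name__ == "__main__":
    # Assumed names: distribution "muscle3", top-level module "libmuscle".
    print(describe_install("muscle3", "libmuscle"))
```

A location inside a virtual environment's site-packages directory versus one inside the loaded module's directory shows which of the two is in use.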

I'm planning to make a new release this month, which would have had this problem in it if you hadn't reported it, and then it would have shown up again. So thanks for the report! I'll go fix this before the new version goes out.

@Rick-Fusion
Author

Hi Lourens,

Good to know for my understanding. Glad I could help!
