If I simulate being unable to allocate memory on the device, both for data and for streams, I get the following stack:
#5 0x00007ffff7e57995 in parsec_list_destruct (list=0x7ffff7fbd2a0 <parsec_per_stream_infos+64>)
at /home/bosilca/unstable/parsec/parsec/parsec/class/parsec_list.c:45
#6 0x00007ffff7e5bdaa in parsec_obj_run_destructors (object=0x7ffff7fbd2a0 <parsec_per_stream_infos+64>)
at /home/bosilca/unstable/parsec/parsec/parsec/class/parsec_object.h:446
#7 0x00007ffff7e5c102 in parsec_info_destructor (obj=0x7ffff7fbd260 <parsec_per_stream_infos>)
at /home/bosilca/unstable/parsec/parsec/parsec/class/info.c:34
#8 0x00007ffff7eb0ceb in parsec_obj_run_destructors (object=0x7ffff7fbd260 <parsec_per_stream_infos>)
at /home/bosilca/unstable/parsec/parsec/parsec/class/parsec_object.h:446
#9 0x00007ffff7eb35bd in parsec_mca_device_fini () at /home/bosilca/unstable/parsec/parsec/parsec/mca/device/device.c:572
#10 0x00007ffff7e764d0 in parsec_fini (pcontext=0x7fffffff49a0) at /home/bosilca/unstable/parsec/parsec/parsec/parsec.c:1235
#11 0x000000000040374f in main (argc=1, argv=0x7fffffff4b38)
at /home/bosilca/unstable/parsec/parsec/tests/dsl/dtd/dtd_test_allreduce.c:237
The issue seems to occur while releasing parsec_per_stream_infos, because some infos are still registered in it. The CUDA code itself seems to behave really well: the devices that fail to allocate memory are removed, and the execution proceeds without them.
Originally posted by @bosilca in #630 (comment)
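To make the suspected failure mode concrete, here is a minimal, self-contained sketch of how an info could stay registered when a device is dropped during initialization. It is only an illustration under assumptions: every `toy_*` name is hypothetical, none of this is PaRSEC code, and the actual reason infos remain in parsec_per_stream_infos has not been confirmed. The assumed mechanism is that a device registers an info, fails to allocate memory, is removed, and therefore never runs the teardown path that would unregister it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_INFOS 8

/* Hypothetical stand-in for an info registry; NOT the PaRSEC data structure. */
typedef struct {
    const char *names[TOY_MAX_INFOS];
    size_t      count;
} toy_registry_t;

static void toy_registry_init(toy_registry_t *r)
{
    assert(NULL != r);
    memset(r, 0, sizeof(*r));
}

/* Register an info in the first free slot; returns the slot, or -1 if full. */
static int toy_register(toy_registry_t *r, const char *name)
{
    for (int i = 0; i < TOY_MAX_INFOS; i++) {
        if (NULL == r->names[i]) {
            r->names[i] = name;
            r->count++;
            return i;
        }
    }
    return -1;
}

static void toy_unregister(toy_registry_t *r, int slot)
{
    if (slot >= 0 && slot < TOY_MAX_INFOS && NULL != r->names[slot]) {
        r->names[slot] = NULL;
        r->count--;
    }
}

/* Number of infos still registered; a strict destructor would expect 0. */
static size_t toy_registry_fini(const toy_registry_t *r)
{
    return r->count;
}

/* Hypothetical device that registers one info during initialization. */
typedef struct {
    int info_slot;
    int enabled;
} toy_device_t;

/* alloc_fails simulates the failed device allocation; cleanup_on_error
 * selects whether the error path unregisters the info before returning. */
static int toy_device_init(toy_device_t *d, toy_registry_t *r,
                           int alloc_fails, int cleanup_on_error)
{
    d->info_slot = toy_register(r, "toy::stream::info");
    d->enabled = 0;
    if (alloc_fails) {
        if (cleanup_on_error) {
            toy_unregister(r, d->info_slot);
            d->info_slot = -1;
        }
        return -1;  /* the device is removed; execution continues without it */
    }
    d->enabled = 1;
    return 0;
}

/* Normal teardown; a device removed during init never gets past the check. */
static void toy_device_fini(toy_device_t *d, toy_registry_t *r)
{
    if (!d->enabled) return;
    toy_unregister(r, d->info_slot);
    d->enabled = 0;
}
```

In this sketch, a failed `toy_device_init` without cleanup leaves `toy_registry_fini` reporting one remaining entry, while the same failure with cleanup on the error path reports zero. If the real problem has this shape, the fix would be either to unregister on the error path or to make the registry destructor tolerate leftover entries.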