error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer") #11

Open
Socob opened this issue Jul 11, 2023 · 4 comments


@Socob

Socob commented Jul 11, 2023

I'm not sure whether this is new in recent Julia versions, but I get the following error when Julia exits after using SlurmClusterManager:

error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
ijl_error at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/rtutils.c:41
ijl_switch at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/task.c:634
try_yieldto at ./task.jl:910
wait at ./task.jl:984
#wait#621 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
_trywait at ./asyncevent.jl:138
wait at ./asyncevent.jl:155 [inlined]
sleep at ./asyncevent.jl:240
#7 at ~/.julia/packages/SlurmClusterManager/R0zin/src/slurmmanager.jl:93
unknown function (ip: 0x14d06861ecb2)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
run_finalizer at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gc.c:417
jl_gc_run_finalizers_in_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gc.c:507
run_finalizers at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gc.c:553
ijl_atexit_hook at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/init.c:299
jl_repl_entrypoint at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/jlapi.c:718
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x401098)

which seems to be due to the calls to wait/sleep in the finalizer defined in launch():

https://github.com/kleinhenz/SlurmClusterManager.jl/blob/0bfcf079889ce3a7f64b9aef1c1cbe3136bf5e44/src/slurmmanager.jl#L89-L94
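
For anyone who wants to try this without a Slurm allocation, here is a minimal sketch of the same pattern. `Handle`, `make_handle` and `HANDLE` are made-up names, not part of SlurmClusterManager. The only ingredient taken from the trace above is a `sleep` call inside a finalizer that is still pending when Julia exits. Going by the trace, this would be expected to print the same "task switch not allowed" message at exit on the affected versions, but that is an expectation, not a verified result.

```julia
# Hypothetical stand-in for an object whose finalizer blocks, as SlurmManager's does.
mutable struct Handle
    delay::Float64
end

function make_handle(delay)
    h = Handle(delay)
    # Same shape as the finalizer in launch(): sleep has to yield to the scheduler.
    finalizer(h) do h
        sleep(h.delay)
    end
    return h
end

# Keep a live reference so the finalizer is still pending when the process exits.
const HANDLE = make_handle(0.1)
```

Saved as a script and run with `julia script.jl`, the message (if it appears) is printed to stderr after the script body has finished.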

See the docstring for finalizer:

help?> finalizer
search: finalizer UndefInitializer finalize

  finalizer(f, x)


  Register a function f(x) to be called when there are no program-accessible references to x, and return x. The type of x must be a mutable
  struct, otherwise the function will throw.

  f must not cause a task switch, which excludes most I/O operations such as println. Using the @async macro (to defer context switching to
  outside of the finalizer) or ccall to directly invoke IO functions in C may be helpful for debugging purposes.
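
The two options the docstring mentions look roughly like this. `Resource` and the two constructor functions are made-up names for illustration. `jl_safe_printf` is the C function the Julia manual uses in its own finalizer example.

```julia
# Hypothetical type used only to have something to attach finalizers to.
mutable struct Resource
    name::String
end

# Option 1: defer anything that may yield into a task that runs after the finalizer returns.
function with_async_finalizer(name)
    r = Resource(name)
    finalizer(r) do r
        @async println("finalizing ", r.name)
    end
    return r
end

# Option 2: print through C, which does not go through Julia's task scheduler.
function with_ccall_finalizer(name)
    r = Resource(name)
    finalizer(r) do r
        ccall(:jl_safe_printf, Cvoid, (Cstring, Cstring), "finalizing %s\n", r.name)
    end
    return r
end
```

With option 1 the deferred work only runs if the scheduler gets another turn, which may not happen for finalizers that run while Julia is exiting.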

The following avoids the error, but I'm not sure it still accomplishes the same thing as the existing code (I haven't had any issues so far):

finalizer(manager) do manager
  @async begin
    wait(manager.srun_proc)
    # need to sleep briefly here to make sure that srun exit is recorded by slurm daemons
    # TODO find a way to wait on the condition directly instead of just sleeping
    sleep(manager.srun_post_exit_sleep)
  end
end

A workaround which doesn’t involve changing the package source code is to call finalize on the SlurmManager at the end of the program.
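
A sketch of that workaround, using a made-up `FakeManager` in place of `SlurmManager` so that it runs without Slurm (`make_manager` and `FINALIZER_CALLS` are likewise hypothetical). The idea is that `finalize` runs the finalizer right away from ordinary code instead of leaving it for exit. With the real package the call would be `finalize(manager)` once the workers are no longer needed.

```julia
# Hypothetical stand-in for SlurmManager; only the blocking finalizer matters here.
mutable struct FakeManager
    post_exit_sleep::Float64
end

# Counts how often the finalizer has been entered.
const FINALIZER_CALLS = Ref(0)

function make_manager(post_exit_sleep)
    m = FakeManager(post_exit_sleep)
    finalizer(m) do m
        FINALIZER_CALLS[] += 1
        sleep(m.post_exit_sleep)  # blocking call, like the one in launch()
    end
    return m
end

manager = make_manager(0.01)

try
    nothing  # placeholder for the distributed work
finally
    # Run the finalizer now, from ordinary code, instead of leaving it for exit.
    finalize(manager)
end
```

Putting the call in a `finally` block means the finalizer still runs early if the work in between throws.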

@john-waczak

I'm currently running into the same problem. The error appears with Julia 1.9.x but not with version 1.7.2 (available on the cluster I'm using via module load julia).

BachoSeven added a commit to BachoSeven/homotopy-continuation that referenced this issue Dec 26, 2023
@BachoSeven

Still getting this on 1.10.0-betaX.

kbtang28 added a commit to kbtang28/SlurmClusterManager.jl that referenced this issue Jul 2, 2024
@GuusAvis

Today I ran into this with Julia 1.9.3.

@DilumAluthge
Member

Does anyone have a minimal reproducible example for this bug?
