Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix eager API thunk retention #258

Merged
merged 4 commits into from
Aug 13, 2021
Merged

Fix eager API thunk retention #258

merged 4 commits into from
Aug 13, 2021

Conversation

jpsamaroo
Copy link
Member

Should fix #248

@jpsamaroo
Copy link
Member Author

One thing to note about this PR is that, while EagerThunk and ThunkID references will now properly preserve their associated Thunk object, there will be a rather large increase in DRefs (distributed references), and so distributed GC traffic will definitely increase and start to become a bottleneck. I expect we'll have to find some way to optimize GC traffic transfers in MemPool to accomodate this.

@jpsamaroo
Copy link
Member Author

While I fix up the tests, @eschnett would you mind testing this locally to make sure it works for you?

@eschnett
Copy link

I'd be happy to! Just let me know when and which branch to test.

I just tested the jps/wavetoy-fix branch. It errors with

     Testing Running tests...
Error in eager listener:
AssertionError: t !== nothing
Stacktrace:
  [1] unwrap_weak_checked
    @ ~/.julia/packages/Dagger/kzwoA/src/thunk.jl:151 [inlined]
  [2] _broadcast_getindex_evalf
    @ ./broadcast.jl:648 [inlined]
  [3] _broadcast_getindex
    @ ./broadcast.jl:621 [inlined]
  [4] #19
    @ ./broadcast.jl:1098 [inlined]
  [5] macro expansion
    @ ./ntuple.jl:74 [inlined]
  [6] ntuple
    @ ./ntuple.jl:69 [inlined]
  [7] copy(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(Dagger.unwrap_weak_checked), Tuple{NTuple{5, Dagger.WeakThunk}}})
    @ Base.Broadcast ./broadcast.jl:1098
  [8] materialize
    @ ./broadcast.jl:883 [inlined]
  [9] (::Dagger.Sch.var"#dominates#34")(target::Dagger.Thunk, t::Dagger.Thunk)
    @ Dagger.Sch ~/.julia/packages/Dagger/kzwoA/src/sch/dynamic.jl:138
 [10] (::Dagger.Sch.var"#32#35"{Dagger.Thunk})(_t::Dagger.Thunk)
    @ Dagger.Sch ~/.julia/packages/Dagger/kzwoA/src/sch/dynamic.jl:138
 [11] _any(f::Dagger.Sch.var"#32#35"{Dagger.Thunk}, itr::Tuple{Dagger.Thunk}, #unused#::Colon)
    @ Base ./reduce.jl:876
 [12] any(f::Function, itr::Tuple{Dagger.Thunk})
    @ Base ./reduce.jl:871
 [13] (::Dagger.Sch.var"#dominates#34")(target::Dagger.Thunk, t::Dagger.Thunk)
    @ Dagger.Sch ~/.julia/packages/Dagger/kzwoA/src/sch/dynamic.jl:138
 [14] _register_future!(ctx::Dagger.Context, state::Dagger.Sch.ComputeState, task::Task, tid::Int64, ::Tuple{Dagger.ThunkFuture, Dagger.Sch.ThunkID})
    @ Dagger.Sch ~/.julia/packages/Dagger/kzwoA/src/sch/dynamic.jl:139
 [15] _add_thunk!(ctx::Dagger.Context, state::Dagger.Sch.ComputeState, task::Task, tid::Int64, ::Tuple{typeof(*), Tuple{Float64, Dagger.Sch.ThunkID}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Dagger.ThunkFuture, MemPool.DRef})
    @ Dagger.Sch ~/.julia/packages/Dagger/kzwoA/src/sch/dynamic.jl:190
 [16] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [17] invokelatest
    @ ./essentials.jl:706 [inlined]
 [18] #23
    @ ~/.julia/packages/Dagger/kzwoA/src/sch/dynamic.jl:69 [inlined]
 [19] lock(f::Dagger.Sch.var"#23#27"{Tuple{typeof(*), Tuple{Float64, Dagger.Sch.ThunkID}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Dagger.ThunkFuture, MemPool.DRef}, typeof(Dagger.Sch._add_thunk!), Dagger.Context, Dagger.Sch.ComputeState, Task}, l::ReentrantLock)
    @ Base ./lock.jl:187
 [20] macro expansion
    @ ~/.julia/packages/Dagger/kzwoA/src/sch/dynamic.jl:68 [inlined]
 [21] (::Dagger.Sch.var"#22#26"{Dagger.Context, Dagger.Sch.ComputeState, Task, Distributed.RemoteChannel{Channel{Any}}, Distributed.RemoteChannel{Channel{Any}}})()
    @ Dagger.Sch ./task.jl:411
Stacktrace:
 [1] exec!(::Function, ::Dagger.Sch.SchedulerHandle, ::Function, ::Vararg{Any, N} where N)
   @ Dagger.Sch ~/.julia/packages/Dagger/kzwoA/src/sch/dynamic.jl:112
 [2] add_thunk!(::Function, ::Dagger.Sch.SchedulerHandle, ::Float64, ::Vararg{Any, N} where N; future::Dagger.ThunkFuture, ref::MemPool.DRef, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Dagger.Sch ~/.julia/packages/Dagger/kzwoA/src/sch/dynamic.jl:177
 [3] eager_thunk()
   @ Dagger.Sch ~/.julia/packages/Dagger/kzwoA/src/sch/eager.jl:81
 [4] macro expansion
   @ ~/.julia/packages/Dagger/kzwoA/src/processor.jl:154 [inlined]
 [5] (::Dagger.var"#51#52"{typeof(Dagger.Sch.eager_thunk), Tuple{}, NamedTuple{(:sch_uid, :sch_handle, :processor, :utilization), Tuple{UInt64, Dagger.Sch.SchedulerHandle, Dagger.ThreadProc, UInt64}}})()
   @ Dagger ./threadingconstructs.jl:169

@jpsamaroo
Copy link
Member Author

Crap, okay, I had it working at one point, but must have messed it back up along the way. Thankfully I can reproduce the failure locally.

@jpsamaroo
Copy link
Member Author

Actually, no, this failure is (probably) a red herring. Background: dominates is a function that walks the whole DAG upstream (along the inputs) of a given task, and wants to verify that we don't encounter our own task during the process, so that we can safely wait on the task of interest; if we encounter ourselves, then we would have a cycle in the DAG, and that is illegal (and would cause an infinite hang).

In this case, we can just ignore any tasks that have expired (currently those caught by the assertion), since if our own task is currently executing, it will not have expired; I wouldn't be surprised if later on I find that I'm wrong about this assumption, but I think it's a fair compromise for now.

@jpsamaroo
Copy link
Member Author

jpsamaroo commented Aug 13, 2021

@eschnett WaveToyDagger works locally for me with the current changes, and with CI passing, I'm ready to merge this whenever. If you want to test one last time, please go ahead and let me know if it's working for you!

Also, if you end up registering this or a similar package, I'd be happy to include it as part of the tests so we don't break your code again!

@eschnett
Copy link

Yes, I confirm it works.

@jpsamaroo jpsamaroo merged commit 5a6c939 into master Aug 13, 2021
@jpsamaroo jpsamaroo deleted the jps/wavetoy-fix branch August 13, 2021 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Error in eager listener: KeyError: key 578 not found"
2 participants