Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Root the function object in jlcall #14301

Merged
merged 2 commits into from
Dec 11, 2015
Merged

Root the function object in jlcall #14301

merged 2 commits into from
Dec 11, 2015

Conversation

yuyichao
Copy link
Contributor

@yuyichao yuyichao commented Dec 7, 2015

It took me a while to figure out but this is what causing a segfault for the following code with threading in #14190 .

f is a global function but has a closure specialization. It's address is inlined by the caller but is then replaced by a new method, causing a segfault when calling the old function when it's trying to access the closure variables. In some sense this is a consequence of #265 since the method is otherwise guaranteed to be rooted in the method table. This might make the gc root list bigger but until #265 is fixed, this is the only fix I can come up with.

using Base.Threads

ex = quote
    for j in 1:10
        println(j)
        @threads all for i in 1:10
            global f() = i
            gc(false)
            f()
        end
    end
end

for k in 1:10000
    eval(ex)
end

@yuyichao yuyichao added compiler:codegen Generation of LLVM IR and native code GC Garbage collector backport pending 0.4 labels Dec 7, 2015
@yuyichao
Copy link
Contributor Author

yuyichao commented Dec 7, 2015

Huh?...... Travis/pr lin32 has a strange ENOSPC failure.....

Exception running test file :
On worker 2:
LoadError: start_watching (FileMonitor): no space left on device (ENOSPC)
 [inlined code] from stream.jl:1024
 in start_watching at poll.jl:295
 in wait at poll.jl:355
 in test_monitor_wait at /tmp/julia/share/julia/test/file.jl:162
 [inlined code] from essentials.jl:114
 in include_string at loading.jl:355
 in include_from_node1 at ./loading.jl:395
 [inlined code] from util.jl:179
 in runtests at /tmp/julia/share/julia/test/testdefs.jl:7
 in anonymous at /tmp/julia/share/julia/test/runtests.jl:36
 in anonymous at multi.jl:1005
 in run_work_thunk at multi.jl:709
 [inlined code] from multi.jl:1005
 in anonymous at task.jl:63
while loading /tmp/julia/share/julia/test/file.jl, in expression starting on line 182
ERROR: LoadError: Some tests exited with errors.
 in anonymous at /tmp/julia/share/julia/test/runtests.jl:64
 in cd at file.jl:47
 in include at ./boot.jl:260
 in include_from_node1 at ./loading.jl:392
 in process_options at ./client.jl:277
 in _start at ./client.jl:377
while loading /tmp/julia/share/julia/test/runtests.jl, in expression starting on line 13

@tkelman
Copy link
Contributor

tkelman commented Dec 7, 2015

probably hitting a disk quota due to the new caching stuff. still iterating on it a bit.

@yuyichao
Copy link
Contributor Author

yuyichao commented Dec 7, 2015

Added a non-threading repro as test...

@yuyichao
Copy link
Contributor Author

yuyichao commented Dec 8, 2015

Comment on this one?

@yuyichao yuyichao force-pushed the yyc/gc/jlcall-root branch 3 times, most recently from 7440563 to dce3d20 Compare December 10, 2015 00:50
@yuyichao
Copy link
Contributor Author

The additional root shouldn't be too bad since both the array and the function are likely old anyway. From the sysimg size it also looks OK (sys.so increases by ~5kB).

I'll merge on Friday if no one has a better solution.

@JeffBezanson
Copy link
Member

I do have a slightly better solution, but on jb/functions :)

@yuyichao
Copy link
Contributor Author

I do have a slightly better solution, but on jb/functions :)

Cool. Feel free to close this if jb/functions is ready before Friday. ;-P

Actually what's the better solution? Fixing #265?

yuyichao added a commit that referenced this pull request Dec 11, 2015
@yuyichao yuyichao merged commit 88435e8 into master Dec 11, 2015
@yuyichao yuyichao deleted the yyc/gc/jlcall-root branch December 11, 2015 12:55
yuyichao added a commit that referenced this pull request Jan 9, 2016
(cherry picked from commit c1005bf)
ref #14301
@tkelman
Copy link
Contributor

tkelman commented Jan 12, 2016

This causes segfaults in jl_restore_incremental in a bunch of packages when backported to release-0.4, so removing backport pending label unless you can come up with a better idea.

@yuyichao
Copy link
Contributor Author

@tkelman I couldn't reproduce the segfault in FixedPointNumbers but I could with the Cbc tests or using JuMP. The segfault happens on the tag of a singleton value (Base.DotMulFun() in this case) being 0x20. @vtjnash pointed out that this should be fixed recently by the commit a1e98be on jb/functions. I'm not exactly sure about the detail but the commit seems to apply on 0.4 cleanly and fixes the symptom locally.

Add back backport label. Can wait until next release. The commit linked above is probably a good candidate too.

c.c. @JeffBezanson

@JeffBezanson
Copy link
Member

Yes, that commit should definitely be applied to master and 0.4; I tried to keep it isolated for that purpose.

@tkelman
Copy link
Contributor

tkelman commented Jan 12, 2016

The FixedPointNumbers failure was a terminate timeout, that package's tests take forever if inlining is off.

@yuyichao
Copy link
Contributor Author

I tried to keep it isolated for that purpose.

That was very helpful! Might worth adding tests if there's any simpler and deterministic repro.

The FixedPointNumbers failure was a terminate timeout.

Ahh, that makes sense then. I was also wondering why the backtrace is different.....

tkelman pushed a commit that referenced this pull request Mar 7, 2016
(cherry picked from commit c1005bf)
ref #14301
@coveralls
Copy link

Coverage Status

Changes Unknown when pulling f95ffcd on yyc/gc/jlcall-root into ** on master**.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code GC Garbage collector
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants