New precompilation crashes on Julia 1.11-rc1 #55147
Comments
I don't see any segmentation fault in the error you shared.

Wasn't sure what the "Illegal instruction" is. Updated the description to just say "Error".

It means that the processor was asked to execute instructions it doesn't support ("illegal"). Think, for example, of trying to execute AVX-512 instructions on an AVX/AVX2-only processor (maybe because you compiled the program on a different machine, with a larger instruction set than the current one): it would have no clue what you're talking about.
I see. So, guess it's a bug then? I see it on macOS M1 and also on GitHub Actions with ubuntu-latest: https://github.com/MilesCranmer/SymbolicRegression.jl/actions/runs/9947224797/job/27479456770?pr=326#step:6:640. This one gets the
Here's my git bisect log:
(Most of the skips are due to hanging precompilation.)
Only 100 revisions left in the git bisect, but I unfortunately need to run. The bisect log is above if someone wants to start where I left off. (My computer seems pretty slow at compilation, unfortunately.)
Executing unreachable code triggers a trap, which on most architectures just becomes a SIGILL, so it might be this.
It's also a good idea to test an assertions build first. |
Just tested an assertions build; no extra info. Is there a way I can see which Julia CI builds succeeded for each commit? It seems like I have to skip a lot of commits, which is making this bisect take much longer than expected.
I see a commit
It seems like on all of the commits before that I hit infinite precompilation... so I'm not sure I will be able to bisect this further.
It's really obnoxious, but you can cherry-pick that commit onto previous commits in your script.
I just realised the SIGTRAP was what you said the error was likely coming from. So I'll try another bisection, treating infinite precompilation == bad. And if that doesn't work, I'll do the cherry-pick approach.
Ok, found it! #51000 is the cause.
More clues:
Can we add this to the 1.11 milestone?
I redid the bisect because the PR you mentioned looked harmless, and I think it wouldn't cause an unreachable-reached error but a GC error.
Weird. Maybe one of the commits was accidentally marked good/bad(?), since it is hard to know whether precompilation is truly hanging or not. Does 231ca24 make sense as the cause to you?
Yep. I just tried on top of release-1.11, and reverting it does make it not crash.
Cool. So I guess it's the addition of
Isn't this a bug in DispatchDoctor?
DispatchDoctor doesn't operate on generated functions; see https://github.com/MilesCranmer/DispatchDoctor.jl?tab=readme-ov-file#-special-cases
Are we sure that not all generated functions are using
@KristofferC I mean, of course I agree with the general sentiment, but some of the code that's crashing while precompiling involves
@aviatesk I still see the bug. Can you provide details on your system? And to confirm, you saw the
Yes, I am seeing exactly the same segfault, but the patch I applied yesterday only seemed to eliminate the segfault due to the order of precompilation (specifically, changes in Preferences significantly alter DD's behavior, but preference changes do not necessarily trigger re-precompilation). If I clear the precompilation cache manually, the same error occurs even with both patches applied.
Initially, I thought it was caused by the impurity of the
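For readers following along, a quick sketch of what `Base.promote_op` (the query DispatchDoctor wraps) does; the assertions below assume standard Base behavior:

```julia
# Base.promote_op asks type inference for f's return type without calling f.
@assert Base.promote_op(+, Int, Float64) == Float64
@assert Base.promote_op(sqrt, Int) == Float64

# When no method applies, inference returns Union{}, so errors can propagate:
struct NoAddMethods end
@assert Base.promote_op(+, NoAddMethods, NoAddMethods) == Union{}
```

Because the answer comes from inference rather than execution, it is an approximation, which is the crux of the discussion below.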
This particular issue may not be related to

```diff
diff --git a/src/stabilization.jl b/src/stabilization.jl
index fdfb60f..310fe7d 100644
--- a/src/stabilization.jl
+++ b/src/stabilization.jl
@@ -115,7 +115,7 @@ function _stabilize_all(ex::Expr, downward_metadata::DownwardMetadata; kws...)
     @assert length(upward_metadata.unused_macros) == length(upward_metadata.macro_keys)
     new_ex = Expr(:macrocall, upward_metadata.unused_macros[end]..., inner_ex)
-    new_upward_metadata = UpwardMetadata(;
+    new_upward_metadata = UpwardMetadata(;
         matching_function = upward_metadata.matching_function,
         unused_macros = upward_metadata.unused_macros[1:end-1],
         macro_keys = upward_metadata.macro_keys[1:end-1],
diff --git a/src/utils.jl b/src/utils.jl
index 197d0be..66fb235 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -97,7 +97,7 @@ specializing_typeof(::Type{T}) where {T} = Type{T}
 specializing_typeof(::Val{T}) where {T} = Val{T}
 map_specializing_typeof(args...) = map(specializing_typeof, args)
-_promote_op(f, S::Type...) = Base.promote_op(f, S...)
+_promote_op(f, S::Type...) = Base.infer_return_type(f, S)
 _promote_op(f, S::Tuple) = _promote_op(f, S...)
 @static if isdefined(Core, :kwcall)
 function _promote_op(
@@ -120,7 +120,7 @@ return false for `Union{}`, so that errors can propagate.
 # so we implement a workaround.
 @inline type_instability(::Type{Type{T}}) where {T} = type_instability(T)
-@generated function type_instability_limit_unions(
+function type_instability_limit_unions(
     ::Type{T}, ::Val{union_limit}
 ) where {T,union_limit}
     if T isa UnionAll
```

While we ideally need to improve the reliability of
Thanks for all of this, I appreciate it. So, looking into this alternative, one issue with

```julia
julia> using DispatchDoctor # With the patch in https://github.com/JuliaLang/julia/issues/55147#issuecomment-2262472199

julia> @stable f(x) = x
f (generic function with 1 method)

julia> @code_llvm f(1)
; Function Signature: f(Int64)
;  @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:301 within `f`
define i64 @julia_f_6223(i64 signext %"x::Int64") #0 {
top:
  %jlcallframe1 = alloca [6 x ptr], align 8
  %gcframe2 = alloca [8 x ptr], align 16
  call void @llvm.memset.p0.i64(ptr align 16 %gcframe2, i8 0, i64 64, i1 true)
  %0 = getelementptr inbounds ptr, ptr %gcframe2, i64 2
  %"new::OptimizationParams" = alloca { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, align 8
  %1 = alloca { i64, { ptr, [1 x i64] }, ptr, { i64, i64, i64, i64, i64, i8, i8, i8, i8, i8 }, { i8, i64, i64, i64, i64, i64, i8, i8, i8 } }, align 8
  %pgcstack = call ptr inttoptr (i64 6490275500 to ptr)(i64 261) #5
  store i64 24, ptr %gcframe2, align 16
  %task.gcstack = load ptr, ptr %pgcstack, align 8
  %frame.prev = getelementptr inbounds ptr, ptr %gcframe2, i64 1
  store ptr %task.gcstack, ptr %frame.prev, align 8
  store ptr %gcframe2, ptr %pgcstack, align 8
; ┌ @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/utils.jl:101 within `_promote_op` @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/utils.jl:100
; │┌ @ reflection.jl:1872 within `infer_return_type`
; ││┌ @ reflection.jl:2586 within `get_world_counter`
  %2 = call i64 @jlplt_ijl_get_world_counter_6226_got.jit()
; ││└
; ││┌ @ compiler/types.jl:373 within `NativeInterpreter`
; │││┌ @ compiler/types.jl:321 within `OptimizationParams`
; ││││┌ @ compiler/utilities.jl:506 within `inlining_enabled`
; │││││┌ @ options.jl:68 within `JLOptions`
; ││││││┌ @ pointer.jl:153 within `unsafe_load` @ pointer.jl:153
  %pointerref.sroa.1.0.copyload = load i8, ptr getelementptr inbounds (i8, ptr @jl_options.found.jit, i64 110), align 2
; │││││└└
; │││││┌ @ promotion.jl:483 within `==` @ promotion.jl:639
  %3 = icmp eq i8 %pointerref.sroa.1.0.copyload, 1
; ││││└└
; ││││ @ compiler/types.jl:321 within `OptimizationParams` @ compiler/types.jl:321
; ││││┌ @ compiler/types.jl:341 within `#OptimizationParams#314`
; │││││┌ @ compiler/types.jl:309 within `OptimizationParams`
  %4 = zext i1 %3 to i8
  store i8 %4, ptr %"new::OptimizationParams", align 8
  %5 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 1
  store <2 x i64> <i64 100, i64 1000>, ptr %5, align 8
  %6 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 3
  store <2 x i64> <i64 250, i64 20>, ptr %6, align 8
  %7 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 5
  store i64 32, ptr %7, align 8
  %8 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 6
  store i8 1, ptr %8, align 8
  %9 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 7
  store i8 0, ptr %9, align 1
  %10 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 8
  store i8 0, ptr %10, align 2
; │││└└└
  call void @"j_#NativeInterpreter#315_6236"(ptr noalias nocapture noundef nonnull sret({ i64, { ptr, [1 x i64] }, ptr, { i64, i64, i64, i64, i64, i8, i8, i8, i8, i8 }, { i8, i64, i64, i64, i64, i64, i8, i8, i8 } }) %1, ptr noalias nocapture noundef nonnull %0, ptr nocapture nonnull readonly @"_j_const#10", ptr nocapture nonnull readonly %"new::OptimizationParams", i64 zeroext %2)
; ││└
  %11 = call nonnull ptr @"j_#infer_return_type#34_6238"(i64 zeroext %2, ptr nocapture nonnull readonly %1, ptr nonnull readonly @"jl_global#6239.jit", ptr nonnull readonly @"jl_global#6240.jit")
; └└
; ... (truncated)
```

Whereas with the current unpatched use of

```julia
julia> @code_llvm f(1)
;  @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:301 within `f`
define i64 @julia_f_1198(i64 signext %0) #0 {
top:
;  @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:306 within `f`
  ret i64 %0
}
```

Ideally I would like for DispatchDoctor to use whatever list comprehensions and

Also, let me know if I can do anything to help with debugging.
Well, this is exactly why

List comprehensions and
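To make the discussion concrete, this is the internal query everything above ultimately routes through (it is internal API, so this is a sketch of observed behavior, not a guaranteed interface):

```julia
# Core.Compiler.return_type is the raw inference query behind promote_op;
# its answer is "whatever inference proves", not a language guarantee.
@assert Core.Compiler.return_type(+, Tuple{Int,Int}) == Int

# An unstable call widens to a Union (or Any) rather than erroring:
unstable(x) = x > 0 ? 1 : "neg"
@assert Core.Compiler.return_type(unstable, Tuple{Int}) == Union{Int,String}
```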
Here's an example of what I mean with

```julia
julia> function fake_promote_op(f::F, args...) where {F}
           x = map(_ -> f(args...), 1:0)
           return eltype(x)
       end
```

Note that this never actually calls

However, this is completely type stable, known at compile time, and is correct:

```julia
julia> fake_promote_op(+, 1.0)
Float64

julia> @code_typed fake_promote_op(+, 1.0)
CodeInfo(
1 ─     return Float64
) => Type{Float64}

julia> @code_llvm fake_promote_op(+, 1.0)
;  @ REPL[11]:1 within `fake_promote_op`
define nonnull {}* @julia_fake_promote_op_1386(double %0) #0 {
top:
;  @ REPL[11]:3 within `fake_promote_op`
  ret {}* inttoptr (i64 4823715024 to {}*)
}
```

Should I use this
If

I think this is also true more broadly, rather than just DD. I would assume that empty collections generated by

It's even more prevalent because the length of a collection could be based on the value rather than the type, meaning that
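A small illustration of the point above: the element type of an empty comprehension is supplied by inference, not by any runtime value, so ordinary user code depends on `return_type`-style queries without knowing it (a sketch of the general behavior):

```julia
# The range 1:0 is empty, so the body never runs, yet the result's eltype
# is whatever inference computes for the body:
xs = [x / 2 for x in 1:0]
@assert isempty(xs)
@assert eltype(xs) == Float64

# With a non-empty range, the eltype comes out the same way:
ys = [x / 2 for x in 1:3]
@assert eltype(ys) == Float64
```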
Please recognize that the risks inherent in list comprehensions and

Just to be sure, I confirmed that the issue is not with

```diff
diff --git a/src/stabilization.jl b/src/stabilization.jl
index fdfb60f..310fe7d 100644
--- a/src/stabilization.jl
+++ b/src/stabilization.jl
@@ -115,7 +115,7 @@ function _stabilize_all(ex::Expr, downward_metadata::DownwardMetadata; kws...)
     @assert length(upward_metadata.unused_macros) == length(upward_metadata.macro_keys)
     new_ex = Expr(:macrocall, upward_metadata.unused_macros[end]..., inner_ex)
-    new_upward_metadata = UpwardMetadata(;
+    new_upward_metadata = UpwardMetadata(;
         matching_function = upward_metadata.matching_function,
         unused_macros = upward_metadata.unused_macros[1:end-1],
         macro_keys = upward_metadata.macro_keys[1:end-1],
diff --git a/src/utils.jl b/src/utils.jl
index 197d0be..19d2b7b 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -97,7 +97,35 @@ specializing_typeof(::Type{T}) where {T} = Type{T}
 specializing_typeof(::Val{T}) where {T} = Val{T}
 map_specializing_typeof(args...) = map(specializing_typeof, args)
-_promote_op(f, S::Type...) = Base.promote_op(f, S...)
+# `promote_op` without any usage of `any`
+let ex = Expr(:block)
+    function gen_case_n(n)
+        blk = Expr(:block)
+        ret = :(Tuple{})
+        for i = 1:n
+            push!(blk.args, :(args[$i] === Union{} && return Union{}))
+            push!(ret.args, :(args[$i]))
+        end
+        return :(if length(args) == $n
+            $blk
+            return $ret
+        end)
+    end
+    for i = 0:15
+        push!(ex.args, gen_case_n(i))
+    end
+    @eval function TupleOrBottom_without_any(args...)
+        $ex
+    end
+end
+
+function promote_op_without_any(f, S::Type...)
+    argT = TupleOrBottom_without_any(S...)
+    argT === Union{} && return Union{}
+    return Core.Compiler.return_type(f, argT)
+end
+
+_promote_op(f, S::Type...) = promote_op_without_any(f, S...)
 _promote_op(f, S::Tuple) = _promote_op(f, S...)
 @static if isdefined(Core, :kwcall)
 function _promote_op(
@@ -120,7 +148,7 @@ return false for `Union{}`, so that errors can propagate.
 # so we implement a workaround.
 @inline type_instability(::Type{Type{T}}) where {T} = type_instability(T)
-@generated function type_instability_limit_unions(
+function type_instability_limit_unions(
     ::Type{T}, ::Val{union_limit}
 ) where {T,union_limit}
     if T isa UnionAll
```
What assumptions do you mean? Is it that DD dispatches on the return value of

For example, will the following code potentially lead to a segfault?

```julia
g() = [i for i in 1:0]
h() = g() isa Vector{Int}

@generated function foo()
    if h()
        return :(1)
    else
        return :(2)
    end
end
```
I don't know the exact mechanism causing the segfault in DD.
Consider that I could just rewrite DispatchDoctor to use

```julia
x = []
T = eltype(map(_ -> f(args...), x))
# equivalent to: T = Base.promote_op(f, typeof.(args)...)
```

I'm not doing anything fancy here; this type of code could be found in anybody's library. No Julia internals. Yet this code implicitly is using

This type of code is even used within the Julia standard library: julia/stdlib/LinearAlgebra/src/dense.jl, line 344 in df1976d, among others.

If it is a bad assumption that any use of

Not to mention there are 1,400+ files that make explicit use of that symbol on GitHub: https://github.com/search?q=/Core.Compiler.return_type/+language:julia&type=code. I've already added a workaround for the main issue on 1.11.0, but it seems like it could be causing the other similar issues like #53847, #53843, #53848, #53761, ..., which might be due to a similar phenomenon. This is certainly the one whose cause we've drilled into, but maybe the others are caused by the same Julia bug.
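A runnable version of the `x = []` sketch in the comment above (the helper name `fake_T` is hypothetical, for illustration), confirming the empty-`map` trick agrees with `Base.promote_op`:

```julia
# Infer a return type by mapping over an empty range; f is never called,
# so the eltype can only come from inference.
fake_T(f, args...) = eltype(map(_ -> f(args...), 1:0))

@assert fake_T(+, 1.0, 2.0) == Float64
@assert fake_T(+, 1.0, 2.0) == Base.promote_op(+, Float64, Float64)
```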
I guess this goes back to your earlier comment:
I think "unreliability is minimal" is pulling a lot of weight here. I think any unreliability should be fixed at the source, rather than shooting the messenger. The comment
DD's usage feels like it's actually a very minimal change in behavior, no? DD just turns logging on/off based on the result of return-type inference. The usage of

It's almost like the phenomenon caused by DD here is an "rr chaos mode" of the type-inference system. If a bug shows up in rr chaos mode, it's still a bug. Similarly, if a segfault shows up from DD's usage of promote_op, it's still a segfault that should be patched in Base.
Looking closer into this, it appears this segfaults because the DynamicExpressions module defined the return type of the simulator of

I haven't been able to make an MWE that simulates this, however.
In extreme cases, the compiler could mark this function for concrete-eval, even though that is illegal unless the compiler has first deleted this instruction. Otherwise the attempt to concrete-eval will re-run the function repeatedly until it hits a StackOverflow. Workaround to fix #55147
Very interesting, thanks for the note. Maybe a MWE of this inference stack overflow would be the following? foo() = [foo() for _ in 1:0] Or would this not replicate the issue? (On mobile so can’t check) |
In extreme cases, the compiler could mark this function for concrete-eval, even though that is illegal unless the compiler has first deleted this instruction. Otherwise the attempt to concrete-eval will re-run the function repeatedly until it hits a StackOverflow. Workaround to fix #55147

@aviatesk You might know how to solve this even better, using post-optimization effect refinements? Since we should actually only apply the refinement of terminates=false => terminates=true (and thus allow concrete eval) if the optimization occurs, and not just if inference thinks the optimization would be legal.

Co-authored-by: Shuhei Kadowaki <[email protected]>
I'm seeing some precompilation crashes on Julia 1.11-rc1 when precompiling DynamicExpressions.jl with DispatchDoctor in use on the package. (DispatchDoctor.jl is basically a package that calls `promote_op` on each function and uses that to flag type instabilities.) Here is the traceback:

versioninfo:

I installed Julia with juliaup. To reproduce this issue, you can run the following code:

I can prevent this error with the following PR on DispatchDoctor.jl: MilesCranmer/DispatchDoctor.jl@094b165~...b223a4d. The PR basically amounts to changing some functions into `@generated` form:

However, it doesn't seem like DispatchDoctor.jl or DynamicExpressions.jl is doing anything wrong, so I'm not sure what's going on. Both before and after seem to be valid Julia code. Also, the downside of that PR is that it introduces a type instability in Zygote autodiff, and there doesn't seem to be a way around it that both prevents the segfault and eliminates the type instability.

I don't understand the conditions for reproducing this, so this is so far my only example. When I make various tweaks to `_promote_op` within DispatchDoctor.jl, I seem to end up with different segfaults, one of which is the `Unreachable reached` bug.

cc @avik-pal