New precompilation crashes on Julia 1.11-rc1 #55147

Closed
MilesCranmer opened this issue Jul 16, 2024 · 68 comments · Fixed by #55338
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@MilesCranmer
Member

MilesCranmer commented Jul 16, 2024

I'm seeing some precompilation crashes on Julia 1.11-rc1 when precompiling DynamicExpressions.jl with DispatchDoctor.jl enabled on the package. (DispatchDoctor.jl is basically a package that calls promote_op on each function and uses the inferred return type to flag type instabilities.)
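To make that mechanism concrete, here is a minimal sketch of a promote_op-based instability check. It assumes a simplified rule in which any non-concrete inferred return type counts as unstable. The helper names inferred_return_type and is_type_stable are hypothetical and are not DispatchDoctor.jl's API. Base.promote_op itself is internal to Base, and its answer depends on what type inference can prove, so treat this as illustrative only.

```julia
# Hypothetical helpers for illustration; NOT DispatchDoctor.jl's API.

# Ask Julia's type inference what `f` returns for the given argument types.
# `Base.promote_op` is internal to Base and its answer depends on inference.
inferred_return_type(f, argtypes::Type...) = Base.promote_op(f, argtypes...)

# Simplified rule (an assumption): any non-concrete inferred return type,
# such as a small Union or an abstract type, counts as a type instability.
is_type_stable(f, argtypes::Type...) = isconcretetype(inferred_return_type(f, argtypes...))

stable(x) = x + 1                      # Int in, Int out
unstable(x) = x > 0 ? x : "negative"   # Int in, Union{Int, String} out

# Warn (to stderr) about every candidate whose inferred return type is not concrete.
for (f, T) in ((stable, Int), (unstable, Int))
    if !is_type_stable(f, T)
        @warn "Possible type instability" f argtype=T inferred=inferred_return_type(f, T)
    end
end
```

Only the unstable case should produce a warning here, because inference cannot narrow its result below a Union of Int and String.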

Here is the traceback:

ERROR: The following 1 direct dependency failed to precompile:

DynamicExpressions 

Failed to precompile DynamicExpressions [a40a106e-89c9-4ca8-8020-a735e8728b6b] to "/Users/mcranmer/.julia/compiled/v1.11/DynamicExpressions/jl_cQE0v5".
[39250] signal 4: Illegal instruction: 4
in expression starting at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/DynamicExpressions.jl:120
_eval_tree_array at /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:301
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/Evaluate.jl:92 [inlined]
#eval_tree_array#2 at /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:306
eval_tree_array at /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:301
#test_all_combinations#1 at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:7
test_all_combinations at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:22 [inlined]
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:168 [inlined]
macro expansion at /Users/mcranmer/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:153 [inlined]
macro expansion at /Users/mcranmer/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
#do_precompilation#2 at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:138
do_precompilation at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:161
unknown function (ip: 0x11570c053)
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/./julia.h:2156 [inlined]
do_call at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_stmt_value at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:174
eval_body at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_eval_module_expr at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:215 [inlined]
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:743
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:952 [inlined]
ijl_toplevel_eval_in at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
_include at ./loading.jl:2603
include at ./Base.jl:558 [inlined]
include_package_for_output at ./loading.jl:2721
jfptr_include_package_for_output_69600.1 at /Users/mcranmer/.julia/juliaup/julia-1.11.0-rc1+0.aarch64.apple.darwin14/lib/julia/sys.dylib (unknown line)
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/./julia.h:2156 [inlined]
do_call at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_stmt_value at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:174
eval_body at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:952 [inlined]
ijl_toplevel_eval_in at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
include_string at ./loading.jl:2553 [inlined]
exec_options at ./client.jl:316
_start at ./client.jl:526
jfptr__start_71098.1 at /Users/mcranmer/.julia/juliaup/julia-1.11.0-rc1+0.aarch64.apple.darwin14/lib/julia/sys.dylib (unknown line)
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/./julia.h:2156 [inlined]
true_main at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/jlapi.c:1059
Allocations: 84690299 (Pool: 84689426; Big: 873); GC: 4

versioninfo:

julia> versioninfo()
Julia Version 1.11.0-rc1
Commit 3a35aec36d1 (2024-06-25 10:23 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m1)
Threads: 6 default, 0 interactive, 3 GC (on 6 virtual cores)
Environment:
  JULIA_FORMATTER_SO = /Users/mcranmer/julia_formatter.so
  JULIA_NUM_THREADS = auto
  JULIA_OPTIMIZE = 3
  JULIA_EDITOR = code

I installed Julia with juliaup. To reproduce this issue, you can run the following code:

cd $(mktemp -d)
# Install package
julia +1.11 --startup-file=no --project=. -e 'using Pkg; pkg"add Preferences DynamicExpressions DispatchDoctor@<version>"'
# Enable DispatchDoctor.jl
julia +1.11 --startup-file=no --project=. -e 'using Preferences; set_preferences!("DynamicExpressions", "instability_check" => "warn")'
# Precompile:
julia +1.11 --startup-file=no --project=. -e 'using Pkg; pkg"precompile"'

I can prevent this error with the following PR to DispatchDoctor.jl: MilesCranmer/DispatchDoctor.jl@094b165~...b223a4d. The PR basically amounts to making map_specializing_typeof take a single Tuple argument and rewriting _promote_op in optionally-@generated form:

- map_specializing_typeof(args...) = map(specializing_typeof, args)
+ map_specializing_typeof(args::Tuple) = map(specializing_typeof, args)
  
- _promote_op(f, S::Type...) = Base.promote_op(f, S...)
- _promote_op(f, S::Tuple) = _promote_op(f, S...)
+ function _promote_op(f, S::Vararg{Type})
+     if @generated
+         :(Base.promote_op(f, S...))
+     else
+         Base.promote_op(f, S...)
+     end
+ end

However, it doesn't seem like DispatchDoctor.jl or DynamicExpressions.jl is doing anything wrong, so I'm not sure what's going on; both the before and after versions seem to be valid Julia code. The downside of that PR is that it introduces a type instability in Zygote autodiff, and there doesn't seem to be a way to both prevent the crash and eliminate the type instability.

I don't yet understand the conditions that trigger this, so this is my only example so far. When I make various tweaks to _promote_op within DispatchDoctor.jl, I seem to end up with different crashes, one of which is the Unreachable reached bug.

cc @avik-pal

@giordano
Contributor

I don't see any segmentation fault in the error you shared.

@MilesCranmer MilesCranmer changed the title New precompilation segfaults on Julia 1.11-rc1 New precompilation errors on Julia 1.11-rc1 Jul 16, 2024
@MilesCranmer
Member Author

MilesCranmer commented Jul 16, 2024

I wasn't sure what "Illegal instruction" means, so I updated the title to just say "errors".

@MilesCranmer MilesCranmer changed the title New precompilation errors on Julia 1.11-rc1 New precompilation crashes on Julia 1.11-rc1 Jul 16, 2024
@giordano
Contributor

It means that the processor was asked to execute instructions it doesn't support ("illegal"). Think, for example, of trying to execute AVX-512 instructions on an AVX/AVX2 processor (maybe because you compiled the program on a different machine with a larger instruction set than the current one): it'd have no clue what you're talking about.

@MilesCranmer
Member Author

I see. So I guess it's a bug, then?

I see it on macOS (M1) and also in GitHub Actions on ubuntu-latest: https://github.com/MilesCranmer/SymbolicRegression.jl/actions/runs/9947224797/job/27479456770?pr=326#step:6:640. That run hits the Unreachable reached at 0x7fd831cd2d85 issue:

ERROR: The following 2 direct dependencies failed to precompile:

DynamicExpressions --code-coverage=@/home/runner/work/SymbolicRegression.jl/SymbolicRegression.jl --color=yes --check-bounds=yes --warn-overwrite=yes --depwarn=yes --inline=yes --startup-file=no --track-allocation=none 

Failed to precompile DynamicExpressions [a40a106e-89c9-4ca8-8020-a735e8728b6b] to "/home/runner/.julia/compiled/v1.11/DynamicExpressions/jl_0E9EhT".
Unreachable reached at 0x7fd831cd2d85

[4169] signal 4 (2): Illegal instruction
in expression starting at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/DynamicExpressions.jl:111
is_bad_array at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
##eval_tree_array_simulator#549#1 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/Evaluate.jl:87 [inlined]
##eval_tree_array_simulator#549 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/Evaluate.jl:66 [inlined]
#eval_tree_array#2 at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:306 [inlined]
eval_tree_array at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
unknown function (ip: 0x7fd831d0322d)
#test_all_combinations#1 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:7
test_all_combinations at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:22 [inlined]
macro expansion at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:169 [inlined]
macro expansion at /home/runner/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
macro expansion at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:154 [inlined]
macro expansion at /home/runner/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
#do_precompilation#2 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:139
do_precompilation at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:162
unknown function (ip: 0x7fd831ccfea2)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
do_call at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_eval_module_expr at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:215 [inlined]
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:743
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
_include at ./loading.jl:2603
include at ./Base.jl:558 [inlined]
include_package_for_output at ./loading.jl:2721
jfptr_include_package_for_output_69232.1 at /opt/hostedtoolcache/julia/1.11.0-rc1/x64/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
do_call at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
include_string at ./loading.jl:2553 [inlined]
exec_options at ./client.jl:316
_start at ./client.jl:526
jfptr__start_70709.1 at /opt/hostedtoolcache/julia/1.11.0-rc1/x64/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
true_main at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x7fd851629d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 84382407 (Pool: 84381692; Big: 715); GC: 32
SymbolicRegression --code-coverage=@/home/runner/work/SymbolicRegression.jl/SymbolicRegression.jl --color=yes --check-bounds=yes --warn-overwrite=yes --depwarn=yes --inline=yes --startup-file=no --track-allocation=none 

Failed to precompile SymbolicRegression [8254be44-1295-4e6a-a16d-46603ac705cb] to "/home/runner/.julia/compiled/v1.11/SymbolicRegression/jl_G4yS0y".
Unreachable reached at 0x7f2162ad2df5

[5165] signal 4 (2): Illegal instruction
in expression starting at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/DynamicExpressions.jl:111
is_bad_array at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
##eval_tree_array_simulator#549#1 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/Evaluate.jl:87 [inlined]
##eval_tree_array_simulator#549 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/Evaluate.jl:66 [inlined]
#eval_tree_array#2 at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:306 [inlined]
eval_tree_array at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
unknown function (ip: 0x7f2162b0322d)
#test_all_combinations#1 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:7
test_all_combinations at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:22 [inlined]
macro expansion at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:169 [inlined]
macro expansion at /home/runner/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
macro expansion at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:154 [inlined]
macro expansion at /home/runner/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
#do_precompilation#2 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:139
do_precompilation at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:162
unknown function (ip: 0x7f2162acfea2)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
do_call at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_eval_module_expr at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:215 [inlined]
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:743
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
_include at ./loading.jl:2603
include at ./Base.jl:558 [inlined]
include_package_for_output at ./loading.jl:2721
jfptr_include_package_for_output_69269.1 at /opt/hostedtoolcache/julia/1.11.0-rc1/x64/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
do_call at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
include_string at ./loading.jl:2553 [inlined]
exec_options at ./client.jl:316
_start at ./client.jl:526
jfptr__start_70709.1 at /opt/hostedtoolcache/julia/1.11.0-rc1/x64/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
true_main at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x7f2182429d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 82059736 (Pool: 82059067; Big: 669); GC: 33
ERROR: LoadError: Failed to precompile DynamicExpressions [a40a106e-89c9-4ca8-8020-a735e8728b6b] to "/home/runner/.julia/compiled/v1.11/DynamicExpressions/jl_YBf5sc".
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool; flags::Cmd, cacheflags::Base.CacheFlags, reasons::Dict{String, Int64})
    @ Base ./loading.jl:3002
  [3] (::Base.var"#1080#1081"{Base.PkgId})()
    @ Base ./loading.jl:2388
  [4] mkpidlock(f::Base.var"#1080#1081"{Base.PkgId}, at::String, pid::Int32; kwopts::@Kwargs{stale_age::Int64, wait::Bool})
    @ FileWatching.Pidfile /opt/hostedtoolcache/julia/1.11.0-rc1/x64/share/julia/stdlib/v1.11/FileWatching/src/pidfile.jl:95
  [5] #mkpidlock#6
    @ /opt/hostedtoolcache/julia/1.11.0-rc1/x64/share/julia/stdlib/v1.11/FileWatching/src/pidfile.jl:90 [inlined]
  [6] trymkpidlock(::Function, ::Vararg{Any}; kwargs::@Kwargs{stale_age::Int64})
    @ FileWatching.Pidfile /opt/hostedtoolcache/julia/1.11.0-rc1/x64/share/julia/stdlib/v1.11/FileWatching/src/pidfile.jl:116
  [7] #invokelatest#2
    @ ./essentials.jl:1045 [inlined]
  [8] invokelatest
    @ ./essentials.jl:1040 [inlined]
  [9] maybe_cachefile_lock(f::Base.var"#1080#1081"{Base.PkgId}, pkg::Base.PkgId, srcpath::String; stale_age::Int64)
    @ Base ./loading.jl:3525
 [10] maybe_cachefile_lock
    @ ./loading.jl:3522 [inlined]
 [11] _require(pkg::Base.PkgId, env::String)
    @ Base ./loading.jl:2384
 [12] __require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:2216
 [13] #invoke_in_world#3
    @ ./essentials.jl:1077 [inlined]
 [14] invoke_in_world
    @ ./essentials.jl:1074 [inlined]
 [15] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:2207
 [16] macro expansion
    @ ./loading.jl:2146 [inlined]
 [17] macro expansion
    @ ./lock.jl:273 [inlined]
 [18] __require(into::Module, mod::Symbol)
    @ Base ./loading.jl:2103
 [19] #invoke_in_world#3
    @ ./essentials.jl:1077 [inlined]
 [20] invoke_in_world
    @ ./essentials.jl:1074 [inlined]
 [21] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:2096
 [22] include
    @ ./Base.jl:558 [inlined]
 [23] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
    @ Base ./loading.jl:2721
 [24] top-level scope
    @ stdin:4
in expression starting at /home/runner/work/SymbolicRegression.jl/SymbolicRegression.jl/src/SymbolicRegression.jl:1
in expression starting at stdin:4

@DilumAluthge DilumAluthge added the bug Indicates an unexpected problem or unintended behavior label Jul 16, 2024
@giordano
Contributor

giordano commented Jul 16, 2024

Yes, it's definitely a bug; the compiler is probably emitting wrong instructions for the current ISA. There are a bunch of similar tickets: #53847, #53843, #53848, #53761, ...

@MilesCranmer
Member Author

MilesCranmer commented Jul 16, 2024

Here's my git bisect log:

git bisect start
# status: waiting for both good and bad commits
# bad: [3a35aec36d13c3e651c97bac664da2e778d591ad] set VERSION to 1.11.0-rc1 (#54924)
git bisect bad 3a35aec36d13c3e651c97bac664da2e778d591ad
# status: waiting for good commit(s), bad commit known
# good: [48d4fd48430af58502699fdf3504b90589df3852] set VERSION to 1.10.4 (#54625)
git bisect good 48d4fd48430af58502699fdf3504b90589df3852
# good: [0ba6ec2d2282937a084d7e5e5a0b026dc953bb31] Restore link to list of packages in Base docs (#50353)
git bisect good 0ba6ec2d2282937a084d7e5e5a0b026dc953bb31
# skip: [e754f2036cbfc37ea24a33d02e86e41a9cf56af9] Add missing type annotation reported by JET (#52207)
git bisect skip e754f2036cbfc37ea24a33d02e86e41a9cf56af9
# skip: [959b474d0516df77a268d9f23ccda5d2ad32acdf] docs: update latest stable version (#52215)
git bisect skip 959b474d0516df77a268d9f23ccda5d2ad32acdf
# bad: [c5d7b87a35b5beaef9d4d3aa53c0a2686f3445b9] Fix variable name in scaling an `AbstractTriangular` with zero alpha (#52855)
git bisect bad c5d7b87a35b5beaef9d4d3aa53c0a2686f3445b9
# bad: [4115c725d25c19a86ce8d3e3a584f02d59a9a9ce] Create rand function for Base.KeySet and Base.ValueIterator{Dict} (#51608)
git bisect bad 4115c725d25c19a86ce8d3e3a584f02d59a9a9ce
# good: [8be469e275a455ca894fdc5fad8a80aafb359544] Separate foreign threads into a :foreign threadpool (#50912)
git bisect good 8be469e275a455ca894fdc5fad8a80aafb359544
# skip: [7d51502d7845246d6a231fdc4cf19451f42427e1] More missing constants from earlier libgit2 versions
git bisect skip 7d51502d7845246d6a231fdc4cf19451f42427e1
# skip: [4c3aaa2b34996708367f9d5e4472fb5a1062bf63] reflection: define `Base.generating_output` utility function (#51216)
git bisect skip 4c3aaa2b34996708367f9d5e4472fb5a1062bf63
# bad: [ca862df7bfc534d22d4d39d265d1f74d59c1ab77] fix `_tryonce_download_from_cache` (busybox.exe download error) (#51531)
git bisect bad ca862df7bfc534d22d4d39d265d1f74d59c1ab77
# skip: [5d82d8095042935be0eb044259098e0d7c695922] add tfuncs for `[and|or]_int` intrinsics (#51266)
git bisect skip 5d82d8095042935be0eb044259098e0d7c695922
# skip: [4e1c965b512967aaa20b77f37e1fe76548b1def7] Remove size(::StructuredMatrix, d) specializations (#51083)
git bisect skip 4e1c965b512967aaa20b77f37e1fe76548b1def7
# skip: [3fc4f6bb243cb623636f276cb143cf5c476bbc59] 🤖 [master] Bump the Downloads stdlib from f97c72f to 8a614d5 (#51246)
git bisect skip 3fc4f6bb243cb623636f276cb143cf5c476bbc59
# skip: [476572f749a035047d4d8e6e76ec5b701b85904e] makefile option to generate better code (#51105)
git bisect skip 476572f749a035047d4d8e6e76ec5b701b85904e
# skip: [8b3ffd8918e53d5241ad948e8500335848d3b602] cross-reference pathof and pkgdir in docstrings (#51298)
git bisect skip 8b3ffd8918e53d5241ad948e8500335848d3b602
# bad: [15f34aa649dbbb34e53ff6d16db15cd11ae4a887] [NFC] rng_split: some elaboration and clarification (#50680)
git bisect bad 15f34aa649dbbb34e53ff6d16db15cd11ae4a887
# skip: [f3d50b7de66b351dfdaa826fa529fefb75a829e1] Fix extended help hint to give full line to enter (#51193)
git bisect skip f3d50b7de66b351dfdaa826fa529fefb75a829e1
# skip: [dcb4060b58797cf64517a694fcab3ea16278cb87] docs: manual: point to `MutableArithmetics` in the Performance tips (#50987)
git bisect skip dcb4060b58797cf64517a694fcab3ea16278cb87
# skip: [70000ac7c3d5d5f21e42555cdf99e699a246f8ec] sysimg: Allow loading a system image that is already present in memory (#51121)
git bisect skip 70000ac7c3d5d5f21e42555cdf99e699a246f8ec
# skip: [7cadc6d70c0a3d2b2c20e50d4b3555475756f785] 🤖 [master] Bump the SHA stdlib from 2d1f84e to aaf2df6 (#51049)
git bisect skip 7cadc6d70c0a3d2b2c20e50d4b3555475756f785
# skip: [39a53168a824a9a223adc6642da31e7a26b6890a] optimize: fix `effect_free` refinement in post-opt dataflow analysis (#51185)
git bisect skip 39a53168a824a9a223adc6642da31e7a26b6890a
# skip: [dd0ce50f389981839d96969b279c3a11e0b4088e] Fix typo in command-line-interface.md (#51055)
git bisect skip dd0ce50f389981839d96969b279c3a11e0b4088e
# skip: [fbf73f44c000ee79d12b7bf1645f076b640fd10c] 🤖 [master] Bump the Pkg stdlib from 047734e4c to f570abd39 (#51186)
git bisect skip fbf73f44c000ee79d12b7bf1645f076b640fd10c
# skip: [b2dfa1db9e4d7b1cd499ba58943df17bc77fe1d8] Deprecate `permute!!` and `invpermute!!` (#51337)
git bisect skip b2dfa1db9e4d7b1cd499ba58943df17bc77fe1d8
# skip: [eab8d6b96b05f7e84103f66a902e4ee7ad395b48] Fix getfield codegen for tuple inputs and unknown symbol fields. (#51234)
git bisect skip eab8d6b96b05f7e84103f66a902e4ee7ad395b48
# skip: [a355403080167056d2af4ccee8eadfffd8fce97f] Annotate fieldnames for default cgparams [NFC]
git bisect skip a355403080167056d2af4ccee8eadfffd8fce97f
# skip: [354c36742eb1c2c4c5bfe454d6d4fe975565de96] Allow SparseArrays to catch `lu(::WrappedSparseMatrix)` (#51161)
git bisect skip 354c36742eb1c2c4c5bfe454d6d4fe975565de96
# bad: [e85f0a5a718f68e581b07eb60fd0d8203b0cd0da] complete false & true more generally as vals (#51326)
git bisect bad e85f0a5a718f68e581b07eb60fd0d8203b0cd0da
# skip: [91b8c9b99f05b99db8b259257adeb1997f8c4415] Add `JL_DLLIMPORT` to `small_typeof` declaration (#50892)
git bisect skip 91b8c9b99f05b99db8b259257adeb1997f8c4415
# skip: [27fa5de3f0e245cfff8c5cd1c850353742362cbf] Introduce cholesky and qr hooks for wrapped sparse matrices (#51220)
git bisect skip 27fa5de3f0e245cfff8c5cd1c850353742362cbf
# good: [74ce6cf070a2a04e836c3e5a2211228a3ac978ef] minor NFC in GC codebase (#50991)
git bisect good 74ce6cf070a2a04e836c3e5a2211228a3ac978ef
# skip: [3527213ccb1bfe0c48feab5da64d30cadbd4c526] simplify call to promote_eltype with repeated elements (#51135)
git bisect skip 3527213ccb1bfe0c48feab5da64d30cadbd4c526
# bad: [d51ad06f664b3439b4aee51b5cd5edd6b9d53c69] Avoid infinite loop when doing SIGTRAP in arm64-apple (#51284)
git bisect bad d51ad06f664b3439b4aee51b5cd5edd6b9d53c69

(Most of the skips are due to hanging precompilation.)

@MilesCranmer
Member Author

Only 100 revisions are left in the git bisect, but unfortunately I need to run. The bisect log is above if someone wants to pick up where I left off. (My computer is unfortunately pretty slow at compiling.)

@gbaraldi
Member

Executing unreachable code triggers a trap, which on most architectures surfaces as a SIGILL (illegal instruction), so that might be what's happening here.

@maleadt
Member

maleadt commented Jul 17, 2024

It's also a good idea to test an assertions build first.

@MilesCranmer
Member Author

Just tested an assertions build; no extra information.

Is there a way to see which Julia CI builds succeeded for each commit? I have to skip a lot of commits, which is making this bisect take much longer than expected.

@MilesCranmer
Member Author

MilesCranmer commented Jul 19, 2024

I see this commit:

Avoid infinite loop when doing SIGTRAP in arm64-apple

It seems that on every commit before that one, precompilation hangs indefinitely, so I'm not sure I will be able to bisect this any further.

@vchuravy
Member

It's really obnoxious, but you can cherry-pick that commit onto earlier commits in your bisect script.

@MilesCranmer
Member Author

I just realised the SIGTRAP is what you said the error was likely coming from. So I'll try another bisection, treating a hanging precompilation as bad. If that doesn't work, I'll do the cherry-pick approach.

@MilesCranmer
Member Author

Ok, found it! #51000 is the cause.

@MilesCranmer
Member Author

More clues:

  • I see a related issue on x86_64-linux-gnu. However, rather than a crash, the precompilation hangs.
  • The following commit solves this issue within the package in question: MilesCranmer/DispatchDoctor.jl@4ed36ca.
  • Using -O0 --compile=min does not seem to change the result compared to -O3.

@MilesCranmer
Member Author

Can we add this to the 1.11 milestone?

@gbaraldi gbaraldi added this to the 1.11 milestone Jul 30, 2024
@gbaraldi
Member

I redid the bisect because the PR you mentioned looked harmless; I'd expect it to cause a GC error rather than an "unreachable reached" error.
Mine ended at 231ca24.

@MilesCranmer
Member Author

MilesCranmer commented Jul 30, 2024

Weird. Maybe one of the commits was accidentally marked good or bad, since it's hard to tell whether precompilation is truly hanging.

Does 231ca24 make sense as the cause to you?

@gbaraldi
Member

Yep. I just tried it on top of release-1.11, and reverting that commit does make the crash go away.

@MilesCranmer
Member Author

Cool. So I guess the addition of @_terminates_locally_meta somehow introduces an incorrect compiler assumption?

@aviatesk
Member

aviatesk commented Jul 31, 2024

Isn't this a bug in DispatchDoctor?
I investigated quite deeply, and there doesn't seem to be a bug in Julia base or the compiler side. @_terminates_locally_meta certainly allows concrete evaluation for DispatchDoctor._Utils._promote_op, but that function is legally eligible for concrete evaluation.
Looking at the implementation of DispatchDoctor, it seems to use reflection (especially Core.Compiler.return_type through Base.promote_op) within a generator even if the target function is @generated. This breaks the @generated assumption and could potentially cause undefined behavior, e.g., if a function definition is added later.
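For readers less familiar with the effect system, here is a minimal sketch of what a "terminates locally" annotation changes, assuming Julia 1.10 or later (for Base.infer_effects). As far as I understand, Base.@assume_effects :terminates_locally is the user-facing analogue of Base's internal @_terminates_locally_meta: a loop normally prevents inference from proving termination, and the annotation overrides that for the annotated method only. Proving termination is one of the prerequisites for concrete evaluation. The functions count_up and count_up_assumed are hypothetical.

```julia
# Hypothetical example functions; not taken from Base or DispatchDoctor.
function count_up(n::Int)
    i = 0
    while i < n   # backward branch (a loop): inference cannot prove termination
        i += 1
    end
    return i
end

# Same body, but we assert that this method's own loops terminate.
Base.@assume_effects :terminates_locally function count_up_assumed(n::Int)
    i = 0
    while i < n
        i += 1
    end
    return i
end

# Ask inference what it concludes about termination for each method.
e_plain = Base.infer_effects(count_up, (Int,))
e_assumed = Base.infer_effects(count_up_assumed, (Int,))
println("count_up terminates: ", e_plain.terminates)
println("count_up_assumed terminates: ", e_assumed.terminates)
```

One would expect the loop to taint termination in the plain version but not in the annotated one, which is what makes the annotated function eligible for concrete evaluation when its other effects also allow it.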

@MilesCranmer
Member Author

even if the target function is generated

DispatchDoctor doesn’t operate on generated functions, see https://github.com/MilesCranmer/DispatchDoctor.jl?tab=readme-ov-file#-special-cases

@aviatesk
Member

aviatesk commented Jul 31, 2024

Are we sure that none of the generated functions end up using _promote_op?
Looking at the DispatchDoctor implementation, I have confirmed that DispatchDoctor does not transform functions that directly use @generated, but IIUC it still allows @generated functions to use other @stable functions transformed by DD within the generator, which effectively uses Core.Compiler._return_type within the generator.

@MilesCranmer
Member Author

MilesCranmer commented Jul 31, 2024

@KristofferC Of course I agree with the general sentiment, but some of the code that crashes during precompilation involves in(::Any, ::Tuple) calls, so it's at least plausible.

@MilesCranmer
Member Author

@aviatesk I still see the bug. Can you provide details on your system? And to confirm, you saw the Illegal instruction, rather than a different bug? Was it the same stacktrace?

@aviatesk
Member

aviatesk commented Aug 1, 2024

Yes, I am seeing exactly the same segfault. The patch I applied yesterday only appeared to eliminate it because of the order of precompilation (specifically, changes in Preferences significantly alter DD's behavior, but preference changes do not necessarily trigger re-precompilation). If I clear the precompilation cache manually, the same error occurs even with both patches applied.

@aviatesk
Member

aviatesk commented Aug 1, 2024

Initially, I thought it was caused by the impurity of the @generated function, but that seems not to be the case, because the issue persists even if we make DD functionality unreachable from all of the generators.
The problem is probably very subtle and stems from the instability of Core.Compiler.return_type itself. Specifically, my PR enables concrete evaluation of promote_op, giving inference more opportunities to use and cache Core.Compiler.return_type results, which I believe is what triggers the issue (though I'm not entirely sure how it ends up surfacing as a segfault). I have confirmed that either wrapping the promote_op call in @invokelatest (to stop inference from seeing through it) or using Base.infer_return_type avoids the issue.
The use of Base.promote_op (and, internally, Core.Compiler.return_type) should be avoided in the first place, as stated in the documentation of Base.promote_op, especially in packages like DD whose behavior changes based on type inference results. Therefore, the best solution might be for DD to stop using Base.promote_op and switch to Base.infer_return_type.
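A minimal sketch of the two queries being compared, assuming Julia 1.10 or later (where Base.infer_return_type exists). The wrapper names via_promote_op and via_infer_return_type, and the helpers add and pick, are hypothetical and only mirror the shape of the _promote_op patch below; they are not DispatchDoctor's actual code.

```julia
# Hypothetical wrappers; `S...` are argument *types*, e.g. (Int, Float64).
via_promote_op(f, S::Type...) = Base.promote_op(f, S...)          # uses Core.Compiler.return_type
via_infer_return_type(f, S::Type...) = Base.infer_return_type(f, S)  # runs inference on request

add(a, b) = a + b
pick(x) = x > 0 ? x : 0.0   # type-unstable for Int input

T1 = via_promote_op(add, Int, Float64)
T2 = via_infer_return_type(add, Int, Float64)
U1 = via_promote_op(pick, Int)
U2 = via_infer_return_type(pick, Int)
println((T1, T2, U1, U2))
```

Both give the same answers here; the difference discussed in this thread is when the answer is computed. The promote_op route can be folded away by the compiler, while infer_return_type remains an ordinary runtime call into inference, as the LLVM output later in the thread shows.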

@aviatesk
Member

aviatesk commented Aug 1, 2024

This particular issue may not be related to @generated functions (though it is certainly a good idea to avoid calling type inference reflection from generators).
The problem is likely the use of Base.promote_op itself, and applying the following patch can suppress the segfault while maintaining DD's behavior:

diff --git a/src/stabilization.jl b/src/stabilization.jl
index fdfb60f..310fe7d 100644
--- a/src/stabilization.jl
+++ b/src/stabilization.jl
@@ -115,7 +115,7 @@ function _stabilize_all(ex::Expr, downward_metadata::DownwardMetadata; kws...)
                 @assert length(upward_metadata.unused_macros) == length(upward_metadata.macro_keys)
 
                 new_ex = Expr(:macrocall, upward_metadata.unused_macros[end]..., inner_ex)
-                new_upward_metadata = UpwardMetadata(; 
+                new_upward_metadata = UpwardMetadata(;
                     matching_function = upward_metadata.matching_function,
                     unused_macros = upward_metadata.unused_macros[1:end-1],
                     macro_keys = upward_metadata.macro_keys[1:end-1],
diff --git a/src/utils.jl b/src/utils.jl
index 197d0be..66fb235 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -97,7 +97,7 @@ specializing_typeof(::Type{T}) where {T} = Type{T}
 specializing_typeof(::Val{T}) where {T} = Val{T}
 map_specializing_typeof(args...) = map(specializing_typeof, args)
 
-_promote_op(f, S::Type...) = Base.promote_op(f, S...)
+_promote_op(f, S::Type...) = Base.infer_return_type(f, S)
 _promote_op(f, S::Tuple) = _promote_op(f, S...)
 @static if isdefined(Core, :kwcall)
     function _promote_op(
@@ -120,7 +120,7 @@ return false for `Union{}`, so that errors can propagate.
 # so we implement a workaround.
 @inline type_instability(::Type{Type{T}}) where {T} = type_instability(T)
 
-@generated function type_instability_limit_unions(
+function type_instability_limit_unions(
     ::Type{T}, ::Val{union_limit}
 ) where {T,union_limit}
     if T isa UnionAll

While we ideally need to improve the reliability of Core.Compiler.return_type, it is a rather complex and difficult problem (so it remains buggy). In the meantime, it would be better to avoid its use on the DD side to circumvent the issue.

@MilesCranmer
Member Author

MilesCranmer commented Aug 1, 2024

Thanks for all of this, I appreciate it.

So, looking into this alternative, one issue with Base.infer_return_type is that its result is not known at compile time, meaning that LLVM can no longer strip the instability check for stable functions:

julia> using DispatchDoctor  # With the patch in https://github.com/JuliaLang/julia/issues/55147#issuecomment-2262472199

julia> @stable f(x) = x
f (generic function with 1 method)

julia> @code_llvm f(1)
; Function Signature: f(Int64)
;  @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:301 within `f`
define i64 @julia_f_6223(i64 signext %"x::Int64") #0 {
top:
  %jlcallframe1 = alloca [6 x ptr], align 8
  %gcframe2 = alloca [8 x ptr], align 16
  call void @llvm.memset.p0.i64(ptr align 16 %gcframe2, i8 0, i64 64, i1 true)
  %0 = getelementptr inbounds ptr, ptr %gcframe2, i64 2
  %"new::OptimizationParams" = alloca { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, align 8
  %1 = alloca { i64, { ptr, [1 x i64] }, ptr, { i64, i64, i64, i64, i64, i8, i8, i8, i8, i8 }, { i8, i64, i64, i64, i64, i64, i8, i8, i8 } }, align 8
  %pgcstack = call ptr inttoptr (i64 6490275500 to ptr)(i64 261) #5
  store i64 24, ptr %gcframe2, align 16
  %task.gcstack = load ptr, ptr %pgcstack, align 8
  %frame.prev = getelementptr inbounds ptr, ptr %gcframe2, i64 1
  store ptr %task.gcstack, ptr %frame.prev, align 8
  store ptr %gcframe2, ptr %pgcstack, align 8
; ┌ @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/utils.jl:101 within `_promote_op` @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/utils.jl:100
; │┌ @ reflection.jl:1872 within `infer_return_type`
; ││┌ @ reflection.jl:2586 within `get_world_counter`
     %2 = call i64 @jlplt_ijl_get_world_counter_6226_got.jit()
; ││└
; ││┌ @ compiler/types.jl:373 within `NativeInterpreter`
; │││┌ @ compiler/types.jl:321 within `OptimizationParams`
; ││││┌ @ compiler/utilities.jl:506 within `inlining_enabled`
; │││││┌ @ options.jl:68 within `JLOptions`
; ││││││┌ @ pointer.jl:153 within `unsafe_load` @ pointer.jl:153
         %pointerref.sroa.1.0.copyload = load i8, ptr getelementptr inbounds (i8, ptr @jl_options.found.jit, i64 110), align 2
; │││││└└
; │││││┌ @ promotion.jl:483 within `==` @ promotion.jl:639
        %3 = icmp eq i8 %pointerref.sroa.1.0.copyload, 1
; ││││└└
; ││││ @ compiler/types.jl:321 within `OptimizationParams` @ compiler/types.jl:321
; ││││┌ @ compiler/types.jl:341 within `#OptimizationParams#314`
; │││││┌ @ compiler/types.jl:309 within `OptimizationParams`
        %4 = zext i1 %3 to i8
        store i8 %4, ptr %"new::OptimizationParams", align 8
        %5 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 1
        store <2 x i64> <i64 100, i64 1000>, ptr %5, align 8
        %6 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 3
        store <2 x i64> <i64 250, i64 20>, ptr %6, align 8
        %7 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 5
        store i64 32, ptr %7, align 8
        %8 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 6
        store i8 1, ptr %8, align 8
        %9 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 7
        store i8 0, ptr %9, align 1
        %10 = getelementptr inbounds { i8, i64, i64, i64, i64, i64, i8, i8, i8 }, ptr %"new::OptimizationParams", i64 0, i32 8
        store i8 0, ptr %10, align 2
; │││└└└
     call void @"j_#NativeInterpreter#315_6236"(ptr noalias nocapture noundef nonnull sret({ i64, { ptr, [1 x i64] }, ptr, { i64, i64, i64, i64, i64, i8, i8, i8, i8, i8 }, { i8, i64, i64, i64, i64, i64, i8, i8, i8 } }) %1, ptr noalias nocapture noundef nonnull %0, ptr nocapture nonnull readonly @"_j_const#10", ptr nocapture nonnull readonly %"new::OptimizationParams", i64 zeroext %2)
; ││└
    %11 = call nonnull ptr @"j_#infer_return_type#34_6238"(i64 zeroext %2, ptr nocapture nonnull readonly %1, ptr nonnull readonly @"jl_global#6239.jit", ptr nonnull readonly @"jl_global#6240.jit")
; └└
; ... (truncated)

Whereas with the current unpatched use of Base.promote_op, the instability check can be removed by the compiler:

julia> @code_llvm f(1)
;  @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:301 within `f`
define i64 @julia_f_1198(i64 signext %0) #0 {
top:
;  @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:306 within `f`
  ret i64 %0
}

Ideally I would like for DispatchDoctor to use whatever list comprehensions and map(f, x) use for type inference, as those give types known at compile time. My assumption was that this is what Base.promote_op was used for. Do those methods use something else?

Also, let me know if I can do anything to help with debugging Core.Compiler.return_type.

@aviatesk
Member

aviatesk commented Aug 1, 2024

Ideally I would like for DispatchDoctor to use whatever list comprehensions and map(f, x) use for type inference, as those give types known at compile time. My assumption was that this is what Base.promote_op was used for.

Well, this is exactly why Core.Compiler.return_type is buggy and its use should be avoided if possible. Reading through the documentation of promote_op or the discussion at #44340 may clarify this.
It is possible to debug Core.Compiler.return_type, but this is a very subtle issue, and debugging it is quite a laborious task. I think the problem might be due to the difference between the world that Core.Compiler.return_type sees and the actual runtime world, but I am not fully sure. This issue might only occur during precompilation.

@aviatesk
Member

aviatesk commented Aug 1, 2024

List comprehensions and map reluctantly use Core.Compiler.return_type for cases where the iterator they return has 0 elements, but probably that use should be replaced with Base.infer_return_type (although this might not be possible from a performance perspective). Having said that they only change the type of the iterator they return based on type inference, so the impact of Core.Compiler.return_type unreliability is minimal. Unfortunately, in the case of DD, the behavior changes based on the results of type inference significantly, such as printing or raising exceptions, leading to the worst-case scenario of a segfault. When Core.Compiler.return_type returns different results, it may hit branches that should not be executed or vice versa.
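To make the mechanism concrete: when the input is empty, map and comprehensions never call the function, yet the result still has a concrete element type, which can only come from type inference. A small sketch (sq is a hypothetical helper):

```julia
sq(x) = x * x

v = map(sq, Int[])          # `sq` is never called: the input is empty
w = [sq(x) for x in Int[]]  # same for a comprehension

# The element type is still concrete; it comes from inferring sq(::Int).
println(typeof(v), " ", typeof(w))
```

This is the "type of the collection" the comment refers to: only the container's element type depends on inference, not any computed value.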

@MilesCranmer
Member Author

MilesCranmer commented Aug 1, 2024

Here's an example of what I mean with map:

julia> function fake_promote_op(f::F, args...) where {F}
           x = map(_ -> f(args...), 1:0)
           return eltype(x)
       end

Note that this never actually calls f(args...). All it does is type inference, because the 1:0 is empty.

However, this is completely type stable, known at compile time, and is correct:

julia> fake_promote_op(+, 1.0)
Float64

julia> @code_typed fake_promote_op(+, 1.0)
CodeInfo(
1 ─     return Float64
) => Type{Float64}

julia> @code_llvm fake_promote_op(+, 1.0)
;  @ REPL[11]:1 within `fake_promote_op`
define nonnull {}* @julia_fake_promote_op_1386(double %0) #0 {
top:
;  @ REPL[11]:3 within `fake_promote_op`
  ret {}* inttoptr (i64 4823715024 to {}*)
}

Should I use this fake_promote_op instead of Base.promote_op?

@MilesCranmer
Member Author

List comprehensions and map reluctantly use Core.Compiler.return_type for cases where the iterator they return has 0 elements, but probably that use should be replaced with Base.infer_return_type

If Core.Compiler.return_type is not safe, but map and list comprehensions use it for empty collections, then perhaps we should try to fix it within Julia?

in the case of DD, the behavior changes based on the results of type inference significantly, such as printing or raising exceptions, leading to the worst-case scenario of a segfault.

I think this is also true more broadly, not just for DD. I would assume that empty collections produced by map are a fairly common pattern, no? If creating empty collections can cause segfaults, it seems we should fix this in Base rather than just in the DD library.

@MilesCranmer
Member Author

MilesCranmer commented Aug 1, 2024

It's even more prevalent because the length of a collection could be based on the value rather than the type, meaning that Core.Compiler.return_type always appears in a compiled branch of map and list comprehensions for Vector, right?
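A sketch of the point being made, not a claim about exactly which code paths Base compiles: when the output length depends on a runtime value, one compiled method has to handle both the empty and non-empty cases. squares is a hypothetical function.

```julia
# The result length depends on the runtime value `n`, not on its type.
squares(n::Int) = [i * i for i in 1:n]

a = squares(3)   # non-empty: element type taken from the computed values
b = squares(0)   # empty: element type has to come from inference
println(typeof(a), " ", typeof(b))
```

Both calls return the same container type, so the empty branch is part of the same compiled specialization.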

@aviatesk
Member

aviatesk commented Aug 1, 2024

Please recognize that the risks inherent in list comprehensions and map are of an entirely different dimension from the risks associated with DD.
List comprehensions and map themselves do not cause dangerous events like segfaults. Such events can occur when a package makes some bad assumptions about the types returned from Core.Compiler.return_type (including types of the returned values of these basic functions), and builds its codebase on those assumptions. DD is an extreme case where such assumptions are made, and it is causing actual segfaults, so it needs to be fixed. But it doesn't mean we should stop using map and list comprehension entirely since we can use them safely if we don't make such assumptions.

Just to be sure, I confirmed that the issue is not caused by promote_op's use of any, by applying the following patch to DD. The same segfault occurs with the patch below, indicating that the problem lies in DD's use of Core.Compiler.return_type and that the relevant Julia base commit only brought it to light:

diff --git a/src/stabilization.jl b/src/stabilization.jl
index fdfb60f..310fe7d 100644
--- a/src/stabilization.jl
+++ b/src/stabilization.jl
@@ -115,7 +115,7 @@ function _stabilize_all(ex::Expr, downward_metadata::DownwardMetadata; kws...)
                 @assert length(upward_metadata.unused_macros) == length(upward_metadata.macro_keys)
 
                 new_ex = Expr(:macrocall, upward_metadata.unused_macros[end]..., inner_ex)
-                new_upward_metadata = UpwardMetadata(; 
+                new_upward_metadata = UpwardMetadata(;
                     matching_function = upward_metadata.matching_function,
                     unused_macros = upward_metadata.unused_macros[1:end-1],
                     macro_keys = upward_metadata.macro_keys[1:end-1],
diff --git a/src/utils.jl b/src/utils.jl
index 197d0be..19d2b7b 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -97,7 +97,35 @@ specializing_typeof(::Type{T}) where {T} = Type{T}
 specializing_typeof(::Val{T}) where {T} = Val{T}
 map_specializing_typeof(args...) = map(specializing_typeof, args)
 
-_promote_op(f, S::Type...) = Base.promote_op(f, S...)
+# `promote_op` without any usage of `any`
+let ex = Expr(:block)
+    function gen_case_n(n)
+        blk = Expr(:block)
+        ret = :(Tuple{})
+        for i = 1:n
+            push!(blk.args, :(args[$i] === Union{} && return Union{}))
+            push!(ret.args, :(args[$i]))
+        end
+        return :(if length(args) == $n
+            $blk
+            return $ret
+        end)
+    end
+    for i = 0:15
+        push!(ex.args, gen_case_n(i))
+    end
+    @eval function TupleOrBottom_without_any(args...)
+        $ex
+    end
+end
+
+function promote_op_without_any(f, S::Type...)
+    argT = TupleOrBottom_without_any(S...)
+    argT === Union{} && return Union{}
+    return Core.Compiler.return_type(f, argT)
+end
+
+_promote_op(f, S::Type...) = promote_op_without_any(f, S...)
 _promote_op(f, S::Tuple) = _promote_op(f, S...)
 @static if isdefined(Core, :kwcall)
     function _promote_op(
@@ -120,7 +148,7 @@ return false for `Union{}`, so that errors can propagate.
 # so we implement a workaround.
 @inline type_instability(::Type{Type{T}}) where {T} = type_instability(T)
 
-@generated function type_instability_limit_unions(
+function type_instability_limit_unions(
     ::Type{T}, ::Val{union_limit}
 ) where {T,union_limit}
     if T isa UnionAll

@MilesCranmer
Member Author

But it doesn't mean we should stop using map and list comprehension entirely since we can use them safely if we don't make such assumptions.

What assumptions do you mean? Is it that DD dispatches on the return value of Base.promote_op? Isn't dispatching on the return type of a map or list comprehension the exact same thing?

For example, will the following code potentially lead to a segfault?

g() = [i for i in 1:0]
h() = g() isa Vector{Int}
@generated function foo()
    if h()
        return :(1)
    else
        return :(2)
    end
end

@aviatesk
Member

aviatesk commented Aug 1, 2024

I don't know the exact mechanism causing the segfault in DD.
Your case is one example of making a bad assumption. This code itself may not cause a segfault, but since the value returned by foo is not well-defined, it could potentially lead to dangerous events depending on how it is used.

@MilesCranmer
Member Author

Consider that I could just rewrite DispatchDoctor to use

x = []
T = eltype(map(_ -> f(args...), x))
# equivalent to: T = Base.promote_op(f, typeof.(args)...)

I'm not doing anything fancy here; this kind of code could appear in anybody's library, with no Julia internals involved. Yet it implicitly relies on the same Core.Compiler.return_type query that Base.promote_op makes.
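A runnable version of the snippet above, as a sketch with hypothetical names (eltype_via_empty_map, combine). One subtlety: f and args should be function arguments so the closure captures them with concrete types; written at global scope with non-constant globals, inference would likely see only Any.

```julia
# Hypothetical helper: infer f's return type via `map` over an empty vector.
function eltype_via_empty_map(f, args...)
    x = []  # Vector{Any}, always empty, so the closure below is never called
    return eltype(map(_ -> f(args...), x))
end

combine(a, b) = a * b + 1

T_map = eltype_via_empty_map(combine, 1, 2.0)
T_promote = Base.promote_op(combine, typeof.((1, 2.0))...)
println((T_map, T_promote))
```

Both routes report the same type because both ultimately ask inference for the return type of combine(::Int, ::Float64).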

This type of code is even used within the Julia standard library:

T = promote_type(map(x -> eltype(x.second), kv)...)

or

https://github.com/JuliaSparse/SparseArrays.jl/blob/e61663ad0a79a48906b0b12d53506e731a614ab8/src/sparsematrix.jl#L4030

or

https://github.com/JuliaSparse/SparseArrays.jl/blob/e61663ad0a79a48906b0b12d53506e731a614ab8/src/sparsematrix.jl#L4205

If the rule is that map and list comprehensions must not be used in any code referenced by @generated functions, I don't think that is well known. Basically, I think Core.Compiler.return_type should be fixed.

Not to mention there are 1,400+ files on GitHub that explicitly use that symbol: https://github.com/search?q=/Core.Compiler.return_type/+language:julia&type=code. I've already added a workaround for the main issue on 1.11.0, but this bug could also be behind other similar issues like #53847, #53843, #53848, #53761, ..., which might stem from a similar phenomenon. This is certainly the one whose cause we've drilled into, but maybe the others are caused by the same Julia bug.

@MilesCranmer
Member Author

MilesCranmer commented Aug 1, 2024

I guess this goes back to your earlier comment:

List comprehensions and map reluctantly use Core.Compiler.return_type for cases where the iterator they return has 0 elements, but probably that use should be replaced with Base.infer_return_type (although this might not be possible from a performance perspective). Having said that they only change the type of the iterator they return based on type inference, so the impact of Core.Compiler.return_type unreliability is minimal. Unfortunately, in the case of DD, the behavior changes based on the results of type inference significantly, such as printing or raising exceptions, leading to the worst-case scenario of a segfault. When Core.Compiler.return_type returns different results, it may hit branches that should not be executed or vice versa.

I think "unreliability is minimal" is pulling a lot of weight here. I think any unreliability should be fixed at the source, rather than shooting the messenger. The comment

behavior changes based on the results of type inference significantly, such as printing or raising exceptions, leading to the worst-case scenario of a segfault

DD's usage actually feels like a very minimal change in behavior, no? DD just turns logging on or off based on the result of return type inference. Its use of Base.promote_op won't change the output of a calculation, for example. All bugs flagged by DD are loud and easily detected. If anything, DD is probably minimally affected by bugs in Core.Compiler.return_type; it just calls it often enough that major issues like segfaults show up more frequently.

It's almost like DD acts as an "rr chaos mode" for the type inference system. If a bug shows up in rr chaos mode, it's still a bug. Similarly, if a segfault shows up from DD's use of promote_op, it's still a segfault that should be patched in Base.

@vtjnash
Member

vtjnash commented Aug 1, 2024

Looking closer into this, it appears this segfaults because the DynamicExpressions module defined the return type of the simulator of any to depend on its own Core.Compiler.return_type result. That is non-computable by inference, but inference attempts to infer it anyway. Because there are too many try/catch statements inside the Compiler, it quickly loses track of what went wrong after hitting a StackOverflow, and starts inferring nonsense for promote_op instead:

Tuple{DynamicExpressions.NodeModule.var"###any_simulator#430", DynamicExpressions.NodeModule.var"#75#77"{DynamicExpressions.NodeUtilsModule.var"#13#15"}, DynamicExpressions.NodeModule.Node{Float32}}

I haven't been able to make an MWE that reproduces this, however.

vtjnash added a commit that referenced this issue Aug 1, 2024
In extreme cases, the compiler could mark this function for
concrete-eval, even though that is illegal unless the compiler has first
deleted this instruction. Otherwise the attempt to concrete-eval will
re-run the function repeatedly until it hits a StackOverflow.

Workaround to fix #55147
aviatesk pushed a commit that referenced this issue Aug 2, 2024
@MilesCranmer
Member Author

Very interesting, thanks for the note.

Maybe an MWE of this inference stack overflow would be the following?

foo() = [foo() for _ in 1:0]

Or would this not replicate the issue? (I'm on mobile, so I can't check.)

aviatesk pushed a commit that referenced this issue Aug 9, 2024
aviatesk pushed a commit that referenced this issue Aug 9, 2024
aviatesk added a commit that referenced this issue Aug 10, 2024
In extreme cases, the compiler could mark this function for
concrete-eval, even though that is illegal unless the compiler has first
deleted this instruction. Otherwise the attempt to concrete-eval will
re-run the function repeatedly until it hits a StackOverflow.

Workaround to fix #55147

@aviatesk You might know how to solve this even better, using
post-optimization effect refinements? We should actually only
apply the refinement terminates=false => terminates=true (thus
allowing concrete eval) if the optimization actually occurs, not merely
when inference thinks the optimization would be legal.

---------

Co-authored-by: Shuhei Kadowaki <[email protected]>
lazarusA pushed a commit to lazarusA/julia that referenced this issue Aug 17, 2024
…#55338)

10 participants