
riscv64: Fix underflow in call relocation handling #5951

Merged · 1 commit · Mar 10, 2023

Conversation

afonso360 (Contributor) commented Mar 7, 2023

👋 Hey,

Under some test case layouts the call relocation was panicking with an underflow. Use wrapping_sub to signal that this is expected.

The fuzzer took a while to generate such a test case, and I can't introduce it as a regression test because, when run via the regular clif-util run tests, the layout is different and the test case passes!

I think this is because in the fuzzer we only add one trampoline, while in clif-util we build trampolines for each function in the file.
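For reference, the computation at issue looks roughly like this (a sketch only — the name `split_pcrel` is illustrative, and `pcrel`, `hi20`, and `lo12` follow the discussion below rather than the exact Cranelift source):

```rust
/// Split a 32-bit PC-relative offset across an auipc/jalr pair.
/// `pcrel` is the two's-complement offset from the relocation site.
fn split_pcrel(pcrel: u32) -> (u32, u32) {
    // Round to the nearest 4 KiB "page" so the signed 12-bit low part
    // can correct back down; the +0x800 may carry into the high bits.
    let hi20 = pcrel.wrapping_add(0x800) & 0xFFFF_F000;
    // This subtraction is *expected* to wrap whenever the low 12 bits
    // of `pcrel` are >= 0x800 — hence wrapping_sub rather than `-`.
    let lo12 = pcrel.wrapping_sub(hi20) & 0xFFF;
    (hi20, lo12)
}
```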

@afonso360 afonso360 requested a review from jameysharp March 7, 2023 12:31
github-actions bot added the cranelift label (Issues related to the Cranelift code generator) Mar 7, 2023
bjorn3 (Contributor) commented Mar 7, 2023

Are you sure this is correct and not a case where two functions end up more than the max relocation distance away by chance? We don't handle that case correctly right now. See #4000.

afonso360 (Contributor, Author) commented Mar 7, 2023

I don't think so, I gave this a pretty good run in the fuzzer (~24h) with these changes and it stopped complaining. It also fixed the same bug that was previously reported by the fuzzer.

Also, could that happen with 4-5 functions in the test case? I'll try to get it again, but I think that was how many there were.

bjorn3 (Contributor) commented Mar 7, 2023

> Also, could that happen with 4-5 functions in the test case? I'll try to get it again, but I think that was how many there were.

If those functions don't fit in a single page and you are very unlucky, yes it can happen.

jameysharp (Contributor) commented

@elliottt and I just spent a while puzzling over this. We believe that something like this PR is necessary for correctness, and that switching to wrapping_sub is the only correct solution. But we really had to think about it.

The point of this calculation is that hi20 should end up being close enough to pcrel that a signed 12-bit offset can hold the difference. (That it's signed wasn't clear from either the ABI doc or this code, and it'd be nice to have a comment here saying so.) This explains why subtracting one from the other underflows sometimes: the result is actually expected to be negative sometimes. In fact it should happen for exactly half the possible values of pcrel.
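A worked example of that negative half (a sketch under the same assumptions as the `split_pcrel` snippet above):

```rust
fn main() {
    // pcrel = 0x1800: rounding pushes hi20 one page past the target, and
    // lo12 wraps to the 12-bit encoding of -0x800, so 0x2000 - 0x800 == 0x1800.
    let pcrel: u32 = 0x1800;
    let hi20 = pcrel.wrapping_add(0x800) & 0xFFFF_F000;
    assert_eq!(hi20, 0x2000);
    let lo12 = pcrel.wrapping_sub(hi20) & 0xFFF;
    assert_eq!(lo12, 0x800); // sign-extends to -0x800 in jalr's immediate
}
```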

Trevor had a suggestion I really liked: Do all the intermediate arithmetic for pcrel on i32, and only convert to u32 at the end, when patching the instructions. This felt right intuitively, since lo12 is meant to be interpreted as signed. Unfortunately, if we do that without also changing to wrapping_sub like you've done here, it's still possible to overflow the subtraction when pcrel is greater than i32::MAX-0x800.
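A small illustration of that edge case (hypothetical values, not from the PR):

```rust
fn main() {
    // With pcrel > i32::MAX - 0x800, the +0x800 rounding overflows i32,
    // so wrapping arithmetic is needed even with signed intermediates.
    let pcrel: i32 = i32::MAX - 0x7FF;
    assert_eq!(pcrel.checked_add(0x800), None); // plain `+` would panic in debug builds
}
```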

Since wrapping_sub is required either way, I guess we might as well stick with unsigned arithmetic, and merge this PR as-is.

One thing that confused us, if you want to add a comment into this PR: Unlike the ABI documentation linked in this code, hi20 isn't right-shifted 12 bits. I believe that's because you would otherwise have to left-shift it again to place it in auipc's immediate field, right?

I also thought about defining lo12 with ... << 20 instead of ... & 0xFFF, so both values are aligned in the appropriate immediate-operand fields. (The bit-masking isn't strictly necessary since the masked-out bits get shifted out anyway.) But I think that is less clear than the way you have it now.
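To make the field placement concrete, a hypothetical patch step (the function and its parameters are illustrative, not the actual Cranelift code; field positions follow the RISC-V U- and I-type encodings):

```rust
/// Patch the split offset into the two instruction words at the
/// relocation site.
fn patch(auipc_word: u32, jalr_word: u32, hi20: u32, lo12: u32) -> (u32, u32) {
    // auipc is U-type: imm[31:12] occupies instruction bits 31:12, so the
    // un-shifted, page-aligned hi20 drops straight into place.
    let auipc = (auipc_word & 0x0000_0FFF) | hi20;
    // jalr is I-type: imm[11:0] occupies bits 31:20 — the field that the
    // alternative `... << 20` definition of lo12 would target directly.
    let jalr = (jalr_word & 0x000F_FFFF) | (lo12 << 20);
    (auipc, jalr)
}
```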

afonso360 (Contributor, Author) commented Mar 8, 2023

I've left the fuzzer running since yesterday on riscv64 (it took about 18 hours!) to try and find this again, since I lost the original case.

Testcase
;; Run test case

test interpret
test run
set enable_alias_analysis=false
set use_egraphs=false
set enable_simd=true
set enable_safepoints=true
set enable_llvm_abi_extensions=true
set unwind_info=false
set machine_code_cfg_info=true
set enable_jump_tables=false
set enable_heap_access_spectre_mitigation=false
set enable_table_access_spectre_mitigation=false
target riscv64gc

function %d() system_v {
    ss0 = explicit_slot 126
    ss1 = explicit_slot 126
    ss2 = explicit_slot 0
    sig0 = (f32) -> f32 system_v
    sig1 = (f64) -> f64 system_v
    sig2 = (f32) -> f32 system_v
    sig3 = (f64) -> f64 system_v
    sig4 = (f32) -> f32 system_v
    sig5 = (f64) -> f64 system_v
    fn0 = %CeilF32 sig0
    fn1 = colocated %CeilF64 sig1
    fn2 = colocated %FloorF32 sig2
    fn3 = colocated %FloorF64 sig3
    fn4 = colocated %TruncF32 sig4
    fn5 = colocated %TruncF64 sig5

block0:
    v0 = iconst.i8 0
    v1 = iconst.i16 0
    v2 = iconst.i32 0
    v3 = iconst.i64 0
    v4 = uextend.i128 v3  ; v3 = 0
    stack_store v4, ss0
    stack_store v4, ss0+16
    stack_store v4, ss0+32
    stack_store v4, ss0+48
    stack_store v4, ss0+64
    stack_store v4, ss0+80
    stack_store v4, ss0+96
    stack_store v3, ss0+112  ; v3 = 0
    stack_store v2, ss0+120  ; v2 = 0
    stack_store v1, ss0+124  ; v1 = 0
    stack_store v4, ss1
    stack_store v4, ss1+16
    stack_store v4, ss1+32
    stack_store v4, ss1+48
    stack_store v4, ss1+64
    stack_store v4, ss1+80
    stack_store v4, ss1+96
    stack_store v3, ss1+112  ; v3 = 0
    stack_store v2, ss1+120  ; v2 = 0
    stack_store v1, ss1+124  ; v1 = 0
    return
}


function %c() system_v {
    sig0 = () system_v
    sig1 = (f32) -> f32 system_v
    sig2 = (f64) -> f64 system_v
    sig3 = (f32) -> f32 system_v
    sig4 = (f64) -> f64 system_v
    sig5 = (f32) -> f32 system_v
    sig6 = (f64) -> f64 system_v
    fn0 = %d sig0
    fn1 = %CeilF32 sig1
    fn2 = %CeilF64 sig2
    fn3 = %FloorF32 sig3
    fn4 = %FloorF64 sig4
    fn5 = %TruncF32 sig5
    fn6 = %TruncF64 sig6

block0:
    v0 = iconst.i8 0
    v1 = iconst.i16 0
    v2 = iconst.i32 0
    v3 = iconst.i64 0
    v4 = uextend.i128 v3  ; v3 = 0
    return
}


function %b(i32 sext, i8 sext, i8 sext, i128) system_v {
    sig0 = () system_v
    sig1 = () system_v
    sig2 = (f32) -> f32 system_v
    sig3 = (f64) -> f64 system_v
    sig4 = (f32) -> f32 system_v
    sig5 = (f64) -> f64 system_v
    sig6 = (f32) -> f32 system_v
    sig7 = (f64) -> f64 system_v
    fn0 = %d sig0
    fn1 = %c sig1
    fn2 = colocated %CeilF32 sig2
    fn3 = %CeilF64 sig3
    fn4 = %FloorF32 sig4
    fn5 = %FloorF64 sig5
    fn6 = %TruncF32 sig6
    fn7 = %TruncF64 sig7

block0(v0: i32, v1: i8, v2: i8, v3: i128):
    v4 = iconst.i8 0
    v5 = iconst.i16 0
    v6 = iconst.i32 0
    v7 = iconst.i64 0
    v8 = uextend.i128 v7  ; v7 = 0
    return
}


function %a(i64 sext, f32, i32 uext, i16 sext, f32, i16 sext, i64 uext, f64, i128 sext, i8 sext) -> f64, i128 sext, i8 sext, i8 sext, i16 sext, i64 sext, f64, i32 sext, i64 sext, i64 sext, i64 sext, i64 sext system_v {
    ss0 = explicit_slot 26
    ss1 = explicit_slot 26
    sig0 = () system_v
    sig1 = () system_v
    sig2 = (i32 sext, i8 sext, i8 sext, i128) system_v
    sig3 = (f32) -> f32 system_v
    sig4 = (f64) -> f64 system_v
    sig5 = (f32) -> f32 system_v
    sig6 = (f64) -> f64 system_v
    sig7 = (f32) -> f32 system_v
    sig8 = (f64) -> f64 system_v
    fn0 = colocated %d sig0
    fn1 = colocated %c sig1
    fn2 = colocated %b sig2
    fn3 = colocated %CeilF32 sig3
    fn4 = colocated %CeilF64 sig4
    fn5 = colocated %FloorF32 sig5
    fn6 = colocated %FloorF64 sig6
    fn7 = colocated %TruncF32 sig7
    fn8 = colocated %TruncF64 sig8

block0(v0: i64, v1: f32, v2: i32, v3: i16, v4: f32, v5: i16, v6: i64, v7: f64, v8: i128, v9: i8):
    v10 = iconst.i16 0xffff_ffff_ffff_9b9b
    v11 = iconst.i16 0xffff_ffff_ffff_9b9b
    v12 = iconst.i8 0
    v13 = iconst.i16 0
    v14 = iconst.i32 0
    v15 = iconst.i64 0
    v16 = uextend.i128 v15  ; v15 = 0
    stack_store v16, ss0
    stack_store v15, ss0+16  ; v15 = 0
    stack_store v13, ss0+24  ; v13 = 0
    stack_store v16, ss1
    stack_store v15, ss1+16  ; v15 = 0
    stack_store v13, ss1+24  ; v13 = 0
    v46 = fcmp ne v4, v4
    v47 = f32const -0x1.000000p0
    v48 = f32const 0x1.000000p32
    v49 = fcmp le v4, v47  ; v47 = -0x1.000000p0
    v50 = fcmp ge v4, v48  ; v48 = 0x1.000000p32
    v51 = bor v49, v50
    v52 = bor v46, v51
    v53 = f32const 0x1.000000p0
    v54 = select v52, v53, v4  ; v53 = 0x1.000000p0
    v17 = fcvt_to_uint.i32 v54
    v42 = iconst.i16 0
    v43 = iconst.i16 1
    v44 = icmp eq v3, v42  ; v42 = 0
    v45 = select v44, v43, v3  ; v43 = 1
    v18 = urem v11, v45  ; v11 = 0xffff_ffff_ffff_9b9b
    v19 = bxor v5, v18
    v20 = select_spectre_guard v9, v17, v17
    v55 = fcmp ne v7, v7
    v56 = f64const -0x1.0000000000000p0
    v57 = f64const 0x1.0000000000000p32
    v58 = fcmp le v7, v56  ; v56 = -0x1.0000000000000p0
    v59 = fcmp ge v7, v57  ; v57 = 0x1.0000000000000p32
    v60 = bor v58, v59
    v61 = bor v55, v60
    v62 = f64const 0x1.0000000000000p0
    v63 = select v61, v62, v7  ; v62 = 0x1.0000000000000p0
    v21 = fcvt_to_uint.i32 v63
    v64 = fcmp ne v7, v7
    v65 = f64const -0x1.0000000000000p0
    v66 = f64const 0x1.0000000000000p32
    v67 = fcmp le v7, v65  ; v65 = -0x1.0000000000000p0
    v68 = fcmp ge v7, v66  ; v66 = 0x1.0000000000000p32
    v69 = bor v67, v68
    v70 = bor v64, v69
    v71 = f64const 0x1.0000000000000p0
    v72 = select v70, v71, v7  ; v71 = 0x1.0000000000000p0
    v22 = fcvt_to_uint.i32 v72
    v73 = fcmp ne v7, v7
    v74 = f64const -0x1.0000000000000p0
    v75 = f64const 0x1.0000000000000p32
    v76 = fcmp le v7, v74  ; v74 = -0x1.0000000000000p0
    v77 = fcmp ge v7, v75  ; v75 = 0x1.0000000000000p32
    v78 = bor v76, v77
    v79 = bor v73, v78
    v80 = f64const 0x1.0000000000000p0
    v81 = select v79, v80, v7  ; v80 = 0x1.0000000000000p0
    v23 = fcvt_to_uint.i32 v81
    v24 = bxor v4, v4
    v82 = fcmp ne v7, v7
    v83 = f64const -0x1.0000000000000p0
    v84 = f64const 0x1.0000000000000p32
    v85 = fcmp le v7, v83  ; v83 = -0x1.0000000000000p0
    v86 = fcmp ge v7, v84  ; v84 = 0x1.0000000000000p32
    v87 = bor v85, v86
    v88 = bor v82, v87
    v89 = f64const 0x1.0000000000000p0
    v90 = select v88, v89, v7  ; v89 = 0x1.0000000000000p0
    v25 = fcvt_to_uint.i32 v90
    v91 = fcmp ne v7, v7
    v92 = f64const -0x1.0000000000000p0
    v93 = f64const 0x1.0000000000000p32
    v94 = fcmp le v7, v92  ; v92 = -0x1.0000000000000p0
    v95 = fcmp ge v7, v93  ; v93 = 0x1.0000000000000p32
    v96 = bor v94, v95
    v97 = bor v91, v96
    v98 = f64const 0x1.0000000000000p0
    v99 = select v97, v98, v7  ; v98 = 0x1.0000000000000p0
    v26 = fcvt_to_uint.i32 v99
    v27 = stack_addr.i64 ss1+20
    v28 = load.i16 notrap v27
    v29 = iadd v8, v8
    v30 = iadd v29, v29
    call fn0()
    v100 = fcmp ne v24, v24
    v101 = f32const -0x1.000000p0
    v102 = f32const 0x1.000000p32
    v103 = fcmp le v24, v101  ; v101 = -0x1.000000p0
    v104 = fcmp ge v24, v102  ; v102 = 0x1.000000p32
    v105 = bor v103, v104
    v106 = bor v100, v105
    v107 = f32const 0x1.000000p0
    v108 = select v106, v107, v24  ; v107 = 0x1.000000p0
    v31 = fcvt_to_uint.i32 v108
    v109 = fcmp ne v24, v24
    v110 = f32const -0x1.000000p0
    v111 = f32const 0x1.000000p32
    v112 = fcmp le v24, v110  ; v110 = -0x1.000000p0
    v113 = fcmp ge v24, v111  ; v111 = 0x1.000000p32
    v114 = bor v112, v113
    v115 = bor v109, v114
    v116 = f32const 0x1.000000p0
    v117 = select v115, v116, v24  ; v116 = 0x1.000000p0
    v32 = fcvt_to_uint.i32 v117
    v118 = fcmp ne v24, v24
    v119 = f32const -0x1.000000p0
    v120 = f32const 0x1.000000p32
    v121 = fcmp le v24, v119  ; v119 = -0x1.000000p0
    v122 = fcmp ge v24, v120  ; v120 = 0x1.000000p32
    v123 = bor v121, v122
    v124 = bor v118, v123
    v125 = f32const 0x1.000000p0
    v126 = select v124, v125, v24  ; v125 = 0x1.000000p0
    v33 = fcvt_to_uint.i32 v126
    v34 = fcvt_from_sint.f32 v0
    v127 = fcmp ne v34, v34
    v128 = f32const -0x1.000000p0
    v129 = f32const 0x1.000000p32
    v130 = fcmp le v34, v128  ; v128 = -0x1.000000p0
    v131 = fcmp ge v34, v129  ; v129 = 0x1.000000p32
    v132 = bor v130, v131
    v133 = bor v127, v132
    v134 = f32const 0x1.000000p0
    v135 = select v133, v134, v34  ; v134 = 0x1.000000p0
    v35 = fcvt_to_uint.i32 v135
    v136 = fcmp ne v34, v34
    v137 = f32const -0x1.000000p0
    v138 = f32const 0x1.000000p32
    v139 = fcmp le v34, v137  ; v137 = -0x1.000000p0
    v140 = fcmp ge v34, v138  ; v138 = 0x1.000000p32
    v141 = bor v139, v140
    v142 = bor v136, v141
    v143 = f32const 0x1.000000p0
    v144 = select v142, v143, v34  ; v143 = 0x1.000000p0
    v36 = fcvt_to_uint.i32 v144
    v145 = fcmp ne v34, v34
    v146 = f32const -0x1.000000p0
    v147 = f32const 0x1.000000p32
    v148 = fcmp le v34, v146  ; v146 = -0x1.000000p0
    v149 = fcmp ge v34, v147  ; v147 = 0x1.000000p32
    v150 = bor v148, v149
    v151 = bor v145, v150
    v152 = f32const 0x1.000000p0
    v153 = select v151, v152, v34  ; v152 = 0x1.000000p0
    v37 = fcvt_to_uint.i32 v153
    v154 = fcmp ne v34, v34
    v155 = f32const -0x1.000000p0
    v156 = f32const 0x1.000000p32
    v157 = fcmp le v34, v155  ; v155 = -0x1.000000p0
    v158 = fcmp ge v34, v156  ; v156 = 0x1.000000p32
    v159 = bor v157, v158
    v160 = bor v154, v159
    v161 = f32const 0x1.000000p0
    v162 = select v160, v161, v34  ; v161 = 0x1.000000p0
    v38 = fcvt_to_uint.i32 v162
    v163 = fcmp ne v34, v34
    v164 = f32const -0x1.000000p0
    v165 = f32const 0x1.000000p32
    v166 = fcmp le v34, v164  ; v164 = -0x1.000000p0
    v167 = fcmp ge v34, v165  ; v165 = 0x1.000000p32
    v168 = bor v166, v167
    v169 = bor v163, v168
    v170 = f32const 0x1.000000p0
    v171 = select v169, v170, v34  ; v170 = 0x1.000000p0
    v39 = fcvt_to_uint.i32 v171
    v172 = fcmp ne v34, v34
    v173 = f32const -0x1.000000p0
    v174 = f32const 0x1.000000p32
    v175 = fcmp le v34, v173  ; v173 = -0x1.000000p0
    v176 = fcmp ge v34, v174  ; v174 = 0x1.000000p32
    v177 = bor v175, v176
    v178 = bor v172, v177
    v179 = f32const 0x1.000000p0
    v180 = select v178, v179, v34  ; v179 = 0x1.000000p0
    v40 = fcvt_to_uint.i32 v180
    v181 = fcmp ne v34, v34
    v182 = f32const -0x1.000000p0
    v183 = f32const 0x1.000000p32
    v184 = fcmp le v34, v182  ; v182 = -0x1.000000p0
    v185 = fcmp ge v34, v183  ; v183 = 0x1.000000p32
    v186 = bor v184, v185
    v187 = bor v181, v186
    v188 = f32const 0x1.000000p0
    v189 = select v187, v188, v34  ; v188 = 0x1.000000p0
    v41 = fcvt_to_uint.i32 v189
    return v7, v30, v9, v9, v11, v6, v7, v41, v6, v6, v6, v6  ; v11 = 0xffff_ffff_ffff_9b9b
}

; run: %a(-7234017283807667301, 0.0, 0, 0, 0.0, 0, 0, 0.0, 0, 0) == [0.0, 0, 0, 0, -25701, 0, 0.0, 1, 0, 0, 0, 0]
Fuzzer input

Base64:

IUIJCAAIMgAAACkBAAD/////KP//////////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAFwAAACMABCMAAAAjAAAAAAAAAAAALgAA+PD4YfgAdTo6MgA99gDqo8oEAAAAMF8DAAAAALAA/3UwOjb/////////////////////O/+bm5ubm5ubm5ubm5ubdTE6MwCwAP91MDo2/////////wED/////////zv/myibm2VkZGRkZGRedZubm5ubm5ubm5vOZP+bm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubmw==

On this commit: 8bb183f

Again, this doesn't seem to reproduce via the regular CLIF test runner, only via the fuzzer. But when compiled it comes out as 288 + 32 + 32 + 3208 = 3560 bytes, which is slightly under one page. With this commit the input no longer crashes!


I'll look into making those changes tomorrow. Just wanted to make sure it wasn't the issue @bjorn3 was mentioning above!

bjorn3 (Contributor) commented Mar 8, 2023

If it fits in a single page then it is likely not the issue I was talking about. Or does the fuzzer call finalize_definitions in between two function compilations? That would force them to end up in separately allocated pages.

afonso360 (Contributor, Author) commented Mar 8, 2023

No, we define and declare all of them (including trampolines), and call finalize_definitions once in TestFileCompiler::compile.

afonso360 force-pushed the riscv-fix-call-reloc branch from 41260be to 2e170f8 on March 9, 2023 at 18:58
afonso360 (Contributor, Author) commented

> I believe that's because you would otherwise have to left-shift it again to place it in auipc's immediate field, right?

Yes, but we also need it without the left shift to calculate lo12, and at that point there was so much shifting going on that it seemed even more confusing. I've added that remark as a comment, though.

Could you double check if the comments match what you expected?

jameysharp (Contributor) commented

You've covered almost everything that confused me. Thanks! The remaining bit is that lo12 is also a signed offset, ±2 KiB, relative to PC+hi20.

Under some test case layouts the call relocation
was panicking with an underflow. Use `wrapping_sub` to
signal that this is expected.

The fuzzer took a while to generate such a test case.
And I can't introduce it as a regression test because
when running via the regular clif-util run tests the
layout is different and the test case passes!

I think this is because in the fuzzer we only add
one trampoline, while in clif-util we build trampolines
for each function in the file.

Co-Authored-By: Jamey Sharp <[email protected]>
afonso360 force-pushed the riscv-fix-call-reloc branch from 2e170f8 to 186e81e on March 10, 2023 at 11:18
afonso360 enabled auto-merge March 10, 2023 at 11:22
afonso360 added this pull request to the merge queue Mar 10, 2023
Merged via the queue into bytecodealliance:main with commit e64fb6a, Mar 10, 2023
afonso360 deleted the riscv-fix-call-reloc branch March 10, 2023 at 12:28