[mono] LLVM 11: Explicitly zero the unused bits of the result register for AddPairwiseScalar (#53694)

LLVM 11 and above optimize

    %9 = extractelement <2 x float> %arm64_ld1, i32 0
    %10 = extractelement <2 x float> %arm64_ld1, i32 1
    %arm64_faddp_scalar = fadd float %9, %10
    %11 = insertelement <2 x float> undef, float %arm64_faddp_scalar, i32 0

(which is translated to scalar `faddp`)

into

    %shift = shufflevector <2 x float> %arm64_ld1, <2 x float> undef, <2 x i32> <i32 1, i32 undef>
    %10 = fadd <2 x float> %arm64_ld1, %shift
    %11 = shufflevector <2 x float> %10, <2 x float> undef, <2 x i32> <i32 0, i32 undef>

(which is translated to a sequence of `dup` and vector `fadd`).

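The two forms differ in what they leave in the unused upper lane: scalar
`faddp` zeroes the unused bits of the destination register, while the `dup`
and vector `fadd` sequence can leave a nonzero value there, because the IR
only promises `undef` for that lane.

This change works around the difference by explicitly zeroing the unused bits
of the results of `AddPairwiseScalar`; the generated code is noisier, but the
semantics are correct. The "Arm Architecture Reference Manual Armv8, for
Armv8-A architecture profile", version G.a, calls out the zero-extending
semantics of scalar operations that use SIMD registers (see
"aarch64/functions/registers/V"), but judging by the generated code, LLVM
does not appear to exploit this for optimization.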

This also affects `vpadds_f32` in Clang.
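
The lane behavior described above can be sketched in plain C, without LLVM or NEON. This is a minimal model, not the Mono implementation: the `v2f32` type and both function names are hypothetical, and the value the vector form leaves in lane 1 is only one outcome that `undef` permits (here, the one a `dup` of lane 1 followed by a vector `fadd` would produce).

```c
#include <string.h>

/* Hypothetical model of a 64-bit SIMD register holding two float lanes. */
typedef struct { float lane[2]; } v2f32;

/* Models the scalar form after this change: lane 0 receives the pairwise
 * sum and every other bit of the result is zero, as with an insertelement
 * into a zeroinitializer vector. */
static v2f32 add_pairwise_scalar_zeroed(v2f32 v) {
    v2f32 r;
    memset(&r, 0, sizeof r);            /* zero the unused bits */
    r.lane[0] = v.lane[0] + v.lane[1];  /* scalar pairwise add */
    return r;
}

/* Models the dup + vector fadd sequence. Lane 1 of the result is undef in
 * the IR; ASSUMPTION: we model it as the value a dup of lane 1 followed by
 * a full-width add would leave behind. */
static v2f32 add_pairwise_vector_form(v2f32 v) {
    v2f32 shift = { { v.lane[1], v.lane[1] } };  /* dup of lane 1 */
    v2f32 r;
    r.lane[0] = v.lane[0] + shift.lane[0];
    r.lane[1] = v.lane[1] + shift.lane[1];       /* garbage for callers */
    return r;
}
```

For the input `{1.0f, 2.0f}`, both functions agree on lane 0 (`3.0f`), but only the zeroed form guarantees that lane 1 is `0.0f`; the vector form in this model leaves `4.0f` there.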
imhameed authored Jun 4, 2021
1 parent 90e201e commit c3126d0
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions src/mono/mono/mini/mini-llvm.c
@@ -10714,7 +10714,7 @@ process_bb (EmitContext *ctx, MonoBasicBlock *bb)
 	case OP_ARM64_ADDP_SCALAR: {
 		llvm_ovr_tag_t ovr_tag = INTRIN_vector128 | INTRIN_int64;
 		LLVMValueRef result = call_overloaded_intrins (ctx, INTRINS_AARCH64_ADV_SIMD_UADDV, ovr_tag, &lhs, "arm64_addp_scalar");
-		result = LLVMBuildInsertElement (builder, LLVMGetUndef (v64_i8_t), result, const_int32 (0), "");
+		result = LLVMBuildInsertElement (builder, LLVMConstNull (v64_i8_t), result, const_int32 (0), "");
 		values [ins->dreg] = result;
 		break;
 	}
@@ -10723,7 +10723,7 @@ process_bb (EmitContext *ctx, MonoBasicBlock *bb)
 		LLVMValueRef hi = LLVMBuildExtractElement (builder, lhs, const_int32 (0), "");
 		LLVMValueRef lo = LLVMBuildExtractElement (builder, lhs, const_int32 (1), "");
 		LLVMValueRef result = LLVMBuildFAdd (builder, hi, lo, "arm64_faddp_scalar");
-		result = LLVMBuildInsertElement (builder, LLVMGetUndef (ret_t), result, const_int32 (0), "");
+		result = LLVMBuildInsertElement (builder, LLVMConstNull (ret_t), result, const_int32 (0), "");
 		values [ins->dreg] = result;
 		break;
 	}