[mono] LLVM 11: Explicitly zero the unused bits of the result register for AddPairwiseScalar (#53694)

LLVM 11 and above optimize

    %9 = extractelement <2 x float> %arm64_ld1, i32 0
    %10 = extractelement <2 x float> %arm64_ld1, i32 1
    %arm64_faddp_scalar = fadd float %9, %10
    %11 = insertelement <2 x float> undef, float %arm64_faddp_scalar, i32 0

(which is translated to scalar `faddp`) into

    %shift = shufflevector <2 x float> %arm64_ld1, <2 x float> undef, <2 x i32> <i32 1, i32 undef>
    %10 = fadd <2 x float> %arm64_ld1, %shift
    %11 = shufflevector <2 x float> %10, <2 x float> undef, <2 x i32> <i32 0, i32 undef>

(which is translated to a sequence of `dup` and vector `fadd`). Because lane 1 of the original `insertelement` base is `undef`, this rewrite is legal IR, but lane 1 of the result may no longer be zero, whereas the unused bits of the `AddPairwiseScalar` result register are expected to be zero.

This change works around the problem by explicitly zeroing the unused bits of the results of `AddPairwiseScalar`; the generated code is noisier, but the semantics are correct. The "Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile", version G.a, calls out the zero-extending semantics of scalar operations that write to SIMD registers (see "aarch64/functions/registers/V"), but judging by the generated code, LLVM does not appear to exploit this for optimization.

This also affects `vpadds_f32` in Clang.
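To make the failure concrete, here is a small C model of the two lowerings, treating a 64-bit SIMD register as two `float` lanes. It is a sketch under stated assumptions: the type `v2f32` and both function names are hypothetical, and the `dup`-based version assumes the `dup` broadcasts lane 1 before a full-width vector `fadd`, which is one plausible reading of the emitted code rather than a transcription of it. Both versions agree on lane 0; they differ only in the lane that `undef` left unconstrained.

```c
#include <assert.h>

/* Hypothetical model of a 64-bit SIMD register holding two float lanes. */
typedef struct {
    float lane[2];
} v2f32;

/* Scalar FADDP Sd, Vn.2S: lane 0 gets the pairwise sum and, because a
   scalar write to a SIMD register zero-extends, lane 1 reads back as 0. */
static v2f32 faddp_scalar(v2f32 v) {
    v2f32 r = {{v.lane[0] + v.lane[1], 0.0f}};
    return r;
}

/* Assumed shape of the LLVM 11 lowering: broadcast lane 1 with `dup`,
   then add full vectors. Lane 0 is still the pairwise sum, but lane 1
   now holds v.lane[1] + v.lane[1] instead of 0. */
static v2f32 dup_then_vector_fadd(v2f32 v) {
    v2f32 shift = {{v.lane[1], v.lane[1]}};
    v2f32 r = {{v.lane[0] + shift.lane[0], v.lane[1] + shift.lane[1]}};
    return r;
}
```

For an input of `{1.0f, 2.0f}`, both functions put `3.0f` in lane 0, but `faddp_scalar` leaves `0.0f` in lane 1 while `dup_then_vector_fadd` leaves `4.0f` there.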
dotnet · Jun 4, 2021 · c3126d0
1 parent 90e201e
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/src/mono/mono/mini/mini-llvm.c b/src/mono/mono/mini/mini-llvm.c
@@ -10714,7 +10714,7 @@ process_bb (EmitContext *ctx, MonoBasicBlock *bb)
 		case OP_ARM64_ADDP_SCALAR: {
 			llvm_ovr_tag_t ovr_tag = INTRIN_vector128 | INTRIN_int64;
 			LLVMValueRef result = call_overloaded_intrins (ctx, INTRINS_AARCH64_ADV_SIMD_UADDV, ovr_tag, &lhs, "arm64_addp_scalar");
-			result = LLVMBuildInsertElement (builder, LLVMGetUndef (v64_i8_t), result, const_int32 (0), "");
+			result = LLVMBuildInsertElement (builder, LLVMConstNull (v64_i8_t), result, const_int32 (0), "");
 			values [ins->dreg] = result;
 			break;
 		}
@@ -10723,7 +10723,7 @@ process_bb (EmitContext *ctx, MonoBasicBlock *bb)
 			LLVMValueRef hi = LLVMBuildExtractElement (builder, lhs, const_int32 (0), "");
 			LLVMValueRef lo = LLVMBuildExtractElement (builder, lhs, const_int32 (1), "");
 			LLVMValueRef result = LLVMBuildFAdd (builder, hi, lo, "arm64_faddp_scalar");
-			result = LLVMBuildInsertElement (builder, LLVMGetUndef (ret_t), result, const_int32 (0), "");
+			result = LLVMBuildInsertElement (builder, LLVMConstNull (ret_t), result, const_int32 (0), "");
 			values [ins->dreg] = result;
 			break;
 		}
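The patch changes only the base vector that the scalar result is inserted into: `LLVMGetUndef (...)` becomes `LLVMConstNull (...)`, an all-zero constant of the same type. The C sketch below mimics that choice with a hypothetical helper `insert_lane0` that copies a base vector and overwrites lane 0. Passing a "stale" register value stands in for what LLVM may legally leave in an `undef` lane, and passing a zero vector corresponds to the fixed code. All names here, and the stale contents, are illustrative assumptions, not Mono or LLVM APIs.

```c
#include <assert.h>

/* Hypothetical model of a 64-bit SIMD register holding two float lanes. */
typedef struct {
    float lane[2];
} v2f32;

/* Models LLVMBuildInsertElement (builder, base, value, const_int32 (0), ""):
   the result equals `base` except that lane 0 is replaced by `value`. */
static v2f32 insert_lane0(v2f32 base, float value) {
    base.lane[0] = value;
    return base;
}

/* Before the fix: the base is undef, so whatever the register happened to
   contain (modelled here by `stale`) may survive in lane 1. */
static v2f32 lower_with_undef_base(v2f32 src, v2f32 stale) {
    return insert_lane0(stale, src.lane[0] + src.lane[1]);
}

/* After the fix: the base is all zeros (LLVMConstNull), so lane 1 is
   defined to be 0 and any later optimization must preserve that. */
static v2f32 lower_with_zero_base(v2f32 src) {
    const v2f32 zero = {{0.0f, 0.0f}};
    return insert_lane0(zero, src.lane[0] + src.lane[1]);
}
```

With `src = {1.5f, 2.5f}` and a stale register of `{9.0f, 7.0f}`, both lowerings produce `4.0f` in lane 0, but only `lower_with_zero_base` guarantees `0.0f` in lane 1.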