Commit
Enable EVEX feature: embedded broadcast for Vector128/256/512.Add() in limited cases (#84821)

* Enable EVEX feature: embedded broadcast
  Embedded broadcast is enabled in Vector256<float>.Add() in limited cases:
  1. Vector256.Add(Vec, Vector256.Create(DCon));
  2. Vector256<float> VecCns = Vector256.Create(DCon); Vector256.Add(Vec, VecCns);
  3. Vector256.Add(Vec, Vector256.Create(LCL_VAR));
  4. Vector256<float> VecCns = Vector256.Create(LCL_VAR); Vector256.Add(Vec, VecCns);
  Note: Cases 2 and 4 can only be optimized when DOTNET_TieredCompilation = 0.
* remove some irrelevant changes from previous main.
* Enable containment at the Broadcast intrinsic to improve embedded broadcast enabling.
* Convert the check logic on broadcast into a flag
* bug fixes:
  1. fixed the containment logic at lowering to accommodate the situation where both operands of an EB-compatible node are EB candidates.
  2. fixed some unexpected EVEX.b bits set on some non-EVEX instructions on x86
* apply format patch.
* Add "insOpts" data structure to xarch: insOpts may contain information on the EVEX.b bit, currently only embedded broadcast
* Add "OperIsBroadcastScalar" check: this check ensures the intrinsic is actually a broadcast scalar intrinsic. It is needed because gentree flags use overlapping definitions and GTF_BROADCAST_EMBEDDED conflicts with some of them, so we need to ensure the flag we check does not come from another overlapping flag.
* rebase the branch and resolve conflicts
* changes based on the reviews:
  1. removed the gentree flag GTF_EMBEDDED_BROADCAST.
  2. mark the embedded broadcast node by making it contained.
  3. improved the logic in GetMemOpSize() to return the correct pointer size when embedded broadcast is enabled.
  4. improved the logic in genOperandDesc() to emit a scalar when a constant vector operand is found to be created from a scalar.
* apply format patch
* bug fixes
* bug fixes
* apply format patch
* Enable embedded broadcast for Vector128<float>.Add
* Enable embedded broadcast for Vector512<float>.Add
* make double supported by embedded broadcast
* Add EB support to AVX_BroadcastScalarToVector*
* apply format patch
* Enable embedded broadcast for double const vector
* Enable embedded broadcast for integer Add.
* Changes based on the review:
  1. Change GenTreeHWIntrinsic::OperIsEmbBroadcastHWIntrinsic to OperIsEmbBroadcastCompatible
  2. removed OperIsBroadcastScalar
  3. formatting
  4. correct errors in the comments.
* removed the gentree flag: GTF_VECCON_FROMSCALAR
* Bug fixes on embedded broadcast with AVX_Broadcast
* enable embedded broadcast in R_R_A path
* apply format patch
* bug fixes: re-introduce "OperIsBroadcastScalar". There are cases where a non-broadcast node (e.g. Load, Read) is contained by an embedded broadcast node and embedded broadcast is enabled unexpectedly; this method filters out those cases.
* Changes based on reviews:
  1. code style improvement
  2. fixed typos and errors in the comments.
  3. extract the operand swap logic when lowering a Create node into a function: TryCanonizeEmbBroadcastCandicate()
* unfold the VecCon node when lowering if this node is eligible for embedded broadcast.
* apply format patch
* bug fixes:
  1. added missing default branch
  2. filter out some possible embedded broadcast cases for better optimization
* resolve the mishandling of the previous conflict.
* move the unfolding logic to ContainChecks
* Code changes based on the review
* apply format patch
* support embedded broadcast for GT_IND as the operand of a broadcast node.
* bug fixes: Long type should only be used on 64-bit systems.
* apply format patch
* Introduce MakeHWIntrinsicSrcContained(): this function handles the case where a constant vector is the operand of an embedded broadcast op. If the constant vector is eligible for embedded broadcast, it is unfolded into the corresponding broadcast intrinsic form.
* Code changes based on reviews:
  1. a helper function to detect the embedded broadcast compatible flag
  2. containment logic improvement.
  3. typo fixes.
* Code changes based on review
* apply format patch
* Code changes based on review:
  1. deleted irrelevant comments.
  2. Move the contain check up to cover more cases.
* Code changes based on review:
  1. Update comment to keep up with the changes in InstrDesc.
  2. Removed unneeded argument in the irrelevant method.
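
The four Vector256<float>.Add() cases named in the commit message can be written out as a small C# program, which may help when inspecting the generated code. This is only a sketch: the class name `EmbeddedBroadcastCases`, the method names `Case1` to `Case4`, and the constant `1.5f` are hypothetical and not taken from the change itself. Whether the JIT actually folds the broadcast into the add depends on the runtime version and on hardware support for EVEX encoding; the computed results are the same either way.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

// Hypothetical illustration of the four source patterns from the commit message.
public static class EmbeddedBroadcastCases
{
    // Case 1: a constant is broadcast inline as the second operand of Add.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Vector256<float> Case1(Vector256<float> vec)
        => Vector256.Add(vec, Vector256.Create(1.5f));

    // Case 2: the constant broadcast is stored in a local before the Add.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Vector256<float> Case2(Vector256<float> vec)
    {
        Vector256<float> vecCns = Vector256.Create(1.5f);
        return Vector256.Add(vec, vecCns);
    }

    // Case 3: a local variable (LCL_VAR in JIT terms) is broadcast inline.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Vector256<float> Case3(Vector256<float> vec, float scalar)
        => Vector256.Add(vec, Vector256.Create(scalar));

    // Case 4: the broadcast of a local variable is stored in a local first.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Vector256<float> Case4(Vector256<float> vec, float scalar)
    {
        Vector256<float> vecCns = Vector256.Create(scalar);
        return Vector256.Add(vec, vecCns);
    }

    public static void Main()
    {
        Vector256<float> vec = Vector256.Create(1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f);

        // All four calls add the same scalar to every element of vec.
        Console.WriteLine(Case1(vec));
        Console.WriteLine(Case2(vec));
        Console.WriteLine(Case3(vec, 1.5f));
        Console.WriteLine(Case4(vec, 1.5f));
    }
}
```

Per the note in the commit message, cases 2 and 4 are only expected to be optimized when the program runs with `DOTNET_TieredCompilation=0`. To see which instruction form was emitted for each method, one option (assuming a .NET 7 or later runtime) is to set the `DOTNET_JitDisasm` environment variable to the method names and compare the disassembly.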