[mlir] Improve code re-use in VectorEmulateNarrowType.cpp #123630

banach-space · 2025-01-20T15:26:16Z

Hi folks!

Having reviewed a few recent PRs for VectorEmulateNarrowType.cpp, e.g.:

I believe it’s time to improve code reusability and unify naming conventions in the file. At ~1.7k lines, it's becoming increasingly difficult to review patches.

Proposal 1

To begin with, I would like for us to start re-using the logic from alignedConversionPrecondition. This hook effectively checks two conditions:

Per-element alignment (e.g. i2 and i8 are aligned, i3 and i8 are not aligned).
Could the input/source vector of sub-byte scalar types (e.g. vector<2xi4>) fit within the target mulit-byte container type (e.g. i8 or i32), this is checked here.

Similar checks are repeated throughout the code, e.g.

Cond 1: check 1, check 2, check 3 ... .
Cond 2: check 1, check 2, check 3, ... .

In order to enable re-use, it would help to do some minor variable renaming first. Otherwise, it's quite hard to see that basically identical quantity is computed/defined multiple times throughout the code. Rather than implementing this in one big patch, I have split it into 4 independent changes:

These are basically stacked PRs. I've made sure to link the relevant commit (1 per PR) in the summary, so that it's easy for you to find it.

Proposal 2

We should clarify the meaning of "source" and "destination" types. Currently, the usage is ambiguous. For example, for this arith.extsi Op,

  %0 = arith.extsi %arg0 : vector<8xi2> to vector<8xi32>

vector<8xi2> and vector<8xi32> are the "source" and "destination" types, respectively (i.e. that's the interpretation we tend to use).

However, patterns like RewriteAlignedSubByteIntExt introduce vector.bitcast Ops like this:

  %bitcast = vector.bitcast %arg0 : vector<8xi2> to vector<2xi8>

I've noticed that we tend to consider both:

vector<2xi8>(the result of vector.bitcast), and vector<8xi32> (the output of arith.extsi)

as the destination type. That should be clarified. I suggest two steps:

Avoid "destination" and "source" when referring to the "emulation" logic. Reserve that for the original Op (e.g. arith.extsi above). Better still, use the naming from Op definitions (for arith.extsi that would mean in and out for "source" and "destination").
For the "emulation" logic, use "emulated type" and "container name" for the original type (to be emulated, e.g. i2) and the target type that's used for emulation (e.g. i8).

WDYT?

Proposal 3

Document the updated naming in the file.

CC @lialan @ziereis @dcaballe

The text was updated successfully, but these errors were encountered:

llvmbot · 2025-01-20T15:26:50Z

@llvm/issue-subscribers-mlir

Author: Andrzej Warzyński (banach-space)

Hi folks!

Having reviewed a few recent PRs for VectorEmulateNarrowType.cpp, e.g.:

I believe it’s time to improve code reusability and unify naming conventions in the file. At ~1.7k lines, it's becoming increasingly difficult to review patches.

Proposal 1

To begin with, I would like for us to start re-using the logic from alignedConversionPrecondition. This hook effectively checks two conditions:

Per-element alignment (e.g. i2 and i8 are aligned, i3 and i8 are not aligned).
Could the input/source vector of sub-byte scalar types (e.g. vector<2xi4>) fit within the target mulit-byte container type (e.g. i8 or i32), this is checked here.

Similar checks are repeated throughout the code, e.g.

Cond 1: check 1, check 2, check 3 ... .
Cond 2: check 1, check 2, check 3, ... .

In order to enable re-use, it would help to do some minor variable renaming first. Otherwise, it's quite hard to see that basically identical quantity is computed/defined multiple times throughout the code. Rather than implementing this in one big patch, I have split it into 4 independent changes:

These are basically stacked PRs. I've made sure to link the relevant commit (1 per PR) in the summary, so that it's easy for you to find it.

Proposal 2

We should clarify the meaning of "source" and "destination" types. Currently, the usage is ambiguous. For example, for this arith.extsi Op,

  %0 = arith.extsi %arg0 : vector&lt;8xi2&gt; to vector&lt;8xi32&gt;

vector<8xi2> and vector<8xi32> are the "source" and "destination" types, respectively (i.e. that's the interpretation we tend to use).

However, patterns like RewriteAlignedSubByteIntExt introduce vector.bitcast Ops like this:

  %bitcast = vector.bitcast %arg0 : vector&lt;8xi2&gt; to vector&lt;2xi8&gt;

I've noticed that we tend to consider both:

vector<2xi8>(the result of vector.bitcast), and vector<8xi32> (the output of arith.extsi)

as the destination type. That should be clarified. I suggest two steps:

Avoid "destination" and "source" when referring to the "emulation" logic. Reserve that for the original Op (e.g. arith.extsi above). Better still, use the naming from Op definitions (for arith.extsi that would mean in and out for "source" and "destination").
For the "emulation" logic, use "emulated type" and "container name" for the original type (to be emulated, e.g. i2) and the target type that's used for emulation (e.g. i8).

WDYT?

Proposal 3

Document the updated naming in the file.

CC @lialan @ziereis @dcaballe

This PR aims at improving "VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no major functional changes are made/added. Implements llvm#123630. **CHANGE 1** Renames `srcBits/dstBits` to `oldBits/newBits` to improve consistency in naming within the file. This is illustrated below: ```cpp // Extracted from VectorEmulateNarrowType.cpp Type oldElementType = op.getType().getElementType(); Type newElementType = convertedType.getElementType(); // BEFORE (mixing old/new and src/dst): // int srcBits = oldElementType.getIntOrFloatBitWidth(); // int dstBits = newElementType.getIntOrFloatBitWidth(); // AFTER (consistently using old/new): int oldBits = oldElementType.getIntOrFloatBitWidth(); int newBits = newElementType.getIntOrFloatBitWidth(); ``` Also adds some comments and unifies related "rewriter notification" messages. **CHANGE 2** Renames the variable "scale". Note, "scale" could mean either: * "original-elements-per-emulated-type", or * "emulated-elements-per-original-type". While from the context it is clear that it's always the former (original type is always a sub-byte type and the emulated type is usually `i8`), this PR reduces the cognitive load by making this clear. **CHANGE 3** Replaces `isUnalignedEmulation` with `isFullyAligned` Note, `isUnalignedEmulation` is always computed following a "per-element-alignment" condition: ```cpp // Check per-element alignment. if (newBits % oldBits != 0) { return rewriter.notifyMatchFailure(op, "unalagined element types"); } // (...) bool isUnalignedEmulation = origElements % elementsPerContainerType != 0; ``` Given that `isUnalignedEmulation` captures only one of two conditions required for "full alignment", it should be re-named as `isPartiallyUnalignedEmulation`. Instead, I've flipped the condition and renamed it as `isFullyAligned`: ```cpp bool isFullyAligned = origElements % elementsPerContainerType == 0; ``` **CHANGE 4** Unifies various comments throughout the file (for consistency). **CHANGE 5** Adds new comments throughout the file and adds TODOs where high-level comments are missing. **CHANGE 6** Update `alignedConversionPrecondition` (1): This method didn't require the vector type for the "destination" argument. The underlying element type is sufficient. The corresponding argument has been renamed as `multiByteScalarTy` - this is meant as the multi-byte emulated type (`i8`, `i16`, `i32`, etc). **CHANGE 7** Update `alignedConversionPrecondition` (2): In llvm#121298, we replaced `dstElemBitwidt` in this calculation: ```cpp const int numSrcElemsPerDestElem = dstElemBitwidth / srcElemBitwidth; ``` with the hard-coded value of 8: ```cpp const int numSrcElemsPerDestElem = 8 / srcElemBitwidth; ``` That was correct as for the patterns for which this hook was/is used: * `RewriteAlignedSubByteIntExt`, * `RewriteAlignedSubByteIntTrunc`. The destination type (or, more precisely, the emulated type) was always `i8`. In this PR, I am switching back to a more generic approach - the calculation should take into account the bit-width of the emulated type. Note that at the call sites I am passing `i8` as the emulated type, so the end-result is effectively identical. However, the intent is clearer, i.e., the underlying value is 8 because the emulated type happens to be `i8` (as opposed using a magic number). **CHANGE 8** Update alignedConversionPrecondition (3): The final check has been replaced with a new helper method, `isSubByteVecFittable`. This new method is also re-used within the code and hopefully will allow us more code re-use moving forward (to avoid re-implementing the same condition).

dcaballe · 2025-01-23T17:20:50Z

Thanks for bringing this up.

Per-element alignment (e.g. i2 and i8 are aligned, i3 and i8 are not aligned).

Could the input/source vector of sub-byte scalar types (e.g. vector<2xi4>) fit within the target mulit-byte container type (e.g. i8 or i32), this is checked here.

I always thought that the term alignment is misused here. We are checking for the multiplicity/divisibility of the different type sizes. i8 size is a multiple of i2 size, i3 size is not. I think it would be better to use multiple or similar instead of alignment.

For the "emulation" logic, use "emulated type" and "container name" for the original type (to be emulated, e.g. i2) and the target type that's used for emulation (e.g. i8).

Emulated/container or emulated/native types sound good to me. I think we also used "native" somewhere.

banach-space · 2025-01-23T18:35:15Z

I always thought that the term alignment is misused here.

I am glad that you brought it up. Agreed, I think that it's been a source of a lot of confusion. Lets fix it :)

Emulated/container or emulated/native types sound good to me.

I prefer "emulated/container", lets go with that.

So, mindful that #123526 has already received 3 x "+1" , would you be ok with that patch if I used

emulatedElemTy + containerElemTy, in place of:
oldElementType + newElementType

? That would fix consistency and take us back on the right "naming" path :)

This is PR 1 in a series of N patches aimed at improving "VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no major functional changes are made/added. This PR renames: * `srcBits`/`dstBits` + `oldElementType`/`newElementType` to improve consistency in naming within the file. This is illustrated below: ```cpp // Extracted from VectorEmulateNarrowType.cpp // BEFORE (mixing old/new and src/dst): // Type oldElementType = op.getType().getElementType(); // Type newElementType = convertedType.getElementType(); // int srcBits = oldElementType.getIntOrFloatBitWidth(); // int dstBits = newElementType.getIntOrFloatBitWidth(); // AFTER (consistently using emulated/container): Type emulatedElemType = op.getType().getElementType(); Type containerElemType = convertedType.getElementType(); int emulatedBits = emulatedElemTy.getIntOrFloatBitWidth(); int containerBits = containerElemTy.getIntOrFloatBitWidth(); ``` Also adds some comments and unifies related "rewriter notification" messages. **GitHub issue to track this work:** * #123630

…123526) This is PR 1 in a series of N patches aimed at improving "VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no major functional changes are made/added. This PR renames: * `srcBits`/`dstBits` + `oldElementType`/`newElementType` to improve consistency in naming within the file. This is illustrated below: ```cpp // Extracted from VectorEmulateNarrowType.cpp // BEFORE (mixing old/new and src/dst): // Type oldElementType = op.getType().getElementType(); // Type newElementType = convertedType.getElementType(); // int srcBits = oldElementType.getIntOrFloatBitWidth(); // int dstBits = newElementType.getIntOrFloatBitWidth(); // AFTER (consistently using emulated/container): Type emulatedElemType = op.getType().getElementType(); Type containerElemType = convertedType.getElementType(); int emulatedBits = emulatedElemTy.getIntOrFloatBitWidth(); int containerBits = containerElemTy.getIntOrFloatBitWidth(); ``` Also adds some comments and unifies related "rewriter notification" messages. **GitHub issue to track this work:** * llvm/llvm-project#123630

This is PR 2 in a series of N patches aimed at improving "VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no major functional changes are made/added. This PR renames the variable "scale". Note, "scale" could mean either: * "original-elements-per-emulated-type", or * "emulated-elements-per-original-type". While from the context it is clear that it's always the former (original type is always a sub-byte type and the emulated type is usually `i8`), this PR reduces the cognitive load by making this clear. **DEPENDS ON:** * llvm#123526 123526 Please only review the [top commit](llvm@d40b31b). **GitHub issue to track this work**: llvm#123630

This is PR 2 in a series of N patches aimed at improving "VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no major functional changes are made/added. **CHANGE 1** Renames the variable "scale". Note, "scale" could mean either: * "container-elements-per-emulated-type", or * "emulated-elements-per-container-type". While from the context it is clear that it's always the former (original type is always a sub-byte type and the emulated type is usually `i8`), this PR reduces the cognitive load by making this clear. **CHANGE 2** Replaces `isUnalignedEmulation` with `isFullyAligned` Note, `isUnalignedEmulation` is always computed following a "per-element-alignment" condition: ```cpp // Check per-element alignment. if (containerBits % emulatedBits != 0) { return rewriter.notifyMatchFailure( op, "impossible to pack emulated elements into container elements " "(bit-wise misalignment)"); } // (...) bool isUnalignedEmulation = origElements % emulatedPerContainerElem != 0; ``` Given that `isUnalignedEmulation` captures only one of two conditions required for "full alignment", it should be re-named as `isPartiallyUnalignedEmulation`. Instead, I've flipped the condition and renamed it as `isFullyAligned`: ```cpp bool isFullyAligned = origElements % emulatedPerContainerElem == 0; ``` **CHANGE 3** * Unifies various comments throughout the file (for consistency). * Adds new comments throughout the file and adds TODOs where high-level comments are missing. **GitHub issue to track this work**: #123630

…123527) This is PR 2 in a series of N patches aimed at improving "VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no major functional changes are made/added. **CHANGE 1** Renames the variable "scale". Note, "scale" could mean either: * "container-elements-per-emulated-type", or * "emulated-elements-per-container-type". While from the context it is clear that it's always the former (original type is always a sub-byte type and the emulated type is usually `i8`), this PR reduces the cognitive load by making this clear. **CHANGE 2** Replaces `isUnalignedEmulation` with `isFullyAligned` Note, `isUnalignedEmulation` is always computed following a "per-element-alignment" condition: ```cpp // Check per-element alignment. if (containerBits % emulatedBits != 0) { return rewriter.notifyMatchFailure( op, "impossible to pack emulated elements into container elements " "(bit-wise misalignment)"); } // (...) bool isUnalignedEmulation = origElements % emulatedPerContainerElem != 0; ``` Given that `isUnalignedEmulation` captures only one of two conditions required for "full alignment", it should be re-named as `isPartiallyUnalignedEmulation`. Instead, I've flipped the condition and renamed it as `isFullyAligned`: ```cpp bool isFullyAligned = origElements % emulatedPerContainerElem == 0; ``` **CHANGE 3** * Unifies various comments throughout the file (for consistency). * Adds new comments throughout the file and adds TODOs where high-level comments are missing. **GitHub issue to track this work**: llvm/llvm-project#123630

This is PR 2 in a series of N patches aimed at improving "VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no major functional changes are made/added. **CHANGE 1** Renames the variable "scale". Note, "scale" could mean either: * "container-elements-per-emulated-type", or * "emulated-elements-per-container-type". While from the context it is clear that it's always the former (original type is always a sub-byte type and the emulated type is usually `i8`), this PR reduces the cognitive load by making this clear. **CHANGE 2** Replaces `isUnalignedEmulation` with `isFullyAligned` Note, `isUnalignedEmulation` is always computed following a "per-element-alignment" condition: ```cpp // Check per-element alignment. if (containerBits % emulatedBits != 0) { return rewriter.notifyMatchFailure( op, "impossible to pack emulated elements into container elements " "(bit-wise misalignment)"); } // (...) bool isUnalignedEmulation = origElements % emulatedPerContainerElem != 0; ``` Given that `isUnalignedEmulation` captures only one of two conditions required for "full alignment", it should be re-named as `isPartiallyUnalignedEmulation`. Instead, I've flipped the condition and renamed it as `isFullyAligned`: ```cpp bool isFullyAligned = origElements % emulatedPerContainerElem == 0; ``` **CHANGE 3** * Unifies various comments throughout the file (for consistency). * Adds new comments throughout the file and adds TODOs where high-level comments are missing. **GitHub issue to track this work**: llvm#123630

banach-space · 2025-03-08T09:19:44Z

NEXT STEPS (1):

We need to clarify the meaning of "source" and "destination" types.
Currently the usage is ambiguous.

For example, for this arith.extsi Op, vector<8xi2> and
vector<8xi32> are the "source" and "destination" types, respectively:

  %0 = arith.extsi %arg0 : vector<8xi2> to vector<8xi32>
}

However, patterns like RewriteAlignedSubByteIntExt introduce
vector.bitcast Ops like this:

  %bitcast = vector.bitcast %arg0 : vector<8xi2> to vector<2xi8>

I've noticed that we tend to mix vector<2xi8> and vector<8xi32> as
the destination types and that should be clarified.

NEXT STEPS (2):

With this PR, I am introducing explicit references to "sub-byte" as
that is effectively what this logic is used of (i.e. for emulating
"sub-byte" types). We should either generalise (which would include
increasing test coverage) or restrict everything to "sub-byte" type
emulation.

llvmbot added the mlir label Jan 20, 2025

banach-space self-assigned this Jan 20, 2025

banach-space mentioned this issue Jan 20, 2025

[mlir][Vector] Update VectorEmulateNarrowType.cpp #123633

Closed

banach-space mentioned this issue Jan 24, 2025

[MLIR] Implement emulation of static indexing subbyte type vector stores #115922

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[mlir] Improve code re-use in VectorEmulateNarrowType.cpp #123630

[mlir] Improve code re-use in VectorEmulateNarrowType.cpp #123630

banach-space commented Jan 20, 2025 •

edited

Loading

llvmbot commented Jan 20, 2025

Proposal 1

Proposal 2

Proposal 3

dcaballe commented Jan 23, 2025

banach-space commented Jan 23, 2025

banach-space commented Mar 8, 2025

[mlir] Improve code re-use in VectorEmulateNarrowType.cpp #123630

[mlir] Improve code re-use in VectorEmulateNarrowType.cpp #123630

Comments

banach-space commented Jan 20, 2025 • edited Loading

Proposal 1

Proposal 2

Proposal 3

llvmbot commented Jan 20, 2025

Proposal 1

Proposal 2

Proposal 3

dcaballe commented Jan 23, 2025

banach-space commented Jan 23, 2025

banach-space commented Mar 8, 2025

banach-space commented Jan 20, 2025 •

edited

Loading