Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mlir] Improve code re-use in VectorEmulateNarrowType.cpp #123630

Open
banach-space opened this issue Jan 20, 2025 · 4 comments
Open

[mlir] Improve code re-use in VectorEmulateNarrowType.cpp #123630

banach-space opened this issue Jan 20, 2025 · 4 comments
Assignees
Labels

Comments

@banach-space
Copy link
Contributor

banach-space commented Jan 20, 2025

Hi folks!

Having reviewed a few recent PRs for VectorEmulateNarrowType.cpp, e.g.:

I believe it’s time to improve code reusability and unify naming conventions in the file. At ~1.7k lines, it's becoming increasingly difficult to review patches.

Proposal 1

To begin with, I would like for us to start re-using the logic from alignedConversionPrecondition. This hook effectively checks two conditions:

  1. Per-element alignment (e.g. i2 and i8 are aligned, i3 and i8 are not aligned).
  2. Could the input/source vector of sub-byte scalar types (e.g. vector<2xi4>) fit within the target mulit-byte container type (e.g. i8 or i32), this is checked here.

Similar checks are repeated throughout the code, e.g.

In order to enable re-use, it would help to do some minor variable renaming first. Otherwise, it's quite hard to see that basically identical quantity is computed/defined multiple times throughout the code. Rather than implementing this in one big patch, I have split it into 4 independent changes:

These are basically stacked PRs. I've made sure to link the relevant commit (1 per PR) in the summary, so that it's easy for you to find it.

Proposal 2

We should clarify the meaning of "source" and "destination" types. Currently, the usage is ambiguous. For example, for this arith.extsi Op,

  %0 = arith.extsi %arg0 : vector<8xi2> to vector<8xi32>

vector<8xi2> and vector<8xi32> are the "source" and "destination" types, respectively (i.e. that's the interpretation we tend to use).

However, patterns like RewriteAlignedSubByteIntExt introduce vector.bitcast Ops like this:

  %bitcast = vector.bitcast %arg0 : vector<8xi2> to vector<2xi8>

I've noticed that we tend to consider both:

  • vector<2xi8>(the result of vector.bitcast), and vector<8xi32> (the output of arith.extsi)

as the destination type. That should be clarified. I suggest two steps:

  • Avoid "destination" and "source" when referring to the "emulation" logic. Reserve that for the original Op (e.g. arith.extsi above). Better still, use the naming from Op definitions (for arith.extsi that would mean in and out for "source" and "destination").
  • For the "emulation" logic, use "emulated type" and "container name" for the original type (to be emulated, e.g. i2) and the target type that's used for emulation (e.g. i8).

WDYT?

Proposal 3

Document the updated naming in the file.

CC @lialan @ziereis @dcaballe

@llvmbot llvmbot added the mlir label Jan 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 20, 2025

@llvm/issue-subscribers-mlir

Author: Andrzej Warzyński (banach-space)

Hi folks!

Having reviewed a few recent PRs for VectorEmulateNarrowType.cpp, e.g.:

I believe it’s time to improve code reusability and unify naming conventions in the file. At ~1.7k lines, it's becoming increasingly difficult to review patches.

Proposal 1

To begin with, I would like for us to start re-using the logic from alignedConversionPrecondition. This hook effectively checks two conditions:

  1. Per-element alignment (e.g. i2 and i8 are aligned, i3 and i8 are not aligned).
  2. Could the input/source vector of sub-byte scalar types (e.g. vector&lt;2xi4&gt;) fit within the target mulit-byte container type (e.g. i8 or i32), this is checked here.

Similar checks are repeated throughout the code, e.g.

In order to enable re-use, it would help to do some minor variable renaming first. Otherwise, it's quite hard to see that basically identical quantity is computed/defined multiple times throughout the code. Rather than implementing this in one big patch, I have split it into 4 independent changes:

These are basically stacked PRs. I've made sure to link the relevant commit (1 per PR) in the summary, so that it's easy for you to find it.

Proposal 2

We should clarify the meaning of "source" and "destination" types. Currently, the usage is ambiguous. For example, for this arith.extsi Op,

  %0 = arith.extsi %arg0 : vector&lt;8xi2&gt; to vector&lt;8xi32&gt;

vector&lt;8xi2&gt; and vector&lt;8xi32&gt; are the "source" and "destination" types, respectively (i.e. that's the interpretation we tend to use).

However, patterns like RewriteAlignedSubByteIntExt introduce vector.bitcast Ops like this:

  %bitcast = vector.bitcast %arg0 : vector&lt;8xi2&gt; to vector&lt;2xi8&gt;

I've noticed that we tend to consider both:

  • vector&lt;2xi8&gt;(the result of vector.bitcast), and vector&lt;8xi32&gt; (the output of arith.extsi)

as the destination type. That should be clarified. I suggest two steps:

  • Avoid "destination" and "source" when referring to the "emulation" logic. Reserve that for the original Op (e.g. arith.extsi above). Better still, use the naming from Op definitions (for arith.extsi that would mean in and out for "source" and "destination").
  • For the "emulation" logic, use "emulated type" and "container name" for the original type (to be emulated, e.g. i2) and the target type that's used for emulation (e.g. i8).

WDYT?

Proposal 3

Document the updated naming in the file.

CC @lialan @ziereis @dcaballe

@banach-space banach-space self-assigned this Jan 20, 2025
banach-space added a commit to banach-space/llvm-project that referenced this issue Jan 20, 2025
This  PR aims at improving "VectorEmulateNarrowType.cpp". This is mainly
minor refactoring, no major functional changes are made/added.
Implements llvm#123630.

**CHANGE 1**
Renames `srcBits/dstBits` to `oldBits/newBits` to improve consistency in
naming within the file. This is illustrated below:

```cpp
  // Extracted from VectorEmulateNarrowType.cpp
  Type oldElementType = op.getType().getElementType();
  Type newElementType = convertedType.getElementType();

  // BEFORE (mixing old/new and src/dst):
  // int srcBits = oldElementType.getIntOrFloatBitWidth();
  // int dstBits = newElementType.getIntOrFloatBitWidth();

  // AFTER (consistently using old/new):
  int oldBits = oldElementType.getIntOrFloatBitWidth();
  int newBits = newElementType.getIntOrFloatBitWidth();
```

Also adds some comments and unifies related "rewriter notification"
messages.

**CHANGE 2**
Renames the variable "scale". Note, "scale" could mean either:

  * "original-elements-per-emulated-type", or
  * "emulated-elements-per-original-type".

While from the context it is clear that it's always the former (original
type is always a sub-byte type and the emulated type is usually `i8`),
this PR reduces the cognitive load by making this clear.

**CHANGE 3**
Replaces `isUnalignedEmulation` with `isFullyAligned`

Note, `isUnalignedEmulation` is always computed following a
"per-element-alignment" condition:
```cpp
// Check per-element alignment.
if (newBits % oldBits != 0) {
  return rewriter.notifyMatchFailure(op, "unalagined element types");
}

// (...)

bool isUnalignedEmulation = origElements % elementsPerContainerType != 0;
```

Given that `isUnalignedEmulation` captures only one of two conditions
required for "full alignment", it should be re-named as
`isPartiallyUnalignedEmulation`. Instead, I've flipped the condition and
renamed it as `isFullyAligned`:

```cpp
bool isFullyAligned = origElements % elementsPerContainerType == 0;
```

**CHANGE 4**
Unifies various comments throughout the file (for consistency).

**CHANGE 5**
Adds new comments throughout the file and adds TODOs where high-level
comments are missing.

**CHANGE 6**
Update `alignedConversionPrecondition` (1):

This method didn't require the vector type for the "destination"
argument. The underlying element type is sufficient. The corresponding
argument has been renamed as `multiByteScalarTy` - this is meant as the
multi-byte emulated type (`i8`, `i16`, `i32`, etc).

**CHANGE 7**
Update `alignedConversionPrecondition` (2):

In llvm#121298, we replaced `dstElemBitwidt` in this calculation:

```cpp
  const int numSrcElemsPerDestElem = dstElemBitwidth / srcElemBitwidth;
```

with the hard-coded value of 8:
```cpp
  const int numSrcElemsPerDestElem = 8 / srcElemBitwidth;
```

That was correct as for the patterns for which this hook was/is used:

  * `RewriteAlignedSubByteIntExt`,
  * `RewriteAlignedSubByteIntTrunc`.

The destination type (or, more precisely, the emulated type) was always
`i8`.

In this PR, I am switching back to a more generic approach - the
calculation should take into account the bit-width of the emulated type.

Note that at the call sites I am passing `i8` as the emulated type, so the
end-result is effectively identical. However, the intent is clearer, i.e.,
the underlying value is 8 because the emulated type happens to be `i8`
(as opposed using a magic number).

**CHANGE 8**
Update alignedConversionPrecondition (3):

The final check has been replaced with a new helper method,
`isSubByteVecFittable`. This new method is also re-used within the code
and hopefully will allow us more code re-use moving forward (to avoid
re-implementing the same condition).
@dcaballe
Copy link
Contributor

Thanks for bringing this up.

Per-element alignment (e.g. i2 and i8 are aligned, i3 and i8 are not aligned).

Could the input/source vector of sub-byte scalar types (e.g. vector<2xi4>) fit within the target mulit-byte container type (e.g. i8 or i32), this is checked here.

I always thought that the term alignment is misused here. We are checking for the multiplicity/divisibility of the different type sizes. i8 size is a multiple of i2 size, i3 size is not. I think it would be better to use multiple or similar instead of alignment.

For the "emulation" logic, use "emulated type" and "container name" for the original type (to be emulated, e.g. i2) and the target type that's used for emulation (e.g. i8).

Emulated/container or emulated/native types sound good to me. I think we also used "native" somewhere.

@banach-space
Copy link
Contributor Author

I always thought that the term alignment is misused here.

I am glad that you brought it up. Agreed, I think that it's been a source of a lot of confusion. Lets fix it :)

Emulated/container or emulated/native types sound good to me.

I prefer "emulated/container", lets go with that.

So, mindful that #123526 has already received 3 x "+1" , would you be ok with that patch if I used

  • emulatedElemTy + containerElemTy, in place of:
  • oldElementType + newElementType

? That would fix consistency and take us back on the right "naming" path :)

banach-space added a commit that referenced this issue Feb 2, 2025
This is PR 1 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.

This PR renames:
* `srcBits`/`dstBits` + `oldElementType`/`newElementType`

to improve consistency in naming within the file. This is illustrated
below:

```cpp
  // Extracted from VectorEmulateNarrowType.cpp

  // BEFORE (mixing old/new and src/dst):
  // Type oldElementType = op.getType().getElementType();
  // Type newElementType = convertedType.getElementType();

  // int srcBits = oldElementType.getIntOrFloatBitWidth();
  // int dstBits = newElementType.getIntOrFloatBitWidth();

  // AFTER (consistently using emulated/container):
  Type emulatedElemType = op.getType().getElementType();
  Type containerElemType = convertedType.getElementType();

  int emulatedBits = emulatedElemTy.getIntOrFloatBitWidth();
  int containerBits = containerElemTy.getIntOrFloatBitWidth();
```

Also adds some comments and unifies related "rewriter notification"
messages.

**GitHub issue to track this work:**
* #123630
github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Feb 2, 2025
…123526)

This is PR 1 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.

This PR renames:
* `srcBits`/`dstBits` + `oldElementType`/`newElementType`

to improve consistency in naming within the file. This is illustrated
below:

```cpp
  // Extracted from VectorEmulateNarrowType.cpp

  // BEFORE (mixing old/new and src/dst):
  // Type oldElementType = op.getType().getElementType();
  // Type newElementType = convertedType.getElementType();

  // int srcBits = oldElementType.getIntOrFloatBitWidth();
  // int dstBits = newElementType.getIntOrFloatBitWidth();

  // AFTER (consistently using emulated/container):
  Type emulatedElemType = op.getType().getElementType();
  Type containerElemType = convertedType.getElementType();

  int emulatedBits = emulatedElemTy.getIntOrFloatBitWidth();
  int containerBits = containerElemTy.getIntOrFloatBitWidth();
```

Also adds some comments and unifies related "rewriter notification"
messages.

**GitHub issue to track this work:**
* llvm/llvm-project#123630
banach-space added a commit to banach-space/llvm-project that referenced this issue Feb 2, 2025
This is PR 2 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.

This PR renames the variable "scale". Note, "scale" could mean either:

  * "original-elements-per-emulated-type", or
  * "emulated-elements-per-original-type".

While from the context it is clear that it's always the former (original
type is always a sub-byte type and the emulated type is usually `i8`),
this PR reduces the cognitive load by making this clear.

**DEPENDS ON:**
* llvm#123526 123526

Please only review the [top
commit](llvm@d40b31b).

**GitHub issue to track this work**:
llvm#123630
banach-space added a commit to banach-space/llvm-project that referenced this issue Feb 2, 2025
This is PR 2 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.

This PR renames the variable "scale". Note, "scale" could mean either:

  * "original-elements-per-emulated-type", or
  * "emulated-elements-per-original-type".

While from the context it is clear that it's always the former (original
type is always a sub-byte type and the emulated type is usually `i8`),
this PR reduces the cognitive load by making this clear.

**DEPENDS ON:**
* llvm#123526 123526

Please only review the [top
commit](llvm@d40b31b).

**GitHub issue to track this work**:
llvm#123630
banach-space added a commit to banach-space/llvm-project that referenced this issue Feb 3, 2025
This is PR 2 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.

This PR renames the variable "scale". Note, "scale" could mean either:

  * "original-elements-per-emulated-type", or
  * "emulated-elements-per-original-type".

While from the context it is clear that it's always the former (original
type is always a sub-byte type and the emulated type is usually `i8`),
this PR reduces the cognitive load by making this clear.

**DEPENDS ON:**
* llvm#123526 123526

Please only review the [top
commit](llvm@d40b31b).

**GitHub issue to track this work**:
llvm#123630
banach-space added a commit that referenced this issue Feb 6, 2025
This is PR 2 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.

**CHANGE 1** 

Renames the variable "scale". Note, "scale" could mean either:

  * "container-elements-per-emulated-type", or
  * "emulated-elements-per-container-type".

While from the context it is clear that it's always the former (original
type is always a sub-byte type and the emulated type is usually `i8`),
this PR reduces the cognitive load by making this clear.

**CHANGE 2** 

Replaces `isUnalignedEmulation` with `isFullyAligned`

Note, `isUnalignedEmulation` is always computed following a
"per-element-alignment" condition:
```cpp
// Check per-element alignment.
if (containerBits % emulatedBits != 0) {
  return rewriter.notifyMatchFailure(
    op, "impossible to pack emulated elements into container elements "
    "(bit-wise misalignment)");
}

// (...)

bool isUnalignedEmulation = origElements % emulatedPerContainerElem != 0;
```

Given that `isUnalignedEmulation` captures only one of two conditions
required for "full alignment", it should be re-named as
`isPartiallyUnalignedEmulation`. Instead, I've flipped the condition and
renamed it as `isFullyAligned`:

```cpp
bool isFullyAligned = origElements % emulatedPerContainerElem == 0;
```

**CHANGE 3**
  * Unifies various comments throughout the file (for consistency).
* Adds new comments throughout the file and adds TODOs where high-level
    comments are missing.
    
    
**GitHub issue to track this work**:
#123630
github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Feb 6, 2025
…123527)

This is PR 2 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.

**CHANGE 1**

Renames the variable "scale". Note, "scale" could mean either:

  * "container-elements-per-emulated-type", or
  * "emulated-elements-per-container-type".

While from the context it is clear that it's always the former (original
type is always a sub-byte type and the emulated type is usually `i8`),
this PR reduces the cognitive load by making this clear.

**CHANGE 2**

Replaces `isUnalignedEmulation` with `isFullyAligned`

Note, `isUnalignedEmulation` is always computed following a
"per-element-alignment" condition:
```cpp
// Check per-element alignment.
if (containerBits % emulatedBits != 0) {
  return rewriter.notifyMatchFailure(
    op, "impossible to pack emulated elements into container elements "
    "(bit-wise misalignment)");
}

// (...)

bool isUnalignedEmulation = origElements % emulatedPerContainerElem != 0;
```

Given that `isUnalignedEmulation` captures only one of two conditions
required for "full alignment", it should be re-named as
`isPartiallyUnalignedEmulation`. Instead, I've flipped the condition and
renamed it as `isFullyAligned`:

```cpp
bool isFullyAligned = origElements % emulatedPerContainerElem == 0;
```

**CHANGE 3**
  * Unifies various comments throughout the file (for consistency).
* Adds new comments throughout the file and adds TODOs where high-level
    comments are missing.

**GitHub issue to track this work**:
llvm/llvm-project#123630
Icohedron pushed a commit to Icohedron/llvm-project that referenced this issue Feb 11, 2025
This is PR 2 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.

**CHANGE 1** 

Renames the variable "scale". Note, "scale" could mean either:

  * "container-elements-per-emulated-type", or
  * "emulated-elements-per-container-type".

While from the context it is clear that it's always the former (original
type is always a sub-byte type and the emulated type is usually `i8`),
this PR reduces the cognitive load by making this clear.

**CHANGE 2** 

Replaces `isUnalignedEmulation` with `isFullyAligned`

Note, `isUnalignedEmulation` is always computed following a
"per-element-alignment" condition:
```cpp
// Check per-element alignment.
if (containerBits % emulatedBits != 0) {
  return rewriter.notifyMatchFailure(
    op, "impossible to pack emulated elements into container elements "
    "(bit-wise misalignment)");
}

// (...)

bool isUnalignedEmulation = origElements % emulatedPerContainerElem != 0;
```

Given that `isUnalignedEmulation` captures only one of two conditions
required for "full alignment", it should be re-named as
`isPartiallyUnalignedEmulation`. Instead, I've flipped the condition and
renamed it as `isFullyAligned`:

```cpp
bool isFullyAligned = origElements % emulatedPerContainerElem == 0;
```

**CHANGE 3**
  * Unifies various comments throughout the file (for consistency).
* Adds new comments throughout the file and adds TODOs where high-level
    comments are missing.
    
    
**GitHub issue to track this work**:
llvm#123630
@banach-space
Copy link
Contributor Author

NEXT STEPS (1):

We need to clarify the meaning of "source" and "destination" types.
Currently the usage is ambiguous.

For example, for this arith.extsi Op, vector<8xi2> and
vector<8xi32> are the "source" and "destination" types, respectively:

  %0 = arith.extsi %arg0 : vector<8xi2> to vector<8xi32>
}

However, patterns like RewriteAlignedSubByteIntExt introduce
vector.bitcast Ops like this:

  %bitcast = vector.bitcast %arg0 : vector<8xi2> to vector<2xi8>

I've noticed that we tend to mix vector<2xi8> and vector<8xi32> as
the destination types and that should be clarified.

NEXT STEPS (2):

With this PR, I am introducing explicit references to "sub-byte" as
that is effectively what this logic is used of (i.e. for emulating
"sub-byte" types). We should either generalise (which would include
increasing test coverage) or restrict everything to "sub-byte" type
emulation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants