New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

GlobalTypeOptimization: Reorder fields in order to remove them #6820

Merged

kripken merged 29 commits into WebAssembly:main from kripken:gto.reorder

Aug 12, 2024

Member

kripken commented Aug 7, 2024

Before, we only removed fields from the end of a struct. If we had, say

struct Foo {
  int x;
  int y;
  int z;
};

// Add no fields but inherit the parent's.
struct Bar : Foo {};

If y is only used in Bar, but never Foo, then we still kept it around, because
if we removed it from Foo we'd end up with Foo = {x, z}, Bar = {x, y, z} which
is invalid - Bar no longer extends Foo. But we can do this if we first reorder
the two:

struct Foo {
  int x;
  int z;
  int y; // now y is at the end
};

struct Bar : Foo {};

And the optimized form is

struct Foo {
  int x;
  int z;
};

struct Bar : Foo {
  int y; // now y is added in Bar
};

This lets us remove all fields possible in all cases AFAIK.

This situation is not super-common, as most fields are actually used both
up and down the hierarchy (if they are used at all), but testing on some
large real-world codebases, I see 10 fields removed in Java, 45 in Kotlin,
and 31 in Dart. This might lead to speedups if any of those objects are
constructed many times, but I'm not sure of that.

The NFC change to src/wasm-type-ordering.h was needed for this to
compile, and I can't understand why it was needed, but maybe I'm missing
something there?

kripken added 25 commits

August 1, 2024 15:59


          reorder

d95bbd9

sus

bcb8d3d

bet

26fb4a4


          done

44140b4


          totally.done

376d243

fix

1fe9f78


          buildz

fd207e5

fix

1a996b8


          work

171f26e


          test

ce7850c


          test

17c69e8


          test

1bcd12c

bug

adcb5f3

fix

1040afd

fix

914b2a0

fix

526ff0d

fix

fd18fa0


          test

66197e7


          Merge remote-tracking branch 'origin/main' into gto.reorder

8f69598


          assert

2b0fa09


          work

d4041fd


          Merge remote-tracking branch 'origin/main' into gto.reorder

4ecd9c2


          undo

8cc4340


          notes

3f5eaa1


          comments

82a4f14

kripken requested a review from tlively

August 7, 2024 18:54

Member Author

kripken commented Aug 7, 2024

@jakobkummerow Is there any possible downside to Binaryen reordering WasmGC struct fields like this? In C/C++ that could be bad because alignment/padding issues in the C ABI could force larger struct sizes after reordering, but I hope wasm VMs don't have such issues?

Contributor

gkdn commented Aug 7, 2024

@jakobkummerow Is there any possible downside to Binaryen reordering WasmGC struct fields like this? In C/C++ that could be bad because alignment/padding issues in the C ABI could force larger struct sizes after reordering, but I hope wasm VMs don't have such issues?

Jakob implemented some heuristic for packing fields in V8 earlier for optimizing memory (users of high level languages doesn't think about this when they declare members, it is handled at VM level).

Member Author

kripken commented Aug 7, 2024

@gkdn Ah, good to know. So J2CL doesn't optimize the order itself, that is, this PR would not be undoing some optimal order that you computed?

tlively reviewed

View reviewed changes

src/passes/GlobalTypeOptimization.cpp Outdated

+                      // Check for reads or writes in ourselves and our supers. If there are
+                      // none, then operations only happen in our strict subtypes, and those
+                      // subtypes can define the field there, and we don't need it here.
+                      auto hasNoReadsOrWritesInSupers = !sub[i].hasRead && !sub[i].hasWrite;

Member

tlively Aug 7, 2024

It's a little confusing that to look at what the supertypes do, we look at sub. Are there better names we could give things here?

Member Author

kripken Aug 7, 2024

Yeah, maybe that isn't clearest. The idea is that sub has data propagated to subtypes, so if there is nothing there, nothing from our supertypes was propagated to us. I renamed to dataFromSupers.

src/passes/GlobalTypeOptimization.cpp Outdated

+                        for (Index i = 0; i < superIndexes.size(); ++i) {
+                          auto superIndex = superIndexes[i];
+                          if (superIndex == RemovedField) {
+                            if (!removableIndexes.count(i)) {

Member

tlively Aug 7, 2024

It would be slightly clearer to invert the if-else to avoid the negation in the condition.

Member Author

kripken Aug 7, 2024

Sure, done.

src/passes/GlobalTypeOptimization.cpp

Comment on lines +297 to +299

+                            // Update |next| to refer to the next available index. Due to
+                            // possible reordering in the parent, we may not see indexes in
+                            // order here, so just take the max at each point in time.

Member

tlively Aug 7, 2024

I guess this isn't just a matter of looking at the size of the supertype's field vector because we've left tombstone values in it? I wonder if the code would get any simpler if we recorded the new fields as a list of indices into the old fields rather than as a list of indices into the new fields that has to include tombstones.

Member Author

kripken Aug 7, 2024

Overall the data structure we need is "given this old index, find me the new index". That guides all the work we do in the updating stage for StructNew/StructGet/StructSet operations, all of which have old indexes and need new ones. So I think it would get more complex to flip this data structure.

src/passes/GlobalTypeOptimization.cpp Outdated Show resolved Hide resolved

kripken and others added 4 commits

August 7, 2024 15:49


          rename

b00e7c1


          format

13f18f1


          reorder to slightly simplify

f88f6b0


          Update src/passes/GlobalTypeOptimization.cpp

3ff0dd4

Co-authored-by: Thomas Lively <[email protected]>

Contributor

gkdn commented Aug 7, 2024

@kripken Nope J2CL doesn't optimize it as it would be moot after optimizations.

tlively approved these changes

View reviewed changes

src/passes/GlobalTypeOptimization.cpp

Comment on lines +180 to +181

    
                    auto& dataFromSubsAndSupers = dataFromSubsAndSupersMap[type];

                    auto& dataFromSupers = dataFromSupersMap[type];

Member

tlively Aug 8, 2024

We probably don't need the "dataFrom" prefix if you want to tighten this up. No strong feeling either way, though.

Contributor

jakobkummerow commented Aug 9, 2024

Just to confirm: as @gkdn said, V8 chooses the physical offsets of struct fields to optimize packing density, so I have no size concerns here.

(If you're curious about the details, see here. Since objects on our heap are only 4-byte aligned, remembering one gap is enough.)

Member Author

kripken commented Aug 12, 2024

Thanks for the info @jakobkummerow , great, I'll land this.

kripken merged commit a4f9128 into WebAssembly:main

13 checks passed

kripken deleted the gto.reorder branch

August 12, 2024 19:30

tlively reviewed

View reviewed changes

test/lit/passes/gto-removals.wast

Comment on lines +1036 to +1042

+                  (type $A (sub (struct (field i32) (field i64) (field f32) (field f64) (field anyref) (field v128) (field nullref))))
+                  ;; CHECK:       (type $C (sub $A (struct (field i64) (field v128) (field nullref) (field f64) (field anyref))))
+                  (type $C (sub $A (struct (field i32) (field i64) (field f32) (field f64) (field anyref) (field v128) (field nullref))))
+                  ;; CHECK:       (type $B (sub $A (struct (field i64) (field v128) (field nullref) (field f32) (field anyref))))
+                  (type $B (sub $A (struct (field i32) (field i64) (field f32) (field f64) (field anyref) (field v128) (field nullref))))

Member

tlively Aug 15, 2024

This new test seems to have made the fuzzer uncover an existing bug: we don't validate the features used in heap types during validation. When the fuzzer uses this test as initial contents with SIMD disabled, it thinks the module validates, then it ends up generating SIMD instructions to access the v128 fields and failing validation later.

I can fix this as a follow-on to my planned updates to the type gather API, but in the short term, if the fuzzer failures are causing issues, a quick fix would be to change these fields from v128 to something else.

Member Author

kripken Aug 15, 2024

I just noticed such an error as well. I believe the issue is that we don't validate the unused parts of rec groups, like here:

(module
 (rec
  (type $func (func))
  (type $unused (sub (struct (field v128))))
 )
 (func $func (type $func))
)

That v128 is not used in any used type, but the rec group pulls in that type, and then the fuzzer ends up doing something with it. Does that sound right?

Member

tlively Aug 15, 2024

We don't seem to validate the types at all in any central location, used or unused, although I supposed the used types should be validated at the location of their use. That kind of decentralized type validation seems error-prone, though.

Member Author

kripken Aug 15, 2024

Yeah, agreed, we just validate in specific places. Another option could be to run collectHeapTypes and validate them all, but that wouldn't handle v128. Not sure I have a great idea here, but maybe just adding rec type traversal by itself, as I was suggesting, is not the best improvement. I'll open a PR to modify this test for now.

Member

tlively Aug 15, 2024

Running collectHeapTypes is what I had in mind. Why wouldn't that handle v128?

Member Author

kripken Aug 15, 2024

I mean because v128 isn't a heap type. But I guess you meant that it would be found by traversing heap types + their rec groups? Yes, that can work.

I am hoping we can avoid another pass on the module in validation, though. That would make everything slower, unfortunately.

Member

tlively Aug 15, 2024

Oh yeah, I was thinking we would call getFeatures on each heap type (including unused members of used rec groups) and compare the results to the enabled features. This would catch the v128 field requiring SIMD.

kripken mentioned this pull request

[NFC] Avoid v128 in rec groups with no other v128 uses #6843

Merged

kripken added a commit that referenced this pull request


          [NFC] Avoid v128 in rec groups with no other v128 uses (#6843)

c2b4380

We don't properly validate that yet. E.g.:

(module
 (rec
  (type $func (func))
  (type $unused (sub (struct (field v128))))
 )
 (func $func (type $func))
)

That v128 is not used, but it ends up in the output because it is in a rec group that is used.
Atm we do not require that SIMD be enabled in such a case, which can trip up the fuzzer.

Context: #6820. For now, modify the test that uncovered this.

gkdn mentioned this pull request

stringconsts gkdn/binaryen#1

Closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet