-
Notifications
You must be signed in to change notification settings - Fork 758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GlobalTypeOptimization: Reorder fields in order to remove them #6820
Conversation
@jakobkummerow Is there any possible downside to Binaryen reordering WasmGC struct fields like this? In C/C++ that could be bad because alignment/padding issues in the C ABI could force larger struct sizes after reordering, but I hope wasm VMs don't have such issues? |
Jakob implemented some heuristic for packing fields in V8 earlier for optimizing memory (users of high level languages doesn't think about this when they declare members, it is handled at VM level). |
@gkdn Ah, good to know. So J2CL doesn't optimize the order itself, that is, this PR would not be undoing some optimal order that you computed? |
// Check for reads or writes in ourselves and our supers. If there are | ||
// none, then operations only happen in our strict subtypes, and those | ||
// subtypes can define the field there, and we don't need it here. | ||
auto hasNoReadsOrWritesInSupers = !sub[i].hasRead && !sub[i].hasWrite; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little confusing that to look at what the supertypes do, we look at sub
. Are there better names we could give things here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe that isn't clearest. The idea is that sub
has data propagated to subtypes, so if there is nothing there, nothing from our supertypes was propagated to us. I renamed to dataFromSupers
.
for (Index i = 0; i < superIndexes.size(); ++i) { | ||
auto superIndex = superIndexes[i]; | ||
if (superIndex == RemovedField) { | ||
if (!removableIndexes.count(i)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be slightly clearer to invert the if-else to avoid the negation in the condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
// Update |next| to refer to the next available index. Due to | ||
// possible reordering in the parent, we may not see indexes in | ||
// order here, so just take the max at each point in time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this isn't just a matter of looking at the size of the supertype's field vector because we've left tombstone values in it? I wonder if the code would get any simpler if we recorded the new fields as a list of indices into the old fields rather than as a list of indices into the new fields that has to include tombstones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the data structure we need is "given this old index, find me the new index". That guides all the work we do in the updating stage for StructNew/StructGet/StructSet operations, all of which have old indexes and need new ones. So I think it would get more complex to flip this data structure.
Co-authored-by: Thomas Lively <[email protected]>
@kripken Nope J2CL doesn't optimize it as it would be moot after optimizations. |
auto& dataFromSubsAndSupers = dataFromSubsAndSupersMap[type]; | ||
auto& dataFromSupers = dataFromSupersMap[type]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need the "dataFrom" prefix if you want to tighten this up. No strong feeling either way, though.
Thanks for the info @jakobkummerow , great, I'll land this. |
(type $A (sub (struct (field i32) (field i64) (field f32) (field f64) (field anyref) (field v128) (field nullref)))) | ||
|
||
;; CHECK: (type $C (sub $A (struct (field i64) (field v128) (field nullref) (field f64) (field anyref)))) | ||
(type $C (sub $A (struct (field i32) (field i64) (field f32) (field f64) (field anyref) (field v128) (field nullref)))) | ||
|
||
;; CHECK: (type $B (sub $A (struct (field i64) (field v128) (field nullref) (field f32) (field anyref)))) | ||
(type $B (sub $A (struct (field i32) (field i64) (field f32) (field f64) (field anyref) (field v128) (field nullref)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new test seems to have made the fuzzer uncover an existing bug: we don't validate the features used in heap types during validation. When the fuzzer uses this test as initial contents with SIMD disabled, it thinks the module validates, then it ends up generating SIMD instructions to access the v128 fields and failing validation later.
I can fix this as a follow-on to my planned updates to the type gather API, but in the short term, if the fuzzer failures are causing issues, a quick fix would be to change these fields from v128 to something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed such an error as well. I believe the issue is that we don't validate the unused parts of rec groups, like here:
(module
(rec
(type $func (func))
(type $unused (sub (struct (field v128))))
)
(func $func (type $func))
)
That v128 is not used in any used type, but the rec group pulls in that type, and then the fuzzer ends up doing something with it. Does that sound right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't seem to validate the types at all in any central location, used or unused, although I supposed the used types should be validated at the location of their use. That kind of decentralized type validation seems error-prone, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed, we just validate in specific places. Another option could be to run collectHeapTypes
and validate them all, but that wouldn't handle v128. Not sure I have a great idea here, but maybe just adding rec type traversal by itself, as I was suggesting, is not the best improvement. I'll open a PR to modify this test for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running collectHeapTypes
is what I had in mind. Why wouldn't that handle v128
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean because v128 isn't a heap type. But I guess you meant that it would be found by traversing heap types + their rec groups? Yes, that can work.
I am hoping we can avoid another pass on the module in validation, though. That would make everything slower, unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I was thinking we would call getFeatures
on each heap type (including unused members of used rec groups) and compare the results to the enabled features. This would catch the v128 field requiring SIMD.
We don't properly validate that yet. E.g.: (module (rec (type $func (func)) (type $unused (sub (struct (field v128)))) ) (func $func (type $func)) ) That v128 is not used, but it ends up in the output because it is in a rec group that is used. Atm we do not require that SIMD be enabled in such a case, which can trip up the fuzzer. Context: #6820. For now, modify the test that uncovered this.
Before, we only removed fields from the end of a struct. If we had, say
If
y
is only used in Bar, but never Foo, then we still kept it around, becauseif we removed it from Foo we'd end up with
Foo = {x, z}, Bar = {x, y, z}
whichis invalid - Bar no longer extends Foo. But we can do this if we first reorder
the two:
And the optimized form is
This lets us remove all fields possible in all cases AFAIK.
This situation is not super-common, as most fields are actually used both
up and down the hierarchy (if they are used at all), but testing on some
large real-world codebases, I see 10 fields removed in Java, 45 in Kotlin,
and 31 in Dart. This might lead to speedups if any of those objects are
constructed many times, but I'm not sure of that.
The NFC change to
src/wasm-type-ordering.h
was needed for this tocompile, and I can't understand why it was needed, but maybe I'm missing
something there?