Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EH] Fix pop enclosed within a block in DCE #6922

Merged
merged 10 commits into from
Sep 11, 2024
Merged

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Sep 10, 2024

#6400 fixed this case but that handled only when a pop is an immediate child of the current expression, but a pop can be nested deeper down.

We conservatively run the EH fixup whenever we have a pop and create blocks in the optimization. We considered using FindAll<Pop> to make it precise, but we decided the quadratic time plexity was not worth it.

Fixes #6918.

 WebAssembly#6400 fixed this case but that handled only when a `pop` is an
immediate child of the current expression, but a `pop` can be nested
deeper down.

Fixes WebAssembly#6918.
@aheejin aheejin requested a review from kripken September 10, 2024 00:44
@@ -110,7 +107,8 @@ struct DeadCodeElimination
replaceCurrent(remainingChildren[0]);
} else {
replaceCurrent(builder.makeBlock(remainingChildren));
if (hasPop) {
if (getModule()->features.hasExceptionHandling() &&
!FindAll<Pop>(curr).list.empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little worried about quadratic behavior here. Imagine

(block (result i32)
  (block (result i32)
    ..
      (block (result i32)
        (use (pop))
        (unreachable)
      )
    ..
  )
)

I think we'd do FindAll on each of those blocks as we turn them unreachable, in post-order?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was worried about that too... Do you have other suggestions to fix this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the new EH (exnref) is more common, it would be nice if we are able to detect whether we are using the legacy EH and run this only for them...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, one option might be to just note if we ever see a Pop, and if we ever optimize a control flow structure. If so we can run EHUtils::handleBlockNestedPops at the end.

We could also make that a little more precise by tracking if the Pop is actually in an optimized control flow structure, but I'm not sure how easy that would be, or if it would be worthwhile. Might be worth measuring.

About legacy EH, good point, we can also track if we ever see a legacy try-catch (or can a Pop only ever be there?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, one option might be to just note if we ever see a Pop, and if we ever optimize a control flow structure. If so we can run EHUtils::handleBlockNestedPops at the end.

Yeah I think we can make something like

binaryen/src/pass.h

Lines 477 to 484 in 9d3f8e5

// Whether this pass modifies Binaryen IR in ways that may require fixups for
// non-nullable locals to validate according to the wasm spec. If the pass
// adds locals not in that form, or moves code around in ways that might break
// that validation, this must return true. In that case the pass runner will
// automatically run the necessary fixups afterwards.
//
// For more details see the LocalStructuralDominance class.
virtual bool requiresNonNullableLocalFixups() { return true; }

for EH fixups. (The way that non-local fixup is done conservative in way that unless we find a reason to not run the fixup, we run it) I actually tried this a while ago for the EH fixup but was kind of overwhelmed by the number of passes and dropped it, but if we make the criteria as simple as what you suggested, 1. Do we use pop 2. Do we run control flow opts, it may not take that long time.

Also I think it is worthwhile to consider only try-catch (not catch_all), given that pop is only generated with catch and catch is a far less common that catch_all. catch is generated only when a user explicitly use try-catch clause in C++. I'm not sure about other languages but I guess it would be similar. But can this info be considered in a pass-level method like requiresNonNullableLocalFixups?

We could also make that a little more precise by tracking if the Pop is actually in an optimized control flow structure, but I'm not sure how easy that would be, or if it would be worthwhile. Might be worth measuring.

How can this be done? Can this be done without quadratic time complexity?

Copy link
Member

@kripken kripken Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside of requiresNonNullableLocalFixups() is that when it is true then the pass runner does extra work after the pass, every time. So even if the pass does nothing, or doesn't change a local, it still happens. Those fixups are fast enough that it isn't too bad, but in general it's better for rarely-needed fixups to be done in the pass. At least when there's a good way for the pass to detect when they are needed, that is more efficient I think.

How can this be done? Can this be done without quadratic time complexity?

Using ControlFlowWalker we have a stack of control flow structures, and then when we visit a Pop we can mark all of them as containing a nested Pop. If the number of pops is O(1) then this would be O(N) overall?

But if there can be many Pops, maybe it isn't worth it, and we can just run the fixups if we've ever seen a Pop + ever optimized.

Copy link
Member Author

@aheejin aheejin Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this be done? Can this be done without quadratic time complexity?

Using ControlFlowWalker we have a stack of control flow structures, and then when we visit a Pop we can mark all of them as containing a nested Pop. If the number of pops is O(1) then this would be O(N) overall?

Not sure if I understand. Do you mean we should create a ControlFlowWalker for all control-flow changing operations (which some of them already use but not all do) and check the stack every time we make control flow changing modifications? Wouldn't it be O(height of control flow stack * # of expressions that need control flow changes)?

But if there can be many Pops, maybe it isn't worth it, and we can just run the fixups if we've ever seen a Pop + ever optimized.

Is there a convenient way to gather the list of control flow changing passes? Passes using ControlFlowWalker can be a good indication but not all of them change control flow and also there are others that don't use it change control flow (like DCE).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we should create a ControlFlowWalker for all control-flow changing operations (

Sorry, no, I meant that the main class could be changed from PostWalker to ControlFlowWalker. So we would update the stack of control flow structures as we go, constantly.

Is there a convenient way to gather the list of control flow changing passes?

Hmm, I don't think so. Each pass would need to be investigated specifically.

But I don't think we need a broader change here? Just fixing this pass should be enough. Unless I'm missing your point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I thought we were talking about the whole binaryen-level solution... (which may be necessary at some point) Yeah, fixing this pass can be simple I think. I just made it to run the fixup once we see a pop in a function. PTAL.

@@ -110,11 +115,6 @@ struct DeadCodeElimination
replaceCurrent(remainingChildren[0]);
} else {
replaceCurrent(builder.makeBlock(remainingChildren));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only danger with a Pop is the new block made here, so I suggest that we set a boolean addedBlock to true in this location. And we can rename needEHFixups to hasPop. Then below, the check if (needEHFixups) can be if (hasPop && addedBlock), which will be a little more precise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@aheejin aheejin merged commit 432ed1c into WebAssembly:main Sep 11, 2024
13 checks passed
@aheejin aheejin deleted the dce_pop branch September 11, 2024 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fuzz bug with DCE on pop moved into block in old EH
2 participants