Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type checking and unreachable blocks #707

Closed
MikeHolman opened this issue Jun 10, 2016 · 20 comments
Closed

Type checking and unreachable blocks #707

MikeHolman opened this issue Jun 10, 2016 · 20 comments
Milestone

Comments

@MikeHolman
Copy link
Member

Feels like there are some corner cases with type checking, and AFAIK there hasn't really been much discussion on it. Right now I am looking at cases like the following pseudo-code for a function which has been defined with an int result type:

(func
  (if condition
    (ret (const 1))
  else
    (ret (const 2))
  )
)

Should this validate? All paths return an int but the if/else doesn't yield a value.

If that should validate, what about this?

(func
  (if condition
    (ret (const 1))
    (const 1.5)
  else
    (ret (const 2))
    (const 1.5)
  )
)

Same type of scenario, but now it's arguable whether we consider that the if/else yields float or doesn't yield. If it yields, then I suppose we fail validation, because that (unreachable) path yields a float.

How are you guys handling this type of scenario today?

@binji
Copy link
Member

binji commented Jun 10, 2016

@rossberg-chromium is probably the best to explain this, but I'll try :)

They simple way I think of it is that there are two extra types, void and any (i.e. bottom and top). return and br always have type any, because they are unconditional control flow. if has a type which is the set intersection of its branch's types. A block always has the type of its last expression.

So in the first example, it validates because the type of the if is any, which includes i32.

In the second example, the two branches have a float type, because the last expression of the (implicit) block is a f32 (say), so the if has the same type. Therefore the function does not validate, because the two type sets (i32 and f32) are disjoint.

@titzer
Copy link

titzer commented Jun 13, 2016

On Fri, Jun 10, 2016 at 9:01 PM, Ben Smith [email protected] wrote:

@rossberg-chromium https://github.com/rossberg-chromium is probably the
best to explain this, but I'll try :)

They simple way I think of it is that there are two extra types, void and
any (i.e. bottom and top). return and br always have type any, because
they are unconditional control flow. if has a type which is the set
intersection of its branch's types. A block always has the type of its last
expression.

So in the first example, it validates because the type of the if is any,
which includes i32.

In the second example, the two branches have a float type, because the
last expression of the (implicit) block is a f32 (say), so the if has the
same type. Therefore the function does not validate, because the two type
sets (i32 and f32) are disjoint.

Ben's right. This is exactly how type checking works in the V8 validator:
bottom up. Unreachable, return, br, and br_table always have a type any,
and any unified with a type T is always type T. Blocks have the type which
is the "unification" of all the br's that target, with the last expression
in a block considered to be an implicit br.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#707 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ALnq1Dn4HC64VujAJco1dm1TeKP7ZTg9ks5qKbSbgaJpZM4IzOWR
.

@rossberg
Copy link
Member

One nit: it's not unification but computing the LUB (at least until we decide for #694).

(And super-nit: I'd personally avoid the names void and any for this, since they usually have the exact opposite meaning, i.e., any as a top type void as a bottom type, whereas here it's the other way round.)

As for the question why the second example does not type-check: it's a very simple and conventional type system. Like most type systems, it only looks at data flow, not control flow. So whether there is an unconditional branch somewhere in a block does not affect the block's type.

@binji
Copy link
Member

binji commented Jun 13, 2016

(And super-nit: I'd personally avoid the names void and any for this, since they usually have the exact opposite meaning, i.e., any as a top type void as a bottom type, whereas here it's the other way round.)

Ha, I knew I'd mess that up. :)

@MikeHolman
Copy link
Member Author

Sounds good to me, thanks for the clarification!

@titzer
Copy link

titzer commented Jun 14, 2016

On Mon, Jun 13, 2016 at 12:27 PM, rossberg-chromium <
[email protected]> wrote:

One nit: it's not unification but computing the LUB (at least until we
decide for #694 #694).

Right, yes. There are no type variables, so it's just walking up a type
lattice.

(And super-nit: I'd personally avoid the names void and any for this,
since they usually have the exact opposite meaning, i.e., any as a top
type void as a bottom type, whereas here it's the other way round.)

As for the question why the second example does not type-check: it's a
very simple and conventional type system. Like most type systems, it only
looks at data flow, not control flow. So whether there is an unconditional
branch somewhere in a block does not affect the block's type.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#707 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ALnq1Fjdd9PiepvSUd9qUeRDhc0tNmjGks5qLTCggaJpZM4IzOWR
.

@Cellule
Copy link

Cellule commented Jun 22, 2016

So whether there is an unconditional branch somewhere in a block does not affect the block's type.

I have run into some issue regarding this from the testsuite.
in block.wast in function break-inner
There's the statement (set_local 0 (i32.add (get_local 0) (block (i32.ctz (br 0 (i32.const 0x4))))))
I fail to see how this is valid with the current state of things.
br 0 does not yield a value, therefore, i32.ctz doesn't have an operand.
The only way I can see this passing validation is if after seeing an unconditional branch, we ignore the rest of the block.

Same goes for function drop-break-value-heterogeneous
Where (block (br 0 (i32.const 8)) (br 0 (f64.const 8)) (br 0 (f32.const 8))) cannot be valid if we check all instructions.
Because there is a conflict in the type the br 0s yields.

@MikeHolman MikeHolman reopened this Jun 22, 2016
@ghost
Copy link

ghost commented Jun 23, 2016

@Cellule Consider (i32.ctz (unreachable)) which is basically the same type case, and this is currently accepted as valid.

@ghost
Copy link

ghost commented Jun 23, 2016

@Cellule oh, and the second example, if the types conflict then the result is the same as for nop, the void type, which is only a validation error if the value is consumed.

@rossberg
Copy link
Member

rossberg commented Jun 23, 2016

Branches, like all unconditional control flow transfer, are simply polymorphic, i.e., they can have any type, in order to e.g. allow exiting from branches whose siblings do yield some value. Same treatment as in all other typed (expression) languages. I suppose we should have an FAQ for that, as this question keeps coming up. ;)

The operands to branches are implicitly dropped when the block targeted has type void. Thus, their types can differ in such cases -- they are all implicitly converted to void. There has been a long history of back & forth on this (#179, #180, #227, #271), but this arguably surprising freedom is removed with #694.

@Cellule
Copy link

Cellule commented Jun 23, 2016

Thanks,
for the different yield types I think I get it and it should be easy to implement.

On the other hand, I am a little fuzzy as to how to implement (i32.ctz (unreachable))
Do I have to check for unreachable for every operator and let it pass ?
Let's say for instance I have the following

(if 
  (i32.add 
    (i32.ctz (br 0)) 
    (f32.const 1.4)
  )
) (set_local 0 (return 0))

Since br 0 makes it unreachable,
then i32.ctz is now unreachable also (meaning to explicitly check for unreachable)
then is (i32.add (unreachable) (f32)) valid ?

Let's say it is valid or we put an i32.const instead of the float.
then is (if (unreachable) (...)) valid ?

I fail to see why any such things should be considered valid.
Is there a real use case for this ? Correct me if I'm wrong, but I doubt any tools we make would produce such patterns (aka produce dead code in the middle of an expression).
Why not just make unreachable a different type and reject the likes of i32.ctz (unreachable) since i32.ctz needs an i32 and unreachable != i32?

It feels like the equivalent of such a thing in c++ for example would be

{
  if (ctz(goto label1) + 1.4f) {
    int a = return 0;
  }
label1:
}

Which looks really silly.

@ghost
Copy link

ghost commented Jun 23, 2016

@Cellule The 'unreachable' type does not propagate through typed operators. So the result type of (i32.ctz (br 0)) is i32 even if it is unreachable - an implementation might propagate the type differently for compilation, but it is i32 for validation. (i32.add (unreachable) (f32)) is also valid (is invalid due to the f32 argument - dead code is still validated). (if (unreachable) (...)) is valid, but if you are asking if dead code is still validated then yes it is.

This choice was probably to support top-down typing too.

Perhaps the disucussion in #654 would also be of interest in which is it proposed that the unreachable operator have the same type as the nop operator and that the result can not be consumed, and this removes this type from wasm. This is a useful simplification in an expressionless encoding.

@rossberg
Copy link
Member

Do I have to check for unreachable for every operator and let it pass ?

Let's say for instance I have the following

(if
(i32.add
(i32.ctz (br 0))
(f32.const 1.4)
)
) (...)

Since br 0 makes it unreachable,
then i32.ctz is now unreachable also (meaning to explicitly check for
unreachable)
then is (i32.add (unreachable) (f32)) valid ?

No, type checking does not consider control flow. Both examples are
ill-typed. I'll add some tests for these case.

Let's say it is valid or we put an i32.const instead of the float.
then is (if (unreachable) (...)) valid ?

Yes, that's valid (though useless). I think there already is a test for
that.

I fail to see why any such things should be considered valid.
Is there a real use case for this ? Correct me if I'm wrong, but I doubt
any tools we make would produce such patterns (aka produce dead code in the
middle of an expression).
Why not just make unreachable a different type and reject the likes of i32.ctz
(unreachable) since i32.ctz needs an i32 and unreachable != i32?

There are infinitely many ways of writing dead or otherwise useless code
with any language construct. ;)

The reason control flow transfer should be polymorphic (instead of, say,
void) is that you want to be able use it in arms of branching constructs
that produce a value, e.g.:

(if (assertion)
  (i32.expression)
  (unreachable)
)

or more interestingly, (the encoding of) switch cases:

(switch (index)  ;; can be encoded with br_table
  case 0: (i32.expr)
  case 1: (i32.expr)
  case 2: (i32.expr)
  case 3: (i32.expr)
  default: (unreachable)
)

If control flow transfer was typed void then you'd be forced to insert dead
dummy values (and sometimes spurious blocks) for these cases, which should
be unnecessary.

IOW, the point of these typing rules is not to prevent dead code, but to
avoid requiring dead code.

FWIW, typing control flow transfer constructs polymorphically is completely
standard in typed expression languages. The most popular example probably
is throw, which is a polymorphic expression in all such languages I'm
aware of.

@naturaltransformation
Copy link
Contributor

Since some of these type semantics issues have been confusing multiple people and the only "documentation" of the WASM type system is the spec implementation, would people be interested in a prose explanation of the type system?

@naturaltransformation
Copy link
Contributor

The prose can also go into a discussion of the motivation in attempt to clarify things.

@jfbastien
Copy link
Member

@rossberg-chromium was the design repo ever updated to clarify this? I don't see a commit but may have missed it.

@distransient
Copy link

I left open #747 on these things

@MikeHolman
Copy link
Member Author

@jfbastien You're right, I was a bit hasty in closing this.

@MikeHolman MikeHolman reopened this Oct 10, 2016
@rossberg
Copy link
Member

@jfbastien, see the description of WebAssembly/spec#345 for the resolution we reached regarding unreachable code. The design repo is generally (and IIUC, somewhat intentionally) very vague about validation, so not sure where specific clarifications would fit in there exactly. I believe we agreed that the written spec will be the appropriate place to lay out the details. But everybody has been busy with catching up to 0xc, so not much progress on that front...

@rossberg
Copy link
Member

rossberg commented Feb 7, 2017

This is old and has been superseded by more recent discussion.

@rossberg rossberg closed this as completed Feb 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants