Make funcref not a subtype of anyref #69
Comments
To add to this, one change in my understanding of the constraint space is that, in earlier design thinking, there was an expectation that Thus, in the short term, I think we should avoid unnecessarily committing to The win from not unnecessarily requiring |
@lukewagner, can you explain under which scenario you would be able to allow different tagging strategies for function references but not different sizes? Because I see only two possibilities: either the universes of funcrefs and other refs are always statically distinguishable, in which case they can also have different sizes. Or there are cases where you cannot tell them apart statically, in which case you'll also need compatible tagging schemes. So, AFAICS, to allow different representations you have to create two properly disjoint type hierarchies. That is certainly possible, but is a less trivial change than just removing the Another question is how this interacts with the host. You can pass Wasm functions to JS and JS functions to Wasm. Both would necessarily become a coercive operation. That works in the first-order case, but does not compose. E.g. once you want to allow Typed Values over functions in JS you'd have to distinguish e.g. arrays of JS functions from arrays of Wasm functions. That is, you would need to introduce two disjoint universes of function types for Typed Values. (Maybe you're already assuming this?) The last time separating funcrefs from anyref was discussed (I think at the Lyon meeting) the consensus was that the added language complexity was not worth it for the time being, and that we can always add a second form of funcref type later if there is evidence that it is sufficiently useful. Btw, regarding uniform representations, I think we have thrown up some vague ideas for possible alternatives, but AFAICT nobody has worked them out and they'll likely be beyond MVP level. It seems premature to assume that we won't need anyref. If we divorced funcrefs then additional boxing may become necessary in client code (which may not be a problem, however). |
Yeah, we discussed the issue with null too, concretely discussing how a universal |
I guess it boils down to the question: do we reconsider "fat function refs" to be an MVP feature or can we add it later as a separate type? If the former, I agree that we should do so quickly, so that we can move forward at the meeting. |
Well, there's another question: what are the pressing use cases of having funcref be a subtype of anyref in the MVP? (Use cases that aren't addressed by having funcref be convertible to anyref.) |
@RossTate, I don't know about pressing, but I can think of a number of softer reasons: it makes for a somewhat simpler language, gives more seamless host interop (at least on the web and in the C API), avoids redundant annotations in element segments (for which we may want to come up with a solution otherwise), and of course, avoids late phase churn for a number of implementations. |
I disagree on the simpler language argument. People argued for F-bounded polymorphism on the same basis that having everything be done through subtyping would be simpler. Instead they got undecidability and unnecessary complexity that crippled tooling. As for late-phase churn, this should only require a change in the type-checker, and I imagine removing a subtyping is fairly straightforward. Can you go into more depth on one of the other reasons? |
@RossTate, it requires changes/extensions to the instruction set, see ref.null, and the type language, see nullref. It also requires observable changes to current APIs, where some of these are reflected. For the C API, for example, the changes would go deeper, because that currently models a simple uniform ref space, and externval's are just refs. Removing that possibility (as opposed to just complementing it) requires a redesign of that part, replacing it with something more complicated. Can all be done, no question. But clearly not free or cheap. I'd be interested in hearing from @lukewagner how important he considers having this for the first version, and whether it should fully replace or merely amend the current design. (I don't follow which similarity to F-bounded quantification you see.) |
F-bounded polymorphism is essentially a way of encoding a limited form of type classes into subtyping. In particular, Of course, the actual use case this was meant to address, i.e. the binary-method problem, could have just as easily been addressed by introducing a new kind of type constraint. This So the analogy is that the subtyping |
I'm aware of the risks of F-bounded quantification if taken too far, but come on, there is no similarity with the issue at hand. Its impact on the type system and its meta theory is negligible either way. If anything, establishing disjoint type hierarchies is a marginal complication to the meta theory, since you'd be under obligation to prove a little lemma that verifies that property. I have given you reasons above, like fewer and simpler instructions, simpler and more uniform host APIs, etc. Nothing earth-shattering, but arguably an appropriate default for an MVP, unless there are crucial problems that the MVP has to address or it precludes later addition. Evidence for the former would be it being a roadblock for engines. |
@rossberg (sorry for the slow reply) If a type is imported with no bounds: (module
(import "file" "handle" (type $Handle))
(func $f (param (ref $Handle)) ...)
) then On a side note: if we wanted to allow different byte widths without recompilation, we could perhaps always require type imports to have a bound, and so you'd always know if you had an |
@lukewagner, not sure that could work in general. Depending on how much code generation in a given engine has to know about GC, it probably will need to know what kind of tagging e.g. a local of that type uses, e.g., to generate suitable stack frame descriptors, or inlined bits of GC code, etc. It's safer to assume that a compiler always needs to know the representational interpretation of any value it has to handle. But in any case, the type import proposal already defines what you suggest: all imports have a bound -- given subtyping, there's no advantage in having an unbounded form. |
@rossberg Ah, in that case, then I agree that, if |
We discussed this in the Feb 2020 CG meeting. We didn't come to any conclusion, but there was some agreement that breaking the subtype relationship between |
Given that multiple ecosystems (Emscripten, Rust, WASI) eagerly want anyref support but don't particularly need funcref right now, I would like to either resolve the subtype question quickly or revisit @fitzgen's idea of splitting the proposal so that anyref can move forward alone without being held back by this issue. I understand that the semantics would be ugly in the interim, but it would be a shame to hold up these ecosystems for a temporary cosmetic issue. Of course, there would be much less overhead if we could agree on a solution to the subtype question soon and not have to split the proposal. I lean heavily toward not having a subtype relationship between funcref and anyref right now, both because it is the simpler, more conservative option and because I do not know of any immediate use case for that subtype relationship. Having type-parameterized nullrefs sounds perfectly acceptable to me, since it will make the tooling simpler anyway. I am also not too worried about the size impact for table initialization, since a sequence of identical parameterized nullrefs sounds like it will compress easily. |
I believe I've now realized another problem with the "any" notion of Unfortunately, there seems to be a problem when |
Can you dig into that a bit and explain why exnref being a subtype of anyref means that some exnref values can't be reference counted? |
Because then my code can hand off an |
@RossTate, the possibility of reference counting is a fallback for engines that do not have GC, e.g., ones that do not run within JS and neither intend to implement the GC proposal. An engine that already incorporates GC refs wouldn't be using reference counting for exnrefs. |
We could debate that detailed point, but I think the higher-level point stands regardless: there is utility in enabling engines/programs to keep internal and external references separate. |
I'm not sure what you have in mind. Exnref's can be passed to the host like any other reference. In a JS environment there is not much choice but to manage all of them with proper GC. |
I could imagine a non-JS engine that would want to allow users to disable its GC support (maybe to reduce the code size footprint and memory usage in an embedded context). Such an engine would want to keep as much functionality enabled without GC as possible, so it might very well choose to use reference counting for exnrefs, and for simplicity it might want to use reference counting for exnrefs even when GC is enabled. I don't think this is an outlandish scenario, and I have yet to see a use case for the subtype relationship between exnref and anyref, so I support decoupling these as well. |
@tlively, exnrefs can point to GC refs and vice versa. Once you have GC refs, exnrefs can indeed participate in cycles, unlike without. So no, a heterogeneous implementation would be very outlandish. |
After some follow-up discussion with Lars and others, I'd like to drop my earlier objection and say that I'm fine with the proposal as-is. In my first comment, I observed that, if (I do think it would be valuable to have some value type that can simply hold an unboxed raw code pointer (without any fancy thunky optimizations), but, independent of whether |
OK, let's try to move this forward. It sounds like at least for Luke, Lars and others there's less incentive to change the proposal. Let's discuss this at the next CG meeting. |
In preparation for the meeting, here is a summary of my arguments for removing subtyping:
Lastly, regarding pragmatics, this change is easy to implement: comment out the implementation of |
I think, fundamentally, the argument for the proposed subtyping is based on a vision for |
I think a lot of us are stuck on this point currently. In the meeting today @rossberg mentioned how much work this would require to the spec, and mentioned that the work required for downstream proposals is unclear. I think it would be valuable for us to try our best to evaluate what kind of changes would be required for proposals that depend on this change: at least exception-handling, bulk-memory, function-references, type-imports, interface-types, wasm-c-api, and WASI AFAIK. |
I see, your examples make sense to me now. Having started reading the iTalX paper, I also see that this is a problem if you are trying to do the deferred type inference approach used in that project. I will have to think more about how we could fit deferred type inference into the WebAssembly LLVM backend and how much functionality it could buy us.
FWIW, the |
Too much back and forth in this thread to answer everything individually, but I'll try to clarify a few high-level points (most of which is just reiterating what I said in the meeting yesterday, also see slides for some additional details).
The addition of subtyping and anyref is part of a larger design that we ended up splitting into several proposals to be able to ship incrementally, like function references and type imports. Basic subtyping ended up in this proposal mostly for technical reasons (cleanest way for slicing the semantics into increments), so it's admittedly a bit less obvious why we have it. But downstream use cases like type imports or typing of reference equality, as currently proposed, fundamentally rely on both.
As others have pointed out as well, the change proposed here is not small, because it has a number of design implications on this proposal, as well as various others. Across all affected implementations, tools, and downstream proposals, I'd estimate it is in the order of several man weeks to man months of work. That is a high cost for delaying a decision that we will have to resolve in a couple of months anyway. And we would be introducing some permanent design warts on the way. For example, the need for type-indexed null values (which are awkward from a semantic perspective, e.g., because (null T) and (null U) are seemingly different values, but if T <: U they must be treated as if they were the same).
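The awkwardness of type-indexed nulls can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `PARENT` table, the `Null` class, and both comparison functions are hypothetical names invented for illustration, and the two-type hierarchy is just the `funcref <: anyref` relation under discussion, not anything taken from the spec.

```python
from dataclasses import dataclass

# Hypothetical toy hierarchy: child type -> parent type (None marks the top).
PARENT = {"anyref": None, "funcref": "anyref"}


def is_subtype(t, u):
    """Return True if t <: u in the toy hierarchy (reflexive and transitive)."""
    while t is not None:
        if t == u:
            return True
        t = PARENT[t]
    return False


@dataclass(frozen=True)
class Null:
    """A null value annotated with the type it was created at."""
    type_index: str


def syntactically_equal(a, b):
    # Compares the annotations as well, so (null T) differs from (null U)
    # whenever T and U are different types.
    return a == b


def semantically_equal(a, b):
    # Nulls whose annotations are related by subtyping must be
    # indistinguishable at run time, regardless of the annotation.
    return is_subtype(a.type_index, b.type_index) or is_subtype(
        b.type_index, a.type_index
    )
```

In this model `Null("funcref")` and `Null("anyref")` are different as syntax but have to be identified semantically, which is the mismatch described above.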
For something large and in practical use, there is no alternative to an incremental design approach. So far, Wasm has been successful with that. That means that, even when trying to stay conservative as much as possible, we have to commit to certain design choices at some point. Of course, that always involves some amount of uncertainty and the risk of introducing design mistakes at some step. We had a few ones in the past, but so far have been able to work around them successfully. It's clear that we'll need subtyping. If not now, then 1-2 months down the road, with function references (which are about ready to be advanced to stage 2 or 3). It seems very unlikely that we are dropping subtyping altogether, given that no concrete alternative has been proposed for all its uses, and there is plenty of pressure to move forward. The concrete subtyping rules in this proposal are very standard and an absolute bare minimum, they are even orthogonal to structural vs nominal. I haven't seen any convincing evidence that they impose a concrete risk, given the explicit nature of typing in Wasm. (In particular, any form of inference, if desired, will be confined to a tooling stage. That means that it is operating on a different language anyway, with a different type system. It can just as well choose to restrict it or add other annotations in any way it sees fit.)
It's been a basic assumption from the start that heap references support a unified representation in a Wasm engine. That's a very reasonable assumption, too, given that all existing Wasm engines as well as the vast majority of other GC runtimes employ it -- off-hand, I wouldn't know any prominent counter example. Such an assumption simplifies various design aspects as well as interfacing with an engine (e.g., through the C API). It is essential if you want to support compilation against imported reference types without making arbitrary design decisions about which reference types are allowed. This does not preclude the later addition of specialised flat/raw/unboxed reference types! But even in their presence, you'd still want the uniform variant. Hence, for the MVP, starting with just that is the obvious choice.
As one instance of staying conservative, we do not extend subtyping to functions yet. That leaves headroom for different choices downstream. If we find that we can't efficiently implement dynamic type checks modulo subtyping (which is not unlikely) then we impose suitable static restrictions. One known approach would be to distinguish between subtypable and exact function types (an approach that @RossTate is aware of, since it's used in his own paper). Either way, we obviously need to ensure that the semantics is coherent, i.e., that any subtyping rules are the same at validation, link and run time. |
Thanks for laying that all out so clearly, @rossberg!
As far as I can see, type imports only depends on funcref <: anyref insofar as it is desirable to be able to have effectively unbounded imports. I can see that this would be elegant, but it's not clear to me that it is useful. I don't want to litigate this trade off here, but please let me know if I am missing some subtlety that makes funcref <: anyref more important. Also, can you point me to discussion or specification of the proposed reference equality? I haven't seen anything on that.
The argument about existing Wasm engines biases heavily toward web engines, which might make different choices about their representations because they already support JavaScript objects. I could easily see a non-web engine wanting to support anyref and funcref but not wanting to support the full GC proposal. If that's a situation we want to encourage, we may not want to limit ourselves to precedents in GC runtimes.
True, but if they turn out to be sufficient for real use cases, it would be much better to have only the unboxed reference types than have to duplicate everything to have it both ways. And if real languages need both after all, it would be better to start with the one that is more immediately useful. We don't know any immediate use cases for subtyping relationship, so unless that changes we should start with the unboxed references with no subtyping relation.
I can see why for spec elegance, but can you point to a real language that uses this for which an explicit boxing operation would be much more difficult to use? |
The reason is avoiding bias. If we can help it, don't bake in arbitrary decisions about which types are reasonable as imports and which ones aren't. At the meeting, we already got into the discussion about exnref, and I expect the same to occur for others. Ultimately deciding for each type separately would just be random design choices akin to premature optimisation. FWIW, I can imagine use cases for abstracting function types, such as capabilities (we actually use functions as capabilities at Dfinity). Sure, you can always work around that with additional wrapping/converting, but that's an extra cost.
It was in this proposal originally. After we voted to defer it, it was moved to the Future Extensions section of the overview. Currently a bit orphaned. Not sure whether we want to make it its own proposal eventually or incorporate it in one of the others. But clearly, support for reference equality will be needed at some point.
Fair point, but even without full GC, engines implementing the full Wasm API will usually need some form of reference counting scheme for function references, as for others.
They both are equally useful. But the boxed one is somewhat more flexible. The only advantage of the unboxed one is a hypothetical optimisation in engines for which there isn't any precedent and a very low likelihood that the current generation of implementations would actually implement it. For an MVP, it makes sense to start with the simpler design, and avoid the warts, open questions, and other costs we have been discussing.
An explicit boxing operation would have to be assumed to involve an allocation. So a compiler would want to avoid repeating it on the same object. Consequently, any object that it needs in boxed form in some contexts, the compiler would want to keep it around in boxed form. But if there is no type that represents a boxed function specifically (as opposed to just an anyref), then it would in turn need a downcast check every time it wants to access it. IOW, without a type representing the boxed form, your only choice when using that is between allocation overhead (on every injection) vs casting overhead (on every projection). |
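The injection/projection trade-off described above can be sketched with a toy cost model. Everything here is an assumption made for illustration: `FuncRef`, `AnyRef`, `box`, `unbox_func`, and the `Counters` class are hypothetical names, and "allocation" and "cast" are merely counted, not measured.

```python
class FuncRef:
    """Unboxed function reference (stand-in for a raw code pointer)."""

    def __init__(self, name):
        self.name = name


class AnyRef:
    """Boxed, uniformly represented reference; the payload type is erased."""

    def __init__(self, payload):
        self.payload = payload


class Counters:
    """Counts the two costs being compared."""
    allocations = 0
    casts = 0


def box(f):
    """Injection: assumed to allocate a fresh box on every call."""
    Counters.allocations += 1
    return AnyRef(f)


def unbox_func(r):
    """Projection: a checked downcast, because AnyRef does not record
    statically that it holds a function."""
    Counters.casts += 1
    if not isinstance(r.payload, FuncRef):
        raise TypeError("not a function reference")
    return r.payload


def keep_unboxed(f, uses):
    # Keep the FuncRef around and box it at every site that needs an AnyRef.
    return [box(f) for _ in range(uses)]


def keep_boxed(f, uses):
    # Box once, then downcast every time the function itself is needed.
    r = box(f)
    return [unbox_func(r) for _ in range(uses)]
```

With three uses, `keep_unboxed` pays three allocations and no casts, while `keep_boxed` pays one allocation and three casts. A type that denotes "boxed function" specifically would, in this model, remove the need for the cast in `keep_boxed`.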
To turn attention back to the question at hand, I'd summarise as follows. If everything else were equal, I would totally agree that cutting subtyping from this proposal would be the right move. But not everything else is equal (which is why we included it in the first place). So the question boils down to one of cost vs benefit. And the costs of cutting it are concrete, both short-term and long-term. Inversely, the benefit is completely hypothetical, and achievable by other means. |
@rossberg You haven't factored in "risk", viz., that we commit to some design choices now that we regret later. Until we've fully fleshed out (implemented, generated, widely understood) subtyping in more forms than |
Following up on that point:
To be fair, the benefits of the subtyping rules in contention are also completely hypothetical. Implicit in your evaluation has been an assumption that these subtyping rules will inevitably be adopted. I totally appreciate the philosophical principle behind this assumption, as I have been there myself. After all, every language designer aspires to a grand unifying principle for their creation. But it is also the case that this principle inevitably clashes with performance. One of a designer's most significant and difficult decisions is how much they choose to compromise uniformity for the sake of efficiency. It would be quite surprising and amazing for WebAssembly to not face the same trade-off, given how much difference we all know something as small as a single bit can make. So, as nice as this principle is, what has surprised me is how little alternatives to top- One such assertion was that garbage-collected languages will need a top- Another assertion regarded type imports. In fact, there were two assertions, each worth breaking down separately. One was that the current plan avoids bias with respect to which types can be im/exported. But it most certainly is biased as it does not even consider numeric types. This is odd considering that the only value types for C/C++ programs, the primary user base for WebAssembly, are numeric types. And if we consider WASI, capabilities can easily be encoded as integer handles, and by exporting those handles as an abstract type (or as multiple abstract types for different kinds of handles), WASI can furthermore ensure that they cannot be forged (provided the above issue with But let's put the issue of numeric types aside and move on to the second assertion about type imports, which is that separate compilation requires imported (reference) types to have the same representation, i.e. top- Hopefully the above demonstrates that there are alternative options and that there is reason to believe that top- |
In the last CG meeting, we talked about how it might be useful to link to relevant CG meeting notes, so here are 3 such links in case it helps: |
@lukewagner, indeed, risk is a factor too. But that goes both ways. From where I stand, the risks of making this change (such as poorly understood consequences on other proposals) are higher and more concrete than the other way round.
I think we have repeatedly concluded that this is not the case here, because flat pointers (and other specialised features) can be introduced later. You keep making lots of assertions about what Wasm implementations could do, but little of that bears any relation to what existing implementations actually do today. So while various features could possibly reap some performance benefits in a next-gen engine, there is little reason to expect that the current engines could easily exploit them. So no benefit in making them an MVP feature. |
@rossberg If we remove subtyping we know exactly what the consequences are; it's hard to call them "risks", they're just fixed "costs". "Risk" refers to the fact that we do not have a complete picture of subtyping at this point, and we won't until we've put "pressure" on subtyping via Type Imports and Function References. |
A bold statement. ;) I for one don't. How can we adapt the C API? What design implications will type-indexed null values, which are uncommon, have in the future? Do we get the format of the new immediates right? It's folklore wisdom that last minute design changes are a favourite source of errors and unforeseen consequences. |
I think we should hold off on committing to a stable C API until we know more about subtyping. Until then, embeddings are already doing something now and can keep doing so. I've got to chuckle a bit at comparing the unknowns for the |
One down side to removing subtyping is that it creates the need for two instructions, |
There will be an infinite set of different nullable types and hence null values. Consequently, it would not scale to add multiple null instructions. Instead, we'd add one type-indexed null. That's still a wart, because it doesn't mesh well with subtyping on that index type. For example, although technically different, None of this is related to defaulting. As for the default instruction, there is some confusion here on multiple levels. First, it cannot change anything about imports. You can already emulate its behaviour via auxiliary functions that return their default-initialised locals, so it wouldn't add any new expressiveness that magically provides something new for imports. But I also don't see any problem with imports. Nullability is a property of ref types, not type definitions: a local of type |
Your first paragraph seems to be about formalization/specification. The formalization research community offers multiple ways to specify this pattern. The one you give is one of them, and is not considered to be particularly unnatural (it just says that values are (co)variant with respect to subtyping, the natural analog to type-level considerations). But if you don't like the idea of the same constant having multiple representations, which I understand, then you can say that As for the second paragraph, I realized I missed a step. From my earlier comment, I still had in my mind type imports rather than just reference type imports. So yeah, filling in that disconnect, it makes sense why reference type imports has no discussion of defaultability; sorry for being confusing there. But if ever wasm does add true type imports, then it will need a notion of defaultability (since types like But focusing on the here and now, do the |
How is there a minimal type in an open subtype hierarchy? The type IDK if there is any good use case for a default instruction. Arguably, defaulting is primarily a hack for initialisation, out of necessity, not a particularly desirable feature in general. At least I have never heard anybody asking for it. |
In investigating the rationale of the design, I have had multiple people explain that they wanted a direct way to construct and test for the default value. That seems to be the general pattern. Right now that pattern is being served by providing separate instructions for each type with a default value. But given that default values are a core feature of WebAssembly, whether just out of necessity or not, it seems another reasonable way to address that pattern would be to have general-purpose instructions for constructing and testing for default values. |
Adding link to most recent discussion for reference: April 21 |
We had a poll at the Apr 28th meeting, with the following results (where SF represents strongly in favor of removing subtyping, and SA represents strongly against removing subtyping): SF: 7 At least one member of the group voted SA because there was no discussion during the meeting. We had discussed this topic at previous meetings, but none at this meeting. Because the poll was at the end of the meeting, we weren't able to succinctly state a conclusion. I think this poll does offer some clarity, still. We should take this to mean that the group as a whole slightly favors removing subtyping, and we should proceed accordingly. I apologize that this poll was a little haphazard. Please feel free to reach out to me to discuss any concerns you have about this decision. |
Per the vote on #69, this PR removes subtyping from the proposal.

List of changes:

* Syntax:
  - remove `nullref` type
  - rename `anyref` type to `externref`
  - extend `ref.null` and `ref.is_null` instructions with new immediate of the form `func` or `extern` (this will later have to generalise to a `constype` per the [typed references proposal](https://github.com/WebAssembly/function-references))
* Typing rules:
  - `ref.null`, `ref.is_null`: determine reference type based on new immediate
  - `select`, `call_indirect`, `table.copy`, `table.init`: drop subtyping
  - `br_table`: revert to rule requiring same label types
  - `elem` segment: drop subtyping
  - `global` import: drop subtyping (link time)
* Remove subtyping rules and bottom type.
* Revert typing algorithm (interpreter and spec appendix).
* JS API:
  - remove `"nullref"`
  - rename `"anyref"` to `"externref"`
* Scripts:
  - rename `ref` result to `ref.extern`
  - rename `ref.host` value to `ref.extern`
  - drop subtyping from invocation type check
* JS translation:
  - extend harness with separate eq functions for each ref type
* Adjust tests:
  - apply syntax changes
  - remove tests for subtyping
  - change tests exercising subtyping in other ways
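As a rough illustration of the typing-rule changes listed above, here is a toy checker in which the immediate alone determines the reference type and all comparisons are plain equality. It is a sketch, not the reference interpreter: the function names and the `REF_TYPES` table are hypothetical, and only the `func`/`extern` immediates and the `funcref`/`externref` type names come from the change list.

```python
# Maps the new immediate to the reference type it denotes.
REF_TYPES = {"func": "funcref", "extern": "externref"}


def type_of_ref_null(immediate):
    """Result type of `ref.null <immediate>`: fixed entirely by the immediate."""
    if immediate not in REF_TYPES:
        raise ValueError("unknown reference type immediate: " + immediate)
    return REF_TYPES[immediate]


def check_ref_is_null(immediate, operand_type):
    """`ref.is_null <immediate>` accepts exactly the named type and yields i32."""
    expected = type_of_ref_null(immediate)
    if operand_type != expected:  # plain equality: no subtyping involved
        raise TypeError("expected " + expected + ", got " + operand_type)
    return "i32"


def check_table_copy(dst_elem_type, src_elem_type):
    """With subtyping dropped, both tables must have the same element type."""
    if dst_elem_type != src_elem_type:
        raise TypeError("table element types differ")
```

For example, `check_ref_is_null("func", "externref")` is rejected in this model, whereas under the earlier subtyping rules a single untyped null test would have accepted any reference.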
Issue WebAssembly/reference-types#69 requires that `ref.null` instructions include a reference type immediate. This concept isn't present in the bulk-memory proposal, but the encoding is (in element segment expressions). This change updates the binary and text format, but not the syntax. This is OK for now, since the only reference type allowed here is `funcref`.
(This idea came up after yesterday's discussion about the GC extension. I have tried to describe it here in a self-contained manner, but let me know if there are any terms I forgot to define or motivations I forgot to provide.)
Having `funcref` be a subtype of `anyref` forces the two to have the same register-level representation. Yet there are good reasons why an engine might want to represent a function reference differently than an arbitrary reference. For example, function references might always be an assembly-code pointer paired with a module-instance pointer, effectively representing the assembly code compiled from a wasm module closed over the global state of the specific instance the function reference was created from. If so, it might make sense for an engine to use a fat pointer for a function reference. But if `funcref` is a subtype of `anyref`, and if it overall makes sense for arbitrary references to be implemented with normal-width pointers, then that forces function references to be implemented with normal-width pointers as well, causing an otherwise-avoidable additional memory-indirection in every indirect function call.

Regardless of the reason, by making `funcref` not a subtype of `anyref`, we give engines the flexibility to represent these two types differently (including the option to represent them the same). Instead of subtyping, we could have a `convert` instruction that could take a function reference and convert it into an `anyref` representation, or more generally could convert between "convertible" types. The only main benefit of subtyping over conversion in a low-level type system is its behavior with respect to variance, such as co/contravariance of function types, but I see no such application for `funcref` and `anyref`. And in the worst case, we could always make `funcref` a subtype of `anyref` later if such a compelling need arises.