-
Notifications
You must be signed in to change notification settings - Fork 695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Names section extensible #984
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -436,34 +436,62 @@ The expectation is that, when a binary WebAssembly module is viewed in a browser | |
environment, the data in this section will be used as the names of functions | ||
and locals in the [text format](TextFormat.md). | ||
|
||
The name section contains a sequence of name subsections: | ||
|
||
| Field | Type | Description | | ||
| ----- | ---- | ----------- | | ||
| count | `varuint32` | count of entries to follow | | ||
| entries | `function_names*` | sequence of names | | ||
| name_type | `varuint7` | code identifying type of name contained in this subsection | | ||
| name_payload_len | `varuint32` | size of this subsection in bytes | | ||
| name_payload_data | `bytes` | content of this section, of length `name_payload_len` | | ||
|
||
Since name subsections have a given length, unknown or unwanted names can be | ||
skipped over by an engine. The current list of valid `name_type` codes are: | ||
|
||
| Name Type | Code | Description | | ||
| --------- | ---- | ----------- | | ||
| [Function](#function-names) | `1` | Assigns names to functions | | ||
| [Local](#local-names) | `2` | Assigns names to locals in functions | | ||
|
||
When present, name subsections must appear in this order and at most once. The | ||
end of the last subsection must coincide with the last byte of the name | ||
section to be a well-formed name section. | ||
|
||
#### Name List | ||
|
||
In any of the subsequent subsections, a `name_list` is encoded as: | ||
|
||
The sequence of `function_names` assigns names to the corresponding | ||
[function index](Modules.md#function-index-space). (Note: this assigns names to both | ||
imported and module-defined functions.) The count may differ from the actual number of functions. | ||
| Name Type | Type | Description | | ||
| --------- | ---- | ----------- | | ||
| count | `varuint32` | number of `name` in names | | ||
| names | `name*` | sequence of `name` | | ||
|
||
where a `name` is encoded as: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: How about calling such a pair a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense, done. |
||
|
||
| Name Type | Type | Description | | ||
| --------- | ---- | ----------- | | ||
| name_len | `varuint32` | number of bytes in name_str | | ||
| name_str | `bytes` | UTF8 encoding of the name | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. All these have to UTF8 -> UTF16 properly? Be valid JavaScript properties? Or should we just have a byte array like elsewhere, and restrict the name section in JS.md like everywhere else? That seems better. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean, I guess we could just leave them plain old byte arrays, but there is a difference. For import/export names, they might be strings, they might be some custom data structure. In this case, I think they're really supposed to be strings, so I think it's beneficial (and consistent with the state before this PR) to say they are UTF8 strings. Also, they don't end up being reflected in JS as property names. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It kinda breaks the "embedder makes sense of byte arrays" approach we had elsewhere. It's weird to mention UTF anywhere but in JS.md. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, it's more like "embedder makes sense of imports and exports"; in this case they're actually supposed to be names (so strings). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIUC what we had before was a hacked-up names section that got us limping along, with no real intent of keeping it around. Maybe this would help me understand better: what's the intent of this refactoring? I thought you were going for something that'll stand some test of time, so without designing DWARF / TROLL I'm trying to see what we can minimally get "right". Is that out of scope? Happy to back away if so. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the scope of the names section is just assigning (string) names to various index spaces for the benefit of binary->text rendering. If we want to do something more in the direction of source-level debugging, I think that will require a new section with a lot more structure than a collection of arrays of names; I can't imagine that being able to encode arbitrary binary contents in the names of the names section will be sufficient. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, I'm not proposing we broaden the scope to more than strings. What I was trying to get to is: are we trying to get this string mapping thing right? Or is this a throwaway thing to tie implementations over? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd have a slight preference for not mentioning encodings here either (keep that out of the core spec altogether). Rather say in the JS API spec that names are assumed to be UTF-8 encoded and handled analogously to import/export names. |
||
|
||
#### Function names | ||
|
||
| Field | Type | Description | | ||
| ----- | ---- | ----------- | | ||
| fun_name_len | `varuint32` | string length, in bytes | | ||
| fun_name_str | `bytes` | valid utf8 encoding | | ||
| local_count | `varuint32` | count of local names to follow | | ||
| local_names | `local_name*` | sequence of local names | | ||
The function names subsection is a `name_list` which assign a name to each | ||
function by [function index](Modules.md#function-index-space). (Note: this | ||
assigns names to both imported and module-defined functions.) The count | ||
may differ from the actual number of functions. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A count that's bigger or smaller doesn't seem useful? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is just being consistent with what it was before; I don't have a really strong opinion here. |
||
|
||
The sequence of `local_name` assigns names to the corresponding local index. The | ||
count may differ from the actual number of locals. | ||
#### Local names | ||
|
||
#### Local name | ||
The local names subsection assigns a `name_list` to each function defined inside | ||
the module according to its index in the [Function section](#function-section). | ||
(Note: this does not assign names to imports; imports have no locals.) The | ||
`name_list` for a given function assigns a name to each local. The counts may | ||
differ from both the number of functions and number of locals in a given | ||
function. | ||
|
||
| Field | Type | Description | | ||
| ----- | ---- | ----------- | | ||
| local_name_len | `varuint32` | string length, in bytes | | ||
| local_name_str | `bytes` | valid utf8 encoding | | ||
|
||
| count | `varuint32` | count of `name_list` in func_locals | | ||
| func_locals | `name_list*` | sequence of `name_list`, one per function | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So you can't skip a function? Or to do so you need an empty name_list? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, you'd set count=0 (similar to what you had to do before when the locals were embedded inside each function. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems pretty brittle when combined with "you can avoid specifying functions if you want to". At this point why not use a map of { function_number => name } ? I agree forcing all of them to be present is suboptimal (mixing debug and release is nice, especially if we consider transfer size). I just don't want us to end up with some super weird encoding, even if name sections aren't really standard right now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A map is more general, but more complicated too; I think the normal case here is all-or-nothing (so either no local subsection at all or a full one). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure I like the shift in index space. And in fact, there is a use for naming locals for imports: parameter names. More generally, I wonder whether it's optimal to require dense lists for the name section. Once we extend this to other names we might encounter lots of anonymous entries (consider the type section). And even for functions I can imagine scenarios where you have large modules but only partial name information, e.g., when merging several modules into one for deployment (what I called "static linking" the other day). An index->info map seems more flexible and potentially more compact, while not being all that complex. |
||
|
||
# Function Bodies | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"unknown or unwanted name types" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed