Shrink the size of a compiled artifact's .wasmtime.addrmap section #3547

Open
alexcrichton opened this issue Nov 18, 2021 · 6 comments

@alexcrichton
Member

Upon thinking about this recently I believe we can shrink the .addrmap section of compiled artifacts by a significant amount. Currently this section is used to translate from machine code addresses to addresses of instructions within the original wasm file itself. This information is used primarily for backtraces, to go from machine address to wasm address and then, via the wasm DWARF, from wasm address to filename and line number.

At this time, though, we have a mapping from machine code address to wasm address for every single wasm instruction in the entire module. I don't actually think that this is necessary. Instead I think we only need mappings for trapping instructions and instructions which call a function (not a wasm function but instead a Cranelift-level call to include things like memory.grow and such). At this time we're not collecting "asynchronous backtraces" or anything like that so there's no need to actually have an address map for every single wasm instruction in the module.

I suspect that this would lead to huge savings on the .addrmap section, which is currently sometimes even larger than the .text section. I don't think this will necessarily be trivial to implement, though, and it will involve some trickery on the Cranelift side of things to correlate the source of all machine instructions, whether they're calling, and whether they can trap.
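
To make the proposal concrete, here is a minimal sketch of the pruning step, assuming the compiler can already tag each mapped machine instruction as a call or a potential trap. The names `InstKind`, `AddrMapEntry`, and `prune_addr_map` are hypothetical and exist only for illustration; they are not Cranelift or Wasmtime types, and the real section layout may differ.

```rust
/// Hypothetical classification of a machine instruction.
/// This is an illustrative type, not something Cranelift exposes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InstKind {
    /// Cannot trap and is not a call; never appears as a PC in a backtrace.
    Plain,
    /// May trap (for example an out-of-bounds load or a division).
    MayTrap,
    /// Any Cranelift-level call, including libcalls such as `memory.grow`.
    Call,
}

/// One row of a (hypothetical) address map: native code offset -> wasm offset.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AddrMapEntry {
    pub code_offset: u32,
    pub wasm_offset: u32,
}

/// Keep only the rows that a synchronous backtrace can actually observe:
/// instructions that may trap and instructions that perform a call.
pub fn prune_addr_map(full: &[(AddrMapEntry, InstKind)]) -> Vec<AddrMapEntry> {
    full.iter()
        .filter(|(_, kind)| matches!(kind, InstKind::MayTrap | InstKind::Call))
        .map(|(entry, _)| *entry)
        .collect()
}
```

Under this assumption the size of the pruned map scales with the number of calls and trapping instructions rather than with the total number of wasm instructions.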

@cfallin
Member

cfallin commented Nov 18, 2021

Great idea!

I'll note that the line-number-per-wasm-instruction (or actually one really wants line-number-per-compiled-instruction I think) is actually useful if one is single-stepping through code; the infra right now is I guess fully general because of this use-case. But of course backtraces are different, as you say!

It'd probably be reasonably easy to add a knob that implements your suggestion -- I'd do it by returning an Option<SourceLoc> here, probably...

@fitzgen
Member

fitzgen commented Nov 18, 2021

As discussed in private chat, we should actually be able to remove the .addrmap section completely by updating/fixing/special casing our wasm -> source to native -> source DWARF translation so that we can use the relevant DWARF sections directly without any native -> wasm translation that .addrmap is providing.

@fitzgen
Member

fitzgen commented Nov 18, 2021

(The DWARF translation would happen at module compilation time, not runtime.)

@bjorn3
Contributor

bjorn3 commented Nov 18, 2021

DWARF translation only happens when there is DWARF debuginfo in the source modules in the first place. .addrmap is also used for backtraces when there is no DWARF debuginfo at all or when DWARF translation is not enabled.

@fitzgen
Member

fitzgen commented Nov 18, 2021

That's correct. In the mode I am proposing, we would still only include the subset of DWARF that we are querying today to reconstruct backtraces after translating the native PC to a Wasm PC via .addrmap. This wouldn't imply including every DWARF section and all of their contents.

@bjorn3
Contributor

bjorn3 commented Nov 18, 2021

The DWARF .debug_line section can't encode a native pc -> wasm pc mapping as necessary when there is no debuginfo. It can only encode a native pc -> (file, line, column, flags) tuple mapping. Wasmtime shows module name + wasm pc when there is no debuginfo, right? On one hand, if you encode it as native pc -> (file, line=wasm pc, 0, no flags), that would be bigger than what you can do using .addrmap if you were to encode it using deltas. On the other hand, .debug_line always encodes it as deltas, which makes lookup much slower than the current scheme as you have to traverse the entire section. On native DWARF this is somewhat less painful as every compilation unit gets its own .debug_line mapping, but for generated wasm .debug_line you would likely put the entire wasm module in a single compilation unit.
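
The lookup-cost trade-off described above can be sketched as follows. This is a simplified illustration, not the actual `.wasmtime.addrmap` layout and not a real `.debug_line` line-number program (which uses a state machine, LEB128 values, and sequences). All function names are hypothetical, and the table is assumed to be sorted by native offset.

```rust
/// Absolute encoding: rows of (native offset, wasm offset), sorted by native
/// offset. A lookup is a binary search for the last row with native <= pc.
pub fn lookup_sorted(table: &[(u32, u32)], pc: u32) -> Option<u32> {
    let idx = table.partition_point(|&(native, _)| native <= pc);
    idx.checked_sub(1).map(|i| table[i].1)
}

/// Delta encoding: each row stores the difference from the previous row.
/// Native offsets only increase (the table is sorted), but wasm offsets may
/// go backwards, so their delta is signed.
pub fn delta_encode(table: &[(u32, u32)]) -> Vec<(u32, i64)> {
    let mut prev = (0u32, 0u32);
    table
        .iter()
        .map(|&(native, wasm)| {
            let delta = (native - prev.0, i64::from(wasm) - i64::from(prev.1));
            prev = (native, wasm);
            delta
        })
        .collect()
}

/// A lookup over delta-encoded rows has to replay the rows from the start,
/// accumulating both columns until the native offset passes `pc`.
pub fn lookup_delta(deltas: &[(u32, i64)], pc: u32) -> Option<u32> {
    let mut native = 0u32;
    let mut wasm = 0i64;
    let mut found = None;
    for &(d_native, d_wasm) in deltas {
        native += d_native;
        wasm += d_wasm;
        if native > pc {
            break;
        }
        found = Some(wasm as u32);
    }
    found
}
```

Both lookups return the same answer; the difference is that the absolute table supports a logarithmic search while the delta form is smaller per row but needs a linear scan from the start of the unit being searched.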

alexcrichton added a commit to alexcrichton/wasmtime that referenced this issue Dec 13, 2021
This commit adds a new `Config::generate_address_map` compilation
setting which is used to disable emission of the `.wasmtime.addrmap`
section of compiled artifacts. This section is currently around the size
of the entire `.text` section itself unfortunately and for size reasons
may wish to be omitted. Functionality-wise all that is lost is knowing
the precise wasm module offset address of a faulting instruction or in a
backtrace of instructions. This also means that if the module has DWARF
debugging information available with it Wasmtime isn't able to produce a
filename and line number in the backtrace.

This option remains enabled by default. This option may not be needed in
the future with bytecodealliance#3547 perhaps, but in the meantime it seems reasonable
enough to support a configuration mode where the section is entirely
omitted if the smallest module possible is desired.
alexcrichton added a commit that referenced this issue Dec 13, 2021
* Add a compilation section to disable address maps

* Fix some CI issues

* Update tests/all/traps.rs

Co-authored-by: Nick Fitzgerald <[email protected]>

* Do less work in compilation for address maps

But only when disabled

Co-authored-by: Nick Fitzgerald <[email protected]>