reduce AstGen.numberLiteral stack usage #16438
Thanks! I'm happy to merge this, but I wonder, how about setCold?

Edit: with regards to the other memcpy instances that you found, these center around #2765, which is an open language design topic that deeply relates to aliasing, which itself deeply relates to Zig's niche in the programming language ecosystem. In other words, you have poked the core of Zig as a language: one of the last remaining unsolved topics in its language design space. I'm looking forward to linking you to @SpexGuy's very related recent talk from SYCL Vancouver, which should be posted online any day now.
Yeah, setCold makes more sense here, fixed.
At the moment, the LLVM IR we generate for this fn is

    define internal fastcc void @AstGen.numberLiteral ... {
    Entry:
      ...
      %16 = alloca %"fmt.parse_float.decimal.Decimal(f128)", align 8
      ...

That `Decimal` is huuuge! It stores

    pub const max_digits = 11564;
    digits: [max_digits]u8,

on the stack.

It comes from the `convertSlow` function, which LLVM happily inlined, despite it being the cold path. Forbid inlining it so as not to penalize callers with excessive stack usage.

Backstory: I was looking for needless memcpys in TigerBeetle, and came up with this copyhound.zig tool for doing just that:

https://github.com/tigerbeetle/tigerbeetle/blob/ee67e2ab95ed7ccf909be377dc613869738d48b4/src/copyhound.zig

Got curious, ran it on Zig's own code base, and looked at some of the worst offenders.

List of worst offenders:

    warning: crypto.kyber_d00.Kyber.SecretKey.decaps: 7776 bytes memcpy
    warning: crypto.ff.Modulus.powPublic: 8160 bytes memcpy
    warning: AstGen.numberLiteral: 11584 bytes memcpy
    warning: crypto.tls.Client.init__anon_133566: 13984 bytes memcpy
    warning: http.Client.connectUnproxied: 16896 bytes memcpy
    warning: crypto.tls.Client.init__anon_133566: 16904 bytes memcpy
    warning: objcopy.ElfFileHelper.tryCompressSection: 32768 bytes memcpy

Note from Andrew: I removed `noinline` from this commit since it should be enough to set it to be cold.
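To make the stack-usage argument concrete, here is a minimal sketch of the pattern, not the actual `std.fmt.parse_float` code. The names `sumDigits` and `sumDigitsSlow` are hypothetical stand-ins for the fast path and for `convertSlow`; only the `max_digits` value is taken from the commit message. The sketch uses `noinline`, as in the first revision of this PR. Per Andrew's note, the merged version marks the function cold instead and relies on the optimizer not to inline it.

```zig
const std = @import("std");

// Same size as the `max_digits` quoted in the commit message.
const max_digits = 11564;

// Hypothetical cold path. Because it is `noinline`, the ~11 KiB buffer
// lives in this function's own frame and is only reserved on the stack
// when this path is actually taken.
noinline fn sumDigitsSlow(text: []const u8) u64 {
    var digits: [max_digits]u8 = undefined;
    var len: usize = 0;
    for (text) |c| {
        if (c >= '0' and c <= '9' and len < digits.len) {
            digits[len] = c - '0';
            len += 1;
        }
    }
    var total: u64 = 0;
    for (digits[0..len]) |d| total += d;
    return total;
}

// Hypothetical hot path: short inputs never touch the big buffer.
// If the slow path were inlined here, every call would pay for its frame.
fn sumDigits(text: []const u8) u64 {
    if (text.len <= 8) {
        var total: u64 = 0;
        for (text) |c| {
            if (c >= '0' and c <= '9') total += c - '0';
        }
        return total;
    }
    return sumDigitsSlow(text);
}

pub fn main() void {
    // An 11-character input is long enough to take the slow path.
    std.debug.print("{d}\n", .{sumDigits("12345678901")});
}
```

Whether the large `alloca` ends up in the caller can be checked the same way as in the commit message: emit LLVM IR for an optimized build and look at which function contains the big array allocation.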
Perf data point for building the self-hosted compiler: (insignificant)