Skip to content

Commit

Permalink
reduce AstGen.numberLiteral stack usage
Browse files Browse the repository at this point in the history
At the moment, the LLVM IR we generate for this fn is

define internal fastcc void @AstGen.numberLiteral ...  {
Entry:
  ...
  %16 = alloca %"fmt.parse_float.decimal.Decimal(f128)", align 8
  ...

That `Decimal` is huuuge! It stores

    pub const max_digits =  11564;
    digits: [max_digits]u8,

on the stack.

It comes from `convertSlow` function, which LLVM happily inlined,
despite it being the cold path. Forbid inlining that to not penalize
callers with excessive stack usage.

Backstory: I was looking for needles memcpys in TigerBeetle, and came up
with this copyhound.zig tool for doing just that:

   https://github.com/tigerbeetle/tigerbeetle/blob/ee67e2ab95ed7ccf909be377dc613869738d48b4/src/copyhound.zig

Got curious, run it on the Zig's own code base, and looked at some of
the worst offenders.

List of worst offenders:

warning: crypto.kyber_d00.Kyber.SecretKey.decaps: 7776 bytes memcpy
warning: crypto.ff.Modulus.powPublic: 8160 bytes memcpy
warning: AstGen.numberLiteral: 11584 bytes memcpy
warning: crypto.tls.Client.init__anon_133566: 13984 bytes memcpy
warning: http.Client.connectUnproxied: 16896 bytes memcpy
warning: crypto.tls.Client.init__anon_133566: 16904 bytes memcpy
warning: objcopy.ElfFileHelper.tryCompressSection: 32768 bytes memcpy

Note from Andrew: I removed `noinline` from this commit since it should
be enough to set it to be cold.
  • Loading branch information
matklad authored and andrewrk committed Jul 20, 2023
1 parent e313584 commit 539eaef
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions lib/std/fmt/parse_float/convert_slow.zig
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,12 @@ pub fn getShift(n: usize) usize {
///
/// The algorithms described here are based on "Processing Long Numbers Quickly",
/// available here: <https://arxiv.org/pdf/2101.11408.pdf#section.11>.
///
/// Note that this function needs a lot of stack space and is marked
/// cold to hint against inlining into the caller.
pub fn convertSlow(comptime T: type, s: []const u8) BiasedFp(T) {
@setCold(true);

const MantissaT = mantissaType(T);
const min_exponent = -(1 << (math.floatExponentBits(T) - 1)) + 1;
const infinite_power = (1 << math.floatExponentBits(T)) - 1;
Expand Down

0 comments on commit 539eaef

Please sign in to comment.