Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce AstGen.numberLiteral stack usage #16438

Merged
merged 1 commit into from
Jul 20, 2023

Conversation

matklad
Copy link
Contributor

@matklad matklad commented Jul 18, 2023

At the moment, the LLVM IR we generate for this fn is

define internal fastcc void @AstGen.numberLiteral ...  { Entry:
  ...
  %16 = alloca %"fmt.parse_float.decimal.Decimal(f128)", align 8
  ...

That Decimal is huuuge! It stores

pub const max_digits = 11564;
digits: [max_digits]u8,

on the stack.

It comes from convertSlow function, which LLVM happily inlined, despite it being the cold path. Forbid inlining to not penalize callers with excessive stack usage.

Backstory: I was looking for needles memcpys in TigerBeetle, and came up with this copyhound.zig tool for doing just that:

https://github.com/tigerbeetle/tigerbeetle/blob/ee67e2ab95ed7ccf909be377dc613869738d48b4/src/copyhound.zig

Got curious, run it on the Zig's own code base, and looked at some of the worst offenders.

List of worst offenders:

warning: crypto.kyber_d00.Kyber.SecretKey.decaps: 7776 bytes memcpy
warning: crypto.ff.Modulus.powPublic: 8160 bytes memcpy
warning: AstGen.numberLiteral: 11584 bytes memcpy
warning: crypto.tls.Client.init__anon_133566: 13984 bytes memcpy
warning: http.Client.connectUnproxied: 16896 bytes memcpy
warning: crypto.tls.Client.init__anon_133566: 16904 bytes memcpy
warning: objcopy.ElfFileHelper.tryCompressSection: 32768 bytes memcpy

@matklad
Copy link
Contributor Author

matklad commented Jul 18, 2023

@andrewrk
Copy link
Member

andrewrk commented Jul 19, 2023

Thanks!

I'm happy to merge this but I wonder, how about @setCold(true); instead? This won't guarantee that the memory won't be inlined, but it will very likely have the same effect, while communicating to the optimizer that this function is never in a hot path.

Edit: with regards to the other memcpy instances that you found, these center around #2765 which is an open language design topic that deeply relates to aliasing, which itself deeply relates to zig's niche in the programming language ecosystem. In other words, you have poked the core of Zig as a language; one of the last remaining unsolved topics in its language design space.

I'm looking forward to linking you to @SpexGuy's very related recent talk from SYCL Vancouver which should be posted online any day now.

@matklad matklad force-pushed the matklad/save-the-stack branch from 40a2e69 to ec1a650 Compare July 19, 2023 09:08
@matklad
Copy link
Contributor Author

matklad commented Jul 19, 2023

Yeah, setCold makes more sense here, fixed

At the moment, the LLVM IR we generate for this fn is

define internal fastcc void @AstGen.numberLiteral ...  {
Entry:
  ...
  %16 = alloca %"fmt.parse_float.decimal.Decimal(f128)", align 8
  ...

That `Decimal` is huuuge! It stores

    pub const max_digits =  11564;
    digits: [max_digits]u8,

on the stack.

It comes from `convertSlow` function, which LLVM happily inlined,
despite it being the cold path. Forbid inlining that to not penalize
callers with excessive stack usage.

Backstory: I was looking for needles memcpys in TigerBeetle, and came up
with this copyhound.zig tool for doing just that:

   https://github.com/tigerbeetle/tigerbeetle/blob/ee67e2ab95ed7ccf909be377dc613869738d48b4/src/copyhound.zig

Got curious, run it on the Zig's own code base, and looked at some of
the worst offenders.

List of worst offenders:

warning: crypto.kyber_d00.Kyber.SecretKey.decaps: 7776 bytes memcpy
warning: crypto.ff.Modulus.powPublic: 8160 bytes memcpy
warning: AstGen.numberLiteral: 11584 bytes memcpy
warning: crypto.tls.Client.init__anon_133566: 13984 bytes memcpy
warning: http.Client.connectUnproxied: 16896 bytes memcpy
warning: crypto.tls.Client.init__anon_133566: 16904 bytes memcpy
warning: objcopy.ElfFileHelper.tryCompressSection: 32768 bytes memcpy

Note from Andrew: I removed `noinline` from this commit since it should
be enough to set it to be cold.
@andrewrk andrewrk force-pushed the matklad/save-the-stack branch from ec1a650 to 539eaef Compare July 20, 2023 19:51
@andrewrk andrewrk merged commit 772debb into ziglang:master Jul 20, 2023
@andrewrk
Copy link
Member

Perf data point for building the self-hosted compiler:

Benchmark 1 (3 runs): before
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          58.7s  ± 2.32s     56.1s  … 60.6s           0 ( 0%)        0%
  peak_rss           3.49GB ± 29.8MB    3.47GB … 3.53GB          0 ( 0%)        0%
  cpu_cycles          233G  ±  858M      233G  …  234G           0 ( 0%)        0%
  instructions        334G  ± 1.71G      334G  …  336G           0 ( 0%)        0%
  cache_references   13.3G  ± 18.8M     13.3G  … 13.3G           0 ( 0%)        0%
  cache_misses       1.07G  ± 41.9M     1.03G  … 1.11G           0 ( 0%)        0%
  branch_misses      1.61G  ± 8.19M     1.60G  … 1.62G           0 ( 0%)        0%
Benchmark 2 (3 runs): after
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          58.7s  ±  579ms    58.2s  … 59.3s           0 ( 0%)          -  0.1% ±  6.5%
  peak_rss           3.47GB ±  356KB    3.47GB … 3.47GB          0 ( 0%)          -  0.5% ±  1.4%
  cpu_cycles          231G  ± 1.30G      229G  …  232G           0 ( 0%)          -  1.2% ±  1.1%
  instructions        333G  ± 76.9M      333G  …  333G           0 ( 0%)          -  0.6% ±  0.8%
  cache_references   13.3G  ± 96.2M     13.2G  … 13.3G           0 ( 0%)          -  0.1% ±  1.2%
  cache_misses       1.08G  ± 46.1M     1.03G  … 1.12G           0 ( 0%)          +  1.1% ±  9.3%
  branch_misses      1.60G  ± 1.91M     1.60G  … 1.60G           0 ( 0%)          -  0.6% ±  0.8%

(insignificant)

@matklad matklad deleted the matklad/save-the-stack branch July 25, 2023 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants