Allow list fusion for Text and Text.Lazy unpack #629

Merged 1 commit on Mar 26, 2025

meooow25
Contributor

  • Make `Data.Text.unpack` and `Data.Text.Lazy.unpack` good producers in list fusion, so that they can fuse with good consumers of lists. Rewrite-back rules are included because the function bodies are large and we don't want to inline them if fusion doesn't occur.
  • For `Data.Text.Lazy`, this change means that `unpack`, which uses `unstreamList`, no longer fuses with `streamList` under Text's stream fusion framework. That fusion seems very unlikely to occur in practice, since it requires that nothing else be done to the list between the two functions. Even `pack . unpack` does not trigger the rule, so we are not losing anything valuable here.
  • Add benchmarks for `unpack`, covering both the fusing and the non-fusing case.
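
For readers unfamiliar with the pattern, the producer rule and its rewrite-back rule can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `unpackSketch` and `unpackSketchFB` are hypothetical names, the body simply delegates to `Data.Text.foldr`, and the pragma phases follow the convention used for `map`/`mapFB` in base, which is an assumption about what suits this case.

```haskell
module Main (main) where

import qualified Data.Text as T
import GHC.Exts (build)

-- Hypothetical stand-in for unpack. Inlining is held back in the early
-- phases so that the producer rule below gets a chance to match first.
unpackSketch :: T.Text -> String
unpackSketch t = T.foldr (:) [] t
{-# NOINLINE [1] unpackSketch #-}

-- Hypothetical "FB" helper: the same traversal, abstracted over the list
-- constructors so that `build` can hand it a consumer's cons and nil.
unpackSketchFB :: (Char -> b -> b) -> b -> T.Text -> b
unpackSketchFB = T.foldr
{-# INLINE [0] unpackSketchFB #-}

-- "[~1]" makes the first rule active only before phase 1; "[1]" makes the
-- second active from phase 1 onwards. If no consumer fused with the build
-- form, the second rule turns the leftover call back into unpackSketch.
{-# RULES
"unpackSketch"     [~1] forall t. unpackSketch t = build (\lcons lnil -> unpackSketchFB lcons lnil t)
"unpackSketchBack" [1]  unpackSketchFB (:) [] = unpackSketch
  #-}

main :: IO ()
main = print (length (unpackSketch (T.pack "hello")))
```

The result is the same whether or not the rules fire; only the intermediate list allocation differs, and the rules only take effect when compiling with optimization enabled.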

Closes #628.


Benchmarks with GHC 9.10.1

Before:

All
  Pure
    tiny
      length . unpack
        Text:     OK
          47.3 ns ± 4.0 ns, 415 B  allocated,   0 B  copied,  35 MB peak memory
        LazyText: OK
          53.8 ns ± 5.3 ns, 503 B  allocated,   0 B  copied,  35 MB peak memory
      length . drop 1 . unpack
        Text:     OK
          46.7 ns ± 1.7 ns, 415 B  allocated,   0 B  copied,  35 MB peak memory
        LazyText: OK
          53.0 ns ± 3.5 ns, 503 B  allocated,   0 B  copied,  35 MB peak memory
    ascii-small
      length . unpack
        Text:     OK
          567  μs ±  43 μs, 4.8 MB allocated, 117 B  copied,  35 MB peak memory
        LazyText: OK
          680  μs ±  59 μs, 6.4 MB allocated, 139 B  copied,  35 MB peak memory
      length . drop 1 . unpack
        Text:     OK
          566  μs ±  54 μs, 4.8 MB allocated, 116 B  copied,  35 MB peak memory
        LazyText: OK
          683  μs ±  44 μs, 6.4 MB allocated, 137 B  copied,  35 MB peak memory
    ascii
      length . unpack
        Text:     OK
          491  ms ±  13 ms, 4.1 GB allocated,  61 KB copied, 384 MB peak memory
        LazyText: OK
          587  ms ±  11 ms, 5.4 GB allocated,  81 KB copied, 384 MB peak memory
      length . drop 1 . unpack
        Text:     OK
          485  ms ± 4.1 ms, 4.1 GB allocated,  61 KB copied, 384 MB peak memory
        LazyText: OK
          588  ms ± 6.6 ms, 5.4 GB allocated,  78 KB copied, 384 MB peak memory
    english
      length . unpack
        Text:     OK
          32.5 ms ± 1.7 ms, 277 MB allocated, 4.9 KB copied, 384 MB peak memory
        LazyText: OK
          39.4 ms ± 1.9 ms, 369 MB allocated, 5.9 KB copied, 384 MB peak memory
      length . drop 1 . unpack
        Text:     OK
          32.4 ms ± 1.7 ms, 277 MB allocated, 4.7 KB copied, 384 MB peak memory
        LazyText: OK
          39.2 ms ± 1.7 ms, 369 MB allocated, 5.7 KB copied, 384 MB peak memory
    russian
      length . unpack
        Text:     OK
          63.3 μs ± 6.2 μs, 455 KB allocated,  10 B  copied, 384 MB peak memory
        LazyText: OK
          70.3 μs ± 6.6 μs, 607 KB allocated,  12 B  copied, 384 MB peak memory
      length . drop 1 . unpack
        Text:     OK
          63.3 μs ± 5.9 μs, 455 KB allocated,  10 B  copied, 384 MB peak memory
        LazyText: OK
          72.7 μs ± 7.1 μs, 607 KB allocated,  11 B  copied, 384 MB peak memory
    japanese
      length . unpack
        Text:     OK
          40.0 μs ± 3.4 μs, 314 KB allocated,   6 B  copied, 384 MB peak memory
        LazyText: OK
          46.2 μs ± 3.7 μs, 419 KB allocated,   6 B  copied, 384 MB peak memory
      length . drop 1 . unpack
        Text:     OK
          40.0 μs ± 683 ns, 314 KB allocated,   4 B  copied, 384 MB peak memory
        LazyText: OK
          46.6 μs ± 3.8 μs, 419 KB allocated,   6 B  copied, 384 MB peak memory

After:

All
  Pure
    tiny
      length . unpack
        Text:     OK
          15.3 ns ± 672 ps,  23 B  allocated,   0 B  copied,  35 MB peak memory, 67% less than baseline
        LazyText: OK
          17.6 ns ± 1.4 ns,  23 B  allocated,   0 B  copied,  35 MB peak memory, 67% less than baseline
      length . drop 1 . unpack
        Text:     OK
          45.9 ns ± 2.8 ns, 407 B  allocated,   0 B  copied,  35 MB peak memory,       same as baseline
        LazyText: OK
          53.1 ns ± 3.5 ns, 503 B  allocated,   0 B  copied,  35 MB peak memory,       same as baseline
    ascii-small
      length . unpack
        Text:     OK
          142  μs ± 7.1 μs,  59 B  allocated,   2 B  copied,  35 MB peak memory, 74% less than baseline
        LazyText: OK
          164  μs ±  13 μs,  95 B  allocated,   5 B  copied,  35 MB peak memory, 75% less than baseline
      length . drop 1 . unpack
        Text:     OK
          550  μs ±  32 μs, 4.8 MB allocated,  95 B  copied,  35 MB peak memory,       same as baseline
        LazyText: OK
          678  μs ±  52 μs, 6.4 MB allocated, 137 B  copied,  35 MB peak memory,       same as baseline
    ascii
      length . unpack
        Text:     OK
          124  ms ± 3.1 ms, 2.0 KB allocated, 1.8 KB copied, 384 MB peak memory, 74% less than baseline
        LazyText: OK
          142  ms ± 3.5 ms,  48 KB allocated, 3.4 KB copied, 384 MB peak memory, 75% less than baseline
      length . drop 1 . unpack
        Text:     OK
          477  ms ± 4.1 ms, 4.1 GB allocated,  61 KB copied, 384 MB peak memory,  1% less than baseline
        LazyText: OK
          586  ms ± 3.6 ms, 5.4 GB allocated,  78 KB copied, 384 MB peak memory,       same as baseline
    english
      length . unpack
        Text:     OK
          8.33 ms ± 685 μs, 4.0 KB allocated, 356 B  copied, 384 MB peak memory, 74% less than baseline
        LazyText: OK
          9.52 ms ± 749 μs, 3.9 KB allocated, 351 B  copied, 384 MB peak memory, 75% less than baseline
      length . drop 1 . unpack
        Text:     OK
          32.2 ms ± 2.1 ms, 277 MB allocated, 4.7 KB copied, 384 MB peak memory,       same as baseline
        LazyText: OK
          39.0 ms ± 2.1 ms, 369 MB allocated, 5.7 KB copied, 384 MB peak memory,       same as baseline
    russian
      length . unpack
        Text:     OK
          17.9 μs ± 1.4 μs,  32 B  allocated,   0 B  copied, 384 MB peak memory, 71% less than baseline
        LazyText: OK
          23.1 μs ± 1.5 μs,  32 B  allocated,   0 B  copied, 384 MB peak memory, 67% less than baseline
      length . drop 1 . unpack
        Text:     OK
          62.3 μs ± 5.9 μs, 455 KB allocated,  10 B  copied, 384 MB peak memory,       same as baseline
        LazyText: OK
          69.7 μs ± 4.3 μs, 607 KB allocated,   9 B  copied, 384 MB peak memory,       same as baseline
    japanese
      length . unpack
        Text:     OK
          10.2 μs ± 825 ns,  28 B  allocated,   0 B  copied, 384 MB peak memory, 74% less than baseline
        LazyText: OK
          11.5 μs ± 711 ns,  28 B  allocated,   0 B  copied, 384 MB peak memory, 75% less than baseline
      length . drop 1 . unpack
        Text:     OK
          39.0 μs ± 3.1 μs, 314 KB allocated,   5 B  copied, 384 MB peak memory,       same as baseline
        LazyText: OK
          45.6 μs ± 1.4 μs, 419 KB allocated,   5 B  copied, 384 MB peak memory,       same as baseline
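
The two benchmarked shapes correspond to the "fusion" and "no fusion" cases from the description. A minimal sketch of the expressions being measured is given below, with hypothetical names (`fusingCase`, `nonFusingCase`) and without the benchmark harness. The explanation in the comments is an assumption based on how list fusion in base usually behaves: `length` is a good consumer, while `drop` does not appear to take part in foldr/build fusion, so the intermediate list is still materialized.

```haskell
module Main (main) where

import qualified Data.Text as T

-- length consumes the list directly, so with the new rules no
-- intermediate String should need to be allocated (assumed behaviour).
fusingCase :: T.Text -> Int
fusingCase = length . T.unpack

-- drop 1 sits between producer and consumer; since drop is (as far as
-- I know) not a fusion participant, unpack falls back to building the list.
nonFusingCase :: T.Text -> Int
nonFusingCase = length . drop 1 . T.unpack

main :: IO ()
main = do
  let t = T.pack "hello world"
  print (fusingCase t)
  print (nonFusingCase t)
```

This matches the pattern in the numbers above: the `length . unpack` rows drop sharply in both time and allocation, while the `length . drop 1 . unpack` rows stay at baseline.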

Comment on lines +60 to +66
+foldrText :: (Char -> b -> b) -> b -> Text -> b
+foldrText f z (Text arr off len) = go off
   where
     go !i
-      | i >= off + len = []
-      | otherwise = let !(Iter c l) = iterArray arr i in c : go (i + l)
-{-# INLINE [1] unpack #-}
+      | i >= off + len = z
+      | otherwise = let !(Iter c l) = iterArray arr i in f c (go (i + l))
+{-# INLINE foldrText #-}
Contributor Author

Note: This can, and maybe should, use Data.Text.foldr. But it was already here, so I didn't change that.
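
To illustrate the alternative mentioned in this note, here is a small sketch that expresses the list conversion through the public `Data.Text.foldr` and checks that it agrees with `Data.Text.unpack` on a few inputs. `unpackViaFoldr` is a hypothetical name used only for this illustration, and the sketch says nothing about whether the two would optimize identically.

```haskell
module Main (main) where

import qualified Data.Text as T

-- Hypothetical helper: unpack written as a right fold with the list
-- constructors, using the library's exported foldr.
unpackViaFoldr :: T.Text -> String
unpackViaFoldr = T.foldr (:) []

main :: IO ()
main = do
  let samples = map T.pack ["", "a", "hello", "привет", "日本語"]
  -- Prints True when both conversions agree on every sample.
  print (all (\t -> unpackViaFoldr t == T.unpack t) samples)
```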

-- * If it fuses: In phase 0, `foldrFB` inlines and `foldr` inlines. GHC
-- optimizes the fused code.
{-# RULES
"Text.Lazy.unpack" [~1] forall t. unpack t = Exts.build (\c n -> foldrFB c n t)
Contributor

Suggested change
-"Text.Lazy.unpack" [~1] forall t. unpack t = Exts.build (\c n -> foldrFB c n t)
+"Text.Lazy.unpack" [~1] forall t. unpack t = Exts.build (\cons nil -> foldrFB cons nil t)

And the same for the strict Text. Fusion is cryptic enough; let's give readers an extra hint.

Contributor Author

Done. I went with lcons and lnil because GHC warns about cons shadowing the top-level cons.

Contributor

Bodigrim left a comment

Thanks!
(I'm not a terribly big fan of rewrite rules in general because of their brittleness, but there does not seem to be any downside to this particular change)

Bodigrim requested a review from Lysxia (March 26, 2025 00:03)
Bodigrim merged commit 5e57460 into haskell:master (Mar 26, 2025)
23 of 26 checks passed
Successfully merging this pull request may close these issues.

Allow list fusion with unpack
3 participants