Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve lazy performance of Data.Text.Lazy.inits #572

Merged
merged 3 commits into from
Mar 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions benchmarks/haskell/Benchmarks.hs
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ import qualified Benchmarks.EncodeUtf8 as EncodeUtf8
import qualified Benchmarks.Equality as Equality
import qualified Benchmarks.FileRead as FileRead
import qualified Benchmarks.FoldLines as FoldLines
import qualified Benchmarks.Micro as Micro
import qualified Benchmarks.Multilang as Multilang
import qualified Benchmarks.Pure as Pure
import qualified Benchmarks.ReadNumbers as ReadNumbers
Expand Down Expand Up @@ -61,6 +62,7 @@ main = do
defaultMain
[ Builder.benchmark
, Concat.benchmark
, Micro.benchmark
, bgroup "DecodeUtf8"
[ env (DecodeUtf8.initEnv (tf "libya-chinese.html")) (DecodeUtf8.benchmark "html")
, env (DecodeUtf8.initEnv (tf "yiwiki.xml")) (DecodeUtf8.benchmark "xml")
Expand Down
32 changes: 32 additions & 0 deletions benchmarks/haskell/Benchmarks/Micro.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
-- | Benchmarks on artificial data.

module Benchmarks.Micro (benchmark) where

import qualified Data.Text.Lazy as TL
import qualified Data.Text as T
import Test.Tasty.Bench (Benchmark, Benchmarkable, bgroup, bcompareWithin, bench, nf)

benchmark :: Benchmark
benchmark = bgroup "Micro"
[ blinear "lazy-inits--last" 500000 2 0.1 $ \len ->
nf (last . TL.inits) (chunks len)
, blinear "lazy-inits--map-take1" 500000 2 0.1 $ \len ->
nf (map (TL.take 1) . TL.inits) (chunks len)
]

chunks :: Int -> TL.Text
chunks n = TL.fromChunks (replicate n (T.pack "a"))

-- Check that running an action with input length (m * baseLen)
-- runs m times slower than the same action with input length baseLen.
blinear :: String -- ^ Name (must be globally unique!)
-> Int -- ^ Base length
-> Int -- ^ Multiplier m
-> Double -- ^ Slack s
-> (Int -> Benchmarkable) -- ^ Action to measure, parameterized by input length
-> Benchmark
blinear name baseLen m s run = bgroup name
[ bench "baseline" $ run baseLen
, bcompareWithin (fromIntegral m * (1 - s)) (fromIntegral m * (1 + s)) (name ++ ".baseline") $
bench ("x" ++ show m) $ run (m * baseLen)
]
16 changes: 12 additions & 4 deletions src/Data/Text/Lazy.hs
Original file line number Diff line number Diff line change
Expand Up @@ -1443,10 +1443,18 @@ inits = (NE.toList P.$!) . initsNE
--
-- @since 2.1.2
initsNE :: Text -> NonEmpty Text
initsNE = (Empty NE.:|) . inits'
where inits' Empty = []
inits' (Chunk t ts) = L.map (\t' -> Chunk t' Empty) (NE.tail (T.initsNE t))
++ L.map (Chunk t) (inits' ts)
initsNE ts0 = Empty NE.:| inits' 0 ts0
where
inits' :: Int64 -- Number of previous chunks i
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think number of chunks cannot exceed Int even on 32-bit machines.

Copy link
Contributor

@meooow25 meooow25 Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lazy structure so it can, given enough time. For instance, initsNE (cycle "text").

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if we want to be precise to that extent it should be Integer ;)

-> Text -- The remainder after dropping i chunks from ts0
-> [Text] -- Prefixes longer than the first i chunks of ts0.
inits' !i (Chunk t ts) = L.map (takeChunks i ts0) (NE.tail (T.initsNE t))
++ inits' (i + 1) ts
inits' _ Empty = []

takeChunks :: Int64 -> Text -> T.Text -> Text
takeChunks !i (Chunk t ts) lastChunk | i > 0 = Chunk t (takeChunks (i - 1) ts lastChunk)
takeChunks _ _ lastChunk = Chunk lastChunk Empty

-- | /O(n)/ Return all final segments of the given 'Text', longest
-- first.
Expand Down
1 change: 1 addition & 0 deletions text.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -333,6 +333,7 @@ benchmark text-benchmarks
Benchmarks.Equality
Benchmarks.FileRead
Benchmarks.FoldLines
Benchmarks.Micro
Benchmarks.Multilang
Benchmarks.Programs.BigTable
Benchmarks.Programs.Cut
Expand Down
Loading