Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

groupBy changes order of characters #4

Closed
gbwey opened this issue Jan 27, 2018 · 9 comments · Fixed by #26
Closed

groupBy changes order of characters #4

gbwey opened this issue Jan 27, 2018 · 9 comments · Fixed by #26

Comments

@gbwey
Copy link

gbwey commented Jan 27, 2018

Hi,
I noticed that groupBy appears to take the first character in each group and move it to the end.
Here's a snippet showing what happens.

{-# LANGUAGE OverloadedStrings #-}
import Data.Functor.Identity
import Data.Function
import qualified Data.ByteString.Streaming as Q
import Streaming.Internal as S
import qualified Data.ByteString.Lazy as BL

zz = S.mapsM Q.toLazy $ Q.groupBy (on (==) (== 1)) $ Q.fromLazy @Identity "1234"
-- Effect (Identity (Step ("2341" :> Return ())))

zz1 = S.mapsM Q.toLazy $ Q.groupBy (on (==) (elem BL.unpack "ab")) $ Q.fromLazy @Identity "1234ab5678"
-- Effect (Identity (Step ("2341" :> Effect (Identity (Step ("ba" :> Effect (Identity (Step ("6785" :> Return ())))))))))

@andrewthad
Copy link
Contributor

Thanks for reporting this. That's pretty bad. I'll take a PR that fixes this, or I'll do it in a few days if it hasn't already been done.

@fosskers
Copy link
Collaborator

Hi there, is this still an issue?

@fosskers
Copy link
Collaborator

I'm adding a test to check this.

@fosskers
Copy link
Collaborator

group seems fine, but here's an unexpected result for groupBy:

groupByCharOrder :: Assertion
groupByCharOrder = do
  a <- S.toList_ . SM.mapsM Q8.toLazy $ Q8.groupBy (\a b -> succ a == b) $ Q8.fromLazy @IO "12346789"
  a @?= ["1234", "6789"]
  Unit Tests
    groupBy: Char order:                                                  FAIL
      tests/Test.hs:89:
      expected: ["1234","6789"]
       but got: ["21","43","76","98"]

@fosskers
Copy link
Collaborator

fosskers commented Sep 18, 2020

Or perhaps my understanding of groupBy was flawed:

Data.List> groupBy (\a b -> succ a == b) "12346789"
["12","34","67","89"]

Still, the problem of order remains in the Streaming variant. Note that group doesn't seem to suffer from this, as unlike Data.List the group we have here is implemented separately.

@fosskers
Copy link
Collaborator

fosskers commented Sep 18, 2020

From Data.List:

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

So it's comparing every other Char with the first Char in the run. I had thought it would look at adjacent pairs.

@fosskers
Copy link
Collaborator

Looking at the code, my guess is that this bug is actually present in group as well, but you can't tell that items are reordered due to group always using ==.

@fosskers fosskers linked a pull request Sep 18, 2020 that will close this issue
@fosskers
Copy link
Collaborator

Here we go:

    go (Chunk c cs)
      | B.length c == 1 = Step $ to [c] (B.unsafeHead c) cs
      | otherwise       = Step $ to [B.unsafeTake 1 c] (B.unsafeHead c) (Chunk (B.unsafeTail c) cs)
...
    to acc !w (Chunk c cs) = case findIndexOrEnd (not . rel w) c of
      0 -> revNonEmptyChunks acc (Empty (go (Chunk c cs)))
      n | n == B.length c  -> to (B.unsafeTake n c : acc) w cs
...

Notice the acc is being primed with the first element of the chunk, and later pieces are being prepended to that.

@fosskers
Copy link
Collaborator

fosskers commented Sep 18, 2020

Turns out a former dev had mistaken the behaviour of foldl for foldr, and so in the internal logic of groupBy, the gradually prepended lists (shown above) were never actually being reversed as intended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants