-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Evaluation] Statically unroll 'itraverseCounter_' #5265
[Evaluation] Statically unroll 'itraverseCounter_' #5265
Conversation
/benchmark validation |
Click here to check the status of your benchmark. |
/benchmark validation |
Click here to check the status of your benchmark. |
/benchmark validation |
Comparing benchmark results of ' validation' on '48a20b623' (base) and '50b077a0c' (PR) Results table
|
Click here to check the status of your benchmark. |
Comparing benchmark results of ' validation' on '48a20b623' (base) and '50b077a0c' (PR) Results table
|
The first run is -1.13% on average, then second one is -1.12%. Looks like a modest improvement, but really hard to tell with the benchmarks not being reliable for such little things at all. |
/benchmark plutus-benchmark:nofib |
Click here to check the status of your benchmark. |
Comparing benchmark results of ' plutus-benchmark:nofib' on '48a20b623' (base) and '50b077a0c' (PR) Results table
|
^ +0.95% on average out of nowhere. Let's try again. |
/benchmark plutus-benchmark:nofib |
Click here to check the status of your benchmark. |
Comparing benchmark results of ' plutus-benchmark:nofib' on '48a20b623' (base) and '50b077a0c' (PR) Results table
|
+0.31% on average. OK, this looks like noise that isn't worth spending time on. |
50b077a
to
366de2b
Compare
/benchmark validation |
3 similar comments
/benchmark validation |
/benchmark validation |
/benchmark validation |
Click here to check the status of your benchmark. |
Comparing benchmark results of 'validation' on '5eb80f3fe3' (base) and '366de2bce7' (PR) Results table
|
Click here to check the status of your benchmark. |
Comparing benchmark results of 'validation' on '5eb80f3fe3' (base) and '366de2bce7' (PR) Results table
|
Click here to check the status of your benchmark. |
Comparing benchmark results of 'validation' on '5eb80f3fe3' (base) and '366de2bce7' (PR) Results table
|
366de2b
to
0a8f38e
Compare
-2.2% to -3.2% with a bunch of readings around -2.5%, so let's call it -2.5%. |
0a8f38e
to
6b3ae4b
Compare
{-# INLINE upwardsM #-} | ||
|
||
instance UpwardsM f n => UpwardsM f ('S n) where | ||
upwardsM !i k = k i *> upwardsM @f @n (i + 1) k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weren't you saying somewhere that we should always use (const+i)
instead of (i+const)
? Not that it changes anything here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am courious about the possible rationale behind the argument order preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the rationale is constant folding
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weren't you saying somewhere that we should always use
(const+i)
instead of(i+const)
?
That's in UPLC, I don't think it matters in Haskell, but I'll be happy to get enlightened.
{-# INLINE upwardsM #-} | ||
|
||
instance UpwardsM f n => UpwardsM f ('S n) where | ||
upwardsM !i k = k i *> upwardsM @f @n (i + 1) k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit weird when thinking about the "types" going downwards while in the meantime the values going upwards...
Can't you use them going on the same direction, for example by calling (i-1)
instead? Probably you would have to change to (<*)
also.
On another note, assuming that NatToPeano is necessary, can't you still keep the KnownNat around so that in the method's definition you call to natVal (Proxy @n)
instead of doing the arithmetic manually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit weird when thinking about the "types" going downwards while in the meantime the values going upwards...
I did exactly what the original implementation did, that's it. I don't think it matters much, but we can discuss it during tech discussion if anybody hates this version.
Probably you would have to change to
(<*)
also.
That wouldn't change ordering of effects or the result.
On another note, assuming that NatToPeano is necessary, can't you still keep the KnownNat around so that in the method's definition you call to
natVal (Proxy @n)
instead of doing the arithmetic manually?
Perhaps, but evidently constant folding works perfectly well here (see the GHC Core in the PR description), so who cares?
data Peano | ||
= Z | ||
| S Peano |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why Peano is needed? Is it because KnownNat + CmpNat type family would require OverlappingInstances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because KnownNat + CmpNat type family would require OverlappingInstances?
Yes. Or interleaving a type family and a type class (like I do here). Or maybe something else. The version in the PR is the simplest I'm aware of.
{-# INLINE upwardsM #-} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How reliable is the use of INLINE pragma?
Let's say that we cannot guarantee that this will be unrolled (fully);
have you looked into the perf difference of old way vs the not-inlined method way? Could the not-inlined method way be actually much slower than the old way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How reliable is the use of INLINE pragma?
In my experience, for such small definitions it is reliable.
have you looked into the perf difference of old way vs the not-inlined method way? Could the not-inlined method way be actually much slower than the old way?
I haven't but I'd expect it to be noticeably slower indeed.
I don't think we need to worry about it, there's a huge tower of abstractions that we optimize away like this. The code here is very straightforward compared to some of the other stuff that we do. There's no way around looking into GHC Core occasionally and relying on longitudinal benchmarks anyway.
-- | Traverse the counters with an effectful function. | ||
itraverseCounter_ | ||
:: forall n m | ||
. (KnownNat n, PrimMonad m) | ||
. (UpwardsM m (NatToPeano n), PrimMonad m) | ||
=> (Int -> Word8 -> m ()) | ||
-> StepCounter n (PrimState m) | ||
-> m () | ||
itraverseCounter_ f (StepCounter arr) = do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that no other code is using itraverseCounter. How about we get rid of itraversecounter and switch to iforcounter directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel safe to ignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meh, it's normal for for
to get defined in terms of traverse
and having an extra function is fine. I'd keep it the way it is.
instance UpwardsM f n => UpwardsM f ('S n) where | ||
upwardsM !i k = k i *> upwardsM @f @n (i + 1) k | ||
{-# INLINE upwardsM #-} | ||
|
||
-- | Traverse the counters with an effectful function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was worried about a potential off-by-one mistake. I looked at the core and the unrolling LGTM:
case $weta1_sx2U 0# (ipv8_ivY8 `cast` <Co:8> :: ...) of
{ (# ipv10_XX, ipv11_X1a #) ->
case $weta1_sx2U 1# ipv10_XX of { (# ipv12_X1c, ipv13_X11 #) ->
case $weta1_sx2U 2# ipv12_X1c of { (# ipv14_X1d, ipv15_X1e #) ->
case $weta1_sx2U 3# ipv14_X1d of { (# ipv16_X1g, ipv17_X1h #) ->
case $weta1_sx2U 4# ipv16_X1g of { (# ipv18_X1j, ipv19_X1k #) ->
case $weta1_sx2U 5# ipv18_X1j of { (# ipv20_X1m, ipv21_X1n #) ->
case $weta1_sx2U 6# ipv20_X1m of { (# ipv22_X1p, ipv23_X1q #) ->
case $weta1_sx2U 7# ipv22_X1p of { (# ipv24_X1s, ipv25_X1t #) ->
case $weta1_sx2U 8# ipv24_X1s of { (# ipv26_X1v, ipv27_X1w #) ->
case $weta1_sx2U 9# ipv26_X1v of { (# ipv28_X1y, ipv29_X1z #) ->
assuming we have 10 elements on the cekmachinecosts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for double-checking!
-- | Traverse the counters with an effectful function. | ||
itraverseCounter_ | ||
:: forall n m | ||
. (KnownNat n, PrimMonad m) | ||
. (UpwardsM m (NatToPeano n), PrimMonad m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the n
is known at compile time, what is it and where does it come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the number of of fields of CekMachineCostsBase
I'm going to claim that I've addressed the comments (by responding to them), so let's merge this bad boy. |
Results in a statically unrolled loop:
Not sure if it's worth it though, let's ask benchmarks.