Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: Miscompilation of popcnt for i8/i16 #3615

Closed
bjorn3 opened this issue Dec 17, 2021 · 1 comment · Fixed by #3617
Closed

Cranelift: Miscompilation of popcnt for i8/i16 #3615

bjorn3 opened this issue Dec 17, 2021 · 1 comment · Fixed by #3617
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator

Comments

@bjorn3
Copy link
Contributor

bjorn3 commented Dec 17, 2021

.clif Test Case

target x86_64 nehalem

function u0:9(i16) -> i32 system_v {
; symbol _RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros

                                block0(v0: i16):
                                    v1 -> v0
                                    jump block1

                                block1:
@0000                               v2 = bnot.i16 v1
@000b                               v3 = popcnt v2
                                    v4 -> v3
@000b                               jump block2

                                block2:
@000d                               v5 = uextend.i32 v4
@0011                               return v5
}

Expected Results

The popcnt instruction only takes the 16bit part of v1 into account.

Actual Results

The popcnt instruction takes a 32bit register into account when computing the population count despite the upper half having an undefined value.

0000000000027246 <_RNvMs_NtCs1K2K8gQcegq_4core3nums11count_zeros>:
   27246:       55                      push   %rbp
   27247:       48 89 e5                mov    %rsp,%rbp
   2724a:       f7 d7                   not    %edi
   2724c:       f3 0f b8 f7             popcnt %edi,%esi
   27250:       0f b7 f6                movzwl %si,%esi
   27253:       48 89 f0                mov    %rsi,%rax
   27256:       48 89 ec                mov    %rbp,%rsp
   27259:       5d                      pop    %rbp
   2725a:       c3                      retq

Versions and Environment

Cranelift version or commit: Cranelift 0.79.0

Operating system: Linux

Architecture: x86_64

Extra Info

For reference: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/isle.20performance.20with.20cg_clif/near/265273960

@bjorn3 bjorn3 added bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator labels Dec 17, 2021
@cfallin
Copy link
Member

cfallin commented Dec 17, 2021

OK, I was nerdsniped into looking at this briefly; it looks like an issue introduced with #2887 (the initial PR that makes use of popcnt instruction) and masked until recently because the upper bits happened to be zero in the register, as you say.

The basic issue is that this match arm matches on I32, and the impl is intended only for actual 32-bit values; but above that, this code alters ty and computes an ext_spec (extension specification), turning I8/I16 into an I32 with extension first. But this extension only happens with the fallback (non-popcnt-instruction implementation) further down, not the "use popcnt if we can" early-out.

I think we can just move the let (ext_spec, ty) = ... further down, below the use_popcnt case. If you want to put together a PR for that, I'm happy to review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants