bjorn3 added the bug and cranelift labels on Dec 17, 2021.
OK, I was nerdsniped into looking at this briefly; it looks like an issue introduced with #2887 (the initial PR that makes use of the popcnt instruction) and masked until recently because the upper bits happened to be zero in the register, as you say.
The basic issue is that this match arm matches on I32, and the impl is intended only for actual 32-bit values; but above that, this code alters ty and computes an ext_spec (extension specification), turning I8/I16 into an I32 with extension first. But this extension only happens with the fallback (non-popcnt-instruction implementation) further down, not the "use popcnt if we can" early-out.
I think we can just move the let (ext_spec, ty) = ... further down, below the use_popcnt case. If you want to put together a PR for that, I'm happy to review!
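To make the failure mode concrete, here is a minimal Rust sketch of the behavior described above. It is not Cranelift's lowering code: the function names, the `use_popcnt` flag as a plain bool, and the use of a `u32` to stand in for a 32-bit register holding an I16 value are all assumptions made for illustration. The "expected" variant simply zero-extends before counting, which yields the right answer but is not necessarily the shape of the fix proposed above.

```rust
/// Stand-in for the hardware popcnt on a 32-bit register (illustrative only).
fn popcnt32(reg: u32) -> u32 {
    reg.count_ones()
}

/// Stand-in for the non-popcnt fallback: a classic bit-twiddling population count.
fn fallback_popcnt32(mut x: u32) -> u32 {
    x = x - ((x >> 1) & 0x5555_5555);
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333);
    x = (x + (x >> 4)) & 0x0F0F_0F0F;
    x.wrapping_mul(0x0101_0101) >> 24
}

/// Models the buggy ordering: the early-out counts the whole 32-bit register,
/// and only the fallback path sees the zero-extended 16-bit value.
fn popcnt_i16_buggy(reg: u32, use_popcnt: bool) -> u32 {
    if use_popcnt {
        // Upper 16 bits are undefined for an I16 value, yet they get counted.
        return popcnt32(reg);
    }
    let extended = reg & 0xFFFF; // zero-extension of the 16-bit value
    fallback_popcnt32(extended)
}

/// Models the expected result: only the low 16 bits contribute on both paths.
fn popcnt_i16_expected(reg: u32, use_popcnt: bool) -> u32 {
    let extended = reg & 0xFFFF;
    if use_popcnt {
        popcnt32(extended)
    } else {
        fallback_popcnt32(extended)
    }
}
```

With a register value such as `0xDEAD_00FF` (low half `0x00FF`, garbage in the upper half), `popcnt_i16_expected` returns 8 on both paths, while `popcnt_i16_buggy` returns 8 only when `use_popcnt` is false. This also shows why the bug stayed hidden as long as the upper bits happened to be zero.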
.clif Test Case
Expected Results
The popcnt instruction only takes the 16-bit part of v1 into account.
Actual Results
The popcnt instruction takes a 32-bit register into account when computing the population count despite the upper half having an undefined value.
Versions and Environment
Cranelift version or commit: Cranelift 0.79.0
Operating system: Linux
Architecture: x86_64
Extra Info
For reference: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/isle.20performance.20with.20cg_clif/near/265273960