-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Vpopcnt for x64 #2887
Implement Vpopcnt for x64 #2887
Conversation
d088ff5
to
32d129f
Compare
) | ||
.operands_in(vec![x]) | ||
.operands_out(vec![a]), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we just use popcnt
? If popcnt
and vpopcnt
have the same semantics then we can just use popcnt
with a vector type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while since I made that decision but it does indeed look like I've created identical conversions. Perhaps I was duplicating an earlier effort where two instruction conversions were needed due to the input args?? Not sure. In any case, good question and I'll attempt to combine the two and hopefully it will cause not trouble. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh .. now I see why we had to do that. It has something to do with needing to pop1_with_bitcast in the code_translator. Ultimately the verifier did not like the vector sharing the definition when compiling even though the translations in instructions.rs are identical. Perhaps you know or @cfallin knows but I remember trying this now and consciously implementing a separate translation/lowering because of an issue along these lines:
`Caused by:
0: WebAssembly failed to compile
1: Compilation error: function u0:0(i64 vmctx, i64, i8x16) -> i8x16 wasmtime_system_v {
gv0 = vmctx
gv1 = load.i64 notrap aligned readonly gv0
gv2 = load.i64 notrap aligned gv1
stack_limit = gv2
block0(v0: i64, v1: i64, v2: i8x16):
@004f v4 = popcnt v2
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; error: inst0 (v4 = popcnt.i8x16 v2): has an invalid controlling type i8x16
@0051 jump block1(v4)
block1(v3: i8x16):
@0051 return v3
}
; 1 verifier error detected (see above). Compilation aborted.
', /home/jlbirch/wasmtime_jlb6740/target/debug/build/wasmtime-cli-89f47d7a9050963a/out/wast_testsuite_tests.rs:3645:18
note: run with RUST_BACKTRACE=1
environment variable to display a backtrace`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good find; I think the issue is that the types allowed for popcnt
are too restrictive. popcnt
uses the iB
type, which only includes scalar integers, but I think we should make it use the Int
type, which allows both scalar and vector integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok .. Yes, makes sense. Do we want to just merged this here and refactor that in a separate patch or do it here?? My only thought is fixing a patch that currently works (should work) maybe asking for trouble .. creating some unintended consequence not realizing the restriction to 1b was there for some non-obvious reason. In any case I do suddenly have failures below to check on so I'll fix that first. Will likely continue to be very slow to respond to updates this coming week unfortunately but will get back to normal soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably better to avoid adding vpopcnt
in the first place rather than cleaning up in a subsequent PR -- hopefully the changes are minor on the lowering side (basically it should be just a conditional in the Opcode::Popcnt
case, on the type, like the arithmetic operators have). Thanks!
e06d98d
to
78cfe71
Compare
_ => unreachable!(), | ||
}; | ||
let ty_tmp = ty.unwrap(); | ||
if !ty_tmp.is_vector() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if !ty_tmp.is_vector() { | |
let ty = ty.unwrap(); | |
if !ty.is_vector() { |
}; | ||
let ty_tmp = ty.unwrap(); | ||
if !ty_tmp.is_vector() { | ||
let (ext_spec, ty) = match ctx.input_ty(insn, 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bnjbvr, what does ty
mean here? (Edit: what I mean is that how ty
is used here is different than elsewhere throughout this file; I'm guessing that the input type may not be the controlling type so we need to figure it out and zero-extend it in some cases. Perhaps this could be clarified with a renaming of ty
to input_ty
and some comment about why this is all necessary?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked to @jlb6740 about this (the scalar version is unchanged despite the crazy diff); we should merge this with minor tweaks to ty
and a rebase to get CI to pass. It could be that in a future PR we could simplify the scalar code a bit but that shouldn't affect this PR.
78cfe71
to
1156fe9
Compare
1156fe9
to
058f58a
Compare
No description provided.