Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Vpopcnt for x64 #2887

Merged
merged 1 commit into from
May 22, 2021

Conversation

jlb6740
Copy link
Contributor

@jlb6740 jlb6740 commented May 10, 2021

No description provided.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift:wasm labels May 10, 2021
@jlb6740 jlb6740 force-pushed the x64_implement_packed_popcnt branch 2 times, most recently from d088ff5 to 32d129f Compare May 13, 2021 20:02
)
.operands_in(vec![x])
.operands_out(vec![a]),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just use popcnt? If popcnt and vpopcnt have the same semantics then we can just use popcnt with a vector type?

Copy link
Contributor Author

@jlb6740 jlb6740 May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I made that decision but it does indeed look like I've created identical conversions. Perhaps I was duplicating an earlier effort where two instruction conversions were needed due to the input args?? Not sure. In any case, good question and I'll attempt to combine the two and hopefully it will cause not trouble. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh .. now I see why we had to do that. It has something to do with needing to pop1_with_bitcast in the code_translator. Ultimately the verifier did not like the vector sharing the definition when compiling even though the translations in instructions.rs are identical. Perhaps you know or @cfallin knows but I remember trying this now and consciously implementing a separate translation/lowering because of an issue along these lines:

`Caused by:
0: WebAssembly failed to compile
1: Compilation error: function u0:0(i64 vmctx, i64, i8x16) -> i8x16 wasmtime_system_v {
gv0 = vmctx
gv1 = load.i64 notrap aligned readonly gv0
gv2 = load.i64 notrap aligned gv1
stack_limit = gv2

                                   block0(v0: i64, v1: i64, v2: i8x16):
   @004f                               v4 = popcnt v2
   ;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ; error: inst0 (v4 = popcnt.i8x16 v2): has an invalid controlling type i8x16
   
   @0051                               jump block1(v4)
   
                                   block1(v3: i8x16):
   @0051                               return v3
   }
   
   ; 1 verifier error detected (see above). Compilation aborted.
   ', /home/jlbirch/wasmtime_jlb6740/target/debug/build/wasmtime-cli-89f47d7a9050963a/out/wast_testsuite_tests.rs:3645:18

note: run with RUST_BACKTRACE=1 environment variable to display a backtrace`

Copy link
Contributor

@abrown abrown May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good find; I think the issue is that the types allowed for popcnt are too restrictive. popcnt uses the iB type, which only includes scalar integers, but I think we should make it use the Int type, which allows both scalar and vector integers.

Copy link
Contributor Author

@jlb6740 jlb6740 May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok .. Yes, makes sense. Do we want to just merged this here and refactor that in a separate patch or do it here?? My only thought is fixing a patch that currently works (should work) maybe asking for trouble .. creating some unintended consequence not realizing the restriction to 1b was there for some non-obvious reason. In any case I do suddenly have failures below to check on so I'll fix that first. Will likely continue to be very slow to respond to updates this coming week unfortunately but will get back to normal soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably better to avoid adding vpopcnt in the first place rather than cleaning up in a subsequent PR -- hopefully the changes are minor on the lowering side (basically it should be just a conditional in the Opcode::Popcnt case, on the type, like the arithmetic operators have). Thanks!

@jlb6740 jlb6740 force-pushed the x64_implement_packed_popcnt branch 2 times, most recently from e06d98d to 78cfe71 Compare May 21, 2021 04:17
_ => unreachable!(),
};
let ty_tmp = ty.unwrap();
if !ty_tmp.is_vector() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !ty_tmp.is_vector() {
let ty = ty.unwrap();
if !ty.is_vector() {

};
let ty_tmp = ty.unwrap();
if !ty_tmp.is_vector() {
let (ext_spec, ty) = match ctx.input_ty(insn, 0) {
Copy link
Contributor

@abrown abrown May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnjbvr, what does ty mean here? (Edit: what I mean is that how ty is used here is different than elsewhere throughout this file; I'm guessing that the input type may not be the controlling type so we need to figure it out and zero-extend it in some cases. Perhaps this could be clarified with a renaming of ty to input_ty and some comment about why this is all necessary?).

Copy link
Contributor

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked to @jlb6740 about this (the scalar version is unchanged despite the crazy diff); we should merge this with minor tweaks to ty and a rebase to get CI to pass. It could be that in a future PR we could simplify the scalar code a bit but that shouldn't affect this PR.

@jlb6740 jlb6740 force-pushed the x64_implement_packed_popcnt branch from 78cfe71 to 1156fe9 Compare May 21, 2021 17:25
@jlb6740 jlb6740 force-pushed the x64_implement_packed_popcnt branch from 1156fe9 to 058f58a Compare May 21, 2021 21:53
@jlb6740 jlb6740 merged commit 9a5c960 into bytecodealliance:main May 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift:wasm cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants