Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work-around buggy Intel chips erroneously reporting BMI1/BMI2 support #1249

Merged
merged 1 commit into from
Nov 19, 2021

Conversation

thomcc
Copy link
Member

@thomcc thomcc commented Nov 7, 2021

Apparently on some chips, cpuid is not to be fully trusted. According to http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/desktop-6th-gen-core-family-spec-update.pdf, some Skylake chips (notably Celerons) will report support for BMI1 and BMI2 without actual support for those operations.

Quoting that document's Errata section:

SKL052 CPUID Incorrectly Reports Bit Manipulation Instructions Support
Problem Executing CPUID with EAX = 7 and ECX = 0 may return EBX with bits [3] and [8] set, incorrectly indicating the presence of BMI1 and BMI2 instruction set extensions.
Implication Attempting to use instructions from the BMI1 or BMI2 instruction set extensions will result in a #UD exception.
Workaround It is possible for the BIOS to contain a workaround for this erratum.

IIUC, the problematic chips are fairly widespread1, and searches online do seem to bring up people hitting this crash in the real world... so it would be nice if a better workaround existed than "ask the user to update their BIOS" (I think the OS could work around it too — if nothing else, by performing a similar hack to what's in this patch).

That said, one seems to exist2. From what I was able to tell3, all current Intel chips with working support for either BMI instruction set also have AVX, and the chips impacted by this bug here do not, so we can just assume any Intel chip without AVX also has no BMI1/BMI2. If this is correct, we can work around this with no downside to any current chip, so even if a more ideal place for a hack like this would be in the OS4, it's worth doing.

This is what's implemented in this patch, which also contains a big comment describing the issue, since it's pretty WTF.

Footnotes

  1. I am pretty sure I know someone with one. This would be great because it would allow me to verify the issue/fix, but the Intel documentation is pretty clear either way.

  2. It's not surprising that Intel wouldn't include this as the suggested workaround, since they might want to keep the ability to ship chips with BMI and not AVX in the future.

  3. Unfortunately, this isn't definitive, since I'm not sure where to find such information concretely, and just went by what I could find on Wikipedia and Agner. That said, it makes sense that this would be the case both now and in the future, and it's safe even if I'm wrong or things change in the future.

  4. And ofc we can't rely on the user's OS or BIOS being up to date, beyond what libstd supports.

@rust-highfive
Copy link

r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

if vendor_id == *b"GenuineIntel" && !value.test(Feature::avx as u32) {
value.unset(Feature::bmi1 as u32);
value.unset(Feature::bmi2 as u32);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you limit this to the specific cpu family that has the bug?

Copy link
Member Author

@thomcc thomcc Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably in theory, I mean there's model/family info in the bits of __cpuid_count(1, 0).eax (and we already have this from earlier in the function)

But it's not clear to me at all which values for these correspond to the impacted chip :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Affected Steppings for this appear to be...

Number Y U - H - S -
- D-1 D-1 K-1 N-0 R-0 R-0 S-0
SKL052 X X X X X X X

So that looks like effectively "all Skylake (non-server)", according to Intel, therefore we can constrain this logic to all Skylake processors. We don't know which of the processors this is actually true for, since it's a "may", so we must apply it to all Skylake "client" chips. The possible values for family, model, etc., are specified on page 21 of this specification update, so we just add another constraint.

New Skylake chips are not being shipped so we can assume this won't change.

@Amanieu Amanieu merged commit cfba59f into rust-lang:master Nov 19, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 20, 2021
Update stdarch

5 commits in 815d55c..cfba59f
2021-11-08 00:58:47 +0000 to 2021-11-19 01:29:04 +0000
- Work-around buggy Intel chips erroneously reporting BMI1/BMI2 support (rust-lang/stdarch#1249)
- complete armv8 instructions (rust-lang/stdarch#1256)
- Fix i8mm feature with bootstrap compiler. (rust-lang/stdarch#1252)
- Fix unused link_name attribute. (rust-lang/stdarch#1251)
- Add remaining insturctions (rust-lang/stdarch#1250)
@thomcc thomcc deleted the intel-fix-yo-chips branch January 13, 2022 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants