Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address stacked borrows violation in SymbolTable::intern #236

Merged
merged 5 commits into from
Jul 26, 2023

Conversation

lopopolo
Copy link
Member

@lopopolo lopopolo commented Jul 26, 2023

This stacked borrows violation impacts all SymbolTable variants in this crate and has been present in all published versions.

The SymbolTable::intern function previously worked like this:

  1. Convert the given contents to be interned into a Cow<'static, T>.
  2. Check to see if this content is already interned by checking the internal bimap. If found, return the matching symbol.
  3. Convert the Cow into an Interned which is an enum of either a &'static T or a Box<T>.
  4. Borrow a &'static T from the Interned.
  5. Push the Interned into the vec.
  6. Add the &'static T to the map.
  7. Check that the insertion happened correctly with some debug assertions.
  8. Return the computed symbol.

As of at least rustc 1.73.0-nightly (31395ec38 2023-07-24), Miri pops all borrows from the Interned at step 5 when the Interned is moved into the vec, which means the borrow made at 4 is untracked when Miri executes the debug assertions.

The fix here is to make the assignment/move of the Interned happen first so Miri doesn't retag the inner Box<T>. This is accomplished by reordering the self.vec.push operation to immediately follow step 3. An additional unchecked index into the vec is used to get a shared access to that Interned object to later derive the &'static T.

This commit also adds an additional test to resolve a slice from the symbol table to ensure Miri does not disagree as the Interned is moved during growth of the SymbolTable.

Fixes #235.

@lopopolo lopopolo added A-security Area: Security vulnerabilities and unsoundness issues. I-bytes Interner: Byte string SymbolTable. C-bug Category: This is a bug. I-str Interner: UTF-8 string SymbolTable. I-cstr Interner: C string SymbolTable. A-interner Area: String interners, data structures, and types. I-path Interner: Path string SymbolTable. I-platform Interner: OsStr (platform) string SymbolTable. labels Jul 26, 2023
@lopopolo
Copy link
Member Author

@CAD97 sorry for the out of the blue ping, but you've contributed here in the past and if you could lend a hand, I would appreciate a review on this fix for a stacked borrows violation in the intaglio string interner.

@lopopolo lopopolo force-pushed the lopopolo/intern-stacked-borrows-miri-gh-235 branch 2 times, most recently from 2193b3b to 569b01b Compare July 26, 2023 06:09
@lopopolo
Copy link
Member Author

marking this as blocked since I would like to get this reviewed by a few eyes before merging.

@lopopolo lopopolo added S-do-not-merge Status: This pull request should not be merged. S-blocked Status: Marked as blocked ❌ on something else such as other implementation work. labels Jul 26, 2023
@CAD97
Copy link
Contributor

CAD97 commented Jul 26, 2023

Speaking for myself, not as part of T-opsem.

If the box gets moved when the symbol table grows, that's still a potential problem. It's probably not one which will actually get hit, but that's because of implementation details of the collection (i.e. that it uses realloc and/or untyped copies to move stuff around, instead of typed moves).

The more thorough fix would be to replace the use of Box with a PinBox style type, basically just wrapping a NonNull<T> to call Box::from_raw again when it gets dropped.

Make sure to test with -Zmiri-retag-fields, as that can expose more issues along this line due to retagging when moving owned boxes more often, and is going to be turned on as the default in the nearish future. Or actually this may have shown up as an issue because of it being turned on; I'm not exactly certain since I'm just looking at this patch.

That said, if you're okay with relying on the implementation of the symbol table (probably Vec? I'm just looking at the patch and don't recall if it's that or doing something more clever) only using untyped copies when resizing, this is an accurate targeted fix for the problematic retag on insert, and the description provided is accurate.

Note that as a slight mitigation, despite the Rust-level UB, the fact that interned symbols are immutable means that rustc has not yet exploited this UB, and thus the lowering to LLVM IR does not contain any UB. (This may change in the future.)

@lopopolo
Copy link
Member Author

Thanks @CAD97! Responses to your comments inline.

If the box gets moved when the symbol table grows, that's still a potential problem. It's probably not one which will actually get hit, but that's because of implementation details of the collection (i.e. that it uses realloc and/or untyped copies to move stuff around, instead of typed moves).

I suspected this to be the case which is why I added this line to all of the tests that run under Miri. I hope this will prevent regressions if the underlying collection, which is Vec, changes its copying behavior.

assert_eq!(table.get(Symbol::new(0)).unwrap().len(), 100);

The more thorough fix would be to replace the use of Box with a PinBox style type, basically just wrapping a NonNull<T> to call Box::from_raw again when it gets dropped.

This seems like a reasonable change to make as well as one that's not too disruptive to the Slice abstraction, since it goes through a pointer anyway to create the static reference. As an aside, it looks like you contributed the current representation for the Slice enum in #28.

Make sure to test with -Zmiri-retag-fields, as that can expose more issues along this line due to retagging when moving owned boxes more often, and is going to be turned on as the default in the nearish future.

Yup it looks like this being made the default in nightly is the reason I got the build failure on the weekly CI that runs on a cron against HEAD.

Do you have any suggestions for additional Miri flags I should add in CI? The current set of MIRIFLAGS is:

MIRIFLAGS: "-Zmiri-strict-provenance -Zmiri-symbolic-alignment-check -Zrandomize-layout"

That said, if you're okay with relying on the implementation of the symbol table (probably Vec? I'm just looking at the patch and don't recall if it's that or doing something more clever) only using untyped copies when resizing, this is an accurate targeted fix for the problematic retag on insert, and the description provided is accurate.

Note that as a slight mitigation, despite the Rust-level UB, the fact that interned symbols are immutable means that rustc has not yet exploited this UB, and thus the lowering to LLVM IR does not contain any UB. (This may change in the future.)

This is correct, the symbol tables store the Slice objects in a Vec. Thanks for the confirmation that this fix is appropriate and I've properly diagnosed the issue Miri identified.

@lopopolo
Copy link
Member Author

lopopolo commented Jul 26, 2023

@CAD97 I've added a PinBox type derived from yours in bfbea04. Since this is largely your code, I'd appreciate it if you could take one last look and could bless its use here in intaglio. Especially since it appears the simple-interner repository it comes from does not have a license associated with it.

lopopolo added 2 commits July 26, 2023 12:12
This stacked borrows violation impacts all `SymbolTable` variants in
this crate and has been present in all published versions.

The `SymbolTable::intern` function previously worked like this:

1. Convert the given contents to be interned into a `Cow<'static, T>`.
2. Check to see if this content is already interned by checking the
  internal bimap. If found, return the matching symbol.
3. Convert the `Cow` into an `Interned` which is an enum of either a
  &'static T` or a `Box<T>`.
4. Borrow a `&'static T` from the `Interned`.
5. Push the `Interned` into the vec.
6. Add the `&'static T` to the map.
7. Check that the insertion happened correctly with some debug
  assertions.
8. Return the computed symbol.

As of at least rustc 1.73.0-nightly (31395ec38 2023-07-24), Miri pops
all borrows from the `Interned` at step 5 when the `Interned` is moved
into the vec, which means the borrow made at 4 is untracked when Miri
executes the debug assertions.

The fix here is to make the assignment/move of the `Interned` happen
first so Miri doesn't retag the inner `Box<T>`. This is accomplished by
reordering the `self.vec.push` operation to immediately follow step 3.
An additional unchecked index into the vec is used to get a shared
access to that `Interned` object to later derive the `&'static T`.

This commit also adds an additional test to resolve a slice from the
symbol table to ensure Miri does not disagree as the `Interned` is moved
during growth of the `SymbolTable`.

Fixes #235.
@lopopolo lopopolo removed S-do-not-merge Status: This pull request should not be merged. S-blocked Status: Marked as blocked ❌ on something else such as other implementation work. labels Jul 26, 2023
@lopopolo
Copy link
Member Author

not as part of T-opsem

TIL: https://rust-lang.github.io/rfcs/3346-t-opsem.html

Create an operational semantics team that is tasked with owning the semantics of unsafe code. This responsibility would be transferred from T-types, which had previously been given ownership of this domain. Additionally, this team replaces the Unsafe Code Guidelines working group, which has been doing much of the work in this space.

Thanks again for your help @CAD97!

lopopolo added a commit that referenced this pull request Jul 26, 2023
See #235, #236.
See #236 (comment).

Introduce a new `PinBox` type which stores a boxed slice as a `NonNull`
pointer. The pointer is converted back to a `Box` on drop.
@lopopolo lopopolo force-pushed the lopopolo/intern-stacked-borrows-miri-gh-235 branch from bfbea04 to 71c9493 Compare July 26, 2023 19:23
See #235, #236.
See #236 (comment).

Introduce a new `PinBox` type which stores a boxed slice as a `NonNull`
pointer. The pointer is converted back to a `Box` on drop.
@lopopolo lopopolo force-pushed the lopopolo/intern-stacked-borrows-miri-gh-235 branch from 71c9493 to c0d1eb6 Compare July 26, 2023 19:25
@CAD97
Copy link
Contributor

CAD97 commented Jul 26, 2023

simple-interner license

Bad habit of not bothering to add an actual LICENSE file; the Cargo.toml records the SPDX license identifier as MIT OR Apache-2.0. However, I also consider PinBox trivial and noncopyrightable, thus hereby officially place its implementation in the public domain to the full extent possible.

Everything looks good to me. Theoretically the change to insertion ordering isn't needed once using PinBox, but it doesn't hurt either.

@lopopolo lopopolo merged commit e2820d2 into trunk Jul 26, 2023
@lopopolo lopopolo deleted the lopopolo/intern-stacked-borrows-miri-gh-235 branch July 26, 2023 21:00
@lopopolo
Copy link
Member Author

lopopolo commented Jul 26, 2023

Now that this is released in v1.9.0, I've filled a RustSec advisory here:

I do not intend to file for a GitHub Security Advisory (GHSA) or a CVE.

@lopopolo
Copy link
Member Author

Make sure to test with -Zmiri-retag-fields, as that can expose more issues along this line due to retagging when moving owned boxes more often, and is going to be turned on as the default in the nearish future. Or actually this may have shown up as an issue because of it being turned on; I'm not exactly certain since I'm just looking at this patch.

it looks like yup! merged 5 days ago, which means the cron running the weekly Miri CI would have hit it on Tuesday.

@lopopolo
Copy link
Member Author

I missed the Send and Sync impls on PinBox in this PR which led to the Send and Sync auto traits being dropped from the symbol table types in v1.9.0. I just yanked v1.9.0 and re-released this fix as v1.9.1 with an additional bit of code to preserve the Send and Sync bounds on PinBox in #240.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-interner Area: String interners, data structures, and types. A-security Area: Security vulnerabilities and unsoundness issues. C-bug Category: This is a bug. I-bytes Interner: Byte string SymbolTable. I-cstr Interner: C string SymbolTable. I-path Interner: Path string SymbolTable. I-platform Interner: OsStr (platform) string SymbolTable. I-str Interner: UTF-8 string SymbolTable.
Development

Successfully merging this pull request may close these issues.

Miri failure on rustc 1.73.0-nightly (31395ec38 2023-07-24)
2 participants