Panicked during epoch::pin() #1042

Closed
siyuan0322 opened this issue Nov 30, 2023 · 9 comments

Comments

@siyuan0322

I came across a weird panic recently. It seems that guard_count: usize overflowed while calling epoch::pin(), but I'm sure I haven't called epoch::pin() that many times.

I'm wondering whether this has ever happened to you, and whether you could kindly share some insight into this issue.

thread 'reactor 0' panicked at 'called `Option::unwrap()` on a `None` value', /home/graphscope/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-epoch-0.9.15/src/internal.rs:386:57

The relevant code is:

    #[inline]
    pub(crate) fn pin(&self) -> Guard {
        let guard = Guard { local: self };

        let guard_count = self.guard_count.get();
        self.guard_count.set(guard_count.checked_add(1).unwrap());

        if guard_count == 0 {
            // ... (rest of the function not included in the excerpt)

@taiki-e
Member

taiki-e commented Dec 1, 2023

In what environment did you encounter this problem? Do you have an example to reproduce this problem?

> I'm wondering whether this has ever happened to you,

I have never seen this problem, but assuming you are on a 64-bit system, it would take years to overflow the counter by addition alone, so I suspect it is more likely an overflow due to subtraction caused by an incorrect call to unpin. If so, it's either a bug in our code, a bug in the standard library or platform (especially around thread-locals), or a bug in your code; if you're not using unsafe code, it's one of the first two.
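The addition-versus-subtraction reasoning can be illustrated with a small model. This is a minimal sketch, not crossbeam's code: FakeLocal, try_pin, unpin_wrapping, unbalanced_unpin_then_pin and years_to_overflow are hypothetical names, and only the guard_count updates quoted in this issue are mirrored. It assumes a build with overflow checks off (the default for Cargo's release profile), where guard_count - 1 on 0 wraps to usize::MAX instead of panicking; wrapping_sub makes that explicit. One unbalanced unpin is then enough for the very next pin's checked_add(1) to return None, which matches the Option::unwrap() on None panic reported above, with no need for billions of pins.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for crossbeam-epoch's `Local`: only the
/// `guard_count` counter is modeled, nothing else.
struct FakeLocal {
    guard_count: Cell<usize>,
}

impl FakeLocal {
    /// Mirrors the counter update in pin(): read the old count, store old + 1.
    /// Returns `None` where the real code would panic on `unwrap()`.
    fn try_pin(&self) -> Option<usize> {
        let guard_count = self.guard_count.get();
        self.guard_count.set(guard_count.checked_add(1)?);
        Some(guard_count)
    }

    /// Mirrors the counter update in unpin() for a build with overflow
    /// checks disabled, where `guard_count - 1` wraps instead of panicking.
    fn unpin_wrapping(&self) {
        let guard_count = self.guard_count.get();
        self.guard_count.set(guard_count.wrapping_sub(1));
    }
}

/// One pin, then one unpin too many, then another pin.
fn unbalanced_unpin_then_pin() -> Option<usize> {
    let local = FakeLocal { guard_count: Cell::new(0) };
    local.try_pin().expect("first pin cannot overflow"); // 0 -> 1
    local.unpin_wrapping(); // 1 -> 0
    local.unpin_wrapping(); // extra unpin: 0 wraps to usize::MAX
    local.try_pin() // checked_add(1) on usize::MAX -> None
}

/// Years needed to overflow a 64-bit counter purely by incrementing it
/// `pins_per_sec` times per second (the rate is an assumption).
fn years_to_overflow(pins_per_sec: f64) -> f64 {
    const SECS_PER_YEAR: f64 = 365.25 * 24.0 * 3600.0;
    u64::MAX as f64 / pins_per_sec / SECS_PER_YEAR
}
```

At an assumed, generous rate of 10^9 pins per second, years_to_overflow(1e9) comes out at roughly 584 years, which is why overflow by addition alone is implausible on a 64-bit target.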

@taiki-e
Member

taiki-e commented Dec 2, 2023

If you have somehow run into rust-lang/rust#47949 (a rustc bug), I think this problem could be triggered, because the destructor of the guard in repin_after would not be called. I can't immediately think of an example that would trigger it, though.
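Why a skipped destructor matters can be sketched with a hypothetical counter. This assumes, as the comment implies, that repin_after temporarily unpins and relies on a drop guard to re-pin afterward; Counter, RepinOnDrop, repin_after_sketch, count_after_panic and count_after_skipped_drop are illustrative names, not crossbeam's implementation. mem::forget stands in for "the destructor never runs"; the real trigger would be different. It also assumes the default panic = "unwind" strategy, since catch_unwind cannot catch anything under panic = "abort".

```rust
use std::cell::Cell;
use std::mem;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical pin counter; not crossbeam's `Local`.
struct Counter {
    guard_count: Cell<usize>,
}

/// Drop guard that re-pins (increments the count) when it goes out of
/// scope, including while unwinding from a panic inside `f`.
struct RepinOnDrop<'a>(&'a Counter);

impl Drop for RepinOnDrop<'_> {
    fn drop(&mut self) {
        let c = &self.0.guard_count;
        c.set(c.get() + 1);
    }
}

/// "Unpin, run `f`, re-pin" pattern. With `skip_drop == true` the drop
/// guard is leaked to simulate a destructor that is never called.
fn repin_after_sketch<R>(counter: &Counter, skip_drop: bool, f: impl FnOnce() -> R) -> R {
    let c = &counter.guard_count;
    c.set(c.get() - 1); // temporarily unpin
    let repin = RepinOnDrop(counter);
    let result = f();
    if skip_drop {
        mem::forget(repin); // the re-pin never happens
    }
    result // otherwise `repin` is dropped here and restores the count
}

/// Starts with one active guard and panics inside `f`; returns the final count.
fn count_after_panic() -> usize {
    let counter = Counter { guard_count: Cell::new(1) };
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        repin_after_sketch(&counter, false, || -> () { panic!("boom") })
    }));
    assert!(result.is_err());
    counter.guard_count.get() // the drop guard ran during unwinding
}

/// Starts with one active guard, but the re-pin destructor is skipped.
fn count_after_skipped_drop() -> usize {
    let counter = Counter { guard_count: Cell::new(1) };
    repin_after_sketch(&counter, true, || ());
    counter.guard_count.get() // one lower than expected
}
```

count_after_panic() returns 1, while count_after_skipped_drop() returns 0: the outer guard still believes it is pinned, so its own unpin later would subtract from 0.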

@siyuan0322
Author

Thank you very much. I'm currently trying to get a backtrace of the panic. I can't reproduce it with a small example, which makes it hard to debug.

@siyuan0322
Author

siyuan0322 commented Dec 6, 2023

Hi, thanks for your help so far. After closely inspecting the source code, I have two more questions about pin() and unpin():

  • guard_count is a usize, so it is at least 0, and after self.guard_count.set(guard_count.checked_add(1).unwrap()) runs it should be at least 1. Why, then, does pin() check if guard_count == 0 afterward?
    pub(crate) fn pin(&self) -> Guard {
        let guard = Guard { local: self };

        let guard_count = self.guard_count.get();
        self.guard_count.set(guard_count.checked_add(1).unwrap());

        if guard_count == 0 {
  • In unpin(), why is guard_count compared against 1 rather than 0? I see a comparison against 0 afterward, before finalize() is called.
    pub(crate) fn unpin(&self) {
        let guard_count = self.guard_count.get();
        self.guard_count.set(guard_count - 1);

        if guard_count == 1 {
            self.epoch.store(Epoch::starting(), Ordering::Release);

            if self.handle_count.get() == 0 {
                self.finalize();
            }
        }
    }

@siyuan0322
Author

I have now managed to reproduce this several times. Sometimes it panics on the overflow in checked_add(1) in pin(); other times it panics on the subtraction overflow in guard_count - 1 in unpin(), which is called from the Guard's drop().

😢 totally confused.

@longbinlai

@taiki-e Could you please help us with the questions above? Thank you so much.

@taiki-e
Member

taiki-e commented Dec 21, 2023

> • guard_count is a usize, so it is at least 0, and after self.guard_count.set(guard_count.checked_add(1).unwrap()) runs it should be at least 1. Why, then, does pin() check if guard_count == 0 afterward?
> • In unpin(), why is guard_count compared against 1 rather than 0?

Reading the value, storing the incremented or decremented value, and then checking the old value is the same idiom as fetch_add or fetch_sub, which also return the previous value.

Because guard_count holds the value from before the update: in pin(), guard_count == 0 means no guard was pinned before this call; in unpin(), guard_count == 1 means the guard being released was the last one, so unpin() has to check whether finalize() needs to be called.
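A small sketch of that equivalence, using hypothetical helpers cell_fetch_add, cell_fetch_sub and trace (not part of crossbeam): the Cell-based read, update, return-old pattern yields the same sequence of previous values as AtomicUsize::fetch_add and fetch_sub. The first pin sees 0 and the last unpin sees 1, exactly the conditions the code checks. The counter is presumably a plain Cell because each Local is only touched by its own thread, so no atomic read-modify-write is needed; that is an inference, not something stated in this thread.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Cell version of the idiom in pin(): read, store old + 1, return old.
fn cell_fetch_add(c: &Cell<usize>) -> usize {
    let old = c.get();
    c.set(old.checked_add(1).unwrap());
    old
}

/// Cell version of the idiom in unpin(): read, store old - 1, return old.
fn cell_fetch_sub(c: &Cell<usize>) -> usize {
    let old = c.get();
    c.set(old - 1);
    old
}

/// Pins twice and unpins twice, recording the *old* value each call saw,
/// for both the Cell idiom and AtomicUsize::fetch_add / fetch_sub.
fn trace() -> (Vec<usize>, Vec<usize>) {
    let cell = Cell::new(0);
    let atomic = AtomicUsize::new(0);
    let mut from_cell = Vec::new();
    let mut from_atomic = Vec::new();
    for _ in 0..2 {
        from_cell.push(cell_fetch_add(&cell));
        from_atomic.push(atomic.fetch_add(1, Ordering::Relaxed));
    }
    for _ in 0..2 {
        from_cell.push(cell_fetch_sub(&cell));
        from_atomic.push(atomic.fetch_sub(1, Ordering::Relaxed));
    }
    (from_cell, from_atomic)
}
```

Both vectors come out as [0, 1, 2, 1]: the first entry (0) is the "first pin" case and the last entry (1) is the "last unpin" case.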

> I see a comparison against 0 afterward, before finalize() is called.

It checks handle_count, not guard_count. (If there is a live handle, finalize cannot be called.)
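The two conditions in unpin() can be modeled together. This is a hypothetical sketch (Model and finalized_after_each_unpin are illustrative names), not crossbeam's Local: it omits the Epoch::starting() store and only records whether finalize() would be reached, using the same old-value check on guard_count and the same handle_count == 0 check shown in the quoted unpin().

```rust
use std::cell::Cell;

/// Hypothetical model of the two counters involved in unpin(); not
/// crossbeam's `Local`. `finalized` records whether finalize() would run.
struct Model {
    guard_count: Cell<usize>,
    handle_count: Cell<usize>,
    finalized: Cell<bool>,
}

impl Model {
    fn pin(&self) {
        let guard_count = self.guard_count.get();
        self.guard_count.set(guard_count.checked_add(1).unwrap());
    }

    /// Same shape as unpin(): the old guard_count decides whether this was
    /// the last guard, and handle_count decides whether finalize may run.
    fn unpin(&self) {
        let guard_count = self.guard_count.get();
        self.guard_count.set(guard_count - 1);
        if guard_count == 1 && self.handle_count.get() == 0 {
            self.finalized.set(true);
        }
    }
}

/// Records `finalized` after each unpin in a short scenario.
fn finalized_after_each_unpin() -> Vec<bool> {
    let m = Model {
        guard_count: Cell::new(0),
        handle_count: Cell::new(1),
        finalized: Cell::new(false),
    };
    let mut log = Vec::new();

    // A live handle exists: dropping the last guard must not finalize.
    m.pin();
    m.unpin();
    log.push(m.finalized.get());

    // Handle released (set directly here): with nested guards, only the
    // unpin that takes the count from 1 to 0 finalizes.
    m.handle_count.set(0);
    m.pin();
    m.pin();
    m.unpin();
    log.push(m.finalized.get());
    m.unpin();
    log.push(m.finalized.get());

    log
}
```

The log is [false, false, true]: a live handle blocks finalization, and with nested guards only the outermost unpin reaches finalize().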

@taiki-e
Member

taiki-e commented Dec 21, 2023

> I have now managed to reproduce this

Could you share this reproduction?

If that is difficult, please at least give us information about your build environment, build configuration, other dependencies, and any unsafe code you may have.

TBH I feel it is impossible to help you with just the information currently provided.

@longbinlai

Indeed, we are diligently analyzing our code but have yet to uncover significant findings. Concurrently, we're delving into the intricacies of Crossbeam's code to gain a deeper understanding. During this process, we may occasionally seek your assistance with questions like those previously mentioned. We genuinely appreciate your support and guidance in these matters.
