
Undefined behaviour in multithreaded code #7881

Closed
aleksuss opened this issue Jan 19, 2021 · 8 comments
@aleksuss
Contributor

aleksuss commented Jan 19, 2021

I decided to bump the rocksdb version in rust-rocksdb up to 6.13.3, but hit undefined behaviour while the Rust tests were executing. I figured out that the cause is the multithreaded execution of the Rust tests: every test creates its own instance of the database. The current release of the rust-rocksdb crate uses rocksdb 6.11.4, and with that version the behaviour does not reproduce. So I wrote C++ code that reproduces the behaviour; see below.

Expected behaviour

Creating an instance of the database per thread without errors.

Actual behaviour

libc++abi.dylib: terminating
Process 86130 stopped
* thread #2, stop reason = signal SIGABRT
    frame #0: 0x00007fff20345462 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff20345462 <+10>: jae    0x7fff2034546c            ; <+20>
    0x7fff20345464 <+12>: movq   %rax, %rdi
    0x7fff20345467 <+15>: jmp    0x7fff2033f6a1            ; cerror_nocancel
    0x7fff2034546c <+20>: retq   
Target 0: (cpp-playground) stopped.
(lldb) bt
* thread #2, stop reason = signal SIGABRT
  * frame #0: 0x00007fff20345462 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff20373610 libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007fff202c6720 libsystem_c.dylib`abort + 120
    frame #3: 0x00007fff20338418 libc++abi.dylib`abort_message + 231
    frame #4: 0x00007fff203299a3 libc++abi.dylib`demangling_terminate_handler() + 48
    frame #5: 0x00007fff20337847 libc++abi.dylib`std::__terminate(void (*)()) + 8
    frame #6: 0x00007fff203377f8 libc++abi.dylib`std::terminate() + 56
    frame #7: 0x00007fff203117ed libc++.1.dylib`std::__1::thread::~thread() + 17
    frame #8: 0x000000010026d741 librocksdb.6.dylib`std::__1::unique_ptr<std::__1::thread, std::__1::default_delete<std::__1::thread> >::reset(std::__1::thread*) + 25
    frame #9: 0x000000010026ca82 librocksdb.6.dylib`rocksdb::Timer::Start() + 116
    frame #10: 0x000000010026c66b librocksdb.6.dylib`rocksdb::PeriodicWorkScheduler::Register(rocksdb::DBImpl*, unsigned int, unsigned int) + 53
    frame #11: 0x0000000100220be3 librocksdb.6.dylib`rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<rocksdb::ColumnFamilyDescriptor, std::__1::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::__1::vector<rocksdb::ColumnFamilyHandle*, std::__1::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool) + 4207
    frame #12: 0x000000010021fb66 librocksdb.6.dylib`rocksdb::DB::Open(rocksdb::DBOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<rocksdb::ColumnFamilyDescriptor, std::__1::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::__1::vector<rocksdb::ColumnFamilyHandle*, std::__1::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**) + 18
    frame #13: 0x00000001000026f8 cpp-playground`create_db(num=0) at main.cpp:32:19
    frame #14: 0x000000010000ea41 cpp-playground`decltype(__f=(0x0000000100c1b9d8), __args=0x0000000100c1b9e0)(int)>(fp)(std::__1::forward<int>(fp0))) std::__1::__invoke<void (*)(int), int>(void (*&&)(int), int&&) at type_traits:3545:1
    frame #15: 0x000000010000e98e cpp-playground`void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(int), int, 2ul>(__t=size=3, (null)=__tuple_indices<2> @ 0x000070000afecf58)(int), int>&, std::__1::__tuple_indices<2ul>) at thread:273:5
    frame #16: 0x000000010000e136 cpp-playground`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(int), int> >(__vp=0x0000000100c1b9d0) at thread:284:5
    frame #17: 0x00007fff20373950 libsystem_pthread.dylib`_pthread_start + 224
    frame #18: 0x00007fff2036f47b libsystem_pthread.dylib`thread_start + 15

Steps to reproduce the behavior

#include <rocksdb/db.h>

#include <cassert>
#include <filesystem>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

using namespace rocksdb;
namespace fs = std::filesystem;

static void create_db(int num) {
    std::string path =
        fs::temp_directory_path().generic_string() + std::to_string(num);
    DB* db = nullptr;
    DBOptions options;
    std::vector<ColumnFamilyDescriptor> cf_descriptors;
    std::vector<ColumnFamilyHandle*> cf_handlers;

    cf_descriptors.emplace_back("default", ColumnFamilyOptions());
    cf_descriptors.emplace_back("cf1", ColumnFamilyOptions());
    options.create_if_missing = true;
    options.create_missing_column_families = true;

    auto status = DB::Open(options, path, cf_descriptors, &cf_handlers, &db);
    std::cout << status.ToString() << std::endl;
    assert(status.ok());
    assert(cf_handlers.size() == 2);

    delete db;
    DestroyDB(path, options, cf_descriptors);
}

int main() {
    std::vector<std::thread> threads;
    threads.reserve(10);

    for (int i = 0; i < 10; ++i) {
        threads.emplace_back(create_db, i);
    }

    for (auto& thread : threads) {
        thread.join();
    }

    return 0;
}
@aleksuss
Contributor Author

aleksuss commented Jan 19, 2021

By the way, if I remove the `delete db;` line, everything works as expected 🤔 But the question is: would there be a memory leak if I remove `delete db;`, or is that line redundant?

@ajkr
Contributor

ajkr commented Jan 19, 2021

I built the provided repro against a recent master with assertions enabled; the error then is:

test: db/column_family.cc:1466: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref' failed.

After adding the following code above delete db;, running the repro does not produce any error on my machine:

for (auto* cfh : cf_handlers) {
  db->DestroyColumnFamilyHandle(cfh);
}

Did this repro code produce the same stack trace for you as the original report (failing in rocksdb::PeriodicWorkScheduler::Register())?

@aleksuss
Contributor Author

After adding:

for (auto* cfh : cf_handlers) {
  db->DestroyColumnFamilyHandle(cfh);
}

the stack trace hasn't changed:

libc++abi.dylib: terminating
Process 11187 stopped
* thread #8, stop reason = signal SIGABRT
    frame #0: 0x00007fff20345462 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff20345462 <+10>: jae    0x7fff2034546c            ; <+20>
    0x7fff20345464 <+12>: movq   %rax, %rdi
    0x7fff20345467 <+15>: jmp    0x7fff2033f6a1            ; cerror_nocancel
    0x7fff2034546c <+20>: retq   
Target 0: (cpp-playground) stopped.
(lldb) bt
* thread #8, stop reason = signal SIGABRT
  * frame #0: 0x00007fff20345462 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff20373610 libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007fff202c6720 libsystem_c.dylib`abort + 120
    frame #3: 0x00007fff20338418 libc++abi.dylib`abort_message + 231
    frame #4: 0x00007fff203299a3 libc++abi.dylib`demangling_terminate_handler() + 48
    frame #5: 0x00007fff20337847 libc++abi.dylib`std::__terminate(void (*)()) + 8
    frame #6: 0x00007fff203377f8 libc++abi.dylib`std::terminate() + 56
    frame #7: 0x00007fff203117ed libc++.1.dylib`std::__1::thread::~thread() + 17
    frame #8: 0x0000000100261741 librocksdb.6.dylib`std::__1::unique_ptr<std::__1::thread, std::__1::default_delete<std::__1::thread> >::reset(std::__1::thread*) + 25
    frame #9: 0x0000000100260a82 librocksdb.6.dylib`rocksdb::Timer::Start() + 116
    frame #10: 0x000000010026066b librocksdb.6.dylib`rocksdb::PeriodicWorkScheduler::Register(rocksdb::DBImpl*, unsigned int, unsigned int) + 53
    frame #11: 0x0000000100214be3 librocksdb.6.dylib`rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<rocksdb::ColumnFamilyDescriptor, std::__1::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::__1::vector<rocksdb::ColumnFamilyHandle*, std::__1::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool) + 4207
    frame #12: 0x0000000100213b66 librocksdb.6.dylib`rocksdb::DB::Open(rocksdb::DBOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<rocksdb::ColumnFamilyDescriptor, std::__1::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::__1::vector<rocksdb::ColumnFamilyHandle*, std::__1::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**) + 18
    frame #13: 0x0000000100005758 cpp-playground`create_db(num=6) at main.cpp:29:19
    frame #14: 0x000000010000f001 cpp-playground`decltype(__f=(0x0000000100d163e8), __args=0x0000000100d163f0)(int)>(fp)(std::__1::forward<int>(fp0))) std::__1::__invoke<void (*)(int), int>(void (*&&)(int), int&&) at type_traits:3545:1
    frame #15: 0x000000010000ef4e cpp-playground`void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(int), int, 2ul>(__t=size=3, (null)=__tuple_indices<2> @ 0x000070000b490f58)(int), int>&, std::__1::__tuple_indices<2ul>) at thread:273:5
    frame #16: 0x000000010000e6f6 cpp-playground`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(int), int> >(__vp=0x0000000100d163e0) at thread:284:5
    frame #17: 0x00007fff20373950 libsystem_pthread.dylib`_pthread_start + 224
    frame #18: 0x00007fff2036f47b libsystem_pthread.dylib`thread_start + 15

But I didn't try adding the ROCKSDB_ASSERT_STATUS_CHECKED define.

By the way, I'm testing this on macOS Big Sur 11.1.

@jay-zhuang
Contributor

Seems like the same issue as #7711, which happens when closing the last CF and re-opening.

ajkr added a commit to ajkr/rocksdb that referenced this issue Jan 21, 2021
This provides a workaround for two race conditions that will be fixed in
a more sophisticated way later. This PR:

(1) Makes the client serialize calls to `Timer::Start()` and `Timer::Shutdown()`. The long-term fix will be to make those functions thread-safe.
(2) Makes `PeriodicWorkScheduler` atomically add/cancel work together with starting/shutting down its timer. The long-term fix will be for `Timer` API to offer more specialized APIs so the client will not need to synchronize.

Test Plan: ran the repro provided in facebook#7881
@ajkr
Contributor

ajkr commented Jan 21, 2021

Thanks for the extra info. Indeed, your repro worked on my machine once I ran it in a loop (should've tried that earlier :/ ). Anyway, we can do a workaround for now (#7888), as the long-term fix appears tricky.

Will you be able to upgrade to the latest minor version (6.15.x), or will you need the fix backported farther to the 6.13 release series?

@aleksuss
Contributor Author

aleksuss commented Jan 21, 2021

The latest minor version of 6.15.x with the fix would be great. It seems that #7888 really does fix the issue. Thanks. I'll be waiting for a new tag with the fix 👍🏻

facebook-github-bot pushed a commit that referenced this issue Jan 21, 2021
#7888)

Summary:
This provides a workaround for two race conditions that will be fixed in
a more sophisticated way later. This PR:

(1) Makes the client serialize calls to `Timer::Start()` and `Timer::Shutdown()` (see #7711). The long-term fix will be to make those functions thread-safe.
(2) Makes `PeriodicWorkScheduler` atomically add/cancel work together with starting/shutting down its `Timer`. The long-term fix will be for `Timer` API to offer more specialized APIs so the client will not need to synchronize.

Pull Request resolved: #7888

Test Plan: ran the repro provided in #7881

Reviewed By: jay-zhuang

Differential Revision: D25990891

Pulled By: ajkr

fbshipit-source-id: a97fdaebbda6d7db7ddb1b146738b68c16c5be38
ajkr added a commit that referenced this issue Jan 21, 2021
ajkr added a commit that referenced this issue Jan 21, 2021
@ajkr
Contributor

ajkr commented Jan 21, 2021

Sure, here is the 6.15.4 tag with the fix: https://github.com/facebook/rocksdb/releases/tag/v6.15.4

ltamasi pushed a commit that referenced this issue Jan 21, 2021
@aleksuss
Contributor Author

@ajkr thanks !

codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this issue Mar 5, 2021
Layamon pushed a commit to bytedance/terarkdb that referenced this issue Mar 30, 2021
mm304321141 pushed a commit to bytedance/terarkdb that referenced this issue Mar 31, 2021
Yuanliang-Wang pushed a commit to Yuanliang-Wang/terarkdb4zns that referenced this issue Apr 7, 2021
bijanoviedo pushed a commit to bijanoviedo/rocksdb that referenced this issue May 12, 2021
benhannel pushed a commit to rockset/rocksdb-cloud that referenced this issue May 20, 2021
benhannel pushed a commit to rockset/rocksdb-cloud that referenced this issue May 21, 2021