Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a global lock to FatalErrorHandler to avoid re-entry #122144

Closed
wants to merge 1 commit into from

Conversation

DianQK
Copy link
Member

@DianQK DianQK commented Mar 7, 2024

Fixes #122089. Multiple threads calling LLVMRustWriteOutputFile may lead to repeated call FatalErrorHandler after encountering errors.

r? saethlin

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 7, 2024
@saethlin
Copy link
Member

saethlin commented Mar 7, 2024

I am not sure this actually fixes the problem. I can't provoke any crashes with a debug build of the write shim, but with a release build I can still get crashes, though they look different. I suspect that the slowness of the backtrace checking is responsible. Building the shim with RUSTFLAGS=-Copt-level=1 cargo build seems sufficient to provoke crashes.

@workingjubilee
Copy link
Member

It is my understanding based on the diagnostics we have so far that it is not actually FatalErrorHandler that is responsible, yes, thus locking it is pointless, it merely "looks sus".

@workingjubilee
Copy link
Member

@DianQK Consider the change I made in #122061

The reason I did this is that we hit a cascade of LLVM ERROR but cannot fully distinguish "LLVM is emitting this error" vs. "our fatal error handler emits the error". There is another "fatal error" that LLVM emits that is only called from the destructor of raw_fd_os_stream. If that is the primary source of this repetition, then we already have multiple places which can be exposed to that destructor.

Honestly, I suspect we simply aren't handling errors properly at all, based on this SO:

https://stackoverflow.com/questions/59911752/are-llvm-fatal-errors-really-fatal

@DianQK
Copy link
Member Author

DianQK commented Mar 8, 2024

I am not sure this actually fixes the problem. I can't provoke any crashes with a debug build of the write shim, but with a release build I can still get crashes, though they look different. I suspect that the slowness of the backtrace checking is responsible. Building the shim with RUSTFLAGS=-Copt-level=1 cargo build seems sufficient to provoke crashes.

In fact, when debugging locally, I can reproduce using a fn main() {}.
Interesting. Not sure why, but I am no longer able to reproduce it with this patch in exa and fn main() {}.

@workingjubilee Actually it can still be reproduced when I comment out install_fatal_error_handler. Maybe it is not the essence of the problem.

The interesting thing is that after turning on sanitizer, I can get the following log:

../llvm-project/build-san/bin/llc -O1 foo.ll -o /dev/full 
LLVM ERROR: IO failure on output stream: No space left on device

=================================================================
==56987==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 10 byte(s) in 1 object(s) allocated from:
    #0 0x5651d68ac378 in strdup (/home/dianqk/llvm/llvm-project/build-san/bin/llc+0x17c378)
    #1 0x7f0eaa631da5 in (anonymous namespace)::FileToRemoveList::FileToRemoveList(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&) /home/dianqk/llvm/llvm-project/llvm/lib/Support/Unix/Signals.inc:107:55
    #2 0x7f0eaa631da5 in (anonymous namespace)::FileToRemoveList::insert(std::atomic<(anonymous namespace)::FileToRemoveList*>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&) /home/dianqk/llvm/llvm-project/llvm/lib/Support/Unix/Signals.inc:122:37
    #3 0x7f0eaa631da5 in llvm::sys::RemoveFileOnSignal(llvm::StringRef, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*) /home/dianqk/llvm/llvm-project/llvm/lib/Support/Unix/Signals.inc:448:3
    #4 0x7f0eaa523af5 in llvm::ToolOutputFile::ToolOutputFile(llvm::StringRef, std::error_code&, llvm::sys::fs::OpenFlags) /home/dianqk/llvm/llvm-project/llvm/lib/Support/ToolOutputFile.cpp:42:7
    #5 0x5651d6904ce9 in std::__detail::_MakeUniq<llvm::ToolOutputFile>::__single_object std::make_unique<llvm::ToolOutputFile, llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>&, std::error_code&, llvm::sys::fs::OpenFlags&>(llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>&, std::error_code&, llvm::sys::fs::OpenFlags&) /nix/store/cmr8qd8w64w8q0cbfc30p98z2pydc1k7-gcc-13.2.0/include/c++/13.2.0/bits/unique_ptr.h:1070:34
    #6 0x5651d6904ce9 in GetOutputStream(char const*, llvm::Triple::OSType, char const*) /home/dianqk/llvm/llvm-project/llvm/tools/llc/llc.cpp:308:16
    #7 0x5651d6904ce9 in compileModule(char**, llvm::LLVMContext&) /home/dianqk/llvm/llvm-project/llvm/tools/llc/llc.cpp:613:7
    #8 0x5651d68ff76f in main /home/dianqk/llvm/llvm-project/llvm/tools/llc/llc.cpp:408:22
    #9 0x7f0ea9a3f0cd in __libc_start_call_main (/nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/libc.so.6+0x280cd) (BuildId: e948d07286607c37a05e967820db9a8138b5ab72)

SUMMARY: AddressSanitizer: 10 byte(s) leaked in 1 allocation(s).

But I'm not sure if it's relevant yet. (I am not sure yet why integrating sanitizer into rustc fails to build, that should be some issue with my development environment.)

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 8, 2024
@Dylan-DPC
Copy link
Member

@DianQK any updates on this? thanks

@DianQK
Copy link
Member Author

DianQK commented May 1, 2024

@DianQK any updates on this? thanks

Ah, I have other unfinished PRs, and this current PR is at the bottom of my priority list to complete. :)

@DianQK DianQK closed this May 1, 2024
@DianQK DianQK deleted the llvm_fatal_error_lock branch August 12, 2024 01:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rustc segfaults if out-of-disk is encountered inside LLVM
5 participants