Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix test issues on Ruby 3.4 #242

Merged
merged 9 commits into from
Jan 29, 2025
Merged

Fix test issues on Ruby 3.4 #242

merged 9 commits into from
Jan 29, 2025

Conversation

jasonroelofs
Copy link
Collaborator

@jasonroelofs jasonroelofs commented Jan 27, 2025

This PR fixes up the test problems we were seeing under Ruby 3.4. The main issue was the lack of RUBY_INIT_STACK which has existed for a long time but looks like for Ruby 3.4 is now definitely needed. The lack of this call in embed_ruby.cpp was leading to very strange and inconsistent GC-related errors.

@jasonroelofs
Copy link
Collaborator Author

Notes: This is not a new error, the current release also fails in this way. It's also not all tests, and we call rb_gc_start() quite a few times throughout the test suite. Will need to figure out what's different with the calls I commented out.

@cfis
Copy link
Collaborator

cfis commented Jan 27, 2025

Can you split this into 2 different PRs because I think there are 2 different issues. One is an embedding issue, one is a GC issue.

The first is that on 3.4 embedding Ruby fails. See for example:

https://github.com/ruby-rice/rice/actions/runs/12534403540/job/35007888873

In cases where the tests do run (just Fedora for me locally), then a GC bug happens (which sounds like what you are seeing). Pretty sure that is a new bug, and pretty sure that is a Ruby bug. When I looked at it, the problem is that Ruby is not calling the mark function when it should be.

I would not comment out the GC calls because that is just hiding the problem instead of solving it.

@jasonroelofs jasonroelofs changed the title Enable 3.4 and comment out the lines crashing tests WIP: Enable 3.4 and comment out the lines crashing tests Jan 27, 2025
@jasonroelofs jasonroelofs marked this pull request as draft January 27, 2025 21:12
@jasonroelofs
Copy link
Collaborator Author

Sorry should have been clearer about the purpose of this. I don't intend on merging this, I just wanted a place to show code changes that at least get us passed the segfaults, to see if something showed up that may help track down what's actually failing.

@jasonroelofs
Copy link
Collaborator Author

jasonroelofs commented Jan 27, 2025

Well I have a minimal reproducible example:

#include <ruby.h>

int main(int argc, char** argv)
{
  ruby_init();
  ruby_init_loadpath();

  char* options[] = { "-e;" };
  ruby_options(2, options);
}

And a build script:

#!/bin/bash
set -ex

PREFIX=[where ruby is installed]

clang++ -I$PREFIX/include/ruby-3.4.0/arm64-darwin24 -I$PREFIX/include/ruby-3.4.0 -L$PREFIX/lib  -lruby.3.4 test.cpp

@cfis
Copy link
Collaborator

cfis commented Jan 27, 2025

That script causes the crash? The test code includes a call to ruby_sysinit(&argc, &pArgv);

https://github.com/ruby-rice/rice/blob/master/test/embed_ruby.cpp#L14

@jasonroelofs
Copy link
Collaborator Author

jasonroelofs commented Jan 27, 2025

Yeah crashes with or without ruby_sysinit().

-- C level backtrace information -------------------------------------------
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_vm_bugreport+0xb6c) [0x1013572fc]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_bug_for_fatal_signal+0x100) [0x10118f26c]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(sigsegv+0x84) [0x1012b7f94]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x38) [0x18a1f0184]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(callable_method_entry_or_negative+0xc0) [0x101333d78]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_vm_search_method_slowpath+0xc8) [0x1013278b0]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(gccct_method_search_slowpath+0x24) [0x1013427e8]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_funcallv_scope+0x17c) [0x1013385f0]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_funcall+0x88) [0x101338a60]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_obj_as_string+0x4c) [0x1012ca318]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(ruby__sfvextra+0xe0) [0x1012bde54]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(BSD_vfprintf+0x608) [0x1012bbea8]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(ruby_vsprintf0+0xa8) [0x1012bb58c]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_sprintf+0x5c) [0x1012bb704]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(unexpected_type+0x64) [0x1013c5840]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_unexpected_type+0x30) [0x1013c5888]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_fstring+0x1a0) [0x1012c72ac]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(require_internal+0x894) [0x1011fb244]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_require_string_internal+0x58) [0x1011fa610]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_f_require+0x44) [0x1011fa4e0]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(vm_call_cfunc_with_frame_+0xf0) [0x101348ff4]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(vm_exec_core+0x22f4) [0x10132d2b0]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_vm_exec+0x140) [0x101329be0]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(load_iseq_eval+0x22c) [0x1011fd244]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(require_internal+0x974) [0x1011fb324]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_require_string_internal+0x58) [0x1011fa610]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_f_require+0x44) [0x1011fa4e0]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(vm_call_cfunc_with_frame_+0xf0) [0x101348ff4]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(vm_exec_core+0x22f4) [0x10132d2b0]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(rb_vm_exec+0x140) [0x101329be0]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(ruby_opt_init+0x138) [0x1012ae5b4]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(ruby_process_options+0x11c0) [0x1012ad204]
/Users/jasonroelofs/.local/lib/libruby.3.4.dylib(ruby_options+0xb0) [0x101199a48]
/Users/jasonroelofs/tmp/embed/a.out(main+0x60) [0x100bbff34]

@jasonroelofs
Copy link
Collaborator Author

Best I've tracked down so far: something is getting read into a global setting (such as RbConfig::CONFIG) that isn't actually a String but is expected to be so, which then triggers the segfault.

@jasonroelofs
Copy link
Collaborator Author

So I was making zero progress and things were just weird. Then I pulled up Ruby's own main and found a reference to RUBY_INIT_STACK.

https://github.com/ruby/ruby/blob/v3_4_1/include/ruby/internal/interpreter.h#L127-L144

With that call, my test script no longer segfaults. And neither does CI, just a few actual 3.4 related test failures.

@jasonroelofs
Copy link
Collaborator Author

Given the documentation of RUBY_INIT_STACK, I have a feeling that "embedding" and "GC" were actually one and the same issue. There was no well defined end of the stack, so GC would end up in lala land clearing out data as it saw fit. Which would explain why this was so weird to track down.

@cfis
Copy link
Collaborator

cfis commented Jan 28, 2025

Wow - nice work! Funny the new test failures in stl_map.

Looks like Windows is crashing right after the Address_Registration_Guard:register_address test. Will have to track that one down.

@jasonroelofs jasonroelofs marked this pull request as ready for review January 28, 2025 14:28
@jasonroelofs jasonroelofs changed the title WIP: Enable 3.4 and comment out the lines crashing tests Fix test issues on Ruby 3.4 Jan 28, 2025
Aparently Hash#to_s formatting changed with Ruby 3.4 so we need to track
that now.
Error message formatting got rid of the backtick.
`rake test` depends on `rake build` and CI was then triggering the full
rebuild twice.
The current windows-latest runners are unbelievably slow.
@jasonroelofs
Copy link
Collaborator Author

The Windows runners are awful, but tests are passing (Windows apparently still needs the ruby_sysinit call).

@jasonroelofs jasonroelofs merged commit 966d737 into master Jan 29, 2025
11 checks passed
@jasonroelofs jasonroelofs deleted the ruby-3-4-gc-issue branch January 29, 2025 00:46
@cfis
Copy link
Collaborator

cfis commented Jan 29, 2025

Cool. Its also easy for me to test windows locally (both mswin and mingw).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants