Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pretty thread names based on best-effort parsing of Thread#inspect output #54

Closed
wants to merge 3 commits into from

Conversation

dalehamel
Copy link
Contributor

@dalehamel dalehamel commented Feb 23, 2024

Problem

It looks like nothing sets the thread name in the output anymore, so the tracks in firefox-profiler will not show any name:

Screenshot 2024-02-23 at 1 24 39 PM

Checking out the json file that is going into this, the thread name is "". If no name was ever set, this is expected. However if a name was set, it is still not making it to the gecko output.

I originally suspected #46, but this is happening for me (on ruby 3.3.0) even with older gems (0.3.0, 0.3.1) that tried to get it from the pthread. In any case, the code to get this from the native extension is currently commented out and nothing seems to set it anywhere.

Proposed solution

As @casperisfine has pointed out, calling "inspect" on a Thread object gives us a lot of useful information, including the thread name if it is set. So, even though we cannot directly get the name from the native extension, we should be able to inspect the thread objects we already have access to.

However, we need to be careful about this as calling rb_inspect requires the GVL is acquired. Within the native extension then, after our callback is fired for RUBY_INTERNAL_THREAD_EVENT_RESUMED, we should be able to safely call it. We will just directly store the result as the thread's "name" if we are able to do so.

If we were never able to set this within the native code before we attempt to get the gecko output, we'll try looking up the thread via its object id and calling inspect from ruby land. Either way, the output will be fed into a pretty_name function to try and display the name as nicely as possible:

  • Prefer to have the thread name as the left-most value, followed by the path if specified
  • Trim the gem directories from the paths, if they were set, so they won't be quite as long

With this proposed change, here is what the output looks like:

Screenshot 2024-02-26 at 3 36 41 PM

if tr && tr.name && !tr.name.empty?
@name = tr.name
else
@name = "UNNAMED [#{@ruby_thread_id}]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@name = "UNNAMED [#{@ruby_thread_id}]"
@name = tr.inspect

Thread#inspect contains the location at which it was spawed, which is generally enough to figure out what it is:

"#<Thread:0x00000001459196f0@/tmp/foo.rb:1 run>"

You could even massage this to only extract the location with a split('@')

Copy link
Contributor Author

@dalehamel dalehamel Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperisfine love it - i'll give this a go on monday and update the test. Much better context, way more useful. Will also encourage people to possibly set the name of the thread, as they'll know where it originated 👍

Note to self, format originates here https://docs.ruby-lang.org/en/master/Thread.html#method-i-to_s

   str = rb_sprintf("#<%"PRIsVALUE":%p", cname, (void *)thread);
    if (!NIL_P(target_th->name)) {
        rb_str_catf(str, "@%"PRIsVALUE, target_th->name);
    }
    if ((loc = threadptr_invoke_proc_location(target_th)) != Qnil) {
        rb_str_catf(str, " %"PRIsVALUE":%"PRIsVALUE,
                    RARRAY_AREF(loc, 0), RARRAY_AREF(loc, 1));
    }
    rb_str_catf(str, " %s>", status);

@@ -165,6 +165,15 @@ def initialize(ruby_thread_id, profile, categorizer, name:, tid:, samples:, weig
@tid = tid
@name = name

if @name == ""
tr = ObjectSpace._id2ref(@ruby_thread_id)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's probably a better way to do this from within the C extension so that we can capture thread names before they GC, but this seems like an okay fallback.

One problem though is _id2ref will raise if the thread is GC'd (they should be collectible when stopped even with Vernier running). Could we make sure that is handled properly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's probably a better way to do this from within the C extension so that we can capture thread names before they GC

Yeah, we could just record rb_inspect(thread), I'm unfamiliar with Vernier internals though, so not sure where it would be best to call this.

The obvious choice would be in the callback for the thread start, but the thread name is assigned after it's created, so might be a bit weird.

So perhaps it should be collected on thread exit callback, and then for all other still alive threads at the end of the profile.

Copy link
Contributor Author

@dalehamel dalehamel Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem though is _id2ref will raise if the thread is GC'd (they should be collectible when stopped even with Vernier running). Could we make sure that is handled properly?

Added a rescue for RangeError to cover this case

Yeah, we could just record rb_inspect(thread), I'm unfamiliar with Vernier internals though, so not sure where it would be best to call this.

I decided to take a crack at this, basically set the thread name the first time it gets to run, and theRESUME callback gets fired (since we should have the GVL).

If we never got the inspect output in the native code, it falls back to a best-effort attempt to grab it from ruby land.

Either way, whatever we got from inspect either in native or ruby call is then parsed to a prettier value in ruby during the output generation.

@dalehamel dalehamel force-pushed the name-threads-in-output branch from 583db15 to 2e2e9fc Compare February 26, 2024 17:24
tr = ObjectSpace._id2ref(@ruby_thread_id)
name = tr.inspect if tr
end
return name unless name.start_with?("#<Thread")
Copy link
Contributor Author

@dalehamel dalehamel Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case to cover "retained memory" name set for :retained mode, or if we ever got another name somehow. Also if we couldn't get any inspect output, no sense in doing the parsing below.

@@ -385,6 +385,20 @@ def string_table

private

def pretty_name(name)
if name.empty?
tr = ObjectSpace._id2ref(@ruby_thread_id)
Copy link
Contributor Author

@dalehamel dalehamel Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we somehow didn't get the inspect value in the native extension (ie, the resume hook never ran?), fallback to doing it in ruby on a "best effort" basis, based on the object id.

return name unless name.start_with?("#<Thread")
pretty = []
obj_address = name[/Thread:(0x\d+)/,1]
best_id = name[/\#<Thread:0x\w+@?\s?(.*)\s+\S+>/,1]
Copy link
Contributor Author

@dalehamel dalehamel Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i originally wanted to do this just splitting on whitespace, but realized that this could get complicated very fast:

  • the thread name itself can contain spaces
  • the file path can contain spaces
  • both of these are optional

So instead, I try to just anchor on what we know will be present at the start (#<Thread:ADDRESS), and end ( STATE>)

If the thread was named, then the match for @? should catch this. Otherwise, try to just capture everything between the start and end anchors.

@dalehamel dalehamel marked this pull request as draft February 26, 2024 17:31
@dalehamel
Copy link
Contributor Author

Set this back to draft while I update the description and address the great comments so far.

@dalehamel dalehamel force-pushed the name-threads-in-output branch 2 times, most recently from d23f585 to d22290f Compare February 26, 2024 20:54
@dalehamel dalehamel changed the title Fetch thread names from ruby when generating output Pretty thread names based on best-effort parsing of Thread#inspect output Feb 26, 2024
@dalehamel dalehamel marked this pull request as ready for review February 26, 2024 21:02
@dalehamel
Copy link
Contributor Author

Ok, i think this is ready for another look.

I think I've got a way to safely get the thread name from native code via rb_inspect. I can remove that native code if it is contentious, as ultimately the same value is attempted to be grabbed in ruby land and is parsed in the same way. The native code just covers getting us a name in the case where the thread was GC'd by the time we got to ruby.

I made the tests more explicit to cover the different cases of the inspect output having a name, file path and line number, etc.

pretty = []
obj_address = name[/Thread:(0x\w+)/,1]
best_id = name[/\#<Thread:0x\w+@?\s?(.*)\s+\S+>/,1] || ""
Gem.path.each { |gem_dir| best_id = best_id.gsub(gem_dir, "...") }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not totally necessary I guess, but otherwise the paths can end up being pretty huge and we probably only care about the portion of the path after the gem actually starts.

Comment on lines 942 to 948
if (thread.name == "") {
// rb_inspect should be safe here, as RUNNING should correspond to RESUMED hook from internal_thread_event_cb
// which is called with GVL per https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/thread.h#L247-L248
VALUE thread_str = rb_inspect(th);
if (RTEST(thread_str))
thread.name = StringValueCStr(thread_str);
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is safe here as both of these can raise and we're not ready for that (and holding onto some locks). I'll remove this before merging.

Copy link
Contributor Author

@dalehamel dalehamel Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhawthorn ah my bad, makes sense though. Thanks!

Just pushed a commit to delete it to save you the trouble.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good! It's helpful actually as I was able to use this for a regression test 😊

@dalehamel dalehamel force-pushed the name-threads-in-output branch from d22290f to 8d58a16 Compare February 27, 2024 22:38
@jhawthorn
Copy link
Owner

Merged. Thank you!

@jhawthorn jhawthorn closed this Feb 27, 2024
@jhawthorn
Copy link
Owner

@dalehamel released as part of 0.5.0

@dalehamel
Copy link
Contributor Author

cheers, thanks @jhawthorn 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants