Pretty thread names based on best-effort parsing of Thread#inspect output #54

dalehamel · 2024-02-23T18:37:22Z

Problem

It looks like nothing sets the thread name in the output anymore, so the tracks in firefox-profiler will not show any name:

Checking out the json file that is going into this, the thread name is "". If no name was ever set, this is expected. However if a name was set, it is still not making it to the gecko output.

I originally suspected #46, but this is happening for me (on ruby 3.3.0) even with older gems (0.3.0, 0.3.1) that tried to get it from the pthread. In any case, the code to get this from the native extension is currently commented out and nothing seems to set it anywhere.

Proposed solution

As @casperisfine has pointed out, calling "inspect" on a Thread object gives us a lot of useful information, including the thread name if it is set. So, even though we cannot directly get the name from the native extension, we should be able to inspect the thread objects we already have access to.

However, we need to be careful about this as calling rb_inspect requires the GVL is acquired. Within the native extension then, after our callback is fired for RUBY_INTERNAL_THREAD_EVENT_RESUMED, we should be able to safely call it. We will just directly store the result as the thread's "name" if we are able to do so.

If we were never able to set this within the native code before we attempt to get the gecko output, we'll try looking up the thread via its object id and calling inspect from ruby land. Either way, the output will be fed into a pretty_name function to try and display the name as nicely as possible:

Prefer to have the thread name as the left-most value, followed by the path if specified
Trim the gem directories from the paths, if they were set, so they won't be quite as long

With this proposed change, here is what the output looks like:

casperisfine · 2024-02-23T21:41:55Z

lib/vernier/output/firefox.rb

+            if tr && tr.name && !tr.name.empty?
+              @name = tr.name
+            else
+              @name = "UNNAMED [#{@ruby_thread_id}]"


Suggested change

@name = "UNNAMED [#{@ruby_thread_id}]"

@name = tr.inspect

Thread#inspect contains the location at which it was spawed, which is generally enough to figure out what it is:

"#<Thread:0x00000001459196f0@/tmp/foo.rb:1 run>"

You could even massage this to only extract the location with a split('@')

@casperisfine love it - i'll give this a go on monday and update the test. Much better context, way more useful. Will also encourage people to possibly set the name of the thread, as they'll know where it originated 👍

Note to self, format originates here https://docs.ruby-lang.org/en/master/Thread.html#method-i-to_s

str = rb_sprintf("#<%"PRIsVALUE":%p", cname, (void *)thread); if (!NIL_P(target_th->name)) { rb_str_catf(str, "@%"PRIsVALUE, target_th->name); } if ((loc = threadptr_invoke_proc_location(target_th)) != Qnil) { rb_str_catf(str, " %"PRIsVALUE":%"PRIsVALUE, RARRAY_AREF(loc, 0), RARRAY_AREF(loc, 1)); } rb_str_catf(str, " %s>", status);

jhawthorn · 2024-02-24T00:35:39Z

lib/vernier/output/firefox.rb

@@ -165,6 +165,15 @@ def initialize(ruby_thread_id, profile, categorizer, name:, tid:, samples:, weig
          @tid = tid
          @name = name

+          if @name == ""
+            tr = ObjectSpace._id2ref(@ruby_thread_id)


I think there's probably a better way to do this from within the C extension so that we can capture thread names before they GC, but this seems like an okay fallback.

One problem though is _id2ref will raise if the thread is GC'd (they should be collectible when stopped even with Vernier running). Could we make sure that is handled properly?

I think there's probably a better way to do this from within the C extension so that we can capture thread names before they GC

Yeah, we could just record rb_inspect(thread), I'm unfamiliar with Vernier internals though, so not sure where it would be best to call this.

The obvious choice would be in the callback for the thread start, but the thread name is assigned after it's created, so might be a bit weird.

So perhaps it should be collected on thread exit callback, and then for all other still alive threads at the end of the profile.

One problem though is _id2ref will raise if the thread is GC'd (they should be collectible when stopped even with Vernier running). Could we make sure that is handled properly?

Added a rescue for RangeError to cover this case

Yeah, we could just record rb_inspect(thread), I'm unfamiliar with Vernier internals though, so not sure where it would be best to call this.

I decided to take a crack at this, basically set the thread name the first time it gets to run, and theRESUME callback gets fired (since we should have the GVL).

If we never got the inspect output in the native code, it falls back to a best-effort attempt to grab it from ruby land.

Either way, whatever we got from inspect either in native or ruby call is then parsed to a prettier value in ruby during the output generation.

dalehamel · 2024-02-26T17:25:37Z

lib/vernier/output/firefox.rb

+            tr = ObjectSpace._id2ref(@ruby_thread_id)
+            name = tr.inspect if tr
+          end
+          return name unless name.start_with?("#<Thread")


case to cover "retained memory" name set for :retained mode, or if we ever got another name somehow. Also if we couldn't get any inspect output, no sense in doing the parsing below.

dalehamel · 2024-02-26T17:25:58Z

lib/vernier/output/firefox.rb

@@ -385,6 +385,20 @@ def string_table

        private

+        def pretty_name(name)
+          if name.empty?
+            tr = ObjectSpace._id2ref(@ruby_thread_id)


if we somehow didn't get the inspect value in the native extension (ie, the resume hook never ran?), fallback to doing it in ruby on a "best effort" basis, based on the object id.

dalehamel · 2024-02-26T17:28:49Z

lib/vernier/output/firefox.rb

+          return name unless name.start_with?("#<Thread")
+          pretty = []
+          obj_address = name[/Thread:(0x\d+)/,1]
+          best_id = name[/\#<Thread:0x\w+@?\s?(.*)\s+\S+>/,1]


i originally wanted to do this just splitting on whitespace, but realized that this could get complicated very fast:

the thread name itself can contain spaces

the file path can contain spaces

both of these are optional

So instead, I try to just anchor on what we know will be present at the start (#<Thread:ADDRESS), and end ( STATE>)

If the thread was named, then the match for @? should catch this. Otherwise, try to just capture everything between the start and end anchors.

dalehamel · 2024-02-26T17:52:36Z

Set this back to draft while I update the description and address the great comments so far.

dalehamel · 2024-02-26T21:06:09Z

Ok, i think this is ready for another look.

I think I've got a way to safely get the thread name from native code via rb_inspect. I can remove that native code if it is contentious, as ultimately the same value is attempted to be grabbed in ruby land and is parsed in the same way. The native code just covers getting us a name in the case where the thread was GC'd by the time we got to ruby.

I made the tests more explicit to cover the different cases of the inspect output having a name, file path and line number, etc.

dalehamel · 2024-02-26T21:06:52Z

lib/vernier/output/firefox.rb

+          pretty = []
+          obj_address = name[/Thread:(0x\w+)/,1]
+          best_id = name[/\#<Thread:0x\w+@?\s?(.*)\s+\S+>/,1] || ""
+          Gem.path.each { |gem_dir| best_id = best_id.gsub(gem_dir, "...") }


not totally necessary I guess, but otherwise the paths can end up being pretty huge and we probably only care about the portion of the path after the gem actually starts.

jhawthorn · 2024-02-27T22:34:46Z

ext/vernier/vernier.cc

+                        if (thread.name == "") {
+                            // rb_inspect should be safe here, as RUNNING should correspond to RESUMED hook from internal_thread_event_cb
+                            // which is called with GVL per https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/thread.h#L247-L248
+                            VALUE thread_str = rb_inspect(th);
+                            if (RTEST(thread_str))
+                                thread.name = StringValueCStr(thread_str);
+                        }


I don't think this is safe here as both of these can raise and we're not ready for that (and holding onto some locks). I'll remove this before merging.

@jhawthorn ah my bad, makes sense though. Thanks!

Just pushed a commit to delete it to save you the trouble.

All good! It's helpful actually as I was able to use this for a regression test 😊

jhawthorn · 2024-02-27T22:48:29Z

Merged. Thank you!

jhawthorn · 2024-02-27T23:43:49Z

@dalehamel released as part of 0.5.0

dalehamel · 2024-02-27T23:51:10Z

cheers, thanks @jhawthorn 🎉

dalehamel mentioned this pull request Feb 23, 2024

Fetch thread names from ruby when outputting dalehamel/vernier#1

Closed

casperisfine reviewed Feb 23, 2024

View reviewed changes

jhawthorn reviewed Feb 24, 2024

View reviewed changes

Best effort thread naming based on inspect output

2e2e9fc

dalehamel force-pushed the name-threads-in-output branch from 583db15 to 2e2e9fc Compare February 26, 2024 17:24

dalehamel commented Feb 26, 2024

View reviewed changes

dalehamel marked this pull request as draft February 26, 2024 17:31

Handle case where thread was GC'd by the time vernier tries to output

2d72492

dalehamel force-pushed the name-threads-in-output branch 2 times, most recently from d23f585 to d22290f Compare February 26, 2024 20:54

dalehamel changed the title ~~Fetch thread names from ruby when generating output~~ Pretty thread names based on best-effort parsing of Thread#inspect output Feb 26, 2024

dalehamel marked this pull request as ready for review February 26, 2024 21:02

dalehamel commented Feb 26, 2024

View reviewed changes

jhawthorn reviewed Feb 27, 2024

View reviewed changes

Don't attempt to inspect thread from native code

8d58a16

dalehamel force-pushed the name-threads-in-output branch from d22290f to 8d58a16 Compare February 27, 2024 22:38

jhawthorn closed this Feb 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pretty thread names based on best-effort parsing of Thread#inspect output #54

Pretty thread names based on best-effort parsing of Thread#inspect output #54

dalehamel commented Feb 23, 2024 •

edited

Loading

casperisfine Feb 23, 2024

dalehamel Feb 23, 2024 •

edited

Loading

jhawthorn Feb 24, 2024

casperisfine Feb 26, 2024

dalehamel Feb 26, 2024 •

edited

Loading

dalehamel Feb 26, 2024 •

edited

Loading

dalehamel Feb 26, 2024 •

edited

Loading

dalehamel Feb 26, 2024 •

edited

Loading

dalehamel commented Feb 26, 2024

dalehamel commented Feb 26, 2024

dalehamel Feb 26, 2024

jhawthorn Feb 27, 2024

dalehamel Feb 27, 2024 •

edited

Loading

jhawthorn Feb 27, 2024

jhawthorn commented Feb 27, 2024

jhawthorn commented Feb 27, 2024

dalehamel commented Feb 27, 2024

Pretty thread names based on best-effort parsing of Thread#inspect output #54

Pretty thread names based on best-effort parsing of Thread#inspect output #54

Conversation

dalehamel commented Feb 23, 2024 • edited Loading

Problem

Proposed solution

casperisfine Feb 23, 2024

Choose a reason for hiding this comment

dalehamel Feb 23, 2024 • edited Loading

Choose a reason for hiding this comment

jhawthorn Feb 24, 2024

Choose a reason for hiding this comment

casperisfine Feb 26, 2024

Choose a reason for hiding this comment

dalehamel Feb 26, 2024 • edited Loading

Choose a reason for hiding this comment

dalehamel Feb 26, 2024 • edited Loading

Choose a reason for hiding this comment

dalehamel Feb 26, 2024 • edited Loading

Choose a reason for hiding this comment

dalehamel Feb 26, 2024 • edited Loading

Choose a reason for hiding this comment

dalehamel commented Feb 26, 2024

dalehamel commented Feb 26, 2024

dalehamel Feb 26, 2024

Choose a reason for hiding this comment

jhawthorn Feb 27, 2024

Choose a reason for hiding this comment

dalehamel Feb 27, 2024 • edited Loading

Choose a reason for hiding this comment

jhawthorn Feb 27, 2024

Choose a reason for hiding this comment

jhawthorn commented Feb 27, 2024

jhawthorn commented Feb 27, 2024

dalehamel commented Feb 27, 2024

dalehamel commented Feb 23, 2024 •

edited

Loading

dalehamel Feb 23, 2024 •

edited

Loading

dalehamel Feb 26, 2024 •

edited

Loading

dalehamel Feb 26, 2024 •

edited

Loading

dalehamel Feb 26, 2024 •

edited

Loading

dalehamel Feb 26, 2024 •

edited

Loading

dalehamel Feb 27, 2024 •

edited

Loading