-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pretty thread names based on best-effort parsing of Thread#inspect output #54
Conversation
lib/vernier/output/firefox.rb
Outdated
if tr && tr.name && !tr.name.empty? | ||
@name = tr.name | ||
else | ||
@name = "UNNAMED [#{@ruby_thread_id}]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@name = "UNNAMED [#{@ruby_thread_id}]" | |
@name = tr.inspect |
Thread#inspect
contains the location at which it was spawed, which is generally enough to figure out what it is:
"#<Thread:0x00000001459196f0@/tmp/foo.rb:1 run>"
You could even massage this to only extract the location with a split('@')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@casperisfine love it - i'll give this a go on monday and update the test. Much better context, way more useful. Will also encourage people to possibly set the name of the thread, as they'll know where it originated 👍
Note to self, format originates here https://docs.ruby-lang.org/en/master/Thread.html#method-i-to_s
str = rb_sprintf("#<%"PRIsVALUE":%p", cname, (void *)thread);
if (!NIL_P(target_th->name)) {
rb_str_catf(str, "@%"PRIsVALUE, target_th->name);
}
if ((loc = threadptr_invoke_proc_location(target_th)) != Qnil) {
rb_str_catf(str, " %"PRIsVALUE":%"PRIsVALUE,
RARRAY_AREF(loc, 0), RARRAY_AREF(loc, 1));
}
rb_str_catf(str, " %s>", status);
lib/vernier/output/firefox.rb
Outdated
@@ -165,6 +165,15 @@ def initialize(ruby_thread_id, profile, categorizer, name:, tid:, samples:, weig | |||
@tid = tid | |||
@name = name | |||
|
|||
if @name == "" | |||
tr = ObjectSpace._id2ref(@ruby_thread_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's probably a better way to do this from within the C extension so that we can capture thread names before they GC, but this seems like an okay fallback.
One problem though is _id2ref
will raise if the thread is GC'd (they should be collectible when stopped even with Vernier running). Could we make sure that is handled properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's probably a better way to do this from within the C extension so that we can capture thread names before they GC
Yeah, we could just record rb_inspect(thread)
, I'm unfamiliar with Vernier internals though, so not sure where it would be best to call this.
The obvious choice would be in the callback for the thread start, but the thread name is assigned after it's created, so might be a bit weird.
So perhaps it should be collected on thread exit callback, and then for all other still alive threads at the end of the profile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem though is _id2ref will raise if the thread is GC'd (they should be collectible when stopped even with Vernier running). Could we make sure that is handled properly?
Added a rescue for RangeError to cover this case
Yeah, we could just record rb_inspect(thread), I'm unfamiliar with Vernier internals though, so not sure where it would be best to call this.
I decided to take a crack at this, basically set the thread name the first time it gets to run, and theRESUME
callback gets fired (since we should have the GVL).
If we never got the inspect output in the native code, it falls back to a best-effort attempt to grab it from ruby land.
Either way, whatever we got from inspect either in native or ruby call is then parsed to a prettier value in ruby during the output generation.
583db15
to
2e2e9fc
Compare
tr = ObjectSpace._id2ref(@ruby_thread_id) | ||
name = tr.inspect if tr | ||
end | ||
return name unless name.start_with?("#<Thread") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case to cover "retained memory" name set for :retained
mode, or if we ever got another name somehow. Also if we couldn't get any inspect output, no sense in doing the parsing below.
lib/vernier/output/firefox.rb
Outdated
@@ -385,6 +385,20 @@ def string_table | |||
|
|||
private | |||
|
|||
def pretty_name(name) | |||
if name.empty? | |||
tr = ObjectSpace._id2ref(@ruby_thread_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we somehow didn't get the inspect value in the native extension (ie, the resume
hook never ran?), fallback to doing it in ruby on a "best effort" basis, based on the object id.
return name unless name.start_with?("#<Thread") | ||
pretty = [] | ||
obj_address = name[/Thread:(0x\d+)/,1] | ||
best_id = name[/\#<Thread:0x\w+@?\s?(.*)\s+\S+>/,1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i originally wanted to do this just splitting on whitespace, but realized that this could get complicated very fast:
- the thread name itself can contain spaces
- the file path can contain spaces
- both of these are optional
So instead, I try to just anchor on what we know will be present at the start (#<Thread:ADDRESS
), and end ( STATE>
)
If the thread was named, then the match for @?
should catch this. Otherwise, try to just capture everything between the start and end anchors.
Set this back to draft while I update the description and address the great comments so far. |
d23f585
to
d22290f
Compare
Ok, i think this is ready for another look. I think I've got a way to safely get the thread name from native code via I made the tests more explicit to cover the different cases of the inspect output having a name, file path and line number, etc. |
lib/vernier/output/firefox.rb
Outdated
pretty = [] | ||
obj_address = name[/Thread:(0x\w+)/,1] | ||
best_id = name[/\#<Thread:0x\w+@?\s?(.*)\s+\S+>/,1] || "" | ||
Gem.path.each { |gem_dir| best_id = best_id.gsub(gem_dir, "...") } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not totally necessary I guess, but otherwise the paths can end up being pretty huge and we probably only care about the portion of the path after the gem actually starts.
ext/vernier/vernier.cc
Outdated
if (thread.name == "") { | ||
// rb_inspect should be safe here, as RUNNING should correspond to RESUMED hook from internal_thread_event_cb | ||
// which is called with GVL per https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/thread.h#L247-L248 | ||
VALUE thread_str = rb_inspect(th); | ||
if (RTEST(thread_str)) | ||
thread.name = StringValueCStr(thread_str); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is safe here as both of these can raise and we're not ready for that (and holding onto some locks). I'll remove this before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhawthorn ah my bad, makes sense though. Thanks!
Just pushed a commit to delete it to save you the trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good! It's helpful actually as I was able to use this for a regression test 😊
d22290f
to
8d58a16
Compare
Merged. Thank you! |
@dalehamel released as part of 0.5.0 |
cheers, thanks @jhawthorn 🎉 |
Problem
It looks like nothing sets the thread name in the output anymore, so the tracks in firefox-profiler will not show any name:
Checking out the json file that is going into this, the thread name is "". If no name was ever set, this is expected. However if a name was set, it is still not making it to the gecko output.
I originally suspected #46, but this is happening for me (on ruby 3.3.0) even with older gems (0.3.0, 0.3.1) that tried to get it from the pthread. In any case, the code to get this from the native extension is currently commented out and nothing seems to set it anywhere.
Proposed solution
As @casperisfine has pointed out, calling "inspect" on a
Thread
object gives us a lot of useful information, including the thread name if it is set. So, even though we cannot directly get the name from the native extension, we should be able to inspect the thread objects we already have access to.However, we need to be careful about this as calling
rb_inspect
requires the GVL is acquired. Within the native extension then, after our callback is fired forRUBY_INTERNAL_THREAD_EVENT_RESUMED
, we should be able to safely call it. We will just directly store the result as the thread's "name" if we are able to do so.If we were never able to set this within the native code before we attempt to get the gecko output, we'll try looking up the thread via its object id and calling inspect from ruby land. Either way, the output will be fed into a
pretty_name
function to try and display the name as nicely as possible:With this proposed change, here is what the output looks like: