Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track GC total time #868

Closed
9 of 10 tasks
tombruijn opened this issue Jul 29, 2022 · 1 comment
Closed
9 of 10 tasks

Track GC total time #868

tombruijn opened this issue Jul 29, 2022 · 1 comment
Assignees

Comments

@tombruijn
Copy link
Member

tombruijn commented Jul 29, 2022

In PR #852 we added tracking for gc_total_time, renamed to gc_time in PR #874 . But then in PR appsignal/public_config#38 I've removed the GC total time graph from the Ruby VM dashboard because it's empty by default. To avoid confusion and an increase in support conversations I've removed the graph.

To make this metric work, the app needs to call GC::Profiler.enable to enable the GC profiler that Appsignal::GarbageCollectionProfiler relies on. This is not on by default in any app.

We have a way to enable this in the Ruby gem with the undocumented enable_gc_instrumentation config option. We never shipped and documented this. My understanding was that it wasn't done (nothing is done with the data on the frontend) and it wasn't tested enough. I don't fully know the impact of enabling the GC Profiler by default and in the moment (Thursday afternoon 2022-07-28) I didn't want to ship this to every app using our Ruby gem.

We need to figure out the impact of enabling the GC profiler, decide if we want to enable it by default or put it behind the enable_gc_instrumentation config option, and document that. We have decided not to enable it by default. (#876)

Also required for this is PR #867 that fixes the gc_time metric. To update the dashboard on the server side we're blocked on https://github.com/appsignal/appsignal-server/issues/4770.

To do

  • Decide what to do with the Transaction level GC instrumentation (ask Thijs)
  • Add docs for:
  • Deploy new graph to dashboard
    • Merge linked PR
    • Submit PR with updated submodule to appsignal-server repo
  • Release Ruby gem 3.1.4
  • Deploy server and run dashboard updater for Ruby VM dashboard
  • Merge docs PR

And then decide how and when to update the existing dashboards.

tombruijn added a commit that referenced this issue Aug 4, 2022
Update naming to only be about the time measured.

It no longer reports the total time, but reports the delta of two
measurements.

Part of #868
tombruijn added a commit that referenced this issue Aug 4, 2022
Update naming to only be about the time measured.

It no longer reports the total time, but reports the delta of two
measurements.

Part of #868
@tombruijn tombruijn self-assigned this Aug 4, 2022
tombruijn added a commit to appsignal/public_config that referenced this issue Aug 4, 2022
Add the Garbage Collection time graph back to the Ruby VM dashboard
after it was added and removed in PR
#38.

This graph will not have any data by default. We need a way to redirect
users to our docs on how to enable this. We also need to write docs for
this, basically saying: Add `GC::Profiler.enable` to your app, but not
in production or not for long in production because it may adversely
impact the performance of your app.

Also, to update existing dashboards, I refer to the dashboard update
issue we should pick up:
appsignal/appsignal-server#4770

Part of appsignal/appsignal-ruby#868
tombruijn added a commit to appsignal/public_config that referenced this issue Aug 4, 2022
Add the Garbage Collection time graph back to the Ruby VM dashboard
after it was added and removed in PR
#38.

I've updated the metric name to `gc_time` in Ruby gem PR:
appsignal/appsignal-ruby#874

This graph will not have any data by default. We need a way to redirect
users to our docs on how to enable this. We also need to write docs for
this, basically saying: Add `GC::Profiler.enable` to your app, but not
in production or not for long in production because it may adversely
impact the performance of your app.

Also, to update existing dashboards, I refer to the dashboard update
issue we should pick up:
appsignal/appsignal-server#4770

Part of appsignal/appsignal-ruby#868
tombruijn added a commit to appsignal/public_config that referenced this issue Aug 4, 2022
Add the Garbage Collection time graph back to the Ruby VM dashboard
after it was added and removed in PR
#38.

I've updated the metric name to `gc_time` in Ruby gem PR:
appsignal/appsignal-ruby#874

This graph will not have any data by default. We need a way to redirect
users to our docs on how to enable this. We cannot link to this from the
graph description.

We also need to write docs for this, basically saying: Add
`GC::Profiler.enable` to your app, but not in production or not for long
in production because it may adversely impact the performance of your
app.

Also, to update existing dashboards, I refer to the dashboard update
issue we should pick up:
appsignal/appsignal-server#4770

Part of appsignal/appsignal-ruby#868
tombruijn added a commit to appsignal/public_config that referenced this issue Aug 5, 2022
Add the Garbage Collection time graph back to the Ruby VM dashboard
after it was added and removed in PR
#38.

I've updated the metric name to `gc_time` in Ruby gem PR because it
doesn't report the total time anymore:
appsignal/appsignal-ruby#874
appsignal/appsignal-ruby#867

This graph will not have any data by default. We need a way to direct
users to our docs on how to enable this. We cannot link to this from the
graph description that I could tell, the graph description does not
support any link format.

We need to write some docs for this, basically saying: Add
`GC::Profiler.enable` to your app, but not in production or not for long
in production because it may negatively impact the performance of your
app.

Also, to update existing dashboards, I refer to the dashboard update
issue we should pick up:
appsignal/appsignal-server#4770

Part of appsignal/appsignal-ruby#868
tombruijn added a commit that referenced this issue Aug 5, 2022
The MRI probe always initialized the profiler needed to actually profile
the garbage collection. If `enable_gc_instrumentation` is set to
`false`, which it is by default, it has some overhead that's not needed.

Instead it should use the `NilProfiler` like we had in the Transaction
class.

Move this logic to a new `GarbageCollection` wrapper module in which the
`GarbageCollection.profiler` will check the config option value and
return the suitable profiler.

Part of #868
tombruijn added a commit that referenced this issue Aug 5, 2022
Let's only report the `gc_time` metric when there's an actual value to
report to avoid confusion of very small GC times.

If the Profiler is disabled, it reports 0.0 as the value for this
metric. We know when the profiler is enabled by asking it
`GC::Profiler.enabled?`.

By directly asking the Ruby GC profiler, instead of our wrapper based on
the `enable_gc_instrumentation` config option, we allow users to enable
and disable the profiler temporarily with `GC::Profiler.enable` and
`GC::Profiler.disable` calls in their app.

If the profiler is enabled, disabled, and then enabled again, it
remembers the total time from the last time it was enabled. By not
reporting 0 when it is disabled we don't reset the cache and don't
trigger a very high duration being reported the next time it's enabled.

```ruby
> GC::Profiler.total_time
# => 0.0

> GC::Profiler.enable
> GC::Profiler.total_time
=> 0.001
> 10000.times { SecureRandom.uuid }
# => Does stuff for a long time
> GC::Profiler.total_time
# => 58.93548299997067

> GC::Profiler.disable
> GC::Profiler.total_time
# => 0.0
> 10000.times { SecureRandom.uuid }
# => Does stuff for a long time
> GC::Profiler.total_time
# => 0.0
> GC::Profiler.enable
> GC::Profiler.total_time
# => 58.93548299997067
# Same as before it was disabled without the GC time of the code that
# was run when it was disabled
```

Part of #868
tombruijn added a commit that referenced this issue Aug 5, 2022
I don't think we should provide this in our gem, but have users add this
to their own app manually in a Rails's `application.rb` file,
`development.rb` file or deeper in their app code.

This way they also have more control over when to disable it again, by
using `GC::Profiler.disable` if they only want to enable it for certain
parts of their app. It leaves control with the app developer.

The `enable_gc_instrumentation` config option should remain so that we
fetch the Garbage Collection profiler wrapper or the nil wrapper
`NilProfiler`.

Part of #868
@tombruijn tombruijn removed the discuss label Aug 5, 2022
tombruijn added a commit that referenced this issue Aug 5, 2022
Let's only report the `gc_time` metric when there's an actual value to
report to avoid confusion of very small GC times.

If the Profiler is disabled, it reports 0.0 as the value for this
metric. We know when the profiler is enabled by asking it
`GC::Profiler.enabled?`.

By directly asking the Ruby GC profiler, instead of our wrapper based on
the `enable_gc_instrumentation` config option, we allow users to enable
and disable the profiler temporarily with `GC::Profiler.enable` and
`GC::Profiler.disable` calls in their app.

If the profiler is enabled, disabled, and then enabled again, it
remembers the total time from the last time it was enabled. By not
reporting 0 when it is disabled we don't reset the cache and don't
trigger a very high duration being reported the next time it's enabled.

```ruby
> GC::Profiler.total_time
# => 0.0

> GC::Profiler.enable
> GC::Profiler.total_time
=> 0.001
> 10000.times { SecureRandom.uuid }
# => Does stuff for a long time
> GC::Profiler.total_time
# => 58.93548299997067

> GC::Profiler.disable
> GC::Profiler.total_time
# => 0.0
> 10000.times { SecureRandom.uuid }
# => Does stuff for a long time
> GC::Profiler.total_time
# => 0.0
> GC::Profiler.enable
> GC::Profiler.total_time
# => 58.93548299997067
# Same as before it was disabled without the GC time of the code that
# was run when it was disabled
```

Part of #868
tombruijn added a commit that referenced this issue Aug 5, 2022
I don't think we should provide this in our gem, but have users add this
to their own app manually in a Rails's `application.rb` file,
`development.rb` file or deeper in their app code.

This way they also have more control over when to disable it again, by
using `GC::Profiler.disable` if they only want to enable it for certain
parts of their app. It leaves control with the app developer.

The `enable_gc_instrumentation` config option should remain so that we
fetch the Garbage Collection profiler wrapper or the nil wrapper
`NilProfiler`.

Part of #868
tombruijn added a commit that referenced this issue Aug 10, 2022
When the app calls `GC::Profiler.enable` we will now automatically
collect the Garbage Collection time for the Ruby VM magic dashboard.

We check if an app has enabled the GC profiler by calling
`GC::Profiler.enabled?`. This operation is only checking a boolean value
in Ruby, which is also much faster the previous implementation of
checking the now removed `enable_gc_instrumentation` config option.

Previously we had the undocumented `enable_gc_instrumentation` config
option, which called `GC::Profiler.enable`. We removed this in PR #876.

We want to give developers more flexibility and allow to enable and
disable it at will. This way they can choose to only profile certain
parts of their app without having it enabled on app start.

Part of #868
tombruijn added a commit that referenced this issue Aug 10, 2022
When the app calls `GC::Profiler.enable` we will now automatically
collect the Garbage Collection time for the Ruby VM magic dashboard.

We check if an app has enabled the GC profiler by calling
`GC::Profiler.enabled?`. This operation is only checking a boolean value
in Ruby, which is also much faster the previous implementation of
checking the now removed `enable_gc_instrumentation` config option.

Previously we had the undocumented `enable_gc_instrumentation` config
option, which called `GC::Profiler.enable`. We removed this in PR #876.

We want to give developers more flexibility and allow to enable and
disable it at will. This way they can choose to only profile certain
parts of their app without having it enabled on app start.

Part of #868
tombruijn added a commit to appsignal/public_config that referenced this issue Aug 15, 2022
Add the Garbage Collection time graph back to the Ruby VM dashboard
after it was added and removed in PR
#38.

I've updated the metric name to `gc_time` in Ruby gem PR because it
doesn't report the total time anymore:
appsignal/appsignal-ruby#874
appsignal/appsignal-ruby#867

This graph will not have any data by default. We need a way to direct
users to our docs on how to enable this. We cannot link to this from the
graph description that I could tell, the graph description does not
support any link format.

We need to write some docs for this, basically saying: Add
`GC::Profiler.enable` to your app, but not in production or not for long
in production because it may negatively impact the performance of your
app.

Also, to update existing dashboards, I refer to the dashboard update
issue we should pick up:
appsignal/appsignal-server#4770

Part of appsignal/appsignal-ruby#868
@backlog-helper
Copy link

While performing the daily checks some issues were found with this issue.

  • This issue has not had any activity in 14 days. Please provide a status update if it is still relevant. Closed it if it is no longer relevant. Or move it to another column if it's blocked or requires another look at it. - (More info)

New issue guide | Backlog management | Rules | Feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant