[BUG] Polling of metrics in custom plugins stops if an error is raised inside the mfa function for the metric #236

Closed
fedme opened this issue Jun 6, 2024 · 3 comments
Labels
bug Something isn't working

fedme commented Jun 6, 2024

Describe the bug
Polling of metrics in custom plugins stops if an error is raised inside the MFA function for the metric.

To Reproduce
Steps to reproduce the behavior:

  1. Clone this example repository: https://github.com/fedme/prom_ex_issue
    The sample application defines a custom PromEx plugin here: https://github.com/fedme/prom_ex_issue/blob/main/lib/prom_ex_issue/custom_prom_ex_plugin.ex

  2. Start the sample application with mix phx.server and look at the logs in the terminal

  3. Observe in the logs that the MFA function for the metric is called at every polling interval. You should see the following output:

######################################################################
MFA execute_ping_metrics called for the 1 time.
######################################################################

######################################################################
MFA execute_ping_metrics called for the 2 time.
######################################################################

[...]
  4. The plugin is written so that the MFA function raises an error the 6th time it is polled. You should see something like the following output in the console:
[...]

######################################################################
MFA execute_ping_metrics called for the 4 time.
######################################################################

######################################################################
MFA execute_ping_metrics called for the 5 time.
######################################################################

[error] Error when calling MFA defined by measurement: PromExIssue.CustomPromExPlugin :execute_ping_metrics [#PID<0.676.0>]
Class=:error
Reason=%RuntimeError{
  message: "Something is not working correctly, I can't return the metrics right now!"
}
Stacktrace=[
  {PromExIssue.CustomPromExPlugin, :execute_ping_metrics, 1,
   [
     file: ~c"lib/prom_ex_issue/custom_prom_ex_plugin.ex",
     line: 48,
     error_info: %{module: Exception}
   ]},
  {:telemetry_poller, :make_measurement, 1,
   [
     file: ~c"/Users/fedme/code/prom_ex_issue/deps/telemetry_poller/src/telemetry_poller.erl",
     line: 336
   ]},
  {:telemetry_poller, :"-make_measurements_and_filter_misbehaving/1-lc$^0/1-0-",
   1,
   [
     file: ~c"/Users/fedme/code/prom_ex_issue/deps/telemetry_poller/src/telemetry_poller.erl",
     line: 332
   ]},
  {:telemetry_poller, :handle_info, 2,
   [
     file: ~c"/Users/fedme/code/prom_ex_issue/deps/telemetry_poller/src/telemetry_poller.erl",
     line: 354
   ]},
  {:gen_server, :try_handle_info, 3, [file: ~c"gen_server.erl", line: 1095]},
  {:gen_server, :handle_msg, 6, [file: ~c"gen_server.erl", line: 1183]},
  {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 241]}
]
  5. Notice that the metric is no longer polled after the exception: no further log lines appear in the console to indicate that the function is being called.

Expected behavior
Even if the metric function raises an error during one poll invocation, polling should not stop. It should keep going so that future values of the metric can be collected once the error (hopefully) goes away.
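
Until the library handles this, one possible workaround is to keep exceptions from ever reaching the poller by wrapping the body of the measurement function. The sketch below is only an illustration: `SafeMeasurement` and `run/1` are hypothetical names, not part of PromEx or telemetry_poller, and the real `execute_ping_metrics` body would go inside the anonymous function.

```elixir
defmodule SafeMeasurement do
  @moduledoc """
  Hypothetical helper (not part of PromEx or telemetry_poller).
  Runs a measurement and converts any failure into a log line,
  so the caller never sees an exception.
  """
  require Logger

  # Returns :ok when the measurement ran, :error when it failed.
  def run(fun) when is_function(fun, 0) do
    fun.()
    :ok
  rescue
    exception ->
      Logger.warning("measurement failed: " <> Exception.message(exception))
      :error
  catch
    kind, reason ->
      Logger.warning("measurement failed: " <> inspect({kind, reason}))
      :error
  end
end

# Usage: a failing measurement is logged instead of propagating.
SafeMeasurement.run(fn -> raise "I can't return the metrics right now!" end)
```

The trade-off is that a failed poll emits no value for that interval, but the function stays registered and is called again at the next one.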

Environment

Erlang/OTP 26 [erts-14.2.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Elixir 1.16.0 (compiled with Erlang/OTP 26)

Additional context
First raised on Slack.

fedme added the bug label Jun 6, 2024
Owner

akoutmos commented Jun 7, 2024

Thanks for the detailed issue! I should be able to knock this out over the weekend. I'm currently working on another open source library, so I should be able to tackle this as well given I am in open source mode 😄


akoutmos commented Aug 9, 2024

Thanks for the repro project. I was able to incorporate my additions to the Polling metric type in your repo and avoided having the MFA detached (specifically the detach_on_error: false option):

defmodule PromExIssue.CustomPromExPlugin do
  ...

  @impl true
  def polling_metrics(opts) do
    poll_rate = Keyword.get(opts, :poll_rate, @default_poll_rate)
    debug_agent = opts[:debug_agent]

    Polling.build(
      :custom_prom_ex_plugin_ping_metrics,
      poll_rate,
      {__MODULE__, :execute_ping_metrics, [debug_agent]},
      [
        last_value(
          [:custom, :prom_ex, :plugin, :metrics],
          event_name: @ping_event_name,
          measurement: :count,
          description: "Ping for debugging",
          tags: [:state]
        )
      ],
      detach_on_error: false
    )
  end

  ...
end

Should be cutting a release soon with these changes!
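
To make the difference between the two behaviors concrete, here is a toy model in plain Elixir. It is not the code of telemetry_poller or PromEx: `ToyPoller` and `FlakyMeasurement` are hypothetical modules, and the `:detach_on_error` option only mirrors the name used above. The measurement raises on its 6th call, like the one in the repro project.

```elixir
defmodule ToyPoller do
  @moduledoc "Toy model of a poller; NOT telemetry_poller's or PromEx's implementation."

  # Runs `ticks` polling rounds over a list of zero-arity measurement functions.
  # With detach_on_error: true, a measurement that raises is dropped for good.
  # With detach_on_error: false, it stays in the list and is retried next tick.
  def run(measurements, ticks, opts) do
    detach? = Keyword.get(opts, :detach_on_error, true)

    Enum.reduce(1..ticks//1, measurements, fn _tick, active ->
      Enum.filter(active, fn measurement ->
        try do
          measurement.()
          true
        rescue
          _exception -> not detach?
        end
      end)
    end)
  end
end

defmodule FlakyMeasurement do
  # Builds a measurement that counts its calls in an Agent
  # and raises on the 6th call only.
  def build(counter) do
    fn ->
      call = Agent.get_and_update(counter, fn n -> {n + 1, n + 1} end)
      if call == 6, do: raise("Something is not working correctly")
      :ok
    end
  end
end

{:ok, detached} = Agent.start_link(fn -> 0 end)
ToyPoller.run([FlakyMeasurement.build(detached)], 8, detach_on_error: true)
IO.inspect(Agent.get(detached, & &1), label: "calls with detach_on_error: true")

{:ok, kept} = Agent.start_link(fn -> 0 end)
ToyPoller.run([FlakyMeasurement.build(kept)], 8, detach_on_error: false)
IO.inspect(Agent.get(kept, & &1), label: "calls with detach_on_error: false")
```

Over 8 ticks, the detaching variant calls the measurement 6 times and then never again, while the non-detaching variant calls it all 8 times.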

akoutmos added a commit that referenced this issue Aug 9, 2024

akoutmos commented Aug 9, 2024

Closing this ticket for now as a release will be cut soon.
