
Clock Producer received a too early ping event, rescheduling #439

Closed
jean-smaug opened this issue Jun 15, 2020 · 24 comments

@jean-smaug
Contributor

jean-smaug commented Jun 15, 2020

Hello 👋

I created a Telegram bot.
One of its features is sending a message every day.

But sometimes there is a problem with Quantum and it blocks the other modules of the app. The cron job keeps running, but all the slash commands are broken.

When I inspect the logs with journalctl -ef, I get these messages:

Jun 15 11:14:45 raspberrypi my_app[446]: 11:14:45.992 [warn]  [:my_app@raspberrypi][Elixir.Quantum.ClockBroadcaster] Clock Producer received a too early ping event, rescheduling
Jun 15 11:14:46 raspberrypi my_app[446]: 11:14:46.997 [warn]  [:my_app@raspberrypi][Elixir.Quantum.ClockBroadcaster] Clock Producer received a too early ping event, rescheduling
Jun 15 11:14:47 raspberrypi my_app[446]: 11:14:46.995 [warn]  [:my_app@raspberrypi][Elixir.Quantum.ClockBroadcaster] Clock Producer received a too early ping event, rescheduling

Here is my config:

config :my_app, MyApp.Scheduler,
  timezone: "Europe/Paris",
  jobs: [
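    # runs at 10:02 (Europe/Paris), Monday to Friday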
    {"2 10 * * 1-5", {MyApp, :send_msg, []}},
  ]

And my scheduler:

defmodule MyApp.Scheduler do
  use Quantum, otp_app: :my_app
end

Thx.

@maennchen
Member

@jean-smaug What version of quantum are you using?

@jean-smaug
Contributor Author

I'm using the RC version.

I didn't notice that you released 3.0.
Congrats and thx.

I will upgrade. I think we can close the issue for now, and if the problem happens again I'll re-open this one.

@maennchen
Member

@jean-smaug I don't think that this will resolve it. There are no relevant changes since the RC.

Are you sure that this really blocks execution? When examining the code I can see that the execution should continue normally. This is only a warning.

As for the warning itself: it would make sense to switch to using abs and monotonic time. This should prevent the message from being sent too early. I'll see when I can come up with a PR.

@silvadanilo

In my case it happens in a dev environment when my PC is suspended at the moment the job should have been executed; on wake-up I get flooded with this warning.

@kotsius

kotsius commented Jun 19, 2020

Elixir 1.10.3, Quantum 3.0.1, Windows 10 here. The most frequently scheduled event occurs every minute, yet these warnings are constant:

[screenshot: a steady stream of "Clock Producer received a too early ping event, rescheduling" warnings]

Had no issues with Quantum v2.x.

@maennchen
Member

@kotsius Do you basically receive this every second or do those messages trickle in over time?

@kotsius

kotsius commented Jun 19, 2020

@kotsius Do you basically receive this every second or do those messages trickle in over time?

I seem to be getting one every few seconds. The interval varies widely (say, from 1 to 40 seconds).

@maennchen
Member

@kotsius Ok, then send_after seems to be imprecise and should be converted to use abs.

Does this block anything, or does the scheduler continue normally for you?

@kotsius

kotsius commented Jun 19, 2020

Thanks. I can locate send_after/4 under Process, but not anywhere in the Quantum docs. Is this something that requires tweaking on my side?

The scheduler seems to be working fine.

@maennchen
Member

@kotsius Ah no, this is something that Quantum needs to solve.

Quantum calculates the diff between the next second to trigger and now, and then schedules a message with Process.send_after/3 here: https://github.com/quantum-elixir/quantum-core/blob/master/lib/quantum/clock_broadcaster.ex#L136

To make it more reliable, the next time should be converted to Erlang monotonic time and then supplied to send_after with the abs option:
https://hexdocs.pm/elixir/Process.html#send_after/4-options
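
Roughly, that would look something like the following (a minimal sketch of the idea, not the code that ended up in Quantum; next_time and now stand in for the broadcaster's NaiveDateTime values):

delay_ms =
  next_time
  |> NaiveDateTime.diff(now, :millisecond)
  |> max(0)

# Absolute point on the Erlang monotonic clock, in milliseconds.
abs_ms = System.monotonic_time(:millisecond) + delay_ms

# With abs: true, the third argument is an absolute monotonic timestamp
# rather than a relative delay.
Process.send_after(self(), :ping, abs_ms, abs: true)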

@kotsius

kotsius commented Jun 19, 2020

Good to know, thanks again.

@ijunaid8989

Does anyone face issues in the whole application due to this?

Our application's HTTP requests got really slow because of it.

@maennchen
Member

@ijunaid8989 As I mentioned before, this warning should not cause a slowdown of anything.

Do you get thousands of messages a second, or do you only get it occasionally?

If it is the latter, then the only thing that will really change after this is fixed is that you're no longer going to see the warning.

If you get thousands of messages (some kind of infinite loop), then I would be really interested in a reproduction repo and debug logs.

@ghost

ghost commented Jul 4, 2020

I'm seeing the same issue as @kotsius, at the same interval, on Windows (but haven't seen it on Linux yet). Is there any specific reason why this even needs to be a warning? Clock drift can happen and shouldn't be anything that needs to be warned about. I would think this log line should be downgraded to debug.

@maennchen WDYT?

@janpieper

I have the same issue with quantum 3.0.1 running in a Docker container (elixir:1.10-slim). I get the message once per second.

I am able to reproduce this error by keeping a Phoenix application running (mix phx.server) while suspending my computer and starting it again.

@johannesE

I'm also only getting it on my development machine (Ubuntu) after suspending it. Restarting the application fixes the issue for me. Is this something I should be worried about?

@maennchen
Member

As I said before: this is not something anyone should be worried about.

I'll push a fix that should prevent it from happening in the future.

If that warning is logged, it only means that the event was triggered too early and that the clock producer will wait again until it's the right time.

@maennchen
Member

The problem seems to lie deeper than expected. Process.send_after/3 seems to trigger too early.

I only get the problem to go away if I don't try to trigger as closely to the start of the second as possible, but instead add a few milliseconds:

next_event_abs =
  %{time | microsecond: {0, 0}}
  |> NaiveDateTime.add(1, :second)
  |> NaiveDateTime.diff(now, :native)
  |> Kernel.max(0)
  |> IO.inspect(label: "diff")
  |> Kernel.+(System.monotonic_time())
  |> Kernel.+(:erlang.convert_time_unit(10, :millisecond, :native)) # Round Up
  |> :erlang.convert_time_unit(:native, :millisecond)
  |> IO.inspect(label: "abs")

Process.send_after(self(), :ping, next_event_abs, abs: true)

I'd like to get input from the people in this issue (@jean-smaug, @kotsius, @ijunaid8989, @CharlotteDunois, @janpieper, @johannesE):

If a job is triggered every second, which result would you expect?

  1. Triggered at the start of the second (or as close as possible): ~N[2020-08-04 09:59:15.000057]
  2. Triggered at the middle of the second (or as close as possible): ~N[2020-08-04 09:59:15.500032]
  3. Triggered at the end of the second (or as close as possible): ~N[2020-08-04 09:59:15.999967]
  4. Triggered at any point in that second (this would also mean that the jobs don't have to be roughly a second apart but could also run faster / slower).

Both options 1 & 3 are hard since Process.send_after/4 seems to be too imprecise.

Options 2 & 4 should be easy since we don't have to hit the mark very closely.

@maennchen
Member

I personally think that option 4 is the best. #449 shows how this could be improved for that scenario.

I'd like to get some feedback, and you can also check out that branch and see if it works the way you'd expect.

@kotsius

kotsius commented Aug 5, 2020

Thanks for looking into this, Jonatan. Responding to your call for feedback, I would say that my intuitive expectation would be option 1. I suppose, however, that the constancy between any two jobs (expected to be equal to one second as per your question) would be more important than the position of each one within the window of every second.

@maennchen
Member

@kotsius

however, that the constancy between any two jobs (expected to be equal to one second as per your question) would be more important

I completely agree on that.

I myself would lean towards option 4, while still trying to keep the ticks more or less evenly spaced a second apart.

@maennchen
Member

Released as v3.0.2

@maennchen maennchen self-assigned this Aug 18, 2020
@jean-smaug
Contributor Author

@maennchen thanks for your work!

@kotsius

kotsius commented Aug 25, 2020

No more command line pollution, thanks Jonatan!
