Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Report scheduler assumes timers can never fire early #28929

Closed
bzbarsky-apple opened this issue Aug 28, 2023 · 5 comments
Closed

[BUG] Report scheduler assumes timers can never fire early #28929

bzbarsky-apple opened this issue Aug 28, 2023 · 5 comments
Assignees
Labels
bug Something isn't working cert blocker darwin needs triage spec Mismatch between spec and implementation v1.2

Comments

@bzbarsky-apple
Copy link
Contributor

bzbarsky-apple commented Aug 28, 2023

Reproduction steps

Per #28434 (comment) the report scheduler assumes timers never fire even a little bit early. This is not actually guaranteed by platform timers for all our platforms.

The report scheduler needs to be robust to timers firing a tiny bit early, just like the previous ReadHandler setup was. It can't miss its max-interval deadlines when this happens, which is what #28434 describes as happening.

Bug prevalence

Sometimes (e.g. on Darwin whenever there is a backwards NTP adjustment to wall-clock time).

GitHub hash of the SDK that was being used

abe94e7

Platform

core

Platform Version(s)

No response

Anything else?

No response

@lpbeliveau-silabs
Copy link
Contributor

@bzbarsky-apple Would adding a jitter period to compare against say (now >= mMaxTimestamp - JITTER) be considered robust? Otherwise we could leverage the ReadHandlerNodeFlags::CanBeSynced flag (or another one made specifically for that) to make it so that if the timer is fired the report scheduler would find it reportable.

@bzbarsky-apple
Copy link
Contributor Author

I would not consider jitter robust, no.

It seems like when the timer fires:

  1. For per-node timers, we should reschedule the timer for that node as needed.
  2. For across-node timers we should reschedule based on when the next thing needs to happen.

I'm actually pretty surprised this is not happening already.

@lpbeliveau-silabs
Copy link
Contributor

lpbeliveau-silabs commented Aug 29, 2023

The current behaviour (that is based on the previous one) has the reschedule happen once a report gets emitted.

My understanding is now on certain systems this timer fires early -> ScheduleRun -> Engine::Run() checks IsReportableNow for each timer and since the timer fired early the ReadHandler is not seen as reportable.

Because of that, we enter a state where the Scheduler will not start another timer because the ReadHandler never emitted a report.

We could start the timer as soon as the timer is fired but that would not solve the problem as this would only cause another run to be scheduled after another max interval expires, which would still cause subscription to be dropped.

For per-node timers, we could use the a flag like the EngineRunScheduled flag on the IsReportableNow condition as such:
if( ( (now >= minIntreval) && ( IsDirdty || now >= maxInterval ...) ) || IsEngineRunScheduled() )

That way if a timer expired and caused an Engine::Run() to be scheduled, we will have a guarantee a report will be emitted, which will ensure another one will be scheduled when OnSubscriptionReportSent() is called.

Does this approach makes sense to you?

@bzbarsky-apple
Copy link
Contributor Author

@lpbeliveau-silabs Yes, that makes sense, I think. It's hard to tell about all the edge cases, but seems plausible....

@github-project-automation github-project-automation bot moved this from Todo to Done in [Platform] Darwin Oct 2, 2023
@github-project-automation github-project-automation bot moved this from Open Cert Blockers to Complete in [Certification] Blockers Oct 2, 2023
@lpbeliveau-silabs
Copy link
Contributor

#28949 fixes this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cert blocker darwin needs triage spec Mismatch between spec and implementation v1.2
Projects
Archived in project
Archived in project
Development

No branches or pull requests

2 participants