Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Handle incorrect 404, delay making requests to new IDs #1470

Merged
merged 6 commits into from
Aug 23, 2024

Conversation

nan-li
Copy link
Contributor

@nan-li nan-li commented Aug 14, 2024

Description

One Line Summary

Handle incorrect 404 responses from the OneSignal's backend by adding a delay after creates, when new subscription IDs or OneSignal IDs can be created on the backend.

Details

See Android Implementation here

Backend Issue

OneSignal sometimes returns a 404 on GET and PATCH requests if you are accessing something immediately after it was created. Normally the OneSignal backend can accept fetch/updates right way, but it all depends on it's server load and replication process. There isn't a guaranteed amount of time a client can wait, so SDKs has to work around this problem.

SDK's work around strategy

Add a minimum delay after creating records (User or Subscription) before allowing any requests to fetch or update that specific record, based on the onesignalId or subscriptionId.

  • We picked 3 seconds as this "cool down" period. This is a best guess, we are balancing SDK responsiveness with the chance of running into this issue, as well as timing for Operation Repo's 5 second flush interval.
  • Also noting that backend team said "it looks like we can have peaks of up to 300ms for database replication lag. If you added 0.5s delay between each network request, that would likely resolve some of these issues"

Motivation

404/410 responses are being handled by recreating the missing record (User or Subscription), however this isn't always the right way to handle these, give the backend caveat noted above. Recreating these records can create side-effects and extra load we want to avoid.

Scope

Add a delay processing user-based or subscription-based requests.

Future Considerations

  • We want to refactor Operation Repo and Executors relationship down the line, so this PR is minimal to prevent the primary errors. The 3 second delay should be sufficient to address the overwhelming majority of incorrect 404 cases, especially as backend team mentioned a delay of 0.5 seconds should be sufficient.
  • As the 404/410 may still happen when the backend is under unusual load (a 3 second delay may not always be enough) we can also retry on 404/410 within a longer but still short window.
  • Calls that do not go through the Operation Repo and Executors could have this same logic applied.
    • Live Activities requests, Fetch IAM, and Outcomes requests.
  • The new parameter for the delay interval could be read from remote params, so it can be tweaked later.

Testing

Unit testing

🚧 New unit tests are WIP

Manual testing

Physical iPhone 13 on iOS 17.4

  • New install
  • New install with immediate login to new user
  • New install with immediate login to existing user

Affected code checklist

  • Notifications
    • Display
    • Open
    • Push Processing
    • Confirm Deliveries
  • Outcomes
  • Sessions
  • In-App Messaging
  • REST API requests
  • Public API changes

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing
  • Any Public API changes are explained in the PR details and conform to existing APIs

Testing

  • I have included test coverage for these changes, or explained why they are not needed
  • All automated tests pass, or I explained why that is not possible
  • I have personally tested this on my device, or explained why that is not possible

Final pass

  • Code is as readable as possible.
  • I have reviewed this PR myself, ensuring it meets each checklist item

This change is Reviewable

@nan-li nan-li force-pushed the improve/delay_updates_after_creates branch from fb762a5 to 03ab84b Compare August 14, 2024 18:02
@nan-li nan-li changed the title [Fix] Add a delay making requests to new subscription or user IDs [Fix] Handle incorrect 404, delay making requests to new IDs Aug 14, 2024
@nan-li nan-li changed the title [Fix] Handle incorrect 404, delay making requests to new IDs [Fix] Handle incorrect 404s, delay making requests to new IDs Aug 14, 2024
@nan-li nan-li changed the title [Fix] Handle incorrect 404s, delay making requests to new IDs [Fix] Handle incorrect 404, delay making requests to new IDs Aug 14, 2024
@emawby emawby self-requested a review August 14, 2024 22:35
@nan-li nan-li force-pushed the improve/delay_updates_after_creates branch from 03ab84b to 4390d1b Compare August 14, 2024 23:46
@emawby emawby self-assigned this Aug 15, 2024
nan-li added 4 commits August 16, 2024 07:51
* This rule `opening_brace` declares opening braces should be on the same line as the declaration
* However for multiline if / let statements, it makes it harder to read:

if let blah = blah,
   let foo = blah.foo {
    // do something, less readable
}

if let blah = blah,
   let foo = blah.foo
{
   // do something, more readable
}

* See linked PR for an upcoming option, so we can re-enable this rule
* This is a new behavior which waits a specific amount of time after something is created (such as a User or Subscription) before making a network call to get or update it.

* The main motivation is that the OneSignal's server may return a 404 if you attempt to GET or PATCH on something that was just created. This is due fact that OneSignal's backend server replication sometimes has a delay under load. This may be considered a bug in the backend, but the SDK has a responsibility to handle this case as well.

Additional Details:
* User Manager owns an instance of OSNewRecordsState
* Requests will check their records within the `prepareForExecution()` check
* Push subscription IDs and onesignal IDs can be added to the new records storage after they are created
* Requests can be delayed by a specific “cool down” period plus a small buffer - they are flushed after a delay when they need to wait for the "cool down" period to access a user or subscription after its creation.
* Anytime `prepareForExecution()` fails, the executor will put a delayed task to flush the requests again again after a specific delay.
* In CreateUser, new IDs will only be added to the new records storage if the Creates were truly Creates, and not “faux” Creates meant to get an onesignal ID for a given external ID. This latter case happens after a 409 Identify conflict, meaning the user definitely exists and the Onesignal ID will not be new.
* In IdentifyUser successful, the executor adds onesignal ID to new records again because an immediate fetch for hydration may not return the newly-applied external ID. We will encounter a problem with hydration in this case.
* User fetch will always be called after a delay unless it is to refresh the user state on a new session. This is because a fetch that is not on a new session is always for getting user data after a create or identify.
* Nits: Also moved some opening braces to new lines for improved readability of multi-line “if / let / guard” statements
* Executors init take in additional argument
* A failing test needs more time to complete
@nan-li nan-li force-pushed the improve/delay_updates_after_creates branch from 4390d1b to 80d940f Compare August 16, 2024 14:52
@nan-li nan-li force-pushed the improve/delay_updates_after_creates branch from 997e548 to c6db876 Compare August 16, 2024 21:45
nan-li added 2 commits August 16, 2024 14:48
* The canAccess method should really check a non-optional string
* Adjust the use of this method at the calling site
@nan-li nan-li force-pushed the improve/delay_updates_after_creates branch from c6db876 to 529a148 Compare August 16, 2024 21:48
@nan-li nan-li requested a review from emawby August 16, 2024 23:09
@nan-li
Copy link
Contributor Author

nan-li commented Aug 23, 2024

Badge test started failing this week, but can look at later.

It looks like the runner-image has changed, and the new tests are run on ios 18.1.

Previous passing runner image used:

Included Software: https://github.com/actions/runner-images/blob/macos-14/20240811.3/images/macos/macos-14-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/macos-14%2F20240811.3

Failing runner image used:

Included Software: https://github.com/actions/runner-images/blob/macos-14/20240818.2/images/macos/macos-14-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/macos-14%2F20240818.2

@nan-li nan-li merged commit eaf4855 into main Aug 23, 2024
3 of 5 checks passed
@nan-li nan-li deleted the improve/delay_updates_after_creates branch August 23, 2024 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants