ORNode: stability of websocket connection and synchronization #3

Open
sim31 opened this issue Oct 2, 2024 · 2 comments
Labels
bug Something isn't working ornode

Comments

sim31 commented Oct 2, 2024

Infura drops idle websocket connections after a while. This is a problem for ornode, since in this use case there are sometimes no relevant events emitted for a couple of days or even more. I have already experienced this reliably with all ornode deployments.

A related issue is what to do when ornode misses some events (because it crashed or the connection was dropped). It should be informed of all events that happened while it wasn't listening, and it should process them.

sim31 added the bug and ornode labels Oct 2, 2024
sim31 changed the title from "OrNode: stability of websocket connection and synchronization" to "ORNode: stability of websocket connection and synchronization" Oct 2, 2024
sim31 commented Oct 2, 2024

Potential solution

  • Store a synchronization status document in the db that records at least the block number of the latest recorded event;
  • With every event received, update the sync status doc with the number of the block that generated the event;
  • Implement a resilient websocket connection as suggested in this issue, with these changes:
    • instead of a generic ping, query the current block number (the response serves as the pong), and update the sync status with that block number;
    • if a reconnection does happen, call the sync function (see below);
  • When launching ornode, call the sync function;
  • Sync function:
    • Read the sync status from the db;
    • Query event logs to receive all events since the block number specified in the sync status;
    • Feed these events to ornode one by one, as if they were happening now.
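
The sync function steps above could be sketched roughly as follows. This is only an illustration under assumptions: `SyncStatus`, `EventLog`, `isAfter` and `replayMissed` are hypothetical names, not ornode code, and the sketch tracks a log index next to the block number (the proposal only requires the block number) so that a block holding several events can be resumed partway through. Fetching the logs from the provider is left out; the function takes logs that were already queried.

```typescript
// Hypothetical shapes for illustration; ornode's real types may differ.
interface SyncStatus { lastBlock: number; lastLogIndex: number }
interface EventLog { blockNumber: number; logIndex: number; name: string }

// True if `log` comes strictly after the position recorded in `status`.
function isAfter(status: SyncStatus, log: EventLog): boolean {
  return (
    log.blockNumber > status.lastBlock ||
    (log.blockNumber === status.lastBlock && log.logIndex > status.lastLogIndex)
  );
}

// Replays already-fetched logs in chain order, skipping anything at or before
// the recorded position, and returns the updated sync status.
function replayMissed(
  status: SyncStatus,
  logs: EventLog[],
  handle: (log: EventLog) => void,
): SyncStatus {
  const ordered = [...logs].sort(
    (a, b) => a.blockNumber - b.blockNumber || a.logIndex - b.logIndex,
  );
  let current: SyncStatus = { ...status };
  for (const log of ordered) {
    if (!isAfter(current, log)) continue; // already processed
    handle(log); // same handler as for live events
    current = { lastBlock: log.blockNumber, lastLogIndex: log.logIndex };
  }
  return current;
}
```

Querying from `lastBlock` itself (rather than `lastBlock + 1`) and filtering by position is a design choice of this sketch: it avoids skipping later events in a block whose first event was processed just before a crash.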

The update to sync status should be in the same transaction as any other writes that happen while processing the event.
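
To illustrate why the sync status update belongs in the same transaction, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: `InMemoryDb`, `DbState` and `processEvent` are hypothetical, and a real deployment would use the transaction mechanism of whatever database ornode runs on.

```typescript
// Hypothetical in-memory stand-in for the database.
interface DbState { records: string[]; lastBlock: number }

class InMemoryDb {
  private state: DbState = { records: [], lastBlock: 0 };

  // Runs `work` against a copy of the state and commits only if it does not throw.
  transaction(work: (draft: DbState) => void): void {
    const draft: DbState = {
      records: [...this.state.records],
      lastBlock: this.state.lastBlock,
    };
    work(draft);
    this.state = draft;
  }

  snapshot(): DbState {
    return { records: [...this.state.records], lastBlock: this.state.lastBlock };
  }
}

// Writes the event's data and the sync status together, or not at all.
function processEvent(
  db: InMemoryDb,
  ev: { blockNumber: number; payload: string },
): void {
  db.transaction((draft) => {
    draft.records.push(ev.payload);
    // Simulated failure in the middle of processing.
    if (ev.payload === "") throw new Error("processing failed");
    draft.lastBlock = ev.blockNumber;
  });
}
```

If processing throws, neither the record nor the block number is committed, so the next sync starts from a position that matches what is actually stored.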

Risks

RPC API provider does not notify ornode of some events

This would not protect against that. But any protection using the same API is unlikely to be reliable in this case, so you would have to rely on other means to detect the issue. This would be a pretty serious fault on the part of the API provider, so it is unlikely to happen without being noticed.

There are gaps in the events processed

This sync function assumes that all events since the sync status were missed. If that is not true for some reason (because of an API provider failure or a bug in the code), then the sync function might even mess up some records in the DB or crash. The point is that the whole of ornode was coded with the assumption that each event happens once, and if that does not hold, there might be unexpected issues.

Gaps in the processed events are currently likely because of dropped websocket connections, but the solution presented here makes them very unlikely. Assuming it is coded correctly, the only cause I can think of would be a fault in the API provider.

If gaps do happen, you would have to use a custom solution (a script?) to sync up, or update the records in the DB manually.

sim31 commented Oct 5, 2024

Made an initial fix that is a lot simpler than what's suggested in the comment above. It's implemented in the ornode-sync branch and is now merged into main.

  • It implements manual syncing to enable fixing any missed events. It works by launching ornode with a configuration that specifies a sync object with the block range to sync;
  • It implements pings and reconnections using this solution.
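
For reference, the ping variant proposed in the earlier comment (querying the block number instead of a generic ping) could be modelled as a small state machine like the one below. This is not the merged code: `Keepalive`, `KeepaliveAction` and the timing parameters are hypothetical, and the caller is assumed to send the actual block-number query and perform the reconnect.

```typescript
type KeepaliveAction = "none" | "ping" | "reconnect";

class Keepalive {
  private intervalMs: number;
  private timeoutMs: number;
  private lastPongAt: number;
  private pingSentAt: number | null = null;
  lastSeenBlock = 0;

  constructor(intervalMs: number, timeoutMs: number, now: number) {
    this.intervalMs = intervalMs;
    this.timeoutMs = timeoutMs;
    this.lastPongAt = now;
  }

  // Called periodically; tells the caller what to do at time `now` (ms).
  tick(now: number): KeepaliveAction {
    if (this.pingSentAt !== null) {
      if (now - this.pingSentAt >= this.timeoutMs) {
        // No answer in time: treat the connection as dead.
        this.pingSentAt = null;
        this.lastPongAt = now; // restart the interval after reconnecting
        return "reconnect";
      }
      return "none"; // still waiting for the answer
    }
    if (now - this.lastPongAt >= this.intervalMs) {
      this.pingSentAt = now;
      return "ping"; // caller should query the current block number
    }
    return "none";
  }

  // Called when the block-number query answers; the answer acts as the pong.
  pong(blockNumber: number, now: number): void {
    this.pingSentAt = null;
    this.lastPongAt = now;
    this.lastSeenBlock = Math.max(this.lastSeenBlock, blockNumber);
  }
}
```

On `"reconnect"`, the caller would re-establish the websocket and then run the sync function, which is the step the current fix does not yet do automatically.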

This is not foolproof:

  • If a reconnection happens and some events were missed while the connection was down, it does not sync the missed events (although pings (connection checks) happen quite often and don't seem to cost Infura credits);
  • Missed events are still not detected automatically;
  • Manual syncing is prone to mistakes: you could enter the wrong block range, and there is no check that the retrieved events were not already processed, which could cause some documents to be duplicated in the DB;
  • When syncing, all events are added as if they were happening now. Not sure whether this might cause any issues (it might if, for example, the event processing code reads the current blockchain state and derives some information from it to store).
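
One possible guard against the duplicate-processing risk above is to key every event by transaction hash and log index and skip keys that were already seen. The sketch below is only an assumption about how that could look: `RawLog`, `eventKey` and `ProcessedSet` are hypothetical, and a real version would persist the keys (for example under a unique index in the DB) rather than keep them in memory.

```typescript
// Hypothetical minimal log shape; real logs carry more fields.
interface RawLog { transactionHash: string; logIndex: number }

// Builds a key that identifies one emitted event.
function eventKey(log: RawLog): string {
  return `${log.transactionHash.toLowerCase()}:${log.logIndex}`;
}

class ProcessedSet {
  private keys = new Set<string>();

  // Returns true the first time a log is seen and false on repeats.
  markIfNew(log: RawLog): boolean {
    const key = eventKey(log);
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}
```

With such a check in front of the event handler, syncing an overlapping or mistaken block range would skip events that were already stored instead of duplicating documents.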

On the upside, syncing is quite flexible in that you can fix gaps in processed events: you are not limited to syncing from block N to now, but can make it process events from any block range. Also, exactly the same code is used to process events normally as when syncing, which avoids having to maintain two versions of event processing.
