Add guaranteed eventual consistency #3561
Comments
I'm thinking something similar. Consistency is very important. What if an important pinned post doesn't get federated? Or your response to my post? What happens if a moderator action to remove an illegal post doesn't get federated?

What happens if Lemmy grows to 1% of the size of Reddit, with some communities having 500,000 subscribers? There could be thousands of federated actions each second from one community. The solution needs to be robust enough to handle this.

Outgoing messages should be stored in the database. Let's call this FederatedActionsQueue. For each remote instance, a record is kept of which message was the last one sent successfully. Let's call this FederatedServersQueue.

Every federation cycle, the instance runs through FederatedServersQueue and distributes actions to every subscribed server. The cycle length might start at one second, but could scale back during times of congestion.

If a server fails to respond, there is only one worker waiting for the timeout instead of thousands. We could mark that server as unresponsive in FederatedServersQueue and do an exponential backoff until it starts responding again.

On the FederatedActionsQueue table, this is necessary with the current data model. However, if each action was stored in the main tables with a timestamp, we might be able to use the timestamp instead of the incrementing ID. I haven't thought enough about this.
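To make the cycle described above more concrete, here is a minimal sketch in Python. It is not Lemmy's code: `ServerState`, `run_cycle`, `backoff_delay`, the `send` callback, and the delay constants are all hypothetical names and values chosen for illustration. It only shows one cursor per server plus exponential backoff, so a dead server costs one failed attempt per cycle rather than one per action.

```python
from dataclasses import dataclass

BASE_DELAY = 1.0     # assumed starting cycle length, in seconds
MAX_DELAY = 3600.0   # assumed upper bound on the backoff


@dataclass
class ServerState:
    """One row of the hypothetical FederatedServersQueue."""
    last_sent_id: int = 0      # highest action ID this server has accepted
    failures: int = 0          # consecutive failed delivery attempts
    next_attempt: float = 0.0  # earliest time we try this server again


def backoff_delay(failures: int) -> float:
    """Double the wait after every consecutive failure, up to MAX_DELAY."""
    return min(BASE_DELAY * 2 ** (failures - 1), MAX_DELAY)


def run_cycle(actions, servers, send, now):
    """Run one federation cycle.

    actions: list of (id, payload) tuples sorted by id (FederatedActionsQueue).
    servers: dict mapping server name to ServerState.
    send:    callable(name, payload) -> bool, True if the server accepted it.
    now:     current time in seconds.
    """
    for name, state in servers.items():
        if now < state.next_attempt:
            continue  # still backing off, skip this server entirely
        for action_id, payload in actions:
            if action_id <= state.last_sent_id:
                continue  # already delivered
            if send(name, payload):
                state.last_sent_id = action_id
                state.failures = 0
            else:
                # Stop at the first failure so ordering is preserved,
                # and schedule the next attempt further in the future.
                state.failures += 1
                state.next_attempt = now + backoff_delay(state.failures)
                break
```

Because delivery to a server stops at the first failure, the cursor never moves past an undelivered action, which is what makes replay after an outage possible.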
Yeah, timestamps could work if the resolution is high enough that there are never two events with the exact same timestamp. Retrying per server and not per event should scale much better.
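The concern about identical timestamps can be illustrated with a small Python sketch. The event tuples and function names here are made up for the example, and the tie-breaker is assumed to be any unique, ordered column (here a plain integer). A cursor that compares only the timestamp skips an event that shares a timestamp with the last one sent, while a compound `(timestamp, id)` cursor does not.

```python
# Each event is (timestamp, unique_id, description); values are invented.
events = [
    (100, 1, "post"),
    (100, 2, "reply"),    # same timestamp as the post
    (101, 3, "removal"),
]


def pending_by_timestamp(events, last_ts):
    """Cursor on timestamp only: anything strictly newer is pending."""
    return [e for e in events if e[0] > last_ts]


def pending_by_timestamp_and_id(events, last_key):
    """Cursor on (timestamp, id): ties are broken by the unique id."""
    return [e for e in events if (e[0], e[1]) > last_key]


# Suppose only the first event has been delivered so far.
lost = pending_by_timestamp(events, 100)
kept = pending_by_timestamp_and_id(events, (100, 1))
```

With these inputs, `lost` contains only the removal, so the reply would never be sent, while `kept` contains both the reply and the removal.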
It appears there is a pull request working on the issues brought up here.
Yes, #3605 will (if it has no bugs) make federation reliably guaranteed, unless an instance stays down longer than the retention period of the scheduled task that clears old activities (currently 3 months). If the instance comes back up at some point within that period, federation activity is fully replayed from the point where it went down.
Requirements
Is your proposal related to a problem?
To avoid desyncs when user numbers keep climbing, guaranteed eventual consistency should be added.
Describe the solution you'd like.
An option to handle that would be to permanently store each outgoing message in a table, with an incrementing ID.
For each remote instance, a record is kept of which message was the last one sent successfully.
For each instance that still has unreceived messages, periodically check whether the instance is reachable. If it is reachable, send messages starting from the oldest.
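As a rough illustration of the storage side of this proposal, the following Python sketch uses an in-memory SQLite database. The table names (`outgoing_message`, `instance_cursor`), column names, and helper functions are assumptions made for this example and do not reflect Lemmy's actual schema.

```python
import sqlite3


def open_db():
    """Create the two hypothetical tables in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE outgoing_message (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- incrementing ID
            payload TEXT NOT NULL
        );
        CREATE TABLE instance_cursor (
            domain       TEXT PRIMARY KEY,
            last_sent_id INTEGER NOT NULL DEFAULT 0     -- last message sent OK
        );
    """)
    return db


def enqueue(db, payload):
    """Store an outgoing message and return its ID."""
    cur = db.execute(
        "INSERT INTO outgoing_message (payload) VALUES (?)", (payload,)
    )
    return cur.lastrowid


def unsent_for(db, domain):
    """Return the messages this instance has not received, oldest first."""
    row = db.execute(
        "SELECT last_sent_id FROM instance_cursor WHERE domain = ?", (domain,)
    ).fetchone()
    last = row[0] if row else 0  # unknown instances start from the beginning
    return db.execute(
        "SELECT id, payload FROM outgoing_message WHERE id > ? ORDER BY id",
        (last,),
    ).fetchall()


def mark_sent(db, domain, message_id):
    """Record that everything up to message_id reached this instance."""
    db.execute(
        "INSERT OR REPLACE INTO instance_cursor (domain, last_sent_id) "
        "VALUES (?, ?)",
        (domain, message_id),
    )
```

A sender would call `unsent_for` once the instance is reachable, deliver the rows in order, and call `mark_sent` after each success, so an interrupted run resumes from the oldest undelivered message.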
Describe alternatives you've considered.
I don't know the code well enough to give a serious recommendation/analysis. The one above was also more of an informed guess than anything else.
Additional context
No response